Archinect

Current Salary Poll Information in a Google Doc

bregnier

I'm doing a data visualization project for a CS course where we had to use screen scraping to collect the data, and this was, for some masochistic reason, the first source I could think of.

I've collected the data in the form as of 2/7/11 and put it into a .csv file available at

https://spreadsheets.google.com/ccc?key=0Al3TBSU6eD_AdDJfS2czX0t6STZxRzYxWnlEb3FCcHc&hl=en

There is also a cleaned up set of simplified data (and scatterplot) of just the US entries at

http://www-958.ibm.com/software/data/cognos/manyeyes/visualizations/architect-salaries-by-age

FWIW, the data that exists is _very_ messy, particularly the location and the salary entries. It wouldn't be that hard to change the form to make the data cleaner (force a numeric answer on salary, fields for city, state, country, currency). Would make it a lot easier to graph!

If any official Archinector is offended by my transcribing the data into a tabular format, please let me know. This is for a school project and I have no intention of presenting this data for profit (don't really know how that might be done...). I've also taken pains to credit Archinect whenever I can.

 
Feb 8, 11 8:23 pm
citizen

Ben,

That's quite a project, and very interesting. You should know that there's a longstanding cry for Archinect.com to standardize the salary survey questionnaire for just this purpose.

Thank you for posting.

Feb 8, 11 9:21 pm
Ms Beary

Wow!

Feb 9, 11 11:35 am
CMNDCTRL

ok...who are these 23 year olds making 320k? these sound suspicious since there is no one making that much at an OLDER age. besides that, thanks for the info, bregnier! interesting to see.....


it is too bad there is not a 0 list for the unemployed, though.

Feb 9, 11 11:49 am
toasteroven

this is impressive - I'd be curious how this starts to break down by region and/or by year of report.

Feb 9, 11 11:52 am
bregnier

Yeah... only noticed those six-figure twenty year olds after I'd made the plot. A bit doubtful to say the least. I tried to remove all of the "0" answers as it's not really a true survey and thus proportions are off. What I really wanted to do was show disparities between men and women but the ManyEyes scatter plot tool doesn't differentiate between data sets with color or shape.

toasteroven, I was thinking of doing some sort of heatmap for regions, but the location section is incredibly messy -- it took me hours just to filter out the US entries (didn't want to deal with currency conversion or cost of living adjustment).

The next iteration of the survey definitely needs to have some in-form data control. For instance, the fact that the salary entry is just a text field is ridiculous... if someone enters 35 does this mean $35,000 or $35/hour?
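To make the ambiguity concrete, here is a minimal Python sketch of the kind of check a cleanup pass (or the form itself) could apply to a free-text salary entry. The function name `parse_salary`, the 2,080 hours-per-year figure, and the "under 1,000 is ambiguous" cutoff are all assumptions for illustration, not anything the actual poll or my scraper does.

```python
import re
from typing import Optional

# Assumption: a full-time year is 40 hours/week * 52 weeks.
HOURS_PER_YEAR = 2080


def parse_salary(raw: str) -> Optional[float]:
    """Return an annual salary in dollars, or None if the entry is unclear."""
    text = raw.strip().lower().replace("$", "").replace(",", "")
    match = re.match(r"^(\d+(?:\.\d+)?)\s*(k|/hr|/hour)?$", text)
    if match is None:
        return None  # free text we cannot interpret
    value = float(match.group(1))
    unit = match.group(2)
    if unit == "k":
        return value * 1000
    if unit in ("/hr", "/hour"):
        return value * HOURS_PER_YEAR
    # Assumption: a bare number under 1,000 could be thousands per year
    # or dollars per hour, so refuse to guess.
    if value < 1000:
        return None
    return value
```

With these rules a bare "35" comes back as `None` and gets flagged for manual review instead of being silently guessed at.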

I'd estimate it would take about 2-3 days to clean up all of the data to get all of it up to spec and delete the uncertain or bullshit entries. If anyone wants to take that on be my guest. (Google Refine is an awesome tool if you're interested.)

Feb 9, 11 12:09 pm
sectionalhealing

this is awesome - great job!

by my (very rough) calculations, the line of best fit is:
2.75 x age - 30 = salary in thousands

for example:
2.75 x (25 years old) - 30 = $38,750
2.75 x (30 years old) - 30 = $52,500
2.75 x (50 years old) - 30 = $107,500
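
The same rough rule of thumb as a short Python sketch, in case anyone wants to try other ages. The function name `estimated_salary` is just for illustration; the slope and intercept are the very rough ones quoted above, not a proper regression fit.

```python
def estimated_salary(age: float) -> float:
    """Rough trend line: 2.75 * age - 30, in thousands of dollars."""
    return (2.75 * age - 30) * 1000


for age in (25, 30, 50):
    print(f"age {age}: ${estimated_salary(age):,.0f}")
```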


and yes, who are these 23-25 year olds making $300,000+?

Feb 9, 11 12:17 pm
Ms Beary

Ignore the outliers, concentrate on evaluating the valuable data.

Feb 9, 11 12:20 pm
bregnier

While that trend line looks pretty accurate, it's also important to note the wide spread of the data, regardless of age. Anybody want to calculate the standard deviation?
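
For anyone who wants to take that on, a minimal Python sketch of one way to measure the spread: the sample standard deviation of each entry's distance from the trend line. The function name `residual_spread` is hypothetical, the default slope and intercept are the rough ones from sectionalhealing's post, and you would need to feed it the ages and salaries (in thousands) from the spreadsheet yourself.

```python
import statistics
from typing import Sequence


def residual_spread(
    ages: Sequence[float],
    salaries_k: Sequence[float],
    slope: float = 2.75,       # assumption: rough trend line from the thread
    intercept: float = -30.0,  # assumption: rough trend line from the thread
) -> float:
    """Sample standard deviation of (actual - predicted) salary, in thousands."""
    residuals = [
        salary - (slope * age + intercept)
        for age, salary in zip(ages, salaries_k)
    ]
    return statistics.stdev(residuals)
```

A large result relative to the predicted salaries would back up the point that age alone explains only part of the variation.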

Feb 9, 11 12:36 pm

Wow - that's a lot of work and makes the salary poll data, even if not perfect, EVER so much more usable!

Do we need to mass-email your teacher and tell him/her to give you an A? (just kidding)

Feb 9, 11 12:49 pm
Rusty!

One factor that jumps out at me (after looking at the spreadsheet) is how many people chose to participate in the salary poll pre- and post-recession. Unwillingness to share bad news about yourself is telling.

Which brings us to the biggest problem of charting this data: the recession has broken any meaningful patterns that could have been extracted from this model.

Back when I was 23, I was making $320k, now I hardly make half of that. I don't need 'science' to tell me things kind of suck.

Feb 9, 11 1:20 pm
quizzical

A very interesting exercise, although there are clearly some "issues" with the accuracy of the data ...

For example, if one looks at some of the more suspicious outliers in the CSV raw data and compares the pay figures with the 'explanation' field, one finds the following examples:

$418,000 - Unlicensed. 10 years experience. No Health. 1% Matching IRA. Underpaid

$350,000 - B.ARCH. 1 yr. exp.. health. no paid OT

$340,000 - MArch No Experience

$320,000 - Intern no benefits 2 years exp

Looks to me like these sorts of entries have a missing decimal about one digit from the right -- then, they might appear more reasonable.
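
A minimal Python sketch of that idea: flag anything above a cutoff and propose the value with the decimal moved one place to the left, leaving the final call to a human. The $300,000 cutoff and the name `propose_fix` are assumptions chosen only so that the four examples above get flagged.

```python
from typing import Optional

# Assumption: entries at or above this are suspicious for this data set.
SUSPICIOUS_AT_OR_ABOVE = 300_000


def propose_fix(salary: float) -> Optional[float]:
    """Return a suggested corrected salary, or None if the entry looks plausible."""
    if salary >= SUSPICIOUS_AT_OR_ABOVE:
        return salary / 10  # move the decimal one digit to the left
    return None


for reported in (418_000, 350_000, 340_000, 320_000):
    print(f"${reported:,} -> ${propose_fix(reported):,.0f}")
```

This only suggests a correction; a genuinely high salary would be flagged too, so the 'explanation' field still needs a look before anything is changed.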

Nevertheless, a very useful way to view data ... thanks for doing this work.

Feb 9, 11 1:42 pm
sectionalhealing

not to threadjack, but what software are you guys using for creating vector/illustrator compatible graphs from raw data?

Feb 9, 11 1:58 pm
bregnier

IMO if you're looking for power, flexibility, and interactivity, you can't beat the Processing language (processing.org). Protovis for JavaScript is also good. If you happen to have a license for Mathematica, that's a graphics powerhouse that doesn't require as much programming (although it still needs a lot of math).

The ManyEyes tool above is also good for quick visualization, but lacks flexibility.

Feb 9, 11 2:05 pm

quizzical, I think you're right that those people just misplaced their comma.

Feb 9, 11 3:00 pm
Rusty!

Or they might have been paid in pesos. Either way, I need to see a pie chart of this information to pass judgement. I find charts confusing.

Feb 9, 11 4:48 pm
bregnier
http://dilbert.com/strips/comic/2009-03-07/
Feb 9, 11 4:57 pm
St. George's Fields

I was about to say.

How do we know these people aren't foreigners who use a dollar system but not necessarily the American dollar?

$10,000 USD is equal to ~$80,000 HKD (Hong Kong).

So, a $50,000 USD entry-level position in Hong Kong would pay roughly $400,000 HKD a year.

Feb 9, 11 5:00 pm
Rusty!
"Also, many foreign people who use the internet but might english having to be speak skills so good.."

Oh the sweet, sweet ironing.

Feb 9, 11 5:05 pm
St. George's Fields

Someone deleted that post?!?!?

Feb 9, 11 5:08 pm
bregnier

To strix:
The scatterplot data is US respondents only. Some of the international entries do have unclear currency designation. It is possible that some of the outports could even be my fault as the result of my methods for cleaning the data-- I tried to be careful but I am no expert at this.

Feb 9, 11 5:14 pm
bregnier

That should say outLIERs. Missed the autocorrect. Should never make forum posts whilst teaching my screaming twin two year olds the value of sharing.

Feb 9, 11 5:18 pm
Rusty!

Whoa. Whoever deleted Stixies embarrassing post, bring it back. We may be illiterate but we're still #1.

USA! USA!

Feb 9, 11 5:18 pm