Predicting Salary Based on Stack Overflow Survey Data

Image source: Self made

Summary

Last time we analysed job satisfaction in the Stack Overflow survey data and could not get a good model out of it. This time we try salary. If you have already read the first post (you can find it here: Stack Overflow job satisfaction), you may find some information twice. Sorry for the repetition.

Problem Domain

We will be talking about job salary, with a focus on coding-oriented jobs. Several factors influence a salary and have to be taken into account:

  • Years of Experience
  • Education
  • Company Size
  • Frameworks / Tools / Languages used, compared to somebody else doing a similar job
  • Location
  • and even more.

All of this ends up represented by a single number (we will be using annual USD). It is not easy to split a salary into the parts that would allow a more objective measurement and comparison with others.
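To make "annual USD" concrete, here is a minimal sketch of how a reported compensation could be normalised to one yearly figure in US dollars. The function name `to_annual_usd`, the periods-per-year multipliers, and the exchange rate passed in are all assumptions for illustration; they are not taken from the survey's own conversion.

```python
# Hypothetical multipliers: how many pay periods we assume per year.
# 50 working weeks and 12 working months are assumptions of this sketch.
PERIODS_PER_YEAR = {"Weekly": 50, "Monthly": 12, "Yearly": 1}


def to_annual_usd(amount, comp_freq, usd_per_unit):
    """Convert a reported compensation to annual USD.

    amount       -- compensation in the respondent's own currency
    comp_freq    -- "Weekly", "Monthly" or "Yearly"
    usd_per_unit -- assumed exchange rate (USD for one unit of that currency)
    """
    if comp_freq not in PERIODS_PER_YEAR:
        raise ValueError(f"unknown compensation frequency: {comp_freq!r}")
    return amount * PERIODS_PER_YEAR[comp_freq] * usd_per_unit


if __name__ == "__main__":
    # 4000 per month in a currency worth 1.10 USD (made-up rate)
    print(round(to_annual_usd(4000, "Monthly", 1.10), 2))
```

Keeping the multipliers in one dictionary makes the assumption easy to change if, for example, you would rather count 52 weeks per year.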

Stack Overflow

Thanks to Stack Overflow and its community, a survey is released every year covering the tech stack people are using, where they are working from, and how much salary they get.

Problem Statement

Job salary is always a big topic on business platforms such as LinkedIn and XING (popular in the German-speaking region). For both services you have to pay a monthly subscription in order to get the information you are interested in.

Salary was and always will be a strong argument for searching for or changing a job. On the other hand, not many people want to share this information: they would rather have no information at all than share their own salary in exchange.

At the end of this blog post, you will find a section where you can enter your data and get the predicted salary back. The application takes the features as input through an HTML form and returns the prediction without making any further API calls.

Privacy

All potentially private information stays in the browser, so you don't have to worry about your data. I trained the model using TensorFlow 2.0 with the Keras API and use TensorFlow.js to load the trained model and make predictions. I plan to write a more detailed blog post on how to use TensorFlow.js.

Predicting Salary app

Stack Overflow capstone project by Darius Murawski. Welcome to your salary prediction. This data is based on the Stack Overflow survey of 2020. If you are interested in the source code, stay tuned! Pick NA when you don't want to give an answer.

CompFreq

Is that compensation weekly, monthly, or yearly?

CurrencySymbol

Which currency do you use day-to-day? If your answer is complicated, please pick the one you're most comfortable estimating in.

CurrencyDesc

Which currency do you use day-to-day? If your answer is complicated, please pick the one you're most comfortable estimating in.

NEWDevOpsImpt

How important is the practice of DevOps to scaling software development?

NEWDevOps

Does your company have a dedicated DevOps person?

NEWOnboardGood

Do you think your company has a good onboarding process? (By onboarding we mean the structured process of getting you settled in to your new role at a company)

Country

Where do you live?

PurchaseWhat

What level of influence do you personally have over new technology purchases at your organization?

NEWEdImpt

How important is a formal education, such as a university degree in computer science, to your career?

NEWJobHunt

In general, what drives you to look for a new job? Select all that apply.

JobFactors

Imagine that you are deciding between two job offers with the same compensation, benefits, and location. Of the following factors, which 3 are MOST important to you?

NEWJobHuntResearch

When job searching, how do you learn more about a company? Select all that apply.

NEWCollabToolsWorkedWith

Which collaboration tools have you done extensive development work in over the past year, and which do you want to work in over the next year? (If you worked with the tool and want to continue to do so, please check both boxes in that row.)

WorkWeekHrs

On average, how many hours per week do you work? Please enter a whole number in the box.

YearsCodePro

NOT including education, how many years have you coded professionally (as a part of your work)?

YearsCode

Including any education, how many years have you been coding in total?
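Before a model can use answers like these, they have to be turned into numbers. The sketch below shows one possible encoding, not necessarily the one the app uses: single-choice answers become one-hot vectors, "select all that apply" answers become multi-hot vectors, and numeric answers are passed through. `SCHEMA`, `encode_form`, and the short option lists are hypothetical, and the `;` separator for multiple selections is an assumption. An answer of NA simply encodes as zeros.

```python
def one_hot(answer, choices):
    # Single-choice answer: 1.0 at the chosen option, 0.0 elsewhere.
    # "NA" or an unknown answer yields all zeros.
    return [1.0 if answer == c else 0.0 for c in choices]


def multi_hot(answer, choices):
    # Multi-select answer: assumes selected options are joined with ';'.
    selected = set() if answer == "NA" else {s.strip() for s in answer.split(";")}
    return [1.0 if c in selected else 0.0 for c in choices]


def numeric(answer, default=0.0):
    # Numeric answer: fall back to a default when the field is NA or empty.
    try:
        return float(answer)
    except (TypeError, ValueError):
        return default


# Hypothetical schema: field name -> (kind, known options).
# The option lists are shortened, made-up examples.
SCHEMA = {
    "CompFreq": ("single", ["Weekly", "Monthly", "Yearly"]),
    "NEWJobHunt": ("multi", ["Better compensation", "Growth opportunities"]),
    "WorkWeekHrs": ("number", None),
}


def encode_form(form, schema):
    # Concatenate the encoded fields in schema order into one flat vector.
    vector = []
    for name, (kind, choices) in schema.items():
        raw = form.get(name, "NA")
        if kind == "single":
            vector.extend(one_hot(raw, choices))
        elif kind == "multi":
            vector.extend(multi_hot(raw, choices))
        else:
            vector.append(numeric(raw))
    return vector


if __name__ == "__main__":
    answers = {
        "CompFreq": "Monthly",
        "NEWJobHunt": "Better compensation;Growth opportunities",
        "WorkWeekHrs": "40",
    }
    print(encode_form(answers, SCHEMA))
```

Note that every categorical option adds one column to the vector, so the input width grows with the number of options, not just the number of questions.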

Improvements

I picked the 15 most relevant features to get a good model without making the user enter too much data.
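How "relevance" is measured is a design choice. One simple possibility, shown here purely as an illustration and not as the method used for this post, is to rank numeric columns by their absolute Pearson correlation with salary and keep the top k. The helper names `pearson` and `top_k_features` and the tiny data set are made up for the example.

```python
from math import sqrt


def pearson(xs, ys):
    # Pearson correlation coefficient of two equally long number lists.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def top_k_features(columns, target, k):
    # Score each column by |correlation with the target|, keep the k best.
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    # Made-up toy data, four respondents.
    salary = [40000, 55000, 70000, 90000]
    columns = {
        "YearsCodePro": [1, 3, 6, 10],
        "WorkWeekHrs": [40, 38, 42, 40],
        "Constant": [1, 1, 1, 1],
    }
    print(top_k_features(columns, salary, k=2))
```

Correlation only captures linear relationships with numeric columns, so categorical features such as Country would need a different score.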

You can add more features, as I did, to have even more columns available for training a model, but remember that this will also increase the memory and runtime requirements of your model.

As the model was good, I decided against also using historical data or doing any hyperparameter tuning.

Feedback

Like or don't like what you got? Let's have a chat and see if we can improve it!

Code

If you want to train the model yourself, outside of the blog embedding, check out the source code in my GitHub repository.