HyperAI

Kaggle's Annual Report Is Out: Data Scientists Are Young and Well Paid, With Salaries Approaching RMB 1 Million

4 years ago
神经小兮

As 2020 draws to a close, Kaggle has released its annual survey report, "State of Machine Learning and Data Science 2020", which offers a portrait of today's data scientists.

Kaggle, a data science competition platform, recently surveyed its users on a range of topics, including demographics, salary levels, and work experience.

After cleaning the responses from 20,036 Kaggle users, Kaggle based this report on 13% of them (2,675 respondents), all of whom work as data scientists or in other roles that support data science and machine learning.
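As a quick sanity check on these figures, the share of cleaned responses covered by the report can be computed directly. This is a minimal Python sketch that uses only the two counts quoted above; the variable names are illustrative.

```python
# Counts quoted in the report
total_cleaned = 20_036  # responses remaining after Kaggle's data cleaning
professionals = 2_675   # respondents working in data science roles

# Fraction of cleaned responses that the report focuses on
share = professionals / total_cleaned
print(f"{share:.1%}")  # prints "13.4%", consistent with the ~13% cited
```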

Kaggle took one month to complete this survey report

The report offers a glimpse of who today's machine learning and data science practitioners are, how companies in the field are hiring and investing, and where the industry is heading.

Note: This survey contains no data on Chinese data scientists. Looking into why, Super Neuro found the following clause in Kaggle's survey participation rules:

To be eligible for the prizes in this survey, you must:

  • Be 18 years of age or older, or the age of majority in your country of residence;
  • Not be a resident of Crimea, Cuba, Iran, Syria, North Korea, or Sudan; and
  • Not be a person or representative of an entity subject to U.S. export controls or sanctions.

The rules state explicitly that residents of the countries and regions above, and representatives of sanctioned entities, are not eligible for prizes.

Since 2018, the United States has added more than 200 Chinese companies and 13 universities to its "Entity List" of parties subject to export controls or sanctions. Below are those universities, together with a selection of the listed technology and artificial intelligence organizations:

Companies and institutes

Beijing Computational Science Research Center, Beijing Cloud Computing Center, SMIC, Dahua Technology, Hikvision, iFlytek, Megvii Technology, SenseTime, Yitu Technology, CloudWalk Technology, Intellifusion Technologies Co., Ltd., NetPosa Technology Co., Ltd., Beijing CloudMinds, Qihoo 360 Technology Co., Ltd., Xiamen Meiya Pico Information Co., Ltd., Yixin Technology, 38 Huawei subsidiaries, the 30th Institute of China Electronics Technology Group Corporation, the 7th Institute of China Electronics Technology Group Corporation, and Wuxi Jiangnan Institute of Computing Technology.

Colleges and universities

Beijing University of Aeronautics and Astronautics, Renmin University of China, National University of Defense Technology, Hunan University, Harbin Institute of Technology, Harbin Engineering University, Northwestern Polytechnical University, Xi'an Jiaotong University, University of Electronic Science and Technology of China, Sichuan University, Tongji University, Guangdong University of Technology, and Nanchang University.

In other words, if your school or company is on the Entity List, you can still fill out the questionnaire, but you are not eligible for prizes. Although Kaggle performs no further background checks and issues no further statement, these rules do in effect exclude many Chinese respondents by name.

The report in brief: a portrait of data scientists

Gender, age and education distribution 

  • There are more men than women working in this field, with a male to female ratio of about 5:1.
  • 35 is a watershed: most respondents are younger than 35
  • More than half of the respondents hold a postgraduate degree

Education and work experience 

  • Most data scientists continue to learn new technologies after graduation
  • Most data scientists have less than 10 years of programming experience
  • More than half of data scientists have less than three years of experience in machine learning
  • Data scientists living in the United States earn significantly more than their counterparts in other countries.

Technology-related surveys 

  • More data scientists are using cloud computing than in 2019
  • Scikit-learn is the most widely used machine learning framework, used by 4 out of 5 data scientists
  • Tableau and Power BI are the most popular business intelligence tools

Mostly male, typically with a master's degree, and India tops the list

Gender: More than 80% are male 

There is still a huge gender imbalance among data scientists, with more than 80% being men.

Last year's survey found that 84% were male; this year the proportion has barely changed

Age: A large number of people born after 1995 have joined 

Most data scientists are in their twenties or early thirties, typically between 22 and 34 years old. Only one in five professional data scientists is over 40.

Most data scientists are between 25 and 34 years old

There are signs that the profession is getting younger as "Generation Z" enters the field: nearly 7% of data scientists are now aged 18 to 21.

That is up from 5% last year, which suggests the field will keep getting younger.

Country: India and the United States top the list 

Among the data scientists who participated in Kaggle's annual survey, Indian data scientists accounted for 22%, while the United States accounted for 14.5%, both far exceeding Brazil, which ranked third (less than 5%).

The report does not list China separately, for the reasons described above. However, the "Other" category, which ranks third in the chart, is fairly large, perhaps because Chinese respondents are counted there.

India and the United States have a clear advantage in the number of data scientists

Education: Graduate degree is standard 

The survey shows that, as in previous years, a graduate degree remains the norm for data scientists. More than 68% hold a master's or doctoral degree, and fewer than 5% have no education beyond high school.

More than half of data scientists have a master’s degree

Learning platforms: Coursera and Udemy are the most commonly used

Because data science and machine learning change rapidly, more than 90% of respondents keep learning. About 30% take traditional university courses, while even more rely on online resources.

In this survey, Coursera, Udemy, and Kaggle Learn are the most common learning platforms.

Many people learn on more than one platform: on average, respondents use 2.8 platforms each.

Programming experience: Most have many years of programming experience 

Most data scientists have at least a few years of programming experience. In fact, more than 8% started programming in the last century, that is, at least 20 years ago, while fewer than 2% say they have never written code.

American data scientists have considerably more programming experience than their peers elsewhere: in the United States, 37% have been programming for more than 10 years, compared with only 22% worldwide.

Programming experience is important for data scientists

Machine learning experience: Most are new to machine learning 

Most data scientists are relatively new to machine learning. Fewer than 6% of professional data scientists have been using machine learning for 10 years or more.

More than half of data scientists have less than three years of machine learning experience

Salary level: The most competitive in the United States 

Data scientists earn very competitive salaries. Those in the United States are paid the most, with a median salary in the $120,000 to $150,000 range (approximately RMB 780,000 to RMB 980,000).
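The RMB figures follow from a simple currency conversion. This minimal Python sketch assumes a fixed rate of 6.5 yuan per US dollar, which is an assumption (roughly the late-2020 level) rather than a figure from the report; the function name is hypothetical.

```python
# Assumed exchange rate (CNY per USD), roughly the late-2020 level; not from the report.
CNY_PER_USD = 6.5

def usd_to_cny(amount_usd: float, rate: float = CNY_PER_USD) -> float:
    """Convert a US-dollar amount to renminbi at a fixed exchange rate."""
    return amount_usd * rate

low, high = 120_000, 150_000  # US salary range quoted in the report
print(f"RMB {usd_to_cny(low):,.0f} to RMB {usd_to_cny(high):,.0f}")
# prints "RMB 780,000 to RMB 975,000"
```

At this assumed rate the upper bound comes to RMB 975,000, which rounds to the RMB 980,000 quoted in the article.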

Although India has a large number of data scientists, their pay is comparatively low: nearly 90% of Indian data scientists earn less than $50,000 a year, and India ranks only sixth in the global data scientist salary ranking.

Median salary for data scientists around the world

What integrated development environments do they use?

The report shows that Jupyter (JupyterLab) remains the IDE of choice for data scientists, used by about three-quarters of them, although this is down from 83% last year. Visual Studio Code ranks second, at just over 33%.

What machine learning frameworks do they use?

Python-based machine learning libraries still dominate. Scikit-learn, a versatile workhorse suited to most projects, ranks first, used by 4 out of 5 data scientists.

TensorFlow and Keras are each used by about 50% of data scientists.

XGBoost, created by Chinese researcher Dr. Chen Tianqi, ranks fourth.

Data scientist has become a hot position. Do you want to join it?

Since 2017, Kaggle has conducted this survey every year, giving us an increasingly detailed portrait of machine learning and data science practitioners and of the trends shaping the field.

In the era of big data, demand for data scientists has exploded, and strong career prospects and generous salaries have made data science a dream career for many people.

According to Google Trends, interest in data scientist roles has surged over the past decade

However, Kaggle's report shows that data scientists are getting younger and increasingly well educated, so anyone hoping to enter the field should expect stiff competition.

Kaggle report:

https://storage.googleapis.com/kaggle-media/surveys/Kaggle%20State%20of%20Machine%20Learning%20and%20Data%20Science%202020.pdf
