Machine Learning Community Profile Data
Data Science and Analytics
Free
About
This dataset offers a detailed view of the state of data science and machine learning across the industry. It combines responses from annual industry-wide surveys conducted over four consecutive years, allowing users to track the changing profile of Kagglers from 2017 to 2020. The data was collected and cleaned by the Kaggle Team and subsequently merged into a single file for longitudinal analysis.
Columns
The dataset contains 12 columns detailing respondent characteristics and professional context:
- index: A unique identifier key for each record.
- Age: The age of the respondent (with the 25–29 bracket being the most frequently reported).
- Gender: The self-identified gender of the respondent (81% of respondents are Male).
- Country: The respondent’s country of residence (India is the most represented country, followed by the United States of America).
- Degree: The highest degree attained by the respondent (Master’s degree is the most common, held by 42% of respondents).
- Job Title: The professional occupation of the respondent (Student and Data Scientist are the most frequently listed titles).
- Company Size: The number of employees in the respondent’s organisation.
- Team Size: The size of the respondent’s immediate team.
- ML Status in Company: A description of the status of Machine Learning adoption within the respondent’s company.
- Compensation Status: Details regarding the salary or compensation of the respondent.
- Money Spent: The amount of money spent on ML products by the respondent’s company.
- Year: The year of the survey from which the record was taken (ranging from 2017 to 2020).
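As a quick sanity check after download, the column list above can be compared against the file's header row. The sketch below uses only the Python standard library. It assumes the CSV header uses exactly the names listed here, which this page does not confirm, and the helper names `missing_columns` and `read_header` are illustrative rather than part of the dataset.

```python
import csv
from pathlib import Path

# Column names as listed above; the real CSV header may spell them differently.
EXPECTED_COLUMNS = [
    "index", "Age", "Gender", "Country", "Degree", "Job Title",
    "Company Size", "Team Size", "ML Status in Company",
    "Compensation Status", "Money Spent", "Year",
]


def missing_columns(header):
    """Return the expected column names that are absent from a CSV header."""
    present = {name.strip() for name in header}
    return [name for name in EXPECTED_COLUMNS if name not in present]


def read_header(path):
    """Read only the first row (the header) of a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle))


if __name__ == "__main__":
    csv_path = Path("kaggle_survey_17_20_v2.csv")  # assumed local path
    if csv_path.exists():
        print("Missing columns:", missing_columns(read_header(csv_path)))
```

An empty list means every listed column was found. Any names that are reported are worth checking by hand, since the header may simply use a different spelling.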
Distribution
The data is delivered in a tabular structure, typically in CSV format. The primary file, named kaggle_survey_17_20_v2.csv, is approximately 10.43 MB in size. It contains over 80,000 valid records representing the survey responses across the merged years.
Usage
This resource is ideally suited for Data Visualization and Exploratory Data Analysis. It can be used to study macro-level shifts within the data science job market, compare professional demographics over time, and analyse global participation in the Machine Learning community.
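One way to compare demographics over time is to compute, for each survey year, the share of respondents falling into each category of a column. The following is a minimal sketch using only the Python standard library. The column names "Gender" and "Year" and the local file path are assumptions taken from the description on this page, and `share_by_year` is a hypothetical helper.

```python
import csv
from collections import Counter, defaultdict
from pathlib import Path


def share_by_year(rows, column, year_column="Year"):
    """Return {year: {value: fraction of that year's rows}} for one column."""
    counts = defaultdict(Counter)
    for row in rows:
        value = (row.get(column) or "").strip()
        year = (row.get(year_column) or "").strip()
        if value and year:  # skip records with a blank value or year
            counts[year][value] += 1
    return {
        year: {value: n / sum(counter.values()) for value, n in counter.items()}
        for year, counter in sorted(counts.items())
    }


if __name__ == "__main__":
    csv_path = Path("kaggle_survey_17_20_v2.csv")  # assumed local path
    if csv_path.exists():
        with open(csv_path, newline="", encoding="utf-8") as handle:
            shares = share_by_year(csv.DictReader(handle), "Gender")
        for year, values in shares.items():
            print(year, {k: round(v, 3) for k, v in values.items()})
```

Because blank values are skipped, each year's fractions are relative to the records that actually answered the question, not to all records for that year.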
Coverage
The data spans a time range from 2017 to 2020. Geographically, responses cover a wide global distribution, with 72 unique countries represented. Demographically, the scope includes detailed information on the age, gender, educational background, and professional environment (including team size and ML adoption status) of thousands of survey participants.
License
CC0: Public Domain
Who Can Use It
Intended users include data scientists, academic researchers, and students interested in quantitative socio-economic trends. It is particularly useful for those needing to benchmark skills, track compensation trends, or gain insight into the adoption rate of machine learning practices globally.
Dataset Name Suggestions
- Kaggle Survey 2017-2020 Merged Data
- Global Data Science Industry Trends (2017-2020)
- Machine Learning Community Profile Data
- The Evolution of Kagglers
Attributes
Original Data Source: Machine Learning Community Profile Data
