Machine Learning Community Profile Data

Data Science and Analytics

Tags and Keywords

Survey

Kaggle

Machine

Science

Trends

Free

About

This dataset offers a detailed view of the state of data science and machine learning across the industry. It combines responses to annual industry-wide surveys conducted over four consecutive years, so users can track how the profile of Kagglers changed from 2017 to 2020. The data was collected and cleaned by the Kaggle Team and subsequently merged into a single file for longitudinal analysis.

Columns

The dataset contains 12 columns detailing respondent characteristics and professional context:
  • index: A unique identifier key for each record.
  • Age: The age of the respondent (with the 25–29 bracket being the most frequently reported).
  • Gender: The self-identified gender of the respondent (81% of respondents are Male).
  • Country: The respondent’s country of residence (India is the most represented country, followed by the United States of America).
  • Degree: The highest degree attained by the respondent (Master’s degree is the most common, held by 42% of respondents).
  • Job Title: The professional occupation of the respondent (Student and Data Scientist are the most frequently listed titles).
  • Company Size: The number of employees in the respondent’s organisation.
  • Team Size: The size of the respondent’s immediate team.
  • ML Status in Company: A description of the status of Machine Learning adoption within the respondent’s company.
  • Compensation Status: Details regarding the salary or compensation of the respondent.
  • Money Spent: The amount of money spent on ML products by the respondent’s company.
  • Year: The year of the survey from which the record was taken (ranging from 2017 to 2020).
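
As a quick sanity check after downloading, the header of the CSV can be compared against the column list above. The sketch below is only illustrative: the exact header spellings in the file are an assumption taken from this listing, and missing_columns is a hypothetical helper, not part of any published tooling.

```python
import csv
import io

# Column names as given in this listing; the real header spellings
# in the file are an assumption and may differ (case, underscores).
EXPECTED_COLUMNS = [
    "index", "Age", "Gender", "Country", "Degree", "Job Title",
    "Company Size", "Team Size", "ML Status in Company",
    "Compensation Status", "Money Spent", "Year",
]

def missing_columns(csv_file):
    """Return the expected column names that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    return [name for name in EXPECTED_COLUMNS if name not in header]

# Tiny made-up sample, for illustration only (not real survey data).
sample = io.StringIO("index,Age,Gender,Year\n0,25-29,Male,2019\n")
missing = missing_columns(sample)
print(missing)
```

For the real file, the same function can be called on a handle opened with open("kaggle_survey_17_20_v2.csv", newline="", encoding="utf-8"); an empty list would mean every listed column was found under the assumed name.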

Distribution

The data is delivered in tabular form as a CSV file. The primary file, kaggle_survey_17_20_v2.csv, is approximately 10.43 MB in size and contains over 80,000 valid records covering the survey responses from all four merged years.

Usage

This resource is ideally suited for Data Visualization and Exploratory Data Analysis. It can be used to study macro-level shifts within the data science job market, compare professional demographics over time, and analyse global participation in the Machine Learning community.
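
One simple way to compare professional demographics over time is to compute, for each survey year, the share of respondents in each category of a column. The following is a minimal sketch using only the Python standard library; the column names "Year" and "Job Title" are assumed to match the file header, and share_by_year is a hypothetical helper.

```python
import csv
import io
from collections import Counter, defaultdict

def share_by_year(rows, column):
    """Map each survey year to {value: fraction of that year's rows}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["Year"]][row[column]] += 1
    return {
        year: {value: n / sum(counter.values()) for value, n in counter.items()}
        for year, counter in counts.items()
    }

# Tiny made-up sample, for illustration only (not real survey data).
sample = io.StringIO(
    "Job Title,Year\n"
    "Student,2019\n"
    "Data Scientist,2019\n"
    "Student,2020\n"
)
shares = share_by_year(csv.DictReader(sample), "Job Title")
print(shares)
```

Years are kept as the strings read from the file, so no assumption is made about how they are formatted. The resulting nested dictionary can be passed to any plotting library to chart the shift year by year.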

Coverage

The data covers the years 2017 to 2020. Geographically, responses come from 72 unique countries. Demographically, the scope includes the age, gender, educational background, and professional environment (including team size and ML adoption status) of thousands of survey participants.
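
The coverage figures above can be checked against a downloaded copy of the file. The sketch below assumes the columns are named "Year" and "Country" and that every Year value parses as a whole number; coverage_summary is a hypothetical helper written for this illustration.

```python
import csv
import io

def coverage_summary(rows):
    """Return (first_year, last_year, number_of_distinct_countries)."""
    years, countries = set(), set()
    for row in rows:
        years.add(int(row["Year"]))  # assumes values such as "2019"
        countries.add(row["Country"].strip())
    return min(years), max(years), len(countries)

# Tiny made-up sample, for illustration only (not real survey data).
sample = io.StringIO(
    "Country,Year\n"
    "India,2017\n"
    "United States of America,2020\n"
    "India,2020\n"
)
summary = coverage_summary(csv.DictReader(sample))
print(summary)
```

Run on the full file, the result would be expected to agree with the 2017 to 2020 range and the 72 countries stated above, provided the assumptions about the header and the Year format hold.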

License

CC0: Public Domain

Who Can Use It

Intended users include data scientists, academic researchers, and students interested in quantitative socio-economic trends. The dataset is particularly useful for those who need to benchmark skills, track compensation trends, or gain insight into the global adoption rate of machine learning practices.

Dataset Name Suggestions

  • Kaggle Survey 2017-2020 Merged Data
  • Global Data Science Industry Trends (2017-2020)
  • Machine Learning Community Profile Data
  • The Evolution of Kagglers

Attributes

Listing Stats

VIEWS

6

DOWNLOADS

1

LISTED

22/11/2025

REGION

GLOBAL

UDQS (Universal Data Quality Score) QUALITY

5 / 5

VERSION

1.0
