Data Scientist Salary & Skill Analysis
Data Science and Analytics
Free
About
This dataset describes the data scientist job market in the United States. It was created by scraping job postings for 'Data Scientist' positions from Glassdoor. The raw scrape contained over 1000 records; removing duplicates and further cleaning reduced it to 742 unique entries. The dataset is suitable for data analysis and modelling, providing a focused view of job requirements and compensation within the data science field. Its creation was inspired by earlier work on data scraping and cleaning methodologies.
Columns
The dataset contains 42 columns, each detailing a specific aspect of data scientist job postings:
- index: A numerical identifier for each record.
- Job Title: The specific title of the job, such as 'Data Scientist', 'Junior Data Scientist', or 'Senior Data Scientist'.
- Salary Estimate: An estimated salary range for the position, including the source of the estimate.
- Job Description: A detailed explanation of the required qualities and expected duties for the job.
- Rating: The company's rating.
- Company Name: The name of the company posting the job.
- Location: The geographic location of the job.
- Headquarters: The location of the company's main office.
- Size: The estimated range of employees working for the company.
- Founded: The year the company was established.
- Type of ownership: Indicates whether the company is private, public, or government-owned.
- Industry: The industry sector of the company, e.g., IT or Pharmaceutical.
- Sector: The broader economic sector in which the company operates.
- Revenue: The total annual revenue of the company.
- Competitors: A list of the company's current competitors.
- Hourly: A binary indicator (1 or 0) specifying if the salary was reported as an hourly rate.
- Employer provided: A binary indicator (1 or 0) specifying if the salary was provided directly by the employer.
- Lower Salary: The minimum salary reported for the job within that company.
- Upper Salary: The maximum salary reported for the job within that company.
- Avg Salary(K): The average salary reported for the job in the company, in thousands.
- company_txt: A text field for the company name.
- Job Location: The cleaned state of the job's location.
- Age: The age of the company in years.
- Python: A binary indicator (1 or 0) if Python skill is required.
- spark: A binary indicator (1 or 0) if Spark skill is required.
- aws: A binary indicator (1 or 0) if AWS skill is required.
- excel: A binary indicator (1 or 0) if Excel skill is required.
- sql: A binary indicator (1 or 0) if SQL skill is required.
- sas: A binary indicator (1 or 0) if SAS skill is required.
- keras: A binary indicator (1 or 0) if Keras skill is required.
- pytorch: A binary indicator (1 or 0) if PyTorch skill is required.
- scikit: A binary indicator (1 or 0) if Scikit-learn skill is required.
- tensor: A binary indicator (1 or 0) if TensorFlow skill is required.
- hadoop: A binary indicator (1 or 0) if Hadoop skill is required.
- tableau: A binary indicator (1 or 0) if Tableau skill is required.
- bi: A binary indicator (1 or 0) if Power BI skill is required.
- flink: A binary indicator (1 or 0) if Flink skill is required.
- mongo: A binary indicator (1 or 0) if MongoDB skill is required.
- google_an: A binary indicator (1 or 0) if a Google Analytics certificate is required.
- job_title_sim: A simplified version of the job title.
- seniority_by_title: Indicates the seniority level based on the job title.
- Degree: Indicates if a Master's (M) or PhD (P) degree is required or preferred based on experience years.
Note: A value of -1 in a column often indicates that the data scraping was unsuccessful or the information was not present.
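Because -1 stands in for missing information, it is usually worth masking it before computing any statistics. The following is a minimal sketch of one way to do that with Python's standard library. The helper names `is_missing`, `clean_row`, and `load_rows` are illustrative, and the three-row sample is made up; only the column names come from the list above.

```python
import csv
import io


def is_missing(value):
    """Return True if a raw CSV cell holds the -1 sentinel (e.g. "-1" or "-1.0")."""
    try:
        return float(value) == -1
    except (TypeError, ValueError):
        # Non-numeric text (or an absent cell) is not the sentinel.
        return False


def clean_row(row):
    """Return a copy of the row with sentinel cells replaced by None."""
    return {key: (None if is_missing(value) else value) for key, value in row.items()}


def load_rows(handle):
    """Read CSV rows from an open text file and mask the sentinel in every column."""
    return [clean_row(row) for row in csv.DictReader(handle)]


# Made-up sample using three of the documented column names.
sample = io.StringIO(
    "Company Name,Founded,Competitors\n"
    "Acme Corp,1999,-1\n"
    "Example Inc,-1,Acme Corp\n"
)
rows = load_rows(sample)
```

To apply this to the real file, pass an open handle instead of the sample, for example `open("data_cleaned_2021.csv", newline="", encoding="utf-8")`. The path and encoding here are assumptions. Since the note says -1 only "often" marks missing data, it is worth checking each column before masking it wholesale.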
Distribution
This dataset is provided in CSV format as data_cleaned_2021.csv, with a file size of 3.12 MB. It contains 742 records (rows) and 42 columns. The original raw data consisted of 1000 records before duplicate removal and simplification.
Usage
This dataset is ideal for:
- Data analysis and modelling: To identify trends in data scientist salaries and job requirements.
- Job market insights: Understanding the demand for specific skills in the data science field.
- Career planning: Helping aspiring data scientists identify required skills and potential salary expectations.
- Recruitment strategy: Assisting recruiters in benchmarking salaries and understanding common job criteria.
- Academic research: Analysing employment trends in the tech sector.
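As an illustration of the first two uses, the sketch below computes, for each skill flag, the share of postings that require it and the mean of Avg Salary(K) among those postings. It relies only on the standard library. The function name `skill_summary` and the four-row sample are hypothetical; the column names are taken from the Columns section. It assumes every row has a numeric Avg Salary(K), so rows carrying the -1 sentinel would need filtering first.

```python
import csv
import io
from statistics import mean

# The sixteen binary skill columns documented in the Columns section.
SKILL_COLUMNS = [
    "Python", "spark", "aws", "excel", "sql", "sas", "keras", "pytorch",
    "scikit", "tensor", "hadoop", "tableau", "bi", "flink", "mongo", "google_an",
]


def skill_summary(rows, skills=SKILL_COLUMNS):
    """Map each skill to (share of postings requiring it, mean Avg Salary(K) of those postings)."""
    rows = list(rows)
    summary = {}
    for skill in skills:
        # Collect salaries only from postings where the skill flag is set to 1.
        salaries = [float(row["Avg Salary(K)"]) for row in rows if row.get(skill) == "1"]
        share = len(salaries) / len(rows) if rows else 0.0
        summary[skill] = (share, mean(salaries) if salaries else None)
    return summary


# Made-up sample with two skill flags and the average salary column.
sample = io.StringIO(
    "Python,sql,Avg Salary(K)\n"
    "1,1,100\n"
    "1,0,120\n"
    "0,1,80\n"
    "0,0,60\n"
)
summary = skill_summary(csv.DictReader(sample), skills=["Python", "sql"])
```

On the sample, half of the postings require each skill, with mean salaries of 110 and 90 (in thousands) for Python and SQL respectively. A library such as pandas would express the same aggregation more compactly; plain `csv` is used here only to keep the sketch free of dependencies.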
Coverage
The dataset's geographic scope is limited to the United States, as job postings were scraped specifically from Glassdoor in the USA. No time range is stated explicitly, but the file name data_cleaned_2021.csv suggests the postings were collected around 2021. The dataset focuses exclusively on 'Data Scientist' positions, and information for certain fields might be missing, denoted by a '-1' value. The dataset is expected to be updated annually.
License
CC0: Public Domain
Who Can Use It
- Data Scientists and Analysts: For market research, salary benchmarking, and skill gap analysis.
- Job Seekers: To understand salary expectations, required skills, and popular job locations for data science roles.
- Recruiters and HR Professionals: For talent acquisition strategies, salary setting, and identifying key qualifications.
- Academic Researchers: Studying labour market dynamics and skill requirements in the tech industry.
- Students: To guide their learning path by identifying in-demand skills for data science careers.
Dataset Name Suggestions
- US Data Scientist Job Market Trends
- Glassdoor Data Science Jobs (USA)
- Data Scientist Salary & Skill Analysis
- United States Data Science Employment Data
- Data Scientist Career Insights US
Attributes
Original Data Source: Data Scientist Salary & Skill Analysis