Binary Salary Classification Data

Data Science and Analytics

Tags and Keywords

Salary

Prediction

Classification

Income

Demographic

Binary Salary Classification Data: dataset listing on the Opendatabay data marketplace

Free

About

This collection of records supports modelling and predicting income level from individual attributes such as age, employment details, education, and capital gains and losses. It is well suited as a benchmark for binary classification tasks, particularly those that aim to identify which features matter most in socio-economic data.

Columns

  1. age: The age of the individual, ranging from 17 to 90.
  2. workclass: Indicates the type of employment setting or where the individual works (e.g., Private, Self-emp-not-inc).
  3. fnlwgt: A continuous numeric weight attached to each record (the name abbreviates "final weight"), with a mean of about 190k.
  4. education: Describes the highest level of education attained (e.g., HS-grad, Some-college).
  5. education-num: A numerical representation of the education level, with a mean value of 10.1.
  6. marital-status: Provides information about the individual's marital situation (e.g., Married-civ-spouse, Never-married).
  7. occupation: Details the specific work or job held (e.g., Prof-specialty, Craft-repair).
  8. relationship: Describes the individual’s role in the family or household (e.g., Husband, Not-in-family).
  9. race: The individual's race (e.g., White, Black).
  10. sex: The individual's sex (Male or Female).
  11. capital-gain: Capital gains recorded, ranging from 0 to about 100k.
  12. capital-loss: Capital losses recorded, ranging from 0 to 4356.
  13. hours-per-week: The number of hours worked per week, with a mean of 40.4.
  14. native-country: The individual's country of origin; about 90% of records list the United States.
  15. salary: The binary classification target variable, indicating the salary band: <=50K or >50K.
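As a minimal sketch of how the columns above might be typed when loading the file, the standard-library Python below splits them into numeric and categorical features and encodes the salary target as 0/1. It assumes the CSV header uses exactly the column names listed here and that labels are spelled `<=50K` and `>50K`, possibly with surrounding whitespace; `parse_row` and the column-set names are illustrative, not part of the dataset.

```python
import csv
import io

# Column groupings follow the list above; the exact header spelling is an assumption.
NUMERIC_COLUMNS = {
    "age", "fnlwgt", "education-num",
    "capital-gain", "capital-loss", "hours-per-week",
}
CATEGORICAL_COLUMNS = {
    "workclass", "education", "marital-status", "occupation",
    "relationship", "race", "sex", "native-country",
}
TARGET = "salary"


def parse_row(row: dict[str, str]) -> tuple[dict[str, object], int]:
    """Turn one csv.DictReader row into typed features and a 0/1 label.

    Numeric columns become floats, categorical columns stay as stripped
    strings, and the assumed label '>50K' maps to 1 ('<=50K' maps to 0).
    """
    features: dict[str, object] = {}
    for name in NUMERIC_COLUMNS:
        features[name] = float(row[name].strip())
    for name in CATEGORICAL_COLUMNS:
        features[name] = row[name].strip()
    label_text = row[TARGET].strip()
    if label_text not in ("<=50K", ">50K"):
        raise ValueError(f"unexpected salary label: {label_text!r}")
    return features, int(label_text == ">50K")


if __name__ == "__main__":
    # A hypothetical row for illustration, not a record taken from salary.csv.
    sample = io.StringIO(
        "age,workclass,fnlwgt,education,education-num,marital-status,"
        "occupation,relationship,race,sex,capital-gain,capital-loss,"
        "hours-per-week,native-country,salary\n"
        "45, Private, 120000, Bachelors, 13, Married-civ-spouse, "
        "Prof-specialty, Husband, White, Male, 0, 0, 50, United-States, >50K\n"
    )
    # skipinitialspace drops the blank after each comma in the sample text.
    for raw in csv.DictReader(sample, skipinitialspace=True):
        features, label = parse_row(raw)
        print(features["age"], features["occupation"], label)  # prints: 45.0 Prof-specialty 1
```

Keeping the target out of the feature dictionary avoids accidentally leaking the label into a model's inputs.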

Distribution

The data is provided as a single CSV file, salary.csv (3.84 MB), containing 15 columns and about 32.6 thousand valid records. The listing reports zero mismatched or missing values across all records. Updates are expected monthly.
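The reported shape and the zero-missing-values figure can be checked after download. A minimal standard-library sketch follows; it assumes salary.csv has a header row, and it treats both empty cells and `?` as missing, because census-style CSVs sometimes encode unknown categories as `?` (an assumption about this file, not something the listing states). The function names are hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable


def summarize(lines: Iterable[str], missing_markers: tuple[str, ...] = ("", "?")) -> dict:
    """Count columns and records, and tally cells that look missing per column."""
    reader = csv.DictReader(lines, skipinitialspace=True)
    fieldnames = reader.fieldnames or []  # reading this consumes the header row
    missing = Counter()
    records = 0
    for row in reader:
        records += 1
        for name in fieldnames:
            # Short rows yield None for absent fields; treat those as empty.
            value = (row.get(name) or "").strip()
            if value in missing_markers:
                missing[name] += 1
    return {"columns": len(fieldnames), "records": records, "missing": dict(missing)}


def summarize_file(path: str) -> dict:
    """Open a CSV such as salary.csv and summarize it."""
    with open(path, newline="", encoding="utf-8") as handle:
        return summarize(handle)
```

The result of `summarize_file("salary.csv")` can then be compared against the listing's figures of 15 columns and about 32.6 thousand records; a non-empty `missing` dictionary would indicate placeholder values that the quality report did not count as missing.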

Usage

The data is well suited to developing predictive models for binary classification, specifically salary prediction. It can be used for:
  • Training and evaluating machine learning algorithms, such as decision trees, for income bracket classification.
  • Conducting feature importance studies to determine which socio-economic factors most significantly impact salary.
  • Educational exercises in data science and statistical analysis related to labour markets.
  • Formulating related regression tasks on the numeric columns (e.g., predicting capital-gain), since the salary target itself is a binary band rather than a numeric income.
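For the feature-importance use case, one simple measure is the information gain each categorical column provides about the salary label, the entropy criterion used to choose splits in ID3-style decision trees. The standard-library sketch below is only an illustration: the function names are hypothetical, it expects records already parsed into dictionaries with 0/1 labels, and numeric columns such as age would need to be binned first, because splitting on every distinct value inflates their gain. Libraries such as scikit-learn offer a tree-based alternative through `DecisionTreeClassifier.feature_importances_`.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Mapping, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(
    records: Sequence[Mapping[str, Hashable]],
    labels: Sequence[int],
    feature: str,
) -> float:
    """Label entropy minus the size-weighted entropy after splitting on `feature`."""
    groups: dict[Hashable, list[int]] = defaultdict(list)
    for record, label in zip(records, labels):
        groups[record[feature]].append(label)
    total = len(labels)
    remainder = sum(len(group) / total * entropy(group) for group in groups.values())
    return entropy(labels) - remainder


def rank_features(
    records: Sequence[Mapping[str, Hashable]],
    labels: Sequence[int],
    features: Sequence[str],
) -> list[tuple[str, float]]:
    """Sort features from most to least informative about the label."""
    scores = {name: information_gain(records, labels, name) for name in features}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Higher scores indicate columns whose categories separate the two salary bands more cleanly; a score of 0 means the split leaves the label distribution unchanged.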

Coverage

The demographic scope is broad, covering ages from 17 to 90, with detailed attributes for marital status, education level, occupation, and working hours. Geographically, the data is concentrated in the United States (90% of records); Mexico is the second most represented country (2%). The records are predominantly male (67%) and White (85%). No time range is specified for the data.
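Shares such as the 90% United States, 67% male, and 85% White figures above can be recomputed from the file. A minimal standard-library sketch follows, assuming salary.csv has a header row with the column names listed on this page; `column_shares` is a hypothetical helper name.

```python
import csv
from collections import Counter
from typing import Iterable


def column_shares(lines: Iterable[str], column: str) -> list[tuple[str, float]]:
    """Return (value, share of records) pairs for one column, most common first."""
    reader = csv.DictReader(lines, skipinitialspace=True)
    # Missing cells (None) are counted under the empty string.
    counts = Counter((row[column] or "").strip() for row in reader)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(value, count / total) for value, count in counts.most_common()]


# Example usage (the file path is an assumption):
# with open("salary.csv", newline="", encoding="utf-8") as handle:
#     for value, share in column_shares(handle, "native-country")[:2]:
#         print(f"{value}: {share:.1%}")
```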

License

CC0: Public Domain

Who Can Use It

  • Machine Learning Practitioners: For creating, tuning, and benchmarking binary classification models aimed at socio-economic prediction.
  • Academic Researchers: Those studying income inequality, labour economics, and the statistical relationship between demographic features and earning potential.
  • Data Science Educators: To provide robust, clean data for teaching concepts related to classification, regression, and feature engineering.

Dataset Name Suggestions

  • Income Predictor Attributes
  • Binary Salary Classification Data
  • Labour Force Income Demographics
  • Socio-Economic Salary Predictor

Attributes

Original Data Source: Binary Salary Classification Data

Listing Stats

LISTED

07/10/2025

REGION

GLOBAL

UDQS QUALITY (Universal Data Quality Score)

5 / 5

VERSION

1.0