Adult Income Prediction Dataset

Data Science and Analytics

Tags and Keywords

  • Income
  • Census
  • Demographics
  • Classification
  • Employment

Free

About

This dataset is designed for a binary classification problem focused on predicting whether an individual's income is above or below a specified threshold. It includes a variety of demographic and employment attributes such as age, education level, occupation, and hours worked per week. The objective is to construct a predictive model that categorises individuals into 'over threshold' or 'under threshold' income brackets. This task is valuable for understanding the key determinants of income levels and can inform targeted financial planning or policy development initiatives.

Columns

  • age: The age of the individual, ranging from 17 to 90 years, with a mean of 38.7 years.
  • workclass: Categorises individuals based on the nature of their employment. Common categories include 'Private' (69%) and 'Self-emp-not-inc' (8%). There are 9 unique workclass types.
  • final_weight: A numerical weighting factor associated with each individual, with values ranging from approximately 12,300 to 1.49 million, a mean of 190,000, and a standard deviation of 106,000.
  • education: Describes the type of education received by a person. 'HS-grad' is the most common (32%), followed by 'Some-college' (22%). There are 16 unique education types.
  • education-num: A numerical representation indicating the education level, with values from 1 to 16, a mean of 10.1, and a standard deviation of 2.57.
  • marital_status: Indicates the marriage status of an individual. 'Married-civ-spouse' is the most frequent (46%), followed by 'Never-married' (33%). The dataset contains 7 unique marital statuses.
  • occupation: Specifies the type of occupation held by an individual. 'Prof-specialty' and 'Exec-managerial' are the most common, both at 13%. There are 15 unique occupation types.
  • relationship: Refers to the individual's role within a household or family structure. 'Husband' is the most common role (40%), followed by 'Not-in-family' (26%). There are 6 unique relationship types.
  • race: Categorises individuals based on their racial or ethnic backgrounds. 'White' represents 86% of the dataset, with 'Black' at 10%. There are 5 unique racial categories.
  • sex: Refers to the biological sex of an individual. 'Male' accounts for 67% and 'Female' for 33%. There are 2 unique categories.
  • capital_gain: A numerical attribute representing capital gains, with a mean of approximately 1,090 and a standard deviation of 7,520. Most entries (34,789 out of 36,600) have a capital gain of 0.
  • capital_loss: A numerical attribute representing capital losses, with a mean of 85.9 and a standard deviation of 400. Most entries (34,958 out of 36,600) have a capital loss of 0.
  • hours_per_week: The number of hours worked per week, ranging from 1 to 99, with a mean of 40.4 and a standard deviation of 12.4. The most frequent entry is 40 hours per week (18,082 individuals).
  • native_country: The country of origin for the individual. 'United-States' is the most common (90%), followed by 'Mexico' (2%). There are 42 unique native countries.
  • threshold: The binary target variable for income prediction, indicating whether income is 'over' (1) or 'under' (0) the specified threshold. 27,866 individuals are 'under threshold' (0), and 8,765 are 'over threshold' (1).
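The class counts quoted for threshold imply an imbalanced target, which is worth quantifying before any modelling. The following is a minimal Python sketch that uses only the two counts stated above; the variable names are illustrative and are not part of the dataset.

```python
# Class counts as stated in the description of the `threshold` column.
under_count = 27_866  # label 0, 'under threshold'
over_count = 8_765    # label 1, 'over threshold'

total = under_count + over_count

# Share of rows in the positive class ('over threshold').
positive_rate = over_count / total

# Accuracy of a trivial model that always predicts the majority class (0).
majority_baseline_accuracy = under_count / total

print(f"total labelled rows: {total}")
print(f"positive rate: {positive_rate:.3f}")
print(f"majority-class baseline accuracy: {majority_baseline_accuracy:.3f}")
```

Any classifier trained on this data should therefore be compared against a baseline of roughly 76% accuracy rather than 50%.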

Distribution

The dataset is provided as a single CSV file, adult_tr.csv, with a file size of 3.82 MB. It consists of 15 columns and approximately 36,600 records (the class counts listed for the threshold column sum to 36,631). All columns have 100% valid entries, with no mismatched or missing values identified.
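A quick structural check after download might look like the sketch below. It assumes the CSV has a header row whose names match the column list in this listing (including the hyphenated education-num), which the listing does not confirm. `EXPECTED_COLUMNS` and `validate_rows` are hypothetical names, and the two sample rows are made up purely to demonstrate the call.

```python
import csv
import io

# Column names as written in the listing; the real header may differ (assumption).
EXPECTED_COLUMNS = [
    "age", "workclass", "final_weight", "education", "education-num",
    "marital_status", "occupation", "relationship", "race", "sex",
    "capital_gain", "capital_loss", "hours_per_week", "native_country",
    "threshold",
]


def validate_rows(handle):
    """Read CSV rows from a file-like object and run basic structural checks.

    Returns the number of data rows; raises ValueError on the first problem.
    """
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    count = 0
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # DictReader stores surplus fields under the key None and pads
        # short rows with None values.
        if None in row or any(v is None or v.strip() == "" for v in row.values()):
            raise ValueError(f"missing or extra value on line {line_no}")
        if row["threshold"] not in {"0", "1"}:
            raise ValueError(f"non-binary target on line {line_no}")
        count += 1
    return count


# Made-up sample rows, used only to show the call; these are not real records.
sample = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    "40,Private,190000,HS-grad,9,Married-civ-spouse,Exec-managerial,"
    "Husband,White,Male,0,0,40,United-States,1\n"
    "25,Private,120000,Some-college,10,Never-married,Sales,"
    "Not-in-family,White,Female,0,0,35,United-States,0\n"
)
print(validate_rows(sample))
```

For the real file, the same function could be called as `validate_rows(open("adult_tr.csv", newline=""))`, after adjusting `EXPECTED_COLUMNS` to the actual header if it differs.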

Usage

This dataset is ideal for:
  • Developing and evaluating machine learning models for binary income classification.
  • Conducting socioeconomic analysis to understand the influence of various demographic and employment attributes on income levels.
  • Informing financial planning strategies by identifying factors associated with different income brackets.
  • Supporting policy-making initiatives aimed at addressing income disparities or promoting economic welfare.
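For the model evaluation use case, accuracy alone can mislead because roughly three quarters of the rows fall in the 'under threshold' class. The sketch below computes precision, recall, and F1 for the positive class in plain Python. `binary_metrics` is a hypothetical helper, and the label lists are toy values, not model output on this dataset.

```python
def binary_metrics(y_true, y_pred):
    """Return precision, recall, and F1 for the positive class (label 1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only: 1 = 'over threshold', 0 = 'under threshold'.
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0, 1, 0]
print(binary_metrics(y_true, y_pred))
```

A model that always predicts 0 would score 0.0 on all three metrics here, which makes its weakness visible in a way that raw accuracy does not.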

Coverage

The dataset covers demographic attributes such as age, education, marital status, relationship, race, and sex, alongside employment attributes including workclass, occupation, hours worked per week, capital gains, and capital losses. Geographically, it primarily focuses on individuals from the United-States (90%), with a smaller representation from Mexico (2%) and various other countries (8%). A specific time range for data collection is not detailed in the available sources.

License

CC0: Public Domain

Who Can Use It

  • Data Scientists and Machine Learning Engineers: For building, training, and testing income prediction models.
  • Social Scientists and Economists: To research and analyse socioeconomic factors influencing income distribution.
  • Financial Planners: To understand income determinants for advising clients or developing financial strategies.
  • Policy Makers: To formulate policies aimed at economic development, social welfare, or income equality.

Dataset Name Suggestions

  • Adult Income Prediction Dataset
  • Socioeconomic Income Classification
  • Demographic Income Predictor
  • Employment and Income Attributes Data

Attributes

Original Data Source: Adult Income Prediction Dataset

Listing Stats

  • Views: 2
  • Downloads: 1
  • Listed: 06/09/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
