Socio-Economic Income Classification Data

Data Science and Analytics

Tags and Keywords

Census, Income, Prediction, Socioeconomic, Demographic

Free

About

This dataset provides the socio-economic indicators needed to predict individual income levels within the United States. Extracted from the 1994 US Census Database by Barry Becker, it offers a detailed view of American demographic characteristics and is well suited to predictive modelling exercises. The primary goal is to determine whether an individual's income exceeds $50K per year, which makes the data useful for studying how factors such as age and education relate to financial outcomes.

Columns

The dataset includes 15 columns describing anonymised socio-economic characteristics of individuals: 14 features and the income target.
  • age: The age of the person, ranging from 17 to 90.
  • workclass: The employment sector (e.g., Private, Federal-gov, Self-emp-not-inc).
  • fnlwgt: Final sampling weight, an estimate of how many people in the population the record represents.
  • education: The level of education attained (e.g., Bachelors, HS-grad, Some-college).
  • education.num: A numerical representation of the education level (Range: 1 to 16).
  • marital.status: The individual’s marital standing (e.g., Married-civ-spouse, Never-married, Divorced).
  • occupation: The specific job role (e.g., Prof-specialty, Craft-repair, Exec-managerial).
  • relationship: The individual's role in their household (e.g., Husband, Wife, Own-child).
  • race: Racial identification (e.g., White, Black, Asian-Pac-Islander).
  • sex: Gender (Male or Female).
  • capital.gain: Capital gains recorded, up to 99,999 (roughly 100k).
  • capital.loss: Capital losses recorded, up to 4356.
  • hours.per.week: The number of hours worked per week (Mean: 40.4).
  • native.country: The country of origin (90% of records are from the United-States).
  • income: The target variable, categorised into two brackets: ">50K" or "<=50K".
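
As a minimal sketch of how the columns above might be read in Python, the snippet below parses rows with the standard csv module, casts the numeric columns to integers, and maps the income label to 0 or 1. It assumes a header row that uses the dotted column names listed above, which the listing does not confirm. The two sample rows and the high_income field name are invented for illustration.

```python
import csv
import io

# Numeric columns as named in the list above.
# Assumption: the file has a header row with exactly these names.
NUMERIC_COLUMNS = [
    "age", "fnlwgt", "education.num",
    "capital.gain", "capital.loss", "hours.per.week",
]

# Invented sample rows for illustration only; not taken from the dataset.
SAMPLE_CSV = """age,workclass,fnlwgt,education,education.num,marital.status,occupation,relationship,race,sex,capital.gain,capital.loss,hours.per.week,native.country,income
39,Private,77516,Bachelors,13,Never-married,Prof-specialty,Own-child,White,Male,0,0,40,United-States,<=50K
52,Self-emp-not-inc,209642,HS-grad,9,Married-civ-spouse,Exec-managerial,Husband,White,Male,15024,0,45,United-States,>50K
"""

def parse_rows(text):
    """Parse CSV text into dicts, casting numeric columns and encoding the target."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text), skipinitialspace=True):
        row = {key: value.strip() for key, value in raw.items()}
        for name in NUMERIC_COLUMNS:
            row[name] = int(row[name])
        # Binary target (hypothetical field name): 1 if income is above 50K, else 0.
        row["high_income"] = 1 if row["income"].startswith(">50K") else 0
        rows.append(row)
    return rows

rows = parse_rows(SAMPLE_CSV)
```

Keeping the original income string alongside the 0/1 field makes it easy to spot-check the encoding before training a model.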

Distribution

The data is split into two files: a training set (adult-training.txt) and a test set (adult-test.txt). Both are typically provided in CSV format, with one row per individual and 15 columns. The listing's validation statistics report over 32,600 valid records available for analysis.
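
One way to check the record count on a downloaded copy is sketched below. It reads a comma-separated file and separates complete rows from rows with gaps. The "?" missing-value marker, the header row, and the load_records helper are assumptions for illustration, not details confirmed by this listing. The demo uses a small temporary file with invented rows and only three columns.

```python
import csv
import os
import tempfile

MISSING_MARKER = "?"  # Assumption: missing values are encoded as "?"; verify on your copy.

def load_records(path):
    """Read a comma-separated file and split rows into complete and incomplete."""
    complete, incomplete = [], []
    with open(path, newline="", encoding="utf-8") as handle:
        for raw in csv.DictReader(handle, skipinitialspace=True):
            # Skip the None key that DictReader uses for surplus fields.
            row = {k: (v or "").strip() for k, v in raw.items() if k is not None}
            has_gap = any(v in ("", MISSING_MARKER) for v in row.values())
            (incomplete if has_gap else complete).append(row)
    return complete, incomplete

# Demo on a tiny temporary file with invented rows.
demo_text = "age,workclass,income\n39,Private,<=50K\n52,?,>50K\n28,Private,>50K\n"
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(demo_text)
complete, incomplete = load_records(tmp.name)
os.remove(tmp.name)
print(len(complete), len(incomplete))
```

With a real copy, the same function could be pointed at adult-training.txt and adult-test.txt in turn, provided those files carry a header row.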

Usage

This resource is highly valued for training and evaluating various machine learning classifiers, such as logistic regression, decision trees, random forests, and neural networks, specifically for binary classification tasks. It is ideal for practising key data preparation steps, including feature selection, handling missing values, encoding categorical variables, and evaluating predictive performance metrics. It supports research into the relationship between demographic attributes and financial outcomes.
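
Two of the preparation and evaluation steps mentioned above, encoding categorical variables and computing performance metrics, can be sketched in plain Python as follows. This is an illustrative sketch, not a prescribed workflow. The helper names one_hot and binary_metrics are hypothetical, and the toy labels and predictions are invented.

```python
def one_hot(values):
    """One-hot encode a list of category strings; returns (categories, vectors)."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    vectors = []
    for value in values:
        vec = [0] * len(categories)
        vec[index[value]] = 1
        vectors.append(vec)
    return categories, vectors

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for 0/1 labels (1 = income above 50K)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Invented toy values for illustration.
categories, vectors = one_hot(["Private", "Federal-gov", "Private"])
y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

Because the two income brackets are unlikely to be equally common, precision and recall for the ">50K" class are usually worth reporting alongside accuracy.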

Coverage

The scope is centred on US adult demographics based on information captured in the 1994 US Census Database. Geographic coverage is focused heavily on the United States, which accounts for the vast majority of records, although 41 other native countries are represented. The data covers individuals whose ages range from 17 up to 90 years.

License

CC0: Public Domain

Who Can Use It

  • Data Scientists and Machine Learning Engineers: For building predictive models for income classification and benchmarking algorithms.
  • Students and Educators: An excellent, widely used resource for introductory data exploration, preprocessing, and classification projects.
  • Social Scientists and Economists: To analyse the link between education, occupation, and wealth disparity in a historical context.

Dataset Name Suggestions

  • US Adult Income Predictor
  • 1994 Census Demographic Income Dataset
  • Socio-Economic Income Classification Data
  • US Census 50K Income Predictor

Attributes

Listing Stats

  • Views: 1
  • Downloads: 0
  • Listed: 20/10/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
