Socio-Economic Income Classification Data
Data Science and Analytics
Free
About
Provides the socio-economic indicators needed to predict individual income levels in the United States. Extracted from the 1994 US Census Database by Barry Becker, this resource offers a detailed view of American demographic characteristics and is well suited to predictive modelling exercises. The primary task is to predict whether an individual's income exceeds $50K per year, which makes it useful for studying how factors such as age, education, and occupation relate to financial outcomes.
Columns
The dataset includes 15 columns (14 attributes plus the income target) describing anonymised socio-economic characteristics of individuals:
- age: The age of the person, ranging from 17 to 90.
- workclass: The employment sector (e.g., Private, Federal-gov, Self-emp-not-inc).
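- fnlwgt: Final sampling weight assigned by the Census Bureau, an estimate of how many people in the population each record represents.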
- education: The level of education attained (e.g., Bachelors, HS-grad, Some-college).
- education.num: A numerical representation of the education level (Range: 1 to 16).
- marital.status: The individual’s marital standing (e.g., Married-civ-spouse, Never-married, Divorced).
- occupation: The specific job role (e.g., Prof-specialty, Craft-repair, Exec-managerial).
- relationship: The individual's role in their household (e.g., Husband, Wife, Own-child).
- race: Racial identification (e.g., White, Black, Asian-Pac-Islander).
- sex: Gender (Male or Female).
- capital.gain: Capital gains recorded, up to 100k.
- capital.loss: Capital losses recorded, up to 4356.
- hours.per.week: The number of hours worked per week (Mean: 40.4).
- native.country: The country of origin (90% of records are from the United-States).
- income: The target variable, categorised into two brackets: ">50K" or "<=50K".
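The column list above maps directly onto a small parser. The sketch below uses only the Python standard library and assumes the files follow the original UCI Adult layout: comma-separated values with a space after each comma, `?` marking missing values, an optional header row, an optional `|`-prefixed comment line at the top of the test file, and test-set labels that may carry a trailing period (`>50K.`). The helper name `parse_adult` is hypothetical, and these layout assumptions should be checked against the actual files.

```python
import csv
import io

# Column names as listed above; assumed to match the column order in the files.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education.num",
    "marital.status", "occupation", "relationship", "race", "sex",
    "capital.gain", "capital.loss", "hours.per.week", "native.country",
    "income",
]
NUMERIC = {"age", "fnlwgt", "education.num", "capital.gain",
           "capital.loss", "hours.per.week"}


def parse_adult(lines):
    """Parse comma-separated Adult records into dicts (hypothetical helper)."""
    records = []
    # skipinitialspace drops the space that follows each comma.
    for row in csv.reader(lines, skipinitialspace=True):
        # Skip blank lines and '|'-prefixed comment lines (assumed format).
        if not row or row[0].startswith("|"):
            continue
        # Skip a header row if the file has one.
        if row[0] == "age":
            continue
        # Drop malformed rows that do not have exactly 15 fields.
        if len(row) != len(COLUMNS):
            continue
        rec = {}
        for name, raw in zip(COLUMNS, row):
            value = raw.strip()
            if value == "?":
                rec[name] = None  # '?' marks a missing value
            elif name in NUMERIC:
                rec[name] = int(value)
            else:
                rec[name] = value
        # Normalise labels such as '>50K.' to '>50K'.
        if rec["income"] is not None:
            rec["income"] = rec["income"].rstrip(".")
        records.append(rec)
    return records


if __name__ == "__main__":
    # One made-up row in the assumed format (not taken from the files).
    sample = io.StringIO(
        "| comment\n"
        "38, ?, 120000, HS-grad, 9, Divorced, ?, Unmarried, White, Female, "
        "0, 0, 40, United-States, >50K.\n"
    )
    print(parse_adult(sample))
```

To read a real file, pass an open file handle, for example `with open("adult-training.txt", newline="") as f: records = parse_adult(f)`.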
Distribution
The data is split into two files: a training set (adult-training.txt) and a test set (adult-test.txt). Both are comma-separated text files in which each row represents a unique individual across the 15 columns. Statistical validation indicates that roughly 32,600 valid records are available for analysis.
Usage
This resource is widely used to train and evaluate machine learning classifiers, such as logistic regression, decision trees, random forests, and neural networks, on a binary classification task. It is well suited to practising key data preparation steps, including feature selection, handling missing values, encoding categorical variables, and evaluating predictive performance metrics. It also supports research into the relationship between demographic attributes and financial outcomes.
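The data preparation and evaluation steps mentioned above can be sketched without third-party libraries. The helpers below (`impute_most_frequent`, `fit_categories`, `encode` and `binary_metrics`, all hypothetical names) assume each record is a dict keyed by the column names listed earlier, with `None` for missing values. They show most-frequent-value imputation, one-hot encoding with categories fitted on the training set only, and the standard binary metrics with `>50K` as the positive class. In practice a library such as scikit-learn would usually handle these steps.

```python
from collections import Counter

POSITIVE = ">50K"  # assumed positive class label


def impute_most_frequent(records, column):
    """Return copies of records with None in `column` replaced by the mode."""
    counts = Counter(r[column] for r in records if r[column] is not None)
    if not counts:
        return [dict(r) for r in records]  # nothing to impute from
    mode = counts.most_common(1)[0][0]
    return [{**r, column: mode} if r[column] is None else dict(r) for r in records]


def fit_categories(records, column):
    """Collect the sorted distinct non-missing values of a categorical column."""
    return sorted({r[column] for r in records if r[column] is not None})


def encode(record, column, categories):
    """One-hot encode one value; unseen or missing values become all zeros."""
    return [1 if record[column] == c else 0 for c in categories]


def binary_metrics(y_true, y_pred, positive=POSITIVE):
    """Accuracy, precision, recall and F1 for two-class label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy records (not real data) standing in for parsed rows.
    train = [
        {"workclass": "Private", "income": "<=50K"},
        {"workclass": None, "income": ">50K"},
        {"workclass": "Self-emp-inc", "income": ">50K"},
    ]
    train = impute_most_frequent(train, "workclass")
    cats = fit_categories(train, "workclass")
    print(cats, [encode(r, "workclass", cats) for r in train])

    # Majority-class baseline: always predict "<=50K".
    y_true = [r["income"] for r in train]
    print(binary_metrics(y_true, ["<=50K"] * len(y_true)))
```

Because the two income brackets are not evenly represented, precision, recall and F1 for the `>50K` class are usually more informative than accuracy alone; a baseline that always predicts `<=50K` can still reach a respectable accuracy.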
Coverage
The scope is centred on US adult demographics as captured in the 1994 US Census Database. Geographic coverage is heavily concentrated on the United States, which accounts for about 90% of records, although 41 other native countries are represented. The data covers individuals aged 17 to 90.
License
CC0: Public Domain
Who Can Use It
- Data Scientists and Machine Learning Engineers: For building predictive models for income classification and benchmarking algorithms.
- Students and Educators: A widely used resource for introductory data exploration, preprocessing, and classification projects.
- Social Scientists and Economists: To analyse the links between education, occupation, and income disparity in a historical context.
Dataset Name Suggestions
- US Adult Income Predictor
- 1994 Census Demographic Income Dataset
- Socio-Economic Income Classification Data
- US Census 50K Income Predictor
Attributes
Original Data Source: Socio-Economic Income Classification Data
