Social Demographics and Income Data
Data Science and Analytics
Free
About
The data product supports learners who are new to data science and machine learning. It offers several sample datasets, with a focus on social science data used to predict income from demographic and employment attributes. The data is structured and ready to use for educational purposes.
Columns
The core social science file contains 15 detailed columns:
- age: Age of the individual, measured in years.
- workclass: Employment category, featuring eight possibilities such as Private, Federal-gov, and Never-worked.
- fnlwgt: The Final Weight attribute.
- education: Academic attainment level, covering 16 types from Bachelors to Preschool.
- education-num: The calculated number of years of education received.
- marital-status: Relationship status, which includes categories like Married-civ-spouse and Never-married.
- occupation: Specifies the job role, with 14 possibilities such as Exec-managerial and Craft-repair.
- relationship: Describes the individual's role in the household (e.g., Wife, Husband, Own-child).
- race: One of five specific racial categories.
- sex: Gender of the individual (Male or Female).
- capital-gain: The monetary value gained from capital investments.
- capital-loss: The monetary value lost from capital investments.
- hours-per-week: The number of hours worked during a typical week.
- native-country: The country of origin, encompassing 41 unique categories, including United-States, India, and Germany.
- income: The target variable, indicating whether income is greater than $50,000 or less than or equal to $50,000.
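As a rough illustration of how the column list above might be checked before modelling, the sketch below validates a CSV header against the 15 listed names and encodes the income target as 0 or 1. It uses only the Python standard library. The two sample rows are invented for illustration and are not taken from census_income.csv, and the label strings ">50K" and "<=50K" are an assumption, since the listing does not state how the income values are spelled in the file.

```python
import csv
import io

# Column names as given in the Columns list above.
EXPECTED_COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]

# ASSUMPTION: invented rows standing in for census_income.csv.
# Replace io.StringIO(SAMPLE_CSV) with open("census_income.csv", newline="")
# to read the real file.
SAMPLE_CSV = ",".join(EXPECTED_COLUMNS) + "\n" + (
    "34,Private,120000,Bachelors,13,Never-married,Exec-managerial,"
    "Not-in-family,White,Female,0,0,40,United-States,<=50K\n"
    "51,Federal-gov,98000,HS-grad,9,Married-civ-spouse,Craft-repair,"
    "Husband,White,Male,5000,0,45,India,>50K\n"
)

# ASSUMPTION: the exact label spellings are not stated in the listing.
INCOME_LABELS = {">50K": 1, "<=50K": 0}


def load_rows(handle):
    """Read all rows, failing early if any expected column is missing."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def encode_income(label):
    """Map an income label to 1 (above $50,000) or 0 (at or below)."""
    key = label.strip()
    if key not in INCOME_LABELS:
        raise ValueError(f"unexpected income label: {label!r}")
    return INCOME_LABELS[key]


rows = load_rows(io.StringIO(SAMPLE_CSV))
targets = [encode_income(row["income"]) for row in rows]
```

Raising on an unknown label, rather than silently mapping it to 0, makes a mismatch between the assumed spellings and the real file visible straight away.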
Distribution
The product contains five sample datasets. The census_income.csv file described here has 15 columns and a size of 119.99 kB. Metadata indicates that 1003 values are present for each field. Statistical summaries such as Mean, Min, and Max are not currently available for the numerical attributes. The data is not scheduled for future updates.
Usage
Ideal applications include: developing binary classification models to predict high or low income, practicing various regression techniques using continuous variables, performing clustering analysis on demographic factors, and conducting foundational data exploration and cleaning exercises.
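A minimal sketch of the exploration step, and of a reference point for the classification task, might look like the following. It computes the mean, minimum, and maximum of a numeric column and the accuracy of always predicting the most common income label, which any trained classifier should beat. The records are invented for illustration, and the label strings ">50K" and "<=50K" are an assumption about how the income column is encoded.

```python
import statistics
from collections import Counter

# ASSUMPTION: invented records for illustration, not rows from the dataset.
records = [
    {"age": 25, "hours-per-week": 40, "income": "<=50K"},
    {"age": 38, "hours-per-week": 40, "income": "<=50K"},
    {"age": 47, "hours-per-week": 50, "income": ">50K"},
    {"age": 50, "hours-per-week": 30, "income": "<=50K"},
]


def summarize(values):
    """Return the mean, minimum, and maximum of a numeric column."""
    return {
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


def majority_baseline(labels):
    """Return the most common label and the accuracy of always predicting it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


age_summary = summarize([r["age"] for r in records])
baseline_label, baseline_accuracy = majority_baseline(
    [r["income"] for r in records]
)
```

On an imbalanced target, the baseline accuracy can be high on its own, so it is worth computing before judging a model by accuracy alone.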
Coverage
The data provides detailed demographic coverage across variables like education, race, age, and relationship status. The geographical scope is broad, defined by the native-country attribute, which references 41 distinct locations globally.
License
CC0: Public Domain
Who Can Use It
- Students: For educational projects requiring structured, real-world data samples.
- New Data Scientists: Individuals looking for ready-to-use intermediate-level datasets to practice various modelling paradigms.
- Researchers: Social scientists interested in studying the relationship between demographic features and income outcomes.
Dataset Name Suggestions
- Beginner Modelling Study Set
- Intermediate Classification & Regression Data
- Social Demographics and Income Data
- Machine Learning Sample Datasets
Attributes
Original Data Source: Social Demographics and Income Data
