Income Prediction Dataset
Finance & Banking Analytics
Free
About
This dataset is designed for binary classification: predicting whether an individual's salary exceeds £50,000 [1]. Predictive features include age, work class, education, marital status, occupation, relationship, race, gender, capital gain, capital loss, hours per week, and native country [1-8]. It is intended as practice material for machine learning classification problems [1].
Columns
- age: Represents the age of the individual. Values range from 17 to 90, with a mean of 38.6 years [2].
- workclass: Describes the type of employment. Common values include Private (70%), Self-emp-not-inc (9%), and others [2].
- fnlwgt: Stands for 'final weight', an anonymised population survey weighting factor. Values range from 21,472 to 857,532, with a mean of 194,000 [3].
- education: Details the highest level of education attained. HS-grad (34%) and Some-college (22%) are frequently observed [3].
- educational-num: Education level represented as an integer, ranging from 1 to 16, with a mean of 10.2 [4].
- marital-status: Indicates the individual's marital status. Married-civ-spouse (45%) and Never-married (33%) are common statuses [4].
- occupation: Describes the individual's occupation. Adm-clerical (14%) and Craft-repair (14%) are notable categories [5].
- relationship: Specifies the individual's role within their household or family. Husband (39%) and Not-in-family (28%) are frequent [5].
- race: Identifies the individual's race. White (86%) and Black (10%) are the most prevalent [5].
- gender: Indicates the individual's gender, with Male accounting for 67% and Female for 33% [6].
- capital-gain: Capital gains. Most entries are zero, but values reach up to £100,000 [6].
- capital-loss: Capital losses. Most entries are zero, but values reach up to £2,415 [7].
- hours-per-week: The number of hours worked per week. Values range from 2 to 99 hours, with a mean of 41.1 hours [7].
- native-country: The individual's native country. United-States is the most common (91%), followed by Mexico (1%) and other countries [8].
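The column list above can be turned into a simple sanity check when loading the file. The sketch below is only an illustration: it uses the column names and the observed ranges quoted in this listing, while the function name `check_rows` and the two sample rows are made up for the example and are not taken from the dataset.

```python
import csv
import io

# Column names as they appear in the listing above.
EXPECTED_COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "educational-num",
    "marital-status", "occupation", "relationship", "race", "gender",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
]

# Observed ranges quoted in the listing (not hard limits of the survey).
OBSERVED_RANGES = {
    "age": (17, 90),
    "educational-num": (1, 16),
    "hours-per-week": (2, 99),
}


def check_rows(handle):
    """Read a CSV file-like object; return (rows, out_of_range_problems)."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows, problems = [], []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for col, (low, high) in OBSERVED_RANGES.items():
            value = int(row[col])
            if not low <= value <= high:
                problems.append((line_no, col, value))
        rows.append(row)
    return rows, problems


# Two made-up rows purely for illustration; they are NOT real records.
SAMPLE = io.StringIO(
    "age,workclass,fnlwgt,education,educational-num,marital-status,occupation,"
    "relationship,race,gender,capital-gain,capital-loss,hours-per-week,native-country\n"
    "38,Private,194000,HS-grad,9,Married-civ-spouse,Craft-repair,Husband,White,Male,0,0,40,United-States\n"
    "16,Private,50000,Some-college,10,Never-married,Adm-clerical,Not-in-family,Black,Female,0,0,20,Mexico\n"
)

rows, problems = check_rows(SAMPLE)
print(len(rows), problems)
```

To run the same check on the real file, pass an open file handle (for example `open("test.csv", newline="")`) instead of `SAMPLE`. A value outside the quoted ranges is not necessarily an error; it only means the row differs from what this listing reports.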
Distribution
The dataset is typically provided as a data file, often in CSV format [1, 9]. A sample file is expected to be updated separately on the platform [9]. The test.csv file has a size of 93.31 kB and contains 14 columns [1]. It comprises 899 valid records [2-8].
Usage
This dataset is well suited to practising machine learning problems, particularly binary classification [1]. It can be used to build models that predict, from demographic and professional attributes, whether an individual's salary is greater than £50,000 [1].
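As a rough starting point for the classification task, the sketch below fits a one-feature threshold rule (a decision stump) on `educational-num`. It is a toy baseline, not a recommended model. The column list in this listing does not include a label column, so the labels here are an assumption: the training pairs are invented for illustration, with 1 standing for "salary above the threshold" and 0 for "at or below it".

```python
def fit_stump(values, labels):
    """Find the threshold t maximising accuracy of the rule: value >= t -> 1."""
    best_threshold, best_accuracy = None, -1.0
    for threshold in sorted(set(values)):
        predictions = [1 if v >= threshold else 0 for v in values]
        accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
        if accuracy > best_accuracy:  # ties keep the lowest threshold
            best_threshold, best_accuracy = threshold, accuracy
    return best_threshold, best_accuracy


def predict(value, threshold):
    """Apply the fitted rule to a single educational-num value."""
    return 1 if value >= threshold else 0


# Invented (educational-num, label) pairs; NOT real records from the dataset.
TRAIN = [(9, 0), (10, 0), (13, 1), (14, 1), (9, 0), (12, 0), (16, 1), (10, 1)]
values = [v for v, _ in TRAIN]
labels = [y for _, y in TRAIN]

threshold, accuracy = fit_stump(values, labels)
print(threshold, accuracy)
```

A stump like this is mainly useful as a floor: any model built on the full feature set should beat it on held-out data before it is worth further attention.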
Coverage
The dataset primarily covers individuals from the United States, as 91% of the 'native-country' entries are 'United-States' [8]. Ages range from 17 to 90 years [2]. The sources do not specify the data collection period or any demographic notes beyond the listed columns.
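Category shares such as the 91% figure above can be recomputed from a column with a few lines of code. This is a minimal sketch: the helper name `category_shares` and the ten sample values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
from collections import Counter


def category_shares(values):
    """Return each category's share of the column, in percent, most common first."""
    counts = Counter(values)
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.most_common()}


# Made-up column values for illustration; not taken from the dataset.
countries = ["United-States"] * 9 + ["Mexico"]
shares = category_shares(countries)
print(shares)
```

Applied to the real 'native-country' column, the same function would show how concentrated the coverage is and which countries have too few records to support reliable conclusions.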
License
CC0: Public Domain
Who Can Use It
This dataset is suitable for data scientists, machine learning practitioners, and students who wish to practice and develop classification models [1]. Researchers interested in socio-economic factors influencing income brackets may also find it valuable. Its usability rating is 10.00 [1].
Dataset Name Suggestions
- Income Prediction Dataset
- Salary Classification Data
- Income Bracket Classifier
- Personal Income Predictor
- Financial Status Dataset
Attributes
Original Data Source: Income Prediction Dataset