Income Prediction Dataset

Finance & Banking Analytics

Tags and Keywords

Income

Salary

Prediction

Classification

Demographics

Free

About

This dataset is designed for binary classification tasks, primarily predicting whether an individual's salary exceeds £50,000 [1]. Its predictive features are age, work class, education, marital status, occupation, relationship, race, gender, capital gain, capital loss, hours per week, and native country [1-8]. It is intended for practising machine learning classification problems [1].

Columns

  • age: Represents the age of the individual. Values range from 17 to 90, with a mean of 38.6 years [2].
  • workclass: Describes the type of employment. Common values include Private (70%), Self-emp-not-inc (9%), and others [2].
  • fnlwgt: Stands for 'final weight', an anonymised population survey weighting factor. Values range from 21,472 to 857,532, with a mean of 194,000 [3].
  • education: Details the highest level of education attained. HS-grad (34%) and Some-college (22%) are frequently observed [3].
  • educational-num: Education level represented as an integer, ranging from 1 to 16, with a mean of 10.2 [4].
  • marital-status: Indicates the individual's marital status. Married-civ-spouse (45%) and Never-married (33%) are common statuses [4].
  • occupation: Describes the individual's occupation. Adm-clerical (14%) and Craft-repair (14%) are notable categories [5].
  • relationship: Specifies the individual's relationship status. Husband (39%) and Not-in-family (28%) are frequent [5].
  • race: Identifies the individual's race. White (86%) and Black (10%) are the most prevalent [5].
  • gender: Indicates the individual's gender, with Male accounting for 67% and Female for 33% [6].
  • capital-gain: Capital gains. Most entries are zero, but values reach £100,000 [6].
  • capital-loss: Capital losses. Most entries are zero, but values reach £2,415 [7].
  • hours-per-week: The number of hours worked per week. Values range from 2 to 99 hours, with a mean of 41.1 hours [7].
  • native-country: The individual's native country. United-States is the most common (91%), followed by Mexico (1%) and other countries [8].
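
The ranges quoted above can be turned into a quick sanity check. The following is a minimal sketch, assuming the file's header names match the column names in this listing. The ranges are the observed values reported here, not guaranteed constraints, and the two sample rows are made up for illustration.

```python
import csv
import io

# Ranges as reported in the listing (observed minimum and maximum values).
NUMERIC_RANGES = {
    "age": (17, 90),
    "educational-num": (1, 16),
    "hours-per-week": (2, 99),
}

def out_of_range(row):
    """Return the numeric column names whose value falls outside the listed range."""
    problems = []
    for name, (low, high) in NUMERIC_RANGES.items():
        value = int(row[name])
        if not low <= value <= high:
            problems.append(name)
    return problems

# Two made-up rows for illustration only; they are not taken from the dataset.
SAMPLE = """age,educational-num,hours-per-week
38,10,40
16,9,100
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = [out_of_range(r) for r in rows]
print(report)  # [[], ['age', 'hours-per-week']]
```

A row flagged this way is not necessarily wrong; it only falls outside what this listing reports for the file.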

Distribution

The dataset is typically provided as a data file, often in CSV format [1, 9]. A sample file is expected to be updated separately on the platform [9]. The test.csv file has a size of 93.31 kB and contains 14 columns [1]. It comprises 899 valid records [2-8].
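
One way to confirm the column and record counts after download is sketched below. It assumes the CSV has a header row whose names match the column names in this listing. The helper name describe_csv is illustrative, and the one-row demo file is made up.

```python
import csv
import io

# The 14 column names as given in this listing.
EXPECTED_COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "educational-num",
    "marital-status", "occupation", "relationship", "race", "gender",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
]

def describe_csv(handle):
    """Return (missing_columns, unexpected_columns, record_count) for an open CSV file."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    unexpected = [c for c in header if c not in EXPECTED_COLUMNS]
    record_count = sum(1 for _ in reader)
    return missing, unexpected, record_count

# Illustrative one-row file built in memory; the values are made up.
demo = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    "38,Private,194000,HS-grad,9,Married-civ-spouse,Craft-repair,Husband,"
    "White,Male,0,0,40,United-States\n"
)
print(describe_csv(demo))  # ([], [], 1)
```

For the real file, the same function can be called inside `with open("test.csv", newline="") as f:`. Based on the figures above, the expected result would be no missing columns and 899 records.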

Usage

This dataset is well suited to practising machine learning problems, particularly binary classification [1]. It can be used to build models that predict whether an individual's salary is greater than £50,000 from their demographic and professional attributes [1].
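
Before any classifier can be trained, the mixed numeric and categorical columns need to be turned into numbers. The sketch below shows one common approach, one-hot encoding, in plain Python. The split into numeric and categorical columns is an assumption based on the column descriptions in this listing, and the two rows are made up. The 14 listed columns do not name a target label, so a labelled income column is assumed to be available separately for training.

```python
# Assumed split of the listed columns into numeric and categorical features.
NUMERIC = ["age", "fnlwgt", "educational-num",
           "capital-gain", "capital-loss", "hours-per-week"]
CATEGORICAL = ["workclass", "education", "marital-status", "occupation",
               "relationship", "race", "gender", "native-country"]

def fit_categories(rows):
    """Collect the sorted distinct values seen for each categorical column."""
    return {c: sorted({r[c] for r in rows}) for c in CATEGORICAL}

def encode(row, categories):
    """Turn one row into a flat list: numeric values, then one-hot indicators."""
    vector = [float(row[c]) for c in NUMERIC]
    for c in CATEGORICAL:
        vector.extend(1.0 if row[c] == v else 0.0 for v in categories[c])
    return vector

# Made-up rows for illustration only; they are not taken from the dataset.
row_a = {
    "age": "38", "workclass": "Private", "fnlwgt": "194000",
    "education": "HS-grad", "educational-num": "9",
    "marital-status": "Never-married", "occupation": "Adm-clerical",
    "relationship": "Not-in-family", "race": "White", "gender": "Male",
    "capital-gain": "0", "capital-loss": "0", "hours-per-week": "40",
    "native-country": "United-States",
}
row_b = dict(row_a, workclass="Self-emp-not-inc", gender="Female")

categories = fit_categories([row_a, row_b])
vec_a = encode(row_a, categories)
vec_b = encode(row_b, categories)
print(len(vec_a))  # 16: six numeric values plus ten one-hot slots
```

The resulting vectors could be passed to any binary classifier. Categories should be fitted on the training rows only, so that values first seen at prediction time encode as all zeros rather than shifting the layout.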

Coverage

The dataset primarily covers individuals from the United States, as 91% of the 'native-country' entries are 'United-States' [8]. It includes a range of ages from 17 to 90 years [2]. Information regarding specific time ranges for data collection or further demographic notes beyond the provided columns is not specified in the sources.

License

CC0: Public Domain

Who Can Use It

This dataset is suitable for data scientists, machine learning practitioners, and students who wish to practice and develop classification models [1]. Researchers interested in socio-economic factors influencing income brackets may also find it valuable. Its usability rating is 10.00 [1].

Dataset Name Suggestions

  • Income Prediction Dataset
  • Salary Classification Data
  • Income Bracket Classifier
  • Personal Income Predictor
  • Financial Status Dataset

Attributes

Original Data Source: Income Prediction Dataset

Listing Stats

Views: 0
Downloads: 1
Listed: 20/07/2025
Region: Global
UDQS Quality: 5 / 5
Version: 1.0
