Toy Data for Prediction Models

Data Science and Analytics

Tags and Keywords

Demographics, Fictional, Analysis, Population, Prediction

Toy Data for Prediction Models Dataset on Opendatabay data marketplace

Free

About

This dataset is a fictional resource designed for exploratory data analysis (EDA) and for evaluating simple prediction models. It serves as a toy dataset, allowing users to familiarise themselves with data analysis techniques. All the data within is simulated, with distributions engineered to facilitate straightforward statistical analysis.

Columns

  • Number: A simple sequential index assigned to each row in the dataset.
  • City: Represents the geographical location of an individual. The locations included are Dallas, New York City, Los Angeles, Mountain View, Boston, Washington D.C., San Diego, and Austin. Notably, New York City accounts for 34% of entries, while Los Angeles represents 21%. There are 8 unique city values.
  • Gender: Indicates the gender of a person, categorised as either Male or Female. Males constitute 56% of the entries, and Females make up 44%. There are 2 unique gender values.
  • Age: Specifies the age of a person, with values ranging from 25 to 65 years. The mean age is approximately 45 years, with a standard deviation of about 11.6 years.
  • Income: The annual income of an individual (the listing does not state a currency). Values range from approximately -674 to 177,175, so the simulated data contains at least one negative value. The mean is around 91,300, with a standard deviation of about 25,000.
  • Illness: A binary indicator of whether a person is ill (Yes or No). Note that the column profile for this listing flags all 150,000 entries as mismatched and reports no valid 'Yes' or 'No' counts, so the raw values should be inspected before the column is used as a prediction target.
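
The column statistics above can be re-checked after download. The following is a minimal sketch using only the Python standard library. It assumes the CSV header uses exactly the column names listed above (Number, City, Gender, Age, Income, Illness); the file name `toy_dataset.csv` and the sample rows are hypothetical and only stand in for the real file.

```python
import csv
import io
import statistics
from collections import Counter


def summarise(rows):
    """Summarise rows given as dicts keyed by column name (e.g. from csv.DictReader)."""
    rows = list(rows)
    n = len(rows)
    ages = [float(r["Age"]) for r in rows]
    incomes = [float(r["Income"]) for r in rows]
    return {
        "rows": n,
        # Share of rows per category, comparable to the percentages in the listing.
        "city_share": {c: k / n for c, k in Counter(r["City"] for r in rows).items()},
        "gender_share": {g: k / n for g, k in Counter(r["Gender"] for r in rows).items()},
        "age_mean": statistics.mean(ages),
        "age_std": statistics.stdev(ages),  # sample standard deviation
        "income_min": min(incomes),
        "income_max": max(incomes),
        "negative_incomes": sum(1 for x in incomes if x < 0),
        # Raw value counts, useful for seeing why a column profile reports mismatches.
        "illness_values": Counter(r["Illness"].strip() for r in rows),
    }


# Made-up sample rows standing in for the real file (illustrative values only).
SAMPLE = """Number,City,Gender,Age,Income,Illness
1,Dallas,Male,41,40367.0,No
2,New York City,Female,54,-674.0,No
3,New York City,Male,25,91300.0,Yes
4,Los Angeles,Male,65,177175.0,No
"""

summary = summarise(csv.DictReader(io.StringIO(SAMPLE)))
print(summary)

# With the downloaded file (hypothetical file name):
# with open("toy_dataset.csv", newline="") as f:
#     print(summarise(csv.DictReader(f)))
```

The `illness_values` counter shows the raw strings in the Illness column, which is a reasonable first step when investigating the mismatch reported in the listing.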

Distribution

The dataset has 150,000 rows and 6 columns. It is provided as a CSV file of 5.74 MB. The underlying distributions were generated to be convenient for statistical analysis.
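
A quick structural check can confirm that a downloaded copy matches this description. The sketch below assumes the header row lists the six columns in the order given in the Columns section; that order, the function name `check_structure`, and the file name in the closing comment are assumptions rather than details confirmed by the listing.

```python
import csv
import io

# Assumed header, taken from the order of the Columns section.
EXPECTED_COLUMNS = ["Number", "City", "Gender", "Age", "Income", "Illness"]


def check_structure(fileobj, expected_rows=None):
    """Return a list of problems found in the CSV structure (empty if none)."""
    reader = csv.reader(fileobj)
    header = [h.strip() for h in next(reader)]
    # Count data rows, ignoring completely blank lines.
    n_rows = sum(1 for row in reader if row)
    problems = []
    if header != EXPECTED_COLUMNS:
        problems.append(f"unexpected header: {header}")
    if expected_rows is not None and n_rows != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {n_rows}")
    return problems


# One-row stand-in for the real file.
demo = io.StringIO("Number,City,Gender,Age,Income,Illness\n1,Dallas,Male,41,40367.0,No\n")
print(check_structure(demo, expected_rows=1))

# With the downloaded file (hypothetical file name):
# with open("toy_dataset.csv", newline="") as f:
#     print(check_structure(f, expected_rows=150_000))
```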

Usage

This dataset is ideally suited for:
  • Conducting exploratory data analysis (EDA) to uncover patterns and insights.
  • Developing and testing simple prediction models.
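
As one illustration of a simple prediction model, the sketch below predicts Income from City using per-city means learned on a training split, then scores the result with mean absolute error on held-out rows. This is only one possible baseline, not a method prescribed by the listing, and whether City carries any signal for Income in this dataset is not stated. All function names and the sample rows are hypothetical.

```python
import random
from collections import defaultdict


def split_rows(rows, test_fraction=0.25, seed=0):
    """Shuffle rows reproducibly and split them into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def fit_group_means(train, key="City", target="Income"):
    """Learn the mean target per group, plus the overall mean as a fallback."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in train:
        sums[r[key]] += float(r[target])
        counts[r[key]] += 1
    overall = sum(sums.values()) / sum(counts.values())
    means = {k: sums[k] / counts[k] for k in sums}
    return means, overall


def predict(model, row, key="City"):
    """Predict the group mean, or the overall mean for an unseen group."""
    means, overall = model
    return means.get(row[key], overall)


def mean_absolute_error(model, rows, key="City", target="Income"):
    """Average absolute difference between predictions and actual values."""
    return sum(abs(predict(model, r, key) - float(r[target])) for r in rows) / len(rows)


# Made-up rows in the shape csv.DictReader would produce (illustrative values only).
rows = [
    {"City": "Dallas", "Income": "40000"},
    {"City": "Dallas", "Income": "50000"},
    {"City": "Boston", "Income": "90000"},
    {"City": "Boston", "Income": "95000"},
    {"City": "Austin", "Income": "70000"},
    {"City": "Austin", "Income": "80000"},
    {"City": "Dallas", "Income": "45000"},
    {"City": "Boston", "Income": "100000"},
]
train, test = split_rows(rows)
model = fit_group_means(train)
print("held-out MAE:", mean_absolute_error(model, test))
```

Comparing this error against the error of always predicting the overall mean gives a quick indication of whether the grouping feature adds anything.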

Coverage

  • Geographic Scope: Includes data points from several major US cities: Dallas, New York City, Los Angeles, Mountain View, Boston, Washington D.C., San Diego, and Austin.
  • Demographic Scope: Features demographic attributes such as Gender (Male, Female), Age (25-65 years), and Income.
  • Time Range: Not applicable as the dataset is fictional and does not represent a specific time period.

License

CC0: Public Domain

Who Can Use It

This dataset is appropriate for:
  • Data scientists and analysts for prototyping and testing algorithms.
  • Students and educators for learning and teaching data analysis concepts.
  • Researchers looking for a synthetic dataset to validate methodologies.

Dataset Name Suggestions

  • Fictional Demographic Data
  • Synthetic Population Insights
  • Demographic Simulation Dataset
  • Toy Data for Prediction Models

Attributes

Original Data Source: Toy Data for Prediction Models

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 14/07/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
