Opendatabay APP

Health Insurance Premium Predictor

Healthcare Insurance & Costs

Tags and Keywords

Health

Insurance

Premium

Charges

Medical

Free

About

This dataset records health insurance premium charges alongside policyholder characteristics: age, gender, Body Mass Index (BMI), number of children, smoker status, and region. It is openly available online and is often recommended to people starting out in data science, particularly for practising regression models. The dataset's exact origin and collection methodology are not specified.

Columns

  • age: The age of the policyholder, ranging from 18 to 64 years.
  • sex: The gender of the policyholder, categorised as male or female.
  • bmi: The Body Mass Index of the policyholder, with values from 15.96 to 53.13.
  • children: The number of children the policyholder has, ranging from 0 to 5.
  • smoker: A boolean field indicating whether the policyholder is a smoker (true) or not (false).
  • region: The geographical region to which the policyholder belongs, including categories like 'southeast', 'southwest', and 'other'.
  • charges: The premium charged to the policyholder, with values ranging from £1,121.87 to £63,770.43.
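
As a rough illustration of how the column descriptions above could be turned into a sanity check, the sketch below tests numeric values against the stated ranges. The header names and their order are assumed to match the list above, the helper `find_out_of_range` is hypothetical, and the two sample rows are made up for the example rather than taken from the dataset.

```python
import csv
import io

# Numeric ranges as stated in the column list above.
NUMERIC_RANGES = {
    "age": (18, 64),
    "bmi": (15.96, 53.13),
    "children": (0, 5),
    "charges": (1121.87, 63770.43),
}
# Assumption: the CSV header uses these names in this order.
EXPECTED_COLUMNS = ["age", "sex", "bmi", "children", "smoker", "region", "charges"]


def find_out_of_range(rows):
    """Return (row_index, column, value) for each numeric value outside its stated range."""
    problems = []
    for i, row in enumerate(rows):
        for col, (lo, hi) in NUMERIC_RANGES.items():
            value = float(row[col])
            if not lo <= value <= hi:
                problems.append((i, col, value))
    return problems


# Two made-up rows for illustration only; the second has an age above 64.
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
30,female,25.0,1,false,southeast,4000.00
70,male,25.0,0,true,southwest,20000.00
"""

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
sample_rows = list(reader)
assert reader.fieldnames == EXPECTED_COLUMNS
print(find_out_of_range(sample_rows))  # [(1, 'age', 70.0)]
```

Run against the real file, an empty result would be consistent with the ranges quoted in the listing.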

Distribution

The dataset is provided as a single CSV file, insurance.csv (55.63 kB). It contains 7 columns and 1,338 rows, with no missing values in any column.
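
The row, column, and missing-value figures above can be checked after download. The following is a minimal sketch using only the Python standard library; `summarize` is a hypothetical helper, and the demo input is made up so the snippet runs without the file.

```python
import csv
import io


def summarize(csv_file):
    """Count data rows, columns, and empty cells in an open CSV file object."""
    reader = csv.DictReader(csv_file)
    n_rows = 0
    n_missing = 0
    for row in reader:
        n_rows += 1
        # A cell counts as missing if it is absent (None) or blank.
        n_missing += sum(
            1
            for v in row.values()
            if v is None or (isinstance(v, str) and not v.strip())
        )
    return n_rows, len(reader.fieldnames or []), n_missing


# Made-up demo input: 2 rows, 3 columns, 1 empty cell.
demo = io.StringIO("age,sex,bmi\n30,female,25.0\n41,male,\n")
print(summarize(demo))  # (2, 3, 1)

# For the downloaded file (path assumed), the listing implies (1338, 7, 0):
# with open("insurance.csv", newline="") as f:
#     print(summarize(f))
```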

Usage

This dataset is ideally suited for:
  • Practising and building regression models, especially for predicting insurance premiums.
  • Beginners in data science looking for a clear, accessible dataset for their initial modelling exercises.
  • Exploring the relationships between various personal attributes and healthcare costs.
  • Educational purposes in statistics and machine learning courses.
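
To give a flavour of the regression practice mentioned above, here is a minimal single-predictor least-squares sketch in plain Python. The function `fit_line` is a hypothetical helper, and the age and charges values are made up to lie exactly on a line, so they say nothing about the real data. In practice a multi-feature model, with the categorical columns encoded numerically, would normally be fitted with a dedicated library.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up pairs lying exactly on charges = 1000 + 250 * age.
ages = [20, 30, 40, 50]
charges = [6000.0, 8500.0, 11000.0, 13500.0]

slope, intercept = fit_line(ages, charges)
print(slope, intercept)  # 250.0 1000.0
```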

Coverage

The dataset covers demographic attributes of policyholders: age (18-64), gender (male/female), BMI (15.96-53.13), number of children (0-5), and smoker status. Geographically, it covers the named regions 'southeast' and 'southwest' plus an 'other' category. The time range of data collection is not available. All 1,338 records have valid values for every attribute.

License

CC0: Public Domain

Who Can Use It

This dataset is primarily intended for:
  • Data science students and enthusiasts for learning and applying regression techniques.
  • Machine learning practitioners for building predictive models related to health insurance costs.
  • Researchers interested in the factors influencing insurance premiums.
  • Anyone seeking an accessible and clean dataset for analytical practice.

Dataset Name Suggestions

  • Health Insurance Premium Predictor
  • Medical Charges Dataset
  • Insurance Cost Factors
  • Policyholder Premium Data
  • Healthcare Premium Analysis

Listing Stats

  • Views: 8
  • Downloads: 1
  • Listed: 29/07/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
