Health Insurance Premium Predictor
Healthcare Insurance & Costs
Free
About
This dataset provides health insurance premium charges alongside policyholder characteristics: age, gender, Body Mass Index (BMI), number of children, smoker status, and region. It is an openly available online dataset, widely recommended for beginners in data science, particularly for practising regression models. The dataset's exact origin and collection methodology are not specified.
Columns
- age: The age of the policyholder, ranging from 18 to 64 years.
- sex: The gender of the policyholder, categorised as male or female.
- bmi: The Body Mass Index of the policyholder, with values from 15.96 to 53.13.
- children: The number of children the policyholder has, ranging from 0 to 5.
- smoker: A boolean field indicating whether the policyholder is a smoker (true) or not (false).
- region: The geographical region to which the policyholder belongs, including categories like 'southeast', 'southwest', and 'other'.
- charges: The premium charged to the policyholder, with values ranging from £1,121.87 to £63,770.43.
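The column list above can be turned into a small typed loader. The following is a minimal sketch using only the Python standard library. The helper names (`parse_row`, `load_records`) and the sample rows are hypothetical and made up for illustration, and the assumption that the smoker flag may be spelled either true/false or yes/no in the file is not confirmed by this listing.

```python
import csv
import io

# Column names as documented in the list above.
EXPECTED_COLUMNS = ["age", "sex", "bmi", "children", "smoker", "region", "charges"]


def parse_row(row):
    """Convert one CSV row (all strings) into typed Python values."""
    return {
        "age": int(row["age"]),
        "sex": row["sex"].strip().lower(),
        "bmi": float(row["bmi"]),
        "children": int(row["children"]),
        # Assumption: the flag may be stored as true/false or yes/no.
        "smoker": row["smoker"].strip().lower() in ("true", "yes"),
        "region": row["region"].strip().lower(),
        "charges": float(row["charges"]),
    }


def load_records(handle):
    """Read typed records from an open CSV handle, checking the header first."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [parse_row(row) for row in reader]


# Made-up sample rows for illustration only; not taken from insurance.csv.
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
25,female,22.5,0,no,southwest,3000.00
40,male,30.1,2,yes,southeast,25000.50
"""

records = load_records(io.StringIO(SAMPLE_CSV))
print(records[0])
```

To read the real file instead of the sample, the same function could be called as `load_records(open("insurance.csv", newline=""))`, assuming the file sits in the working directory.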
Distribution
The dataset is provided as a CSV file named insurance.csv, with a file size of 55.63 kB. It contains 7 columns and 1,338 records (rows), with no missing values in any column.
Usage
This dataset is ideally suited for:
- Practicing and building regression models, especially for predicting insurance premiums.
- Beginners in data science looking for a clear, accessible dataset for their initial modelling exercises.
- Exploring the relationships between various personal attributes and healthcare costs.
- Educational purposes in statistics and machine learning courses.
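As a starting point for the regression practice mentioned above, here is a minimal sketch of a one-feature least-squares fit of charges on age, written with the standard library only. It is deliberately simplified: a realistic model would likely use several columns and a library such as scikit-learn. The function names and the tiny data sample are hypothetical, not taken from the dataset.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (single feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted charge for a given feature value."""
    return slope * x + intercept


# Made-up, perfectly linear sample for illustration only.
ages = [20, 30, 40, 50]
charges = [2000.0, 3000.0, 4000.0, 5000.0]

slope, intercept = fit_line(ages, charges)
print(slope, intercept, predict(slope, intercept, 35))
```

A single numeric feature keeps the arithmetic easy to follow by hand; categorical columns such as smoker or region would need to be encoded as numbers before they could be added to a model like this.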
Coverage
The dataset covers demographic attributes of policyholders including age (18-64), gender (male/female), BMI (15.96-53.13), number of children (0-5), and smoker status. Geographically, it includes policyholders from specified regions (southeast, southwest) and an 'other' category. A specific time range for data collection is not available. All 1,338 records have valid values for every attribute.
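The ranges stated above can be checked mechanically once the data is loaded. The sketch below assumes records are already available as dictionaries with numeric values; the function name `range_violations` and the two sample records are hypothetical and exist only to show the check.

```python
# Ranges as stated in the coverage description above.
DOCUMENTED_RANGES = {
    "age": (18, 64),
    "bmi": (15.96, 53.13),
    "children": (0, 5),
}


def range_violations(records):
    """Return (row index, column, value) for every value outside its documented range."""
    problems = []
    for index, record in enumerate(records):
        for column, (low, high) in DOCUMENTED_RANGES.items():
            value = record[column]
            if not low <= value <= high:
                problems.append((index, column, value))
    return problems


# Made-up records for illustration: the second one breaks two ranges on purpose.
sample = [
    {"age": 30, "bmi": 25.0, "children": 1},
    {"age": 70, "bmi": 25.0, "children": 6},
]

print(range_violations(sample))
```

An empty result on the full file would be consistent with the ranges described here; a non-empty one would point to rows worth inspecting before modelling.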
License
CC0: Public Domain
Who Can Use It
This dataset is primarily intended for:
- Data science students and enthusiasts for learning and applying regression techniques.
- Machine learning practitioners for building predictive models related to health insurance costs.
- Researchers interested in the factors influencing insurance premiums.
- Anyone seeking an accessible and clean dataset for analytical practice.
Dataset Name Suggestions
- Health Insurance Premium Predictor
- Medical Charges Dataset
- Insurance Cost Factors
- Policyholder Premium Data
- Healthcare Premium Analysis
Attributes
Original Data Source: Health Insurance Premium Predictor