Actuarial Health Risk and Expenditure Data
Patient Health Records & Digital Health
Free
About
Predicting healthcare expenditure from individual demographics and lifestyle factors is useful to insurance providers and healthcare planners. By relating variables such as age, body mass index, and smoking habits to billed charges, this resource supports the development of models that estimate medical charges. Understanding these cost drivers matters for actuarial analysis, risk management, and personalised healthcare recommendations. The data serves as a starting point for exploring how personal health choices and physical characteristics influence the cost of medical care.
Columns
- age: The age of the primary beneficiary, ranging from a minimum of 18 to a maximum of 64 years.
- sex: The gender of the insurance contractor, categorised as male (51%) or female (49%).
- bmi: Body Mass Index, the ratio of body weight to height squared (kg/m²), used to assess whether an individual's weight falls within a healthy or high-risk range.
- children: The total number of children or dependents covered by the individual's health insurance plan, ranging from 0 to 5.
- smoker: A binary indicator of the beneficiary's smoking status, with 20% of the records representing active smokers.
- region: The residential area of the beneficiary, recorded as one of four zones, including the Southeast and Southwest.
- charges: The target variable representing the individual medical costs billed by the health insurance provider, with values ranging from approximately 1,120 to 63,800.
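As a rough illustration of the column list above, the sketch below reads CSV text with those seven headers into typed records. The names `Record` and `load_records` are hypothetical, and the two sample rows are synthetic: their category spellings (such as "no" or "southwest") are assumptions, not values confirmed by this listing.

```python
import csv
import io
from dataclasses import dataclass

# Column names as given in the Columns section.
EXPECTED_COLUMNS = ["age", "sex", "bmi", "children", "smoker", "region", "charges"]


@dataclass
class Record:
    age: int
    sex: str
    bmi: float
    children: int
    smoker: str
    region: str
    charges: float


def load_records(handle):
    """Parse CSV text from a file-like object into a list of Record objects."""
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or set(reader.fieldnames) != set(EXPECTED_COLUMNS):
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    return [
        Record(
            age=int(row["age"]),
            sex=row["sex"],
            bmi=float(row["bmi"]),
            children=int(row["children"]),
            smoker=row["smoker"],
            region=row["region"],
            charges=float(row["charges"]),
        )
        for row in reader
    ]


# Synthetic rows for illustration only; not taken from insurance.csv.
SAMPLE = """age,sex,bmi,children,smoker,region,charges
30,female,25.0,1,no,southwest,4000.0
45,male,31.5,2,yes,southeast,30000.0
"""

records = load_records(io.StringIO(SAMPLE))
print(len(records), records[0].age, records[1].charges)
```

To run this against the real file, the same function could be called as `load_records(open("insurance.csv", newline=""))`, assuming the file's header row matches the column names listed above.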
Distribution
The information is contained within a single CSV file named insurance.csv, with a file size of approximately 55.63 kB. It features 1,338 valid records across 7 distinct columns, maintaining a 100% validity rate with no missing or mismatched data entries. Due to its nature as a benchmark actuarial collection, no further updates are expected.
Usage
This resource is well suited to regression analysis, where the goal is to predict a continuous value for medical expenses. It can be used to train machine learning models that identify which factors, such as smoking or BMI, have the most substantial impact on billed charges. Additionally, students and researchers can use the data to practise exploratory data analysis and feature engineering for healthcare-related predictive tasks.
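The regression use case can be sketched with ordinary least squares on two of the listed columns, age and smoker. This is a minimal standard-library sketch, not a recommended modelling pipeline: `encode`, `fit_ols`, and `predict` are hypothetical helpers, the "yes"/"no" encoding of smoker is an assumption, and the rows are synthetic, generated from a made-up linear rule so that the fit can be checked. The coefficients say nothing about the real data.

```python
def encode(row):
    """Map one record to [intercept, age, smoker flag].

    Assumes smoker is stored as 'yes'/'no' (an assumption about the file).
    """
    return [1.0, float(row["age"]), 1.0 if row["smoker"] == "yes" else 0.0]


def fit_ols(X, y):
    """Solve the normal equations (X^T X) beta = X^T y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def predict(beta, row):
    """Predicted charge for one record under the fitted coefficients."""
    return sum(b * x for b, x in zip(beta, encode(row)))


# Synthetic rows built from a made-up rule: 1000 + 200 * age + 20000 * smoker.
rows = [
    {"age": "20", "smoker": "no"},
    {"age": "30", "smoker": "yes"},
    {"age": "40", "smoker": "no"},
    {"age": "50", "smoker": "yes"},
]
charges = [
    1000.0 + 200.0 * float(r["age"]) + 20000.0 * (r["smoker"] == "yes")
    for r in rows
]

beta = fit_ols([encode(r) for r in rows], charges)
print([round(b, 2) for b in beta])
```

On the real file, the remaining columns (bmi, children, sex, region) would need their own encoding, and a held-out split would be needed before reading anything into the fitted coefficients.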
Coverage
The scope of the records covers a diverse demographic of individuals aged between 18 and 64. It includes a balanced representation of genders and accounts for varied geographical locations across four primary regions. The data captures a wide range of body mass indices (16 to 53.1) and medical charges, reflecting a broad spectrum of health statuses and associated costs.
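The ranges quoted in this listing (age 18 to 64, BMI 16 to 53.1, children 0 to 5) can be turned into a quick sanity check. The sketch below is one possible way to do that; `out_of_range` and `DOCUMENTED_RANGES` are hypothetical names, the rows are synthetic, and the small BMI tolerance is an assumption made because the quoted BMI bounds look rounded.

```python
# (low, high, tolerance) per column, taken from the ranges quoted in this listing.
# The BMI tolerance is an assumption to allow for rounding in the quoted bounds.
DOCUMENTED_RANGES = {
    "age": (18.0, 64.0, 0.0),
    "bmi": (16.0, 53.1, 0.05),
    "children": (0.0, 5.0, 0.0),
}


def out_of_range(rows, ranges=DOCUMENTED_RANGES):
    """Return (row_index, column, value) for each value outside its documented range."""
    problems = []
    for index, row in enumerate(rows):
        for column, (low, high, tol) in ranges.items():
            value = float(row[column])
            if not (low - tol) <= value <= (high + tol):
                problems.append((index, column, value))
    return problems


# Synthetic rows for illustration only; not taken from insurance.csv.
sample_rows = [
    {"age": "18", "bmi": "15.97", "children": "0"},  # inside all ranges
    {"age": "17", "bmi": "30.0", "children": "2"},   # age below 18
    {"age": "40", "bmi": "15.0", "children": "6"},   # bmi and children out of range
]

for problem in out_of_range(sample_rows):
    print(problem)
```

An empty result on the full file would be consistent with the coverage described above; any flagged rows would be worth inspecting before modelling.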
License
CC0: Public Domain
Who Can Use It
Actuaries and financial analysts can leverage these records to refine risk assessment models and pricing strategies. Data scientists and students might utilise the dataset to develop their skills in regression and predictive modelling within the health sector. Furthermore, healthcare policy researchers can find this a valuable primary source for investigating the economic burden of smoking and obesity on the insurance system.
Dataset Name Suggestions
- Medical Cost Personal Datasets for Regression
- Health Insurance Cost Prediction Archive
- Individual Medical Charges and Demographic Registry
- Actuarial Health Risk and Expenditure Data
- Demographic Drivers of Healthcare Insurance Costs
Attributes
Original Data Source: Actuarial Health Risk and Expenditure Data
