Opendatabay APP

Synthetic Vehicle Premium Prediction Data

Synthetic Data Generation

Tags and Keywords

Insurance

Premium

Synthetic

Driver

Mileage


Free

About

This dataset simulates car insurance premiums, generated synthetically using a linear formula based on key risk factors. It contains 1,000 records designed to show realistic statistical patterns informed by typical influences on insurance pricing. It is suited to developing linear regression models, running feature importance analysis, and practising predictive modelling for the insurance industry.
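
The listing does not publish the formula's coefficients, so the following is only a minimal sketch of how a record of this kind could be generated. The function name `synth_record`, the uniform integer sampling, the lower bound of 0 for accidents, and every value in `EXAMPLE_WEIGHTS` are assumptions made for illustration; only the value ranges and the 2025 reference year come from the column descriptions.

```python
import random

REFERENCE_YEAR = 2025  # the listing computes Car Age relative to 2025


def synth_record(rng, intercept, weights):
    """Draw one record within the listed ranges and price it linearly."""
    age = rng.randint(18, 65)
    year = rng.randint(1990, REFERENCE_YEAR)
    record = {
        "Driver Age": age,
        "Driver Experience": rng.randint(0, age - 18),  # capped at age minus 18
        "Previous Accidents": rng.randint(0, 5),
        "Annual Mileage (x1000 km)": rng.randint(11, 25),
        "Car Manufacturing Year": year,
        "Car Age": REFERENCE_YEAR - year,
    }
    # Linear formula: intercept plus a weighted sum of the chosen features.
    premium = intercept + sum(w * record[name] for name, w in weights.items())
    record["Insurance Premium (£)"] = premium
    return record


# Placeholder values only: NOT the coefficients used to build the dataset.
EXAMPLE_WEIGHTS = {"Driver Age": -0.1, "Previous Accidents": 1.5, "Car Age": 0.2}
example = synth_record(random.Random(0), 500.0, EXAMPLE_WEIGHTS)
```

Passing a seeded `random.Random` instance keeps the draw reproducible, which is convenient when comparing a fitted model against known weights.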

Columns

  • Driver Age: The age of the driver measured in years. This feature is a key determinant of the final insurance cost. Values range from 18 to 65 years.
  • Driver Experience: The total number of years the individual has been operating a vehicle, constrained to a maximum of Driver Age minus 18. Values range from 0 to 40 years.
  • Previous Accidents: The count of accidents the driver has been involved in throughout their history. The maximum recorded value is 5 accidents.
  • Annual Mileage (x1000 km): The estimated yearly distance covered by the driver, represented in thousands of kilometres. Values span from 11 to 25 thousand kilometres.
  • Car Manufacturing Year: The year the vehicle was produced. Vehicles manufactured between 1990 and 2025 are included; earlier manufacturing years may correlate with a slightly higher premium.
  • Car Age: The age of the vehicle in years, calculated relative to the year 2025. Values range from 0 to 35 years.
  • Insurance Premium (£): The final calculated insurance premium amount based on the inputs from the other features. Premiums range approximately from £477 to £508.
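
The constraints stated in the column descriptions can be turned into a quick sanity check. This is a sketch under the assumption that each record is a dict keyed by the column names exactly as written above (the CSV headers may differ); `check_row` is a hypothetical helper name, and the lower bound of 0 for accidents is assumed because the listing only gives the maximum.

```python
REFERENCE_YEAR = 2025  # the listing states Car Age is relative to 2025


def check_row(row):
    """Return a list of constraint violations for one record."""
    problems = []
    if not 18 <= row["Driver Age"] <= 65:
        problems.append("Driver Age outside 18-65")
    if not 0 <= row["Driver Experience"] <= row["Driver Age"] - 18:
        problems.append("Driver Experience outside 0..(Driver Age - 18)")
    if not 0 <= row["Previous Accidents"] <= 5:
        problems.append("Previous Accidents outside 0-5")
    if not 11 <= row["Annual Mileage (x1000 km)"] <= 25:
        problems.append("Annual Mileage outside 11-25 thousand km")
    if not 1990 <= row["Car Manufacturing Year"] <= REFERENCE_YEAR:
        problems.append("Car Manufacturing Year outside 1990-2025")
    if row["Car Age"] != REFERENCE_YEAR - row["Car Manufacturing Year"]:
        problems.append("Car Age is not 2025 minus Car Manufacturing Year")
    return problems
```

An empty list means the record is consistent with every range and relationship described in the listing.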

Distribution

The dataset is a single CSV file, car_insurance_premium_dataset.csv, containing 1,000 records (rows) and 7 features (columns). The file size is 25.02 kB.
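
After downloading, the stated shape can be confirmed with the standard library alone. The sketch below assumes the file has a header row and that every cell is numeric; `load_records` and `shape` are hypothetical helper names.

```python
import csv


def load_records(lines):
    """Parse CSV lines (header first) into dicts with float values."""
    reader = csv.DictReader(lines)
    return [{name: float(value) for name, value in row.items()} for row in reader]


def shape(records):
    """Return (row count, column count) for the parsed records."""
    return len(records), (len(records[0]) if records else 0)


# Usage, assuming the file sits in the working directory:
#
# with open("car_insurance_premium_dataset.csv", newline="", encoding="utf-8") as f:
#     records = load_records(f)
# print(shape(records))  # the listing states 1,000 rows and 7 columns
```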

Usage

  • Developing and training linear regression and other predictive models.
  • Performing feature importance studies to understand which driver characteristics most influence premium cost.
  • Educational use cases for demonstrating machine learning in finance and risk assessment.
  • Benchmarking new algorithms for calculating predicted risk.
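
As an illustration of the first use case, here is a minimal ordinary least squares fit written without third-party libraries. It is a sketch, not the method used to build the dataset: `fit_linear` and `predict` are hypothetical names, and in practice a library such as scikit-learn or statsmodels would be the usual choice. One caveat follows directly from the column definitions: Car Age equals 2025 minus Car Manufacturing Year, so the sketch assumes only one of the two is passed as a feature, since including both alongside an intercept makes the normal equations singular or nearly so.

```python
def fit_linear(features, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    features: list of rows (each a list of floats); targets: list of floats.
    Returns [intercept, coef_1, ..., coef_k].
    """
    rows = [[1.0] + list(r) for r in features]  # prepend the intercept column
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("zero pivot: a feature is redundant")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


def predict(coefficients, row):
    """Apply fitted coefficients (intercept first) to one feature row."""
    return coefficients[0] + sum(c * x for c, x in zip(coefficients[1:], row))
```

Because the premiums are described as coming from a linear formula, a fit of this kind should recover coefficients close to the generating ones, which makes the dataset a convenient check on a regression implementation.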

Coverage

This is synthetic data, so geographic coverage is not applicable. The data covers a demographic scope of drivers aged 18 to 65. The time range for car manufacturing years spans 1990 through 2025, with corresponding car ages up to 35 years.

License

Attribution 4.0 International (CC BY 4.0)

Who Can Use It

  • Data Scientists: For training and evaluating predictive models related to actuarial science and pricing.
  • Students/Academics: For coursework focusing on applied statistics, regression analysis, and machine learning principles.
  • Insurance Analysts: To gain insight into how simulated variables interact to influence premium pricing structures.

Dataset Name Suggestions

  • Synthetic Vehicle Premium Prediction Data
  • Driver Risk Factors and Insurance Cost Data
  • Car Insurance Premium Dataset

Attributes

Listing Stats

  • Views: 1
  • Downloads: 0
  • Listed: 29/10/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
