Opendatabay APP

Patient Cancer Risk Factors Data

Patient Health Records & Digital Health

Tags and Keywords

Cancer

Medical

Prediction

Lifestyle

Dataset



Free

About

This dataset is designed for predicting cancer risk using a combination of medical and lifestyle information. It serves as a valuable resource for training and testing machine learning models in the medical domain, presenting a realistic challenge for predictive modelling. The dataset is synthetic and was generated for educational purposes, making it ideal for data science and machine learning projects focusing on health conditions. It has been preprocessed and cleaned to allow users to concentrate on developing and fine-tuning their predictive models.

Columns

  • Age: Integer values representing the patient's age, ranging from 20 to 80.
  • Gender: Binary values indicating gender, where 0 is Male and 1 is Female.
  • BMI: Continuous values representing Body Mass Index, ranging from 15 to 40.
  • Smoking: Binary values indicating smoking status, where 0 is No and 1 is Yes.
  • GeneticRisk: Categorical values representing genetic risk levels for cancer, with 0 for Low, 1 for Medium, and 2 for High.
  • PhysicalActivity: Continuous values representing the number of hours per week spent on physical activities, ranging from 0 to 10.
  • AlcoholIntake: Continuous values representing the number of alcohol units consumed per week, ranging from 0 to 5.
  • CancerHistory: Binary values indicating whether the patient has a personal history of cancer, where 0 is No and 1 is Yes.
  • Diagnosis: Binary values indicating the cancer diagnosis status, where 0 is No Cancer and 1 is Cancer. This is the main variable to predict.
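The column rules above can be turned into a small validation pass. The following is a minimal sketch using only the Python standard library. It assumes the CSV header uses the column names exactly as listed; the `validate_row` helper and the two sample rows are hypothetical and are not taken from the dataset.

```python
import csv
import io

# Numeric ranges transcribed from the column list: (type, min, max).
RANGES = {
    "Age": (int, 20, 80),
    "BMI": (float, 15.0, 40.0),
    "PhysicalActivity": (float, 0.0, 10.0),
    "AlcoholIntake": (float, 0.0, 5.0),
}
# Coded columns and the integer codes the listing allows for each.
CODES = {
    "Gender": {0, 1},            # 0 = Male, 1 = Female
    "Smoking": {0, 1},           # 0 = No, 1 = Yes
    "GeneticRisk": {0, 1, 2},    # 0 = Low, 1 = Medium, 2 = High
    "CancerHistory": {0, 1},     # 0 = No, 1 = Yes
    "Diagnosis": {0, 1},         # 0 = No Cancer, 1 = Cancer (target)
}

def validate_row(row):
    """Return a list of problems found in one CSV row (a dict of strings)."""
    problems = []
    for name, (cast, low, high) in RANGES.items():
        try:
            value = cast(row[name])
        except (KeyError, TypeError, ValueError):
            problems.append(f"{name}: missing or not a number")
            continue
        if not low <= value <= high:
            problems.append(f"{name}: {value} outside [{low}, {high}]")
    for name, allowed in CODES.items():
        try:
            value = int(row[name])
        except (KeyError, TypeError, ValueError):
            problems.append(f"{name}: missing or not an integer")
            continue
        if value not in allowed:
            problems.append(f"{name}: {value} not in {sorted(allowed)}")
    return problems

# Two made-up rows for illustration only; the second breaks two rules.
SAMPLE = """Age,Gender,BMI,Smoking,GeneticRisk,PhysicalActivity,AlcoholIntake,CancerHistory,Diagnosis
58,1,27.4,0,1,4.5,2.1,0,0
19,0,22.0,1,3,6.0,1.0,0,1
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for line_no, row in enumerate(rows, start=2):
    print(line_no, validate_row(row) or "ok")
```

To check the real file, the `io.StringIO(SAMPLE)` source could be swapped for an opened file handle on the downloaded CSV.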

Distribution

The dataset contains 1,500 patient records in tabular form, with one row per patient, and is typically available in CSV format. The listing describes the features as balanced, with realistic variability in patient information, and reports no mismatched or missing values in any column.

Usage

This dataset is well-suited for various applications in machine learning and data science, including:
  • Training and evaluating machine learning models for cancer prediction.
  • Conducting feature importance analysis to identify key risk factors.
  • Benchmarking different algorithms for predictive modelling.
  • Exploring various modelling approaches and feature engineering techniques related to health and lifestyle data.
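As an illustration of the benchmarking and feature importance uses listed above, here is a hedged sketch with NumPy and scikit-learn. Because the real file is not bundled with this page, `make_stand_in` is a hypothetical helper that generates stand-in data matching only the documented column ranges. Its label rule is invented so the models have a signal to learn and says nothing about the actual dataset; for real results, the CSV would be loaded in its place.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["Age", "Gender", "BMI", "Smoking", "GeneticRisk",
            "PhysicalActivity", "AlcoholIntake", "CancerHistory"]

def make_stand_in(n=1500, seed=0):
    """Hypothetical stand-in data using only the documented value ranges."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([
        rng.integers(20, 81, n),   # Age: 20 to 80
        rng.integers(0, 2, n),     # Gender: 0 or 1
        rng.uniform(15, 40, n),    # BMI: 15 to 40
        rng.integers(0, 2, n),     # Smoking: 0 or 1
        rng.integers(0, 3, n),     # GeneticRisk: 0, 1 or 2
        rng.uniform(0, 10, n),     # PhysicalActivity: hours per week
        rng.uniform(0, 5, n),      # AlcoholIntake: units per week
        rng.integers(0, 2, n),     # CancerHistory: 0 or 1
    ]).astype(float)
    # Invented label rule (an assumption, not the dataset's generator).
    score = (0.04 * (X[:, 0] - 50) + 0.08 * (X[:, 2] - 27) + 0.9 * X[:, 3]
             + 0.8 * X[:, 4] - 0.15 * X[:, 5] + 0.3 * X[:, 6]
             + 1.2 * X[:, 7] - 2.0)
    y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-score))).astype(int)
    return X, y

X, y = make_stand_in()

# Two candidate models compared with 5-fold cross-validated ROC AUC.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in models.items()
}
for name, auc in scores.items():
    print(f"{name}: mean ROC AUC = {auc:.3f}")

# Impurity-based importances from a forest fitted on all rows.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranking = sorted(zip(FEATURES, forest.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for feature, importance in ranking:
    print(f"{feature}: {importance:.3f}")
```

Impurity-based importances tend to favour continuous columns over binary ones, so a permutation-based check may be a useful complement when ranking risk factors.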

Coverage

The dataset consists of 1500 patient records. It includes demographic information, namely age (20 to 80) and gender (male and female), and focuses on medical and lifestyle factors associated with cancer risk. Because the data is synthetic, it has no specific geographic or time range coverage.

License

Attribution 4.0 International (CC BY 4.0)

Who Can Use It

This dataset is ideal for:
  • Data Scientists: For developing and refining predictive models.
  • Machine Learning Engineers: For algorithm benchmarking and model evaluation.
  • Researchers: For exploring relationships between lifestyle/medical factors and cancer risk.
  • Students: For educational projects in data science, machine learning, and health analytics.

Dataset Name Suggestions

  • Cancer Risk Prediction Dataset
  • Medical and Lifestyle Cancer Prediction Data
  • Synthetic Cancer Diagnosis Dataset
  • Patient Cancer Risk Factors Data

Attributes

Original Data Source: Patient Cancer Risk Factors Data

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 27/07/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
