Opendatabay APP

Lung Cancer Prediction Dataset

Patient Health Records & Digital Health

Tags and Keywords

Health

Cancer

Risk

Smoking

Symptoms

Free

About

This dataset records factors that may influence an individual's risk of lung cancer: demographic details, common lifestyle habits, and symptoms frequently associated with the disease. It is suited to exploring correlations, developing predictive models, and identifying potential risk factors for lung cancer.

Columns

  • GENDER: The gender of the individual, indicated as 'M' for male or 'F' for female.
  • AGE: The individual's age in years.
  • SMOKING: A binary indicator ('Yes'/'No') denoting whether the individual is a smoker.
  • YELLOW_FINGERS: A binary indicator ('Yes'/'No') showing if the individual has yellow fingers.
  • ANXIETY: A binary indicator ('Yes'/'No') for whether the individual experiences anxiety.
  • PEER_PRESSURE: A binary indicator ('Yes'/'No') for whether the individual is influenced by peer pressure.
  • CHRONIC_DISEASE: A binary indicator ('Yes'/'No') for the presence of any chronic disease.
  • FATIGUE: A binary indicator ('Yes'/'No') for whether the individual experiences fatigue.
  • ALLERGY: A binary indicator ('Yes'/'No') for whether the individual has allergies.
  • WHEEZING: A binary indicator ('Yes'/'No') for the presence of wheezing symptoms.
  • ALCOHOL_CONSUMING: A binary indicator ('Yes'/'No') for whether the individual consumes alcohol.
  • COUGHING: A binary indicator ('Yes'/'No') for the presence of coughing symptoms.
  • SHORTNESS_OF_BREATH: A binary indicator ('Yes'/'No') for whether the individual experiences shortness of breath.
  • SWALLOWING_DIFFICULTY: A binary indicator ('Yes'/'No') for whether the individual has difficulty swallowing.
  • CHEST_PAIN: A binary indicator ('Yes'/'No') for whether the individual experiences chest pain.
  • LUNG_CANCER: A binary indicator ('Yes'/'No') for whether the individual has been diagnosed with lung cancer.
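Most analyses need these columns in numeric form. The following is a minimal sketch of one way to encode a row, assuming the CSV header uses exactly the column names listed above and that the binary fields hold the literal strings 'Yes' and 'No' as described. The names `BINARY_COLUMNS` and `encode_row` are hypothetical, and the two sample rows are made up for illustration; they are not taken from the dataset.

```python
import csv
import io

# The 14 Yes/No columns, in the order given in the column list.
BINARY_COLUMNS = [
    "SMOKING", "YELLOW_FINGERS", "ANXIETY", "PEER_PRESSURE", "CHRONIC_DISEASE",
    "FATIGUE", "ALLERGY", "WHEEZING", "ALCOHOL_CONSUMING", "COUGHING",
    "SHORTNESS_OF_BREATH", "SWALLOWING_DIFFICULTY", "CHEST_PAIN", "LUNG_CANCER",
]


def encode_row(row):
    """Convert one CSV row (a dict of strings) into typed values.

    AGE becomes an int, each Yes/No column becomes 1/0, GENDER stays 'M'/'F'.
    """
    encoded = {"GENDER": row["GENDER"].strip().upper(), "AGE": int(row["AGE"])}
    for name in BINARY_COLUMNS:
        value = row[name].strip().lower()
        if value not in ("yes", "no"):
            # Fail loudly if the file does not match the documented encoding.
            raise ValueError(f"unexpected value {row[name]!r} in column {name}")
        encoded[name] = 1 if value == "yes" else 0
    return encoded


# Made-up rows for illustration only (assumption: header matches the list above).
header = ",".join(["GENDER", "AGE"] + BINARY_COLUMNS)
sample_csv = "\n".join([
    header,
    "M,61," + ",".join(["Yes"] * 14),
    "F,47," + ",".join(["No"] * 14),
])

encoded_rows = [encode_row(r) for r in csv.DictReader(io.StringIO(sample_csv))]
print(encoded_rows[0]["SMOKING"], encoded_rows[1]["LUNG_CANCER"])  # 1 0
```

Raising an error on unexpected values, rather than silently mapping them to 0, makes it obvious if the downloaded file uses a different encoding than the listing describes.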

Distribution

The dataset is provided in CSV format and has 16 columns and 3000 records. All fields are valid, with no missing or mismatched values reported for any column. The file size is approximately 165.27 kB.
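The figures above can be checked after download. This sketch counts records, columns, and empty cells using only the standard library; `summarize_csv` is a hypothetical helper, and the three-column sample is made up to keep the example short.

```python
import csv
import io


def summarize_csv(text):
    """Return (record_count, column_count, empty_cell_count) for CSV text."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    records = 0
    empty_cells = 0
    for row in reader:
        if len(row) != len(header):
            # A row with the wrong number of fields counts as mismatched data.
            raise ValueError(
                f"record {records + 1} has {len(row)} fields, expected {len(header)}"
            )
        records += 1
        empty_cells += sum(1 for cell in row if cell.strip() == "")
    return records, len(header), empty_cells


# Made-up sample with one deliberately empty cell.
sample = "GENDER,AGE,SMOKING\nM,61,Yes\nF,47,\n"
print(summarize_csv(sample))  # (2, 3, 1)
```

If the listing is accurate, running the same function on the contents of the downloaded file should give 3000 records, 16 columns, and 0 empty cells.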

Usage

This dataset is particularly valuable for researchers and data scientists. Ideal applications include:
  • Statistical analysis to uncover correlations between various factors and lung cancer development.
  • Building machine learning models to predict an individual's lung cancer risk based on their characteristics.
  • Identifying high-risk groups and informing the development of preventative measures.
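As a starting point for the statistical analysis mentioned in the list above, the sketch below builds a 2x2 table for one binary factor against LUNG_CANCER and computes an odds ratio. It assumes rows are dicts with 'Yes'/'No' strings, as produced by `csv.DictReader`. The function names are hypothetical, the rows are made up for illustration, and adding 0.5 to every cell (the Haldane-Anscombe correction) is a simplifying choice made here to avoid division by zero, not something the listing prescribes.

```python
def two_by_two(rows, factor, outcome="LUNG_CANCER"):
    """Count rows by (factor, outcome) and return (a, b, c, d).

    a: factor Yes, outcome Yes    b: factor Yes, outcome No
    c: factor No,  outcome Yes    d: factor No,  outcome No
    """
    counts = {
        ("Yes", "Yes"): 0, ("Yes", "No"): 0,
        ("No", "Yes"): 0, ("No", "No"): 0,
    }
    for row in rows:
        counts[(row[factor], row[outcome])] += 1
    return (
        counts[("Yes", "Yes")], counts[("Yes", "No")],
        counts[("No", "Yes")], counts[("No", "No")],
    )


def odds_ratio(a, b, c, d):
    """Odds ratio with 0.5 added to each cell (Haldane-Anscombe correction)."""
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


# Made-up rows for illustration only; not taken from the dataset.
sample_rows = (
    [{"SMOKING": "Yes", "LUNG_CANCER": "Yes"}] * 3
    + [{"SMOKING": "Yes", "LUNG_CANCER": "No"}] * 1
    + [{"SMOKING": "No", "LUNG_CANCER": "Yes"}] * 1
    + [{"SMOKING": "No", "LUNG_CANCER": "No"}] * 3
)

table = two_by_two(sample_rows, "SMOKING")
print(table)  # (3, 1, 1, 3)
print(round(odds_ratio(*table), 2))  # 5.44
```

An odds ratio above 1 indicates that the outcome is more common among rows where the factor is 'Yes'. It describes association in the sample and does not by itself establish causation.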

Coverage

Demographic coverage includes gender (evenly split between male and female) and age, which ranges from approximately 30 to 80 years with a mean of 55.2 years. No geographic region or time range is specified. All 3000 records are complete and valid across all included factors.
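The coverage figures can be reproduced from the file with a short summary. This is a sketch assuming rows are dicts keyed by the GENDER and AGE column names; `coverage_summary` is a hypothetical helper and the four rows are made up for illustration.

```python
from collections import Counter
from statistics import mean


def coverage_summary(rows):
    """Return the age range, mean age, and gender counts for a list of rows."""
    ages = [int(row["AGE"]) for row in rows]
    genders = Counter(row["GENDER"] for row in rows)
    return {
        "age_min": min(ages),
        "age_max": max(ages),
        "age_mean": mean(ages),
        "gender_counts": dict(genders),
    }


# Made-up rows for illustration only; not taken from the dataset.
sample_rows = [
    {"GENDER": "M", "AGE": "30"},
    {"GENDER": "F", "AGE": "80"},
    {"GENDER": "M", "AGE": "55"},
    {"GENDER": "F", "AGE": "55"},
]

summary = coverage_summary(sample_rows)
print(summary["age_min"], summary["age_max"], summary["gender_counts"])
```

On the full file, the result should show ages between roughly 30 and 80, a mean near 55.2, and similar counts for 'M' and 'F' if the listing is accurate.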

License

CC0: Public Domain

Who Can Use It

This dataset is intended for:
  • Public health researchers investigating disease epidemiology and risk factors.
  • Medical professionals seeking to understand patient profiles associated with lung cancer.
  • Data scientists and machine learning engineers developing predictive health analytics.
  • Academics studying health conditions and chronic diseases.

Dataset Name Suggestions

  • Lung Cancer Risk Factors Data
  • Lung Cancer Prediction Dataset
  • Healthcare Lung Cancer Indicators
  • Pulmonary Cancer Risk Assessment Data

Attributes

Original Data Source: Lung Cancer Prediction Dataset

Listing Stats

VIEWS

2

DOWNLOADS

0

LISTED

22/08/2025

REGION

GLOBAL

UNIVERSAL DATA QUALITY SCORE (UDQS)

5 / 5

VERSION

1.0
