Brazilian National Cancer Patient Register

Patient Health Records & Digital Health

Tags and Keywords

Cancer

Brazil

Medicine

Epidemiology

RCBP

Free

About

The dataset provides patient-level records from Brazil's population-based cancer registries (RCBP, Registros de Câncer de Base Populacional) covering two decades, from 2000 to 2019. Collected by the National Cancer Institute (INCA), it supports epidemiological research and public health analysis of cancer incidence, survival, and treatment patterns in Brazil. It includes core clinical and demographic variables coded with international classification systems: CID-10 (the Portuguese abbreviation for ICD-10, the International Classification of Diseases) and CID-O3 (ICD-O-3, the International Classification of Diseases for Oncology, which codes topography and morphology). Population estimates for nearly all Brazilian cities, sourced from IBGE, are included to support rate calculations.

Columns

The dataset contains 38 columns, including:
  • Patient.Code: An identification code for the patient.
  • RCBP.Name: The name of the specific cancer centre where data was collected (e.g., RCBP SÃO PAULO, RCBP BELO HORIZONTE).
  • Gender: Classified as Feminino (female) or Masculino (male).
  • Date.of.Birth: The patient's date of birth (note: this field has a high percentage of missing values, 43%).
  • Age: Patient age in years, with a mean of 60.
  • Raca.Color: Race or colour classification (branco=white, preto=black, pardo=brown, amarela=yellow, indigena=indigenous).
  • Nationality: The patient's nation of origin.
  • Naturality.State / Naturality: The Brazilian state and city where the patient was born.
  • Degree.of.Education: Education level, such as Fundamental (elementary school), Médio (High school), or Sem escolaridade (No formal education).
  • State.Civil: Marital status (e.g., casado=married, solteiro=single).
  • Code.Profession / Name.Occupation: Numeric code and name related to the patient’s profession, based on IBGE classification.
  • Status.Address / City.Address: The current state and city where the patient resides.
  • Description.of.Topography / Topography.Code: Description and code of the tumour site, based on the WHO ICD-O-3 (CID-O3) topography axis (e.g., PROSTATA, SOE with code C619; SOE is the Portuguese equivalent of NOS, not otherwise specified).
  • Morphology.Description / Code.of.Morphology: Description and code of the tumour morphology, based on the WHO ICD-O-3 (CID-O3) morphology axis (e.g., ADENOCARCINOMA, SOE).
  • Description.of.Disease / Illness.Code: Anatomical description and ICD-10 (CID-10) code for the illness (e.g., PROSTATA with code C61).
  • Child.Illness.Description / Child.Illness.Code: Classification and code specific to children's illnesses.
  • Youth.Adult.Illness.Description / Code.of.Disease.Adult.Young.: Classification and code specific to youth and adult diseases.
  • Indicator.of.Rare.Case: A boolean indicating if the case is considered rare.
  • Diagnostic.means: How the diagnosis was confirmed (e.g., HISTOLOGIA DO TUMOR PRIMÁRIO, histology of the primary tumour; or SDO, death certificate only).
  • Extension: Indicates how far the cancer has spread (Localizado=localized; Mestatases=Metastasis).
  • Laterality: Specifies the side (Direita=right; esquerda=left; não se aplica=no laterality).
  • Statement / TNM: Staging information, describing tumour size (T), nodal involvement (N), and metastasis (M). Note: TNM staging is missing for 97% of records.
  • Status.Vital: Vital status, including Morto (dead).
  • Type.of.Death / Date.of.Death: Cause and date of death.
  • Date.of.Last.Contact: Date when the patient was lost to follow-up or when treatment concluded.
  • Date.of.Diagnostic: The date the cancer was diagnosed, ranging from 2000 to 2019.
  • Distant.metastasis: Site code of distant metastasis.
  • year: The year of diagnosis.
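
Several categorical columns hold Portuguese labels. The sketch below shows one way to translate them, using only the translations given in the column list above. The helper names (`translate`, `translate_row`) are hypothetical, and the exact spelling and letter case of the values in the CSV are assumptions, so unknown values are passed through unchanged.

```python
# Translations taken from the column descriptions above. The exact spelling
# and letter case used in the CSV are assumptions, so lookups ignore case.
GENDER = {"feminino": "female", "masculino": "male"}
RACE = {"branco": "white", "preto": "black", "pardo": "brown",
        "amarela": "yellow", "indigena": "indigenous"}
MARITAL = {"casado": "married", "solteiro": "single"}


def translate(value, mapping):
    """Return the English label for a Portuguese value, or None if missing.

    Values that are not in the mapping are returned unchanged (stripped),
    so no information is silently lost.
    """
    if value is None or value.strip() == "":
        return None
    cleaned = value.strip()
    return mapping.get(cleaned.lower(), cleaned)


def translate_row(row):
    """Return a copy of a row (dict) with three categorical columns translated."""
    out = dict(row)
    out["Gender"] = translate(row.get("Gender"), GENDER)
    out["Raca.Color"] = translate(row.get("Raca.Color"), RACE)
    out["State.Civil"] = translate(row.get("State.Civil"), MARITAL)
    return out


example = {"Gender": "Feminino", "Raca.Color": "PARDO", "State.Civil": ""}
print(translate_row(example))
```

Only three columns are covered because the listing gives only partial value lists for the others.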

Distribution

The data is delivered as a single CSV file (cancer_data_eng.csv) of approximately 625.99 MB, containing between roughly 1.7 and 1.78 million records. RCBP centres collected the data systematically, and new centres were added over the period covered. Collection volume appears to drop in the later part of the time series, from 2013 through 2019.
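
Because the file is large, it can be read one row at a time rather than loaded whole. The sketch below counts records per diagnosis year, which is one way to inspect the apparent drop in volume after 2013. The function name `records_per_year` is hypothetical, and the in-memory sample stands in for the real file with made-up values.

```python
import csv
import io
from collections import Counter


def records_per_year(lines, year_column="year"):
    """Count rows per diagnosis year, reading one row at a time."""
    counts = Counter()
    for row in csv.DictReader(lines):
        year = (row.get(year_column) or "").strip()
        if year:  # rows with an empty year are skipped
            counts[year] += 1
    return dict(sorted(counts.items()))


# Tiny in-memory sample standing in for the real file (values are made up).
sample = io.StringIO("Patient.Code,year\n1,2000\n2,2000\n3,2013\n4,\n")
print(records_per_year(sample))
```

For the real file, the same function could be called on a handle opened with `open("cancer_data_eng.csv", newline="", encoding="utf-8")`; the encoding is an assumption and may need adjusting.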

Usage

This resource is suited for answering important public health and clinical questions, such as:
  • Determining which cancers have the highest incidence within specific populations or sub-populations.
  • Analysing trends in survival rates over the 20-year period.
  • Investigating patient migration patterns, specifically whether people seek treatment within their home state or travel to other states.
  • Benchmarking cancer centres to determine if there are significant differences in patient outcomes based on the diagnosis location.
  • Conducting data quality assessment by quantifying the proportion of missing or unspecific values across key variables.
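
The data quality question in the last item can be sketched as a per-column missing-value share. The function name `missing_share` and the set of tokens treated as missing are assumptions, since the listing does not say how missing values are encoded in the CSV. The sample rows are made up.

```python
import csv
import io

# Assumption: the encoding of missing values is not documented in the listing,
# so a few common spellings are treated as missing (compared in lower case).
MISSING_TOKENS = {"", "na", "nan", "null"}


def missing_share(lines, columns):
    """Return {column: fraction of rows whose value is missing}."""
    total = 0
    missing = {c: 0 for c in columns}
    for row in csv.DictReader(lines):
        total += 1
        for c in columns:
            value = (row.get(c) or "").strip().lower()
            if value in MISSING_TOKENS:
                missing[c] += 1
    if total == 0:
        return {c: 0.0 for c in columns}
    return {c: missing[c] / total for c in columns}


# Made-up rows: 2 of 4 birth dates and 3 of 4 TNM values are missing.
sample = io.StringIO(
    "Patient.Code,Date.of.Birth,TNM\n"
    "1,1950-01-01,\n"
    "2,,NA\n"
    "3,1960-05-20,\n"
    "4,NA,T1N0M0\n"
)
print(missing_share(sample, ["Date.of.Birth", "TNM"]))
```

Unspecific values such as SOE could be counted the same way by adding them to a separate token set.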

Coverage

  • Geographic Coverage: Brazil, with detailed demographic, residence, and population data available for almost all cities.
  • Time Range: 2000–2019, with diagnosis dates running from January 2000 to December 2019.
  • Specific Notes: Data collection was phased, with RCBP centres being integrated throughout the period. Patients sometimes receive treatment outside their state of residence, which may affect rate and ratio analyses.
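
As a rough illustration of how the population estimates support rate calculations, the sketch below computes a crude incidence rate per 100,000 residents by city of residence and year. The function name `crude_rates` and the `{(city, year): residents}` layout are hypothetical, since the listing does not describe how the IBGE estimates are stored, and all numbers are made up. Rates for years before a registry was integrated, or for cities whose patients are often treated elsewhere, should be read with the caveats above in mind.

```python
from collections import Counter


def crude_rates(case_rows, population, per=100_000):
    """Crude incidence rate per `per` residents for each (city, year).

    case_rows: iterable of dicts with "City.Address" and "year" keys,
        as named in the column list.
    population: {(city, year): residents}. This layout is hypothetical.
    """
    cases = Counter((r["City.Address"], r["year"]) for r in case_rows)
    rates = {}
    for key, n in cases.items():
        residents = population.get(key)
        if residents:  # skip keys with no (or zero) population estimate
            rates[key] = n * per / residents
    return rates


# Made-up cases and population figures, for illustration only.
rows = [
    {"City.Address": "CITY A", "year": "2010"},
    {"City.Address": "CITY A", "year": "2010"},
    {"City.Address": "CITY B", "year": "2010"},
    {"City.Address": "CITY C", "year": "2010"},  # no population estimate
]
population = {("CITY A", "2010"): 400_000, ("CITY B", "2010"): 200_000}
print(crude_rates(rows, population))
```

Crude rates ignore age structure; comparing cities or periods would normally call for age-standardised rates, which this sketch does not attempt.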

License

CC0: Public Domain

Who Can Use It

  • Epidemiologists: To calculate incidence and mortality rates across different age, race, and geographic segments.
  • Health Policy Makers: To evaluate the effectiveness of regional cancer treatment programmes and resource distribution.
  • Data Scientists/Statisticians: To perform survival analysis and predictive modelling of cancer outcomes.
  • Clinical Researchers: To study the prevalence of specific topography and morphology codes over time.
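
For the survival analysis use case, each record first has to be reduced to an observed time and an event indicator. The sketch below derives both from the date columns listed earlier. The function name `follow_up` is hypothetical, and the ISO date format (YYYY-MM-DD) is an assumption; the real file may use a different format.

```python
from datetime import date


def follow_up(row):
    """Return (days_observed, event) for one patient row, or None.

    event is 1 if a death date is present, else 0 (censored at last contact).
    Assumption: dates are ISO strings (YYYY-MM-DD); the real format may differ.
    """
    start = (row.get("Date.of.Diagnostic") or "").strip()
    death = (row.get("Date.of.Death") or "").strip()
    last = (row.get("Date.of.Last.Contact") or "").strip()
    end, event = (death, 1) if death else (last, 0)
    if not start or not end:
        return None  # cannot compute a duration without both dates
    try:
        days = (date.fromisoformat(end) - date.fromisoformat(start)).days
    except ValueError:
        return None  # unparseable date
    if days < 0:
        return None  # end precedes diagnosis: treat as a data error
    return days, event


# Made-up example rows.
print(follow_up({"Date.of.Diagnostic": "2010-01-01", "Date.of.Death": "2010-01-31"}))
print(follow_up({"Date.of.Diagnostic": "2010-01-01", "Date.of.Last.Contact": "2011-01-01"}))
```

The resulting pairs are the usual input for Kaplan-Meier or Cox models in a survival library. Records that return None should be counted rather than silently dropped, since their share is itself a data quality indicator.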

Dataset Name Suggestions

  • Brazilian National Cancer Patient Register (2000-2019)
  • INCA RCBP Patient Data: Brazil Cancer Epidemiology
  • Brazil Oncology Data 2000-2019

Attributes

Listing Stats

  • Views: 10
  • Downloads: 1
  • Listed: 02/11/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
