
Medical Diagnosis Prediction Dataset

Public Safety & Security

Tags and Keywords

Earth and Nature

Health Conditions

Text

NLP

Russian

Medical Diagnosis Prediction Dataset on the Opendatabay data marketplace

Free

About

This dataset is designed for preliminary diagnosis prediction, supporting patient flow logistics and the provision of a second opinion during patient interactions with dialogue systems. It is part of a project initiated at ITMO University in 2022. The dataset maps symptoms to diseases and is intended as training data for AI and LLM-based diagnostic tools. It has two columns, a list of symptoms and the corresponding diagnosis, covering 132 unique symptoms and 40 unique diagnoses.

Columns

  • симптомы (symptoms): the patient's symptoms, usually given as a list.
  • диагноз (diagnosis): the disease name associated with the listed symptoms.

Distribution

The dataset is typically provided in CSV format with two columns: symptoms and disease names. The total number of rows is not specified; the data covers 132 unique symptoms and 40 unique diagnoses. This is version 1.0 of the dataset.
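A minimal loading sketch, assuming a UTF-8 CSV whose header uses the column names above and whose symptom cells are either a list literal such as `['a', 'b']` or a comma-separated string. The listing does not document the file name or how the list is encoded, so `SAMPLE_CSV`, `parse_symptoms` and `load_rows` are hypothetical, and the sample rows are placeholders rather than real records. Recomputing the unique counts is a quick way to check the 132/40 figures against the actual file.

```python
import ast
import csv
import io

# Hypothetical placeholder rows in the assumed two-column layout (not real data).
SAMPLE_CSV = """симптомы,диагноз
"['symptom_a', 'symptom_b']",disease_x
"['symptom_b', 'symptom_c']",disease_y
"symptom_a, symptom_c",disease_x
"""


def parse_symptoms(raw: str) -> list[str]:
    """Parse a symptom cell that is either a list literal or a comma-separated string (assumption)."""
    raw = raw.strip()
    if raw.startswith("["):
        # Safely evaluate a Python-style list literal such as "['a', 'b']".
        return [str(s).strip() for s in ast.literal_eval(raw)]
    # Fallback: plain comma-separated text; empty fragments are dropped.
    return [s.strip() for s in raw.split(",") if s.strip()]


def load_rows(text: str) -> list[tuple[list[str], str]]:
    """Return (symptom list, diagnosis) pairs from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return [(parse_symptoms(row["симптомы"]), row["диагноз"].strip()) for row in reader]


rows = load_rows(SAMPLE_CSV)
unique_symptoms = {s for symptoms, _ in rows for s in symptoms}
unique_diagnoses = {d for _, d in rows}
print(len(rows), len(unique_symptoms), len(unique_diagnoses))  # 3 3 2
```

For the real file, pass `open(path, encoding="utf-8", newline="").read()` to `load_rows`, where `path` is wherever the download was saved.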

Usage

This dataset is ideally suited for:
  • Developing and training preliminary diagnosis prediction models.
  • Enhancing patient flow logistics in healthcare settings.
  • Supporting second opinion concepts through automated systems.
  • Building and refining dialogue systems for patient interactions.
  • Training AI and machine learning models for symptom-disease mapping.
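As an illustration of symptom-disease mapping, here is a small baseline sketch: a Bernoulli naive Bayes classifier over multi-hot symptom vectors, written with the standard library only. This is a technique chosen for the example, not the method used by the ITMO project, and the training rows and symptom names are hypothetical placeholders. Returning a ranked list rather than a single label fits the second-opinion use case, since lower-ranked diagnoses can be shown as alternatives.

```python
import math
from collections import Counter, defaultdict

# Hypothetical (symptom list, diagnosis) rows; placeholders, not dataset records.
TRAIN = [
    (["fever", "cough"], "disease_x"),
    (["fever", "cough", "fatigue"], "disease_x"),
    (["rash", "itching"], "disease_y"),
    (["rash", "fatigue"], "disease_y"),
]


def train_bernoulli_nb(rows):
    """Fit a Bernoulli naive Bayes model over multi-hot symptom vectors."""
    vocab = sorted({s for symptoms, _ in rows for s in symptoms})
    class_counts = Counter(d for _, d in rows)
    # diagnosis -> symptom -> number of rows with that diagnosis listing the symptom
    present_counts = defaultdict(Counter)
    for symptoms, diagnosis in rows:
        for s in set(symptoms):
            present_counts[diagnosis][s] += 1
    n_rows = len(rows)
    log_prior = {d: math.log(c / n_rows) for d, c in class_counts.items()}
    # Laplace-smoothed P(symptom present | diagnosis), so no probability is 0 or 1
    p_present = {
        d: {s: (present_counts[d][s] + 1) / (class_counts[d] + 2) for s in vocab}
        for d in class_counts
    }
    return vocab, log_prior, p_present


def rank_diagnoses(model, symptoms):
    """Return diagnoses sorted from most to least likely for the given symptoms."""
    vocab, log_prior, p_present = model
    observed = set(symptoms)  # symptoms outside the training vocabulary are ignored
    scores = {}
    for d, score in log_prior.items():
        for s in vocab:
            p = p_present[d][s]
            # Bernoulli likelihood: presence and absence of each symptom both count.
            score += math.log(p) if s in observed else math.log(1.0 - p)
        scores[d] = score
    return sorted(scores, key=scores.get, reverse=True)


model = train_bernoulli_nb(TRAIN)
print(rank_diagnoses(model, ["fever", "cough"]))  # ['disease_x', 'disease_y']
```

With 40 diagnoses and 132 symptoms the same code applies unchanged; only the rows fed to `train_bernoulli_nb` differ.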

Coverage

The listing marks the dataset's region as global, although its column names and, per the tags, its text are in Russian. The project behind it was initiated in 2022, so the data presumably reflects current medical terminology. The dataset was listed on 26/06/2025.

License

CC-BY-NC

Who Can Use It

  • AI/LLM developers: For training and fine-tuning models in medical diagnostics and conversational AI.
  • Medical researchers: To analyse symptom-disease correlations and develop predictive tools.
  • Healthcare technology developers: For creating applications that assist with patient intake, preliminary diagnoses, and medical information systems.
  • Academic institutions: For educational and research purposes in health informatics and AI in medicine.
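For patient-intake and dialogue applications, a first step is often spotting known symptoms in a free-text message before passing them to a predictor. The sketch below does exact phrase matching against a symptom vocabulary; the Russian vocabulary entries and the helper names are hypothetical, and in practice the vocabulary would be built from the симптомы column. It only matches surface forms as written (for example, the inflected "кашля" will not match "кашель"), so a real system would likely need lemmatization on top.

```python
import re

# Hypothetical symptom vocabulary; a real one would come from the симптомы column.
SYMPTOM_VOCAB = ["кашель", "температура", "головная боль", "сыпь"]


def normalize(text: str) -> str:
    # Lowercase and fold "ё" into "е" so both spellings match.
    return text.lower().replace("ё", "е")


def build_matcher(vocab):
    # Longest phrases first so multi-word symptoms win over their sub-parts.
    phrases = sorted({normalize(v) for v in vocab}, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b"
    return re.compile(pattern)


def extract_symptoms(message: str, matcher) -> list[str]:
    """Return vocabulary items found verbatim in the message, in order, without duplicates."""
    found = []
    for m in matcher.finditer(normalize(message)):
        if m.group(0) not in found:
            found.append(m.group(0))
    return found


matcher = build_matcher(SYMPTOM_VOCAB)
print(extract_symptoms("У меня сильная головная боль и кашель, кашель не проходит.", matcher))
# ['головная боль', 'кашель']
```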

Dataset Name Suggestions

  • Patient Symptom-Disease Mapping Data
  • Medical Diagnosis Prediction Dataset
  • Healthcare Dialogue System Training Data
  • Symptom-Disease NLP Data
  • Clinical Symptom-Diagnosis Dataset

Attributes

Original Data Source: Patient Disease Dataset

Listing Stats

Views: 1
Downloads: 4
Listed: 26/06/2025
Region: Global
Universal Data Quality Score (UDQS): 5 / 5
Version: 1.0
