Medical Diagnosis Prediction Dataset
Public Safety & Security
Free
About
This dataset is designed for preliminary diagnosis prediction. It is intended to support patient flow logistics and automated second opinions when patients interact with dialogue systems. It is part of a project initiated at ITMO University in 2022. The dataset maps symptoms to diseases, making it a resource for developing AI and LLM-based diagnostic tools. It has two columns, one listing symptoms and one giving the corresponding diagnosis, and covers 132 unique symptoms and 40 unique diagnoses.
Columns
- симптомы (symptoms as list): The patient's symptoms, usually given as a list.
- диагноз (disease name): The disease name or diagnosis associated with the listed symptoms.
Distribution
The dataset is typically provided in CSV format with two columns: symptoms and disease names. The total number of rows is not specified; the data covers 132 unique symptoms and 40 unique diagnoses. This is Version 1.0 of the dataset.
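As a minimal sketch of reading the two columns, the Python below parses a small in-memory CSV and counts unique symptoms and diagnoses. The sample rows are invented for illustration, and the cell format of the симптомы column is an assumption: the listing only says symptoms are "often provided as a list", so the hypothetical helper `parse_symptoms` accepts either a Python-style list literal or a comma-separated string.

```python
import ast
import csv
import io

# Toy rows invented for illustration; the real file's cell format may differ.
SAMPLE_CSV = '''симптомы,диагноз
"['itching', 'skin_rash']",Fungal infection
"['cough', 'high_fever']",Common Cold
"cough, headache",Common Cold
'''


def parse_symptoms(cell):
    """Turn one symptoms cell into a list of stripped symptom strings."""
    cell = cell.strip()
    if cell.startswith("["):
        # Assumed case: the list is stored as a Python-style literal.
        return [str(s).strip() for s in ast.literal_eval(cell)]
    # Fallback assumption: a plain comma-separated string.
    return [s.strip() for s in cell.split(",") if s.strip()]


def load_rows(handle):
    """Read (symptoms, diagnosis) pairs from an open CSV file handle."""
    reader = csv.DictReader(handle)
    return [(parse_symptoms(r["симптомы"]), r["диагноз"].strip()) for r in reader]


rows = load_rows(io.StringIO(SAMPLE_CSV))
unique_symptoms = {s for symptoms, _ in rows for s in symptoms}
unique_diagnoses = {d for _, d in rows}
print(len(unique_symptoms), len(unique_diagnoses))
```

For a file on disk, `load_rows(open(path, encoding="utf-8", newline=""))` would be the equivalent call, where `path` is a placeholder and the UTF-8 encoding is an assumption about how the file was exported. On the full dataset the two counts can be checked against the stated 132 symptoms and 40 diagnoses.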
Usage
This dataset is ideally suited for:
- Developing and training preliminary diagnosis prediction models.
- Enhancing patient flow logistics in healthcare settings.
- Supporting second opinion concepts through automated systems.
- Building and refining dialogue systems for patient interactions.
- Training AI and machine learning models for symptom-disease mapping.
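To make the symptom-disease mapping use case concrete, here is a minimal baseline sketch in Python. It is not the method used by the ITMO project: it is a plain nearest-neighbour lookup using Jaccard overlap between symptom sets, and the function names and training rows are hypothetical, invented only for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two symptom sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict(query_symptoms, training_rows):
    """Return the diagnosis of the training row most similar to the query."""
    query = set(query_symptoms)
    _, best_diagnosis = max(
        training_rows, key=lambda row: jaccard(query, set(row[0]))
    )
    return best_diagnosis


# Toy (symptoms, diagnosis) pairs invented for illustration.
TRAINING_ROWS = [
    (["itching", "skin_rash"], "Fungal infection"),
    (["cough", "high_fever", "headache"], "Common Cold"),
]

print(predict(["cough", "headache"], TRAINING_ROWS))
```

A baseline like this is mainly useful as a sanity check before training a learned model, since it needs no fitting and its predictions are easy to inspect.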
Coverage
The dataset's scope is global, so it is not tied to a particular region. The project behind it has been active since 2022, which suggests the data reflects contemporary medical terminology and contexts. The dataset was listed on 26/06/2025.
License
CC-BY-NC
Who Can Use It
- AI/LLM developers: For training and fine-tuning models in medical diagnostics and conversational AI.
- Medical researchers: To analyse symptom-disease correlations and develop predictive tools.
- Healthcare technology developers: For creating applications that assist with patient intake, preliminary diagnoses, and medical information systems.
- Academic institutions: For educational and research purposes in health informatics and AI in medicine.
Dataset Name Suggestions
- Patient Symptom-Disease Mapping Data
- Medical Diagnosis Prediction Dataset
- Healthcare Dialogue System Training Data
- Symptom-Disease NLP Data
- Clinical Symptom-Diagnosis Dataset
Attributes
Original Data Source: Patient Disease Dataset