Healthcare Analytics Training Dataset
Healthcare Insurance & Costs
Free
About
This synthetic healthcare dataset serves as a valuable resource for data science, machine learning, and data analysis enthusiasts. It is designed to mimic real-world healthcare data, enabling users to practise, develop, and showcase their data manipulation and analysis skills within the healthcare industry. The inspiration behind this dataset stems from the need for practical and diverse healthcare data for educational and research purposes, addressing the challenge of accessing sensitive real-world healthcare information. Generated using Python's Faker library, it mirrors the structure and attributes commonly found in healthcare records, aiming to foster innovation, learning, and knowledge sharing in healthcare analytics.
Columns
- Name: Represents the name of the patient associated with the healthcare record.
- Age: The age of the patient at the time of admission, expressed in years.
- Gender: Indicates the gender of the patient, either "Male" or "Female."
- Blood Type: The patient's blood type, such as "A+" or "O-."
- Medical Condition: Specifies the primary medical condition or diagnosis, for example, "Diabetes," "Hypertension," or "Asthma."
- Date of Admission: The date on which the patient was admitted to the healthcare facility.
- Doctor: The name of the doctor responsible for the patient's care during their admission.
- Hospital: Identifies the healthcare facility or hospital where the patient was admitted.
- Insurance Provider: Indicates the patient's insurance provider, such as "Aetna," "Blue Cross," "Cigna," "UnitedHealthcare," or "Medicare."
- Billing Amount: The monetary amount billed for the patient's healthcare services during their admission, expressed as a floating-point number.
- Room Number: The room number where the patient was accommodated.
- Admission Type: Specifies the type of admission, which can be "Emergency," "Elective," or "Urgent."
- Discharge Date: The date on which the patient was discharged, generated as the admission date plus a realistic number of days.
- Medication: Identifies a medication prescribed or administered to the patient, including examples like "Aspirin," "Ibuprofen," "Penicillin," "Paracetamol," and "Lipitor."
- Test Results: Describes the results of a medical test conducted during admission, with possible values being "Normal," "Abnormal," or "Inconclusive."
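As a minimal sketch of how one row with the columns above might be parsed into typed values, the following assumes that the CSV header matches the column names listed here and that dates are stored as ISO `YYYY-MM-DD` strings (the listing does not state the date format). The `AdmissionRecord` class, the `parse_row` helper, and the sample row are hypothetical, invented only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AdmissionRecord:
    """Typed view of a subset of the listed columns (hypothetical helper)."""
    name: str
    age: int
    medical_condition: str
    admitted: date
    discharged: date
    billing_amount: float
    test_results: str

    @property
    def length_of_stay(self) -> int:
        """Number of days between admission and discharge."""
        return (self.discharged - self.admitted).days


def parse_row(row: dict) -> AdmissionRecord:
    """Convert one CSV row (all strings) into typed fields.

    Assumes ISO-formatted dates; adjust the parsing if the file differs.
    """
    return AdmissionRecord(
        name=row["Name"],
        age=int(row["Age"]),
        medical_condition=row["Medical Condition"],
        admitted=date.fromisoformat(row["Date of Admission"]),
        discharged=date.fromisoformat(row["Discharge Date"]),
        billing_amount=float(row["Billing Amount"]),
        test_results=row["Test Results"],
    )


# Invented sample row, not taken from the dataset.
sample = {
    "Name": "Jane Doe",
    "Age": "42",
    "Medical Condition": "Asthma",
    "Date of Admission": "2021-03-01",
    "Discharge Date": "2021-03-08",
    "Billing Amount": "1234.56",
    "Test Results": "Normal",
}
record = parse_row(sample)
print(record.length_of_stay)  # 7
```

A derived field such as length of stay is a common first feature to compute, since it combines the two date columns into a single numeric value.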
Distribution
This dataset is typically provided as a CSV file. Each column describes the patient, their admission, or the healthcare services received. The number of rows is not specified; the dataset is synthetic and intended for data analysis and modelling tasks in the healthcare domain.
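Because the row count is not stated, a quick check after download can confirm both the size and the header. The sketch below uses only the standard library and assumes the header uses the column names from this listing exactly; `summarize_csv` and the file name in the comment are hypothetical.

```python
import csv
import io

# Column names as given in this listing (assumed to match the CSV header).
EXPECTED_COLUMNS = [
    "Name", "Age", "Gender", "Blood Type", "Medical Condition",
    "Date of Admission", "Doctor", "Hospital", "Insurance Provider",
    "Billing Amount", "Room Number", "Admission Type", "Discharge Date",
    "Medication", "Test Results",
]


def summarize_csv(handle):
    """Return (row_count, missing_columns) for an open CSV file handle."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    row_count = sum(1 for _ in reader)
    return row_count, missing


# In-memory stand-in for the real file. With the downloaded CSV one would
# pass open("healthcare_dataset.csv", newline="") instead (hypothetical name).
demo = io.StringIO("Name,Age,Gender\nJane Doe,42,Female\nJohn Roe,57,Male\n")
rows, missing = summarize_csv(demo)
print(rows)          # 2
print(len(missing))  # 12
```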
Usage
This dataset is ideal for a wide range of applications, including:
- Developing and testing healthcare predictive models.
- Practising data cleaning, transformation, and analysis techniques.
- Creating data visualisations to gain insights into healthcare trends.
- Learning and teaching data science and machine learning concepts in a healthcare context.
- Framing a multi-class classification problem: predicting 'Test Results', which has three categories (Normal, Abnormal, and Inconclusive).
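For the classification use of 'Test Results', a useful first step is a majority-class baseline: always predict the most frequent label and measure accuracy. The sketch below is not a method prescribed by the dataset's author; the helper names and the label lists are invented for illustration.

```python
from collections import Counter

LABELS = ("Normal", "Abnormal", "Inconclusive")


def majority_baseline(train_labels):
    """Return the most frequent label among the training labels."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(predictions, truth):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == t for p, t in zip(predictions, truth))
    return correct / len(truth)


# Invented labels, not taken from the dataset.
train = ["Normal", "Abnormal", "Normal", "Inconclusive", "Normal"]
test = ["Normal", "Abnormal", "Inconclusive", "Normal"]

guess = majority_baseline(train)
preds = [guess] * len(test)
print(guess, accuracy(preds, test))  # Normal 0.5
```

Any trained model can then be compared against this number. Because the data are synthetic, it is worth checking whether a model actually improves on the baseline before drawing conclusions from it.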
Coverage
Geographic coverage is global. The time range for admissions and discharges, as indicated by the 'Date of Admission' and 'Discharge Date' columns, spans several years, with examples observed from 2019 to 2024. Demographic scope is covered by patient 'Name', 'Age', 'Gender', and 'Blood Type' information. As this is a synthetic dataset, it does not contain real patient information and is created to mirror common healthcare record structures.
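The stated time range can be checked against a downloaded copy with a few lines. This sketch assumes ISO `YYYY-MM-DD` date strings, which the listing does not confirm; `admission_year_range` and the example dates are hypothetical.

```python
from datetime import date


def admission_year_range(date_strings):
    """Return (earliest_year, latest_year) for a list of ISO date strings."""
    years = [date.fromisoformat(s).year for s in date_strings]
    return min(years), max(years)


# Invented example values for the 'Date of Admission' column.
dates = ["2019-05-14", "2022-11-02", "2024-01-30"]
print(admission_year_range(dates))  # (2019, 2024)
```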
License
CC0
Who Can Use It
This dataset is intended for data science, machine learning, and data analysis enthusiasts. It is particularly useful for those looking to engage in learning and experimentation within the healthcare analytics domain. The dataset encourages exploration, analysis, and sharing of findings within communities like Kaggle.
Dataset Name Suggestions
- Healthcare Dataset
- Healthcare Insurance & Costs Data
- Synthetic Patient Records
- Medical Admissions Data for Analytics
- Healthcare Analytics Training Dataset
Attributes
Original Data Source: Healthcare Dataset