PCOS Risk Indicators Survey Data
Data Science and Analytics
Free
About
This dataset captures the key variables gathered to support the development of a Polycystic Ovary Syndrome (PCOS) risk prediction system using machine learning models. The responses were collected via a Google Form by a team of final-year engineering students for their research project. PCOS is recognised as one of the most prevalent endocrine disorders globally, affecting an estimated 15% to 20% of reproductive-age women, and it is a major contributor to fertility challenges that also affects overall well-being. The dataset includes responses on common symptoms, lifestyle factors, and diagnostic criteria associated with the condition.
Columns
The dataset contains 16 distinct columns, tracking key demographic, symptomatic, and lifestyle factors:
- Age (in Years): The respondent's age. (Mean age is 25.5)
- Weight (in Kg): The respondent's weight. (Mean weight is 59.3 kg)
- Height (in Cm / Feet): The respondent's height, recorded in centimetres or feet.
- Blood Group: The respondent’s reported blood group.
- Period Frequency: Indicates how often periods occur (1 signifies regular/monthly).
- Recent Weight Gain: Binary indicator (0 or 1) of recent weight increase.
- Excessive body/facial hair growth: Binary indicator of hirsutism symptoms.
- Skin Darkening: Binary indicator of acanthosis nigricans symptoms.
- Hair Loss: Binary indicator of hair thinning or baldness.
- Acne/ Pimples: Binary indicator of acne presence on the face or jawline.
- Junk Food: Binary indicator of regular fast food consumption.
- Excercise: Binary indicator of exercising on a regular basis.
- PCOS: The final diagnosis status (0 or 1).
- Mood Swings: Binary indicator of experiencing mood swings.
- Regularity: Binary indicator of whether periods are regular.
- Period Length: The duration of periods, measured in days.
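Two of the column descriptions above lend themselves to a quick sanity check before analysis: the binary indicators should contain only 0 or 1, and the height column appears to mix centimetres and feet. A minimal Python sketch follows. The helper names, the threshold of 10 used to tell feet from centimetres, and the reading of feet values as decimal feet are all assumptions made for illustration, not documented properties of the file.

```python
def height_to_cm(value, feet_threshold=10.0):
    """Normalise a height to centimetres.

    Assumption (not confirmed by the dataset): values below `feet_threshold`
    were entered in decimal feet; anything else is already in centimetres.
    """
    value = float(value)
    if value <= 0:
        raise ValueError(f"height must be positive, got {value}")
    if value < feet_threshold:
        return round(value * 30.48, 1)  # 1 foot = 30.48 cm
    return value


def non_binary_values(values):
    """Return the sorted set of entries that are neither 0 nor 1."""
    allowed = {"0", "1"}
    return sorted({str(v).strip() for v in values} - allowed)


# Illustrative values only, not taken from the survey.
print(height_to_cm(5.0))
print(height_to_cm("162"))
print(non_binary_values([0, 1, "1", "yes"]))
```

If feet were in fact entered as feet and inches (for example 5.4 meaning 5 ft 4 in), the conversion would need to be adapted accordingly.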
Distribution
The survey data is structured in a CSV file format, named CLEAN- PCOS SURVEY SPREADSHEET.csv. The file contains 16 columns and includes 465 valid records. Based on analysis, there are no mismatched or missing data points in the available records. Updates to this dataset are expected to occur annually.
Usage
This data is ideal for several applications, particularly within the field of artificial intelligence and health informatics. Primary uses include:
- Building and evaluating machine learning models designed to predict the risk of PCOS.
- Statistical analysis focused on identifying correlations between specific symptoms, lifestyle choices, and PCOS diagnosis.
- Performing detailed data analytics, including outlier analysis and model comparison, in the context of women's endocrine health.
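As a starting point for the analyses listed above, the sketch below parses CSV text with Python's standard `csv` module, counts empty cells, and compares the PCOS rate between respondents with and without a given binary indicator. The inline sample rows are made up for illustration, and the column headers used (`PCOS`, `Recent Weight Gain`) are assumed to match the file, so check them against the actual header row. For the real file, pass an open file handle to `csv.DictReader` instead of the `io.StringIO` wrapper.

```python
import csv
import io

# Made-up rows for illustration; they are NOT records from the survey.
SAMPLE_CSV = """Recent Weight Gain,PCOS
1,1
1,0
0,0
0,0
1,1
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))


def count_missing(rows):
    """Count cells that are absent, empty, or whitespace-only."""
    return sum(1 for row in rows for v in row.values()
               if v is None or not v.strip())


def pcos_rate_by(rows, indicator, target="PCOS"):
    """Return {indicator value: share of rows where target == 1}."""
    totals, positives = {}, {}
    for row in rows:
        key = int(row[indicator])
        totals[key] = totals.get(key, 0) + 1
        positives[key] = positives.get(key, 0) + int(row[target])
    return {k: positives[k] / totals[k] for k in sorted(totals)}


rows = load_rows(SAMPLE_CSV)
print(len(rows), "rows,", count_missing(rows), "missing cells")
print(pcos_rate_by(rows, "Recent Weight Gain"))
```

A difference in rates between the two groups is descriptive only; it does not by itself show that the indicator has predictive value.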
Coverage
The data originates from a 2023 survey. It includes responses from women whose ages predominantly fall within the reproductive range (18–45), although the sampled age distribution spans from 13 up to 58 years. Demographic and time-based details are specific to the survey responses collected at that time.
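The coverage statement can be checked directly from the age column. The sketch below, with a hypothetical function name and made-up ages, reports the observed age span and the share of respondents within the 18–45 range.

```python
def age_coverage(ages, low=18, high=45):
    """Return (min age, max age, share of ages within [low, high])."""
    ages = [float(a) for a in ages]
    if not ages:
        raise ValueError("no ages supplied")
    in_range = sum(1 for a in ages if low <= a <= high)
    return min(ages), max(ages), in_range / len(ages)


# Made-up ages for illustration, not survey records.
youngest, oldest, share = age_coverage([13, 19, 24, 25, 31, 58])
print(youngest, oldest, round(share, 2))
```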
License
CC0: Public Domain
Who Can Use It
Intended users include:
- Data Scientists and Machine Learning Engineers: For training, testing, and optimising predictive health models.
- Academics and University Students: For research projects and statistical assignments focusing on public health and endocrinology.
- Health Policy Researchers: To gain insights into the self-reported symptoms and lifestyle factors related to PCOS prevalence.
Dataset Name Suggestions
- PCOS Risk Indicators Survey Data
- Women's Health Prediction Features
- Polycystic Ovary Syndrome Symptoms Registry
- 2023 Endocrine Disorder Data Set
Attributes
Original Data Source: PCOS Risk Indicators Survey Data
