Opendatabay APP

Parkinson's Disease Voice Screening

Public Health & Epidemiology

Tags and Keywords

Parkinson's, Voice, Disease, Diagnosis, Biomedical


Free

About

This dataset contains biomedical voice measurements for detecting and diagnosing Parkinson's Disease (PD). PD is a degenerative neurological disorder that impairs movement and speech because of reduced dopamine levels in the brain. Characteristic vocal symptoms, such as dysarthria, hypophonia, and monotone speech, make voice recordings a valuable, non-invasive diagnostic tool. The data is intended for machine learning models that screen for PD, potentially before a clinician's appointment, which helps with a diagnosis that is difficult to make early.

Columns

  • name: An ASCII subject name combined with the recording number.
  • MDVP:Fo(Hz): Represents the average vocal fundamental frequency.
  • MDVP:Fhi(Hz): Indicates the maximum vocal fundamental frequency.
  • MDVP:Flo(Hz): Shows the minimum vocal fundamental frequency.
  • MDVP:Jitter(%): A measure of variation in fundamental frequency, expressed as a percentage.
  • MDVP:Jitter(Abs): An absolute measure of jitter.
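  • MDVP:RAP: The Relative Average Perturbation, a jitter measure that compares each period with the average of itself and its two neighbours.
  • MDVP:PPQ: The Five-point Period Perturbation Quotient, a jitter measure smoothed over five consecutive periods.
  • Jitter:DDP: The average absolute difference between consecutive differences of periods, a jitter measure that equals three times RAP by definition.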
  • MDVP:Shimmer: A measure of variation in amplitude.
  • MDVP:Shimmer(dB): Shimmer expressed in decibels.
  • Shimmer:APQ3: The Three-point Amplitude Perturbation Quotient.
  • Shimmer:APQ5: The Five-point Amplitude Perturbation Quotient.
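  • MDVP:APQ: The 11-point Amplitude Perturbation Quotient.
  • Shimmer:DDA: The average absolute difference between consecutive differences of amplitudes, a shimmer measure that equals three times APQ3 by definition.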
  • NHR: The Noise-to-Harmonics Ratio, indicating vocal noise levels.
  • HNR: The Harmonics-to-Noise Ratio, indicating the clarity of vocalisation.
  • status: The health status of the individual, where '0' represents healthy and '1' represents Parkinson's Disease. This is the target variable for classification.
  • RPDE: Recurrence Period Density Entropy, a nonlinear dynamical complexity measure.
  • DFA: Detrended Fluctuation Analysis, the signal fractal scaling exponent.
  • spread1: A nonlinear measure of fundamental frequency variation.
  • spread2: A second nonlinear measure of fundamental frequency variation.
  • D2: The Correlation Dimension, a nonlinear dynamical complexity measure.
  • PPE: Pitch Period Entropy, related to the variability of the vocal pitch period.
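The column layout above can be sketched as a small loader that separates the identifier, the 22 numeric features, and the target. This is a minimal illustration using only the Python standard library. The helper name `load_rows` is hypothetical, the position of `status` in the header follows the order of the list above, and the two sample rows are synthetic placeholder values, not real measurements.

```python
import csv
import io

# Column names as given in the listing above.
ID_COLUMN = "name"
TARGET_COLUMN = "status"
FEATURE_COLUMNS = [
    "MDVP:Fo(Hz)", "MDVP:Fhi(Hz)", "MDVP:Flo(Hz)",
    "MDVP:Jitter(%)", "MDVP:Jitter(Abs)", "MDVP:RAP", "MDVP:PPQ", "Jitter:DDP",
    "MDVP:Shimmer", "MDVP:Shimmer(dB)", "Shimmer:APQ3", "Shimmer:APQ5",
    "MDVP:APQ", "Shimmer:DDA", "NHR", "HNR",
    "RPDE", "DFA", "spread1", "spread2", "D2", "PPE",
]


def load_rows(handle):
    """Parse CSV text into (name, feature_vector, status) tuples."""
    rows = []
    for record in csv.DictReader(handle):
        features = [float(record[col]) for col in FEATURE_COLUMNS]
        rows.append((record[ID_COLUMN], features, int(record[TARGET_COLUMN])))
    return rows


# Synthetic two-row sample (made-up values, NOT real measurements).
# The header mirrors the column order listed above, with status after HNR.
header = [ID_COLUMN] + FEATURE_COLUMNS[:16] + [TARGET_COLUMN] + FEATURE_COLUMNS[16:]
sample = io.StringIO()
writer = csv.writer(sample)
writer.writerow(header)
writer.writerow(["subjA_1"] + [0.1] * 16 + [1] + [0.2] * 6)
writer.writerow(["subjB_1"] + [0.3] * 16 + [0] + [0.4] * 6)
sample.seek(0)

rows = load_rows(sample)
print(len(rows), len(rows[0][1]), rows[0][2])
```

To read the downloaded file instead, pass an open file handle (for example `open(path, newline="")`, where `path` is wherever the CSV was saved) to `load_rows`.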

Distribution

This dataset comprises 195 voice recordings from 31 individuals, 23 of whom have Parkinson's Disease. The data is structured as 195 rows and 24 columns and is provided as a CSV file. No column has missing or mismatched values. The dataset is not expected to be updated in the future.
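The figures in this section can be checked after download with a short integrity summary. The sketch below is an illustration only: `subject_id` and `summarize` are hypothetical helpers, and the subject grouping assumes that each `name` value has the form `<subject>_<recording number>`, which the listing implies but does not spell out. The records shown are synthetic.

```python
from collections import defaultdict


def subject_id(name):
    """Drop the trailing recording number, assuming '<subject>_<recording>'."""
    return name.rsplit("_", 1)[0]


def summarize(records):
    """Return basic integrity figures for a list of row dicts."""
    status_by_subject = defaultdict(set)
    missing = 0
    for record in records:
        # Count empty cells as missing values.
        missing += sum(1 for value in record.values() if value in ("", None))
        status_by_subject[subject_id(record["name"])].add(int(record["status"]))
    return {
        "recordings": len(records),
        "columns": len(records[0]) if records else 0,
        "missing_values": missing,
        "subjects": len(status_by_subject),
        "subjects_with_pd": sum(1 for s in status_by_subject.values() if s == {1}),
    }


# Synthetic records with a reduced column set (made-up values).
records = [
    {"name": "subjA_1", "status": "1", "HNR": "21.0"},
    {"name": "subjA_2", "status": "1", "HNR": "20.5"},
    {"name": "subjB_1", "status": "0", "HNR": "25.1"},
]
summary = summarize(records)
print(summary)
```

If the naming assumption holds, running `summarize` on the rows of the real file (for instance from `csv.DictReader`) should reproduce the numbers stated above: 195 recordings, 24 columns, no missing values, 31 subjects, and 23 subjects with PD.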

Usage

This dataset is ideally suited for:
  • Developing and evaluating machine learning models for early Parkinson's Disease detection.
  • Research into speech signal processing and its applications in neurological disorder diagnostics.
  • Creating screening tools that utilise voice analysis to identify individuals at risk of PD.
  • Educational projects demonstrating the use of biomedical data in predictive analytics.
  • Exploring the characteristic vocal features associated with Parkinson's Disease progression.
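Because the 195 recordings come from only 31 individuals, a model evaluated on a random split of recordings may see the same speaker in both training and test data. One way to avoid that leakage is to split by subject, as in the sketch below. It is a minimal illustration, not a prescribed protocol: `split_by_subject` and `subject_id` are hypothetical helpers, the `name` format `<subject>_<recording number>` is an assumption, and the names are synthetic.

```python
import random
from collections import defaultdict


def subject_id(name):
    """Drop the trailing recording number, assuming '<subject>_<recording>'."""
    return name.rsplit("_", 1)[0]


def split_by_subject(names, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) so that no subject appears on both sides."""
    by_subject = defaultdict(list)
    for index, name in enumerate(names):
        by_subject[subject_id(name)].append(index)
    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)
    # Hold out at least one subject for testing.
    n_test = max(1, round(len(subjects) * test_fraction))
    test_idx = [i for s in subjects[:n_test] for i in by_subject[s]]
    train_idx = [i for s in subjects[n_test:] for i in by_subject[s]]
    return sorted(train_idx), sorted(test_idx)


# Synthetic names: 8 subjects with 6 recordings each.
names = [f"subj{s}_{r}" for s in range(8) for r in range(1, 7)]
train_idx, test_idx = split_by_subject(names)
train_subjects = {subject_id(names[i]) for i in train_idx}
test_subjects = {subject_id(names[i]) for i in test_idx}
print(len(train_idx), len(test_idx), train_subjects & test_subjects)
```

The returned index lists can be used to slice feature rows and `status` labels for any classifier. With 23 of 31 subjects in the PD group, it is also worth comparing results against a majority-class baseline rather than reporting accuracy alone.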

Coverage

The dataset includes voice measurements from 31 individuals and focuses on distinguishing healthy people from those with Parkinson's Disease. The listing does not state geographical or time-range coverage, and it gives no demographic breakdown beyond the number of participants. The core focus is on the biomedical speech characteristics of these two groups.

License

CC0: Public Domain

Who Can Use It

  • Machine Learning Engineers and Data Scientists: To build and refine models for disease diagnosis.
  • Medical Researchers: To study vocal biomarkers of Parkinson's Disease.
  • Academics and Students: For teaching and learning in bioinformatics, signal processing, and AI in healthcare.
  • Healthcare Innovators: To develop new non-invasive screening technologies.
  • Public Health Professionals: To investigate new methods for population-level health screening.

Dataset Name Suggestions

  • Parkinson's Vocal Biomarkers
  • PD Voice Diagnostics Data
  • Biomedical Speech for Parkinson's
  • Parkinson's Disease Voice Screening
  • Neurological Speech Analysis Dataset

Attributes

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 08/09/2025
  • Region: Global
  • Quality (Universal Data Quality Score, UDQS): 5 / 5
  • Version: 1.0
