Speed Dating Match Prediction Data
Data Science and Analytics
Free
About
This dataset comes from experimental speed dating events held between 2002 and 2004. Participants engaged in four-minute "first dates" and then indicated whether they would like to see their date again. The dataset's primary purpose is to support prediction of compatibility and match outcomes between individuals. It includes participant ratings of their dates on six attributes: Attractiveness, Sincerity, Intelligence, Fun, Ambition, and Shared Interests. It also contains questionnaire data covering demographics, dating habits, self-perception on key attributes, beliefs about what others value in a mate, and lifestyle information. Together, these fields support analysis of the factors that influence romantic compatibility.
Columns
- wave: Indicates the experimental speed dating wave.
- gender: The gender of the participant (self), typically 'male' or 'female'.
- age: The age of the participant (self). Valid ages range from 18 to 55.
- age_o: The age of the partner. Valid ages range from 18 to 55.
- d_age: The calculated difference in age between the participant and their partner. Values range from 0 to 37.
- d_d_age: The binned difference in age. Common bins include '[1, 2]' and '[3-5]'.
- race: The self-reported race of the participant. Common categories include 'European/Caucasian-American' and 'Asian/Pacific Islander/Asian-American'.
- race_o: The race of the partner. Common categories include 'European/Caucasian-American' and 'Asian/Pacific Islander/Asian-American'.
- samerace: A binary indicator (0 or 1) stating whether the two persons have the same race.
- importance_same_race: A rating indicating how important it is to the participant that their partner is of the same race, on a scale of 0 to 10.
- importance_same_religion: A rating indicating how important it is to the participant that their partner has the same religion, on a scale of 1 to 10.
- d_importance_same_race: Binned importance rating for same race.
- d_importance_same_religion: Binned importance rating for same religion.
- field: The participant's field of study. 'Business' and 'MBA' are common examples, though many others exist.
- pref_o_attractive: The importance the partner assigns to attractiveness. Values range from 0 to 100.
- pref_o_sincere: The importance the partner assigns to sincerity. Values range from 0 to 60.
- pref_o_intelligence: The importance the partner assigns to intelligence. Values range from 0 to 50.
- pref_o_funny: The importance the partner assigns to being funny. Values range from 0 to 50.
- pref_o_ambitious: The importance the partner assigns to ambition. Values range from 0 to 53.
- pref_o_shared_interests: The importance the partner assigns to shared interests. Values range from 0 to 30.
- d_pref_o_attractive: Binned preference rating for partner's attractiveness.
- d_pref_o_sincere: Binned preference rating for partner's sincerity.
- d_pref_o_intelligence: Binned preference rating for partner's intelligence.
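The derived columns d_age and samerace can be recomputed from the raw columns as a sanity check. The sketch below is illustrative only: it assumes d_age is the absolute difference between age and age_o (consistent with its stated range of 0 to 37) and that samerace is 1 exactly when race and race_o are equal. The function name derive_pair_features is hypothetical and not part of the dataset.

```python
def derive_pair_features(age, age_o, race, race_o):
    """Recompute d_age and samerace for one participant/partner pair."""
    # Assumption: d_age is an absolute difference, since it is never negative.
    d_age = abs(age - age_o)
    # Assumption: samerace is 1 when both race labels are identical, else 0.
    samerace = int(race == race_o)
    return {"d_age": d_age, "samerace": samerace}


# Made-up example pair, using category labels from the column descriptions.
example = derive_pair_features(
    age=27,
    age_o=24,
    race="European/Caucasian-American",
    race_o="Asian/Pacific Islander/Asian-American",
)
print(example)
```

Comparing these recomputed values with the stored columns is one way to spot rows where an age or race entry is missing or inconsistent.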
Distribution
The dataset is provided in CSV format and is approximately 7.46 MB in size. It contains 24 of the original 123 columns from the full dataset. Most columns have approximately 8,378 valid records, and some have a small share of missing values (around 1-2%). Summary statistics (mean, standard deviation, and quantiles) are available for numerical columns, and categorical columns list their unique values and how often each occurs.
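The share of missing values per column can be checked directly once the CSV is loaded. The following is a minimal sketch using only the Python standard library. The inline sample rows are made up for illustration, and the set of missing-value markers (an empty cell or '?') is an assumption, since the listing does not say how missing values are encoded.

```python
import csv
import io

# Tiny made-up sample standing in for the real CSV; values are illustrative only.
SAMPLE_CSV = """wave,gender,age,age_o,samerace
1,female,21,27,0
1,male,27,21,0
2,female,,25,1
2,male,25,?,1
"""

# Assumption: missing cells are encoded as an empty string or '?'.
MISSING_MARKERS = {"", "?"}


def missing_share(csv_text):
    """Return {column: fraction of rows whose cell counts as missing}."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    return {
        column: sum(row[column].strip() in MISSING_MARKERS for row in rows) / len(rows)
        for column in rows[0]
    }


shares = missing_share(SAMPLE_CSV)
for column, share in shares.items():
    print(f"{column}: {share:.0%} missing")
```

To run this against the real file, replace the in-memory sample with the contents of the downloaded CSV and adjust MISSING_MARKERS to match the encoding actually used.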
Usage
This dataset is ideal for:
- Predictive Modelling: Building models to forecast whether two individuals will match in a speed dating scenario.
- Behavioural Analysis: Investigating the significance of various attributes and preferences in dating outcomes.
- Social Science Research: Studying human behaviour, attraction, and decision-making in a controlled social setting.
- Educational Purposes: Serving as a practical example for students learning binary classification, data analysis, and feature engineering.
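For the predictive modelling use case, a trivial baseline is a useful reference point before fitting any classifier. The sketch below computes the accuracy of always predicting the most common training label. It assumes a binary target (1 for a match, 0 otherwise); the target column is not among the columns listed above, so the labels here are made up and the function name majority_baseline_accuracy is hypothetical.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Predict the most common training label for every test row."""
    majority = Counter(train_labels).most_common(1)[0][0]
    correct = sum(label == majority for label in test_labels)
    return majority, correct / len(test_labels)


# Made-up labels: 1 = match, 0 = no match (assumed encoding).
train = [0, 0, 0, 1, 0, 0, 1, 0]
test = [0, 1, 0, 0]

majority, accuracy = majority_baseline_accuracy(train, test)
print(f"majority class: {majority}, baseline accuracy: {accuracy:.2f}")
```

A model is only informative if it beats this baseline, which matters when one outcome is much more common than the other.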
Coverage
The data was collected from experimental speed dating events conducted between 2002 and 2004. The geographic scope is not specified. The demographic scope covers participants of various ages, genders, and racial backgrounds, along with their stated preferences and self-perceptions.
License
Attribution 4.0 International (CC BY 4.0)
Who Can Use It
This dataset is suitable for:
- Data Scientists and Machine Learning Engineers: For developing and testing classification models.
- Academics and Researchers: In fields such as psychology, sociology, and economics, interested in human interaction and relationship dynamics.
- Students: As a valuable resource for projects and coursework in data science, statistics, and social sciences, particularly those focusing on classification tasks.
- Anyone interested in human relationships: Individuals curious about the underlying factors that contribute to romantic compatibility and attraction.
Dataset Name Suggestions
- Speed Dating Match Prediction Data
- Compatibility Analysis Dataset
- Dating Preferences & Outcomes
- Human Attraction Factors Data
- Experimental Dating Event Records
Attributes
Original Data Source: Speed Dating Match Prediction Data