Digital Romance Trends Dataset
Synthetic Data Generation
Free
About
This dataset offers a synthetic representation of user behaviour on a fictional dating application, providing valuable insights into user interactions and preferences. It is designed to simulate realistic user engagements, making it an ideal resource for exploratory data analysis (EDA), machine learning model development, and social studies focusing on trends in online dating platforms. The dataset is carefully balanced and diverse, featuring a mix of categorical, numerical, and labelled variables, suitable for a range of analytical tasks.
Columns
- gender: The user’s gender identity. Examples include Female, Non-binary, and Other.
- sexual_orientation: The user’s sexual orientation. Examples include Straight, Lesbian, and Other.
- location_type: The type of geographical location associated with the user. Examples include Remote Area and Small Town.
- income_bracket: The user’s self-declared or simulated income level. Examples include High and Very High.
- education_level: The highest level of education attained by the user. Examples include Bachelor’s and MBA.
- interest_tags: A comma-separated list representing up to three key interests of the user. Common interests include Fitness, Anime, and Yoga.
- app_usage_time_min: The daily time a user spends on the app, measured in minutes. This numerical column ranges from 0 to 300 minutes, with a mean of 150 minutes.
- app_usage_time_label: A categorical label describing the user's daily app usage intensity. Examples include Extreme User and High.
- swipe_right_ratio: The ratio of right swipes (indicating interest) to the total number of swipes made by a user. This numerical column ranges from 0 to 1, with a mean of 0.5.
- swipe_right_label: A categorical description of a user's swipe behaviour. Examples include Optimistic and Balanced.
- likes_received: The total number of likes a user has received. This numerical column ranges from 0 to 200, with a mean of 99.5.
- mutual_matches: The number of mutual matches a user has achieved. This numerical column ranges from 0 to 30, with a mean of 13.9.
- profile_pics_count: The number of profile pictures uploaded by a user. This numerical column ranges from 0 to 6, with a mean of 2.99.
- bio_length: The character length of the user’s profile biography. This numerical column ranges from 0 to 500, with a mean of 250.
- message_sent_count: The total number of messages sent by a user. This numerical column ranges from 0 to 100, with a mean of 50.1.
- emoji_usage_rate: The rate at which a user incorporates emojis into their messages. This numerical column ranges from 0 to 0.94, with a mean of 0.29.
- last_active_hour: The hour of the day when the user was last active on the app. This numerical column ranges from 0 to 23, with a mean of 11.5.
- swipe_time_of_day: A categorical label indicating the time of day when a user typically swipes. Examples include After Midnight and Afternoon.
- match_outcome: The outcome of a match interaction. Examples include One-sided Like, Instant Match, Mutual Match, Ghosted, and Catfished.
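The column descriptions above can be turned into a simple sanity check. The following is a minimal sketch using only the Python standard library: it splits interest_tags into a list and flags numeric values that fall outside the documented ranges. The helper names (split_interest_tags, out_of_range_columns) and the two-row sample are hypothetical and invented for illustration, and the sketch assumes that multi-value interest_tags fields are quoted in the CSV.

```python
import csv
import io

# Documented (min, max) ranges, copied from the column descriptions above.
DOCUMENTED_RANGES = {
    "app_usage_time_min": (0, 300),
    "swipe_right_ratio": (0, 1),
    "likes_received": (0, 200),
    "mutual_matches": (0, 30),
    "profile_pics_count": (0, 6),
    "bio_length": (0, 500),
    "message_sent_count": (0, 100),
    "emoji_usage_rate": (0, 0.94),
    "last_active_hour": (0, 23),
}


def split_interest_tags(raw):
    """Split the comma-separated interest_tags field into a clean list."""
    return [tag.strip() for tag in raw.split(",") if tag.strip()]


def out_of_range_columns(row):
    """Return the numeric columns whose value is outside the documented range."""
    bad = []
    for column, (low, high) in DOCUMENTED_RANGES.items():
        if column not in row or row[column] == "":
            continue  # column absent from this sample row
        value = float(row[column])
        if not (low <= value <= high):
            bad.append(column)
    return bad


# Hypothetical sample rows, not taken from the real file.
SAMPLE_CSV = """interest_tags,app_usage_time_min,swipe_right_ratio,last_active_hour
"Fitness, Anime, Yoga",150,0.5,11
"Yoga",420,0.5,23
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
first_tags = split_interest_tags(rows[0]["interest_tags"])
bad_columns = [out_of_range_columns(row) for row in rows]
```

In this sample the second row is flagged because 420 minutes exceeds the documented maximum of 300 for app_usage_time_min.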
Distribution
The dataset is provided as a CSV file (dating_app_behavior_dataset.csv) and totals 7.59 MB. It contains 50,000 records, each featuring 19 distinct attributes. The data structure incorporates categorical, numerical, and labelled variables, ensuring its utility for a broad spectrum of analytical methods.
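As a quick check of the figures above, the sketch below counts data rows and columns in a CSV. It assumes the file has a single header row and is UTF-8 encoded, neither of which is stated in the description; count_rows_and_columns is a hypothetical helper name. The in-memory demo runs without the file, and the final section only executes if dating_app_behavior_dataset.csv is present in the working directory.

```python
import csv
import io
import os


def count_rows_and_columns(handle):
    """Return (data_row_count, column_count) for a CSV with one header row."""
    reader = csv.reader(handle)
    header = next(reader)
    row_count = sum(1 for _ in reader)
    return row_count, len(header)


# Small in-memory demo so the function can be tried without the real file.
demo = io.StringIO("gender,match_outcome\nFemale,Ghosted\nOther,Mutual Match\n")
demo_shape = count_rows_and_columns(demo)

# Only runs if the file has been downloaded next to this script.
path = "dating_app_behavior_dataset.csv"
if os.path.exists(path):
    with open(path, newline="", encoding="utf-8") as f:  # encoding is an assumption
        n_rows, n_columns = count_rows_and_columns(f)
    # The description above states 50,000 records and 19 attributes.
    print(n_rows, n_columns)
```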
Usage
This dataset is well-suited for several applications:
- Exploratory Data Analysis (EDA): Investigating correlations between user demographics, app usage patterns, and the success rates of matches.
- Machine Learning: Developing predictive models for match outcomes or forecasting user engagement levels.
- Social Studies Research: Analysing behavioural trends within online dating platforms across various demographic groups.
- Feature Engineering Practice: Experimenting with different techniques for transforming both categorical and numerical data.
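As one possible starting point for the EDA use case, the sketch below computes the share of each match_outcome within the groups of a categorical column such as swipe_time_of_day. It uses only the standard library; outcome_shares is a hypothetical helper, and the four sample rows are invented for illustration and do not come from the dataset.

```python
from collections import Counter, defaultdict


def outcome_shares(rows, group_column, outcome_column="match_outcome"):
    """Return {group: {outcome: share}} where the shares in each group sum to 1."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[group_column]][row[outcome_column]] += 1

    shares = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        shares[group] = {outcome: n / total for outcome, n in counter.items()}
    return shares


# Hypothetical rows in the shape csv.DictReader would produce.
sample_rows = [
    {"swipe_time_of_day": "Afternoon", "match_outcome": "Mutual Match"},
    {"swipe_time_of_day": "Afternoon", "match_outcome": "Ghosted"},
    {"swipe_time_of_day": "After Midnight", "match_outcome": "Ghosted"},
    {"swipe_time_of_day": "Afternoon", "match_outcome": "Mutual Match"},
]

shares = outcome_shares(sample_rows, "swipe_time_of_day")
```

The same function works for any other categorical column, for example gender or location_type, by changing the group_column argument.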
Coverage
Because the dataset is synthetic, its geographic scope is not limited, and the user location_type is a key feature. The dataset is labelled "2025," suggesting it simulates current or future user behaviour, with an expected annual update frequency. Demographically, it captures a variety of user characteristics, including diverse gender identities, sexual orientations, income brackets, and education levels, ensuring a balanced representation for analysis.
License
CC0: Public Domain
Who Can Use It
This dataset is valuable for:
- Data Analysts: To conduct in-depth exploratory analysis of dating app dynamics.
- Machine Learning Engineers: To train and evaluate models for predicting user behaviour and match success.
- Social Scientists: To study and understand patterns in online dating behaviour.
- Students and Researchers: For learning and practising data science techniques, including feature engineering.
Dataset Name Suggestions
- Dating App User Behaviour Dynamics
- Synthetic Online Dating Interactions
- Digital Romance Trends Dataset
- Match Predictive Analytics Data
- User Engagement in Dating Apps
Attributes
Original Data Source: Digital Romance Trends Dataset