Synthetic Consumer Lifestyle Predictor
Synthetic Data Generation
Free
About
A synthetic dataset designed to facilitate the prediction of individual lifestyle categories based on a rich selection of personal, financial, and behavioral features. The data simulates nearly half a million profiles, making it an ideal resource for developing and testing machine learning algorithms within a controlled, risk-free environment. It serves as a valuable asset for research in data science, AI, and simulation studies concerning marketing and behavioral analysis. Because the data is synthetic, findings should be generalized to real-world scenarios with care.
Columns
The dataset contains 31 columns: 30 demographic, financial, and behavioral features plus the target variable:
- Gender: The gender identity of each individual (approximately 50% male and 50% female).
- First Name: The given name of the person.
- Last Name: The family name of the individual.
- City, State, Country: Geographic location features.
- Age: The age of the individual (ranging from 19 to 100).
- Annual Vacation Days: Number of vacation days available annually (mean 19.9 days).
- Average Monthly Spend on Entertainment: Typical monthly expenditure on entertainment.
- Number of Online Purchases in Last Month: Count of recent online purchases (mean 136).
- Number of Charity Donations in Last Year: Total count of donations.
- Average Weekly Exercise Hours: Weekly time spent on exercise (mean 0.49 hours).
- Investment Portfolio Value: The monetary value of the investment holdings.
- Health Consciousness Rating: A rating reflecting proactive health behavior.
- Education Level: Highest level of education attained.
- Average Daily Screen Time: Average daily time spent in front of screens.
- Environmental Awareness Rating: Measure of engagement with environmental issues.
- Social Media Influence Score: Score representing social media activity and reach.
- Risk Tolerance in Investments: Measure of willingness to accept investment risk.
- Number of Professional Trainings Attended: Count of professional training sessions completed.
- Tech-Savviness Score: Proficiency and comfort level with technology.
- Financial Wellness Index: Overall indicator of financial health.
- Lifestyle Balance Score: Assessment of balance across different life aspects.
- Entertainment Engagement Factor: Level of involvement in entertainment activities.
- Social Responsibility Index: Measure of involvement in social issues.
- Work-Life Balance Indicator: Metric assessing professional and personal life balance.
- Investment Risk Appetite: Willingness to undertake investment risks.
- Eco-Consciousness Metric: Evaluation of ecological sustainability actions.
- Stress Management Score: Effectiveness of stress management.
- Time Management Skill: Skill level in managing time efficiently.
- Lifestyle Choice (Target Variable): Categorisation of the predominant lifestyle (e.g., Eco-Friendly, Adventure Seeker, Tech-Savvy). This column has 12 unique categories.
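As a minimal sketch of how the column list above might be put to use, the following separates the name fields from the modelling features and counts the target labels while streaming rows. The header labels (`First Name`, `Last Name`, `Lifestyle Choice`) are assumed to match the names in this listing; the actual file may spell them differently. The sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Assumed header labels, copied from the column list above; the real
# user_data.csv may use different spellings.
IDENTIFIER_COLUMNS = {"First Name", "Last Name"}
TARGET_COLUMN = "Lifestyle Choice"


def split_columns(header):
    """Return (feature_columns, target_column), dropping name fields."""
    if TARGET_COLUMN not in header:
        raise ValueError(f"missing target column: {TARGET_COLUMN!r}")
    features = [
        name for name in header
        if name not in IDENTIFIER_COLUMNS and name != TARGET_COLUMN
    ]
    return features, TARGET_COLUMN


def target_distribution(csv_file):
    """Stream rows from an open CSV file and count each target label."""
    reader = csv.DictReader(csv_file)
    return Counter(row[TARGET_COLUMN] for row in reader)


# Tiny in-memory stand-in for the real file (values are made up).
SAMPLE_CSV = (
    "First Name,Last Name,Age,Lifestyle Choice\n"
    "Ana,Lee,34,Eco-Friendly\n"
    "Bo,Kim,52,Tech-Savvy\n"
    "Cy,Roe,27,Eco-Friendly\n"
)

features, target = split_columns(
    ["First Name", "Last Name", "Age", "Lifestyle Choice"]
)
counts = target_distribution(io.StringIO(SAMPLE_CSV))
```

For the real file, the same function could be given `open("user_data.csv", newline="", encoding="utf-8")`; the encoding is an assumption. Reading row by row avoids holding the whole file in memory.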
Distribution
The data file is typically supplied in CSV format, titled user_data.csv, with a size of approximately 110.35 MB. It contains exactly 31 columns and features around 476,000 unique records or rows. This is a static snapshot with no expected future updates.
Usage
This dataset is well suited to several applications, including:
- Algorithm Training: Developing and fine-tuning machine learning models for multiclass classification tasks, particularly predicting lifestyle segmentation.
- Educational Purposes: Supporting academic research and learning initiatives in data science and artificial intelligence.
- Simulation Studies: Conducting theoretical modeling in fields such as marketing, health, and behavioral sciences without using actual sensitive personal data.
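For the multiclass classification use case, one common preparatory step is a stratified train/test split, so that each lifestyle category keeps roughly the same share in both parts. The sketch below uses only the standard library; the function name `stratified_split`, the `id` field, and the target label `Lifestyle Choice` are illustrative assumptions, and the rows are made up.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key, test_fraction=0.2, seed=0):
    """Split rows into (train, test), keeping each label's share similar."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Made-up rows; "Lifestyle Choice" is the assumed target column name.
rows = [
    {"id": i, "Lifestyle Choice": label}
    for i, label in enumerate(["Eco-Friendly"] * 10 + ["Tech-Savvy"] * 5)
]
train, test = stratified_split(
    rows, "Lifestyle Choice", test_fraction=0.2, seed=42
)
```

Splitting per label rather than over the whole list matters most when some of the 12 categories are rare, since a plain random split can leave a rare class nearly absent from the test set.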
Coverage
The data provides geographic residency details, covering 21 unique countries. The United Kingdom and Ukraine are noted among the locations, with the majority of records falling into an "Other" category. The demographic scope spans individuals aged 19 up to 100, incorporating diverse financial and behavioral data points. All data is synthetically generated.
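The coverage figures above (ages 19 to 100, 21 unique countries) can serve as a quick sanity check after loading. The following is a hedged sketch: the column labels `Age` and `Country` and both function names are assumptions, and the three rows are invented stand-ins for the real records.

```python
def coverage_summary(rows, age_key="Age", country_key="Country"):
    """Compute the observed age range and number of distinct countries."""
    ages = [float(row[age_key]) for row in rows]
    countries = {row[country_key] for row in rows}
    return {
        "min_age": min(ages),
        "max_age": max(ages),
        "n_countries": len(countries),
    }


def check_coverage(summary, min_age=19, max_age=100, n_countries=21):
    """Return a list of mismatches against the documented coverage."""
    problems = []
    if summary["min_age"] < min_age:
        problems.append(f"age below {min_age}: {summary['min_age']}")
    if summary["max_age"] > max_age:
        problems.append(f"age above {max_age}: {summary['max_age']}")
    if summary["n_countries"] != n_countries:
        problems.append(
            f"expected {n_countries} countries, found {summary['n_countries']}"
        )
    return problems


# Invented rows for illustration only.
sample_rows = [
    {"Age": "19", "Country": "United Kingdom"},
    {"Age": "100", "Country": "Ukraine"},
    {"Age": "45.0", "Country": "Other"},
]
summary = coverage_summary(sample_rows)
problems = check_coverage(summary, n_countries=3)
```

On the full file, calling `check_coverage(summary)` with its defaults compares against the documented values; an empty list means the loaded data matches them.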
License
CC BY-SA 4.0
Who Can Use It
- Machine Learning Engineers: For building, training, and validating predictive models for user segmentation.
- Behavioral Researchers: To explore correlations between personal attributes (age, location, financial features) and resultant lifestyle choices.
- Students and Educators: For practical application and coursework in data modelling and statistical analysis.
Dataset Name Suggestions
- Synthetic Consumer Lifestyle Predictor
- Half a Million Lifestyle Data
- Global Individual Profile Simulation
- Behavioral Feature Modelling Set
Attributes
Original Data Source: Synthetic Consumer Lifestyle Predictor
