Sephora Product Feedback Analysis Data
Product Reviews & Feedback
Free
About
This dataset offers a rich collection of information on skincare products and user reviews from the Sephora online store. Collected in March 2023 via a Python scraper, it includes details for over 8,000 beauty products and approximately 1 million user reviews across more than 2,000 products, specifically within the Skincare category. The product data encompasses names, brands, prices, ingredients, ratings, and various features, while review data includes user demographics, review ratings, and the review text itself. This makes it an excellent resource for understanding consumer behaviour and product performance in the beauty sector.
Columns
The dataset is structured into two main parts: Product data and Reviews data.
Product Data Content:
- product_id: The unique identifier for the product.
- product_name: The full name of the product.
- brand_id: The unique identifier for the product's brand.
- brand_name: The full name of the product's brand.
- loves_count: The number of users who marked this product as a favourite.
- rating: The average rating of the product based on user reviews, on a scale of 1 to 5.
- reviews: The total number of user reviews for the product.
- size: The product's size, specified in units like oz, ml, g, or packs.
- variation_type: The type of variation parameter for the product (e.g., Size, Colour).
- variation_value: The specific value of the variation parameter (e.g., 100 mL, Golden Sand).
- variation_desc: A description of the variation parameter (e.g., tone for fairest skin).
- ingredients: A list of ingredients included in the product.
- price_usd: The product's price in US dollars.
- value_price_usd: The potential cost savings of the product, as displayed on the site.
- sale_price_usd: The sale price of the product in US dollars.
- limited_edition: Indicates if the product is a limited edition (1 for true, 0 for false).
- new: Indicates if the product is new (1 for true, 0 for false).
- online_only: Indicates if the product is exclusively sold online (1 for true, 0 for false).
- out_of_stock: Indicates if the product is currently out of stock (1 for true, 0 for false).
- sephora_exclusive: Indicates if the product is exclusive to Sephora (1 for true, 0 for false).
- highlights: A list of tags or features highlighting the product's attributes.
- primary_category: The first category in the product's breadcrumb navigation.
- secondary_category: The second category in the product's breadcrumb navigation.
- tertiary_category: The third category in the product's breadcrumb navigation.
- child_count: The number of variations available for the product.
- child_max_price: The highest price among all variations of the product.
- child_min_price: The lowest price among all variations of the product.
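As a minimal sketch of how the product columns above might be read, the following parses the 0/1 flag columns into booleans and the price columns into floats, using only the Python standard library. The sample rows, the helper names (parse_product, load_products), and the choice to treat empty cells as missing are assumptions for illustration; the actual encoding in product_info.csv may differ.

```python
import csv
import io

# Hypothetical sample rows; real data would come from product_info.csv.
SAMPLE_PRODUCTS = """product_id,product_name,brand_name,price_usd,sale_price_usd,limited_edition,online_only
P001,Example Cleanser,Brand A,24.0,,0,1
P002,Example Serum,Brand B,58.5,45.0,1,0
"""

FLAG_COLUMNS = ("limited_edition", "online_only")
PRICE_COLUMNS = ("price_usd", "sale_price_usd")


def parse_product(row):
    """Convert one CSV row (a dict of strings) into typed values."""
    parsed = dict(row)
    for col in FLAG_COLUMNS:
        # Flags are documented as 1 for true, 0 for false.
        parsed[col] = row[col] == "1"
    for col in PRICE_COLUMNS:
        # Assumption: an empty cell means the value is missing, not zero.
        parsed[col] = float(row[col]) if row[col] else None
    return parsed


def load_products(text):
    """Parse CSV text into a list of typed product dicts."""
    return [parse_product(r) for r in csv.DictReader(io.StringIO(text))]


products = load_products(SAMPLE_PRODUCTS)
for p in products:
    print(p["product_id"], p["price_usd"], p["sale_price_usd"], p["online_only"])
```

To read the real file, the same parse_product function could be applied to rows from csv.DictReader opened on product_info.csv.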
Reviews Data Content:
- author_id: The unique identifier for the review author.
- rating: The rating given by the author for the product, on a scale of 1 to 5.
- is_recommended: Indicates if the author recommends the product (1 for true, 0 for false).
- helpfulness: The ratio of positive feedback to total feedback received on the review.
- total_feedback_count: The total number of feedback ratings (positive and negative) for the review.
- total_neg_feedback_count: The number of users who gave a negative rating for the review.
- total_pos_feedback_count: The number of users who gave a positive rating for the review.
- submission_time: The date the review was posted, in 'yyyy-mm-dd' format.
- review_text: The main body of the review written by the author.
- review_title: The title of the review written by the author.
- skin_tone: The author's reported skin tone (e.g., fair, tan).
- eye_color: The author's reported eye colour (e.g., brown, green).
- skin_type: The author's reported skin type (e.g., combination, oily).
- hair_color: The author's reported hair colour (e.g., brown, auburn).
- product_id: The unique identifier for the product associated with the review.
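The two parts of the dataset share the product_id column, and helpfulness is described as the ratio of positive feedback to total feedback. The sketch below illustrates both ideas on hypothetical records: it recomputes the ratio from the two count columns and attaches the product name to each review. The sample values and function names are assumptions, as is returning None for a review with no feedback; the dataset itself may encode that case differently.

```python
# Hypothetical sample records using the documented column names.
products = [
    {"product_id": "P001", "product_name": "Example Cleanser"},
    {"product_id": "P002", "product_name": "Example Serum"},
]
reviews = [
    {"author_id": "A1", "product_id": "P001", "rating": 5,
     "total_pos_feedback_count": 3, "total_feedback_count": 4},
    {"author_id": "A2", "product_id": "P002", "rating": 2,
     "total_pos_feedback_count": 0, "total_feedback_count": 0},
]


def helpfulness(pos_count, total_count):
    """Ratio of positive to total feedback; None when there is no feedback."""
    if total_count == 0:
        return None  # Assumption: undefined ratio is reported as missing.
    return pos_count / total_count


def join_reviews(reviews, products):
    """Attach product_name and a recomputed helpfulness to each review."""
    names = {p["product_id"]: p["product_name"] for p in products}
    joined = []
    for review in reviews:
        row = dict(review)
        # Reviews whose product_id has no match get product_name = None.
        row["product_name"] = names.get(review["product_id"])
        row["helpfulness"] = helpfulness(
            review["total_pos_feedback_count"], review["total_feedback_count"]
        )
        joined.append(row)
    return joined


joined = join_reviews(reviews, products)
for row in joined:
    print(row["product_name"], row["rating"], row["helpfulness"])
```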
Distribution
The dataset typically comes in CSV format, with product information provided in product_info.csv. It includes 8,494 unique product entries and approximately 1 million user reviews. While most core product fields are fully populated, some columns, such as value_price_usd, sale_price_usd, and variation_desc, have a significant percentage of missing values, ranging from around 11% to 97%. The product review-related columns, rating and reviews, have a small percentage (3%) of missing data.
Usage
This dataset is ideally suited for various analytical and machine learning applications:
- Exploratory Data Analysis (EDA): Investigate product categories, pricing structures, brand popularity, and ingredient trends.
- Sentiment Analysis: Determine the emotional tone of reviews (positive, negative, neutral) and identify brands or products with specific sentiment patterns.
- Text Analysis: Discover common themes, problems, or praises within customer reviews to gain insights into product performance and customer satisfaction.
- Recommender Systems: Develop systems that suggest products to users based on their past purchase history, review behaviour, and product characteristics.
- Data Visualisation: Create visualisations to highlight popular brands and products, price distributions, product ingredient similarities, and frequently used words in reviews.
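A common first step in the exploratory analysis listed above is to measure how much of each column is missing, which can also be used to check the shares reported under Distribution. The following is a minimal sketch using only the standard library. The sample rows and the missing_share helper are hypothetical, and it assumes that a missing value appears as an empty cell in the CSV.

```python
import csv
import io

# Hypothetical sample; real data would come from product_info.csv.
SAMPLE = """product_id,rating,sale_price_usd,variation_desc
P001,4.5,,
P002,,45.0,
P003,3.9,,tone for fairest skin
P004,4.1,,
"""


def missing_share(rows, columns):
    """Return the fraction of rows with an empty value, per column."""
    total = len(rows)
    if total == 0:
        return {col: 0.0 for col in columns}
    return {
        # Assumption: empty or whitespace-only cells count as missing.
        col: sum(1 for r in rows if not (r[col] or "").strip()) / total
        for col in columns
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
shares = missing_share(rows, list(rows[0].keys()))

# Print columns from most to least incomplete.
for col, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{col}: {share:.0%} missing")
```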
Coverage
The data was collected in March 2023 from the Sephora online store. It focuses on skincare products and user reviews. While not explicitly stated, the presence of skin_tone, eye_color, skin_type, and hair_color in the review data suggests a demographic dimension, reflecting the attributes of users contributing reviews. The data is global in nature as it is from an international online store, but prices are specified in US dollars.
License
Attribution 4.0 International (CC BY 4.0)
Who Can Use It
This dataset is valuable for:
- Data Analysts and Data Scientists: For performing in-depth market analysis, identifying trends, and building predictive models related to beauty products.
- Machine Learning Engineers: For developing and training recommendation engines, sentiment analysis models, and natural language processing (NLP) applications on consumer reviews.
- Researchers: Those studying consumer behaviour, e-commerce trends, and product development in the cosmetics and skincare industry.
- E-commerce Businesses: For market research, competitive analysis, and understanding customer feedback to inform product strategy and marketing efforts.
Dataset Name Suggestions
- Sephora Skincare Product & User Reviews
- Beauty Product Consumer Insights
- E-commerce Skincare Ratings Dataset
- Sephora Product Feedback Analysis Data
- Cosmetics Product and Review Intelligence
Attributes
Original Data Source: Sephora Product Feedback Analysis Data