Subreddit Recipes Text Analysis

Data Science and Analytics

Tags and Keywords

Recipes, Reddit, Cooking, Food, Text

Free

About

This text-based dataset contains over 2,500 entries sourced from Reddit's popular "recipes" subreddit. It captures a wide range of culinary discussions, instructions, and user-shared knowledge from a community of over 175 million users. Each row represents a single entry and records the text content, post title, publication date, author, and comment count. Not every entry is a direct recipe; some are discussions or comments about recipes. The dataset is suited to text mining and Natural Language Processing (NLP) applications, such as analysing the ingredients, cuisines, and recipe types that Reddit users share most often.

Columns

  • date: The date the recipe or comment was published, in YYYY-MM-DD format.
  • num_comments: The total number of comments on the post.
  • title: The title of the Reddit post containing the recipe.
  • user: The Reddit nickname of the person who published the post.
  • comment: The full text content of the recipe or discussion posted in the comment.
  • n_char: The total number of characters in the comment text.
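
The column list above can be turned into a small loading routine. The following is a minimal sketch using only the Python standard library, assuming the CSV has a header row with exactly these six column names. The two sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from datetime import date

# Column names as listed in this section
EXPECTED_COLUMNS = ["date", "num_comments", "title", "user", "comment", "n_char"]


def load_rows(handle):
    """Read rows from an open CSV file and convert each field to a Python type."""
    reader = csv.DictReader(handle)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for raw in reader:
        rows.append({
            "date": date.fromisoformat(raw["date"]),  # YYYY-MM-DD
            "num_comments": int(raw["num_comments"]),
            "title": raw["title"],
            "user": raw["user"],
            "comment": raw["comment"],
            "n_char": int(raw["n_char"]),
        })
    return rows


# Two invented rows standing in for the real file (assumption, not real data)
SAMPLE_CSV = (
    "date,num_comments,title,user,comment,n_char\n"
    '2020-08-19,12,Garlic butter pasta,example_user,"Boil pasta, add garlic and butter.",34\n'
    "2021-02-23,3,Best pancakes?,another_user,Any tips for fluffy pancakes?,29\n"
)

rows = load_rows(io.StringIO(SAMPLE_CSV))
```

To read the downloaded file instead, open it with `open(path, newline="", encoding="utf-8")` and pass the handle to `load_rows`; the file name inside the ZIP archive is not stated on this page, so `path` is left to the reader.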

Distribution

  • Format: CSV
  • Size: 1.85 MB
  • Structure: 6 columns and more than 2,500 rows. The exact row count is not stated.

Usage

This dataset is well-suited for various text mining and NLP tasks. It can be used to:
  • Analyse trends in home cooking, such as the most popular ingredients or cuisines (e.g., Italian, Chinese).
  • Categorise recipes based on meal type (breakfast, dinner, dessert).
  • Develop models to extract structured recipe information from unstructured text.
  • Explore user engagement patterns related to different types of recipes.
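
As a rough illustration of the first, second, and fourth tasks above, the sketch below counts ingredient mentions, assigns a meal type by keyword matching, and averages `num_comments` per meal type. The keyword lists and the sample posts are hypothetical and chosen only for the example; a real analysis would need a much larger vocabulary or a trained classifier.

```python
import re
from collections import Counter

# Hypothetical keyword lists, for illustration only
INGREDIENTS = ["garlic", "butter", "chicken", "flour", "egg"]
MEAL_KEYWORDS = {
    "breakfast": ["pancake", "omelette", "breakfast"],
    "dinner": ["pasta", "roast", "dinner"],
    "dessert": ["cake", "cookie", "dessert"],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ingredient_counts(comments):
    """Count how many comments mention each ingredient (singular or plural)."""
    counts = Counter()
    for text in comments:
        tokens = set(tokenize(text))
        for ingredient in INGREDIENTS:
            if ingredient in tokens or ingredient + "s" in tokens:
                counts[ingredient] += 1
    return counts


def meal_type(title, comment):
    """Return the first meal type whose keyword starts any token, else 'unknown'."""
    tokens = tokenize(title + " " + comment)
    for label, keywords in MEAL_KEYWORDS.items():
        if any(tok.startswith(k) for tok in tokens for k in keywords):
            return label
    return "unknown"


def mean_comments_by_meal(posts):
    """Average num_comments for each assigned meal type."""
    totals, sizes = Counter(), Counter()
    for post in posts:
        label = meal_type(post["title"], post["comment"])
        totals[label] += post["num_comments"]
        sizes[label] += 1
    return {label: totals[label] / sizes[label] for label in totals}


# Invented posts using the dataset's column names (not real data)
posts = [
    {"title": "Garlic butter pasta", "comment": "Boil pasta, add garlic and butter.", "num_comments": 12},
    {"title": "Fluffy pancakes", "comment": "Whisk flour, eggs and butter.", "num_comments": 3},
    {"title": "Chocolate cake", "comment": "Mix flour, eggs and cocoa.", "num_comments": 9},
]

counts = ingredient_counts(p["comment"] for p in posts)
labels = [meal_type(p["title"], p["comment"]) for p in posts]
engagement = mean_comments_by_meal(posts)
```

Keyword matching is used here because it is transparent and needs no training data. It will misclassify posts whose wording falls outside the lists, so the results are best read as a first pass.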

Coverage

  • Geographic: The data is sourced from Reddit, a global platform, so geographic coverage is international and not specific to any one region.
  • Time Range: The provided sample covers the period from 19 August 2020 to 23 February 2021. The dataset is expected to be updated monthly.
  • Demographic: The data reflects the user base of the "recipes" subreddit, which is broad and diverse.

License

CC0: Public Domain

Who Can Use It

  • Data Scientists: For NLP projects, sentiment analysis, and topic modelling on food and cooking trends.
  • Food Bloggers and Marketers: To identify popular recipes and ingredients for content creation and market research.
  • Culinary Researchers: To study online food communities and the dissemination of culinary knowledge.
  • App Developers: To build applications that suggest recipes based on user preferences or available ingredients.

Dataset Name Suggestions

  • Reddit Recipe Collection
  • Subreddit Recipes Text Analysis
  • Reddit Culinary Conversations
  • Crowdsourced Recipe Dataset

Attributes

Original Data Source: Subreddit Recipes Text Analysis

Listing Stats

  • Views: 3
  • Downloads: 0
  • Listed: 03/10/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
