Opendatabay APP

Recipe Ingredient Entity Recognition Dataset

Food & Beverage Consumption

Tags and Keywords

Food

Text

Nlp

Transformers

Nltk


Free

About

This dataset provides recipe ingredients with token-level annotations, originally sourced from the research paper "A Named Entity Based Approach to Model Recipes" by Diwan, Batra, and Bagler. It is designed for training Named Entity Recognition (NER) models that extract key entities such as ingredient names, quantities, and units from recipe text. The data was obtained from the authors' GitHub repository and offers a structured resource for natural language processing in the culinary domain.

Columns

  • source: Indicates the origin of the ingredient description, either AllRecipes.com (ar) or FOOD.com (gk).
  • ingredient_id: A unique identifier for each ingredient within its respective source.
  • token_id: A numerical identifier representing the position of a token within an ingredient sequence.
  • token: The individual token from the ingredient description, serving as an input feature for models.
  • label: The annotated entity tag for the token, used as the target variable. Possible labels include:
    • NAME: The name of an ingredient (e.g., salt, pepper).
    • STATE: The processing state of an ingredient (e.g., ground, thawed).
    • UNIT: Measuring unit(s) (e.g., gram, cup).
    • QUANTITY: The numerical quantity associated with unit(s) (e.g., 1, 1 1/2, 2-4).
    • SIZE: Mentioned portion sizes (e.g., small, large).
    • TEMP: Temperature applied prior to cooking (e.g., hot, frozen).
    • DRY/FRESH: Indicates whether the ingredient is dry, fresh, or otherwise specified.
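Because each row holds a single token, most training pipelines first regroup the rows into one token sequence per ingredient. The sketch below assumes only the five columns listed above; the function name `load_sequences` and the sample rows are hypothetical. It keys each group on (source, ingredient_id), since ingredient_id is only unique within its source, and restores token order by sorting on token_id.

```python
import csv
import io
from collections import defaultdict


def load_sequences(rows):
    """Group token-level rows into one (tokens, labels) pair per ingredient.

    Hypothetical helper: rows are keyed by (source, ingredient_id) because
    ingredient_id is only unique within its source, and tokens are ordered
    by the numeric token_id.
    """
    grouped = defaultdict(list)
    for row in rows:
        key = (row["source"], row["ingredient_id"])
        grouped[key].append((int(row["token_id"]), row["token"], row["label"]))

    sequences = {}
    for key, items in grouped.items():
        items.sort(key=lambda item: item[0])  # restore token order within the ingredient
        tokens = [token for _, token, _ in items]
        labels = [label for _, _, label in items]
        sequences[key] = (tokens, labels)
    return sequences


# Hypothetical sample rows in the column layout described above (not real data).
SAMPLE = """source,ingredient_id,token_id,token,label
ar,0,1,cups,UNIT
ar,0,0,2,QUANTITY
ar,0,2,flour,NAME
gk,0,0,1,QUANTITY
gk,0,1,large,SIZE
gk,0,2,egg,NAME
"""

if __name__ == "__main__":
    reader = csv.DictReader(io.StringIO(SAMPLE))
    for key, (tokens, labels) in load_sequences(reader).items():
        print(key, tokens, labels)
```

To read an actual data file, pass `csv.DictReader(open(path, newline=""))` instead of the in-memory sample; the file name is not given in the listing.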

Distribution

The dataset is primarily composed of data from FOOD.com (gk), accounting for 78% of the content, with the remaining 22% originating from AllRecipes.com (ar). Specific row or record counts are not provided. The data is annotated at the token level, with one row per token, and is typically distributed in CSV format.
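The listing does not say whether the 78% / 22% split is measured over token rows or over ingredients. A minimal sketch for checking the split on a downloaded copy, assuming the column names listed above, counts distinct (source, ingredient_id) pairs; `source_shares` is a hypothetical name, and counting rows instead would be a one-line change.

```python
from collections import Counter


def source_shares(rows):
    """Return the fraction of distinct ingredients contributed by each source.

    Hypothetical helper: an ingredient is identified by (source, ingredient_id),
    so repeated token rows of the same ingredient are counted once.
    """
    ingredients = {(row["source"], row["ingredient_id"]) for row in rows}
    counts = Counter(source for source, _ in ingredients)
    total = sum(counts.values())
    return {source: count / total for source, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical rows: three gk ingredients (one spanning two tokens) and one ar ingredient.
    rows = [
        {"source": "gk", "ingredient_id": "0"},
        {"source": "gk", "ingredient_id": "0"},
        {"source": "gk", "ingredient_id": "1"},
        {"source": "gk", "ingredient_id": "2"},
        {"source": "ar", "ingredient_id": "0"},
    ]
    print(sorted(source_shares(rows).items()))  # [('ar', 0.25), ('gk', 0.75)]
```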

Usage

This dataset is ideally suited for training and evaluating Named Entity Recognition (NER) models. It can be applied to extract specific entities from recipe ingredient descriptions, such as:
  • Identifying ingredient names.
  • Parsing quantities and their corresponding units.
  • Recognising processing states, temperatures, and other descriptive attributes of ingredients.

The dataset is valuable for knowledge mining in the food and beverage sector and for developing intelligent systems that understand recipe structures.
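Extracting entities from a labelled (or model-predicted) token sequence amounts to merging consecutive tokens that share a tag. The sketch below assumes the plain labels listed under Columns (NAME, UNIT, QUANTITY, and so on, with no B-/I- prefixes); under that assumption two distinct entities of the same type that sit next to each other are merged into one span. `extract_entities` and the example annotation are hypothetical, and if the files use a tag for non-entity tokens, those spans would need to be filtered out.

```python
def extract_entities(tokens, labels):
    """Merge runs of adjacent tokens sharing a label into (label, text) spans.

    Assumes plain labels (no B-/I- prefixes), so adjacent entities of the
    same type cannot be told apart and end up in a single span.
    """
    spans = []
    for token, label in zip(tokens, labels):
        if spans and spans[-1][0] == label:
            # Same label as the previous token: extend the current span.
            spans[-1] = (label, spans[-1][1] + " " + token)
        else:
            spans.append((label, token))
    return spans


if __name__ == "__main__":
    # Hypothetical annotation of "1 1/2 cups frozen green peas".
    tokens = ["1", "1/2", "cups", "frozen", "green", "peas"]
    labels = ["QUANTITY", "QUANTITY", "UNIT", "TEMP", "NAME", "NAME"]
    print(extract_entities(tokens, labels))
    # [('QUANTITY', '1 1/2'), ('UNIT', 'cups'), ('TEMP', 'frozen'), ('NAME', 'green peas')]
```

The same function can be applied to gold and predicted label sequences alike, which makes span-level comparison during evaluation straightforward.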

Coverage

The dataset's coverage is global; no geographical limitations are stated for the ingredients themselves. The date shown, 17/06/2025, is the marketplace listing date; no collection period is given. The content is derived from two prominent recipe websites, AllRecipes.com and FOOD.com, providing a broad range of ingredient descriptions.

License

CC0

Who Can Use It

This dataset is intended for researchers, data scientists, and developers working in fields such as:
  • Natural Language Processing (NLP).
  • Machine Learning (ML) and Artificial Intelligence (AI).
  • Food science and culinary informatics.
  • Application development for recipe analysis, smart kitchens, or dietary planning that requires structured ingredient data.

Dataset Name Suggestions

  • Recipe Ingredient Entity Recognition Dataset
  • Culinary NER Tokenisation Data
  • Annotated Recipe Ingredients for AI
  • Food Ingredient Parsing Dataset

Attributes

Listing Stats

VIEWS

0

DOWNLOADS

0

LISTED

17/06/2025

REGION

GLOBAL

UNIVERSAL DATA QUALITY SCORE (UDQS)

5 / 5

VERSION

1.0
