
Annotated Chatbot Response Dataset

Data Science and Analytics

Tags and Keywords

Text, NLP, Hotels and Accommodations

Annotated Chatbot Response Dataset on the Opendatabay data marketplace


Free

About

This dataset provides a valuable resource for evaluating and improving chatbot performance. It contains question-and-answer data for a knowledge-based chatbot designed for an imaginary hotel, the Montreal Hotel and Suites. Human annotators assessed the quality of responses generated by several chatbot engines, including Botpress's OpenBook, Rasa, Google Dialogflow, and IBM Watson Assistant, given a specific knowledge base. The dataset is intended to facilitate the assessment of conversational AI quality.

Columns

The dataset includes the following columns:
  • qid: A unique identifier for each question.
  • question: The text of the question posed to the chatbot.
  • related_facts: Facts from the knowledge base pertinent to the question's response.
  • answer_in_fact: A boolean indicator specifying if the answer was present in the provided facts.
  • engine: The name of the chatbot engine that provided the response.
  • engine_response: The actual response generated by the chatbot engine.
  • p1: A boolean flag indicating if the response contained excessive information.
  • p2: A boolean flag indicating if the response included unrelated information.
  • p3: A boolean flag indicating if the response contained falsehoods.
  • p4: A boolean flag indicating if the response was incorrect.
  • p5: A boolean flag indicating if the response was partially correct.
  • p6: A boolean flag indicating if the response was fully correct.
  • p7: A boolean flag indicating if the engine failed to provide an answer when it should have.
  • p8: A boolean flag indicating if the engine responded by stating it did not understand the question.
  • p9: A boolean flag indicating if the engine correctly stated it did not know the answer.
  • p10: A boolean flag indicating if the question was deemed invalid.
  • best: A boolean flag indicating if the response was considered one of the best among the four engines for that question.
  • worst: A boolean flag indicating if the response was considered one of the worst among the four engines for that question.
  • annotation_round: The round number (1 or 2) in which the annotation was performed.
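As a sketch of how the columns above might be used, the snippet below builds a small synthetic sample mimicking this schema (the values are illustrative, not drawn from the real dataset) and computes each engine's fully-correct rate from the p6 flag:

```python
import pandas as pd

# Synthetic rows following the listing's column names (qid, question, engine,
# answer_in_fact, p4, p6, best, worst); the data itself is made up.
rows = [
    {"qid": 1, "question": "What time is check-in?", "engine": "Rasa",
     "answer_in_fact": True, "p4": False, "p6": True, "best": True, "worst": False},
    {"qid": 1, "question": "What time is check-in?", "engine": "Dialogflow",
     "answer_in_fact": True, "p4": True, "p6": False, "best": False, "worst": True},
]
df = pd.DataFrame(rows)

# Fraction of responses each engine got fully correct (p6); booleans
# average to a rate because True/False are treated as 1/0.
correct_rate = df.groupby("engine")["p6"].mean()
print(correct_rate.to_dict())
```

The same pattern extends to any of the p1 through p10 flags, since each is a boolean column that can be averaged per engine or per annotation round.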

Distribution

The dataset is provided as a CSV file named BP_MHS_V1.csv. It features over 5,000 unique questions, with a recorded response from each chatbot engine. These responses were annotated across 12 quality parameters in two distinct rounds. For instance, the qid column contains 5,006 unique values and the question column 5,000 unique values. The boolean flags show distributions such as approximately 81% of entries having the answer available in the related facts (answer_in_fact) and about 53% of responses being incorrect (p4); conversely, only about 1% of responses contained falsehoods (p3).

Usage

This dataset is ideally suited for various applications, including:
  • Data science and analytics projects focused on natural language processing and chatbot evaluation.
  • Assessing and benchmarking the quality of different chatbot engines.
  • Developing and testing new algorithms for conversational AI.
  • Researching aspects of chatbot performance such as correctness, relevance, and information accuracy.
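For the benchmarking use case above, the best and worst flags give a simple per-engine leaderboard. The sketch below uses hypothetical annotations (the engine names follow the listing; the flag values are invented) to rank engines by how often their response was judged among the best:

```python
import pandas as pd

# Hypothetical annotations for two questions, four engines each.
df = pd.DataFrame({
    "qid":    [1, 1, 1, 1, 2, 2, 2, 2],
    "engine": ["OpenBook", "Rasa", "Dialogflow", "Watson"] * 2,
    "best":   [True, False, False, False, True, True, False, False],
    "worst":  [False, False, True, False, False, False, False, True],
})

# Share of questions where each engine's response was flagged "best",
# sorted so the strongest engine comes first.
leaderboard = (df.groupby("engine")["best"].mean()
                 .sort_values(ascending=False))
print(leaderboard)
```

Subtracting a matching worst-rate column would give a net score, which is one plausible way to summarize relative engine quality across the full dataset.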

Coverage

The dataset's scope is centred around chatbot interactions for a hotel environment, specifically an imaginary hotel located in Montreal. The data includes performance metrics and quality annotations collected over two annotation rounds. There are no specific geographic or demographic limitations beyond the hotel-centric context, and the temporal coverage is limited to the period of the two annotation rounds.

License

CC-BY-NC

Who Can Use It

This dataset is particularly useful for:
  • Data Scientists: For building and refining models to evaluate conversational AI.
  • Natural Language Processing (NLP) Researchers: For studying chatbot effectiveness and improving response generation.
  • Chatbot Developers: For benchmarking their creations against established performance metrics.
  • AI/ML Practitioners: Seeking real-world annotated data for AI system quality assessment.

Dataset Name Suggestions

  • Chatbot Quality Assessment Data - Hotel Q&A
  • Montreal Hotel Conversational AI Performance
  • Annotated Chatbot Response Dataset
  • Botpress OpenBook Chatbot Evaluation
  • AI Chatbot Q&A Benchmarking Data

Attributes

Listing Stats

  • Views: 2
  • Downloads: 0
  • Listed: 27/06/2025
  • Region: Global
  • UDQS (Universal Data Quality Score): 5 / 5
  • Version: 1.0
  • Price: Free