Annotated Chatbot Response Dataset
About
This dataset provides a valuable resource for evaluating and improving chatbot performance. It contains question-and-answer data for a knowledge-based chatbot designed for an imaginary hotel, the Montreal Hotel and Suites. Human annotators have assessed the quality of responses generated by various chatbot engines, including Botpress' OpenBook, Rasa, Google Dialogflow, and IBM Watson Assistant, given a specific knowledge base. The dataset aims to facilitate the assessment of conversational AI quality.
Columns
The dataset includes the following columns:
- qid: A unique identifier for each question.
- question: The text of the question posed to the chatbot.
- related_facts: Facts from the knowledge base pertinent to the question's response.
- answer_in_fact: A boolean indicator specifying if the answer was present in the provided facts.
- engine: The name of the chatbot engine that provided the response.
- engine_response: The actual response generated by the chatbot engine.
- p1: A boolean flag indicating if the response contained excessive information.
- p2: A boolean flag indicating if the response included unrelated information.
- p3: A boolean flag indicating if the response contained falsehoods.
- p4: A boolean flag indicating if the response was incorrect.
- p5: A boolean flag indicating if the response was partially correct.
- p6: A boolean flag indicating if the response was fully correct.
- p7: A boolean flag indicating if the engine failed to provide an answer when it should have.
- p8: A boolean flag indicating if the engine responded by stating it did not understand the question.
- p9: A boolean flag indicating if the engine correctly stated it did not know the answer.
- p10: A boolean flag indicating if the question was deemed invalid.
- best: A boolean flag indicating if the response was considered one of the best among the four engines for that question.
- worst: A boolean flag indicating if the response was considered one of the worst among the four engines for that question.
- annotation_round: The round number (1 or 2) in which the annotation was performed.
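Loaded into pandas, the schema above maps directly onto a DataFrame. A minimal sketch with a couple of invented rows standing in for the real file (which, per the Distribution section, is BP_MHS_V1.csv):

```python
import io
import pandas as pd

# Two illustrative rows in the dataset's schema (values invented for this
# sketch); with the real file you would call pd.read_csv("BP_MHS_V1.csv").
sample_csv = io.StringIO(
    "qid,question,related_facts,answer_in_fact,engine,engine_response,"
    "p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,best,worst,annotation_round\n"
    "1,Is breakfast included?,Breakfast is served 7-10 am.,True,OpenBook,"
    "Yes; breakfast is included.,False,False,False,False,False,True,"
    "False,False,False,False,True,False,1\n"
    "1,Is breakfast included?,Breakfast is served 7-10 am.,True,Rasa,"
    "Sorry; I don't understand.,False,False,False,False,False,False,"
    "False,True,False,False,False,True,1\n"
)
df = pd.read_csv(sample_csv)

# p1..p10, best, and worst parse as boolean columns; each row is one
# (question, engine) pair, so a question appears once per engine.
print(df[["qid", "engine", "p6", "p8", "best"]])
```

Because every annotation is a boolean flag, most analyses reduce to filtering and averaging these columns.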
Distribution
The dataset is typically provided as a single CSV file named BP_MHS_V1.csv. It features over 5000 unique questions, with a response from each chatbot engine recorded for every question. These responses were annotated across 12 quality parameters in two distinct rounds. For instance, the qid column contains 5006 unique values, and the question column has 5000 unique values. Analysis of the boolean flags shows distributions such as approximately 81% of entries having an answer available in the related facts, and about 53% of responses being incorrect (p4). Conversely, only about 1% of responses contained falsehoods (p3).
Usage
This dataset is ideally suited for various applications, including:
- Data science and analytics projects focused on natural language processing and chatbot evaluation.
- Assessing and benchmarking the quality of different chatbot engines.
- Developing and testing new algorithms for conversational AI.
- Researching aspects of chatbot performance such as correctness, relevance, and information accuracy.
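For the benchmarking use cases above, a common first step is to aggregate the boolean quality flags per engine. A minimal sketch with a stand-in DataFrame (in practice, df = pd.read_csv("BP_MHS_V1.csv")):

```python
import pandas as pd

# Invented stand-in for the real annotations: two questions, four engines.
df = pd.DataFrame({
    "engine": ["OpenBook", "Rasa", "Dialogflow", "Watson"] * 2,
    "p4":     [False, True, True, False, False, True, False, True],   # incorrect
    "p6":     [True, False, False, True, True, False, True, False],   # fully correct
    "best":   [True, False, False, False, True, False, False, False],
})

# The mean of a boolean column is the fraction of True values, so a
# per-engine groupby yields flag rates directly.
report = df.groupby("engine")[["p4", "p6", "best"]].mean()
print(report)
```

On the full dataset, the same groupby over all of p1..p10, best, and worst gives a complete per-engine quality profile in a few lines.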
Coverage
The dataset's scope is centred around chatbot interactions for a hotel environment, specifically an imaginary hotel located in Montreal. The data includes performance metrics and quality annotations collected over two annotation rounds. There are no specific geographic or demographic limitations beyond the hotel-centric context, and the temporal coverage is limited to the period of the two annotation rounds.
License
CC-BY-NC
Who Can Use It
This dataset is particularly useful for:
- Data Scientists: For building and refining models to evaluate conversational AI.
- Natural Language Processing (NLP) Researchers: For studying chatbot effectiveness and improving response generation.
- Chatbot Developers: For benchmarking their creations against established performance metrics.
- AI/ML Practitioners: For sourcing real-world annotated data for AI system quality assessment.
Dataset Name Suggestions
- Chatbot Quality Assessment Data - Hotel Q&A
- Montreal Hotel Conversational AI Performance
- Annotated Chatbot Response Dataset
- Botpress OpenBook Chatbot Evaluation
- AI Chatbot Q&A Benchmarking Data
Attributes
Original Data Source: OpenBook by Botpress - Chatbot Q&A Data