Dialogue-based Question Dataset

Data Science and Analytics

Tags and Keywords

  • Earth
  • NLP
  • Deep
  • Text

Free

About

This dataset, known as HQA-data, is a question-answer generation dataset created from historical multi-perspective conversations. It is built on "The Ubuntu Dialog Corpus": user chats that share a dialogue identifier (dialogueID) are merged into a single context. Questions and their answers are derived from each context, and the start and end positions of every answer within the context text are recorded. The dataset is freely available from a verified data provider on the Opendatabay platform.

Columns

  • dialogueID: A unique identifier for each chat room or conversation thread.
  • Context: The merged text from a single dialogueID, forming the conversational background for question and answer pairs.
  • QuestionID: A unique identifier for each question generated.
  • Question: The question generated from the context.
  • Answer: The answer extracted from the context for the given question.
  • Answer Start: The character offset indicating where the answer begins within the Context.
  • Answer End: The character offset indicating where the answer ends within the Context.
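
The relationship between Context, Answer, Answer Start and Answer End can be checked mechanically. The sketch below is illustrative only: the two sample rows are invented, the header spellings are assumed to match the column names listed above, and the offsets are assumed to be 0-based with an exclusive end. The listing does not state either convention, so the end_inclusive flag is provided for the other case.

```python
import csv
import io

# Hypothetical two-row sample; the real files may spell headers differently.
SAMPLE_CSV = """dialogueID,Context,QuestionID,Question,Answer,Answer Start,Answer End
1.tsv,how do i install vim? run sudo apt-get install vim,q1,How can vim be installed?,sudo apt-get install vim,26,50
1.tsv,how do i install vim? run sudo apt-get install vim,q2,What is being installed?,emacs,9,14
"""


def span_matches(row, end_inclusive=False):
    """Return True if Context[start:end] reproduces the Answer text.

    Assumes 0-based character offsets; set end_inclusive=True if the
    Answer End value points at the last character of the answer.
    """
    start = int(row["Answer Start"])
    end = int(row["Answer End"]) + (1 if end_inclusive else 0)
    return row["Context"][start:end] == row["Answer"]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
results = [span_matches(r) for r in rows]
print(results)  # the second sample row is deliberately inconsistent
```

Running the same check over the real CSV with both settings of end_inclusive is a quick way to find out which offset convention the files actually use.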

Distribution

The dataset is available in two formats: Comma Separated Values (CSV) and JSON-formatted data. The training set contains 7323 contexts and 29150 question-answer pairs, while the test set includes 2041 contexts and 7288 question-answer pairs. In total, the dataset features 9364 contexts and 36438 question-answer pairs.
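
The split figures above are internally consistent, and a few lines of arithmetic make the proportions explicit. This is a small sketch using only the numbers quoted in the listing; the variable names are illustrative.

```python
# Figures quoted in the listing.
SPLITS = {
    "train": {"contexts": 7323, "qa_pairs": 29150},
    "test": {"contexts": 2041, "qa_pairs": 7288},
}

total_contexts = sum(s["contexts"] for s in SPLITS.values())
total_pairs = sum(s["qa_pairs"] for s in SPLITS.values())

# Share of question-answer pairs held out for testing.
test_share = SPLITS["test"]["qa_pairs"] / total_pairs

# Average number of question-answer pairs generated per context.
pairs_per_context = {
    name: s["qa_pairs"] / s["contexts"] for name, s in SPLITS.items()
}

print(total_contexts, total_pairs)
print(f"test share: {test_share:.1%}")
for name, value in pairs_per_context.items():
    print(f"{name}: {value:.2f} pairs per context")
```

The totals match the stated 9364 contexts and 36438 pairs, the test set holds roughly one fifth of the pairs, and each context yields a little under four questions on average.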

Usage

This dataset is suited to training and evaluating models for Natural Language Processing (NLP) tasks. Specific use cases include question answering, text generation, conversational modelling, chat analysis, and text extraction, in both deep learning and machine learning research and development.
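
For question answering work, many training pipelines expect a nested SQuAD-style layout with one context and a list of questions under it. The sketch below shows one possible way to group flat rows by dialogueID into that shape. It is not part of the dataset or the Opendatabay platform: the to_squad_like helper and the sample row are hypothetical, and the header spellings are assumed to match the column list on this page.

```python
import json
from collections import defaultdict


def to_squad_like(rows):
    """Group flat QA rows into a SQuAD-style nested dict (illustrative helper)."""
    contexts = {}
    grouped = defaultdict(list)
    for row in rows:
        dialogue_id = row["dialogueID"]
        contexts[dialogue_id] = row["Context"]
        grouped[dialogue_id].append(
            {
                "id": row["QuestionID"],
                "question": row["Question"],
                "answers": [
                    {
                        "text": row["Answer"],
                        "answer_start": int(row["Answer Start"]),
                    }
                ],
            }
        )
    return {
        "data": [
            {
                "title": dialogue_id,
                "paragraphs": [{"context": contexts[dialogue_id], "qas": qas}],
            }
            for dialogue_id, qas in grouped.items()
        ]
    }


# Hypothetical row, invented for illustration.
rows = [
    {
        "dialogueID": "1.tsv",
        "Context": "how do i install vim? run sudo apt-get install vim",
        "QuestionID": "q1",
        "Question": "How can vim be installed?",
        "Answer": "sudo apt-get install vim",
        "Answer Start": "26",
        "Answer End": "50",
    }
]

squad = to_squad_like(rows)
print(json.dumps(squad, indent=2))
```

Answer End is dropped in this layout because SQuAD-style records store only the start offset and the answer text; keep it alongside if a model needs explicit end positions.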

Coverage

The dataset is global in its intended region of use. While it is described as "historical", specific dates or a precise time range for the conversations are not detailed in the provided information. The data originates from user chat logs, but no specific demographic details beyond "users" are provided.

License

CC0

Who Can Use It

The dataset is suitable for data scientists, researchers, and developers working on projects related to Natural Language Processing (NLP), Deep Learning, and Artificial Intelligence (AI). It can be used by those aiming to build or improve conversational agents, question-answering systems, or text summarisation tools. It is also relevant for academic studies in linguistics and human-computer interaction based on textual dialogues.

Dataset Name Suggestions

  • Conversational QA Dataset
  • Ubuntu Dialog Question Answering
  • Multi-Perspective Chat Log QA
  • Historical Chat QA Pairs
  • Dialogue-based Question Dataset

Attributes

  • Listed: 17/06/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0