Dark Mode

Home

Data Categories

AI & ML Data

Dolly 15K AI Chat Data

FREE DATASET LIBRARY

Verified Data Provider

£0

Dolly 15K AI Chat Data

Telecommunications & Network Data

Tags and Keywords

Nlp

Text

Signal

Data

Chatbot

Dialogue

Llm

Trusted By

Dolly 15K AI Chat Data Dataset on Opendatabay data marketplace

"No reviews yet"

Free

About

This dataset provides over 15,000 language models and dialogues designed to power dynamic ChatGPT applications. It was created by Databricks employees, aiming to facilitate the use of large language models (LLMs) for interactive dialogue interactions. The dataset generates prompt-response pairs across eight distinct instruction categories and deliberately avoids information from external web sources, with the exception of Wikipedia for specific instruction sets. This open-source resource is ideal for exploring the boundaries of text-based conversations and uncovering new insights into natural language processing.

Columns

Instruction (Text): This field contains the text prompt intended to generate an appropriate response from a machine learning model or chatbot, utilising natural language processing techniques. It represents what one individual says in a conversation.
Context (Text): Providing additional information, the context field enhances accuracy by offering the model more detail about the ongoing conversation or request execution. Like the instruction, it captures what is said by one individual.
Response (Text): This column holds the conversational reply or what is said back by the other individual in the dialogue.
Category (Text): Each prompt-response pair is classified into one of eight distinct categories based on its content. Examples of unique category values include 'open_qa' and 'general_qa'.

Distribution

The dataset is typically provided as a data file, usually in CSV format. It contains over 15,000 language models and dialogues, with the main train.csv file consisting of this quantity of records. Each record within the dataset represents a unique prompt-response pair, or a single turn in a conversation between two individuals. The columns are all of a string data type.

Usage

This dataset is suited for a variety of applications and use cases:

Training dialogue systems by developing multiple funneling pipelines to enrich models with real-world conversations.
Creating intelligent chatbot interactions.
Generating natural language answers as part of Q&A systems.
Utilising excerpts from Wikipedia for particular subsets of instruction categories.
Leveraging the classification labels with supervised learning techniques, such as multi-class classification neural networks or logistic regression classifiers.
Developing deep learning models to detect and respond to conversational intent.
Training language models for customer service queries using natural language processing (NLP).
Creating custom dialogue agents capable of handling more intricate conversational interactions.

Coverage

The dataset has a global reach. It was listed on 17/06/2025, and its content focuses on general conversational and Q&A interactions, without specific demographic limitations.

License

CC0

Who Can Use It

This dataset is valuable for a wide range of users, including AI/ML developers, researchers, and data scientists looking to:

Build and train conversational AI models.
Develop advanced chatbot applications.
Explore new insights in natural language processing.
Create bespoke dialogue agents for various sectors, such as customer service.
Apply supervised learning to classify conversational data.

Dataset Name Suggestions

Databricks Dolly (15K) Dialogue Data
LLM Training Conversation Dataset
Dolly 15K AI Chat Data
Prompt-Response Pairs for LLMs

Attributes

Original Data Source: Databricks Dolly (15K)

Listing Stats

VIEWS

DOWNLOADS

LISTED

17/06/2025

REGION

GLOBAL

QUALITY

5 / 5

VERSION

1.0

FREE DATASET LIBRARY

£0

Dolly 15K AI Chat Data

Telecommunications & Network Data

Tags and Keywords

Nlp

Text

Signal

Data

Chatbot

Dialogue

Llm

Trusted By

Free

About

Columns

Distribution

Usage

Coverage

License

Who Can Use It

Dataset Name Suggestions

Attributes

Listing Stats

Free

Download Dataset in CSV Format

RECOMMENDED DATASETS