Ethical Dialogue Dataset
Education & Learning Analytics
Free
About
ProsocialDialog is a large-scale, multi-turn English dialogue dataset designed to teach conversational agents how to respond to problematic content in line with social norms. It addresses a variety of unethical, biased, toxic, and generally problematic situations. The dataset is notable for its focus on encouraging prosocial behaviour, which is guided by commonsense social rules, referred to as Rules-of-Thumb (RoTs). Developed through a human-AI collaborative framework, the dataset consists of 58,000 dialogues, comprising 331,000 utterances, 160,000 unique RoTs, and 497,000 dialogue safety labels, each accompanied by free-form rationales. The test.csv file within the ProsocialDialog dataset contains data specifically for evaluating the accuracy of a model in predicting conversation safety.
Columns
The dataset includes the following columns:
- context: The context of the conversation. (String)
- response: The response to the conversation. (String)
- rots: Rules of thumb associated with the conversation. (String)
- safety_label: The safety label associated with the conversation. (String)
- safety_annotations: Annotations associated with the conversation. (String)
- safety_annotation_reasons: Reasons for the safety annotations. (String)
- source: The source of the conversation. (String)
- etc: Any additional information associated with the conversation. (String)
- dialogue_id: Unique identifier for each dialogue.
- response_id: Unique identifier for each response.
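The column list above can be checked against a file before any modelling work starts. The following is a minimal sketch using only the Python standard library: it verifies that every listed column is present and groups response IDs by dialogue ID. The helper names (`read_rows`, `group_by_dialogue`, `make_sample_file`) are hypothetical, and the sample rows are invented placeholders, not records from the dataset.

```python
import csv
import io
from collections import defaultdict

# Column names exactly as listed in the Columns section.
EXPECTED_COLUMNS = [
    "context", "response", "rots", "safety_label", "safety_annotations",
    "safety_annotation_reasons", "source", "etc", "dialogue_id", "response_id",
]


def read_rows(handle):
    """Read CSV rows as dicts, failing early if a listed column is missing."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def group_by_dialogue(rows):
    """Map each dialogue_id to the list of response_ids that belong to it."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["dialogue_id"]].append(row["response_id"])
    return dict(grouped)


def make_sample_file():
    """Build a tiny in-memory CSV. All values are invented placeholders."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=EXPECTED_COLUMNS)
    writer.writeheader()
    for dialogue_id, response_id in [("d0", "r0"), ("d0", "r1"), ("d1", "r0")]:
        # Columns not set here are written as empty strings by DictWriter.
        writer.writerow({
            "context": "placeholder context",
            "response": "placeholder response",
            "safety_label": "placeholder_label",
            "dialogue_id": dialogue_id,
            "response_id": response_id,
        })
    buffer.seek(0)
    return buffer


rows = read_rows(make_sample_file())
dialogues = group_by_dialogue(rows)
print(len(rows), "rows in", len(dialogues), "dialogues")
```

To run the same check on the real file, pass an open file handle instead of the sample, for example `open("test.csv", newline="", encoding="utf-8")`. The UTF-8 encoding is an assumption, as is the idea that response IDs are only meaningful within a dialogue; the listing does not specify either.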
Distribution
The dataset is typically provided in a CSV file format, such as test.csv. It contains 58,000 dialogues, encompassing 331,000 utterances. There are 24,972 unique dialogue IDs and 24,903 unique response IDs. The dataset includes 160,000 unique Rules-of-Thumb (RoTs) and 497,000 dialogue safety labels. Specific numbers for rows or records beyond these counts are not provided in the sources.
Usage
This dataset is ideally suited for several applications:
- Designing Conversational Agents: It can be used to build Natural Language Processing (NLP) models capable of recognising and classifying problematic content. The safety labels, rationales, and RoTs can train conversational agents to respond in socially acceptable ways.
- Benchmark Systems: ProsocialDialog serves as an effective benchmark for evaluating how well systems trained on existing conversation datasets identify, respond to, and prevent problematic content interactions.
- Automated Moderation: The dialogue safety labels and their associated free-form rationales are valuable for technology platforms implementing automated moderation tasks, such as flagging or banning offensive messages or users.
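As a rough illustration of the benchmarking use above, the sketch below scores a safety-label predictor by accuracy against gold `safety_label` values. The predictor is a deliberately trivial majority-class baseline, included only as a reference point and not as a method from the dataset's authors. The function names and the labels `label_a` and `label_b` are hypothetical placeholders; the listing does not enumerate the real label values.

```python
from collections import Counter


def accuracy(gold_labels, predicted_labels):
    """Fraction of positions where the predicted label equals the gold label."""
    if len(gold_labels) != len(predicted_labels):
        raise ValueError("gold and predicted label lists differ in length")
    if not gold_labels:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(g == p for g, p in zip(gold_labels, predicted_labels))
    return correct / len(gold_labels)


def majority_baseline(train_labels):
    """Return a predictor that always outputs the most frequent training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda context, response: most_common_label


# Invented placeholder rows; real rows would come from the CSV columns
# context, response, and safety_label.
train_labels = ["label_a", "label_a", "label_b"]
test_rows = [
    {"context": "c1", "response": "r1", "safety_label": "label_a"},
    {"context": "c2", "response": "r2", "safety_label": "label_b"},
    {"context": "c3", "response": "r3", "safety_label": "label_a"},
    {"context": "c4", "response": "r4", "safety_label": "label_a"},
]

predict = majority_baseline(train_labels)
gold = [row["safety_label"] for row in test_rows]
predicted = [predict(row["context"], row["response"]) for row in test_rows]
score = accuracy(gold, predicted)
print(f"accuracy: {score:.2f}")  # prints "accuracy: 0.75" for the placeholder rows
```

Any trained classifier exposing the same `(context, response) -> label` call shape could replace `predict` here, so the baseline score gives a floor to compare against.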
Coverage
The ProsocialDialog dataset is in English and has global coverage. It addresses general conversational scenarios involving social norms and problematic content, but the sources do not specify its demographic scope or the time range of data collection. The dataset was listed on 11/06/2025.
License
CC0
Who Can Use It
This dataset is beneficial for a range of users, including:
- Researchers and Developers in AI and Machine Learning: Particularly those focused on Natural Language Processing (NLP) and building sophisticated conversational AI systems.
- Organisations and Platforms: Especially those in need of automated moderation tools or aiming to ensure their conversational agents adhere to social norms and promote prosocial behaviour.
- Academics and Students: Engaged in studying dialogue safety, social psychology, or ethical AI, who can explore the safety labels, annotations, RoTs, and data sources to gain deeper insights into human conversation dynamics.
Dataset Name Suggestions
- ProsocialDialog - Problematic Content Dialogue
- Conversational Safety Norms
- Ethical Dialogue Dataset
- Social Norms AI Conversations
- Harmful Content Dialogue Dataset
Attributes
Original Data Source: ProsocialDialog - Problematic Content Dialogue