Blended Skill Dialogue Repository Dataset
Data Science and Analytics
Free
About
This dataset provides insights into conversations between two distinct personas, designed to facilitate the creation of natural, multi-modal dialogues imbued with personality, empathy, and knowledge. It captures detailed information on conversation flow and the use of various communication tools. The data supports the measurement of technical competencies such as dialogue flow management, including response times, topic control, and coherence. It is also valuable for exploring the influence of different conversational styles on user engagement. Furthermore, the dataset aids in validating distributed dialogue systems across modalities and identifying potential biases in varied contexts. It serves as a basis for benchmarking against similar datasets, fostering the development of an automated evaluation system for tactical skill talk performance over time.
Columns
The dataset contains several columns for investigating conversations and their context:
- personas: This column holds information about the two personas involved in the conversation. (String)
- additional_context: This column includes additional contextual details pertinent to the conversation. (String)
- previous_utterance: This column captures the immediate prior statement from the conversation. (String)
- context: This column provides contextual information relevant to the dialogue.
- free_messages: This column contains spontaneous messages that can be used to generate dynamic conversations. (String)
- guided_messages: This column features messages intended to guide discussions.
- suggestions: This column offers suggested messages that can be used to create dynamic conversations, subtly implying how certain actions could be taken when engaging digital avatars. (String)
- guided_chosen_suggestions: This column incorporates weights for user neutrality versus sentiment engagement, aiming to reduce bias and promote fairness in disputed topics.
- label_candidates: This column identifies potential labels for segments of the conversation.
- empathetic_dialogues: This column relates to dialogues exhibiting empathy.
- wizard_of_wikipedia: This column refers to content potentially sourced from or related to Wikipedia knowledge.
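As a quick sanity check before working with a downloaded file, the column list above can be compared against the file's header. The following is a minimal sketch, assuming the data arrives as a single CSV whose header uses exactly the column names listed here; the helper name missing_columns and the in-memory sample are hypothetical and used only for illustration.

```python
import csv
import io

# Column names as listed above; the exact file layout is an assumption.
EXPECTED_COLUMNS = [
    "personas", "additional_context", "previous_utterance", "context",
    "free_messages", "guided_messages", "suggestions",
    "guided_chosen_suggestions", "label_candidates",
    "empathetic_dialogues", "wizard_of_wikipedia",
]


def missing_columns(csv_file):
    """Return the expected columns that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    return [name for name in EXPECTED_COLUMNS if name not in header]


# Hypothetical header-only sample standing in for a real file.
sample = io.StringIO("personas,context,free_messages\n")
print(missing_columns(sample))
```

With a real file, the same function can be called on an open file handle, for example one obtained with open(path, newline="", encoding="utf-8"), where path is wherever the CSV was saved.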
Distribution
The data files are typically in CSV format, with sample files uploaded to the platform separately. The dataset focuses on the structure of conversations between two personas, offering detailed insight into multi-modal communication. Specific numbers for rows or records are not available, though some columns show a high number of unique values, suggesting a substantial collection of conversational exchanges.
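Because row counts are not published, they can be measured directly once a file is in hand. The sketch below counts rows and distinct values per column using only the standard library; the function name profile_csv and the demo rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict


def profile_csv(csv_file):
    """Count rows and distinct values per column in a CSV file object."""
    reader = csv.DictReader(csv_file)
    unique_values = defaultdict(set)
    row_count = 0
    for row in reader:
        row_count += 1
        for column, value in row.items():
            unique_values[column].add(value)
    return row_count, {col: len(vals) for col, vals in unique_values.items()}


# Made-up rows for illustration only; not real records.
demo = io.StringIO(
    "context,previous_utterance\n"
    "wizard_of_wikipedia,hello\n"
    "wizard_of_wikipedia,hi there\n"
    "empathetic_dialogues,hello\n"
)
rows, uniques = profile_csv(demo)
print(rows, uniques)
```

The function reads the file one row at a time, so only the sets of distinct values are held in memory, not the rows themselves.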
Usage
This dataset is ideally suited for:
- Building virtual assistants with advanced conversational and multi-modal capabilities.
- Creating interactive tutorials where content adapts dynamically to user input using personas, free messages, guided messages, and suggestions.
- Developing a deeper understanding of human interaction patterns.
- Evaluating technical competencies in dialogue systems, such as response times and topic coherence.
- Investigating the impact of different conversational styles on user engagement.
- Validating distributed dialogue systems across various communication modalities.
- Identifying and addressing potential biases present within different conversational contexts.
- Establishing benchmarks for automatic evaluation systems of conversational performance.
Coverage
The dataset's coverage is global. It was listed on 11/06/2025. The data encompasses conversations between two personas, without specific demographic notes beyond this structure.
License
CC0
Who Can Use It
This dataset is intended for:
- Data Scientists and Analysts: For conducting data science and analytics, particularly in areas like NLP and business applications.
- Developers of AI and Machine Learning systems: Especially those focused on conversational AI, virtual assistants, and dialogue systems.
- Researchers: Those studying human-computer interaction, bias in AI, natural language processing, and conversational dynamics.
- Educational Content Creators: Individuals or organisations aiming to develop adaptive and interactive tutorials.
Dataset Name Suggestions
- Multi-Modal Persona Conversation Dataset
- Blended Skill Dialogue Repository
- Conversational AI Competency Benchmarking Data
- Natural Dialogue Interaction Dataset
- Persona-Based Conversational Flows
Attributes
Original Data Source: Blended Skill Talk