Customer Service Intelligent Training Data
Data Science and Analytics
Free
About
Supercharge your customer support AI with high-quality GitHub issues, selected algorithmically (with GPT-4o-mini among the tools used) to resemble authentic customer support tickets. The dataset supports training predictive models and context-aware bots on real interaction data sourced from active repositories. Because issues that resemble support tickets have been separated from general noise, it is well suited to developing NLP models, classification systems, and automated response engines that anticipate customer questions and prioritise tasks.
Columns
- repo_name: The GitHub repository in owner/repo form (e.g., godotengine/godot).
- title: The title of the issue.
- body: The body text of the issue, asking a question or describing a problem.
- user_login: The username of the issue's author.
- created_at: The issue creation timestamp in ISO format.
- answer_1: The first answer or comment.
- answer_2: The second answer or comment.
- answer_3: The third answer or comment.
- answer_4: The fourth answer or comment.
- answer_5: The fifth answer or comment.
- label_1: The first label name (e.g., bug).
- label_2: The second label name.
- label_3: The third label name.
- label_4: The fourth label name.
- label_5: The fifth label name.
- likes: The number of +1 reactions on the issue.
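The answer and label columns are stored wide (answer_1 to answer_5, label_1 to label_5), so a common first step is to collapse them into lists. The following is a minimal sketch using only the Python standard library. It assumes that unused answer and label slots are empty strings and that likes is stored as a number; the sample row and the collapse_row helper are hypothetical and exist only for illustration.

```python
import csv
import io

ANSWER_COLUMNS = [f"answer_{i}" for i in range(1, 6)]
LABEL_COLUMNS = [f"label_{i}" for i in range(1, 6)]
# The 16 columns listed above, in the order they are documented.
COLUMNS = ["repo_name", "title", "body", "user_login", "created_at",
           *ANSWER_COLUMNS, *LABEL_COLUMNS, "likes"]


def collapse_row(row):
    """Turn the wide answer_*/label_* columns of one CSV row into lists."""
    def non_empty(columns):
        values = ((row.get(c) or "").strip() for c in columns)
        return [v for v in values if v]

    likes_raw = (row.get("likes") or "").strip()
    return {
        "repo_name": row.get("repo_name", ""),
        "title": row.get("title", ""),
        "body": row.get("body", ""),
        "answers": non_empty(ANSWER_COLUMNS),
        "labels": non_empty(LABEL_COLUMNS),
        # Assumption: likes is numeric; an empty cell is treated as 0.
        "likes": int(float(likes_raw)) if likes_raw else 0,
    }


# Hypothetical sample row, written to an in-memory CSV for demonstration.
sample = io.StringIO()
writer = csv.DictWriter(sample, fieldnames=COLUMNS, restval="")
writer.writeheader()
writer.writerow({
    "repo_name": "godotengine/godot",
    "title": "Editor crashes on startup",
    "body": "The editor closes right after the splash screen.",
    "user_login": "example_user",
    "created_at": "2024-01-15T10:00:00Z",
    "answer_1": "Which version are you using?",
    "answer_2": "4.2 stable.",
    "label_1": "bug",
    "likes": "3",
})
sample.seek(0)

issues = [collapse_row(r) for r in csv.DictReader(sample)]
```

To run this against the real file, replace the in-memory sample with open("a_github_issues_overview_dataset.csv", newline="", encoding="utf-8"); the UTF-8 encoding is an assumption.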
Distribution
The dataset contains approximately 16,000 valid rows and is provided in CSV format (a_github_issues_overview_dataset.csv). The structure focuses on text-heavy columns (titles, bodies, and answers), supported by metadata such as timestamps, labels, and reaction counts, so that text and metadata can be analysed together.
Usage
- Text Classification: Train machine learning models to classify issues, for example by label, so that tickets can be routed to the appropriate department.
- Priority Prediction: Develop algorithms to predict the urgency of tickets based on labels and reactions, so that critical issues are addressed promptly.
- Customer Support Analysis: Analyse the dataset to gain insights into common issues and optimise support processes.
- Text-to-Text Generation: Train or fine-tune Large Language Models (LLMs) to generate context-aware answers to GitHub issues and customer tickets.
- Context-Aware Responses: Equip AI with real interaction data to generate precise, relevant answers.
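For the text-to-text use case, one possible preparation step is to pair each issue with its first answer. The sketch below is illustrative only: the prompt layout, the build_pair helper, the max_body_chars limit, and the sample issues are all assumptions, not part of the dataset.

```python
def build_pair(issue, max_body_chars=2000):
    """Return a (prompt, response) pair, or None if the issue has no first answer."""
    answer = (issue.get("answer_1") or "").strip()
    if not answer:
        return None
    title = (issue.get("title") or "").strip()
    # Truncate long bodies; the limit is an arbitrary illustrative value.
    body = (issue.get("body") or "").strip()[:max_body_chars]
    prompt = f"Repository: {issue.get('repo_name', '')}\nTitle: {title}\n\n{body}"
    return prompt, answer


# Hypothetical issues using the dataset's column names.
sample_issues = [
    {
        "repo_name": "godotengine/godot",
        "title": "Editor crashes on startup",
        "body": "The editor closes right after the splash screen.",
        "answer_1": "Which version are you using?",
    },
    {
        "repo_name": "godotengine/godot",
        "title": "Question about export templates",
        "body": "Where can I find them?",
        "answer_1": "",  # unanswered issues are skipped
    },
]

pairs = [p for p in (build_pair(i) for i in sample_issues) if p is not None]
```

Using only answer_1 keeps each pair simple; whether later answers (answer_2 to answer_5) should be added as further targets depends on the model being trained.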
Coverage
- Time Range: The data covers issues created between 05 December 2010 and 18 August 2024.
- Geographic Scope: Global.
- Content: Sourced from active GitHub repositories, including 'godotengine/godot' and 'material-components/material-components-web'.
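The coverage above can be used to sanity-check or subset rows by creation date and repository. This is a minimal sketch that assumes created_at holds ISO 8601 strings such as 2024-08-18T12:00:00Z and that timestamps without an offset are in UTC; the helper names and sample rows are hypothetical.

```python
from datetime import datetime, timezone

# Stated coverage: 05 December 2010 to 18 August 2024 (END is exclusive).
START = datetime(2010, 12, 5, tzinfo=timezone.utc)
END = datetime(2024, 8, 19, tzinfo=timezone.utc)


def parse_created_at(value):
    """Parse an ISO 8601 timestamp; a trailing 'Z' is read as UTC."""
    parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def in_coverage(row, repo=None):
    """True if the row falls in the stated time range and, optionally, in one repo."""
    created = parse_created_at(row["created_at"])
    if not (START <= created < END):
        return False
    return repo is None or row["repo_name"] == repo


# Hypothetical rows for demonstration.
rows = [
    {"repo_name": "godotengine/godot", "created_at": "2024-08-18T23:59:59Z"},
    {"repo_name": "material-components/material-components-web",
     "created_at": "2019-03-01T08:30:00Z"},
    {"repo_name": "godotengine/godot", "created_at": "2024-08-19T00:00:00Z"},
]

within_range = [r for r in rows if in_coverage(r)]
godot_only = [r for r in rows if in_coverage(r, repo="godotengine/godot")]
```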
License
CC BY-NC-SA 4.0
Who Can Use It
- AI Developers: Perfect for building support bots that predict and respond to issues.
- Data Scientists: Ideal for training models in NLP, classification, and predictive analytics.
- Product Managers: Useful to streamline customer interactions and enhance user satisfaction.
Dataset Name Suggestions
- AI-Curated GitHub Support Tickets
- Customer Service Intelligent Training Data
- Predictive Helpdesk Issue Dataset
- GPT-Selected Technical Support Queries
- Automated Ticket Classification Archive
Attributes
Original Data Source: Customer Service Intelligent Training Data
