Customer Service Intelligent Training Data

Collection: Free Dataset Library
Provider status: Verified Data Provider
Price: £0 (free)

Category: Data Science and Analytics

Tags and Keywords: NLP, Support, Helpdesk, Classification, Automation


About

This dataset collects GitHub issues that were selected by algorithmic filtering, including GPT-4o-mini, because they resemble real customer support tickets. Each issue is paired with up to five replies from active repositories, providing real-world interaction data for training predictive models and context-aware bots. Because the selection separates high-value, support-like issues from general noise, the dataset is suited to developing NLP models, classification systems, and automated response engines that anticipate customer questions and prioritise tasks.

Columns

repo_name: The GitHub repository, as owner/repo (e.g., godotengine/godot).
title: The issue title.
body: The issue body text, asking a question or describing a problem.
user_login: The GitHub username of the issue author.
created_at: The issue creation timestamp, in ISO 8601 format.
answer_1: The first answer/comment.
answer_2: The second answer/comment.
answer_3: The third answer/comment.
answer_4: The fourth answer/comment.
answer_5: The fifth answer/comment.
label_1: The first label name (e.g., bug).
label_2: The second label name.
label_3: The third label name.
label_4: The fourth label name.
label_5: The fifth label name.
likes: The number of +1 (thumbs-up) reactions on the issue.
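Because answers and labels are spread across five numbered columns each, it can help to collapse them into lists when loading. The sketch below is a minimal, hypothetical helper (the names Issue, _cell and row_to_issue are illustrative, not part of the dataset); it assumes blank cells mean an issue had fewer than five answers or labels, and that created_at may end in "Z".

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Issue:
    """One dataset row, with the numbered answer/label columns collapsed into lists."""
    repo_name: str
    title: str
    body: str
    user_login: str
    created_at: datetime
    answers: list[str] = field(default_factory=list)
    labels: list[str] = field(default_factory=list)
    likes: int = 0


def _cell(row: dict, key: str) -> str:
    # csv.DictReader fills short rows with None, so treat None as blank.
    return (row.get(key) or "").strip()


def row_to_issue(row: dict) -> Issue:
    """Convert one CSV row (e.g. from csv.DictReader) into an Issue."""
    # Keep only non-blank answer_1..answer_5 and label_1..label_5 cells;
    # blank cells are assumed to mean "fewer than five answers/labels".
    answers = [a for a in (_cell(row, f"answer_{i}") for i in range(1, 6)) if a]
    labels = [lbl for lbl in (_cell(row, f"label_{i}") for i in range(1, 6)) if lbl]
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    created = datetime.fromisoformat(_cell(row, "created_at").replace("Z", "+00:00"))
    # likes may be exported as "3" or "3.0"; a blank cell is treated as zero.
    raw_likes = _cell(row, "likes")
    likes = int(float(raw_likes)) if raw_likes else 0
    return Issue(
        repo_name=_cell(row, "repo_name"),
        title=_cell(row, "title"),
        body=_cell(row, "body"),
        user_login=_cell(row, "user_login"),
        created_at=created,
        answers=answers,
        labels=labels,
        likes=likes,
    )
```

With a csv.DictReader over the file, [row_to_issue(r) for r in reader] yields one Issue per row.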

Distribution

The dataset contains approximately 16,000 valid rows and is provided as a CSV file (a_github_issues_overview_dataset.csv); the listing offers the download in ZIP format. Most columns are free text (titles, bodies, and answers), and the rest are metadata such as timestamps, labels, and reaction counts, so text analysis can be combined with structured features.
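Since the download is a ZIP archive, the CSV can be read without extracting it first. This is a minimal sketch using only the standard library; it assumes the CSV sits at the root of the archive under the file name given above and is UTF-8 encoded, and read_rows_from_zip is a hypothetical helper name. It also checks that the columns listed on this page are present.

```python
import csv
import io
import zipfile

# Column names as listed in the Columns section of this page.
EXPECTED_COLUMNS = [
    "repo_name", "title", "body", "user_login", "created_at",
    *[f"answer_{i}" for i in range(1, 6)],
    *[f"label_{i}" for i in range(1, 6)],
    "likes",
]

# Assumption: the CSV is stored at the root of the ZIP under this name.
CSV_NAME = "a_github_issues_overview_dataset.csv"


def read_rows_from_zip(zip_source, member: str = CSV_NAME) -> list[dict[str, str]]:
    """Read a CSV member of a ZIP archive (path or binary file object) into dicts."""
    with zipfile.ZipFile(zip_source) as zf:
        with zf.open(member) as raw:
            # zf.open yields bytes; "utf-8-sig" decodes UTF-8 with or without a BOM.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            reader = csv.DictReader(text)
            # Fail early if the header does not match the documented schema.
            missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
            if missing:
                raise ValueError(f"missing columns: {sorted(missing)}")
            return list(reader)
```

Usage would look like rows = read_rows_from_zip("downloaded_archive.zip"), where the archive path is a placeholder; len(rows) should then be close to the stated 16,000.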

Usage

Text Classification: Train models to classify issues, for example using the label columns as targets, so that tickets can be routed to the appropriate departments.
Priority Prediction: Develop models that predict ticket urgency from labels and reaction counts, so that critical issues are addressed promptly.
Customer Support Analysis: Analyse the dataset to identify common issues and optimise support processes.
Text-to-Text Generation: Train or fine-tune Large Language Models (LLMs) to generate context-aware answers to GitHub issues and customer tickets.
Context-Aware Responses: Ground AI assistants in real issue threads so that they generate precise, relevant answers.
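As a starting point for the text classification use case, the sketch below implements a multinomial Naive Bayes classifier with add-one (Laplace) smoothing using only the standard library. It is one possible baseline, not the provider's method; NaiveBayesLabeler and tokenize are hypothetical names, and the four training examples are invented for illustration rather than taken from the dataset.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesLabeler:
    """Multinomial Naive Bayes text classifier with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesLabeler":
        self.class_counts = Counter(labels)      # number of documents per label
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {lbl: sum(c.values()) for lbl, c in self.word_counts.items()}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        # Tokens never seen in training carry no evidence, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # log P(label) + sum over tokens of log P(token | label), smoothed
            score = math.log(doc_count / n_docs)
            for token in tokens:
                numerator = self.word_counts[label][token] + 1
                score += math.log(numerator / (self.total_words[label] + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples; on the real data, title + " " + body could be the text
# and label_1 the target (skipping rows without a label).
train_texts = [
    "App crashes on startup with segfault",
    "Crash when loading a large scene",
    "How do I change the default font?",
    "Question about configuring the editor theme",
]
train_labels = ["bug", "bug", "question", "question"]
model = NaiveBayesLabeler().fit(train_texts, train_labels)
print(model.predict("Editor crashes when I open a scene"))  # bug
print(model.predict("How do I configure the font?"))        # question
```

Working in log space avoids floating-point underflow when multiplying many small probabilities; for better accuracy on the full dataset, a linear model over TF-IDF features would be a natural next step.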

Coverage

Time Range: The data covers issues created between 05 December 2010 and 18 August 2024.
Geographic Scope: Global.
Content: Sourced from active GitHub repositories, including godotengine/godot and material-components/material-components-web.

License

CC BY-NC-SA 4.0

Who Can Use It

AI Developers: Building support bots that predict and respond to issues.
Data Scientists: Training models for NLP, classification, and predictive analytics.
Product Managers: Identifying recurring problems to streamline customer interactions and improve user satisfaction.

Dataset Name Suggestions

AI-Curated GitHub Support Tickets
Customer Service Intelligent Training Data
Predictive Helpdesk Issue Dataset
GPT-Selected Technical Support Queries
Automated Ticket Classification Archive

Attributes

Original Data Source: Customer Service Intelligent Training Data

Listing Stats

Listed: 08/12/2025
Region: Global
Quality: 5 / 5
Version: 1.0
