Opendatabay APP

Customer Service Intelligent Training Data

Data Science and Analytics

Tags and Keywords

NLP

Support

Helpdesk

Classification

Automation

Free

About

Train your customer support AI on GitHub issues selected by automated filters, including GPT-4o-mini, because they resemble authentic customer support tickets. The dataset supports training predictive models and context-aware bots on real-world interaction data sourced from active repositories. Because high-value issues have been separated from general noise, it is well suited to developing NLP models, classification systems, and automated response engines that anticipate customer questions and prioritise tasks.

Columns

  • repo_name: The GitHub repository in owner/repo form (e.g., godotengine/godot).
  • title: The title of the issue.
  • body: The body text of the issue, asking a question or describing a problem.
  • user_login: The author's username.
  • created_at: Issue creation timestamp in ISO format.
  • answer_1: The first answer/comment.
  • answer_2: The second answer/comment.
  • answer_3: The third answer/comment.
  • answer_4: The fourth answer/comment.
  • answer_5: The fifth answer/comment.
  • label_1: First label name (e.g., bug).
  • label_2: Second label name.
  • label_3: Third label name.
  • label_4: Fourth label name.
  • label_5: Fifth label name.
  • likes: Number of +1 reactions.
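
The numbered answer_* and label_* columns are easier to work with once collapsed into lists. The following is a minimal sketch using only the Python standard library. It assumes unused answer and label slots are empty strings and that likes is stored as a number; the listing confirms neither, and the sample row is made up for illustration. The helper names (parse_row, load_issues) are hypothetical.

```python
import csv
import io

ANSWER_COLUMNS = [f"answer_{i}" for i in range(1, 6)]
LABEL_COLUMNS = [f"label_{i}" for i in range(1, 6)]


def _non_empty(row, columns):
    """Return the stripped, non-blank values of the given columns, in order."""
    values = ((row.get(c) or "").strip() for c in columns)
    return [v for v in values if v]


def parse_row(row):
    """Collapse one CSV row (a dict) into a record with answer and label lists."""
    likes_raw = (row.get("likes") or "").strip()
    return {
        "repo_name": row.get("repo_name") or "",
        "title": row.get("title") or "",
        "body": row.get("body") or "",
        "user_login": row.get("user_login") or "",
        "created_at": row.get("created_at") or "",
        "answers": _non_empty(row, ANSWER_COLUMNS),
        "labels": _non_empty(row, LABEL_COLUMNS),
        # Assumption: a blank likes cell means zero reactions.
        "likes": int(float(likes_raw)) if likes_raw else 0,
    }


def load_issues(file_obj):
    """Read every row from an open CSV file object."""
    return [parse_row(row) for row in csv.DictReader(file_obj)]


# Made-up sample row with the 16 columns listed above (not real data).
SAMPLE_CSV = (
    "repo_name,title,body,user_login,created_at,"
    "answer_1,answer_2,answer_3,answer_4,answer_5,"
    "label_1,label_2,label_3,label_4,label_5,likes\n"
    "example/repo,App crashes on start,It crashes when I open it.,alice,"
    "2024-01-15T10:30:00Z,Try updating.,,,,,bug,,,,,3\n"
)

issues = load_issues(io.StringIO(SAMPLE_CSV))
print(issues[0]["answers"], issues[0]["labels"], issues[0]["likes"])
```

To read the downloaded file instead of the sample, pass an open file handle, for example open("a_github_issues_overview_dataset.csv", newline="", encoding="utf-8"); the UTF-8 encoding is an assumption.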

Distribution

The dataset contains approximately 16,000 valid rows and is provided in CSV format (a_github_issues_overview_dataset.csv). The structure focuses on text-heavy columns (titles, bodies, and answers) supported by metadata such as timestamps, labels, and reaction counts, so that text and metadata can be analysed together.

Usage

  • Text Classification: Train machine learning models to accurately classify content into appropriate departments for efficient ticket routing.
  • Priority Prediction: Develop algorithms to predict the urgency of tickets based on labels and reaction counts, ensuring critical issues are addressed promptly.
  • Customer Support Analysis: Analyse the dataset to gain insights into common issues and optimise support processes.
  • Text-to-Text Generation: Develop Large Language Models (LLMs) to generate context-aware answers to GitHub Issues and customer tickets.
  • Context-Aware Responses: Equip AI with real interaction data to generate precise, relevant answers.
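
As a rough illustration of how rows could feed the classification and text-to-text use cases above, the sketch below turns one row into a (text, label) example and into prompt/completion pairs. Using label_1 as the classification target and joining title and body with a blank line are assumptions made here, not choices documented by the dataset, and the function names are hypothetical.

```python
def issue_text(row):
    """Join title and body into one input string (the separator is an assumption)."""
    title = (row.get("title") or "").strip()
    body = (row.get("body") or "").strip()
    return f"{title}\n\n{body}".strip()


def to_classification_example(row):
    """Return (text, label) with label_1 as the target, or None if unlabelled."""
    label = (row.get("label_1") or "").strip()
    if not label:
        return None
    return issue_text(row), label


def to_generation_pairs(row):
    """Return one prompt/completion pair per non-empty answer column."""
    prompt = issue_text(row)
    pairs = []
    for i in range(1, 6):
        answer = (row.get(f"answer_{i}") or "").strip()
        if answer:
            pairs.append({"prompt": prompt, "completion": answer})
    return pairs


# Made-up row for illustration (not real data).
example_row = {
    "title": "App crashes on start",
    "body": "It crashes when I open it.",
    "answer_1": "Try updating.",
    "answer_2": "",
    "label_1": "bug",
}

print(to_classification_example(example_row))
print(to_generation_pairs(example_row))
```

The resulting pairs can be passed to whichever training library you prefer; rows without a label or without any answer are simply skipped by the respective helper.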

Coverage

  • Time Range: The data covers issues created between 05 December 2010 and 18 August 2024.
  • Geographic Scope: Global.
  • Content: Sourced from active GitHub repositories, including 'godotengine/godot' and 'material-components/material-components-web'.
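
The stated time range can be checked against the created_at column. This is a minimal sketch that assumes timestamps look like 2024-01-15T10:30:00Z (ISO 8601, optionally ending in "Z"); the exact format in the file is not confirmed by the listing, and the function names are hypothetical.

```python
from datetime import date, datetime

# Bounds taken from the Time Range stated above.
COVERAGE_START = date(2010, 12, 5)
COVERAGE_END = date(2024, 8, 18)


def parse_created_at(value):
    """Parse an ISO timestamp; a trailing 'Z' is rewritten so that
    datetime.fromisoformat accepts it on older Python versions too."""
    text = value.strip()
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


def in_coverage(value):
    """True if the timestamp's calendar date falls inside the stated range."""
    return COVERAGE_START <= parse_created_at(value).date() <= COVERAGE_END


# Made-up timestamps for illustration.
print(in_coverage("2024-01-15T10:30:00Z"))
print(in_coverage("2009-06-01T00:00:00Z"))
```

Rows that fail this check, or that raise ValueError while parsing, are worth inspecting before training.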

License

CC BY-NC-SA 4.0

Who Can Use It

  • AI Developers: Perfect for building support bots that predict and respond to issues.
  • Data Scientists: Ideal for training models in NLP, classification, and predictive analytics.
  • Product Managers: Useful to streamline customer interactions and enhance user satisfaction.

Dataset Name Suggestions

  • AI-Curated GitHub Support Tickets
  • Customer Service Intelligent Training Data
  • Predictive Helpdesk Issue Dataset
  • GPT-Selected Technical Support Queries
  • Automated Ticket Classification Archive

Listing Stats

VIEWS

0

DOWNLOADS

0

LISTED

08/12/2025

REGION

GLOBAL

UDQS QUALITY (Universal Data Quality Score)

5 / 5

VERSION

1.0