AG News Articles Classification
Entertainment & Media Consumption
Free
About
This dataset is a large, well-balanced collection of news articles intended for text classification research. It is designed to support studies in categorising articles, identifying sentiment, and analysing how different media outlets report news.
Columns
- text: The actual content of the news article, provided as a string.
- label: An integer representing the category or classification of the news article.
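As a minimal sketch of how the two columns above might be read, the following uses only Python's standard `csv` module. The helper names `load_rows` and `label_counts` are hypothetical, and the sample rows are invented stand-ins rather than real records from the dataset; the label values in them are arbitrary.

```python
import csv
import io
from collections import Counter


def load_rows(fileobj):
    """Read (text, label) pairs from a CSV file object.

    Assumes a header row containing `text` and `label` columns,
    with `label` parseable as an integer.
    """
    reader = csv.DictReader(fileobj)
    missing = {"text", "label"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [(row["text"], int(row["label"])) for row in reader]


def label_counts(rows):
    """Count how many examples carry each integer label."""
    return Counter(label for _, label in rows)


# Invented in-memory stand-in for a file such as train.csv.
sample = io.StringIO(
    "text,label\n"
    '"Shares rise as markets rally, analysts say",2\n'
    '"Team wins the final match",1\n'
)
rows = load_rows(sample)
print(label_counts(rows))
```

With the real files, the same function could be called as `load_rows(open("train.csv", newline="", encoding="utf-8"))`; the UTF-8 encoding is an assumption, since the listing does not state one.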
Distribution
The dataset comprises a training set of 10,000 examples and a test set of 5,000 examples. The data is balanced, with approximately 1,900 unique values in each of the following label ranges: 0.00-0.30, 0.90-1.20, 1.80-2.10, and 2.70-3.00. Because label is an integer, these ranges correspond to the four label values 0, 1, 2, and 3. The data files are in CSV format: train.csv and test.csv.
Usage
This dataset can be used to:
- Train a text classifier to automatically categorise news articles.
- Develop systems capable of identifying positive and negative sentiment within news articles.
- Conduct research into differences in how positive and negative news is reported by various media outlets.
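The first use case above, training a text classifier, can be sketched with a small multinomial Naive Bayes model written in plain Python. This is an illustrative baseline under stated assumptions, not a method prescribed by the dataset: the class `NaiveBayes`, the helpers `tokenize` and `accuracy`, and the toy training texts and their labels are all hypothetical. In practice a library implementation would likely be preferable.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {
            c: sum(self.word_counts[c].values()) for c in self.class_counts
        }
        return self

    def predict(self, text):
        tokens = [t for t in tokenize(text) if t in self.vocab]  # skip unseen words
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(count / n_docs)
            for tok in tokens:
                numerator = self.word_counts[label][tok] + 1
                denominator = self.total_words[label] + vocab_size
                score += math.log(numerator / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of examples whose predicted label matches the given label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(texts)


# Invented toy examples; the real inputs would be the text and label columns.
train_texts = [
    "team wins the final match",
    "coach praises the team after match",
    "shares fall as markets slide",
    "markets rally and shares rise",
]
train_labels = [1, 1, 2, 2]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("shares and markets"))
```

Fitting on the rows of train.csv and calling `accuracy` on the rows of test.csv would give a simple held-out score for comparison against stronger models.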
Coverage
The AG News dataset is a collection of over 1 million news articles, gathered from more than 2,000 news outlets by ComeToMyHead, an academic news search engine that has been active since July 2004, which suggests that data collection began around that time. The coverage is global, making it a comprehensive resource for news analysis.
License
CC0
Who Can Use It
This dataset is ideal for academic and research purposes. Intended users include researchers in:
- Data mining (e.g., clustering, classification).
- Information retrieval (e.g., ranking, search).
- Applications involving XML, data compression, and data streaming.
- Any other non-commercial activity related to text data analysis.
It is particularly suitable for those engaged in text classification research.
Dataset Name Suggestions
- AG News Articles Classification
- News Article Sentiment Dataset
- Global News Text Corpus
- Academic News Article Data
Attributes
Original Data Source: AG News (News articles)