The Onion News Dataset
Entertainment & Media Consumption
Free
About
This dataset comprises satirical news articles from The Onion, collected by scraping the site's "Breaking News / News in Brief" section. It is suited for use as a complementary corpus in natural language processing (NLP) tasks, particularly satirical text classification and fake news detection. Each record includes the article's title, its publication time, and the full text of the story.
Columns
- Title: The headline or title of the satirical news article. There are approximately 6,820 unique titles.
- Published Time: The date and time when the news article was originally published.
- Content: The full body text or detailed content of the satirical news story. This column contains approximately 6,789 unique values.
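As a minimal sketch of reading these three columns, the snippet below parses a small in-memory CSV with Python's standard library. The header names follow the column list above; the two rows are invented placeholders rather than real records, and the ISO 8601 timestamp format is an assumption, since the listing does not state how Published Time is stored. The function name `load_articles` is hypothetical.

```python
import csv
import io
from datetime import datetime

# Placeholder rows standing in for the real file. The text is invented for
# illustration, and the ISO 8601 timestamp format is an assumption.
SAMPLE_CSV = """Title,Published Time,Content
Placeholder headline A,1996-04-17T09:00:00,Placeholder body text A.
Placeholder headline B,2022-08-12T15:30:00,Placeholder body text B.
"""


def load_articles(handle):
    """Read CSV rows into dicts and parse 'Published Time' into datetime."""
    articles = []
    for row in csv.DictReader(handle):
        articles.append({
            "title": row["Title"].strip(),
            "published": datetime.fromisoformat(row["Published Time"].strip()),
            "content": row["Content"].strip(),
        })
    return articles


articles = load_articles(io.StringIO(SAMPLE_CSV))

# Count distinct titles, mirroring the "unique titles" figure quoted above.
unique_titles = {a["title"] for a in articles}
print(len(articles), "rows,", len(unique_titles), "unique titles")
```

To read the actual file, pass an open file handle (for example `open(path, newline="", encoding="utf-8")`) in place of the `io.StringIO` object, and adjust the timestamp parsing if the stored format differs.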
Distribution
The dataset is typically provided as a CSV file; the exact file size is not specified. Articles span from 17th April 1996 to 12th August 2022. The record count varies by time period, with the most recent period (25th December 2019 to 12th August 2022) containing 1,540 records. Its tabular structure can be read by standard data analysis tools.
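Record counts per period, such as the figure quoted for the latest period, could be checked along the following lines. This is a sketch only: the timestamps are hypothetical stand-ins for parsed Published Time values, and the helper names `count_by_year` and `count_in_range` are illustrative.

```python
from collections import Counter
from datetime import datetime

# Hypothetical parsed timestamps; real values would come from the
# 'Published Time' column of the CSV.
published = [
    datetime(1996, 4, 17),
    datetime(2020, 1, 5),
    datetime(2021, 6, 30),
    datetime(2022, 8, 12),
]


def count_by_year(timestamps):
    """Return a Counter mapping each year to its number of records."""
    return Counter(ts.year for ts in timestamps)


def count_in_range(timestamps, start, end):
    """Count records with start <= timestamp <= end (both inclusive)."""
    return sum(1 for ts in timestamps if start <= ts <= end)


per_year = count_by_year(published)
latest_period = count_in_range(
    published,
    datetime(2019, 12, 25),
    datetime(2022, 8, 12, 23, 59, 59),
)
print(dict(per_year), latest_period)
```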
Usage
This dataset is particularly suitable for:
- Satirical text classification: Training and testing models to identify and categorise satirical content.
- Fake news classification: Serving as a valuable resource for developing systems that distinguish genuine news from fabricated or misleading information.
- Natural Language Processing (NLP) research: Exploring language patterns, humour, and rhetorical devices used in satirical writing.
- Content analysis: Analysing trends and themes in satirical journalism over time.
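To illustrate the classification use case, here is a small multinomial naive Bayes classifier with add-one smoothing, written with the standard library only. It is a toy sketch, not a recommended pipeline: the training strings are invented for illustration and are not taken from the dataset, and since this dataset contains only satire, the "real" class would in practice have to come from a separate corpus of genuine news. The names `tokenize` and `NaiveBayes` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples; "real" items would come from a separate news corpus.
texts = [
    "area man declares nap a personal triumph",
    "local cat appointed head of regional snack committee",
    "city council approves budget for road repairs",
    "central bank holds interest rates steady",
]
labels = ["satire", "satire", "real", "real"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("area man appointed head of nap committee"))
print(model.predict("council approves road budget"))
```

With real data, the Title or Content column would supply the satire examples, and a held-out split would be needed to measure accuracy.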
Coverage
The dataset is described as having global coverage, on the basis that satirical news from The Onion is relevant internationally. It covers the period from 17th April 1996 to 12th August 2022. No specific demographic scope is noted.
License
CC-BY-NC
Who Can Use It
- Data Scientists and Machine Learning Engineers: For developing and evaluating models for text classification, especially in the domain of satire and fake news detection.
- Researchers: In the fields of linguistics, media studies, and computational social science, to study the characteristics of satirical writing and its impact.
- Educators and Students: As a practical dataset for teaching and learning about NLP, text analysis, and data science principles.
- AI and LLM Developers: To improve how large language models handle nuanced language, humour, and context.
Dataset Name Suggestions
- The Onion News Dataset
- Satirical News Corpus
- Fake News Detection Sample
- Humorous News Articles Collection
- The Onion Content Archive
Attributes
Original Data Source: Satirical News from The Onion