Opendatabay APP

The Onion News Dataset

Entertainment & Media Consumption

Tags and Keywords

  • News
  • Text
  • NLP
  • Satirical
  • Onion

Free

About

This dataset contains satirical news articles from The Onion, collected by scraping the site's "Breaking News / News in Brief" section. It is intended as a complementary corpus for natural language processing (NLP) tasks, particularly classifying satirical text or identifying fake news. Each record includes the article's title, its publication time, and the full text of the story.

Columns

  • Title: The headline or title of the satirical news article. There are approximately 6,820 unique titles.
  • Published Time: The date and time when the news article was originally published.
  • Content: The full body text of the satirical news story. This column contains approximately 6,789 unique values.
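
The unique-value counts above can be checked with a short script. The following is a minimal sketch using only Python's standard library. It assumes the CSV header names match the column names in this listing (`Title`, `Published Time`, `Content`), which the listing does not confirm, and it runs on a tiny made-up sample rather than the real file.

```python
import csv
import io

# Tiny stand-in for the real CSV. The header names are an assumption
# based on the column names in this listing; the rows are invented.
SAMPLE_CSV = """Title,Published Time,Content
Headline A,2020-01-01T10:00:00,Body text one.
Headline B,2020-01-02T11:30:00,Body text two.
Headline C,2020-01-03T09:15:00,Body text two.
"""


def unique_counts(csv_file, columns=("Title", "Content")):
    """Return {column: number of distinct non-empty values}."""
    seen = {col: set() for col in columns}
    for row in csv.DictReader(csv_file):
        for col in columns:
            value = (row.get(col) or "").strip()
            if value:
                seen[col].add(value)
    return {col: len(values) for col, values in seen.items()}


counts = unique_counts(io.StringIO(SAMPLE_CSV))
print(counts)
```

To run it against the downloaded file, pass an open file handle instead of the in-memory sample, for example `open("onion_news.csv", newline="", encoding="utf-8")`, where the file name is hypothetical.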

Distribution

The dataset is provided as a CSV file; the listing does not specify the file size. Articles span 17th April 1996 to 12th August 2022. The record count varies by period, and the most recent period (25th December 2019 to 12th August 2022) contains 1,540 records. The data is tabular, so it loads directly into common analysis tools.
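
Record counts for a given period, such as the 1,540 records quoted for the latest period, could be reproduced along these lines. This is a sketch only: the listing does not state how the Published Time column is formatted, so ISO 8601 timestamps are assumed here, and the sample values are invented.

```python
from datetime import date, datetime


def count_in_period(published_times, start, end):
    """Count timestamps whose date falls within [start, end], inclusive."""
    total = 0
    for raw in published_times:
        try:
            # Assumption: values are ISO 8601 strings.
            published = datetime.fromisoformat(raw.strip())
        except ValueError:
            continue  # skip values that are not in the assumed format
        if start <= published.date() <= end:
            total += 1
    return total


# Invented sample values standing in for the Published Time column.
sample_times = [
    "1996-04-17T00:00:00",
    "2019-12-25T08:00:00",
    "2022-08-12T17:45:00",
    "not a date",
]

latest = count_in_period(sample_times, date(2019, 12, 25), date(2022, 8, 12))
print(latest)
```

If the real timestamps use another format, `datetime.strptime` with a matching format string would replace the `fromisoformat` call.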

Usage

This dataset is particularly suitable for:
  • Satirical text classification: Training and testing models to identify and categorise satirical content.
  • Fake news classification: Serving as a valuable resource for developing systems that distinguish genuine news from fabricated or misleading information.
  • Natural Language Processing (NLP) research: Exploring language patterns, humour, and rhetorical devices used in satirical writing.
  • Content analysis: Analysing trends and themes in satirical journalism over time.
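
For the classification uses above, the dataset supplies only the satirical class, so it has to be paired with a separate corpus of genuine news. The sketch below shows one way to label, shuffle, and split such a combined corpus. The genuine-news texts are a hypothetical second source that is not part of this dataset, and all strings are placeholders.

```python
import random


def build_labelled_corpus(satire_texts, genuine_texts, test_fraction=0.25, seed=0):
    """Label satire as 1 and genuine news as 0, shuffle, and split.

    Returns (train, test), each a list of (text, label) pairs.
    """
    examples = [(text, 1) for text in satire_texts]
    examples += [(text, 0) for text in genuine_texts]
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(examples)
    n_test = max(1, int(len(examples) * test_fraction))
    return examples[n_test:], examples[:n_test]


# Placeholder texts; real inputs would be the Content (or Title) column
# of this dataset and articles from a separate genuine-news corpus.
satire = [f"Placeholder satirical article {i}" for i in range(4)]
genuine = [f"Placeholder genuine article {i}" for i in range(4)]

train, test = build_labelled_corpus(satire, genuine)
print(len(train), len(test))
```

The resulting pairs can be fed to any text classifier. A simple random split is used here for brevity; a split by publication date may be preferable when studying how satire changes over time.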

Coverage

The listing marks the dataset's region as global and notes no specific demographic scope. Articles cover the period from 17th April 1996 to 12th August 2022.

License

CC-BY-NC

Who Can Use It

  • Data Scientists and Machine Learning Engineers: For developing and evaluating models for text classification, especially in the domain of satire and fake news detection.
  • Researchers: In the fields of linguistics, media studies, and computational social science, to study the characteristics of satirical writing and its impact.
  • Educators and Students: As a practical dataset for teaching and learning about NLP, text analysis, and data science principles.
  • AI and LLM Developers: To enhance the understanding of nuanced language, humour, and context within large language models.

Dataset Name Suggestions

  • The Onion News Dataset
  • Satirical News Corpus
  • Fake News Detection Sample
  • Humorous News Articles Collection
  • The Onion Content Archive

Attributes

Original Data Source: Satirical News from The Onion

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 24/06/2025
  • Region: Global
  • UDQS Quality (Universal Data Quality Score): 5 / 5
  • Version: 1.0
