
Dutch Sarcasm Headline Dataset

Entertainment & Media Consumption

Tags and Keywords

News, Tabular, Classification, NLP, Text, Mining

Dutch Sarcasm Headline Dataset on the Opendatabay data marketplace

Free

About

This dataset is a collection of Dutch news headlines for sarcasm detection. The headlines come from two sources: Speld.nl, which publishes satirical news, and Nu.nl, a regular news platform. Each headline is labelled as sarcastic or not, and the label follows the source: Speld.nl headlines are marked sarcastic and Nu.nl headlines are not. This makes the dataset a resource for training machine learning models to identify sarcasm in text. It also includes metadata on the original source of each headline and its subject matter (politics, foreign affairs, or domestic news), which supports per-subject analysis and classification tasks in natural language processing.

Columns

  • headline: The original text of the news headline.
  • link: The URL pointing to the original news article.
  • source: Identifies the origin of the headline, either 'speld.nl' for satirical content or 'nu.nl' for regular news.
  • is_sarcastic: A boolean flag where 'true' indicates a sarcastic headline (from Speld.nl) and 'false' indicates a non-sarcastic headline (from Nu.nl).
  • is_binnenland: A boolean flag indicating if the news headline is categorised under "domestic news".
  • is_buitenland: A boolean flag indicating if the news headline is categorised under "foreign news".
  • is_politiek: A boolean flag indicating if the news headline is categorised under "political news".
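
The column layout above can be checked when loading the file. The following is a minimal sketch using only the Python standard library. It assumes the flags are stored as the text values 'true' and 'false' (the listing does not confirm the exact encoding), and the sample rows are invented placeholders, not records from the dataset.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "headline", "link", "source",
    "is_sarcastic", "is_binnenland", "is_buitenland", "is_politiek",
]
BOOLEAN_COLUMNS = EXPECTED_COLUMNS[3:]


def parse_bool(value):
    """Map textual booleans to bool. The encoding in the real file is an assumption."""
    text = value.strip().lower()
    if text in {"true", "1"}:
        return True
    if text in {"false", "0"}:
        return False
    raise ValueError(f"unexpected boolean value: {value!r}")


def read_headlines(handle):
    """Read rows from an open CSV handle and convert the flag columns to bool."""
    reader = csv.DictReader(handle)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        for column in BOOLEAN_COLUMNS:
            row[column] = parse_bool(row[column])
        rows.append(row)
    return rows


def label_matches_source(row):
    """Per the column description, is_sarcastic should be true exactly for speld.nl rows."""
    return row["is_sarcastic"] == (row["source"].strip().lower() == "speld.nl")


# Invented placeholder rows, not taken from the dataset.
SAMPLE_CSV = """headline,link,source,is_sarcastic,is_binnenland,is_buitenland,is_politiek
Placeholder satirical headline,https://example.com/a,speld.nl,true,true,false,false
Placeholder regular headline,https://example.com/b,nu.nl,false,false,true,false
"""

rows = read_headlines(io.StringIO(SAMPLE_CSV))
mismatches = [row for row in rows if not label_matches_source(row)]
print(len(rows), "rows,", len(mismatches), "label/source mismatches")
```

With the downloaded file, the same function can be given a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved. The UTF-8 encoding is also an assumption.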

Distribution

The dataset is provided as a single CSV file with 7 columns and approximately 24,522 records. Some headlines may appear more than once, because a headline can be listed under more than one news subject category.
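
Because repeated headlines can leak between training and test splits, it may help to collapse them before modelling. The sketch below is one possible approach, not a documented preprocessing step: it treats rows with the same headline and source as duplicates and combines their subject flags with a logical OR. The sample rows are invented placeholders, and the flags are assumed to be already parsed into Python booleans.

```python
SUBJECT_FLAGS = ("is_binnenland", "is_buitenland", "is_politiek")


def merge_duplicate_headlines(rows):
    """Collapse rows sharing (headline, source) into one row, OR-ing the subject flags."""
    merged = {}
    for row in rows:
        key = (row["headline"], row["source"])
        if key not in merged:
            merged[key] = dict(row)  # copy so the input rows stay unchanged
        else:
            for flag in SUBJECT_FLAGS:
                merged[key][flag] = merged[key][flag] or row[flag]
    return list(merged.values())


# Invented placeholder rows: headline A is listed under two subjects.
sample_rows = [
    {"headline": "Placeholder headline A", "link": "https://example.com/a",
     "source": "nu.nl", "is_sarcastic": False,
     "is_binnenland": True, "is_buitenland": False, "is_politiek": False},
    {"headline": "Placeholder headline A", "link": "https://example.com/a",
     "source": "nu.nl", "is_sarcastic": False,
     "is_binnenland": False, "is_buitenland": False, "is_politiek": True},
    {"headline": "Placeholder headline B", "link": "https://example.com/b",
     "source": "speld.nl", "is_sarcastic": True,
     "is_binnenland": False, "is_buitenland": True, "is_politiek": False},
]

unique_rows = merge_duplicate_headlines(sample_rows)
print(len(sample_rows), "rows ->", len(unique_rows), "unique headlines")
```

The merged row keeps the link of the first occurrence. If the same headline can carry different links per subject, that detail would need separate handling.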

Usage

This dataset suits a range of natural language processing and machine learning tasks. It can be used for:
  • Developing and testing sarcasm detection algorithms.
  • Training classification models to distinguish between satirical and factual news.
  • Conducting linguistic research on sarcasm, particularly in the Dutch language.
  • Exploring the nuances of sarcasm detection across different news subjects (e.g., politics, domestic, foreign news).
  • Educational purposes for teaching text analysis and machine learning concepts.
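
As an illustration of the classification use case, here is a small baseline sketch: a multinomial naive Bayes classifier with add-one smoothing, written with the standard library only. It is a swapped-in teaching example, not a method associated with the dataset, and the training headlines are invented placeholders. In practice the `headline` and `is_sarcastic` columns would supply the inputs and labels.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into runs of word characters (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


class NaiveBayesHeadlineClassifier:
    """Multinomial naive Bayes with add-one smoothing over word counts."""

    def fit(self, headlines, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for headline, label in zip(headlines, labels):
            self.word_counts[label].update(tokenize(headline))
        self.vocabulary = set()
        for counts in self.word_counts.values():
            self.vocabulary.update(counts)
        return self

    def predict(self, headline):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(doc_count / total_docs)
            for word in tokenize(headline):
                if word in self.vocabulary:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented placeholder headlines; True marks the satirical class.
train_headlines = [
    "kat benoemd tot minister van slapen",
    "man ontdekt dat maandag weer maandag is",
    "kabinet presenteert nieuwe begroting",
    "gemeente opent nieuwe fietsbrug",
]
train_labels = [True, True, False, False]

model = NaiveBayesHeadlineClassifier().fit(train_headlines, train_labels)
print(model.predict("kat ontdekt maandag"))
print(model.predict("kabinet opent nieuwe begroting"))
```

Since the label follows the source, a model trained this way may learn the house style of each outlet as much as sarcasm itself, so held-out evaluation on deduplicated headlines is advisable.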

Coverage

The dataset covers news headlines from two Dutch media outlets, Speld.nl and Nu.nl, so its geographic scope is primarily the Netherlands. The listing does not state a publication time range for the headlines. The content covers general news, with categorisation into political, foreign, and domestic news.

License

CC0

Who Can Use It

This dataset is beneficial for a range of users interested in text analysis and machine learning, including:
  • Data Scientists: For building and refining NLP models focused on sentiment and sarcasm.
  • Machine Learning Engineers: To develop robust text classification systems.
  • Researchers: For academic studies on computational linguistics, satire, and media analysis.
  • Students: As a practical resource for learning about data preparation, feature engineering, and model training in NLP.

Dataset Name Suggestions

  • Dutch Sarcasm Headline Dataset
  • Nu.nl and De Speld Sarcasm Headlines
  • Dutch News Sarcasm Detection Dataset
  • Sarcastic Dutch News Headlines Collection

Listing Stats

LISTED

21/06/2025

REGION

GLOBAL

UNIVERSAL DATA QUALITY SCORE (UDQS)

5 / 5

VERSION

1.0
