Examiner Content Farm Data
News & Media Articles
About
This dataset presents a unique archive of crowdsourced journalism from "The Examiner", a prominent pseudo-news website of the late-2000s and early-2010s digital content landscape. It contains the headlines of more than 3 million articles written by approximately 21,000 authors over six years. The Examiner was never acclaimed for its quality, but it was remarkably prolific, publishing thousands of articles a day. It peaked in 2011, with high search rankings, heavy social media sharing and up to twenty million unique mobile visitors a month. The collection offers a vivid picture of the topics that trended while the site operated. It is also the last surviving record of a once-prominent, advertising-driven platform that is now defunct and whose original articles are no longer available.
Columns
- publish_date: The date when the article was published on The Examiner site, formatted as yyyyMMdd.
- headline_text: The actual text of the article's headline, presented in English.
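Because publish_date is stored as yyyyMMdd text rather than as an ISO date, it usually needs converting before any time-based analysis. A minimal Python sketch follows. The helper name parse_publish_date is hypothetical. The sketch assumes the value may arrive either as a string or as an integer, since some CSV loaders infer a numeric type for this column.

```python
from datetime import date, datetime
from typing import Union


def parse_publish_date(value: Union[str, int]) -> date:
    """Convert a yyyyMMdd publish_date value (e.g. "20100101") to a date."""
    text = str(value).strip()
    # Require exactly eight digits so malformed values fail loudly
    # instead of being parsed loosely by strptime.
    if len(text) != 8 or not text.isdigit():
        raise ValueError(f"expected yyyyMMdd, got {value!r}")
    # %Y%m%d is the strptime equivalent of the yyyyMMdd pattern.
    return datetime.strptime(text, "%Y%m%d").date()


print(parse_publish_date("20111231"))  # 2011-12-31
print(parse_publish_date(20100101))    # 2010-01-01
```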
Distribution
The dataset is provided as a CSV file of approximately 202.69 MB containing 3,089,781 records. It spans early 2010 to late 2015, and the number of articles varies between date ranges within that period. A precise per-year breakdown is not provided. All records are valid, with no missing or mismatched values in either the publish date or the headline text.
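The record count and completeness described above can be re-checked after download, and the same pass can produce the per-year breakdown that the description does not provide. The following is a minimal sketch using only the Python standard library. It assumes the CSV has a header row naming the publish_date and headline_text columns. The function name summarise is hypothetical, and the sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter
from datetime import datetime
from typing import TextIO


def summarise(handle: TextIO) -> dict:
    """Count rows, count invalid rows and tally valid rows per year."""
    reader = csv.DictReader(handle)
    per_year: Counter = Counter()
    total = 0
    invalid = 0
    for row in reader:
        total += 1
        # Missing trailing fields come back as None from DictReader.
        raw_date = (row.get("publish_date") or "").strip()
        headline = (row.get("headline_text") or "").strip()
        # A valid row has an eight-digit date and a non-empty headline.
        if len(raw_date) != 8 or not raw_date.isdigit() or not headline:
            invalid += 1
            continue
        try:
            year = datetime.strptime(raw_date, "%Y%m%d").year
        except ValueError:  # e.g. month 13 or day 32
            invalid += 1
            continue
        per_year[year] += 1
    return {
        "total": total,
        "invalid": invalid,
        "per_year": dict(sorted(per_year.items())),
    }


# Illustrative rows only; they are not taken from the dataset.
sample = io.StringIO(
    "publish_date,headline_text\n"
    "20100101,First illustrative headline\n"
    "20151231,Second illustrative headline\n"
    "2015123,Row with a malformed date\n"
)
print(summarise(sample))
# {'total': 3, 'invalid': 1, 'per_year': {2010: 1, 2015: 1}}
```

To use it on the real file, open it with open(path, newline="", encoding="utf-8") and pass the handle to summarise. Both the path and the UTF-8 encoding are assumptions here. Reading row by row keeps memory use low across roughly 3 million rows.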
Usage
This dataset is ideal for:
- Analysing trends in digital content and journalism over a six-year period.
- Studying the evolution and impact of catchy headlines and clickbait strategies.
- Researching the characteristics of crowdsourced news and content farms.
- Applications in Natural Language Processing (NLP), such as text analysis, topic modelling, or sentiment analysis on news headlines.
- Exploring historical data related to online media consumption and popular topics.
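For trend analysis, a simple starting point is to track how often a single word appears in headlines year by year. The sketch below is a plain keyword-frequency baseline, not topic modelling or sentiment analysis. The function name keyword_share_by_year and the example headlines are invented for illustration. Tokenisation is a deliberately crude lowercase regex, so it only matches single-word keywords.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Crude tokeniser: runs of lowercase letters, digits and apostrophes.
TOKEN = re.compile(r"[a-z0-9']+")


def keyword_share_by_year(rows: Iterable[Tuple[str, str]], keyword: str) -> dict:
    """Return {year: share of that year's headlines containing keyword}."""
    keyword = keyword.lower()
    totals: Counter = Counter()
    hits: Counter = Counter()
    for publish_date, headline in rows:
        year = int(str(publish_date)[:4])  # yyyyMMdd: first four digits are the year
        totals[year] += 1
        if keyword in TOKEN.findall(headline.lower()):
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Invented example headlines, for illustration only.
rows = [
    ("20100315", "Ten tips for spring gardening"),
    ("20100402", "Local election results announced"),
    ("20110120", "Five tips for winter driving"),
]
print(keyword_share_by_year(rows, "tips"))  # {2010: 0.5, 2011: 1.0}
```

Rows can be fed straight from csv.reader once the header is skipped. For genuine topic modelling, a dedicated library would replace this baseline.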
Coverage
The dataset covers content published by The Examiner, a US-based pseudo-news site, between 1st January 2010 and 31st December 2015. No geographic or demographic information about authors or readers is included, but the site's US base suggests a primary focus on topics relevant to that region. Around 21,000 authors contributed to the dataset.
License
CC0: Public Domain
Who Can Use It
- Researchers and Academics: For studies in media history, digital journalism, content monetisation strategies, and social media trends.
- Data Scientists and NLP Practitioners: For developing and testing algorithms related to text classification, topic extraction, and understanding headline virality.
- Journalism and Communication Scholars: To examine the quality and characteristics of high-volume, advert-driven online content.
- Content Strategists: To gain insights into historical content performance and headline effectiveness.
Dataset Name Suggestions
- The Examiner News Headlines Archive
- Digital Journalism: Examiner Headlines (2010-2015)
- Crowdsourced News Headline Catalog
- Examiner Content Farm Data
- Clickbait Chronicles: The Examiner
Attributes
Original Data Source: Examiner Content Farm Data