British News Sentiment and Bias Dataset
News & Media Articles
About
This dataset captures headlines published by the top 15 United Kingdom news websites over a 20-day period in early 2023. The headlines were scraped from each outlet's RSS feed at 12-hour intervals, giving a snapshot of the British media landscape. Alongside the raw headline text, auxiliary data provides context on each outlet, such as ownership, political bias, and monthly visitor statistics. The resource supports analysis of media trends, sentiment, and the framing of news events across the political spectrum during that period.
Columns
The dataset comprises two distinct components: the scraped headline data and auxiliary outlet information.
Scraped Data Columns:
- id: Integer identifier for the record.
- website: The name of the news outlet (e.g., BBC, Daily Mail, Guardian).
- timestamp scraped: The date and time when the particular headline was collected.
- headline: The full text of the news article headline.
Auxiliary Data Columns:
- website: The name of the news outlet (matches the scraped data).
- RSS URL: The direct link to the RSS feed used for scraping.
- visitors unique monthly: Monthly unique visitors in millions (sourced from Statista).
- ownership: The entity that owns the news outlet.
- political bias: The political alignment of the outlet (e.g., left-centre, right-centre).
- party support GE 2019: The political party supported by the outlet during the 2019 General Election.
- journalism style: Classification of the outlet's style (e.g., quality, tabloid).
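Because both components share the website column, the outlet attributes can be attached to each headline with a simple lookup. The following is a minimal sketch using only the Python standard library. The in-memory CSV text, the outlet names "Outlet A" and "Outlet B", and the exact header spellings (with spaces rather than underscores) are assumptions for illustration; check the headers of the actual files before reusing it.

```python
import csv
import io

# Illustrative stand-ins for the two files; these are NOT real rows from the dataset.
# Header spellings are assumed to match the column names listed above.
SCRAPED_CSV = """id,website,timestamp scraped,headline
1,Outlet A,2023-02-13 08:00:00,Example headline one
2,Outlet B,2023-02-13 08:00:00,Example headline two
3,Outlet C,2023-02-13 20:00:00,Example headline three
"""

AUX_CSV = """website,political bias,journalism style
Outlet A,left-centre,quality
Outlet B,right-centre,tabloid
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_website(scraped, auxiliary):
    """Left-join scraped rows with auxiliary rows on the shared website column."""
    aux_by_site = {row["website"]: row for row in auxiliary}
    joined = []
    for row in scraped:
        merged = dict(row)
        # Outlets missing from the auxiliary data keep only their scraped columns.
        for key, value in aux_by_site.get(row["website"], {}).items():
            if key != "website":
                merged[key] = value
        joined.append(merged)
    return joined


joined_rows = join_on_website(load_rows(SCRAPED_CSV), load_rows(AUX_CSV))
for r in joined_rows:
    print(r["website"], "|", r.get("political bias", "n/a"), "|", r["headline"])
```

A left join is used so that a headline is never dropped if its outlet name fails to match the auxiliary data; such rows can then be inspected for spelling differences.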
Distribution
- Format: Tabular data (CSV).
- Size: Approximately 3.85 MB.
- Structure: The dataset contains roughly 33,200 valid records (rows) across 4 columns in the main file, with associated auxiliary data.
- Data Integrity: There are zero missing or mismatched values reported for the main columns.
Usage
This dataset is suitable for a variety of analytical and educational applications:
- Sentiment Analysis: Evaluating the emotional tone of headlines across outlets with different political alignments.
- Topic Modelling: Identifying dominant themes in UK news during Feb–Mar 2023.
- Media Bias Detection: Comparing how different outlets frame the same events based on their political alignment.
- Trend Analysis: Tracking the lifecycle of specific news stories over the 20-day period.
- Natural Language Processing (NLP): Training or testing models on short-text data (headlines).
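As a starting point for the trend and bias comparisons listed above, headlines mentioning a keyword can be counted per day and per political alignment. This is a rough sketch under stated assumptions: the rows are invented examples that already carry a political bias value (for instance after joining on website), and the timestamp format "%Y-%m-%d %H:%M:%S" is a guess that may need adjusting to the real data.

```python
from collections import Counter
from datetime import datetime

# Invented example rows; headlines and bias labels are placeholders, not dataset content.
rows = [
    {"timestamp scraped": "2023-02-13 08:00:00", "political bias": "left-centre",
     "headline": "Rail strike talks resume"},
    {"timestamp scraped": "2023-02-13 20:00:00", "political bias": "right-centre",
     "headline": "Commuters face strike chaos"},
    {"timestamp scraped": "2023-02-14 08:00:00", "political bias": "left-centre",
     "headline": "Unions announce new strike dates"},
    {"timestamp scraped": "2023-02-14 08:00:00", "political bias": "right-centre",
     "headline": "Weather warning issued"},
    {"timestamp scraped": "2023-02-13 20:00:00", "political bias": "left-centre",
     "headline": "STRIKE ballot result due"},
]


def daily_mentions(rows, keyword, ts_format="%Y-%m-%d %H:%M:%S"):
    """Count headlines containing keyword, grouped by (day, political bias)."""
    counts = Counter()
    needle = keyword.lower()
    for row in rows:
        # Case-insensitive substring match, so "strike" also matches "strikes".
        if needle in row["headline"].lower():
            day = datetime.strptime(row["timestamp scraped"], ts_format).date()
            counts[(day.isoformat(), row["political bias"])] += 1
    return counts


mentions = daily_mentions(rows, "strike")
for (day, bias), n in sorted(mentions.items()):
    print(day, bias, n)
```

Substring matching is deliberately crude; tokenisation or a proper topic model would be the next step for anything beyond a quick look.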
Coverage
- Geographic Scope: United Kingdom.
- Time Range: 13 February 2023 to 05 March 2023.
- Demographic/Source Scope: Top 15 UK news websites: BBC, Sun, Mirror, Daily Mail, Independent, Telegraph, Guardian, Manchester Evening News, Sky News, Metro, Daily Express, Times, Liverpool Echo, Birmingham Live, and Evening Standard.
- Notes: Headlines were scraped at 12-hour intervals.
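With scrapes taken every 12 hours, a headline that stays in a feed may plausibly appear in more than one scrape. The source does not state whether such repeats are present, so treat that as an assumption to verify. If they are, a sketch like the one below collapses repeats into one entry per outlet and headline, recording when each was first and last seen. The sample rows are invented, and the comparison relies on timestamps being ISO-like strings that sort chronologically.

```python
# Invented example rows; not taken from the dataset.
rows = [
    {"website": "Outlet A", "timestamp scraped": "2023-02-13 08:00:00",
     "headline": "Example story"},
    {"website": "Outlet A", "timestamp scraped": "2023-02-13 20:00:00",
     "headline": "Example story"},
    {"website": "Outlet B", "timestamp scraped": "2023-02-13 20:00:00",
     "headline": "Example story"},
    {"website": "Outlet A", "timestamp scraped": "2023-02-14 08:00:00",
     "headline": "Another story"},
]


def first_last_seen(rows):
    """Map (website, headline) to its earliest and latest scrape timestamps."""
    seen = {}
    for row in rows:
        key = (row["website"], row["headline"])
        ts = row["timestamp scraped"]
        if key not in seen:
            seen[key] = {"first": ts, "last": ts, "scrapes": 1}
        else:
            entry = seen[key]
            # String comparison is valid only for zero-padded ISO-like timestamps.
            entry["first"] = min(entry["first"], ts)
            entry["last"] = max(entry["last"], ts)
            entry["scrapes"] += 1
    return seen


lifecycle = first_last_seen(rows)
for (site, headline), info in lifecycle.items():
    print(site, "|", headline, "|", info["first"], "->", info["last"], "|", info["scrapes"])
```

The same headline text from two different outlets is kept as two entries, since the key includes the outlet name.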
License
CC BY-NC-SA 4.0
Who Can Use It
- Data Scientists and Analysts: For practising NLP techniques and exploratory data analysis.
- Journalists and Media Researchers: For investigating media plurality and bias.
- Students: As a clean, beginner-friendly dataset for learning data storytelling and tabular data manipulation.
- Sociologists: For understanding public discourse and media consumption patterns.
Dataset Name Suggestions
- UK Top 15 Media Headlines Snapshot 2023
- British News Sentiment and Bias Dataset
- 20-Day UK News RSS Feed Collection
- United Kingdom Political Media Headlines
Attributes
Original Data Source: British News Sentiment and Bias Dataset
