Daily BBC News Text Dataset
Entertainment & Media Consumption
Free
About
This dataset is designed for analysing news trends, performing sentiment analysis, and studying the impact of specific events over time. It offers valuable insights for those interested in media coverage, news propagation, and shifts in public interest across various topics. The dataset is particularly useful for tasks involving natural language processing (NLP), multiclass classification, and text pre-processing.
Columns
- title: The headline or title of the news article.
- pubDate: The date and time when the news article was published.
- guid: A globally unique identifier for the news article, typically presented as a URL.
- link: The direct URL link to access the full news article online.
- description: A concise summary or brief overview of the news article content.
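As a rough illustration of the column layout above, the sketch below reads rows with Python's standard `csv` module and checks that the five listed columns are present. The helper name `read_articles`, the inline sample row, and the date format shown in it are assumptions made for illustration; they are not taken from the dataset itself.

```python
import csv
import io

# Column names as listed in the Columns section
EXPECTED_COLUMNS = ["title", "pubDate", "guid", "link", "description"]


def read_articles(fileobj):
    """Read article rows from an open CSV file object into a list of dicts.

    Hypothetical helper: raises ValueError if a listed column is missing.
    """
    reader = csv.DictReader(fileobj)
    header = reader.fieldnames or []
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Made-up sample row standing in for bbc_news.csv (the pubDate format is assumed)
sample = io.StringIO(
    "title,pubDate,guid,link,description\n"
    '"Example headline","Mon, 07 Mar 2022 08:01:56 GMT",'
    'https://example.org/a,https://example.org/a,"Example summary."\n'
)
articles = read_articles(sample)
print(articles[0]["title"])

# For the real file, something like the following should work
# (the UTF-8 encoding is an assumption):
# with open("bbc_news.csv", newline="", encoding="utf-8") as f:
#     articles = read_articles(f)
```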
Distribution
The dataset is provided as a single CSV file, bbc_news.csv, with 35,860 rows and 5 columns. It includes 33,889 unique descriptions, 32,335 unique links, 33,124 unique titles, and 33,081 unique GUIDs, so some values are repeated across rows.
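Because the unique counts are lower than the row count, it can be useful to check for repeated values before analysis. The following is a minimal sketch of such a check; the helper names `unique_counts` and `duplicated_values` and the sample rows are hypothetical.

```python
from collections import Counter


def unique_counts(rows, columns):
    """Return the number of distinct values per column."""
    return {col: len({row[col] for row in rows}) for col in columns}


def duplicated_values(rows, column):
    """Return values that occur more than once in a column, with their counts."""
    counts = Counter(row[column] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


# Made-up rows: the second and third share the same link
rows = [
    {"title": "A", "link": "https://example.org/1"},
    {"title": "B", "link": "https://example.org/2"},
    {"title": "C", "link": "https://example.org/2"},
]
print(unique_counts(rows, ["title", "link"]))
print(duplicated_values(rows, "link"))
```

Whether repeated rows should be dropped depends on the task; for trend counts they may inflate totals, while for classification they mostly add redundancy.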
Usage
This dataset is ideally suited for:
- Analysing patterns and shifts in news reporting.
- Conducting sentiment analysis on news article content.
- Investigating the influence of particular events over time.
- Developing and testing models for multiclass classification.
- Tasks requiring text pre-processing for machine learning applications.
- Research into media coverage and public engagement with news.
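For the text pre-processing use case, one simple starting point is to lowercase the headline and summary and split them into tokens. The sketch below is only an illustration: the `preprocess` function and its token pattern are assumptions, and real projects may prefer a dedicated NLP tokenizer.

```python
import re

# Lowercase alphanumeric runs, optionally followed by an apostrophe suffix ("uk's")
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")


def preprocess(title, description):
    """Combine title and description, lowercase the text, and return its tokens."""
    text = f"{title} {description}".lower()
    return TOKEN_RE.findall(text)


# Made-up example text
tokens = preprocess("UK's economy grows", "Growth reported this quarter.")
print(tokens)
```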
Coverage
Most articles were published between 07 March 2022 and 03 July 2024, but a handful carry earlier publication dates, some going back to 2013. The distribution of articles by date range (MM/DD/YYYY) is as follows:
- 08/30/2013 - 03/16/2014: 1 article
- 06/16/2017 - 12/31/2017: 1 article
- 12/31/2017 - 07/17/2018: 1 article
- 08/17/2019 - 03/02/2020: 1 article
- 09/16/2020 - 04/02/2021: 2 articles
- 10/17/2021 - 05/03/2022: 2,477 articles
- 05/03/2022 - 11/17/2022: 8,049 articles
- 11/17/2022 - 06/03/2023: 7,334 articles
- 06/03/2023 - 12/18/2023: 8,933 articles
- 12/18/2023 - 07/04/2024: 9,061 articles
The dataset covers news articles on a global scale.
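A date distribution like the one above can be approximated by grouping publication dates into buckets. The sketch below counts articles per calendar month rather than using the ranges listed. It assumes that pubDate values are RFC 822 style strings such as "Mon, 07 Mar 2022 08:01:56 GMT", which the listing does not confirm; the function name `articles_per_month` is hypothetical.

```python
from collections import Counter
from email.utils import parsedate_to_datetime


def articles_per_month(pub_dates):
    """Count articles per calendar month, keyed as 'YYYY-MM'.

    Assumes RFC 822 style date strings; other formats need a different parser.
    """
    counts = Counter()
    for raw in pub_dates:
        published = parsedate_to_datetime(raw)
        counts[published.strftime("%Y-%m")] += 1
    return dict(sorted(counts.items()))


# Made-up sample dates
sample_dates = [
    "Mon, 07 Mar 2022 08:01:56 GMT",
    "Tue, 08 Mar 2022 10:00:00 GMT",
    "Wed, 03 Jul 2024 12:30:00 GMT",
]
monthly = articles_per_month(sample_dates)
print(monthly)
```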
License
CC-BY
Who Can Use It
This dataset is particularly beneficial for:
- Researchers: For academic studies on media, public opinion, and linguistic analysis.
- Data Scientists: For developing predictive models, text analytics, and machine learning applications.
- Journalists: For investigative reporting, trend analysis, and understanding news propagation.
- Individuals interested in natural language processing (NLP) and text-based data projects.
Dataset Name Suggestions
- BBC News Articles Collection
- BBC News Headlines & Summaries Dataset
- Daily BBC News Text Data
- BBC News Article Archive
Attributes
Original Data Source: BBC News Articles