Digital News Authenticity Dataset
News & Media Articles
Free
About
This dataset supports fake news detection and is built on FakeNewsNet. It contains news articles and related metadata intended to help distinguish authentic news from fabricated news. The original FakeNewsNet data was cleaned and combined into a single file, with some columns modified to make analysis easier. Each article's veracity is given in the label column, real, where '1' signifies real news and '0' represents fake news.
Columns
- title: The title of the news article.
- news_url: The URL where the news article was originally published. This column may contain some missing values.
- source_domain: The web domain from which the article was posted.
- tweet_num: The count of retweets for the respective news article, indicating its social media dissemination.
- real: A binary label column indicating the authenticity of the news article, with '1' for real and '0' for fake.
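As a minimal sketch of how the columns above might be read, the following uses only the Python standard library. The sample rows are made-up placeholders that follow the documented column layout; they are not taken from the dataset. The `Article` class and `parse_articles` function are hypothetical names introduced for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional

# Made-up placeholder rows that follow the documented column layout.
# They are NOT taken from the dataset.
SAMPLE_CSV = """title,news_url,source_domain,tweet_num,real
Example headline one,http://example.com/a,example.com,12,1
Example headline two,,example.org,3,0
"""


@dataclass
class Article:
    title: str
    news_url: Optional[str]  # may be missing in the dataset
    source_domain: str
    tweet_num: int
    real: bool  # True for label '1' (real), False for label '0' (fake)


def parse_articles(handle):
    """Read rows from an open CSV file object into Article records."""
    articles = []
    for row in csv.DictReader(handle):
        articles.append(Article(
            title=row["title"],
            news_url=row["news_url"] or None,  # empty string becomes None
            source_domain=row["source_domain"],
            tweet_num=int(row["tweet_num"]),
            real=row["real"] == "1",
        ))
    return articles


articles = parse_articles(io.StringIO(SAMPLE_CSV))
for article in articles:
    print(article)
```

To read the actual file, pass an open file object instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`. The file name is not given in the listing, and UTF-8 encoding is an assumption.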
Distribution
The dataset is provided as a CSV file of 4.31 MB and contains approximately 23.2 thousand records. The title, source_domain, tweet_num, and real columns each have approximately 23.2 thousand valid values, while the news_url column has approximately 22.9 thousand.
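The per-column counts of valid values can be checked with a short script. The following is a sketch using only the standard library; `count_valid` is a hypothetical helper, and the sample rows are invented placeholders rather than real records.

```python
import csv
import io

# Made-up placeholder rows; NOT taken from the dataset.
SAMPLE_CSV = """title,news_url,source_domain,tweet_num,real
Example headline one,http://example.com/a,example.com,12,1
Example headline two,,example.org,3,0
Example headline three,http://example.net/c,example.net,0,1
"""


def count_valid(handle):
    """Return (total_rows, {column: number of non-empty values})."""
    reader = csv.DictReader(handle)
    valid = {name: 0 for name in reader.fieldnames}
    total = 0
    for row in reader:
        total += 1
        for name in valid:
            # Treat empty or whitespace-only cells as missing.
            if (row[name] or "").strip():
                valid[name] += 1
    return total, valid


total, valid = count_valid(io.StringIO(SAMPLE_CSV))
for name, n in valid.items():
    print(f"{name}: {n} valid, {total - n} missing")
```

Run against the real file, the gap between the total row count and the news_url count would be expected to reflect the missing URLs noted above.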
Usage
This dataset is suited to applications such as:
- Developing and testing machine learning models for fake news detection.
- Conducting text classification tasks based on news content.
- Analysing news propagation patterns through social media retweets.
- Research into the characteristics distinguishing real and fake news articles.
- Practising data cleaning and preprocessing techniques on tabular and text data.
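As an illustration of the text classification use case, here is a small multinomial Naive Bayes baseline over article titles, written with the standard library only. It is a sketch under stated assumptions, not a recommended model: the function names are hypothetical, the training titles are invented for the example, and a real experiment would use the dataset's title and real columns with a held-out test split.

```python
import math
import re
from collections import Counter


def tokenize(title):
    """Lowercase a title and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


def train(examples):
    """Fit a multinomial Naive Bayes model on (title, label) pairs."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter()
    for title, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(title))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, doc_counts, vocab


def predict(model, title):
    """Return the label (1 = real, 0 = fake) with the higher log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (0, 1):
        # Log prior with add-one smoothing so an unseen class never gives log(0).
        score = math.log((doc_counts[label] + 1) / (total_docs + 2))
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(title):
            if token in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][token] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


# Invented toy titles for illustration only; NOT taken from the dataset.
toy_examples = [
    ("Senate passes budget bill", 1),
    ("City council approves new park", 1),
    ("Aliens endorse miracle diet", 0),
    ("Miracle cure shocks doctors", 0),
]
model = train(toy_examples)
print(predict(model, "Miracle diet shocks aliens"))
print(predict(model, "Council approves budget"))
```

Titles alone are a weak signal, so this kind of baseline is mainly useful as a reference point against which stronger models can be compared.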
Coverage
The dataset's content is derived from news articles, so its scope is general public news. The source materials do not state its geographic, temporal, or demographic coverage. The data is organised around article content and authenticity rather than a defined region or historical period.
License
CC0: Public Domain
Who Can Use It
This dataset is valuable for a wide range of users, including:
- Data Scientists and Machine Learning Engineers: For building and evaluating models for automated fake news detection and text classification.
- Researchers: Investigating misinformation, news authenticity, and social media dynamics.
- Academics and Students: As a practical resource for learning about data analysis, natural language processing, and ethical AI in the context of news.
- Journalists and Media Analysts: For understanding patterns in news dissemination and potential indicators of fabricated content.
Dataset Name Suggestions
- Fake News Article Classifier Dataset
- News Veracity Data
- FakeNewsNet Articles Collection
- Digital News Authenticity Dataset
- Article Truthfulness Classifier
Attributes
Original Data Source: Digital News Authenticity Dataset