Fake News Authenticity Dataset
News & Media Articles
Free
About
This dataset supports the classification of news articles by type and label, with a focus on distinguishing real from fake news based on source and article characteristics. Social media makes news easy to access at scale, but its authenticity is a major concern because misinformation is widespread [1]. Classifying news manually is tedious, time-consuming, and prone to bias [1]. The dataset is a resource for limiting the spread of misinformation and for informing readers about the nature of the news they consume [2]. It includes source information, author names, publication dates, and labels, all of which are essential for evaluating the trustworthiness of news [3].
Columns
- author: The author of the news article [4].
- published: The date on which the article was published [4].
- title: The title of the news article [5].
- text: The full text content of the article [5].
- language: The language in which the article is written [6].
- site_url: The URL of the website where the article was published [6].
- main_img_url: The URL of the main image associated with the article [7].
- type: The identified type of article, such as 'bs' (clickbait/misleading) or 'bias' [7].
- label: A categorical label indicating whether the news is 'Fake' or 'Real' [8].
- title_without_stopwords: The article title with common stop words removed for analysis [8].
- text_without_stopwords: The article text with common stop words removed for analysis [9].
- hasImage: A binary indicator showing whether the article includes an image or not [9].
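As a quick sanity check on the columns listed above, the following minimal sketch reads a CSV with Python's standard csv module, reports any expected column that is absent from the header, and counts non-empty values per column. The function name summarize_columns and the sample rows are illustrative assumptions, not part of the dataset.

```python
import csv
import io
from collections import Counter

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = [
    "author", "published", "title", "text", "language", "site_url",
    "main_img_url", "type", "label", "title_without_stopwords",
    "text_without_stopwords", "hasImage",
]


def summarize_columns(handle):
    """Return (missing_columns, non_empty_counts) for an open CSV file."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    # Expected columns that do not appear in the file header.
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    counts = Counter({name: 0 for name in header})
    for row in reader:
        for name in header:
            value = row.get(name)
            # Count a cell only if it holds non-whitespace content.
            if value is not None and value.strip():
                counts[name] += 1
    return missing, dict(counts)


# Tiny in-memory sample (invented rows, not taken from the dataset).
sample = io.StringIO(
    "author,published,title,text,language,site_url,main_img_url,type,label,"
    "title_without_stopwords,text_without_stopwords,hasImage\n"
    "A. Writer,2016-10-26,Some title,Some body text,english,example.com,"
    "http://example.com/a.png,bias,Real,title,body text,1\n"
    "B. Writer,2016-10-27,Other title,,english,example.org,,bs,Fake,"
    "title,,0\n"
)
missing, counts = summarize_columns(sample)
print(missing)
print(counts["title"], counts["text"])
```

To run it against the real file, pass an open handle instead of the sample, for example open("news_articles.csv", newline="", encoding="utf-8"). The UTF-8 encoding is an assumption and may need adjusting.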
Distribution
The dataset is provided as a CSV file named news_articles.csv, with a size of 10.97 MB [4]. It contains 12 distinct columns [4]. Most columns contain approximately 2095 to 2096 records, with a slightly lower count for the text column (2050 valid records) [4-9].
Usage
This dataset is ideal for:
- Developing practical applications that allow users to gain insight from articles they consume, such as fact-checking websites [2].
- Creating built-in plugins and article parsers to automate the detection and flagging of misinformation [2].
- Refining existing tools and making them easier to access, to raise public awareness of news authenticity [2].
- Conducting research on source-based fake news classification and the characteristics that define untrustworthy news [1, 3].
- Training machine learning models to label data as fake or untrustworthy based on various features [3].
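To make the source-based classification idea above concrete, here is a minimal sketch of a majority-vote baseline: each site_url is mapped to the label it carries most often, and unseen sites fall back to the overall most common label. This is a simple reference point under stated assumptions, not a trained machine learning model and not a method prescribed by the dataset. The function names and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def fit_source_baseline(rows):
    """Map each site_url to the most common label seen for it."""
    per_site = defaultdict(Counter)
    overall = Counter()
    for row in rows:
        site, label = row.get("site_url"), row.get("label")
        if not site or not label:
            continue  # skip rows with a missing source or label
        per_site[site][label] += 1
        overall[label] += 1
    # Label used for sites that never appeared in the training rows.
    fallback = overall.most_common(1)[0][0] if overall else None
    majority = {site: c.most_common(1)[0][0] for site, c in per_site.items()}
    return majority, fallback


def predict(model, site_url):
    """Return the majority label for a site, or the fallback label."""
    majority, fallback = model
    return majority.get(site_url, fallback)


# Invented rows for illustration only; real rows would come from the CSV.
train_rows = [
    {"site_url": "example.com", "label": "Real"},
    {"site_url": "example.com", "label": "Real"},
    {"site_url": "example.com", "label": "Fake"},
    {"site_url": "example.org", "label": "Fake"},
    {"site_url": "example.org", "label": "Fake"},
]
model = fit_source_baseline(train_rows)
print(predict(model, "example.com"))
print(predict(model, "unseen.example"))
```

A baseline like this is only meaningful when evaluated on rows held out from fitting. If the same sites appear in both splits, the score mostly reflects how consistently each site is labeled.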
Coverage
The dataset primarily features articles in English (96%) and German (3%), with a small percentage of other languages [6]. The publication dates of the articles span at least from 2006 to 2016 [5]. The scope is global in its relevance, addressing the universal challenge of misinformation on social media, though specific geographic or demographic data on the origin of news content is not detailed [1].
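The language shares quoted above could be re-derived from the language column with a sketch like the following. It assumes rows have already been read into dictionaries (for example with csv.DictReader); the function name language_shares and the sample rows are illustrative only.

```python
from collections import Counter


def language_shares(rows):
    """Return the percentage of rows per language, rounded to one decimal."""
    counts = Counter(
        # Normalize case and whitespace; group empty values as "unknown".
        (row.get("language") or "").strip().lower() or "unknown"
        for row in rows
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: round(100 * n / total, 1) for lang, n in counts.items()}


# Invented rows for illustration only.
rows = [
    {"language": "english"},
    {"language": "English"},
    {"language": "english"},
    {"language": "german"},
]
shares = language_shares(rows)
print(shares)
```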
License
CC0: Public Domain
Who Can Use It
- Data Scientists and Machine Learning Engineers: Building and training models for fake news detection, news classification, and natural language processing tasks.
- Researchers: Studying misinformation, media bias, and the impact of social media content.
- Developers: Creating fact-checking platforms, browser extensions, or news analysis tools.
- Journalists and Media Organisations: Verifying news authenticity and understanding content biases.
- Anyone interested in combating misinformation: Gaining insight into how news can be classified and verified.
Dataset Name Suggestions
- Fake News Authenticity Dataset
- Social Media News Classification
- Misinformation Detection Corpus
- News Source Trustworthiness Data
Attributes
Original Data Source: Fake News Authenticity Dataset