Social Media Veracity & Rumour Tracking Set
News & Media Articles
Free
About
Uncover the dynamics of misinformation and public discourse during breaking news events with this extensive collection of Twitter data. Developed from the PHEME project, this dataset distinguishes between rumours and non-rumours, offering a vital resource for veracity classification and social media analysis. The data specifically focuses on high-impact events, primarily the Charlie Hebdo shooting and the Germanwings crash, whilst also including significant data related to Ferguson. By providing the raw text alongside rumour verification labels, this resource enables deep linguistic and behavioural analysis of how information spreads during crises.
Columns
- text: The content of the comment or tweet posted by the user.
- is_rumor: A classification label used to verify if the text is a rumour or not (0 for non-rumour, 1 for rumour).
- user.handle: The unique handle or username of the Twitter user who posted the content.
- topic: The specific breaking news event associated with the tweet (e.g., charliehebdo, ferguson, or others).
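As a minimal sketch of how the four columns above might be read, the following uses Python's standard `csv` module. The helper name `load_rows`, the in-memory sample rows, and the assumption that `is_rumor` is stored as the strings `0` and `1` are illustrative and are not taken from the dataset itself.

```python
import csv
import io


def load_rows(handle):
    """Read rows from an open text handle into a list of dicts.

    Assumes the header names match the column list: text, is_rumor,
    user.handle, topic. Assumes is_rumor is stored as "0" or "1".
    """
    rows = []
    for record in csv.DictReader(handle):
        rows.append({
            "text": record["text"],
            "is_rumor": int(record["is_rumor"]),
            "user_handle": record["user.handle"],
            # Empty topic cells are kept as None rather than "".
            "topic": record["topic"] or None,
        })
    return rows


# Made-up sample standing in for the real file.
sample = io.StringIO(
    "text,is_rumor,user.handle,topic\n"
    "Example rumour text,1,user_a,charliehebdo\n"
    "Example non-rumour text,0,user_b,\n"
)
rows = load_rows(sample)
print(rows[0]["is_rumor"], rows[1]["topic"])
```

For the real file, the same function can be given a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved; the UTF-8 encoding is an assumption.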
Distribution
- Format: CSV
- Size: 7.82 MB
- Structure: Rectangular data structure with approximately 62,400 valid rows and 4 columns.
- Note: Most records are complete, but approximately 20% of rows are missing a value in the 'topic' field.
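Because a sizeable share of rows lack a topic, it may help to measure that share before grouping by event. The sketch below assumes rows are held as dicts with `topic` and `is_rumor` keys; the function name `topic_summary` and the sample rows are hypothetical.

```python
from collections import Counter


def topic_summary(rows):
    """Return (share of rows with no topic, counts per (topic, is_rumor))."""
    missing = sum(1 for r in rows if not r["topic"])
    counts = Counter(
        (r["topic"], r["is_rumor"]) for r in rows if r["topic"]
    )
    share = missing / len(rows) if rows else 0.0
    return share, counts


# Made-up rows; both None and "" are treated as a missing topic.
rows = [
    {"topic": "charliehebdo", "is_rumor": 1},
    {"topic": "charliehebdo", "is_rumor": 0},
    {"topic": "ferguson", "is_rumor": 0},
    {"topic": None, "is_rumor": 1},
    {"topic": "", "is_rumor": 0},
]
share, counts = topic_summary(rows)
print(f"{share:.0%} of rows lack a topic")
print(counts[("charliehebdo", 1)])
```

Rows without a topic still carry a usable `is_rumor` label, so dropping them is only necessary for per-event analysis.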
Usage
- Rumour Detection Systems: Training machine learning models to automatically identify and flag potential rumours in real-time.
- Fake News Analysis: Studying the propagation patterns of unverified information versus verified news.
- Sentiment Analysis: Evaluating public emotional response during traumatic breaking news events.
- Social Media Monitoring: Examining keyword usage and user interaction during crises.
- Natural Language Processing (NLP): Benchmarking text classification algorithms on noisy, real-world social media text.
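As one illustration of the classification use cases above, here is a small bag-of-words naive Bayes baseline with add-one smoothing, written with the standard library only. It is a sketch rather than the method used by the PHEME project; the tokenizer, the function names, and the four training sentences are invented for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into simple word, hashtag and mention tokens."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


def train(examples):
    """Fit per-label word counts from an iterable of (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior score."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    tokens = [t for t in tokenize(text) if t in vocab]  # skip unseen words
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            # Add-one (Laplace) smoothing over the training vocabulary.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training sentences; 1 = rumour, 0 = non-rumour.
train_set = [
    ("unconfirmed reports say hostages taken", 1),
    ("rumour claims second gunman unconfirmed", 1),
    ("police confirm official statement released", 0),
    ("official press conference confirms details", 0),
]
model = train(train_set)
print(predict(model, "unconfirmed rumour about gunman"))
print(predict(model, "official statement from police"))
```

On real data, the `text` and `is_rumor` columns would supply the `(text, label)` pairs, and a held-out split would be needed before reading anything into the accuracy.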
Coverage
- Geographic/Event Scope: Global Twitter conversation focused on specific events: Charlie Hebdo (Paris), Germanwings crash (French Alps), and Ferguson (USA).
- Platform: Twitter (Internet/News).
- Demographic: Twitter users active during these specific breaking news periods.
License
CC0: Public Domain
Who Can Use It
- Data Scientists: For building and testing classification models.
- Academic Researchers: For sociolinguistic studies on misinformation.
- Journalists: For analysing historical social media reactions to major news events.
- Policy Makers: To understand information flow during emergencies.
Dataset Name Suggestions
- PHEME Rumour & Veracity Dataset
- Twitter Breaking News Misinformation Archive
- Charlie Hebdo & Germanwings Rumour Classification Data
- Social Media Veracity & Rumour Tracking Set
- PHEME Event-Based Rumour Detection CSV
Attributes
Original Data Source: Social Media Veracity & Rumour Tracking Set
