Social Media Conspiracy Text Data
Reddit & Forum Data
Free
About
A collection of unfiltered posts and their associated comments, sourced directly from the r/ConspiracyTheory subreddit on Reddit. It provides raw social media text for studying online discussion of conspiracy theories, spanning politics, economics, currencies, and other social narratives. No pre-filtering has been applied to the data.
Columns
The dataset contains eight key fields, detailing attributes of both posts and comments:
- title: The primary headline used for the original Reddit post.
- score: A numerical value reflecting the engagement or impact of the post or comment, ranging from -5 up to 25.
- id: A unique identification tag assigned to every post or comment record.
- url: The web address linking directly to the post thread; this field is often blank (65% missing) for comments.
- comms_num: The total count of comments directed toward the specific post, with values reaching up to 30.
- created: The creation date as a numeric Unix timestamp.
- body: The substantive text content of the post or the reply. Note that 19% of records are missing body text.
- timestamp: The date and time of creation in a readable datetime format.
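As a rough illustration of how these fields might be read, the sketch below parses rows with Python's standard csv module, converts the numeric fields, and treats empty url and body values as missing. The helper name parse_rows and the sample rows are hypothetical and made up for this example; it also assumes that created holds seconds since the Unix epoch and that missing values appear as empty strings, neither of which is confirmed by the listing.

```python
import csv
import io
from datetime import datetime, timezone


def parse_rows(handle):
    """Yield one dict per record with light type conversion.

    Hypothetical helper: assumes the eight column names listed above,
    `created` in seconds since the Unix epoch, and blanks for missing values.
    """
    for row in csv.DictReader(handle):
        created = (row.get("created") or "").strip()
        score = (row.get("score") or "").strip()
        comms = (row.get("comms_num") or "").strip()
        yield {
            "id": row.get("id", ""),
            "title": row.get("title", ""),
            "score": int(float(score)) if score else None,
            "comms_num": int(float(comms)) if comms else None,
            # Empty strings become None so missing values are explicit.
            "url": (row.get("url") or "").strip() or None,
            "body": (row.get("body") or "").strip() or None,
            "created_utc": (
                datetime.fromtimestamp(float(created), tz=timezone.utc)
                if created
                else None
            ),
        }


# Tiny in-memory sample with made-up values, standing in for the real file.
SAMPLE = """title,score,id,url,comms_num,created,body,timestamp
Example post,12,abc123,https://example.com/post,3,1600000000.0,,2020-09-13 12:26:40
Example reply,1,def456,,0,1600000100.0,Example reply text,2020-09-13 12:28:20
"""

records = list(parse_rows(io.StringIO(SAMPLE)))
for rec in records:
    print(rec["id"], rec["score"], rec["created_utc"], rec["body"])
```

To run this against the actual file, the io.StringIO sample could be swapped for open("reddit_ct.csv", newline="", encoding="utf-8"), assuming the file is UTF-8 encoded.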
Distribution
The information is delivered in a single CSV file, named reddit_ct.csv, with a size of approximately 465.2 kB. It is structured across eight columns and includes 1197 valid records. The structure covers content types including both posts and subsequent comments within the threads.
Usage
This data is well suited to sentiment analysis, particularly on controversial or extreme viewpoints. It can be used to identify prevailing discussion topics and to track how narratives evolve over time. It is also a useful resource for researchers training natural language processing models on informal, real-world social network text.
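One simple way to look at how a narrative changes over time is to count, per year, the records that mention a keyword. The sketch below does this with a plain case-insensitive substring match over title and body; the function name mentions_by_year and the sample rows are hypothetical, and created is again assumed to be in seconds since the Unix epoch. A real analysis would likely want proper tokenisation or topic modelling rather than substring matching.

```python
from collections import Counter
from datetime import datetime, timezone


def mentions_by_year(rows, keyword):
    """Count records per calendar year (UTC) whose title or body mentions keyword.

    Hypothetical helper: uses a naive case-insensitive substring match and
    skips records that have no `created` value.
    """
    needle = keyword.lower()
    counts = Counter()
    for row in rows:
        text = f"{row.get('title') or ''} {row.get('body') or ''}".lower()
        created = row.get("created")
        if needle in text and created:
            year = datetime.fromtimestamp(float(created), tz=timezone.utc).year
            counts[year] += 1
    return dict(sorted(counts.items()))


# Made-up rows standing in for records read from the CSV file.
sample_rows = [
    {"title": "Example post about currency", "body": "", "created": "1345000000"},
    {"title": "", "body": "More on CURRENCY policy", "created": "1600000000"},
    {"title": "", "body": "Unrelated reply", "created": "1600000100"},
    {"title": "", "body": None, "created": "1600000200"},  # missing body text
]

yearly = mentions_by_year(sample_rows, "currency")
print(yearly)
```

Rows with missing body text are handled by falling back to an empty string, which matters here because a noticeable share of records have no body.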
Coverage
The records span nearly a decade, beginning in August 2012 and ending in March 2022. The data consists entirely of content from the r/ConspiracyTheory subreddit. While it captures discussions on global topics (politics, economics), it does not contain demographic details about the users who created the content.
License
CC0: Public Domain
Who Can Use It
The dataset is highly relevant for social scientists and sociologists examining online radicalisation and discourse patterns. Data journalists can use it to track emerging social trends and specific narratives within fringe communities. Machine learning engineers will find it valuable for developing text classification algorithms.
Dataset Name Suggestions
- Reddit Conspiracy Post Archive
- r/ConspiracyTheory Discussions
- Social Media Conspiracy Text Data
- Unfiltered Reddit Theories
Attributes
Original Data Source: Social Media Conspiracy Text Data
