Reddit Anti-Vaccine Sentiment Data
Social Media and Posts
Free
About
This dataset contains posts and comments collected from the r/VaccineMyths subreddit, a public forum where individuals discuss various vaccine-related myths. Its primary purpose is to provide raw text data for analysis of public discourse surrounding vaccine misinformation. The data was collected using PRAW, the Python Reddit API Wrapper. Because the content was not filtered during collection, a small percentage of it may contain harsh language.
Columns
- title: This field is relevant for posts and provides the title of the Reddit post.
- score: Relevant for posts, this is the post's Reddit score (roughly upvotes minus downvotes), a rough measure of its impact. Scores range from -12 to 1,187, with a mean of 3.69.
- id: A unique identifier for both posts and comments, with 1,602 unique values.
- url: Relevant for posts, this provides the URL of the post thread. Approximately 71% of entries in this column are missing, which suggests the field is populated mainly for original posts rather than comments.
- comms_num: Relevant for posts, this represents the number of comments associated with a particular post. The number of comments ranges from 0 to 595, with a mean of 1.84.
- created: This field provides the date of creation for the post or comment in a Unix timestamp format.
- body: Relevant for both posts and comments, this contains the actual text content of the post or comment. Approximately 23% of entries in this column are missing, typically for posts whose main content is in the title.
- timestamp: This provides a human-readable date and time for when the post or comment was created.
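As a rough illustration of how these columns might be used together, the sketch below reads rows with Python's standard csv module, computes the share of missing values per column, and separates posts from comments. The sample rows are invented for illustration, the helper names (load_rows, missing_share, split_posts_comments) are hypothetical, and treating "has a URL" as "is a post" is an assumption based on the url description above, not a documented rule.

```python
import csv
import io

# Tiny in-memory stand-in for reddit_vm.csv; these rows are invented for illustration.
SAMPLE = """title,score,id,url,comms_num,created,body,timestamp
Example post,5,p1,https://example.com/p1,2,1389600000,,2014-01-13 08:00:00
,1,c1,,0,1389603600,Example comment text,2014-01-13 09:00:00
,2,c2,,0,1389607200,Another comment,2014-01-13 10:00:00
"""


def load_rows(handle):
    """Read CSV rows into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def missing_share(rows, column):
    """Return the fraction of rows whose value in `column` is empty."""
    if not rows:
        return 0.0
    empty = sum(1 for row in rows if not (row.get(column) or "").strip())
    return empty / len(rows)


def split_posts_comments(rows):
    """Heuristic (an assumption): rows with a URL are posts, the rest are comments."""
    posts = [r for r in rows if (r.get("url") or "").strip()]
    comments = [r for r in rows if not (r.get("url") or "").strip()]
    return posts, comments


rows = load_rows(io.StringIO(SAMPLE))
posts, comments = split_posts_comments(rows)
print(len(posts), len(comments))
print(round(missing_share(rows, "url"), 2))
```

To try this on the real file, the in-memory sample could be swapped for something like open("reddit_vm.csv", newline="", encoding="utf-8"); the UTF-8 encoding is an assumption and may need adjusting.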
Distribution
The dataset is provided in a CSV format (reddit_vm.csv), with a file size of 628.1 kB. It comprises 8 distinct columns and contains 1,602 records. While columns like id, score, comms_num, created, and timestamp are fully populated, title, url, and body have varying percentages of missing values, reflecting their relevance to specific data types (posts versus comments).
Usage
This dataset is highly suitable for:
- Performing sentiment analysis on discussions related to vaccine myths.
- Identifying discussion topics and common themes within conversations about vaccine misinformation.
- Studying the spread and characteristics of online health discourse.
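A very simple first step toward identifying common themes is counting frequent terms in the text. The sketch below is a minimal illustration only: the sample texts are invented, the stop-word list is deliberately tiny, and the top_terms helper is a hypothetical name rather than part of any library. Real topic or sentiment analysis would normally use a dedicated NLP toolkit.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "that", "it", "for"}


def top_terms(texts, n=5):
    """Return the n most frequent non-stop-word terms across the given texts."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep runs of letters/apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    return counts.most_common(n)


# Invented stand-ins for values from the `body` or `title` columns.
sample_texts = [
    "Vaccines and the immune system",
    "Is the vaccine schedule safe?",
    "Immune response to vaccines",
]

print(top_terms(sample_texts, n=2))
```

Note that this treats "vaccine" and "vaccines" as different terms; stemming or lemmatisation would be needed to merge them.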
Coverage
The data encompasses discussions from the r/VaccineMyths subreddit on Reddit. The time range of the collected content spans from 13th January 2014 to 30th December 2021. There are no explicit geographical or demographic filters, as the data reflects global participation on the Reddit platform.
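The stated time range can be checked against the created column by converting each Unix timestamp to a date. The sketch below assumes the timestamps are seconds since the epoch and interprets them in UTC; the time zone of the dataset's own timestamp column is not stated, so that part is an assumption. The function names are hypothetical.

```python
from datetime import datetime, timezone

# Date range stated in the dataset description, treated here as UTC (an assumption).
RANGE_START = datetime(2014, 1, 13, tzinfo=timezone.utc)
RANGE_END = datetime(2021, 12, 31, tzinfo=timezone.utc)  # exclusive upper bound


def to_utc(created):
    """Convert a Unix timestamp (int, float, or numeric string) to an aware UTC datetime."""
    return datetime.fromtimestamp(float(created), tz=timezone.utc)


def in_stated_range(created):
    """True if the timestamp falls between 13 January 2014 and 30 December 2021 inclusive."""
    return RANGE_START <= to_utc(created) < RANGE_END


print(to_utc("1389600000").isoformat())  # 2014-01-13T08:00:00+00:00
print(in_stated_range("1389600000"))     # True
```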
License
CC0: Public Domain
Who Can Use It
This dataset is ideal for:
- Researchers studying public health communication and the dynamics of misinformation online.
- Data analysts looking to apply natural language processing (NLP) techniques, such as sentiment analysis or topic modelling, to real-world social media data.
- Social scientists investigating online communities and their discourse patterns.
- Academics interested in understanding public perception and narratives around vaccines.
Dataset Name Suggestions
- Reddit Vaccine Myths Discussion Data
- r/VaccineMyths Community Discourse
- Online Vaccine Misinformation Archive
- Reddit Anti-Vaccine Sentiment Data
- Vaccine Myth Reddit Posts & Comments
Attributes
Original Data Source: Reddit Anti-Vaccine Sentiment Data