Reddit Anti-Vaccine Sentiment Data

Social Media and Posts

Tags and Keywords

Vaccine

Reddit

Myths

Posts

Comments

Free

About

This dataset contains posts and comments collected from the r/VaccineMyths subreddit, a public forum where individuals discuss vaccine-related myths. Its primary purpose is to provide raw text for analysing public discourse around vaccine misinformation. The data was collected using PRAW, the Python Reddit API Wrapper. Because the content was not filtered during collection, a small percentage of it may contain harsh language.

Columns

  • title: This field is relevant for posts and provides the title of the Reddit post.
  • score: Relevant for posts, this indicates how the post was received, based on its net votes. Scores range from -12 to 1,187, with a mean of 3.69.
  • id: A unique identifier for both posts and comments, with 1,602 unique values.
  • url: Relevant for posts, this provides the URL of the post thread. Approximately 71% of entries in this column are missing, which indicates that it is populated mainly for original posts rather than comments.
  • comms_num: Relevant for posts, this represents the number of comments associated with a particular post. The number of comments ranges from 0 to 595, with a mean of 1.84.
  • created: This field provides the date of creation for the post or comment in a Unix timestamp format.
  • body: Relevant for both posts and comments, this contains the actual text content of the post or comment. Approximately 23% of entries in this column are missing, typically for posts whose entire content is in the title.
  • timestamp: This provides a human-readable date and time for when the post or comment was created.
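As a rough illustration of how the columns above might be used to tell posts from comments, the sketch below treats any row with a non-empty url as a post. This is a heuristic inferred from the column descriptions, not a documented rule of the dataset; the helper name split_posts_and_comments and the sample rows are made up for the example.

```python
import csv
import io


def split_posts_and_comments(rows):
    """Split rows into (posts, comments) using the url column as a heuristic."""
    posts, comments = [], []
    for row in rows:
        # Assumption: a non-empty url marks an original post; everything
        # else is treated as a comment.
        if (row.get("url") or "").strip():
            posts.append(row)
        else:
            comments.append(row)
    return posts, comments


# Made-up sample with the eight documented columns (not real records).
SAMPLE_CSV = """title,score,id,url,comms_num,created,body,timestamp
Example post,5,abc123,https://example.com/thread,2,1600000000,Post text,2020-09-13 12:26:40
,1,def456,,0,1600000100,A reply,2020-09-13 12:28:20
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
posts, comments = split_posts_and_comments(rows)
print(len(posts), len(comments))
```

To try this on the actual file, the io.StringIO sample could be replaced with the opened reddit_vm.csv, assuming missing values are stored as empty cells.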

Distribution

The dataset is provided in CSV format (reddit_vm.csv), with a file size of 628.1 kB. It comprises 8 columns and 1,602 records. The id, score, comms_num, created, and timestamp columns are fully populated, while title, url, and body have varying percentages of missing values, reflecting their relevance to specific record types (posts versus comments).
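One way to check the missing-value figures for yourself is to count empty cells per column. The following is a minimal sketch using only the standard library; missing_share is a hypothetical helper, the sample rows are invented, and it assumes missing values appear as empty cells in the CSV.

```python
import csv
import io


def missing_share(rows, columns):
    """Return {column: percentage of rows whose value is empty}."""
    total = len(rows)
    shares = {}
    for col in columns:
        empty = sum(1 for r in rows if not (r.get(col) or "").strip())
        shares[col] = 100.0 * empty / total if total else 0.0
    return shares


# Made-up sample (not real records). For the real file, the rows could be
# read from reddit_vm.csv instead, e.g. with
# open("reddit_vm.csv", newline="", encoding="utf-8"); the encoding is an assumption.
SAMPLE_CSV = """id,title,url,body
p1,Example post,https://example.com/t1,
c1,,,First reply
c2,,,Second reply
c3,,,Third reply
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
shares = missing_share(rows, ["id", "title", "url", "body"])
for column, pct in shares.items():
    print(f"{column}: {pct:.1f}% missing")
```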

Usage

This dataset is highly suitable for:
  • Performing sentiment analysis on discussions related to vaccine myths.
  • Identifying discussion topics and common themes within conversations about vaccine misinformation.
  • Studying the spread and characteristics of online health discourse.
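As a starting point for the theme-identification use case above, a simple term-frequency count can surface the most common words in the title or body text. This is only a sketch: the stopword list is a tiny hand-picked set, top_terms is a hypothetical helper, and the sample texts are invented. A real analysis would likely use a dedicated NLP library.

```python
import re
from collections import Counter

# Small illustrative stopword list; an assumption, not a standard list.
STOPWORDS = {"the", "and", "about", "are", "for", "this", "that", "with"}


def top_terms(texts, n=5):
    """Return the n most frequent lowercase terms across the given texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        # Skip stopwords and very short tokens such as "is" or "a".
        counts.update(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return counts.most_common(n)


# Made-up example texts standing in for the title/body columns.
texts = [
    "Question about the vaccine schedule",
    "Is the vaccine schedule safe?",
    "New vaccine thread",
]
print(top_terms(texts, n=2))
```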

Coverage

The data encompasses discussions from the r/VaccineMyths subreddit on Reddit. The time range of the collected content spans from 13th January 2014 to 30th December 2021. There are no explicit geographical or demographic filters, as the data reflects global participation on the Reddit platform.
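To see how records are spread across the stated time range, the created column can be converted from Unix time and grouped by year. The sketch below assumes the timestamps can be interpreted as UTC, which the listing does not state; records_per_year is a hypothetical helper and the sample values are chosen only for illustration.

```python
from collections import Counter
from datetime import datetime, timezone


def records_per_year(created_values):
    """Count records per calendar year from Unix timestamps (read as UTC)."""
    years = Counter()
    for value in created_values:
        # float() tolerates values stored as "1600000000" or "1600000000.0".
        dt = datetime.fromtimestamp(float(value), tz=timezone.utc)
        years[dt.year] += 1
    return dict(sorted(years.items()))


# Illustrative values: 2014-01-13, 2020-09-13, and 2021-12-30 (UTC).
sample_created = ["1389571200", "1600000000", "1640822400.0"]
print(records_per_year(sample_created))
```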

License

CC0: Public Domain

Who Can Use It

This dataset is ideal for:
  • Researchers studying public health communication and the dynamics of misinformation online.
  • Data analysts looking to apply natural language processing (NLP) techniques, such as sentiment analysis or topic modelling, to real-world social media data.
  • Social scientists investigating online communities and their discourse patterns.
  • Academics interested in understanding public perception and narratives around vaccines.

Dataset Name Suggestions

  • Reddit Vaccine Myths Discussion Data
  • r/VaccineMyths Community Discourse
  • Online Vaccine Misinformation Archive
  • Reddit Anti-Vaccine Sentiment Data
  • Vaccine Myth Reddit Posts & Comments

Attributes

Listing Stats

VIEWS

0

DOWNLOADS

0

LISTED

08/07/2025

REGION

GLOBAL

UNIVERSAL DATA QUALITY SCORE (UDQS)

5 / 5

VERSION

1.0
