Reddit Viral Disease Conversations
Public Health & Epidemiology
About
This dataset contains a collection of Reddit topics related to the monkeypox outbreak, gathered through the Reddit API with the PRAW (Python Reddit API Wrapper) library. It offers insights into public discussions of the disease, whose outbreak was confirmed in May 2022 and which the World Health Organization (WHO) declared a Public Health Emergency of International Concern (PHEIC) on 23 July 2022. This outbreak marked the first time monkeypox spread widely outside Central and West Africa. The tabular format makes the dataset useful for studying how online communities discussed a global health event.
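The card does not include its collection script. The sketch below shows one way rows like these could be gathered with PRAW, assuming read-only API credentials and a site-wide search for the keyword "monkeypox". The function names `submission_to_row` and `collect_rows`, the search query, and the use of PRAW's `created_utc` attribute for the `created` column are illustrative assumptions, not the original author's method.

```python
def submission_to_row(submission):
    """Map a PRAW-style submission object to one row of this dataset's schema.

    Hypothetical mapping: the column names come from this card, the attribute
    names from PRAW's Submission model. Using created_utc for `created` is an
    assumption about how the original data was built.
    """
    return {
        "title": submission.title,
        "score": submission.score,
        "id": submission.id,
        "url": submission.url,
        "comms_num": submission.num_comments,
        "created": submission.created_utc,
        # Link and image posts have an empty selftext; store it as missing.
        "body": submission.selftext or None,
    }


def collect_rows(client_id, client_secret, user_agent, query="monkeypox", limit=100):
    """Search all of Reddit for `query` (needs network access and API credentials)."""
    import praw  # imported lazily so submission_to_row works without PRAW installed

    # Supplying only client credentials puts PRAW in read-only mode.
    reddit = praw.Reddit(
        client_id=client_id, client_secret=client_secret, user_agent=user_agent
    )
    results = reddit.subreddit("all").search(query, limit=limit)
    return [submission_to_row(s) for s in results]
```

Keeping the mapping separate from the network call lets the schema logic be checked without credentials.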
Columns
- title: The main title of the Reddit topic. This column has 962 unique values.
- score: The numerical score assigned to the topic, reflecting its popularity (net upvotes). Scores range from 0 to 811, with a mean of 64.7.
- id: The unique identifier for each Reddit topic. All 965 entries are unique.
- url: The specific URL linking to the Reddit topic. There are 962 unique URLs recorded.
- comms_num: The total number of comments on the topic, with a mean of 27.8 and a maximum of 570.
- created: The Unix timestamp indicating when the topic was created.
- body: The Markdown-formatted content of the text submission. Note that 82% of entries in this column are missing.
- year-month-day: The date (year, month, day) when the topic was created, ranging from 1st June 2022 to 21st August 2022.
- hour-min-sec: The precise time (hour, minute, second) when the topic was created.
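To check a local copy against the figures listed above (row and column counts, unique values, score and comment statistics, the share of missing `body` values, and the date range), a small standard-library sketch could recompute them. The helper names `utc_day` and `summarize` are hypothetical, and the sketch assumes that missing bodies are stored as empty cells and that `created` is a UTC Unix timestamp; the card does not state either.

```python
import csv
from datetime import datetime, timezone
from statistics import mean


def utc_day(created):
    """Convert a Unix timestamp to a 'YYYY-MM-DD' string, assuming UTC."""
    return datetime.fromtimestamp(float(created), tz=timezone.utc).strftime("%Y-%m-%d")


def summarize(lines):
    """Recompute the column statistics quoted on this card.

    `lines` is any iterable of CSV text lines, such as an open file.
    Empty `body` cells count as missing and timestamps are read as UTC;
    both are assumptions about how the CSV was written.
    """
    reader = csv.DictReader(lines)
    rows = list(reader)
    if not rows:
        raise ValueError("no data rows found")
    scores = [int(r["score"]) for r in rows]
    comments = [int(r["comms_num"]) for r in rows]
    days = sorted(utc_day(r["created"]) for r in rows)
    return {
        "rows": len(rows),
        "columns": len(reader.fieldnames),
        "unique_ids": len({r["id"] for r in rows}),
        "unique_titles": len({r["title"] for r in rows}),
        "score_mean": mean(scores),
        "score_max": max(scores),
        "comms_mean": mean(comments),
        "comms_max": max(comments),
        # A None body (short row) or blank string is treated as missing.
        "body_missing_share": sum(not (r["body"] or "").strip() for r in rows) / len(rows),
        "first_day": days[0],
        "last_day": days[-1],
    }
```

Calling `summarize` on an open `reddit_monkeypox.csv` (opened with `newline=""`) should reproduce the card's numbers if those assumptions hold.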
Distribution
This dataset is provided as a single CSV file, reddit_monkeypox.csv, with a size of 345.12 kB. It comprises 9 columns and 965 rows, each describing one Reddit topic.
Usage
This dataset suits natural language processing (NLP) projects, such as building algorithms that classify or predict content related to public health topics. It can be used for sentiment analysis of online discussions, trend analysis of public interest in monkeypox, or tracking how discourse evolved during a public health emergency. Researchers could also utilise it to study how viral outbreaks are discussed on social media platforms.
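As an illustration of trend analysis, a minimal sketch could group posts by ISO calendar week using the `year-month-day` column and total their comments as a rough engagement signal. The function name `weekly_activity` and the `YYYY-MM-DD` storage format are assumptions, not details given by the card.

```python
from collections import Counter
from datetime import date


def weekly_activity(rows):
    """Count posts and total comments per ISO week.

    `rows` is any iterable of dicts (for example from csv.DictReader).
    Assumes `year-month-day` holds ISO dates such as '2022-06-01'.
    """
    posts, comments = Counter(), Counter()
    for row in rows:
        year, week, _ = date.fromisoformat(row["year-month-day"]).isocalendar()
        key = f"{year}-W{week:02d}"
        posts[key] += 1
        comments[key] += int(row["comms_num"])
    # Sorted keys give a chronological series of (posts, comments) pairs.
    return {key: (posts[key], comments[key]) for key in sorted(posts)}
```

Feeding it `csv.DictReader` rows yields one `(posts, comments)` pair per week, a simple series for spotting when interest rose or faded.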
Coverage
The dataset covers Reddit topics about the monkeypox outbreak posted between 1 June and 21 August 2022. It reflects discussions from a global online community during the period when the outbreak spread significantly outside its endemic regions and was declared a PHEIC.
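Because the coverage window spans the WHO declaration on 23 July 2022, one simple comparison is to split posts at that date and compare volume and mean score on each side. This is a hypothetical sketch (`compare_around_pheic` is an illustrative name); it assumes `year-month-day` holds ISO dates and that `score` is stored as an integer.

```python
from datetime import date
from statistics import mean

# WHO declared the PHEIC on 23 July 2022, per the dataset description.
PHEIC_DATE = date(2022, 7, 23)


def compare_around_pheic(rows):
    """Return (post count, mean score) for posts before and on/after the PHEIC date."""
    before, on_or_after = [], []
    for row in rows:
        day = date.fromisoformat(row["year-month-day"])
        bucket = before if day < PHEIC_DATE else on_or_after
        bucket.append(int(row["score"]))

    def describe(scores):
        # The mean is undefined for an empty bucket, so report None instead.
        return len(scores), (mean(scores) if scores else None)

    return {"before": describe(before), "on_or_after": describe(on_or_after)}
```

A raw before/after split ignores other trends in the window, so it is a starting point for exploration rather than evidence of an effect.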
License
CC0: Public Domain
Who Can Use It
This dataset is suitable for data scientists, researchers in public health and social sciences, NLP practitioners, and students learning about data analysis or machine learning applications in real-world scenarios. Potential users can leverage it for academic research, developing predictive models for public health messaging, or understanding public perception during health crises.
Dataset Name Suggestions
- Monkeypox Reddit Discussions
- Reddit Monkeypox Outbreak Topics
- Social Media Monkeypox Chronicle
- Reddit Viral Disease Conversations
- Monkeypox Online Discourse Data
Attributes
Original Data Source: Reddit Viral Disease Conversations