Reddit Coronavirus Discourse Data
About
This dataset collects public discussions about the coronavirus on Reddit: posts and comments from the r/Coronavirus subreddit, which has over 2.4 million subscribers. It can support the analysis of social media trends, the study of public sentiment, and the development of Natural Language Processing (NLP) applications related to health crises and online discourse.
Columns
- title: The primary headline or subject of a Reddit post.
- score: The post's net vote score on Reddit (upvotes minus downvotes) at the time of collection, used as an indicator of popularity.
- id: A unique identifier assigned to each individual post or comment.
- url: The direct web address linking to the original Reddit post thread.
- comms_num: The total number of comments made in response to a particular post.
- created: The date on which the post or comment was originally created.
- body: The full textual content of either a Reddit post or a comment.
- timestamp: A numerical representation of the date and time of content creation.
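A minimal loading sketch using Python's standard `csv` module. It assumes the files are CSV with a header row using the column names listed above. Treating empty or non-numeric `score`, `comms_num` and `timestamp` cells as missing (`None`) is also an assumption, since comment rows may not fill post-only fields; the helper names `to_number` and `load_rows` are hypothetical.

```python
import csv
from typing import Iterable, Optional


def to_number(value: str) -> Optional[float]:
    """Convert a CSV cell to float; empty or non-numeric cells become None."""
    value = value.strip()
    if not value:
        return None
    try:
        return float(value)
    except ValueError:
        return None


def load_rows(lines: Iterable[str]) -> list[dict]:
    """Read rows from an open file (or any iterable of CSV lines),
    converting the numeric columns score, comms_num and timestamp."""
    rows = []
    for raw in csv.DictReader(lines):
        row = dict(raw)
        for col in ("score", "comms_num", "timestamp"):
            # A missing column yields None from .get(); treat it as empty.
            row[col] = to_number(row.get(col) or "")
        rows.append(row)
    return rows
```

In practice the function would be given an open file, e.g. `open(path, newline="", encoding="utf-8")`, where `path` and the UTF-8 encoding are assumptions about how the files are delivered.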
Distribution
The dataset contains both Reddit posts and their associated comments, typically distributed as CSV files. The total number of rows is not specified; the score and timestamp values span a wide range, consistent with a sizeable collection of discussion from an active online community.
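Because the row count is not published, it can be measured directly. The sketch below streams the CSV once, counting rows and tracking the minimum and maximum of `score` and `timestamp` without holding the file in memory. The function name `summarise` is hypothetical, and skipping empty or non-numeric cells is an assumption about how missing values appear.

```python
import csv
from typing import Iterable


def summarise(lines: Iterable[str]) -> dict:
    """Count rows and report (min, max) for score and timestamp.

    Empty or non-numeric cells are skipped; a column with no numeric
    values is reported as (None, None).
    """
    count = 0
    low = {"score": None, "timestamp": None}
    high = dict(low)
    for row in csv.DictReader(lines):
        count += 1
        for col in low:
            try:
                value = float((row.get(col) or "").strip())
            except ValueError:
                continue  # empty or non-numeric cell
            low[col] = value if low[col] is None else min(low[col], value)
            high[col] = value if high[col] is None else max(high[col], value)
    return {"rows": count, **{col: (low[col], high[col]) for col in low}}
```

Keeping running extremes rather than collecting every value keeps memory use constant regardless of file size.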
Usage
This dataset is ideally suited for a variety of analytical and research applications, including:
- Public health research: To study public perceptions, concerns, and information dissemination regarding the Coronavirus pandemic.
- Social media analysis: For tracking evolving discussions, identifying key themes, and monitoring engagement on large online platforms.
- Natural Language Processing (NLP): To train and validate models for tasks such as sentiment analysis, topic modelling, and text classification within health-related discourse.
- Behavioural science: For observing and understanding how large online communities interact and respond during global health events.
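As a minimal starting point for the theme-identification and topic-modelling uses above, the sketch below counts the most frequent words across post and comment text (for example, the `title` and `body` values). The regular-expression tokeniser and the small stop-word set are illustrative assumptions, not a recommended NLP pipeline; a real analysis would use a proper tokeniser and topic model.

```python
import re
from collections import Counter
from typing import Iterable

# Illustrative stop-word list (an assumption); extend it for real analysis.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
    "for", "on", "that", "this", "with", "are", "be", "you",
}

# Lowercase words of two or more characters; keeps forms like "covid-19".
TOKEN = re.compile(r"[a-z][a-z0-9'-]+")


def top_terms(texts: Iterable[str], n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent non-stop-word tokens across all texts."""
    counts: Counter = Counter()
    for text in texts:
        tokens = TOKEN.findall((text or "").lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)
```

Running this separately on each week or month of data gives a crude view of how the dominant terms shift over time.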
Coverage
- Geographic: The data is global in scope, reflecting discussions from a diverse, worldwide user base.
- Time Range: The data covers at least the period from 13 November 2021 to 18 December 2021.
- Demographic Scope: Contributions come from users of a subreddit with over 2.4 million subscribers; only those who posted or commented during the covered period are represented.
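To confirm the covered window or follow activity over time, items can be bucketed by day. This sketch assumes `timestamp` holds Unix epoch seconds and converts it to a UTC calendar date; that encoding is an assumption, so check it against the `created` column before relying on the output. The function name `daily_counts` is hypothetical.

```python
from collections import Counter
from datetime import date, datetime, timezone
from typing import Iterable, Optional


def daily_counts(timestamps: Iterable[Optional[float]]) -> dict[date, int]:
    """Count items per UTC calendar day, skipping missing timestamps.

    Assumes each timestamp is Unix epoch seconds.
    """
    counts: Counter = Counter()
    for ts in timestamps:
        if ts is None:
            continue
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
        counts[day] += 1
    # Sorted by day so the first and last keys bound the covered period.
    return dict(sorted(counts.items()))
```

`min(counts)` and `max(counts)` on the result give the first and last days present, which is one way to check the stated date range.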
License
CC0
Who Can Use It
- Academics and researchers specialising in social media studies, public health, or Natural Language Processing.
- Data scientists and analysts aiming to build tools for social listening, trend prediction, or public opinion analysis.
- Organisations and governmental bodies interested in understanding public sentiment and engagement with health information online.
Dataset Name Suggestions
- Reddit Coronavirus Discourse Data
- COVID-19 Reddit Conversations
- Public Health Reddit Posts & Comments
- Social Media Coronavirus Data
- Reddit Pandemic Discussion Log
Attributes
Original Data Source: Coronavirus on Reddit