Reddit Coronavirus Discourse Data

Social Media and Networking

Tags and Keywords

  • Social Networks
  • NLP
  • Public Health
  • Coronavirus

Free

About

This dataset collects public discussion of the Coronavirus on Reddit. It gathers posts and comments from r/Coronavirus, a subreddit with over 2.4 million subscribers. It is suited to analysing social media trends, gauging public sentiment, and developing Natural Language Processing (NLP) applications related to health crises and online discourse.

Columns

  • title: The primary headline or subject of a Reddit post.
  • score: A numerical indicator of a post's or comment's popularity; on Reddit this is the net vote count (upvotes minus downvotes).
  • id: A unique identifier assigned to each individual post or comment.
  • url: The direct web address linking to the original Reddit post thread.
  • comms_num: The total number of comments made in response to a particular post.
  • created: The date on which the post or comment was originally created.
  • body: The full textual content of either a Reddit post or a comment.
  • timestamp: A numerical representation of the date and time of content creation.
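
As a rough illustration of the columns above, the sketch below reads a CSV with these headers into typed records using only the Python standard library. The sample rows are synthetic placeholders, not taken from the dataset, and the exact formats of created and timestamp are assumptions based on the column descriptions; check them against the downloaded file. The names Record and load_records are hypothetical.

```python
import csv
import io
from dataclasses import dataclass

# Synthetic placeholder rows (NOT real dataset content). They only mirror
# the column names in the listing; value formats are assumptions.
SAMPLE_CSV = """title,score,id,url,comms_num,created,body,timestamp
Example post title,12,abc123,https://example.com/post,3,2021-11-15,,1637000000
Comment,1,def456,,0,2021-11-15,Example comment text,1637003600
"""


@dataclass
class Record:
    title: str
    score: int
    id: str
    url: str
    comms_num: int
    created: str    # kept as text; the real format should be verified
    body: str
    timestamp: str  # kept as text; the real format should be verified


def load_records(handle):
    """Read rows from an open CSV file object into Record instances."""
    return [
        Record(
            title=row["title"],
            score=int(row["score"]),
            id=row["id"],
            url=row["url"],
            comms_num=int(row["comms_num"]),
            created=row["created"],
            body=row["body"],
            timestamp=row["timestamp"],
        )
        for row in csv.DictReader(handle)
    ]


records = load_records(io.StringIO(SAMPLE_CSV))
```

With a real download, the same function could be called on a file opened with open(path, newline="", encoding="utf-8").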

Distribution

The dataset comprises both Reddit posts and their associated comments, provided in CSV format. The listing does not state a total row count. The score and timestamp fields span a wide range of values, which is consistent with a sizeable collection drawn from an active community.
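
Because the listing gives no row count, one way to get a first picture of the file after download is to stream through it once and record the number of rows and the range of score. This is a minimal sketch under the assumption that score always parses as an integer; the function name summarize and the inline sample are hypothetical placeholders.

```python
import csv
import io


def summarize(handle):
    """Count rows and track the min/max of the score column in one pass."""
    rows = 0
    min_score = max_score = None
    for row in csv.DictReader(handle):
        rows += 1
        score = int(row["score"])  # assumption: score is always an integer
        min_score = score if min_score is None else min(min_score, score)
        max_score = score if max_score is None else max(max_score, score)
    return {"rows": rows, "min_score": min_score, "max_score": max_score}


# Synthetic placeholder input, not real dataset content.
sample = io.StringIO("title,score\nplaceholder post,12\nplaceholder comment,-3\n")
summary = summarize(sample)
```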

Usage

This dataset is ideally suited for a variety of analytical and research applications, including:
  • Public health research: To study public perceptions, concerns, and information dissemination regarding the Coronavirus pandemic.
  • Social media analysis: For tracking evolving discussions, identifying key themes, and monitoring engagement on large online platforms.
  • Natural Language Processing (NLP): To train and validate models for tasks such as sentiment analysis, topic modelling, and text classification within health-related discourse.
  • Behavioural science: For observing and understanding how large online communities interact and respond during global health events.
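
As a starting point for the theme-identification use mentioned above, the following sketch counts the most frequent terms across a list of body texts. It is a deliberately simple baseline (lowercasing, regex tokenisation, a tiny hand-picked stopword list), not a topic model; the sample texts, the stopword list, and the name top_terms are all illustrative assumptions.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "is", "are", "and", "to", "of", "in", "for", "my"}


def top_terms(texts, n=3):
    """Return the n most common non-stopword terms across all texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


# Synthetic placeholder texts, not real dataset content.
sample_bodies = [
    "Vaccine appointments are open",
    "Is the vaccine booster available?",
    "Booster rollout in my region",
]
top = top_terms(sample_bodies, n=2)
```

In practice the texts would come from the body column (and possibly title for posts), with empty values skipped.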

Coverage

  • Geographic: The data is global in scope, reflecting discussions from a diverse, worldwide user base.
  • Time Range: The included data spans from at least 13th November 2021 to 18th December 2021.
  • Demographic Scope: Reflects the users who posted or commented in r/Coronavirus during this period, a community with over 2.4 million subscribers. It is not a representative sample of the general population.
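
To check a record against the stated time range, a sketch like the one below can be used. It assumes the timestamp column holds Unix epoch seconds in UTC, which the listing does not confirm; if the file stores formatted date strings instead, the parsing step would need to change. The name in_listed_range is hypothetical.

```python
from datetime import datetime, timezone

# Bounds taken from the listing's Time Range entry.
START = datetime(2021, 11, 13, tzinfo=timezone.utc)
# Exclusive upper bound so that all of 18 December 2021 is included.
END = datetime(2021, 12, 19, tzinfo=timezone.utc)


def in_listed_range(epoch_seconds: float) -> bool:
    """True if the moment falls within the listing's stated time range."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return START <= moment < END


inside = in_listed_range(1637000000)   # 15 November 2021, 18:13:20 UTC
outside = in_listed_range(1609459200)  # 1 January 2021, 00:00:00 UTC
```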

License

CC0

Who Can Use It

  • Academics and researchers specialising in social media studies, public health, or Natural Language Processing.
  • Data scientists and analysts aiming to build tools for social listening, trend prediction, or public opinion analysis.
  • Organisations and governmental bodies interested in understanding public sentiment and engagement with health information online.

Dataset Name Suggestions

  • Reddit Coronavirus Discourse Data
  • COVID-19 Reddit Conversations
  • Public Health Reddit Posts & Comments
  • Social Media Coronavirus Data
  • Reddit Pandemic Discussion Log

Attributes

Original Data Source: Coronavirus on Reddit

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 27/06/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
