Opendatabay APP

COVID-19 Twitter Activity

Health Information Systems & Technology

Tags and Keywords

Internet

Online

Email

Diseases

NLP

Free

About

This dataset contains a collection of tweets from Twitter users related to the Coronavirus and COVID-19 pandemic. It aims to provide insights into public discourse, health information dissemination, and online community interactions during this global health event. The dataset is valuable for analysing public sentiment, observing social media activity correlations with disease spread, and understanding general online behaviour during a health crisis.

Columns

  • text: The content of individual tweets.
  • accounts: Details pertaining to the Twitter accounts from which the tweets originated.
  • hashtags: A list of hashtags used within each tweet.
  • locations: Geographic locations associated with the Twitter accounts.
  • retweet_count: A numerical count of retweets for each original tweet, though retweets themselves are not included in the dataset.
  • country: The name of the country, potentially matched with country codes.
  • country_code: The associated code for each country.

Distribution

The dataset is distributed as a series of files, and each new upload typically contains a single day's worth of data. Because of the high tweet volume, the files are split into groups that each cover approximately half a month, which keeps them manageable. Retweets are not included, but each tweet carries a retweet count. The data is part of a larger series, with subsequent datasets covering other periods, such as early and late April.
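Since the data arrives as many per-day files, a reader will usually want to walk a whole directory of them. The sketch below assumes the files are UTF-8 CSVs with a header row, stored in one local directory, and that sorting by file name gives a sensible order; the function names and the "*.csv" pattern are illustrative and not part of the dataset.

```python
import csv
from pathlib import Path


def iter_tweet_rows(directory, pattern="*.csv"):
    """Yield (file_name, row) pairs from every matching CSV, in file-name order."""
    for path in sorted(Path(directory).glob(pattern)):
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                yield path.name, row


def rows_per_file(directory, pattern="*.csv"):
    """Count data rows per file; files with no data rows are omitted."""
    counts = {}
    for name, _row in iter_tweet_rows(directory, pattern):
        counts[name] = counts.get(name, 0) + 1
    return counts
```

Streaming rows with a generator avoids loading a half-month group into memory at once, which may matter given the tweet volume described above.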

Usage

This dataset is ideal for various applications, including:
  • Natural Language Processing (NLP): Performing sentiment analysis, topic modelling, and text mining on health-related social media content.
  • Public Health Research: Monitoring public response, identifying misinformation, and tracking trends related to the pandemic.
  • Social Science Studies: Analysing online community dynamics, information diffusion, and public perception during a global crisis.
  • Data Analytics: Investigating the relationship between social media activity and reported disease cases.
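As one illustration of the data analytics use case, the sketch below computes a Pearson correlation between two equal-length series, for example daily tweet counts and daily reported cases. Reported case counts are not part of this dataset and would have to come from an external source; the function name and inputs are assumptions made for illustration, and a correlation of this kind says nothing about causation.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of centred products and squares (the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

A typical call would pass one value per day for each series, aligned on the same dates.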

Coverage

The dataset is global in scope and includes location data for accounts. Time coverage begins around 17 March and extends through late April, spread across multiple sequential dataset files. Because of the immense volume of tweets, data capture may have gaps for some hashtags. Less frequently used hashtags may cover a longer historical period than more popular ones such as "#coronavirus".

License

CC0

Who Can Use It

  • Academics and Researchers: Studying epidemiology, public health communications, social behaviour, and digital humanities.
  • Data Scientists and Analysts: Developing predictive models, sentiment analysis tools, or visualisations of social media trends.
  • Organisations and Non-profits: Monitoring public discourse related to health crises and understanding information flow.

Dataset Name Suggestions

  • COVID-19 Twitter Activity
  • Pandemic Tweets Collection
  • Global Coronavirus Discourse
  • Social Media COVID-19 Insights
  • Health Crisis Twitter Data

Attributes

Original Data Source: Coronavirus (covid19) Tweets

Listing Stats

  • Views: 1
  • Downloads: 0
  • Listed: 08/06/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0

Download Dataset in CSV Format