COVID-19 Vaccine Twitter Discourse

Health Information Systems & Technology

Tags and Keywords

Health

NLP

Healthcare

Public

Coronavirus

Text

Free

About

This dataset captures tweets about the COVID-19 vaccines in large-scale use worldwide [1]. It aims to provide a resource for studying public discourse and sentiment surrounding these vaccines [1]. The data was collected through the Twitter API using the tweepy Python package [1]. Collection began by merging in tweets about the Pfizer/BioNTech vaccine, then expanded to include tweets about Sinopharm, Sinovac (both Chinese-produced vaccines), Moderna, Oxford/AstraZeneca, Covaxin, and Sputnik V [1]. Tweets were initially collected twice a day; the frequency later settled at once daily, during morning hours (GMT), to stay within tweet quotas [1].
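The merging of collection runs mentioned above can be sketched as follows. This is a minimal illustration, not the collector's actual script: the function name `merge_batches` and the sample batches are hypothetical, and it assumes each tweet is a dictionary carrying the `id` column, so that a tweet returned by two overlapping runs is kept only once.

```python
from typing import Dict, Iterable, List


def merge_batches(batches: Iterable[List[dict]]) -> List[dict]:
    """Merge collection batches, keeping the first copy of each tweet id."""
    merged: Dict[str, dict] = {}
    for batch in batches:
        for tweet in batch:
            # setdefault keeps the record that was seen first for a given id.
            merged.setdefault(str(tweet["id"]), tweet)
    return list(merged.values())


# Hypothetical morning and evening batches that overlap on one tweet.
morning = [{"id": 1, "user_name": "a"}, {"id": 2, "user_name": "b"}]
evening = [{"id": 2, "user_name": "b"}, {"id": 3, "user_name": "c"}]
merged = merge_batches([morning, evening])
```

Keying on `id` rather than on the tweet text means that retweets and near-identical posts from different users remain separate records.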

Columns

  • id: Unique identifier for the tweet [2].
  • user_name: The name of the user who posted the tweet [2].
  • user_location: The location provided by the user [2].
  • user_description: The user's profile description [2].
  • user_created: The date the user account was created [2].
  • user_followers: The number of followers the user has [2].
  • user_friends: The number of friends (accounts followed) the user has [2].
  • user_favourites: The number of tweets the user has marked as favourites [2].
  • user_verified: A boolean indicating if the user is verified (true/false) [2].
  • date: The date and time the tweet was posted [2].
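As a rough illustration of how these columns might be read into typed values, the sketch below uses only the Python standard library. The timestamp layout, the `True`/`False` spelling of the verified flag, and the sample row are assumptions made for the example; check them against the downloaded CSV before relying on them.

```python
import csv
import io
from datetime import datetime

# Assumed timestamp layout; adjust if the CSV uses a different one.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_row(row: dict) -> dict:
    """Convert the string fields of one CSV row into typed values."""
    return {
        "id": row["id"],  # kept as text so large ids are not altered
        "user_name": row["user_name"],
        "user_location": row["user_location"] or None,
        "user_description": row["user_description"] or None,
        "user_created": datetime.strptime(row["user_created"], TIMESTAMP_FORMAT),
        "user_followers": int(row["user_followers"]),
        "user_friends": int(row["user_friends"]),
        "user_favourites": int(row["user_favourites"]),
        "user_verified": row["user_verified"].strip().lower() == "true",
        "date": datetime.strptime(row["date"], TIMESTAMP_FORMAT),
    }


# Made-up sample standing in for the downloaded file.
SAMPLE_CSV = (
    "id,user_name,user_location,user_description,user_created,"
    "user_followers,user_friends,user_favourites,user_verified,date\n"
    "1001,Example User,,Example bio,2009-04-08 17:52:46,"
    "405,1692,3247,False,2020-12-20 06:06:44\n"
)

rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
```

To read the real file, replace `io.StringIO(SAMPLE_CSV)` with an open file handle, for example `open(path, newline="", encoding="utf-8")`.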

Distribution

The data is typically available in CSV format [3]. Tweet counts per weekly interval range from a low of 354 to a high of 11,489 [2, 4-18]. User followers, friends, and favourites are summarised as binned counts. For followers, the largest bin (0 to 327,060.96) contains 221,968 records [19, 20]. For friends, the largest bin (0 to 11,649.22) contains 226,631 records [20, 21], and for favourites, the largest bin (0 to 25,992.00) contains 202,999 records [22, 23]. Approximately 8% of users are verified, while 92% are not [13].
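Summary figures of this kind (tweets per week, share of verified users) can be recomputed from the `date` and `user_verified` columns. The sketch below is one possible approach using the standard library. It assumes weeks start on Monday, which may not match the intervals used in the listing, and the sample values are invented.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def week_start(ts: datetime) -> date:
    """Return the Monday of the week containing ts."""
    return (ts - timedelta(days=ts.weekday())).date()


def weekly_counts(timestamps) -> Counter:
    """Count tweets per week, keyed by the Monday that starts the week."""
    return Counter(week_start(ts) for ts in timestamps)


def verified_share(flags) -> float:
    """Fraction of records whose user_verified flag is true."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


# Invented sample values for illustration only.
sample_dates = [
    datetime(2020, 12, 14, 9, 0),  # Monday
    datetime(2020, 12, 16, 9, 0),  # Wednesday of the same week
    datetime(2020, 12, 21, 9, 0),  # the following Monday
]
counts = weekly_counts(sample_dates)
share = verified_share([True, False, False, False])
```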

Usage

This dataset is suitable for analytical tasks such as:
  • Studying the subjects discussed in recent tweets about different vaccine producers [1].
  • Performing Natural Language Processing (NLP) tasks such as topic modelling and sentiment analysis [1].
  • Analysing the relationship between vaccination progress (which can be observed through other datasets like COVID-19 World Vaccination Progress) and discussions about vaccines on social media [1].
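A common first step toward the topic modelling mentioned above is counting the most frequent terms. The sketch below is a simplified starting point, not a full NLP pipeline. It assumes the CSV has a tweet text column, which is not among the columns listed in this description, and both the stop-word list and the sample tweets are illustrative.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "for", "my", "got", "on"}

# Words of two or more characters that start with a letter.
TOKEN_PATTERN = re.compile(r"[a-z][a-z0-9']+")


def tokenize(text: str) -> list:
    """Lowercase the text, drop URLs, and return the non-stop-word tokens."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return [t for t in TOKEN_PATTERN.findall(text) if t not in STOP_WORDS]


def top_terms(texts, n=3) -> list:
    """Return the n most frequent tokens across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts.most_common(n)


# Invented tweets; the real text column is an assumption of this sketch.
sample_texts = [
    "Got my first dose of the Moderna vaccine today https://example.com/x",
    "Second dose of the vaccine booked for Monday",
]
terms = top_terms(sample_texts, n=2)
```

The resulting counts can be fed into a topic model or a sentiment classifier from a dedicated NLP library once the text column is confirmed.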

Coverage

The dataset's region of coverage is global [24]. It specifically covers tweets related to the following COVID-19 vaccines: Pfizer/BioNTech, Sinopharm, Sinovac, Moderna, Oxford/AstraZeneca, Covaxin, and Sputnik V [1]. The tweet data spans from 12th December 2020 to 23rd November 2021 [18]. User creation dates, reflecting the age of user accounts contributing to the dataset, range from 15th July 2006 to 22nd November 2021 [25].
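To work with one vaccine or one period at a time, tweets can be tagged by keyword and filtered by date. The sketch below uses the vaccine names and tweet date range stated above. The keyword lists are assumptions rather than the original search queries, and simple substring matching will produce some false positives (for example, "oxford" in an unrelated context).

```python
from datetime import datetime

# Search terms per vaccine are assumptions, not the collection queries.
VACCINE_KEYWORDS = {
    "Pfizer/BioNTech": ("pfizer", "biontech"),
    "Sinopharm": ("sinopharm",),
    "Sinovac": ("sinovac",),
    "Moderna": ("moderna",),
    "Oxford/AstraZeneca": ("astrazeneca", "oxford"),
    "Covaxin": ("covaxin",),
    "Sputnik V": ("sputnik",),
}

# Tweet date range stated in the coverage description.
COVERAGE_START = datetime(2020, 12, 12)
COVERAGE_END = datetime(2021, 11, 23, 23, 59, 59)


def vaccines_mentioned(text: str) -> list:
    """Return the vaccines whose keywords appear in the text."""
    lowered = text.lower()
    return [
        name
        for name, keywords in VACCINE_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]


def in_coverage(ts: datetime) -> bool:
    """Check that a timestamp falls inside the stated tweet date range."""
    return COVERAGE_START <= ts <= COVERAGE_END


tagged = vaccines_mentioned("Pfizer and Moderna both approved")
```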

License

CC0

Who Can Use It

  • Researchers: To conduct studies on public health communication, social media trends, and vaccine perception.
  • Data Scientists: For developing and testing NLP models, performing sentiment analysis, and topic modelling.
  • Public Health Analysts: To monitor public opinion and discussions related to COVID-19 vaccines.
  • Academics: For educational purposes and academic research projects focusing on social media data analysis.

Dataset Name Suggestions

  • COVID-19 Vaccine Twitter Discourse
  • Global Vaccine Social Media Insights
  • Public Opinion on COVID-19 Vaccines (Twitter)
  • COVID-19 Vaccine Tweet Repository

Attributes

Original Data Source: COVID-19 All Vaccines Tweets

Listing Stats

  • VIEWS: 1
  • DOWNLOADS: 0
  • LISTED: 08/06/2025
  • REGION: GLOBAL
  • UDQS QUALITY (Universal Data Quality Score): 5 / 5
  • VERSION: 1.0
