COVID-19 Vaccine Twitter Discourse
Health Information Systems & Technology
Free
About
This dataset contains tweets about the COVID-19 vaccines in large-scale use worldwide [1]. It is intended as a resource for studying public discourse and sentiment surrounding these vaccines [1]. Tweets were collected with the tweepy Python package, which accesses the Twitter API [1]. Collection started with tweets about the Pfizer/BioNTech vaccine and was later extended to Sinopharm and Sinovac (both produced in China), Moderna, Oxford/AstraZeneca, Covaxin, and Sputnik V; the tweets for all of these vaccines are merged into one dataset [1]. Tweets were initially collected twice a day; the schedule then settled at once a day, during morning hours (GMT), to stay within tweet quotas [1].
Columns
- id: Unique identifier for the tweet [2].
- user_name: The name of the user who posted the tweet [2].
- user_location: The location provided by the user [2].
- user_description: The user's profile description [2].
- user_created: The date the user account was created [2].
- user_followers: The number of followers the user has [2].
- user_friends: The number of friends (accounts followed) the user has [2].
- user_favourites: The number of tweets the user has marked as favourites [2].
- user_verified: A boolean indicating if the user is verified (true/false) [2].
- date: The date and time the tweet was posted [2].
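As an illustration of how the columns listed above might be read, the following is a minimal sketch using only the Python standard library. The timestamp layout (`YYYY-MM-DD HH:MM:SS`), the `True`/`False` spelling of `user_verified`, and the two sample rows are assumptions made for the example; they are not taken from the dataset itself.

```python
import csv
import io
from datetime import datetime

# Assumed timestamp layout; adjust if the actual CSV differs.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

INT_COLUMNS = ("user_followers", "user_friends", "user_favourites")
DATE_COLUMNS = ("user_created", "date")


def parse_row(row):
    """Return a copy of one CSV row with count, boolean and date fields typed."""
    typed = dict(row)
    for name in INT_COLUMNS:
        # int(float(...)) also accepts counts written as "120.0".
        typed[name] = int(float(row[name]))
    for name in DATE_COLUMNS:
        typed[name] = datetime.strptime(row[name], TIMESTAMP_FORMAT)
    typed["user_verified"] = row["user_verified"].strip().lower() == "true"
    return typed


def read_tweets(handle):
    """Parse every row from an open CSV file handle."""
    return [parse_row(row) for row in csv.DictReader(handle)]


# Invented two-row sample standing in for the real file.
SAMPLE_CSV = """id,user_name,user_location,user_description,user_created,user_followers,user_friends,user_favourites,user_verified,date
1,alice,London,Nurse,2010-05-01 08:00:00,120,80,15,False,2020-12-12 09:30:00
2,bob,,Reporter,2015-03-20 12:00:00,50000,300,900,True,2020-12-13 07:45:00
"""

tweets = read_tweets(io.StringIO(SAMPLE_CSV))
print(tweets[1]["user_followers"], tweets[1]["user_verified"])
```

To read the real file, `read_tweets` can be given a handle from `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved.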
Distribution
The data is typically available in CSV format [3]. Weekly tweet counts range from a low of 354 to a high of 11,489 [2, 4-18]. For user followers, friends, and favourites, the largest group in each case is the one starting at zero: 0 to 327,060.96 followers, totalling 221,968 accounts [19, 20]; 0 to 11,649.22 friends, accounting for 226,631 records [20, 21]; and 0 to 25,992.00 favourites, accounting for 202,999 records [22, 23]. Approximately 8% of users are verified and 92% are not [13].
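Summary figures of this kind (tweets per weekly interval, share of verified users) could be recomputed along the following lines. This is a sketch under assumptions: weeks are taken to start on Monday, which may not match the binning behind the figures quoted above, and the sample values are invented.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def week_start(moment):
    """Return the Monday of the week containing the given datetime."""
    day = moment.date()
    return day - timedelta(days=day.weekday())


def weekly_tweet_counts(tweet_dates):
    """Count tweets per Monday-based week (assumed binning)."""
    return Counter(week_start(d) for d in tweet_dates)


def verified_share(flags):
    """Fraction of rows whose user_verified flag is true."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


# Invented sample values for illustration only.
sample_dates = [
    datetime(2020, 12, 12, 9, 0),  # Saturday
    datetime(2020, 12, 13, 9, 0),  # Sunday, same week
    datetime(2020, 12, 14, 9, 0),  # Monday, following week
]
counts = weekly_tweet_counts(sample_dates)
share = verified_share([True, False, False, False])
```

Note that `verified_share` works per row, so a user who appears in several tweets is counted once per tweet; deduplicating on `user_name` first would give a per-user figure instead.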
Usage
This dataset is suitable for analytical tasks such as:
- Studying the subjects discussed in recent tweets about different vaccine producers [1].
- Performing Natural Language Processing (NLP) tasks such as topic modelling and sentiment analysis [1].
- Analysing the relationship between vaccination progress (which can be observed through other datasets like COVID-19 World Vaccination Progress) and discussions about vaccines on social media [1].
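A first step towards studying discussion by vaccine producer could be to tag each tweet with the vaccines it mentions. The sketch below assumes the tweet body is available as a string (for example in a `text` field, which is not among the columns listed above), and the keyword lists are illustrative choices rather than part of the dataset.

```python
# Hypothetical keyword lists, matched case-insensitively as substrings so that
# hashtags such as "#PfizerBioNTech" are also caught.
VACCINE_KEYWORDS = {
    "Pfizer/BioNTech": ("pfizer", "biontech"),
    "Sinopharm": ("sinopharm",),
    "Sinovac": ("sinovac",),
    "Moderna": ("moderna",),
    "Oxford/AstraZeneca": ("astrazeneca",),
    "Covaxin": ("covaxin",),
    "Sputnik V": ("sputnik",),
}


def mentioned_vaccines(text):
    """Return the vaccines whose keywords occur in the tweet text."""
    lowered = text.lower()
    return [
        name
        for name, keywords in VACCINE_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]


# Invented example tweets.
examples = [
    "Got my #PfizerBioNTech jab today",
    "Moderna vs Sputnik V: which is easier to store?",
    "Stay safe everyone",
]
tags = [mentioned_vaccines(t) for t in examples]
```

Substring matching is deliberately crude; it serves only to group tweets before applying topic modelling or sentiment analysis with a dedicated NLP library.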
Coverage
Coverage is global [24]. The dataset covers tweets related to the following COVID-19 vaccines: Pfizer/BioNTech, Sinopharm, Sinovac, Moderna, Oxford/AstraZeneca, Covaxin, and Sputnik V [1].
The tweet data spans from 12th December 2020 to 23rd November 2021 [18].
User creation dates, reflecting the age of user accounts contributing to the dataset, range from 15th July 2006 to 22nd November 2021 [25].
License
CC0
Who Can Use It
- Researchers: To conduct studies on public health communication, social media trends, and vaccine perception.
- Data Scientists: For developing and testing NLP models, performing sentiment analysis, and topic modelling.
- Public Health Analysts: To monitor public opinion and discussions related to COVID-19 vaccines.
- Academics: For educational purposes and academic research projects focusing on social media data analysis.
Dataset Name Suggestions
- COVID-19 Vaccine Twitter Discourse
- Global Vaccine Social Media Insights
- Public Opinion on COVID-19 Vaccines (Twitter)
- COVID-19 Vaccine Tweet Repository
Attributes
Original Data Source: COVID-19 All Vaccines Tweets