COVID-19 Vaccine Twitter Conversation Data

Patient Health Records & Digital Health

Tags and Keywords

Covid

Vaccine

Twitter

Sentiment

Health

Free

About

A continually updated collection of trending Twitter conversations centred on the COVID-19 vaccine. The tweets were scraped using the Twitter API and a Python script, specifically targeting the #covid vaccine hashtag. They capture public discourse about the pandemic and vaccination efforts, offering material for social listening, trend analysis, and studies of social media dynamics. Collection started on 22 October 2020.

Columns

The product file contains 36 columns detailing various attributes of the collected tweets. Key fields include:
  • id: The unique identifier for the tweet.
  • conversation_id: The identifier linking related tweets within a specific discussion.
  • created_at, date, and time: Timestamp information showing when the tweet was posted.
  • user_id, username, name: Identifiers and display names of the user who published the post.
  • tweet: The primary text content of the social media post.
  • language: The detected language of the tweet content, with English being highly prevalent (96%).
  • mentions: A list of user handles included in the post.
  • urls: Any external links or URLs embedded within the post.
  • replies_count, retweets_count, likes_count: Metrics quantifying user engagement and interaction with the tweet.
  • hashtags and cashtags: Supplementary tagging information.
  • video and photos: Indicators for attached multimedia content.
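
As a starting point, the key fields above could be loaded with pandas as in the sketch below. This is a minimal illustration, not the publisher's own tooling: the helper name load_tweets, the YYYY-MM-DD and HH:MM:SS formats of the date and time columns, and the two-row in-memory sample are all assumptions made here so the example runs without the real file.

```python
import io

import pandas as pd

# A subset of the 36 columns, spelled as in the listing above.
KEY_COLUMNS = [
    "id", "conversation_id", "date", "time", "username", "tweet",
    "language", "replies_count", "retweets_count", "likes_count",
]


def load_tweets(source):
    """Read the CSV (a path or file-like object) and keep the key columns."""
    df = pd.read_csv(
        source,
        usecols=KEY_COLUMNS,
        # Keep identifiers as text so they are never altered by numeric parsing.
        dtype={"id": "string", "conversation_id": "string"},
    )
    # Assumption: date looks like YYYY-MM-DD and time like HH:MM:SS.
    df["posted_at"] = pd.to_datetime(df["date"] + " " + df["time"], errors="coerce")
    return df


# Synthetic two-row sample standing in for the real file (hypothetical values).
SAMPLE = io.StringIO(
    "id,conversation_id,date,time,username,tweet,language,"
    "replies_count,retweets_count,likes_count\n"
    "1,1,2020-10-22,09:15:00,user_a,First dose booked #covidvaccine,en,0,2,5\n"
    "2,1,2020-10-22,09:20:00,user_b,Good news!,en,1,0,3\n"
)

tweets = load_tweets(SAMPLE)
print(tweets[["posted_at", "username", "likes_count"]])
```

With the downloaded file, the same helper would be called with the path to the CSV instead of the in-memory sample.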

Distribution

The raw data is available as a single CSV file of approximately 111.9 MB. It contains about 210,000 collected tweets (more than 2 lakh), each described by 36 attributes. Collection originally ran daily; future updates are expected annually.

Usage

This resource is highly suitable for various analytical purposes:
  • Evaluating public sentiment towards the COVID-19 vaccine and related policies.
  • Identifying emerging subjects and trending discussion topics linked to the main hashtag.
  • Tracking temporal trends in social media activity surrounding vaccine development.
  • Researching online communities and health-related communication strategies.
  • Developing tools for geopolitical conversation analysis, provided techniques account for the scarcity of explicit geolocation data.
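
Two of the uses above, spotting trending topics and tracking activity over time, can be sketched in a few lines. The example below is only an illustration under stated assumptions: the helper names top_hashtags and daily_volume are hypothetical, hashtags are extracted from the tweet text with a simple regular expression rather than from the hashtags column (whose exact format is not documented here), and the three-row sample is synthetic.

```python
import re
from collections import Counter

import pandas as pd

# Simple pattern: a '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def top_hashtags(texts, n=10):
    """Count case-insensitive hashtags across an iterable of tweet texts."""
    counts = Counter()
    for text in texts:
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(str(text)))
    return counts.most_common(n)


def daily_volume(df):
    """Number of tweets per calendar date, sorted chronologically."""
    return df.groupby("date").size().sort_index()


# Synthetic sample (hypothetical values) using the date and tweet columns.
sample = pd.DataFrame({
    "date": ["2020-10-22", "2020-10-22", "2020-10-23"],
    "tweet": [
        "Trial update #CovidVaccine #COVID19",
        "Waiting for the #covidvaccine",
        "No hashtags here",
    ],
})

print(top_hashtags(sample["tweet"], n=2))
print(daily_volume(sample))
```

Sentiment scoring would need an additional model or lexicon and is left out of this sketch.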

Coverage

The tweets span 12 February 2020 to 22 October 2020, the date on which continuous collection began. Most of the content is in English. Geolocation fields such as place, geo, and near are largely empty, and almost all tweets share a single timezone value (530, possibly UTC+05:30, India Standard Time).
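
The coverage claims above are easy to verify after download. The sketch below shows one way to do it; the helper name coverage_summary is hypothetical, the language code "en" for English and the presence of a timezone column are assumptions, and the four-row sample is synthetic rather than drawn from the dataset.

```python
import pandas as pd


def coverage_summary(df):
    """Summarise date range, English share, missing geodata, and timezones."""
    dates = pd.to_datetime(df["date"], errors="coerce")
    return {
        "first_date": dates.min(),
        "last_date": dates.max(),
        # Assumption: English tweets carry the language code "en".
        "english_share": float((df["language"] == "en").mean()),
        # Fraction of empty values in each geolocation field.
        "missing_share": df[["place", "geo", "near"]].isna().mean().to_dict(),
        "timezones": df["timezone"].value_counts().to_dict(),
    }


# Synthetic sample (hypothetical values) with the relevant columns.
sample = pd.DataFrame({
    "date": ["2020-02-12", "2020-06-01", "2020-10-21", "2020-10-22"],
    "language": ["en", "en", "en", "hi"],
    "place": [None, None, None, "Delhi"],
    "geo": [None, None, None, None],
    "near": [None, None, None, None],
    "timezone": [530, 530, 530, 530],
})

summary = coverage_summary(sample)
print(summary)
```

On the real file, a missing_share close to 1.0 for place, geo, and near would confirm that location-based analysis has to rely on other signals.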

License

CC0: Public Domain

Who Can Use It

  • Social Scientists: To analyse discourse patterns and emotional valence (sentiment) in public health crises.
  • Public Health Agencies: To monitor public reactions and identify potential areas of concern regarding vaccination campaigns.
  • Technology Developers: For building and testing Natural Language Processing (NLP) models focused on identifying specific trends or user behaviours on social platforms.
  • Academic Researchers: To study digital sociology, health informatics, and internet phenomena.

Dataset Name Suggestions

  • COVID-19 Vaccine Twitter Conversation Data
  • Social Media Discourse on Vaccine (Oct 2020)
  • Global Vaccine Trending Tweets
  • Online Health Dialogue Data Set

Attributes

Listing Stats

VIEWS

9

DOWNLOADS

0

LISTED

20/10/2025

REGION

GLOBAL

UDQS QUALITY (Universal Data Quality Score)

5 / 5

VERSION

1.0
