Annotated COVID-19 Vaccine Tweets
Healthcare Providers & Services Utilization
Free
About
This dataset contains tweets about COVID-19 vaccines, each with a manually annotated sentiment. Sentiments are categorised as negative, neutral, or positive, and each category has a numerical label. The dataset is a resource for studying public opinion on COVID-19 vaccines as expressed on social media. The tweet IDs were initially gathered from a dataset by Gabriel Preda and then hydrated to obtain the full tweet texts. The original collection included tweets about various vaccines, such as Pfizer/BioNTech, Sinopharm, Sinovac, Moderna, Oxford/AstraZeneca, Covaxin, and Sputnik V.
Columns
- tweet_id: The unique identifier for each tweet.
- label: The manually annotated sentiment label for the tweet: 1 for negative, 2 for neutral, and 3 for positive.
- tweet_text: The full text content of the tweet.
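A minimal sketch of reading these three columns with Python's standard library is shown here. It assumes the CSV has a header row with exactly the column names listed above. The sample rows are made-up placeholders, not tweets from the dataset, and `read_tweets` is a hypothetical helper name. Tweet IDs are kept as strings because they are very large integers that can lose precision if a tool parses them as floating-point numbers.

```python
import csv
import io
from collections import Counter

# Label codes as described in the column list.
LABEL_NAMES = {1: "negative", 2: "neutral", 3: "positive"}

# Made-up rows standing in for the real CSV; IDs and texts are illustrative only.
SAMPLE_CSV = """tweet_id,label,tweet_text
1000000000000000001,1,"Placeholder text for a negative tweet"
1000000000000000002,2,"Placeholder text, with a comma, for a neutral tweet"
1000000000000000003,3,"Placeholder text for a positive tweet"
"""


def read_tweets(handle):
    """Parse rows into dicts, keeping tweet_id as a string and decoding label."""
    rows = []
    for record in csv.DictReader(handle):
        code = int(float(record["label"]))  # tolerate "1" as well as "1.0"
        rows.append({
            "tweet_id": record["tweet_id"],
            "label": code,
            "sentiment": LABEL_NAMES[code],
            "tweet_text": record["tweet_text"],
        })
    return rows


rows = read_tweets(io.StringIO(SAMPLE_CSV))
counts = Counter(row["sentiment"] for row in rows)
print(counts)
```

To read an actual download instead of the sample, the same function can be given a file handle, for example `open("tweets.csv", newline="", encoding="utf-8")`, where `tweets.csv` is a placeholder file name.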
Distribution
The dataset is provided as a CSV file, with a sample file available separately on the platform. The exact number of rows is not stated, but the listing reports 5,991 unique tweet IDs. The label histogram shows roughly 420 negative labels (bin 1.00 - 1.20), 3,680 neutral labels (bin 2.00 - 2.20), and 1,900 positive labels (bin 2.80 - 3.00). These rounded counts sum to about 6,000, in line with the number of unique IDs, and show that neutral tweets make up more than half of the collection.
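The label counts quoted above can be turned into class shares with a few lines of arithmetic. The sketch below uses the approximate counts from the listing, not values recomputed from the CSV. It also prints inverse-frequency class weights, total / (number of classes × class count), a common heuristic for imbalanced labels. The heuristic is added here as an illustration and is not something the dataset prescribes.

```python
# Approximate per-label counts quoted in the Distribution section.
label_counts = {"negative": 420, "neutral": 3680, "positive": 1900}

total = sum(label_counts.values())

# Fraction of the collection carrying each label.
shares = {name: count / total for name, count in label_counts.items()}

# Inverse-frequency weights: rarer labels receive larger weights.
weights = {
    name: total / (len(label_counts) * count)
    for name, count in label_counts.items()
}

for name in label_counts:
    print(
        f"{name:<8} count={label_counts[name]:>5} "
        f"share={shares[name]:.1%} weight={weights[name]:.2f}"
    )
```

Under these counts, negative tweets are about 7% of the collection, so a classifier that always predicts the neutral label would already be right on roughly 61% of tweets. Per-class metrics are therefore more informative than plain accuracy.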
Usage
This dataset is suited to Natural Language Processing (NLP) tasks, sentiment analysis, and public health research. It can be used to monitor and analyse public perception of COVID-19 vaccines, identify trends in sentiment, and support research into social media dynamics during a public health crisis.
Coverage
The dataset's coverage is global, encompassing tweets from various regions. The timeframe of tweet collection is not stated and can only be inferred from the range of tweet IDs. Demographic information about the tweet authors is not included.
License
CC-BY-SA
Who Can Use It
This dataset is suitable for a wide range of users, including:
- Researchers and academics studying public health communication and social media sentiment.
- Data scientists and NLP specialists working on sentiment analysis models and text classification.
- Public health organisations and policymakers interested in understanding public perception and addressing vaccine hesitancy.
- Journalists and media analysts tracking public discourse on health topics.
Dataset Name Suggestions
- COVID-19 Vaccine Public Sentiment Tweets
- Vaccine Sentiment Analysis Dataset
- Social Media Vaccine Perception Data
- Annotated COVID-19 Vaccine Tweets
Attributes
Original Data Source: Covid-19 Vaccine Tweets with Sentiment Annotation