Global Covid-19 Tweets with Sentiment Analysis

FREE DATASET LIBRARY

Verified Data Provider

£0

Data Science and Analytics

Tags and Keywords

Nlp

Deep

Coronavirus

Text

Ensembling

About

This dataset captures Twitter activity related to Covid-19, focusing on the initial phase of the pandemic from April to June 2020 [1, 2]. It comprises 235,240 worldwide tweets in English, streamed live at a rate of approximately 10,000 tweets per day after the World Health Organisation declared Covid-19 a pandemic [1, 2]. The tweets were collected using relevant hashtags such as #covid-19, #coronavirus, #covid, #covaccine, #lockdown, #homequarantine, #quarantinecenter, #socialdistancing, #stayhome, and #staysafe [1, 2].
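Collecting by hashtag amounts to keeping only tweets that carry at least one tracked tag. The following is a hypothetical sketch, not the collector actually used: it assumes tweet text arrives as a plain string and compares hashtags case-insensitively against the list above. The helper names `extract_hashtags` and `matches_tracked_hashtag` are invented for illustration.

```python
import re

# Hashtags listed in the dataset description (lowercased for comparison).
TRACKED_HASHTAGS = {
    "#covid-19", "#coronavirus", "#covid", "#covaccine", "#lockdown",
    "#homequarantine", "#quarantinecenter", "#socialdistancing",
    "#stayhome", "#staysafe",
}

# '#' followed by word characters or hyphens. Allowing '-' is an assumption
# made so that '#covid-19' is captured exactly as it appears in the list.
HASHTAG_RE = re.compile(r"#[\w-]+")


def extract_hashtags(text: str) -> set[str]:
    """Return the lowercased hashtags found in a tweet."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def matches_tracked_hashtag(text: str) -> bool:
    """True if the tweet contains at least one tracked hashtag."""
    return not TRACKED_HASHTAGS.isdisjoint(extract_hashtags(text))
```

Matching is exact per tag, so `#covidvaccine` would not match `#covid`; a prefix or substring rule would be a different (broader) collection policy.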

The data has been pre-processed: all tweets were converted to lowercase, and extra white space, numbers, special characters, non-ASCII characters, URLs, punctuation, and stopwords were removed [2]. Additionally, all instances of 'covid' were converted to 'covid19', and stemming was applied to reduce inflected words to their root forms [2]. Sentiment analysis was then performed on each cleaned tweet using an NLTK-based sentiment analyser, which provides positive, negative, and neutral scores as well as a compound sentiment score [2]. Each tweet is classified as Positive, Negative, or Neutral based on these scores [2].
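The cleaning and labelling steps described above can be sketched in plain Python. This is an illustration under stated assumptions, not the provider's pipeline. The stopword list is a short hypothetical sample. `crude_stem` is a deliberately simple suffix stripper standing in for a real stemmer such as NLTK's `PorterStemmer`. The ±0.05 compound cut-offs in `label_sentiment` are the thresholds commonly recommended for VADER, the rule-based analyser shipped with NLTK whose output matches the four scores described; the description does not say which thresholds were actually used. The 'covid' normalisation runs after digits are stripped, so "covid-19", "#covid19" and "covid" all end up as "covid19".

```python
import re

# Abbreviated, hypothetical stopword list; a full list (for example NLTK's
# English stopwords corpus) would be used in practice.
STOPWORDS = {"a", "an", "and", "are", "at", "be", "for", "i", "in", "is",
             "it", "of", "on", "the", "this", "to", "we"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ASCII_RE = re.compile(r"[^\x00-\x7f]")
DIGITS_RE = re.compile(r"\d+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # punctuation and special characters


def crude_stem(token: str) -> str:
    """Rough suffix stripper standing in for a real stemmer such as Porter."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def clean_tweet(text: str) -> str:
    text = text.lower()
    text = URL_RE.sub(" ", text)         # drop URLs before stripping punctuation
    text = NON_ASCII_RE.sub(" ", text)   # emojis, accented letters, etc.
    text = DIGITS_RE.sub(" ", text)      # numbers: 'covid-19' -> 'covid- '
    text = NON_LETTER_RE.sub(" ", text)  # '#', '@', '!', '-', ...
    tokens = text.split()                # also collapses extra white space
    tokens = [t for t in tokens if t not in STOPWORDS]
    tokens = ["covid19" if t == "covid" else t for t in tokens]
    return " ".join(crude_stem(t) for t in tokens)


def label_sentiment(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score in [-1, 1] to a label (VADER's suggested cut-offs)."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"
```

With NLTK installed and its `vader_lexicon` resource downloaded, `SentimentIntensityAnalyzer().polarity_scores(text)` from `nltk.sentiment.vader` returns a dict with `neg`, `neu`, `pos` and `compound` keys; the `compound` value is what `label_sentiment` expects.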

Columns

id: Unique identifier for the tweet [1].
Tweet ID: Unique identifier for the tweet [2]. (Note: Appears to be the same as 'id')
created_at: The date and time when the tweet was created [1].
Creation Date & Time: The date and time when the tweet was created [2]. (Note: Appears to be the same as 'created_at')
source: The source link from which the tweet was posted [1].
Source Link: The source link from which the tweet was posted [2]. (Note: Appears to be the same as 'source')
original_text: The full text of the original tweet [1].
Original Tweet: The full text of the original tweet [2]. (Note: Appears to be the same as 'original_text')
lang: The language of the tweet [1].
favorite_count: The number of times the tweet was favourited [1].
Favorite Count: The number of times the tweet was favourited [2]. (Note: Appears to be the same as 'favorite_count')
retweet_count: The number of times the tweet was retweeted [1].
Retweet Count: The number of times the tweet was retweeted [2]. (Note: Appears to be the same as 'retweet_count')
original_author: The original author of the tweet [3].
Original Author: The original author of the tweet [2]. (Note: Appears to be the same as 'original_author')
hashtags: Hashtags included in the tweet [3].
Hashtags: Hashtags included in the tweet [2]. (Note: Appears to be the same as 'hashtags')
user_mentions: User mentions within the tweet [3].
User Mentions: User mentions within the tweet [2]. (Note: Appears to be the same as 'user_mentions')
Place: Location associated with the tweet [2].
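Several columns above come from different sources and appear to describe the same field. If a downloaded CSV contains both versions of a pair, one way to confirm the overlap is to check whether the two columns hold identical values in every row. The sketch below uses only the standard library; the header names are taken from the list above, but whether a given file contains both names of each pair is an assumption, so pairs missing from the header are reported as absent. `compare_duplicate_columns` is a hypothetical helper.

```python
import csv
from typing import Iterable

# Column pairs that the listing notes "appear to be the same".
CANDIDATE_PAIRS = [
    ("id", "Tweet ID"),
    ("created_at", "Creation Date & Time"),
    ("source", "Source Link"),
    ("original_text", "Original Tweet"),
    ("favorite_count", "Favorite Count"),
    ("retweet_count", "Retweet Count"),
    ("original_author", "Original Author"),
    ("hashtags", "Hashtags"),
    ("user_mentions", "User Mentions"),
]


def compare_duplicate_columns(
    lines: Iterable[str],
    pairs: Iterable[tuple[str, str]] = CANDIDATE_PAIRS,
) -> dict[tuple[str, str], str]:
    """Label each pair 'identical', 'different' or 'absent' (not in the header)."""
    pairs = list(pairs)
    reader = csv.DictReader(lines)
    header = set(reader.fieldnames or [])
    present = [p for p in pairs if p[0] in header and p[1] in header]
    mismatched: set[tuple[str, str]] = set()
    for row in reader:
        for a, b in present:
            if row[a] != row[b]:  # exact string comparison, no normalisation
                mismatched.add((a, b))
    return {
        p: "absent" if p not in present
        else "different" if p in mismatched
        else "identical"
        for p in pairs
    }
```

Because the comparison is on raw strings, two columns holding the same timestamp in different formats would be reported as "different". A file can be passed directly, e.g. `with open("tweets.csv", newline="", encoding="utf-8") as f: compare_duplicate_columns(f)`, where `tweets.csv` is a placeholder name.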

Distribution

The dataset consists of 235,240 tweets from the first phase of collection [1, 2]. Data files are typically provided in CSV format [4]. The tweets were collected from 19th April to 20th June 2020 [1].

Usage

This dataset is ideal for various data science and analytics applications, including Natural Language Processing (NLP), Deep Learning, Text Classification, and Ensembling [2]. Its pre-processed nature and included sentiment scores make it particularly useful for sentiment analysis research related to public opinion during the Covid-19 pandemic [2].
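Tracking public opinion over time usually means aggregating per-tweet scores by day. A minimal sketch, assuming the creation timestamps have already been parsed into `datetime.date` values and the compound scores into floats (the exact sentiment column names and timestamp format in the CSV are not documented here); `daily_mean_sentiment` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Iterable


def daily_mean_sentiment(records: Iterable[tuple[date, float]]) -> dict[date, float]:
    """Average compound score per calendar day, returned in date order."""
    by_day: dict[date, list[float]] = defaultdict(list)
    for day, compound in records:
        by_day[day].append(compound)
    return {day: mean(scores) for day, scores in sorted(by_day.items())}
```

A daily mean is sensitive to days with few tweets; weighting by tweet count or smoothing over several days are common alternatives.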

Coverage

The dataset covers a time range from 19th April to 20th June 2020 [1]. It includes tweets from around the world and is limited to English-language content [2]. The most common tweet sources are Twitter for Android (31%) and Twitter for iPhone (28%); the remaining 41% come from other sources [5].
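Source shares like these can be computed by counting the client name in each tweet's `source` field. In the Twitter API the `source` value is an HTML anchor such as `<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>`; whether this dataset stores the full anchor or only the label is an assumption, so the sketch accepts either. `source_label`, `source_shares` and the `top_n` parameter are illustrative names, not part of the dataset.

```python
import re
from collections import Counter
from typing import Iterable

# Captures the visible label of an HTML anchor such as
# <a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>
ANCHOR_TEXT_RE = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)


def source_label(raw: str) -> str:
    """Return the client name from an HTML anchor, or the stripped raw value."""
    match = ANCHOR_TEXT_RE.search(raw)
    return (match.group(1) if match else raw).strip()


def source_shares(sources: Iterable[str], top_n: int = 2) -> dict[str, float]:
    """Percentage share of the top_n clients, with the rest grouped as 'Other'."""
    counts = Counter(source_label(s) for s in sources)
    total = sum(counts.values())
    if total == 0:
        return {}
    top = counts.most_common(top_n)
    shares = {name: 100 * n / total for name, n in top}
    other = total - sum(n for _, n in top)
    if other:
        shares["Other"] = 100 * other / total
    return shares
```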

License

CC-BY-SA

Who Can Use It

Data Scientists and Analysts: For conducting social media analysis, trend identification, and public sentiment tracking during the pandemic [2].
Researchers in NLP and Machine Learning: To train and evaluate text classification models, conduct deep learning experiments, and explore ensembling techniques [2].
Public Health Researchers: To understand public response, concerns, and sentiment towards Covid-19, lockdowns, and vaccines [2].
Academics and Students: For academic projects, dissertations, and learning about real-world social media data analysis and sentiment classification [2].

Dataset Name Suggestions

COVID-19 Twitter Sentiment (Apr-Jun 2020)
Pandemic Twitter Activity Dataset (Phase 1)
Global Covid-19 Tweets with Sentiment Analysis
Social Media Response to Covid-19: April-June 2020
Twitter Covid-19 Discourse (Early Pandemic)

Attributes

Original Data Source: Covid-19 Twitter Dataset

Listing Stats

Listed: 05/06/2025
Region: Global
Quality: 5 / 5
Version: 1.0
