Twitter COVID-19 Hashtag Data
Public Health & Epidemiology
Free
About
This dataset comprises tweets collected through the Twitter API and filtered for the #covid19 hashtag. Collection began on 25th July 2020 with an initial batch of 17,000 tweets, and the dataset is designed for daily updates. It supports analysis of topics related to the hashtag, including geographical distribution, sentiment, and trends.
-
Columns
- user_name: The name of the Twitter user.
- user_location: The declared location of the user.
- user_description: The biographical description provided by the user.
- user_created: The date and time when the user's account was created.
- user_followers: The number of followers the user has.
- user_friends: The number of accounts the user is following.
- user_favourites: The number of tweets the user has liked (favourited).
- user_verified: A boolean indicating whether the user's account is verified by Twitter.
- date: The date and time the tweet was posted.
- text: The full content of the tweet.
- hashtags: A list of hashtags included in the tweet.
- source: The application or platform used to post the tweet (e.g., Twitter Web App, Twitter for Android).
- is_retweet: A boolean indicating whether the tweet is a retweet (all tweets in this dataset are original, not retweets).
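As a minimal sketch of how these columns might be read into typed values, the following uses only the Python standard library. The text encodings assumed here (booleans as `True`/`False`, timestamps as `YYYY-MM-DD HH:MM:SS`, and `hashtags` as a Python-style list literal) are assumptions rather than details confirmed by this listing, and the two sample rows are invented for illustration.

```python
import ast
import csv
import io
from datetime import datetime

# Invented rows in the 13-column layout described above (not real data).
SAMPLE_CSV = '''user_name,user_location,user_description,user_created,user_followers,user_friends,user_favourites,user_verified,date,text,hashtags,source,is_retweet
example_user,,Example bio,2015-03-01 08:30:00,120,80,45,False,2020-07-25 12:27:21,Stay safe everyone #covid19,['covid19'],Twitter Web App,False
another_user,India,,2012-11-20 17:05:42,5300,410,980,True,2020-07-26 09:15:00,New guidance published today,"['covid19', 'health']",Twitter for Android,False
'''

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


def parse_bool(value):
    """Interpret the assumed 'True'/'False' text encoding."""
    return value.strip().lower() == "true"


def parse_hashtags(value):
    """Parse an assumed Python-style list literal; empty cells give []."""
    value = value.strip()
    return list(ast.literal_eval(value)) if value else []


def parse_row(row):
    """Convert one raw CSV row (all strings) into typed Python values."""
    return {
        "user_name": row["user_name"],
        "user_location": row["user_location"] or None,  # empty cell -> None
        "user_description": row["user_description"] or None,
        "user_created": datetime.strptime(row["user_created"], DATE_FORMAT),
        "user_followers": int(row["user_followers"]),
        "user_friends": int(row["user_friends"]),
        "user_favourites": int(row["user_favourites"]),
        "user_verified": parse_bool(row["user_verified"]),
        "date": datetime.strptime(row["date"], DATE_FORMAT),
        "text": row["text"],
        "hashtags": parse_hashtags(row["hashtags"]),
        "source": row["source"],
        "is_retweet": parse_bool(row["is_retweet"]),
    }


rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for parsed in rows:
    print(parsed["user_name"], parsed["date"].date(), parsed["hashtags"])
```

If the real file encodes any of these fields differently, only the small helper functions should need adjusting.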
-
Distribution
The dataset is provided in CSV format and is approximately 68.71 MB in size. It contains 13 columns and 179,108 records. Updates are expected on a monthly basis.
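One quick way to confirm that a downloaded file matches the 13-column layout is to compare its header row against the column names in this listing. The sketch below is illustrative only: `check_header` is a hypothetical helper, the comparison ignores column order (which this listing does not state), and the in-memory files stand in for the real CSV.

```python
import csv
import io

# Column names as given in this listing.
EXPECTED_COLUMNS = [
    "user_name", "user_location", "user_description", "user_created",
    "user_followers", "user_friends", "user_favourites", "user_verified",
    "date", "text", "hashtags", "source", "is_retweet",
]


def check_header(csv_file):
    """Return (missing, unexpected) column names for an open CSV file."""
    header = next(csv.reader(csv_file), [])
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    unexpected = [c for c in header if c not in EXPECTED_COLUMNS]
    return missing, unexpected


# In-memory stand-ins for real files (hypothetical examples).
matching = io.StringIO(",".join(EXPECTED_COLUMNS) + "\n")
mismatched = io.StringIO("user_name,text,extra\n")

print(check_header(matching))
print(check_header(mismatched))
```

With a real download, the same function can be passed a file opened with `open(path, newline="", encoding="utf-8")`; the path and encoding are assumptions to verify against the actual file.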
-
Usage
This dataset suits analysis of public discourse and sentiment about the COVID-19 pandemic on Twitter. It can be used to explore themes associated with the #covid19 hashtag, examine the geographical spread of discussions, run sentiment analysis on tweets, and identify emerging pandemic-related trends.
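As a rough illustration of trend identification, the sketch below counts tweets per day and ranks hashtags by frequency. The records are invented `(date, hashtags)` pairs, the timestamp format is an assumption, and `daily_counts` and `top_hashtags` are hypothetical helper names.

```python
from collections import Counter
from datetime import datetime

# Invented (date, hashtags) pairs standing in for parsed rows (not real data).
records = [
    ("2020-07-25 12:27:21", ["covid19"]),
    ("2020-07-25 18:02:10", ["covid19", "lockdown"]),
    ("2020-07-26 09:15:00", ["COVID19", "vaccine"]),
    ("2020-07-26 10:40:33", []),
]


def daily_counts(records):
    """Count tweets per calendar day, returned in date order."""
    counts = Counter()
    for date_text, _ in records:
        # Assumed timestamp layout: YYYY-MM-DD HH:MM:SS
        day = datetime.strptime(date_text, "%Y-%m-%d %H:%M:%S").date()
        counts[day] += 1
    return dict(sorted(counts.items()))


def top_hashtags(records, n=3):
    """Return the n most frequent hashtags, compared case-insensitively."""
    counts = Counter(tag.lower() for _, tags in records for tag in tags)
    return counts.most_common(n)


print(daily_counts(records))
print(top_hashtags(records, n=2))
```

Lower-casing hashtags before counting is a design choice so that `#COVID19` and `#covid19` are treated as the same tag.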
-
Coverage
Data collection runs from 25th July 2020 to 30th August 2020. User locations are included, but approximately 21% are null. The most common single location is India (2%), and the remaining 77% fall under 'Other', indicating a global but highly fragmented geographical spread.
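Figures like these can be recomputed from the `user_location` column along the following lines. This is a sketch under assumptions: missing values are taken to be empty or `None`, the sample list is invented, and `location_summary` is a hypothetical helper. Because locations are free text, variants such as "India" and "New Delhi, India" are counted separately here.

```python
from collections import Counter


def location_summary(locations):
    """Return (share of missing locations, most common location with count)."""
    total = len(locations)
    # Treat None, empty, and whitespace-only values as missing.
    cleaned = [loc.strip() for loc in locations if loc and loc.strip()]
    null_share = (total - len(cleaned)) / total if total else 0.0
    most_common = Counter(cleaned).most_common(1)
    top = most_common[0] if most_common else None
    return null_share, top


# Invented sample values (not real data).
sample = ["India", None, "London, UK", "India", "", "New York"]

null_share, top = location_summary(sample)
print(f"missing: {null_share:.0%}, most common: {top}")
```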
-
License
CC0: Public Domain
-
Who Can Use It
This dataset is suitable for:
- Researchers studying social media trends and public health communication.
- Data scientists performing natural language processing (NLP) and sentiment analysis.
- Public health officials seeking insights into public perception and information dissemination during a pandemic.
- Social scientists investigating online communities and their interactions during significant global events.
-
Dataset Name Suggestions
- COVID-19 Twitter Activity Dataset
- #COVID19 Tweets Public Opinion Data
- Global COVID-19 Tweet Analysis
- Twitter COVID-19 Hashtag Data
- Pandemic Social Media Discourse
-
Attributes
Original Data Source: COVID19 Tweets