COVID-19 Twitter Engagement Data
Data Science and Analytics
Free
About
This dataset focuses on Twitter engagement metrics related to the Coronavirus disease (COVID-19), an infectious disease caused by the SARS-CoV-2 virus [1]. It provides a detailed collection of tweets, including their text content, the accounts that posted them, any hashtags used, and the geographical locations associated with the accounts [1]. The dataset is valuable for understanding public discourse, information dissemination, and engagement patterns on Twitter concerning COVID-19, including how people describe experiencing mild to moderate symptoms and recovering, or requiring medical attention [1].
Columns
- Datetime: Represents the exact date and time a tweet was posted [2].
- Tweet Id: A unique identifier assigned to each tweet [2].
- Text: The actual content of the tweet [2].
- Username: The display name of the tweet author [2].
- Permalink: The direct link to the tweet on Twitter [2].
- User: A link to the author's Twitter account [2].
- Outlinks: Any external links included within the tweet [2].
- CountLinks: The number of links present in the tweet [2].
- ReplyCount: The total number of replies to that specific tweet [2].
- RetweetCount: The total number of retweets of that specific tweet [2].
- DateTime Count: A daily count of tweets, aggregated by date ranges [2].
- Label Count: A count associated with specific ranges of tweet IDs or other engagement metrics, indicating the distribution of tweets within those ranges [3-5].
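As a minimal sketch of how the tweet-level columns above might be read, the following parses a CSV into typed records using only the Python standard library. The header spellings, the timestamp format, and the two sample rows are assumptions made for illustration; the actual file may differ and should be checked before reuse.

```python
import csv
import io
from datetime import datetime

# Hypothetical two-row sample. Header names follow the column list above,
# but the real file may spell or order them differently.
SAMPLE_CSV = """Datetime,Tweet Id,Text,Username,Permalink,User,Outlinks,CountLinks,ReplyCount,RetweetCount
2020-01-23 14:05:00+00:00,1220350000000000001,Stay safe everyone #coronavirus,example_user,https://twitter.com/example_user/status/1220350000000000001,https://twitter.com/example_user,,0,3,12
2020-02-26 09:30:00+00:00,1232600000000000002,New update https://example.org/news,another_user,https://twitter.com/another_user/status/1232600000000000002,https://twitter.com/another_user,https://example.org/news,1,0,45
"""


def load_tweets(handle):
    """Parse CSV rows into dicts, converting timestamps and counts to typed values."""
    tweets = []
    for row in csv.DictReader(handle):
        tweets.append({
            # Assumes ISO-like timestamps; adjust if the file uses another format.
            "datetime": datetime.fromisoformat(row["Datetime"]),
            # Kept as a string: the ID is an identifier, not a quantity.
            "tweet_id": row["Tweet Id"],
            "text": row["Text"],
            "username": row["Username"],
            "permalink": row["Permalink"],
            "outlinks": row["Outlinks"],
            "count_links": int(row["CountLinks"]),
            "reply_count": int(row["ReplyCount"]),
            "retweet_count": int(row["RetweetCount"]),
        })
    return tweets


tweets = load_tweets(io.StringIO(SAMPLE_CSV))
for tweet in tweets:
    print(tweet["datetime"].date(), tweet["username"], tweet["retweet_count"])
```

To read a file from disk instead, the same function can be given a handle opened with `open(path, newline="", encoding="utf-8")`.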
Distribution
The dataset covers the period from 10 January 2020 to 28 February 2020 and is summarised by daily tweet counts [2, 6, 7]. It contains approximately 179,040 tweets over this timeframe, a total derived from the sum of the daily counts and the tweet ID counts [2, 3, 6-11]. Tweet activity shows distinct peaks: a notable increase in late January (e.g., 6,091 tweets between 23 and 24 January 2020) [2] and a much larger surge in late February, reaching 47,643 tweets between 26 and 27 February 2020, followed by 42,289 and 44,824 on subsequent days [7, 10, 11].
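Daily counts of this kind can be recomputed from the tweet timestamps. The sketch below groups a handful of made-up timestamps by calendar day and picks out the busiest day; the timestamp strings and their format are illustrative assumptions, not values taken from the dataset.

```python
from collections import Counter
from datetime import datetime

# Made-up timestamps for illustration only.
timestamps = [
    "2020-01-23 08:15:00",
    "2020-01-23 19:40:00",
    "2020-02-26 02:05:00",
    "2020-02-26 11:30:00",
    "2020-02-26 23:59:00",
    "2020-02-27 00:10:00",
]


def daily_counts(stamps):
    """Count tweets per calendar day from ISO-like timestamp strings."""
    return Counter(datetime.fromisoformat(s).date() for s in stamps)


counts = daily_counts(timestamps)

# The day with the highest tweet count in the sample.
peak_day, peak_count = max(counts.items(), key=lambda item: item[1])

for day in sorted(counts):
    print(day.isoformat(), counts[day])
print("peak:", peak_day.isoformat(), peak_count)
```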
The distribution of certain tweet engagement metrics, such as replies or retweets, is heavily skewed toward low values: the large majority of tweets (over 152,500 records) fall within the lowest engagement ranges (e.g., 0-43 or 0-1628.96), while very few show high engagement (e.g., only 1 record in the 79819.04-81448.00 range) [4, 5]. The data file would typically be in CSV format [12].
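The quoted ranges are consistent with 50 equal-width bins over 0 to 81,448, since 81,448 / 50 = 1,628.96 and 49 × 1,628.96 = 79,819.04. The source does not state how the bins were produced or which metric they describe, so the following is only a sketch of equal-width binning under that assumption, applied to made-up values.

```python
def bin_index(value, lo, hi, n_bins):
    """Return the index of the equal-width bin that contains value."""
    if not lo <= value <= hi:
        raise ValueError("value outside the histogram range")
    width = (hi - lo) / n_bins
    # The maximum value is placed in the last bin rather than one past it.
    return min(int((value - lo) / width), n_bins - 1)


def histogram(values, lo, hi, n_bins):
    """Count how many values fall into each equal-width bin."""
    counts = [0] * n_bins
    for value in values:
        counts[bin_index(value, lo, hi, n_bins)] += 1
    return counts


# Assumed binning parameters, inferred from the quoted bin width of 1628.96.
LO, HI, N_BINS = 0, 81448, 50

# Made-up engagement values: mostly low, with one extreme outlier.
sample = [0, 2, 15, 43, 1600, 1700, 81448]
counts = histogram(sample, LO, HI, N_BINS)

print("bin width:", (HI - LO) / N_BINS)
print("first bin:", counts[0], "last bin:", counts[-1])
```

With a distribution this skewed, equal-width bins put nearly everything in the first bin, which is why logarithmic bins or quantiles are often a more informative choice for engagement counts.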
Usage
This dataset is ideal for:
- Data Science and Analytics projects focused on social media [1].
- Visualization of tweet trends and engagement over time.
- Exploratory data analysis to uncover patterns in COVID-19 related discussions [1].
- Natural Language Processing (NLP) tasks, such as sentiment analysis or topic modelling on tweet content [1].
- Data cleaning and preparation exercises for social media data [1].
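For the data cleaning and NLP uses listed above, a typical first step is normalising tweet text. The sketch below strips URLs and mentions, pulls out hashtags, and lowercases the remainder. The regular expressions are deliberately simple assumptions and will not cover every edge case in real tweets; the example tweet is made up.

```python
import re

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@\w+")


def clean_tweet(text):
    """Return (normalised_text, hashtags) for a raw tweet string."""
    # Remove URLs first so that '#' fragments inside links are not read as hashtags.
    without_urls = URL_RE.sub("", text)
    hashtags = [tag.lower() for tag in HASHTAG_RE.findall(without_urls)]
    without_mentions = MENTION_RE.sub("", without_urls)
    # Keep hashtag words as part of the text, but drop the '#' symbol.
    without_symbols = without_mentions.replace("#", "")
    # Collapse runs of whitespace and lowercase the result.
    normalised = " ".join(without_symbols.split()).lower()
    return normalised, hashtags


text, tags = clean_tweet(
    "Wash your hands! #COVID19 #StaySafe https://example.org/tips @example_user"
)
print(text)
print(tags)
```

The cleaned text can then be passed to whichever sentiment or topic model is being used, while the hashtag list supports simple frequency counts.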
Coverage
The dataset has a global geographic scope [13]. It covers tweet data from 10 January 2020 to 28 February 2020 [2, 6, 7]. The content is specific to the Coronavirus disease (COVID-19) [1].
License
CC0
Who Can Use It
This dataset is particularly useful for:
- Data scientists and analysts interested in social media trends and public health discourse [1].
- Researchers studying information spread and public sentiment during health crises.
- Developers building AI and LLM data solutions [13].
- Individuals interested in exploratory analysis and data visualization of real-world social media data [1].
Dataset Name Suggestions
- COVID-19 Twitter Engagement Data
- SARS-CoV-2 Tweet Activity Log
- Pandemic Social Media Discourse
- Coronavirus Tweets Analytics
- Global COVID-19 Tweet Metrics
Attributes
Original Data Source: Covid_19 Tweets Dataset