Opendatabay APP

Log4Shell Twitter Conversation Data

Social Media and Posts

Tags and Keywords

Log4shell

Twitter

Vulnerability

Log4j

Cybersecurity


Free

About

This collection captures social media discourse surrounding the critical zero-day Log4Shell vulnerability (CVE-2021-44228), which affects the popular Java logging framework Log4j. The data tracks public and professional interest in this severe arbitrary code execution flaw. The original purpose of the collection was to determine if interest in the vulnerability was beginning to subside.

Columns

The dataset includes twelve distinct columns, providing detailed information about the tweets and their authors:
  • status_id: The unique identifier for the tweet status.
  • status_date: The specific date when the status was tweeted.
  • text: The content (body) of the tweet itself.
  • favourite_count: The number of times the tweet was liked by users.
  • retweet_count: The number of times the tweet was shared (retweeted).
  • user_name: The display name of the user who posted the tweet.
  • screen_name: The user's Twitter handle (e.g., @name).
  • user_follower_count: The count of followers associated with the user.
  • user_friends_count: The number of users the poster is following.
  • user_created_date: The date the user's account was initially created.
  • user_location: The location provided by the user, if available.
  • source: The platform or application used to post the tweet (e.g., Twitter Web App, Twitter for iPhone).
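As a minimal sketch of how the twelve columns above might be read and checked, the following uses only the Python standard library. The function name read_tweets, the choice to convert the four count columns to integers, and the two-row sample are illustrative assumptions; the listing does not specify the exact date or number formats used in the file.

```python
import csv
import io

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = [
    "status_id", "status_date", "text", "favourite_count", "retweet_count",
    "user_name", "screen_name", "user_follower_count", "user_friends_count",
    "user_created_date", "user_location", "source",
]

# Columns described as counts; converted to int when possible (an assumption).
COUNT_COLUMNS = [
    "favourite_count", "retweet_count",
    "user_follower_count", "user_friends_count",
]


def read_tweets(handle):
    """Read tweet rows from an open text handle, checking the header first."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        for col in COUNT_COLUMNS:
            value = (row[col] or "").strip()
            # Leave unparseable values as None rather than guessing.
            row[col] = int(value) if value.isdigit() else None
        rows.append(row)
    return rows


# Hypothetical two-row sample standing in for log4shell_tweets.csv.
SAMPLE = io.StringIO(
    "status_id,status_date,text,favourite_count,retweet_count,user_name,"
    "screen_name,user_follower_count,user_friends_count,user_created_date,"
    "user_location,source\n"
    "1,2021-12-10,Patch log4j now,5,2,Alice,alice,100,50,2010-01-01,,Twitter Web App\n"
    "2,2021-12-11,Still scanning,0,0,Bob,bob,10,20,2015-06-01,United States,Twitter for iPhone\n"
)
tweets = read_tweets(SAMPLE)
print(len(tweets), tweets[0]["favourite_count"])
```

To read the real file, the same function could be called on a handle opened with open("log4shell_tweets.csv", newline="", encoding="utf-8"); the UTF-8 encoding is an assumption, not something the listing states.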

Distribution

The data is provided as a single CSV file, log4shell_tweets.csv (23.08 MB), containing twelve columns and approximately 81,000 valid records. It is a static snapshot with an expected update frequency of 'Never'. Most fields are 100% valid, but approximately 22% of entries have no value in the user_location column.
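The missing-value figure for user_location can be checked with a few lines of Python. This is a sketch under assumptions: missing_share is a hypothetical helper, rows are taken to be dictionaries as produced by csv.DictReader, and a value is treated as missing when it is absent, empty, or whitespace only (the file may encode missing values differently).

```python
def missing_share(rows, column):
    """Return the fraction of rows whose value in `column` is empty or absent."""
    if not rows:
        return 0.0
    empty = sum(1 for row in rows if not (row.get(column) or "").strip())
    return empty / len(rows)


# Hypothetical rows; real rows would come from log4shell_tweets.csv.
rows = [
    {"user_location": "United States"},
    {"user_location": ""},
    {"user_location": "  "},
    {"user_location": "Berlin"},
]
share = missing_share(rows, "user_location")
print(f"{share:.0%}")  # prints 50% for this sample
```

Run against the full file, a result close to 22% would be consistent with the figure quoted in the listing.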

Usage

This data is ideal for performing sentiment analysis and tracking public response to major cybersecurity incidents. It can be utilised by security researchers to study the diffusion and decline of information regarding zero-day vulnerabilities like Log4Shell. It is also suitable for analysing social media activity patterns, measuring engagement metrics, and determining the influence of different users during high-profile security events.
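Two of the analyses mentioned above, tracking how attention declines over time and comparing user influence, could be sketched as follows. The helper names are hypothetical, and the code assumes that status_date begins with a YYYY-MM-DD date and that the count columns hold integer strings; neither format is confirmed by the listing. Summing likes and retweets is one simple engagement measure, not the only reasonable one.

```python
from collections import Counter, defaultdict


def daily_volume(rows):
    """Count tweets per calendar day (assumes status_date starts with YYYY-MM-DD)."""
    counts = Counter(
        row["status_date"][:10] for row in rows if row.get("status_date")
    )
    return dict(sorted(counts.items()))


def engagement_by_user(rows):
    """Sum likes plus retweets per screen_name, highest total first."""
    totals = defaultdict(int)
    for row in rows:
        likes = int(row.get("favourite_count") or 0)
        retweets = int(row.get("retweet_count") or 0)
        totals[row["screen_name"]] += likes + retweets
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample rows for illustration only.
sample = [
    {"status_date": "2021-12-10 08:00:00", "screen_name": "alice",
     "favourite_count": "5", "retweet_count": "2"},
    {"status_date": "2021-12-10 09:30:00", "screen_name": "bob",
     "favourite_count": "1", "retweet_count": "0"},
    {"status_date": "2021-12-11 12:00:00", "screen_name": "alice",
     "favourite_count": "3", "retweet_count": "4"},
]
volume = daily_volume(sample)
ranking = engagement_by_user(sample)
print(volume)
print(ranking)
```

A falling daily count across the collection window would suggest that interest in the vulnerability was subsiding, which is the question the collection was originally meant to answer.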

Coverage

The dataset covers tweets posted between 9 December 2021 and 24 December 2021, approximately 81,000 valid records in total. Geographic scope is global; "United States" is the most common value in user_location, although it accounts for only 1% of the available location data. User account creation dates range from 29 April 2006 to 24 December 2021, the date of data collection.

License

CC0: Public Domain

Who Can Use It

  • Data Scientists: For training models that detect technical security discussions or monitor public risk perception.
  • Cybersecurity Analysts: To track community engagement, influence, and the lifespan of social media attention on specific vulnerabilities.
  • Social Media Researchers: To conduct studies on user demographics (follower counts, account age) and how different sources (e.g., mobile vs. web app) contribute to the conversation.
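For the source comparison mentioned in the last item, a breakdown of tweets by posting application might look like the sketch below. source_shares is a hypothetical helper, and grouping blank values under "unknown" is an assumption about how gaps should be handled.

```python
from collections import Counter


def source_shares(rows):
    """Return each source's share of all tweets, most common first."""
    counts = Counter(
        (row.get("source") or "").strip() or "unknown" for row in rows
    )
    total = sum(counts.values())
    return {source: n / total for source, n in counts.most_common()}


# Hypothetical sample rows; source names follow the examples in the listing.
sample_rows = [
    {"source": "Twitter Web App"},
    {"source": "Twitter for iPhone"},
    {"source": "Twitter Web App"},
]
shares = source_shares(sample_rows)
for source, share in shares.items():
    print(f"{source}: {share:.1%}")
```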

Dataset Name Suggestions

  • Log4Shell Twitter Conversation Data
  • CVE-2021-44228 Social Media Analysis
  • Log4J Vulnerability Tweet Stream
  • Zero-Day Twitter Discourse 2021

Attributes

Listing Stats

VIEWS

1

DOWNLOADS

0

LISTED

26/10/2025

REGION

GLOBAL

UNIVERSAL DATA QUALITY SCORE (UDQS)

5 / 5

VERSION

1.0
