Opendatabay

Arabic Tweet Engagement Dataset

Social Media and Networking

Tags and Keywords

Tabular

Text

Intermediate

NLP

Global



Free

About

This dataset was assembled to help understand the characteristics that contribute to a tweet going viral. It contains 100,000 viral tweets in Arabic, collected between 17 June 2022 and 20 September 2022. A tweet is classified as viral if it has at least 10 replies, 500 likes, and 10 retweets. Each tweet record includes the tweet text and its engagement metrics, and information about the user profiles is provided separately. It is well suited to developing models that predict social media trends and virality.
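The virality rule above can be written as a simple predicate. This is a minimal sketch: the function and constant names are hypothetical, and it assumes "a minimum of" means each threshold is inclusive (greater than or equal). Because every tweet in the collection already satisfies the rule, a check like this is mainly useful for validating the data or for labelling tweets gathered from other sources.

```python
# Thresholds as stated in the dataset description (assumed inclusive).
MIN_REPLIES = 10
MIN_LIKES = 500
MIN_RETWEETS = 10


def is_viral(likes: int, retweets: int, replies: int) -> bool:
    """Return True if a tweet meets all three minimum engagement thresholds."""
    return (
        replies >= MIN_REPLIES
        and likes >= MIN_LIKES
        and retweets >= MIN_RETWEETS
    )


print(is_viral(likes=500, retweets=10, replies=10))   # True: every threshold met exactly
print(is_viral(likes=5000, retweets=200, replies=9))  # False: one reply short
```

All three conditions must hold at once, so a tweet with very high likes but too few replies is still excluded.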

Columns

The dataset primarily consists of two CSV files: Tweets.csv and Users.csv. The Tweets.csv file provides details about each tweet and includes the following columns:
  • Date: The date and time the tweet was posted.
  • User: The username of the individual who published the tweet.
  • Tweet: The actual text content of the tweet.
  • Likes: The number of likes the tweet received.
  • Retweets: The number of retweets the tweet received.
  • Replies: The number of replies the tweet received.
The Users.csv file provides information related to each user's profile.
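A minimal sketch of reading Tweets.csv with Python's standard `csv` module, assuming the file has a header row using exactly the column names listed above and is UTF-8 encoded. The helper name `read_tweets` is hypothetical. The exact Date format is not documented, so Date, User and Tweet are left as strings; only the three engagement counts are converted.

```python
import csv
from typing import Iterable, Iterator

# Engagement columns listed for Tweets.csv.
COUNT_COLUMNS = ("Likes", "Retweets", "Replies")


def read_tweets(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per row of Tweets.csv, with engagement counts as ints.

    `lines` can be an open file or any iterable of CSV lines.
    Counts are parsed via float() first so that values exported as
    "500.0" are also accepted; this tolerance is an assumption.
    """
    reader = csv.DictReader(lines)
    for row in reader:
        for col in COUNT_COLUMNS:
            row[col] = int(float(row[col]))
        yield row


# Usage (file name from the description; encoding assumed to be UTF-8):
# with open("Tweets.csv", encoding="utf-8", newline="") as f:
#     for tweet in read_tweets(f):
#         print(tweet["User"], tweet["Likes"])
```

Streaming rows through a generator keeps memory use flat, which is convenient for 100,000 rows of free text.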

Distribution

The dataset is provided in tabular form as CSV files. It contains 100,000 viral tweets covering the period from 17 June 2022 to 20 September 2022, posted by 91,495 unique users.
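From the figures above, the dataset averages 100,000 / 91,495 ≈ 1.09 tweets per user. Since only 100,000 - 91,495 = 8,505 tweets exist beyond one per user, at most 8,505 users can have more than one tweet, so at least 82,990 users contribute exactly one. A sketch like the following (the helper name `summarize` is hypothetical; it takes rows with a `User` key, such as those produced by `csv.DictReader`) can recompute these counts from the file.

```python
from collections import Counter
from typing import Iterable, Mapping


def summarize(tweets: Iterable[Mapping[str, object]]) -> dict:
    """Count tweets, distinct users, and the most prolific posters."""
    per_user = Counter(t["User"] for t in tweets)
    n_tweets = sum(per_user.values())
    return {
        "tweets": n_tweets,
        "unique_users": len(per_user),
        "tweets_per_user": n_tweets / len(per_user) if per_user else 0.0,
        "top_users": per_user.most_common(5),
    }
```

Run against the full Tweets.csv, the `tweets` and `unique_users` values should match the 100,000 and 91,495 stated above if the file is complete.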

Usage

This dataset is well-suited for various applications, particularly in the fields of Natural Language Processing (NLP), Artificial Intelligence (AI), and Machine Learning (ML). Potential uses include:
  • Developing models to predict tweet virality.
  • Analysing social media trends and user engagement patterns in the Arabic language.
  • Training algorithms for content recommendation.
  • Researching linguistic features of viral content.
  • Sentiment analysis and topic modelling on social media data.
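For the NLP uses listed above, raw tweet text usually needs some normalization first. The sketch below shows one common set of Arabic preprocessing steps: removing URLs, @mentions, short-vowel diacritics (U+064B to U+0652) and the tatweel character (U+0640), and unifying the alef variants آ أ إ to a bare ا. These choices are an assumption rather than part of the dataset, and the helper names are hypothetical; tasks such as dialect identification or sentiment analysis may want to keep some of this information.

```python
import re

# Common (but optional) normalization patterns for Arabic tweets.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")                        # \w matches Arabic letters too
DIACRITICS_RE = re.compile(r"[\u064B-\u0652]")          # fathatan through sukun
TATWEEL = "\u0640"                                      # decorative elongation mark
ALEF_VARIANTS_RE = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda/hamza


def normalize_arabic(text: str) -> str:
    """Strip URLs, mentions, diacritics and tatweel; unify alef forms."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = DIACRITICS_RE.sub("", text)
    text = text.replace(TATWEEL, "")
    text = ALEF_VARIANTS_RE.sub("\u0627", text)  # replace with bare alef
    return " ".join(text.split())                # collapse whitespace


def tokenize(text: str) -> list[str]:
    """Whitespace tokenization after normalization."""
    return normalize_arabic(text).split()
```

Whitespace splitting is deliberately simple; a morphological analyzer or subword tokenizer would usually replace it for modelling.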

Coverage

The dataset's scope is global, focusing on Arabic-language tweets. It covers the period from 17 June 2022 to 20 September 2022. No further notes are provided on data availability for particular groups or years.
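To confirm that rows fall inside the documented window, the Date column can be checked against the start and end dates. This sketch assumes ISO 8601-style timestamps that `datetime.fromisoformat` can parse; the real format in Tweets.csv is not documented, and the helper name `in_coverage` is hypothetical. Dates are compared in whatever time zone the timestamp carries, so tweets near midnight on the boundary days may move in or out of the window if the data is converted to another zone.

```python
from datetime import date, datetime

# Coverage window from the dataset description (both ends inclusive).
COVERAGE_START = date(2022, 6, 17)
COVERAGE_END = date(2022, 9, 20)


def in_coverage(date_str: str) -> bool:
    """Return True if a Date value falls within the documented window.

    Assumes a format such as '2022-06-17 14:03:00' or
    '2022-06-17 14:03:00+00:00'.
    """
    posted = datetime.fromisoformat(date_str).date()
    return COVERAGE_START <= posted <= COVERAGE_END
```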

License

CC-BY-SA

Who Can Use It

This dataset is valuable for a range of users, including:
  • Data Scientists and Analysts: For building predictive models and extracting insights from social media data.
  • Researchers: In linguistics, social sciences, and computer science, particularly for studies on online communication and virality.
  • Machine Learning Engineers: For training and evaluating NLP models on real-world text data.
  • Marketing and Social Media Strategists: To understand what makes content viral and inform campaign strategies.

Dataset Name Suggestions

  • Arabic Viral Tweets Archive
  • MENA Social Media Virality Data
  • Arabic Tweet Engagement Dataset

Attributes

Original Data Source: Arabic Viral Tweets (عربي)

Listing Stats

VIEWS

2

DOWNLOADS

0

LISTED

11/06/2025

REGION

GLOBAL

UNIVERSAL DATA QUALITY SCORE (UDQS)

5 / 5

VERSION

1.0
