Arabic Tweet Engagement Dataset
Social Media and Networking
Free
About
This dataset was assembled to help identify the characteristics that make a tweet go viral. It contains 100,000 viral tweets in Arabic, collected between 17 June 2022 and 20 September 2022. A tweet is classified as viral if it has at least 10 replies, 500 likes, and 10 retweets. The collection includes the tweet text and its engagement metrics, along with separate information about user profiles. It is suited to developing models that predict social media trends and virality.
Columns
The dataset primarily consists of two CSV files: Tweets.csv and Users.csv. The Tweets.csv file provides details about each tweet and includes the following columns:
- Date: The date and time the tweet was posted.
- User: The username of the individual who published the tweet.
- Tweet: The actual text content of the tweet.
- Likes: The numerical count of likes the tweet received.
- Retweets: The numerical count of retweets the tweet received.
- Replies: The numerical count of replies the tweet received.
The Users.csv file provides information related to each user's profile.
Distribution
The dataset is tabular and distributed as CSV files. It contains 100,000 viral tweets posted between 17 June 2022 and 20 September 2022, written by 91,495 unique users.
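As a quick sanity check on the figures above, the following minimal sketch reads a Tweets.csv-style file using only the Python standard library, counts rows and unique users, and flags rows that fall below the virality thresholds stated in the About section. The sample rows, the date format, and the helper names (read_tweets, is_viral, summarize) are illustrative assumptions, not part of the dataset.

```python
import csv
import io

# Virality thresholds as stated in the About section.
MIN_REPLIES, MIN_LIKES, MIN_RETWEETS = 10, 500, 10


def read_tweets(handle):
    """Parse rows from a Tweets.csv-style file, casting counts to int."""
    rows = []
    for row in csv.DictReader(handle):
        for col in ("Likes", "Retweets", "Replies"):
            row[col] = int(row[col])
        rows.append(row)
    return rows


def is_viral(row):
    """Apply the stated virality thresholds to a single row."""
    return (
        row["Replies"] >= MIN_REPLIES
        and row["Likes"] >= MIN_LIKES
        and row["Retweets"] >= MIN_RETWEETS
    )


def summarize(rows):
    """Return the row count, unique-user count, and rows below threshold."""
    return {
        "tweets": len(rows),
        "unique_users": len({r["User"] for r in rows}),
        "below_threshold": sum(not is_viral(r) for r in rows),
    }


# Hypothetical sample rows standing in for the real file; the date
# format shown here is an assumption.
SAMPLE = """Date,User,Tweet,Likes,Retweets,Replies
2022-06-17 10:00:00,user_a,مثال أول,1200,85,40
2022-07-01 18:30:00,user_b,مثال ثان,650,12,15
2022-09-20 09:15:00,user_a,مثال ثالث,480,30,22
"""

summary = summarize(read_tweets(io.StringIO(SAMPLE)))
print(summary)
```

To run the same check on the real file, replace the in-memory sample with open("Tweets.csv", encoding="utf-8", newline=""). If the description above holds, the summary should report 100,000 tweets, 91,495 unique users, and no rows below the thresholds.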
Usage
This dataset is well-suited for various applications, particularly in the fields of Natural Language Processing (NLP), Artificial Intelligence (AI), and Machine Learning (ML). Potential uses include:
- Developing models to predict tweet virality.
- Analysing social media trends and user engagement patterns in the Arabic language.
- Training algorithms for content recommendation.
- Researching linguistic features of viral content.
- Sentiment analysis and topic modelling on social media data.
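Several of these uses start from simple surface features of the tweet text. The sketch below shows one possible way to derive such features from the Tweet column with the standard library alone. The feature set and the function name text_features are assumptions chosen for illustration, and the Arabic check covers only the basic Arabic Unicode block (U+0600 to U+06FF).

```python
import re

# Patterns for common tweet elements. In Python 3, \w matches Unicode
# word characters, so Arabic hashtags are captured as well.
ARABIC = re.compile(r"[\u0600-\u06FF]")
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+")


def text_features(tweet):
    """Compute a few surface-level features from one tweet's text."""
    letters = [ch for ch in tweet if ch.isalpha()]
    arabic = sum(1 for ch in letters if ARABIC.match(ch))
    return {
        "n_chars": len(tweet),
        "n_words": len(tweet.split()),
        "n_hashtags": len(HASHTAG.findall(tweet)),
        "n_mentions": len(MENTION.findall(tweet)),
        "has_url": bool(URL.search(tweet)),
        # Share of alphabetic characters that are Arabic letters.
        "arabic_ratio": arabic / len(letters) if letters else 0.0,
    }


# Hypothetical example text, not taken from the dataset.
features = text_features("مرحبا بالعالم #تجربة @someone https://example.com")
print(features)
```

Features like these could be joined with the Likes, Retweets, and Replies columns as inputs to a virality model or as variables in a linguistic study. They are a starting point rather than a recommended feature set.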
Coverage
The dataset is global in scope and limited to Arabic-language tweets posted between 17 June 2022 and 20 September 2022. No further notes are provided on data availability for particular groups or periods.
License
CC-BY-SA
Who Can Use It
This dataset is valuable for a range of users, including:
- Data Scientists and Analysts: For building predictive models and extracting insights from social media data.
- Researchers: In linguistics, social sciences, and computer science, particularly for studies on online communication and virality.
- Machine Learning Engineers: For training and evaluating NLP models on real-world text data.
- Marketing and Social Media Strategists: To understand what makes content viral and inform campaign strategies.
Dataset Name Suggestions
- Arabic Viral Tweets Archive
- MENA Social Media Virality Data
- Arabic Tweet Engagement Dataset
Attributes
Original Data Source: Arabic Viral Tweets (عربي)