World Cup Sentiment Dataset
Sports & Recreation
Free
About
This dataset provides a collection of English-language tweets about the FIFA World Cup 2022, all carrying the hashtag #WorldCup2022 and posted around the first day of the tournament. It is designed for Natural Language Processing (NLP) tasks such as sentiment analysis and text classification, and it also suits data science work including data visualisation, data preprocessing, and machine learning analysis. It comprises 30,000 tweets, which were collected with the Snscrape tool and labelled for sentiment with the cardiffnlp/twitter-roberta-base-sentiment-latest model from the Hugging Face Hub.
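The labelling step can be approximated as follows. This is a minimal sketch, not the dataset author's actual pipeline: it assumes the Hugging Face `transformers` package is installed and that the model can be downloaded, and the helper names `normalise_tweet` and `label_tweets` are hypothetical. The masking of mentions and links follows the preprocessing suggested on the model card.

```python
def normalise_tweet(text: str) -> str:
    """Mask user mentions and links before classification."""
    tokens = []
    for token in text.split(" "):
        if token.startswith("@") and len(token) > 1:
            token = "@user"
        elif token.startswith("http"):
            token = "http"
        tokens.append(token)
    return " ".join(tokens)


def label_tweets(tweets):
    """Return one sentiment label per tweet (downloads the model on first use)."""
    # Imported lazily so normalise_tweet works without transformers installed.
    from transformers import pipeline

    classifier = pipeline(
        "sentiment-analysis",
        model="cardiffnlp/twitter-roberta-base-sentiment-latest",
    )
    results = classifier([normalise_tweet(t) for t in tweets])
    return [result["label"] for result in results]
```

Calling `label_tweets(["What a start! #WorldCup2022"])` would return a one-element list of labels. Whether that output matches the dataset's Sentiment column exactly depends on details of the original run that the listing does not record.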
Columns
The dataset contains the following key columns:
- Id: A unique identifier for each tweet.
- Date Created: The date and time when the tweet was originally posted.
- Number of Likes: The total count of likes a tweet received.
- Source of Tweet: The platform or device from which the tweet was published, such as "Twitter for iPhone" or "Twitter for Android".
- Tweet: The full text content of the tweet.
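- Sentiment: The inferred sentiment of the tweet, categorised as positive, neutral, or other. The labelling model itself outputs three classes (positive, neutral, and negative), so "other" most likely corresponds to negative.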
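The column list above maps naturally onto a typed record. The sketch below uses only the Python standard library and assumes the CSV header names match the column names exactly as listed here, and that "Date Created" is stored in ISO format. Neither has been checked against the file. `TweetRecord` and `read_tweets` are hypothetical names, and the two sample rows are made up.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TweetRecord:
    tweet_id: str
    created: datetime
    likes: int
    source: str
    text: str
    sentiment: str


def read_tweets(handle):
    """Parse rows from an open CSV file into TweetRecord objects."""
    records = []
    for row in csv.DictReader(handle):
        records.append(
            TweetRecord(
                tweet_id=row["Id"],
                created=datetime.fromisoformat(row["Date Created"]),
                likes=int(row["Number of Likes"]),
                source=row["Source of Tweet"],
                text=row["Tweet"],
                sentiment=row["Sentiment"],
            )
        )
    return records


# Two made-up rows standing in for the real file.
SAMPLE_CSV = """Id,Date Created,Number of Likes,Source of Tweet,Tweet,Sentiment
0,2022-11-20 23:59:21+00:00,4,Twitter for iPhone,"Kick-off at last! #WorldCup2022",positive
1,2022-11-21 00:00:05+00:00,0,Twitter for Android,Watching the opener #WorldCup2022,neutral
"""

sample_records = read_tweets(io.StringIO(SAMPLE_CSV))
```

For the real file, the same function can be given a handle from `open("fifa_world_cup_2022_tweets.csv", newline="", encoding="utf-8")`. The header names may need adjusting to whatever the file actually contains.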
Distribution
The dataset is provided as a single Comma Separated Values (CSV) file named "fifa_world_cup_2022_tweets.csv", containing 30,000 tweets. Like counts range up to 22,523, and the tweets were posted predominantly on 20th and 21st November 2022. By source, 42% come from Twitter for iPhone, 30% from Twitter for Android, and 28% from other sources. By sentiment, 38% are positive, 37% neutral, and 26% other; these figures are rounded, which is why they sum to 101%.
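Percentage breakdowns like these can be recomputed from any column with a simple count. The sketch below is illustrative only: `percentage_shares` is a hypothetical helper, and the label counts are invented to show how whole-percent rounding can push a total above 100%. They are not the dataset's real counts.

```python
from collections import Counter


def percentage_shares(values):
    """Map each distinct value to its share of the total, rounded to a whole percent."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: round(100 * count / total) for value, count in counts.items()}


# Hypothetical counts for 1,000 labels, chosen only to illustrate rounding.
labels = ["positive"] * 376 + ["neutral"] * 366 + ["other"] * 258
shares = percentage_shares(labels)
```

Here the exact shares are 37.6%, 36.6%, and 25.8%. Each rounds up, so the rounded values add to 101 even though the underlying counts add to exactly 100%.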
Usage
This dataset is ideal for:
- NLP projects: Performing sentiment analysis to understand public opinion during the World Cup, or text classification for categorising tweets.
- Data science projects: Conducting data visualisation to explore tweet trends, data preprocessing for preparing text data, and machine learning analysis for building predictive models based on social media interactions.
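As one example of exploring tweet trends, the timestamps can be bucketed by hour before plotting. This is a minimal sketch under the assumption that "Date Created" values are ISO-formatted strings with a UTC offset. `tweets_per_hour` is a hypothetical helper and the timestamps are made up.

```python
from collections import Counter
from datetime import datetime


def tweets_per_hour(timestamps):
    """Count tweets in each calendar hour, returned in chronological order."""
    buckets = Counter(
        datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        for ts in timestamps
    )
    return sorted(buckets.items())


# Made-up timestamps in the assumed "Date Created" format.
sample_timestamps = [
    "2022-11-20 22:15:00+00:00",
    "2022-11-20 22:48:30+00:00",
    "2022-11-20 23:59:21+00:00",
    "2022-11-21 00:00:05+00:00",
]
hourly_counts = tweets_per_hour(sample_timestamps)
```

The resulting list of (hour, count) pairs can be passed straight to a plotting library, or grouped by sentiment first to compare how each class evolves over the two days.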
Coverage
The dataset's scope is global, as it concerns the FIFA World Cup 2022, a major international sporting event held in Qatar. Temporally, it centres on the first day of the tournament: the tweets are dated 20th and 21st November 2022. All tweets are in English, and no demographic information is recorded beyond that.
License
CC0
Who Can Use It
This dataset is suitable for a wide range of users, including:
- Data scientists looking for real-world social media data for machine learning and deep learning applications.
- Researchers interested in public sentiment and social media trends during major events.
- Developers building applications that leverage natural language processing.
- Students learning about data analysis, NLP, and machine learning.
Dataset Name Suggestions
- FIFA World Cup 2022 Twitter Analysis
- Qatar 2022 Tweets Dataset
- World Cup Sentiment Data
- Football Social Media Insights
- FIFA 2022 Twitter Activity
Attributes
Original Data Source: FIFA World Cup 2022 Tweets