Social Media Political Sentiment
Social Media and Posts
Free
About
This collection covers Twitter activity related to Donald Trump during the 2020 US Election, which was held during a global pandemic and drew intense scrutiny. The data was collected with a Twitter crawler built on the Twitter API and the Tweepy library, as part of an Artificial Intelligence coursework project. It is well suited to sentiment analysis and to studying public reaction in the immediate post-voting period. The dataset is raw crawler output intended for academic and learning purposes, and it needs cleaning before any in-depth analysis can be performed.
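As an illustration of the kind of cleaning the raw text may need, here is a minimal sketch using only the Python standard library. The specific steps (dropping a leading retweet prefix, URLs, mentions, and hash signs) are assumptions about what a typical tweet-cleaning pass looks like, not a description of how the dataset author processed the data, and `clean_tweet` is a hypothetical helper name.

```python
import re

# Patterns for common tweet noise. These are illustrative assumptions,
# not rules taken from the dataset itself.
RETWEET_PREFIX_RE = re.compile(r"^RT\s+@\w+:\s*")
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Return the tweet text with retweet prefix, URLs, and mentions removed."""
    text = RETWEET_PREFIX_RE.sub("", text)   # drop a leading "RT @user: "
    text = URL_RE.sub(" ", text)             # drop links
    text = MENTION_RE.sub(" ", text)         # drop @mentions
    text = text.replace("#", "")             # keep hashtag words, drop the sign
    return WHITESPACE_RE.sub(" ", text).strip()


cleaned = clean_tweet("RT @user: Big news https://t.co/abc #Election2020")
```

Whether to keep hashtag words or mentions depends on the analysis; a sentiment model may benefit from hashtag text, while a topic model may not.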
Columns
The dataset includes 13 distinct features, focusing on both the tweet content and the user who generated it:
- username: The unique Twitter handle of the user.
- accDesc: The description provided on the user’s profile.
- location: The reported location from which the tweet originated, though this field has a high degree of missing data.
- following: The total count of accounts the user is following.
- followers: The total number of followers the user possesses.
- totaltweets: The overall number of tweets created by the user.
- usercreated: The date the user originally registered their Twitter account.
- tweetcreated: The specific date and time the tweet was posted.
- favouritecount: The tweet's 'heart' count, which is equivalent to a 'like' on platforms such as Facebook.
- retweetcount: The total number of times the tweet was retweeted, equivalent to a 'share' on Facebook.
- text: The main body of the tweet content, which is the key field for sentiment analysis.
- tweetsource: The device or application used to create the tweet (e.g., "Twitter for iPhone").
- hashtags: Associated hashtags in JSON format.
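The columns above could be loaded and typed along the following lines. This is a sketch under stated assumptions: the two sample rows are invented for illustration, the timestamp format `%Y-%m-%d %H:%M:%S` is a guess, and the hashtags field is assumed to hold a JSON list of strings. The real file may differ on all three points.

```python
import csv
import io
import json
from datetime import datetime

# Hypothetical two-row sample using the 13 column names listed above.
SAMPLE_CSV = '''username,accDesc,location,following,followers,totaltweets,usercreated,tweetcreated,favouritecount,retweetcount,text,tweetsource,hashtags
user_a,Coffee fan,,120,45,300,2015-03-01 08:00:00,2020-11-04 12:30:00,2,1,"Trump speaks tonight",Twitter for iPhone,"[""Election2020""]"
user_b,,Ohio,10,5,20,2019-07-15 10:00:00,2020-11-05 09:15:00,0,0,"Waiting for results",Twitter Web App,[]
'''

INT_COLUMNS = ("following", "followers", "totaltweets",
               "favouritecount", "retweetcount")
DATE_COLUMNS = ("usercreated", "tweetcreated")
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumption: adjust to the real file


def parse_row(row):
    """Convert one raw CSV row (all strings) into typed Python values."""
    parsed = dict(row)
    for col in INT_COLUMNS:
        parsed[col] = int(row[col]) if row[col] else 0
    for col in DATE_COLUMNS:
        parsed[col] = datetime.strptime(row[col], DATE_FORMAT)
    # Location is often missing, so normalise empty strings to None.
    parsed["location"] = row["location"].strip() or None
    # Assumption: hashtags are stored as a JSON list of strings.
    parsed["hashtags"] = json.loads(row["hashtags"]) if row["hashtags"] else []
    return parsed


def load_rows(handle):
    """Read every row from an open CSV file or file-like object."""
    return [parse_row(row) for row in csv.DictReader(handle)]


rows = load_rows(io.StringIO(SAMPLE_CSV))
```

To read the actual file, pass `open(path, newline="", encoding="utf-8")` to `load_rows` in place of the in-memory sample.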
Distribution
This raw dataset contains 247,500 rows and 13 columns, or 3,217,500 cells in total. The data file is typically supplied in CSV format. The crawler collected 2,500 tweets every 15 minutes. Because batches were gathered at least 15 minutes apart, tweets are not evenly distributed across dates, and the dataset may not capture every matching tweet posted within the covered time span.
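The figures above can be checked with a little arithmetic, and the uneven date distribution can be inspected by counting tweets per day. The sketch below assumes every crawler batch was full (2,500 tweets), which the source does not confirm; the sample timestamps and the helper name `tweets_per_day` are hypothetical.

```python
from collections import Counter
from datetime import datetime

ROWS = 247_500
COLUMNS = 13
BATCH_SIZE = 2_500  # tweets collected per crawler run

# Total cells is simply rows times columns.
cells = ROWS * COLUMNS

# Assumption: every batch was full. Under that assumption the row count
# corresponds to this many crawler runs.
full_batches = ROWS // BATCH_SIZE


def tweets_per_day(timestamps):
    """Count tweets per calendar day to reveal gaps or uneven coverage."""
    return Counter(ts.date().isoformat() for ts in timestamps)


# Hypothetical timestamps standing in for the tweetcreated column.
sample = [
    datetime(2020, 11, 4, 12, 0),
    datetime(2020, 11, 4, 12, 15),
    datetime(2020, 11, 6, 9, 0),
]
per_day = tweets_per_day(sample)
```

Under the full-batch assumption, 247,500 rows amount to 99 runs, far fewer than the number of 15-minute slots in the covered period, which is consistent with the note that not every tweet in the time span was captured.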
Usage
The primary use case for this data is Sentiment Analysis, allowing researchers to gauge the public mood and emotional response to political events and figures following the election. It is also highly suitable for machine learning training, academic research projects, and other non-commercial learning activities, especially in the field of Artificial Intelligence.
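As a starting point for sentiment analysis, a very simple lexicon-based scorer might look like the sketch below. The word lists are tiny, hypothetical placeholders chosen for illustration; a real project would use an established lexicon or a trained classifier, and this is not the method used by the dataset author.

```python
import re

# Hypothetical mini-lexicons for illustration only.
POSITIVE_WORDS = {"great", "win", "good", "love"}
NEGATIVE_WORDS = {"bad", "fraud", "lose", "hate"}


def sentiment_score(text):
    """Return (positive word count) minus (negative word count)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(token in POSITIVE_WORDS for token in tokens)
    negatives = sum(token in NEGATIVE_WORDS for token in tokens)
    return positives - negatives


def sentiment_label(text):
    """Map the score to a coarse positive / negative / neutral label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


labels = [sentiment_label(t) for t in (
    "Great win tonight",
    "This is bad, I hate it",
    "Counting continues",
)]
```

Word counting of this kind ignores negation and sarcasm, both common in political tweets, so it is best treated as a baseline to compare stronger models against.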
Coverage
The data covers tweets captured between 4th November 2020 and 11th November 2020, the period immediately following voting day in the US Election. Only tweets that contained specific keywords, including "Trump," "DonalTrump," or "realDonalTrump," were captured. The location data is highly inconsistent, and a significant majority of entries are missing this detail.
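The coverage rules can be expressed as simple checks, for example to validate rows after loading. In this sketch the keywords are copied exactly as quoted above; case-insensitive substring matching, an inclusive date window, and the helper names are all assumptions, since the source does not describe how the crawler matched tweets or which time zone the dates refer to.

```python
from datetime import date, datetime

# Keywords exactly as quoted in the coverage description.
KEYWORDS = ("Trump", "DonalTrump", "realDonalTrump")

# Assumption: the window is inclusive on both ends.
COVERAGE_START = date(2020, 11, 4)
COVERAGE_END = date(2020, 11, 11)


def matches_keywords(text):
    """True if any keyword appears in the text, ignoring case (assumed)."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in KEYWORDS)


def in_coverage(created):
    """True if the tweet timestamp falls inside the covered dates."""
    return COVERAGE_START <= created.date() <= COVERAGE_END


def missing_location_share(locations):
    """Fraction of entries whose location is empty or None."""
    if not locations:
        return 0.0
    missing = sum(1 for loc in locations if not (loc or "").strip())
    return missing / len(locations)


share = missing_location_share(["Ohio", "", None, "  "])
```

With substring matching, any text containing "DonalTrump" or "realDonalTrump" also contains "Trump", so the first keyword alone determines the result under this assumption.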
License
CC0: Public Domain
Who Can Use It
Intended users include students, academics, researchers, and data science enthusiasts looking for real-world, politically charged textual data. The dataset is explicitly designated for academic, learning, and non-commercial purposes, such as building classification models or analysing political communication patterns.
Dataset Name Suggestions
- Trump Tweets Sentiment 2020
- Post-Election Twitter Data
- US Election 2020 Trump Tweets
- Social Media Political Sentiment
Attributes
Original Data Source: Social Media Political Sentiment
