Meghan Markle Twitter Sentiment
Entertainment & Media Consumption
Free
About
This dataset captures Twitter reactions to and discussions of Oprah Winfrey's highly publicised interview with Prince Harry and Meghan Markle. It provides raw tweet data, which makes it useful for anyone looking to apply or improve their Natural Language Processing (NLP) skills. The tweets were collected through a multi-day extraction process to gather a substantial volume. The dataset supports analysing public sentiment, identifying key discussion topics such as racism or mental health, and classifying tweets by themes related to the interview.
Columns
The dataset includes the following columns:
- User: The Twitter user's display name.
- User ID: A unique identifier for each user. There are 26,945 distinct user IDs in the dataset.
- Location: The self-reported location of the user. About 30% of records have no specified location, 2% give London, England, and the remaining 68% (36,009 records) give various other locations.
- Tweet: The full text of the tweet itself.
- Num of Friend: The number of friends (accounts followed) the user has.
- Num of Followers: The number of followers the user has.
- Total Tweets by user: The total count of tweets posted by the user.
- Account Created at: The date when the user's Twitter account was created.
- Tweet Created at: The specific date and time when the tweet was posted.
- Num of Retweet: The number of times the tweet has been retweeted. Most tweets (over 52,000) have at most 606 retweets; a few have more than 12,000.
- hashtags in the tweet: Any hashtags embedded within the tweet text. Approximately 27% of tweets contain hashtags, with "MeghanMarkle" being a common one (4%).
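As a rough illustration of working with these columns, the sketch below reads the CSV with Python's standard `csv` module, checks the header, and recomputes a few of the summary figures quoted above. The column names are copied from the list and are an assumption: the header in the actual file may be spelled or capitalised differently. The function names `load_tweets` and `summarise` are hypothetical.

```python
import csv

# Column names copied from the list above. Assumption: the real CSV header
# may differ in spelling or capitalisation, so adjust before use.
EXPECTED_COLUMNS = [
    "User", "User ID", "Location", "Tweet", "Num of Friend",
    "Num of Followers", "Total Tweets by user", "Account Created at",
    "Tweet Created at", "Num of Retweet", "hashtags in the tweet",
]


def load_tweets(handle):
    """Read all rows from an open CSV file and verify the expected header."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def summarise(rows):
    """Return row count, distinct users, and shares of empty/non-empty fields."""
    total = len(rows)
    distinct_users = {row["User ID"] for row in rows}
    no_location = sum(1 for row in rows if not row["Location"].strip())
    with_hashtags = sum(1 for row in rows if row["hashtags in the tweet"].strip())
    return {
        "rows": total,
        "distinct_users": len(distinct_users),
        "share_no_location": no_location / total if total else 0.0,
        "share_with_hashtags": with_hashtags / total if total else 0.0,
    }
```

To use it, open the downloaded file with `open(path, newline="", encoding="utf-8")` and pass the handle to `load_tweets`. If the column names match, the resulting figures should be comparable to the distinct-user count and the location and hashtag percentages listed above.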
Distribution
The data file is provided in CSV format. It contains an estimated 52,700 or more records, a figure inferred from the distribution of retweet counts; the other record counts quoted in this description are likewise derived from the label counts reported for each column. The extraction process took approximately six hours.
Usage
This dataset is ideal for:
- Sentiment analysis of public opinion before, during, and after the interview (e.g., positive, neutral, negative sentiments, and emoticon usage).
- Topic modelling to identify the most discussed subjects, such as racism or mental health, related to the interview.
- Classifying tweets based on main topics discussed during the interview.
- Understanding social media engagement metrics like the number of retweets for specific content.
- Practising and enhancing NLP skills through real-world social media data.
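The topic classification use case above can be approached in many ways. The sketch below is one deliberately simple baseline: it tags a tweet with a topic when the tweet contains a word from a small keyword list. The keyword lists and the names `TOPIC_KEYWORDS`, `tag_topics`, and `topic_counts` are hypothetical and chosen only for illustration. This is not a method prescribed by the dataset, and a trained classifier or topic model would normally replace it.

```python
import re
from collections import Counter

# Hypothetical keyword lists, for illustration only. A real project would
# build these from the data or use a trained model instead.
TOPIC_KEYWORDS = {
    "racism": {"racism", "racist", "race"},
    "mental_health": {"mental", "suicidal", "depression", "therapy"},
}


def tag_topics(text):
    """Return the sorted list of topics whose keywords appear in the text."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(
        topic for topic, keywords in TOPIC_KEYWORDS.items() if tokens & keywords
    )


def topic_counts(tweets):
    """Count how many tweets mention each topic at least once."""
    counts = Counter()
    for text in tweets:
        counts.update(tag_topics(text))
    return counts
```

Passing the values of the Tweet column to `topic_counts` gives a first impression of how often each theme appears. Exact-word matching misses misspellings, hashtags written as one word, and sarcasm, so the counts are best treated as a lower bound.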
Coverage
Users come from around the world, although a notable portion do not specify a location and a small percentage report London, England. The Twitter accounts in the dataset were created between 16th July 2006 and 12th March 2021. The tweets themselves span a much narrower window, from 4th March 2021 to 12th March 2021.
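The stated tweet window can be used as a sanity check after loading the data. The sketch below assumes the Tweet Created at values are ISO-style strings such as `2021-03-08 01:23:45`; the actual timestamp format in the file is not documented here, so the parsing step may need to change. The names `TWEET_WINDOW` and `in_tweet_window` are hypothetical.

```python
from datetime import date, datetime

# Tweet window stated in the Coverage section (both ends inclusive).
TWEET_WINDOW = (date(2021, 3, 4), date(2021, 3, 12))


def in_tweet_window(timestamp):
    """Return True if an ISO-style timestamp falls inside the tweet window.

    Assumption: timestamps look like '2021-03-08 01:23:45'. Swap in
    datetime.strptime with the right format string if the file differs.
    """
    day = datetime.fromisoformat(timestamp).date()
    return TWEET_WINDOW[0] <= day <= TWEET_WINDOW[1]
```

Rows for which `in_tweet_window` returns False would point to either a different timestamp format or records outside the documented range.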
License
CC0
Who Can Use It
This dataset is suitable for:
- Data Scientists and Analysts interested in social media analytics and trend identification.
- Researchers studying public opinion, media impact, or celebrity culture.
- Natural Language Processing (NLP) practitioners looking for real-world text data for sentiment analysis, topic extraction, or text classification model training.
- Students undertaking projects in data science, social sciences, or communication studies.
Dataset Name Suggestions
- Meghan Markle Twitter Sentiment
- Royal Interview Twitter Reactions
- Meghan Markle Tweets Analysis
- Oprah Interview Twitter Data
- Celebrity Social Media Pulse
Attributes
Original Data Source: MeghanMarkle Tweets