Public Tweet Analysis Dataset
Social Media and Posts
Free
About
This dataset contains a sample of tweets collected during July and August 2019, giving a snapshot of Twitter activity over that period. The data was gathered using tweepy, a Python library for accessing the Twitter API. It covers tweet content, user attributes, and engagement metrics, which makes it useful for analysing social media trends and user behaviour.
Columns
- Tweet Id: A unique identifier for each individual tweet.
- Tweet URL: The direct web address pointing to the tweet on Twitter.
- Tweet Posted Time (UTC): The timestamp indicating when the tweet was published, specified in Coordinated Universal Time. The dates range from 1 July 2019 to 30 August 2019.
- Tweet Content: The actual text and any associated media links or hashtags within the tweet.
- Tweet Type: Categorises the tweet, predominantly as "ReTweet" (72%) or "Tweet" (25%).
- Client: Identifies the application or platform used to post the tweet, such as "Twitter for iPhone" (45%) or "Twitter for Android" (16%).
- Retweets Received: The total count of times the tweet was retweeted by other users.
- Likes Received: The total count of 'likes' or 'favourites' the tweet accumulated.
- Tweet Location: The geographical location from which the tweet was posted, with "Brussels" being a common entry among the 105 unique locations. Approximately 16% of these values are missing.
- Tweet Language: The detected language of the tweet, with English accounting for 98% of the entries.
- User Id: A unique identifier assigned to the Twitter user who posted the tweet.
- Name: The display name of the Twitter user.
- Username: The user's unique Twitter handle (e.g., @username).
- User Bio: The short descriptive text from the user's Twitter profile.
- Verified or Non-Verified: Indicates whether the user's Twitter account is officially verified.
- Profile URL: The direct link to the user's Twitter profile page.
- Protected or Non-protected: States whether the user's Twitter account is set to private.
- User Followers: The number of followers the user has on Twitter.
- User Following: The number of accounts the user is following on Twitter.
- User Account Creation Date: The date when the user's Twitter account was established.
- Impressions: The total number of times the tweet was viewed.
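As a rough illustration of how the columns above might be read and type-checked, the sketch below uses only Python's standard `csv` module. The header spellings are copied from the list above and are an assumption about the actual file; `load_tweets` and `to_int` are hypothetical helper names, and the timestamp columns are left as strings because their exact format is not documented here.

```python
import csv
import io

# Column names copied from the list above; the exact header spelling in
# the CSV is an assumption and may differ.
EXPECTED_COLUMNS = [
    "Tweet Id", "Tweet URL", "Tweet Posted Time (UTC)", "Tweet Content",
    "Tweet Type", "Client", "Retweets Received", "Likes Received",
    "Tweet Location", "Tweet Language", "User Id", "Name", "Username",
    "User Bio", "Verified or Non-Verified", "Profile URL",
    "Protected or Non-protected", "User Followers", "User Following",
    "User Account Creation Date", "Impressions",
]

# Columns described as counts, converted to int where possible.
COUNT_COLUMNS = [
    "Retweets Received", "Likes Received",
    "User Followers", "User Following", "Impressions",
]


def to_int(value):
    """Return an int, or None when the field is blank or not a whole number."""
    text = (value or "").strip().replace(",", "")
    try:
        return int(text)
    except ValueError:
        return None


def load_tweets(handle):
    """Read rows from an open CSV handle and convert the count columns."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        for column in COUNT_COLUMNS:
            row[column] = to_int(row[column])
        rows.append(row)
    return rows
```

A typical call would open the file with `open(path, newline="", encoding="utf-8")` and pass the handle to `load_tweets`; the UTF-8 encoding is also an assumption.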
Distribution
The data is provided as a single CSV file, named sample.csv, with a size of 253.93 kB. It comprises 386 rows (records) and 21 columns, detailing various aspects of the tweets and their associated user information.
Usage
This dataset is ideally suited for:
- Social Media Analysis: Investigating trends, popular topics, and user engagement patterns on Twitter.
- Text Mining and Natural Language Processing (NLP): Developing and testing algorithms for sentiment analysis, topic modelling, and content categorisation.
- Data Cleaning Exercises: Practising data preparation techniques, especially for unstructured text data.
- Understanding User Behaviour: Examining how different factors, such as tweet content or user characteristics, correlate with engagement metrics like retweets and likes.
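One simple way to start on the engagement questions listed above is to average a metric within each category, for example likes per tweet type. The following is a minimal sketch, assuming rows are dictionaries keyed by the column names in this description; `mean_engagement_by` is a hypothetical helper, not part of any library.

```python
from collections import defaultdict


def mean_engagement_by(rows, group_column, metric_column):
    """Average a numeric metric within each group, skipping non-numeric cells."""
    totals = defaultdict(lambda: [0, 0])  # group -> [running sum, row count]
    for row in rows:
        try:
            value = int(str(row.get(metric_column, "")).strip())
        except ValueError:
            continue  # blank or malformed count: leave it out of the average
        bucket = totals[row.get(group_column) or "(missing)"]
        bucket[0] += value
        bucket[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}
```

Calling `mean_engagement_by(rows, "Tweet Type", "Likes Received")` would return one average per tweet type. With only 386 rows, any differences between groups should be read as indicative rather than conclusive.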
Coverage
The data spans 1 July 2019 to 30 August 2019. Geographically, locations are varied, though "Brussels" is notably frequent among the tweet locations. The primary language of the tweets is English (98%), with a small percentage in German and other languages. The dataset does not provide demographic breakdowns, but it includes user-level attributes that could be used to infer user profiles.
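The coverage figures quoted above (language share, frequent locations, missing values) can be re-derived with a small frequency count. This is a sketch under the assumption that rows are dictionaries keyed by column name and that blank cells represent missing values; `value_shares` is a hypothetical helper, and the labels actually stored in the language column (for instance "English" versus a language code) are not confirmed by this description.

```python
from collections import Counter


def value_shares(rows, column):
    """Return each distinct value's share of rows as a percentage.

    Blank or absent cells are grouped under the label "(missing)".
    """
    counts = Counter(
        (row.get(column) or "").strip() or "(missing)" for row in rows
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders the result from most to least frequent.
    return {value: 100 * n / total for value, n in counts.most_common()}
```

For example, `value_shares(rows, "Tweet Location")` would show both the most frequent locations and the proportion of rows with no location recorded.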
License
CC0: Public Domain
Who Can Use It
- Researchers: To conduct studies on social media dynamics, public opinion, and communication patterns.
- Data Scientists and Analysts: For developing and refining models for social media content analysis, engagement prediction, and user segmentation.
- Students and Educators: As a practical dataset for learning about data collection, cleaning, and analysis in the context of social media.
- Developers: To prototype applications that process or visualise social media data, using this historical sample as a stand-in for a live feed.
Dataset Name Suggestions
- Twitter Sample Tweets (July-August 2019)
- Social Media Activity Snapshot 2019
- Public Twitter Data (Jul-Aug 2019)
- Historical Tweet Interactions Dataset
Attributes
Original Data Source: Public Tweet Analysis Dataset