Public Elon Tweets Dataset
News & Media Articles
Free
About
This dataset is a collection of public tweets by Elon Musk, one of the most followed users on Twitter, with over 100 million followers [1]. Because he tweets so frequently, the content lends itself to a wide range of analyses [1]. The data is collected daily using tweepy and the Twitter API, so it is a regularly updated record of his public discourse [1, 2]. It is a useful resource for studying his communication patterns and public engagement.
Columns
The dataset comprises 16 columns covering Elon Musk's Twitter activity and profile details [1, 2]. The 15 columns described in the source are:
- ID: A unique identifier for each tweet [1].
- User name: The name associated with the tweeting account. This is predominantly "Elon Musk" (99%), with a small percentage appearing as "Mr. Tweet" (1%) [1, 3].
- User location: The stated geographical location of the user. Notably, 82% of entries are null, with "Trøllheim" being the most common non-null location (6%) [2, 3].
- User description: The user's profile description. Around 76% of entries are null, and "nothing" is a common entry (10%) [2, 4].
- User created: The date when the user's Twitter account was created. This consistently shows as 3rd June 2009 [2, 4].
- User followers: The number of followers the user has at the time of the tweet. Values range from approximately 101 million to 143 million, with a mean of 127 million [2, 4, 5].
- User friends: The number of accounts the user is following. Values vary, with a mean of 193 [2, 5, 6].
- User favourites: The total number of tweets the user has favourited. Values range from around 13.5 thousand to 25.7 thousand, with a mean of 18.7 thousand [2, 6, 7].
- User verified: Indicates whether the user's account is verified. Approximately 69% of the entries show "true" [2, 7].
- Date: The date the tweet was posted, ranging from 5th July 2022 to 13th June 2023 [2, 7, 8].
- Text: The actual content of the tweet, with 5,831 unique values [2, 8].
- Hashtags: Hashtags included in the tweet. Nearly all entries are null (rounding to 100%), with 'FreeSpeech' appearing in a small number of tweets [2, 8].
- Source: The client application used to post the tweet, primarily "Twitter for iPhone" (99%) [2, 9].
- Retweets: The number of times a tweet has been retweeted. Values range from 0 to 360,000, with a mean of 5,500 [2, 9].
- Is retweet: Indicates if the tweet is a retweet. All entries (100%) show "false", meaning the dataset contains only original tweets or replies from Elon Musk [2, 10].
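The null percentages quoted for columns such as "User location" can be checked with a few lines of Python. The sketch below is only illustrative: the header names (`user_location` and so on) are assumptions, since the source lists display names rather than the exact CSV headers, and it treats an empty field as null. The `null_share` helper and the in-memory sample are hypothetical stand-ins for the real file.

```python
import csv
import io


def null_share(rows, column):
    """Return the fraction of rows whose `column` value is empty or missing."""
    rows = list(rows)
    if not rows:
        return 0.0
    empty = sum(1 for row in rows if not (row.get(column) or "").strip())
    return empty / len(rows)


# Tiny in-memory stand-in for elon_musk_tweets.csv.
# The header names here are assumed, not taken from the real file.
sample = io.StringIO(
    "id,user_name,user_location,text\n"
    "1,Elon Musk,,Hello\n"
    "2,Elon Musk,Trøllheim,World\n"
    "3,Mr. Tweet,,Again\n"
    "4,Elon Musk,,More\n"
)
rows = list(csv.DictReader(sample))

location_nulls = null_share(rows, "user_location")
print(f"user_location null share: {location_nulls:.0%}")
```

To run this against the real data, replace the `io.StringIO` sample with `open("elon_musk_tweets.csv", newline="", encoding="utf-8")` and adjust the column name to match the file's actual header.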
Distribution
This dataset is provided in CSV format as elon_musk_tweets.csv and has a size of 1.32 MB [2]. It contains 5,904 records or rows, each with 16 columns [2, 11]. The data is designed for daily updates, ensuring its recency and relevance [2].
Usage
This dataset is ideal for Natural Language Processing (NLP) enthusiasts and researchers [2]. It can be effectively utilised to test and develop skills in various NLP tools and techniques, given the rich textual content of the tweets [2]. Potential applications include sentiment analysis, topic modelling, trend analysis, and social media analytics related to Elon Musk's public statements.
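As a first step towards the trend and topic analyses mentioned above, one simple option is to count the most frequent terms in the tweet text. This is a minimal sketch, not a full NLP pipeline: the `top_terms` helper, the tokenising regular expression, and the sample tweets are all hypothetical, and a real analysis would likely add stop-word removal or a dedicated tokeniser.

```python
import re
from collections import Counter

# Lowercase alphanumeric tokens, keeping apostrophes inside words.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def top_terms(texts, n=5, min_len=4):
    """Return the n most common tokens of at least min_len characters."""
    counts = Counter()
    for text in texts:
        # Strip URLs and @mentions, which are common in tweets but rarely informative.
        cleaned = re.sub(r"https?://\S+|@\w+", " ", text.lower())
        counts.update(tok for tok in TOKEN_RE.findall(cleaned) if len(tok) >= min_len)
    return counts.most_common(n)


# Made-up sample tweets standing in for the Text column.
tweets = [
    "@user Rockets are hard https://t.co/abc",
    "Rockets and cars",
    "Cars, rockets, and more rockets",
]
terms = top_terms(tweets, n=2)
print(terms)
```

The `min_len` threshold is a crude substitute for a stop-word list; it drops short function words such as "and" or "are" without needing any external resources.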
Coverage
The dataset's time range spans from 5th July 2022 to 13th June 2023 [7, 8]. Geographically, while a "User location" column is present, a large majority of entries (82%) are null, limiting extensive geographic analysis [3]. The demographic scope is primarily focused on a single public figure, Elon Musk, providing deep insights into his digital footprint and interactions.
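The stated time range can be verified by parsing the Date column and taking its minimum and maximum. The sketch below assumes the dates are stored as ISO-style timestamps (for example `2022-07-05 17:55:09+00:00`); that format, the `date_span` helper, and the sample values (including the times of day) are assumptions for illustration, not details confirmed by the source.

```python
from datetime import datetime


def date_span(date_strings):
    """Return (earliest, latest) datetimes, skipping empty values; None if no dates."""
    parsed = [
        datetime.fromisoformat(s.strip())
        for s in date_strings
        if s and s.strip()
    ]
    if not parsed:
        return None
    return min(parsed), max(parsed)


# Made-up timestamps standing in for the Date column; the format is assumed.
sample_dates = [
    "2022-07-05 17:55:09+00:00",
    "2023-06-13 08:12:44+00:00",
    "2022-12-25 00:00:00+00:00",
    "",
]
first, last = date_span(sample_dates)
print(first.date(), "to", last.date())
```

If the real file uses a different timestamp format, `datetime.strptime` with an explicit format string would be the safer choice.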
License
CC0: Public Domain
Who Can Use It
This dataset is particularly useful for:
- Data scientists and machine learning engineers seeking real-world text data for NLP model training and evaluation.
- Researchers interested in public figure communication, social media influence, and digital discourse.
- Academics studying online behaviour, public opinion, and the impact of influential individuals on social platforms.
- Students and learners wanting to practise and apply NLP techniques on a dynamic, high-profile dataset.
Dataset Name Suggestions
- Elon Musk Daily Tweets
- Elon Musk Twitter Activity
- Public Elon Tweets
- Elon Musk Social Data
Attributes
Original Data Source: Public Elon Tweets Dataset