Global New Year Tweets Dataset
Social Media and Networking
Free
About
This dataset contains more than 100,000 tweets collected through the Twitter API, each mentioning keywords related to "New Year" [1]. The tweets were collected during the evening and night of 31st December 2021 [1, 2]. Collection ran over several hours so that tweets would not be concentrated in a single timezone or country, with the aim of broad geographical representation [1]. To keep the focus on original content, retweets and quote tweets were intentionally excluded [1, 2]. The dataset is well suited to analysing public sentiment and social trends around the New Year period [1].
Columns
- Tweet number in the dataset: An internal tracking number for each tweet within this dataset, provided as a shorter identifier than the large numerical Twitter IDs [1, 2].
- author_id: The unique identification number assigned to the author of each tweet by Twitter [1, 2].
- id: The unique identification number assigned to the tweet itself by Twitter [1, 2].
- text: The full content of the tweet. This column may include various elements such as emojis, external links, and mentions of other users [1, 2].
- username: The publicly visible username of the tweet's author [1, 2].
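As a minimal sketch of how these columns might be read, the Python below parses rows with the standard `csv` module. The header name `tweet_number` and the sample rows are assumptions made for illustration; the actual header spelling in the file is not stated here and should be checked first. Both ID columns are kept as strings so that the large Twitter IDs are never rounded or reformatted.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Tweet:
    number: int      # internal tracking number within the dataset
    author_id: str   # kept as a string so large Twitter IDs stay exact
    id: str
    text: str
    username: str


def read_tweets(handle):
    """Parse rows from an open CSV file into Tweet records."""
    reader = csv.DictReader(handle)
    return [
        Tweet(
            number=int(row["tweet_number"]),  # hypothetical header name
            author_id=row["author_id"],
            id=row["id"],
            text=row["text"],
            username=row["username"],
        )
        for row in reader
    ]


# Made-up sample rows standing in for the real file.
SAMPLE_CSV = """tweet_number,author_id,id,text,username
0,1111111111111111111,1477000000000000001,"Happy New Year, everyone! 🎉",example_user
1,2222222222222222222,1477000000000000002,"New year, new goals @friend https://example.com",another_user
"""

tweets = read_tweets(io.StringIO(SAMPLE_CSV))
print(len(tweets), tweets[0].username)
```

For the real file, the same function could be given a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved.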
Distribution
The dataset is typically provided in CSV format [3]. It comprises approximately 110,000 records [1, 4, 5]; for instance, the 'Tweet number in the dataset' column has over 110,000 unique values [5].
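The record and unique-value counts can be checked after download with a short script. The sketch below counts rows and distinct values for one column; the header name `tweet_number` and the three sample rows are illustrative assumptions, not taken from the real file.

```python
import csv
import io


def summarise_column(handle, column):
    """Return (row_count, unique_count) for one column of an open CSV file."""
    seen = set()
    rows = 0
    for row in csv.DictReader(handle):
        rows += 1
        seen.add(row[column])
    return rows, len(seen)


# Made-up sample; "tweet_number" is a hypothetical header name.
sample = io.StringIO(
    "tweet_number,text\n0,Happy New Year\n1,Goodbye 2021\n2,Hello 2022\n"
)
rows, unique = summarise_column(sample, "tweet_number")
print(rows, unique)
```

If the two numbers differ on the real file, the tracking column contains duplicates and should not be used as a key without further checks.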
Usage
This dataset is particularly suitable for:
- Conducting sentiment analysis to understand public opinion and feelings about the start of the New Year [1].
- Natural Language Processing (NLP) tasks, such as topic modelling, text classification, and entity recognition.
- Social media trend analysis specific to the New Year period.
- Research into public discourse during significant global events.
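As a starting point for the NLP tasks listed above, a common first step is to strip links and mentions from the tweet text and count the remaining terms. The sketch below does this with the Python standard library only. The regular expressions are deliberately simple assumptions: the word pattern keeps only ASCII letters and apostrophes, so emojis and non-Latin scripts are dropped, which is likely too coarse for a multilingual collection like this one.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")   # external links
MENTION_RE = re.compile(r"@\w+")       # mentions of other users
WORD_RE = re.compile(r"[a-z']+")       # ASCII-only word tokens (a simplification)


def tokenize(text):
    """Lowercase a tweet, drop links and mentions, and return its word tokens."""
    text = URL_RE.sub(" ", text.lower())
    text = MENTION_RE.sub(" ", text)
    return WORD_RE.findall(text)


def top_terms(texts, n=3):
    """Count tokens across all tweets and return the n most frequent."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts.most_common(n)


# Made-up sample tweets for illustration.
sample_texts = [
    "Happy New Year @friend! https://example.com",
    "Happy new year to all 🎉",
]
print(top_terms(sample_texts))
```

The token lists produced by `tokenize` could then feed a sentiment lexicon, a topic model, or a classifier of the reader's choice.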
Coverage
- Time Range: Data was collected on the evening and night of 31st December 2021 [1, 2].
- Geographic Scope: Tweets were collected over several hours to avoid geographical clustering, so the dataset is intended to cover multiple time zones worldwide [1].
- Demographic Scope: The dataset represents public tweets from general Twitter users. Specific demographic details of the authors are not available.
License
CC0
Who Can Use It
- Data scientists and machine learning engineers for developing and testing NLP models.
- Academic researchers studying social media behaviour, public opinion, and linguistic patterns.
- Marketing and PR professionals seeking insights into consumer sentiment during holiday periods.
- Analysts interested in event-driven social media activity.
Dataset Name Suggestions
- New Year's Eve Tweets 2021
- 2021 New Year Twitter Data
- New Year Sentiment Tweets
- Global New Year Tweets
Attributes
Original Data Source: New Years 2021 Tweets