Opendatabay APP

Global New Year Tweets Dataset

Social Media and Networking

Tags and Keywords

Online, Text, Social, NLP, Newyear, Sentiment, Twitter, Tweets

Free

About

This dataset contains more than 100,000 tweets collected through the Twitter API, each mentioning keywords related to "New Year" [1]. The tweets were collected during the evening and night of 31st December 2021 [1, 2]. Collection ran over several hours so that tweets from a single timezone or country would not dominate, with the aim of broad geographical representation [1]. To keep the focus on original content, retweets and quote tweets were intentionally excluded [1, 2]. The dataset is well suited to analysing public sentiment and social trends around the New Year period [1].

Columns

  • Tweet number in the dataset: An internal tracking number for each tweet within this dataset, provided as a shorter identifier than the large numerical Twitter IDs [1, 2].
  • author_id: The unique identification number assigned to the author of each tweet by Twitter [1, 2].
  • id: The unique identification number assigned to the tweet itself by Twitter [1, 2].
  • text: The full content of the tweet. This column may include various elements such as emojis, external links, and mentions of other users [1, 2].
  • username: The publicly visible username of the tweet's author [1, 2].
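
Because the text column can include links and mentions of other users, a light normalisation pass is often useful before analysis. The following is a minimal sketch using only the Python standard library; the function name `clean_tweet` and the regular expressions are illustrative assumptions, not part of the dataset or its documentation.

```python
import re

# Illustrative patterns (assumptions, not defined by the dataset).
URL_RE = re.compile(r"https?://\S+")   # external links
MENTION_RE = re.compile(r"@\w+")       # mentions of other users
WHITESPACE_RE = re.compile(r"\s+")

def clean_tweet(text: str) -> str:
    """Remove links and @mentions, then collapse whitespace. Emojis are kept."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()

# Made-up example tweet, not taken from the dataset.
sample = "Happy New Year to @friend 🎉 https://example.com/party"
print(clean_tweet(sample))
```

Emojis are deliberately left in place here, since they may carry sentiment; whether to strip them depends on the downstream task.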

Distribution

The dataset is typically provided in CSV format [3] and comprises approximately 110,000 records [1, 4, 5]. For instance, the 'Tweet number in the dataset' column has over 110,000 unique values [5].
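
The record and unique-value counts above can be checked after download with a short script. This is a sketch under stated assumptions: the function name `summarize` is hypothetical, the CSV is assumed to have a header row using the column names listed under Columns, and the in-memory sample below is made up so that the example runs without the real file.

```python
import csv
import io

def summarize(csv_file, id_column="id"):
    """Return (row_count, unique_value_count) for one column of an open CSV file."""
    reader = csv.DictReader(csv_file)
    rows = 0
    unique_values = set()
    for row in reader:
        rows += 1
        # IDs are kept as strings so large Twitter IDs are never rounded.
        unique_values.add(row[id_column])
    return rows, len(unique_values)

# Tiny made-up stand-in for the real file (assumed header names).
sample = io.StringIO(
    "author_id,id,text,username\n"
    "11,1001,Happy New Year!,alice\n"
    "12,1002,Goodbye 2021,bob\n"
)
print(summarize(sample))
```

For the downloaded file, the same function could be called on a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved.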

Usage

This dataset is particularly suitable for:
  • Conducting sentiment analysis to understand public opinion and feelings about the start of the New Year [1].
  • Natural Language Processing (NLP) tasks, such as topic modelling, text classification, and entity recognition.
  • Social media trend analysis specific to the New Year period.
  • Research into public discourse during significant global events.
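
As a rough illustration of the sentiment analysis use case, the sketch below scores a tweet by counting words from two small word lists. The word lists, function names, and example tweets are all made up for illustration; a real analysis would more likely rely on an established lexicon or a trained model.

```python
import re

# Toy word lists (assumptions for illustration only).
POSITIVE = {"happy", "love", "great", "hope", "blessed"}
NEGATIVE = {"sad", "hate", "awful", "worst", "lonely"}

def sentiment_score(text: str) -> int:
    """Count positive words minus negative words in the lowercased text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)

def label(text: str) -> str:
    """Map the score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Made-up example tweets, not taken from the dataset.
for tweet in ["Happy New Year, love you all", "Worst year ever, so sad"]:
    print(label(tweet), "|", tweet)
```

A word-count approach like this ignores negation, sarcasm, and emojis, so it is only a starting point for exploring the text column.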

Coverage

  • Time Range: Data was collected on the evening and night of 31st December 2021 [1, 2].
  • Geographic Scope: Tweets were collected over several hours to avoid geographical clustering, which suggests worldwide coverage across various time zones [1].
  • Demographic Scope: The dataset represents public tweets from general Twitter users. Specific demographic details of the authors are not available.

License

CC0

Who Can Use It

  • Data scientists and machine learning engineers for developing and testing NLP models.
  • Academic researchers studying social media behaviour, public opinion, and linguistic patterns.
  • Marketing and PR professionals seeking insights into consumer sentiment during holiday periods.
  • Analysts interested in event-driven social media activity.

Dataset Name Suggestions

  • New Year's Eve Tweets 2021
  • 2021 New Year Twitter Data
  • New Year Sentiment Tweets
  • Global New Year Tweets

Attributes

Original Data Source: New Years 2021 Tweets

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 17/06/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0