Opendatabay APP

Disaster Tweets Classification Dataset

Social Media and Networking

Tags and Keywords

Text

NLP

Binary

Disaster

Tweets

Classification


Free

About

This dataset contains over 11,000 tweets collected by searching for keywords associated with disaster events, such as "crash", "quarantine", and "bush fires" [1, 2]. Each record includes the keyword that matched and, where available, the tweet's location [1, 2]. Every tweet has been manually labelled as either referring to a real disaster event or not, for example a joke or a movie review [1, 2]. The dataset is intended for developing and testing binary classification models that separate authentic disaster reports from incidental mentions on social media [1].

Columns

  • id: A unique identifier assigned to each individual tweet [2].
  • keyword: The particular keyword from the tweet that led to its inclusion in the dataset [2].
  • location: The geographical location from which the tweet was sent, though this field may be blank for some entries [3].
  • text: The full textual content of the tweet itself [3].
  • target: A binary label indicating whether the tweet is about a real disaster (1) or not (0) [3].
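As a rough illustration of the column layout above, the following sketch parses rows with Python's standard `csv` module. The sample rows, the `Tweet` dataclass, and the `parse_tweets` helper are invented for this example and are not part of the dataset; the sketch also assumes the header row uses exactly the column names listed above.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tweet:
    id: str                  # unique identifier, kept as text (its exact format is an assumption)
    keyword: Optional[str]   # keyword that led to inclusion
    location: Optional[str]  # None when the field is blank
    text: str                # full tweet text
    target: int              # 1 = real disaster, 0 = not


# Made-up rows for illustration only; they are NOT taken from the dataset.
SAMPLE_CSV = """id,keyword,location,text,target
0,crash,,Two cars crashed on the highway this morning,1
1,quarantine,London,This movie about a quarantine was hilarious,0
"""


def parse_tweets(handle) -> List[Tweet]:
    """Read tweets from an open CSV file-like object, validating the binary label."""
    tweets = []
    for row in csv.DictReader(handle):
        target = int(row["target"])
        if target not in (0, 1):
            raise ValueError(f"unexpected target value: {target}")
        tweets.append(
            Tweet(
                id=row["id"],
                keyword=row["keyword"].strip() or None,
                location=row["location"].strip() or None,
                text=row["text"],
                target=target,
            )
        )
    return tweets


tweets = parse_tweets(io.StringIO(SAMPLE_CSV))
```

To read the downloaded file instead, pass an open file handle (for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved) to `parse_tweets`.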

Distribution

The dataset is provided in CSV (comma-separated values) format [4]. It comprises approximately 11,369 records, one per tweet [1, 2, 5], in a tabular layout with the columns described above [6].

Usage

This dataset suits applications such as:
  • Natural Language Processing (NLP) tasks, particularly text classification and binary classification [1].
  • Training and evaluating machine learning models to detect and categorise real disaster events from social media streams [1].
  • Social media monitoring for crisis management and real-time event analysis.
  • Developing algorithms to filter out irrelevant or non-disaster-related content from large volumes of tweets.
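To make the binary classification use case concrete, here is a minimal baseline sketch: a multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. This technique is chosen purely for illustration and is not a method prescribed by the dataset; the training sentences, `tokenize`, and the `NaiveBayes` class are all hypothetical and the sentences are not taken from the data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing for labels 0 and 1."""

    def fit(self, texts, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        if self.doc_counts[0] == 0 or self.doc_counts[1] == 0:
            raise ValueError("training data must contain both classes")
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in (0, 1):
            total_words = sum(self.word_counts[label].values())
            # Start from the log prior of the class.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen during training
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# Made-up training examples for illustration only (1 = real disaster, 0 = not).
train_texts = [
    "Wildfire spreading fast, residents evacuated",
    "Earthquake damaged several buildings downtown",
    "That concert was fire, the crowd went wild",
    "My exam was a total disaster lol",
]
train_labels = [1, 1, 0, 0]

model = NaiveBayes().fit(train_texts, train_labels)
pred_real = model.predict("Residents evacuated after the earthquake")
pred_joke = model.predict("That movie was a disaster lol")
```

With real data, the `text` and `target` columns would supply `train_texts` and `train_labels`, and a held-out split would be needed to estimate accuracy.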

Coverage

The dataset's geographic scope is global [7]. Location data is included, but approximately 30% of the tweets have a blank location field [3, 5]. Of all tweets, about 1% list the United States as their location, and the remaining 69% fall under 'Other', a grouping of the many other unique location values [5]. The time range during which the tweets were collected is not stated in the available information.
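The location percentages above are shares of all rows, blanks included. A minimal sketch of how such shares could be computed is shown below; the `location_shares` helper and the sample values are hypothetical and were chosen only to mirror the arithmetic, not drawn from the dataset.

```python
from collections import Counter


def location_shares(locations):
    """Return each location's share of all rows; blank values are grouped as '(blank)'."""
    cleaned = [(loc or "").strip() or "(blank)" for loc in locations]
    total = len(cleaned)
    if total == 0:
        return {}
    counts = Counter(cleaned)
    return {name: count / total for name, count in counts.items()}


# Made-up values for illustration: 3 of 10 are blank, 1 of 10 is "United States".
sample_locations = [
    "United States", "", None, "London", "Sydney",
    "", "Lagos", "Toronto", "Delhi", "Paris",
]
shares = location_shares(sample_locations)
blank_share = shares["(blank)"]
us_share = shares["United States"]
```

Because every row counts toward the denominator, the blank share and the shares of all named locations sum to 1.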

License

CC0

Who Can Use It

This dataset is suitable for a broad range of users:
  • Data scientists and machine learning engineers: For building, training, and refining models that classify textual data.
  • Researchers: In fields such as natural language processing, social computing, and disaster informatics.
  • Organisations involved in disaster response: To develop tools for real-time social media intelligence.
  • Students: Undertaking projects related to text mining, classification, and big data analysis.

Dataset Name Suggestions

  • Disaster Tweets Classification Dataset
  • Social Media Disaster Event Classifier
  • Real vs. Fake Disaster Tweets
  • Crisis Tweet Text Data

Attributes

Original Data Source: COVID-19 All Vaccines Tweets

Listing Stats

VIEWS

0

DOWNLOADS

0

LISTED

08/06/2025

REGION

GLOBAL

UNIVERSAL DATA QUALITY SCORE (UDQS)

5 / 5

VERSION

1.0
