Opendatabay APP

NLP Disaster Communication Dataset

Telecommunications & Network Data

Tags and Keywords

Text, Email, Messaging, NLP, Natural, Disasters



Free

About

This dataset contains approximately 30,000 messages collected during disaster events, including the 2010 earthquakes in Haiti and Chile, the 2010 floods in Pakistan, and Superstorm Sandy in the U.S.A. in 2012, together with news articles covering hundreds of other disasters over several years. It is a resource for studying the communications that arrive during disaster response. Each message is labeled against 36 disaster-response categories using simple binary labels, and the data has been processed to remove sensitive information. Messages are provided in their original, untranslated languages alongside their English translations.

Columns

The dataset includes the following columns:
  • id: A unique identifier for each message.
  • split: Indicates the dataset split (e.g., Train, Validation, Test).
  • message: The English translation of the original message.
  • original: The message in its original language.
  • genre: The genre of the message.
  • related: A message class label indicating whether the message is relevant to disaster response.
  • PII: A message class label indicating whether the message contains personally identifiable information.
  • request: A message class label indicating whether the message is a request.
  • offer: A message class label indicating whether the message is an offer.
  • aid_related: A message class label indicating whether the message is aid-related.

Beyond the columns listed here, the dataset contains further binary label columns for the remaining content categories.
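The listing does not state a file format, so the following is a minimal sketch assuming a CSV export whose header row matches the column names above; the three sample rows are invented for illustration. It reads the records with Python's standard `csv` module and treats every column other than the text and metadata fields (`id`, `split`, `message`, `original`, `genre`) as a binary label converted to an integer. `SAMPLE_CSV`, `META_COLUMNS`, and `load_records` are illustrative names, not part of any published API.

```python
import csv
import io

# Hypothetical sample: invented rows, assuming a CSV layout (the listing names no format).
SAMPLE_CSV = """id,split,message,original,genre,related,PII,request,offer,aid_related
1,Train,We need water and food,Nou bezwen dlo ak manje,direct,1,0,1,0,1
2,Train,The bridge was reopened today,,news,1,0,0,0,0
3,Test,Happy birthday to my sister,,social,0,0,0,0,0
"""

# Columns holding text or metadata rather than category labels (assumed set).
META_COLUMNS = {"id", "split", "message", "original", "genre"}


def load_records(text):
    """Parse CSV text into dicts, converting every label column to int."""
    reader = csv.DictReader(io.StringIO(text))
    # Any column not in META_COLUMNS is treated as a binary category label.
    label_columns = [c for c in reader.fieldnames if c not in META_COLUMNS]
    records = []
    for row in reader:
        for col in label_columns:
            row[col] = int(row[col])
        records.append(row)
    return records, label_columns


records, label_columns = load_records(SAMPLE_CSV)
print(label_columns)  # ['related', 'PII', 'request', 'offer', 'aid_related']
print(records[0]["message"], records[0]["request"])
```

With a real export, `SAMPLE_CSV` would be replaced by the file's contents, and `label_columns` would pick up the remaining category columns without listing them by hand.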

Distribution

This dataset comprises roughly 30,000 unique messages across approximately 30.3k total records. Each record pairs the original-language message with its English translation and carries multiple categorical labels. The listing does not specify the file format, but the column structure lends itself to structured data analysis.
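The gap between roughly 30,000 unique messages and about 30.3k records may indicate that some message texts occur more than once. A minimal sketch for checking this, assuming records are already loaded as dicts keyed by the column names, counts records per `split` value and distinct `message` strings. The rows and the `summarize` helper are hypothetical, and the exact spelling of the split values in the real data is an assumption.

```python
from collections import Counter

# Hypothetical records (invented), keeping only the fields used here;
# one message text deliberately appears twice.
records = [
    {"id": "1", "split": "Train", "message": "We need water and food"},
    {"id": "2", "split": "Train", "message": "The bridge was reopened today"},
    {"id": "3", "split": "Validation", "message": "We need water and food"},
    {"id": "4", "split": "Test", "message": "Roads are flooded near the school"},
]


def summarize(records):
    """Return (records per split, total record count, distinct message count)."""
    per_split = Counter(r["split"] for r in records)
    unique_messages = len({r["message"] for r in records})
    return per_split, len(records), unique_messages


per_split, total, unique = summarize(records)
print(dict(per_split))  # {'Train': 2, 'Validation': 1, 'Test': 1}
print(total, unique)    # 4 3
```

If the same text appears in more than one split, as in the invented rows above, it is worth deduplicating before evaluation so that test scores are not inflated by messages seen during training.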

Usage

This dataset is suited to text analytics and natural language processing (NLP) work, in particular text categorisation, content classification, and building models that analyse disaster response communications.
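As an illustration of the text-classification use case, the sketch below trains one multinomial naive Bayes classifier per category (a one-vs-rest setup with add-one smoothing). This is a standard textbook baseline swapped in for illustration, not a method described in the listing; the tokenizer, the helper names (`tokenize`, `BinaryNaiveBayes`, `train_one_vs_rest`), and the four training messages are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of ASCII letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


class BinaryNaiveBayes:
    """Multinomial naive Bayes for a single 0/1 label, with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for text, y in zip(texts, labels):
            self.word_counts[y].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        best_label, best_score = 0, -math.inf
        for y in (0, 1):
            if self.doc_counts[y] == 0:
                continue  # this class never appeared in training
            total = sum(self.word_counts[y].values())
            score = math.log(self.doc_counts[y] / n_docs)  # log prior
            for word in tokenize(text):
                count = self.word_counts[y][word]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = y, score
        return best_label


def train_one_vs_rest(texts, label_rows, label_columns):
    """Fit one independent binary classifier per category column."""
    return {
        col: BinaryNaiveBayes().fit(texts, [row[col] for row in label_rows])
        for col in label_columns
    }


# Invented toy data; real use would train on the dataset's training split.
texts = [
    "we need water and food",
    "please send food to our village",
    "we can offer tents and blankets",
    "volunteers offering free water",
]
label_rows = [
    {"request": 1, "offer": 0},
    {"request": 1, "offer": 0},
    {"request": 0, "offer": 1},
    {"request": 0, "offer": 1},
]
models = train_one_vs_rest(texts, label_rows, ["request", "offer"])
print({col: m.predict("we need food") for col, m in models.items()})
# {'request': 1, 'offer': 0}
```

Training each category independently keeps the example short but ignores correlations between labels (for instance, `request` and `aid_related` plausibly co-occur). For real experiments, an established library such as scikit-learn, with its `MultinomialNB` estimator wrapped in `MultiOutputClassifier`, would be the more practical choice.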

Coverage

The dataset's geographic scope covers events in Haiti (2010), Chile (2010), Pakistan (2010), and the U.S.A. (2012), with additional news articles spanning many years and hundreds of disasters worldwide. Its multilingual content makes it applicable to global studies.

License

CC0

Who Can Use It

This dataset is ideal for data scientists, machine learning engineers, researchers, and students. It is especially useful for those involved in developing text analytics, natural language processing, and text classification models, particularly within the domain of disaster relief and humanitarian aid. It has been highlighted for use in educational contexts, such as Udacity courses on Data Science and AI4ALL summer schools.

Dataset Name Suggestions

  • Multilingual Disaster Response Messages
  • Global Disaster Relief Messages
  • Emergency Text Analytics Data
  • NLP Disaster Communication Dataset
  • Humanitarian Aid Message Corpus


Listing Stats

  • Views: 1
  • Downloads: 0
  • Listed: 24/06/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0


Download Dataset in ZIP Format