Turkish Disaster Response NLP Dataset
Data Science and Analytics
Free
About
This dataset is a collection of Turkish tweets posted in the days following the 2020 Izmir earthquake, with a focus on urgency calls and rescue requests. It has three components: a file of 4,000 tweets manually annotated by three annotators to indicate whether the text is an urgency call; a file automatically annotated using a fine-tuned BERT model; and a large collection of earthquake-relevant tweets covering the first week after the event. These resources are intended to support the development of classification models for disaster response and crisis communication.
Columns
- Datetime: The timestamp of the tweet (GMT +3).
- Text: The actual content of the tweet (Turkish language).
- coordinates: Geospatial data (longitude and latitude), though missing in approximately 97% of records.
- retweetCount: The number of times the tweet was shared.
- likeCount: The number of likes the tweet received.
- rescue: A binary label indicating if the tweet contains an urgency call (available in the manually annotated file).
- preds: A predicted label generated by the BERT model (available in the auto-annotated file).
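As an illustration of how these columns might be read, the following is a minimal sketch using only the Python standard library. It assumes the column names listed above, that Datetime is stored in an ISO-like form such as 2020-10-30 15:02:10 without an explicit offset, and that missing counts appear as empty cells; none of this is confirmed by the listing, and the sample rows are placeholders rather than real tweets.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# The listing states that timestamps are in GMT +3.
TZ_PLUS_3 = timezone(timedelta(hours=3))


def to_int(value):
    """Return an int, or None when the cell is empty (assumed null encoding)."""
    value = (value or "").strip()
    return int(float(value)) if value else None


def parse_row(row):
    """Convert one CSV row (a dict of strings) into typed values."""
    # Assumption: ISO-like timestamp; attach GMT +3 if no offset is present.
    ts = datetime.fromisoformat(row["Datetime"])
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=TZ_PLUS_3)
    return {
        "Datetime": ts,
        "Text": row["Text"],
        "retweetCount": to_int(row.get("retweetCount")),
        "likeCount": to_int(row.get("likeCount")),
        "rescue": to_int(row.get("rescue")),  # only in the manually annotated file
    }


def read_tweets(handle):
    """Read every row from an open CSV file handle."""
    return [parse_row(row) for row in csv.DictReader(handle)]


# Placeholder rows standing in for the real file.
SAMPLE = (
    "Datetime,Text,retweetCount,likeCount,rescue\n"
    "2020-10-30 15:02:10,placeholder text one,,,1\n"
    "2020-10-30 15:05:00,placeholder text two,3.0,12.0,0\n"
)
tweets = read_tweets(io.StringIO(SAMPLE))
```

With a real file, the same function could be called on a handle opened with open(path, newline="", encoding="utf-8"), where path is whichever of the three CSV files is being read.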
Distribution
- Format: CSV
- Size: 190.56 MB (Main file)
- Structure: Three separate CSV files (annotated, auto-annotated, and full collection).
- Records: Approximately 971,000 valid entries in the main collection.
- Update Frequency: Never
Usage
- Training Natural Language Processing (NLP) classifiers to detect urgency in social media text.
- Analysing sentiment and public reaction during natural disasters.
- Developing automated systems to filter rescue requests for humanitarian aid.
- Studying the temporal distribution of crisis-related social media activity.
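The temporal-distribution use case can be sketched as a simple hourly count. This is only an illustrative starting point, assuming the Datetime values have already been parsed into Python datetime objects; the function name hourly_counts and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(timestamps):
    """Count tweets per hour by truncating each timestamp to the hour."""
    buckets = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    # Return the buckets in chronological order.
    return dict(sorted(buckets.items()))


# Placeholder timestamps, not taken from the dataset.
sample = [
    datetime(2020, 10, 30, 15, 2),
    datetime(2020, 10, 30, 15, 40),
    datetime(2020, 10, 30, 16, 5),
]
counts = hourly_counts(sample)
```

The resulting mapping from hour to tweet count can be plotted or compared between the labelled and unlabelled files.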
Coverage
- Geographic Scope: Izmir, Turkey.
- Time Range: 30 October 2020 – 07 November 2020.
- Demographics: Twitter users posting in Turkish regarding the earthquake.
- Data Availability Notes: coordinates is null for approximately 97% of records; retweetCount and likeCount are null for approximately 84-85% of records.
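The null rates noted above can be checked against a downloaded file with a streaming pass, which avoids loading the roughly 190 MB main file into memory. The sketch below assumes that a null value is written as an empty cell, which the listing does not confirm; missing_rates is a hypothetical helper and the sample rows are placeholders.

```python
import csv
import io


def missing_rates(handle, columns):
    """Return the fraction of rows in which each column is empty."""
    total = 0
    missing = {column: 0 for column in columns}
    for row in csv.DictReader(handle):
        total += 1
        for column in columns:
            # Assumption: nulls are stored as empty (or whitespace-only) cells.
            if not (row.get(column) or "").strip():
                missing[column] += 1
    if total == 0:
        return {column: 0.0 for column in columns}
    return {column: missing[column] / total for column in columns}


# Placeholder rows standing in for the real file.
SAMPLE = (
    "Text,coordinates,retweetCount,likeCount\n"
    "a,,,\n"
    "b,,,\n"
    "c,,2,5\n"
    "d,placeholder,1,0\n"
)
rates = missing_rates(
    io.StringIO(SAMPLE), ["coordinates", "retweetCount", "likeCount"]
)
```

Rows with a count of 0 are treated as present, not missing, since only empty cells are counted as null.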
License
CC0: Public Domain
Who Can Use It
- Data Scientists and NLP Engineers building disaster response tools.
- Academic researchers studying crisis informatics.
- Humanitarian organisations and NGOs seeking to understand social media dynamics during emergencies.
- Government agencies improving digital response strategies.
Dataset Name Suggestions
- Izmir 2020 Earthquake Turkish Tweets & Annotations
- Turkish Disaster Response NLP Dataset
- Izmir Earthquake Urgency Classification Data
- Social Media Crisis Communications: Izmir 2020
Attributes
Original Data Source: Turkish Disaster Response NLP Dataset
