Twitter Mental Health Dataset

Mental Health & Wellness

Tags and Keywords

  • Text
  • NLP
  • Psychology
  • Mental
  • Binary

Free

About

This dataset is designed for detecting depression and anxiety in Indonesian-language tweets. It was created and published as part of a Bangkit capstone project and builds on a research paper by David Owen, Jose Camacho Collados, and Luis Espinosa-Anke. A bidirectional LSTM model trained on the dataset achieved 95% accuracy, 75% precision, and 79% recall (F1 score 0.766) in predicting emotional distress. The dataset is intended as a resource for mental health research and for natural language processing work on social media content.
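
As a quick consistency check on the reported metrics, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded percentages quoted in this description; the function name `f1_from_precision_recall` is illustrative and not part of any code published with the dataset. The rounded inputs give roughly 0.77, close to the reported 0.766; the small gap is presumably because the quoted precision and recall are themselves rounded.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1 score, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted in the dataset description (assumed to be rounded
# from more precise figures that are not given here).
f1 = f1_from_precision_recall(0.75, 0.79)
print(round(f1, 3))
```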

Columns

  • text: Contains the full text of the tweet.
  • label: A binary indicator where '1' signifies that the user potentially shows signs of anxiety or depression, and '0' indicates otherwise.
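
A minimal sketch of reading this two-column layout with Python's standard `csv` module follows. It assumes each file has a header row naming the columns exactly `text` and `label`, and that labels are stored as the integers 0 and 1; neither detail is confirmed beyond the column list above. The helper name `read_labelled_tweets` and the two sample rows are placeholders, not real data.

```python
import csv
import io


def read_labelled_tweets(handle):
    """Yield (text, label) pairs from a CSV with 'text' and 'label' columns."""
    reader = csv.DictReader(handle)
    for row in reader:
        label = int(row["label"])
        if label not in (0, 1):
            raise ValueError(f"unexpected label: {label!r}")
        yield row["text"], label


# Tiny in-memory stand-in for one of the CSV files; the rows are made up.
sample = io.StringIO(
    'text,label\n"contoh tweet pertama",0\n"contoh tweet kedua",1\n'
)
rows = list(read_labelled_tweets(sample))
print(rows)
```

For a real file, the same function could be given a handle opened with `open("datd_train.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption.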

Distribution

The dataset is provided in CSV format as three files: datd_train.csv, datd_test.csv, and datd_rand.csv. The datd_train and datd_test files were used for model training and testing respectively. The datd_rand file was used for final evaluation: it combines the positive entries from datd_test with random tweets that do not contain keywords commonly associated with depression or anxiety, and all of these random tweets are labelled negative. The combined dataset contains approximately 6,980 records, of which 6,247 are labelled '0' (no sign of distress) and 733 are labelled '1' (potential sign of distress), so roughly 10.5% of records are positive.
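
The label counts above imply a strong class imbalance, which matters when reading accuracy figures. The sketch below works through the arithmetic using only the combined counts stated here (the per-file split is not documented, so it is not modelled); `class_balance` is an illustrative helper name. A classifier that always predicts '0' would already score about 89.5% accuracy on the combined data, so precision and recall on the positive class are the more informative measures.

```python
from collections import Counter


def class_balance(labels):
    """Return (counts, positive_rate, majority_baseline_accuracy) for 0/1 labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels must not be empty")
    positive_rate = counts[1] / total
    # Accuracy of a trivial model that always predicts the most common label.
    majority_accuracy = max(counts.values()) / total
    return counts, positive_rate, majority_accuracy


# Rebuild a label column from the counts stated in the description.
labels = [0] * 6247 + [1] * 733
counts, positive_rate, majority_accuracy = class_balance(labels)
print(f"positive rate: {positive_rate:.1%}")                # 10.5%
print(f"majority-class accuracy: {majority_accuracy:.1%}")  # 89.5%
```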

Usage

This dataset is ideally suited for:
  • Developing and evaluating machine learning models for binary classification tasks, particularly in mental health detection.
  • Conducting Natural Language Processing (NLP) research related to sentiment analysis and emotional state recognition from text.
  • Supporting academic studies in psychology and social media behaviour.
  • Building applications for early detection or monitoring of mental health indicators from social media.
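
For the model-evaluation use case, a minimal sketch of scoring binary predictions against the `label` column might look like the following. It is plain Python with no dependency on any particular modelling library; the function names and the example labels and predictions are hypothetical and do not come from the dataset.

```python
def confusion_counts(y_true, y_pred):
    """Count (tp, fp, tn, fn) for 0/1 labels, treating 1 as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def precision_recall_accuracy(y_true, y_pred):
    """Return (precision, recall, accuracy); undefined ratios fall back to 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(y_true) if len(y_true) else 0.0
    return precision, recall, accuracy


# Hypothetical labels and predictions, purely for illustration.
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 1, 0, 0, 1, 0]
precision, recall, accuracy = precision_recall_accuracy(y_true, y_pred)
print(precision, recall, accuracy)
```

Reporting precision and recall alongside accuracy keeps the positive class visible even when negatives dominate.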

Coverage

The dataset covers tweets written in Indonesian. Its geographic scope is listed as global, since Twitter data is not inherently restricted by region. No time range or demographic scope is documented beyond the language focus.

License

CC-BY-NC

Who Can Use It

  • Researchers in computer science, psychology, and public health for academic studies.
  • Data Scientists and Machine Learning Engineers developing predictive models for mental health.
  • Students undertaking projects in NLP, social media analysis, or AI for social good.
  • Organisations interested in analysing social media trends related to mental well-being.

Dataset Name Suggestions

  • Indonesian Twitter Mental Health Dataset
  • Tweet Emotional Distress (ID)
  • Social Media Anxiety & Depression Classifier
  • Bahasa Indonesia Mental Wellness Tweets

Listing Stats

  • Views: 2
  • Downloads: 1
  • Listed: 11/06/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
