Twitter Mental Health Dataset
Mental Health & Wellness
Free
About
This dataset is designed for detecting depression and anxiety in tweets, specifically in the Indonesian language. It was created and published as part of a Bangkit capstone project, building upon a research paper by David Owen, Jose Camacho Collados, and Luis Espinosa-Anke. The dataset has been successfully used to train a bidirectional LSTM model, achieving 95% accuracy, 75% precision, and 79% recall (F1 score 0.766) in predicting emotional distress. It serves as a valuable resource for mental health research and natural language processing applications focusing on social media content.
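The reported F1 score can be sanity-checked against the quoted precision and recall. The sketch below is only an illustration: it uses the rounded percentages from the description, and the helper name `f1_score` is introduced here rather than taken from the project.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded figures quoted in the description (75% precision, 79% recall).
reported_precision = 0.75
reported_recall = 0.79

f1_from_rounded = f1_score(reported_precision, reported_recall)
print(f"F1 from rounded precision/recall: {f1_from_rounded:.4f}")
```

The result lands close to, but not exactly on, the reported 0.766. A gap of that size is consistent with precision and recall having been rounded to whole percentages before publication, although the unrounded values are not given here.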
Columns
- text: Contains the full text of the tweet.
- label: A binary indicator where '1' signifies that the user potentially shows signs of anxiety or depression, and '0' indicates otherwise.
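A minimal sketch of reading and validating this two-column layout with Python's standard `csv` module follows. It assumes each file has a header row naming the columns `text` and `label`; the helper `load_rows` and the sample rows are hypothetical placeholders, not content from the dataset.

```python
import csv
import io


def load_rows(fileobj):
    """Read (text, label) pairs, checking the documented two-column schema."""
    reader = csv.DictReader(fileobj)
    missing = {"text", "label"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for record in reader:
        label = record["label"].strip()
        if label not in {"0", "1"}:
            # reader.line_num is the physical line the reader has reached.
            raise ValueError(f"line {reader.line_num}: unexpected label {label!r}")
        rows.append((record["text"], int(label)))
    return rows


# Made-up placeholder rows, NOT taken from the dataset.
sample_csv = io.StringIO(
    "text,label\n"
    '"contoh tweet pertama, dengan koma",0\n'
    "contoh tweet kedua,1\n"
)
rows = load_rows(sample_csv)
print(rows)
```

For the real files, the same function could be called on a handle opened with `open("datd_train.csv", newline="", encoding="utf-8")`. The UTF-8 encoding is an assumption and should be checked against the downloaded files.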
Distribution
The dataset is provided in CSV format and comprises three distinct files: datd_train.csv, datd_test.csv, and datd_rand.csv. The datd_train and datd_test files were utilised for model training and testing respectively. The datd_rand file was used for final evaluation and includes positive entries from datd_test alongside random tweets that do not contain specific keywords often associated with depression or anxiety. These random tweets are all labelled as negative. The combined dataset contains approximately 6,980 records, with 6,247 instances labelled as '0' (no sign of distress) and 733 instances labelled as '1' (potential sign of distress).
Usage
This dataset is ideally suited for:
- Developing and evaluating machine learning models for binary classification tasks, particularly in mental health detection.
- Conducting Natural Language Processing (NLP) research related to sentiment analysis and emotional state recognition from text.
- Supporting academic studies in psychology and social media behaviour.
- Building applications for early detection or monitoring of mental health indicators from social media.
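For the binary classification use case, the label counts reported under Distribution (6,247 negative, 733 positive) imply a strong class imbalance. The sketch below works through what that means for a trivial baseline and for inverse-frequency class weights. The weighting formula is a common heuristic chosen here for illustration, not something the dataset authors are known to have used.

```python
# Label counts as reported in the Distribution section (all files combined).
counts = {0: 6247, 1: 733}
total = sum(counts.values())

# Share of positive examples and accuracy of always predicting the majority class.
positive_share = counts[1] / total
majority_baseline_accuracy = max(counts.values()) / total

# Inverse-frequency ("balanced") weights: total / (n_classes * class_count).
class_weights = {label: total / (len(counts) * n) for label, n in counts.items()}
rounded_weights = {label: round(w, 3) for label, w in class_weights.items()}

print(f"total records: {total}")
print(f"positive share: {positive_share:.3f}")
print(f"always-negative accuracy: {majority_baseline_accuracy:.3f}")
print("class weights:", rounded_weights)
```

A model that always predicts '0' would already be right on roughly nine records in ten, so precision, recall, and F1 say more than accuracy on this data. The counts describe the three files combined, and datd_rand reuses positive entries from datd_test, so weights meant for training are better recomputed from datd_train.csv alone.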
Coverage
The dataset covers tweets written in the Indonesian language. No geographic restriction is stated, since Twitter data is not inherently tied to a region, and no time range or demographic scope is documented beyond the language focus.
License
CC-BY-NC
Who Can Use It
- Researchers in computer science, psychology, and public health for academic studies.
- Data Scientists and Machine Learning Engineers developing predictive models for mental health.
- Students undertaking projects in NLP, social media analysis, or AI for social good.
- Organisations interested in analysing social media trends related to mental well-being.
Dataset Name Suggestions
- Indonesian Twitter Mental Health Dataset
- Tweet Emotional Distress (ID)
- Social Media Anxiety & Depression Classifier
- Bahasa Indonesia Mental Wellness Tweets
Attributes
Original Data Source: Depression and Anxiety in Twitter (ID)