
Twitter Threads

Social Media and Networking

Tags and Keywords

Internet

Online Communities

Social Science

Social Networks

Email and Messaging

Linguistics

NLP

Twitter Threads Dataset on Opendatabay data marketplace


Free

About

Context
When Twitter introduced its thread functionality, a debate emerged: "If you're gonna write a f*ck ton of tweets at once, why not write a blog post instead of cluttering my feed?"… "It's easier and user-friendlier to share ideas in a single app"…
I'm not getting into that debate. Both blog posts and Twitter threads have their own advantages.
But while reading threads on Twitter I noticed a pattern: engagement (retweets, likes and replies) drops with each subsequent tweet!
There are some logical explanations for this. For example, people don't want to retweet or like every tweet in a thread, because that would be annoying. But the trend kept appearing in every single thread I read.
It was bugging me, so I had to gather some data.
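One way to put a number on that drop is to divide each tweet's engagement by the engagement of the thread's opening tweet and average those ratios per position. Below is a minimal Python sketch, assuming each thread has already been reduced to a list of engagement counts in thread order (how to combine retweets, likes and replies is left to the caller); the function name `relative_engagement_by_position` is hypothetical.

```python
from statistics import mean


def relative_engagement_by_position(threads):
    """Mean engagement at each tweet position, relative to the thread's first tweet.

    `threads` is a list of threads; each thread is a list of per-tweet
    engagement counts (for example retweets + likes + replies) in thread order.
    Threads whose first tweet has zero engagement are skipped, because the
    ratio to the first tweet would be undefined.
    """
    ratios_by_position = {}
    for counts in threads:
        if not counts or counts[0] == 0:
            continue
        first = counts[0]
        for position, count in enumerate(counts, start=1):
            ratios_by_position.setdefault(position, []).append(count / first)
    # Positions are 1-based: position 1 is the opening tweet, so its ratio is always 1.0.
    return {pos: mean(ratios) for pos, ratios in sorted(ratios_by_position.items())}
```

Only longer threads reach the later positions, so the averages there come from fewer threads; computing this within a single length bin keeps that composition effect in check.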
Content
The dataset is divided into five parts:
- five_ten.csv: threads 5-10 tweets long
- ten_fifteen.csv: threads 10-15 tweets long
- fifteen_twenty.csv: threads 15-20 tweets long
- twenty_twentyfive.csv: threads 20-25 tweets long
- twentyfive_thirty.csv: threads 25-30 tweets long
They all contain the same columns:
- id: Tweet ID (maybe I should remove it to anonymize the data?)
- thread_number: thread identifier, used for grouping each thread and its tweets
- timestamp: creation date of each tweet
- text: the content of each tweet
- retweets: retweet count for each tweet
- likes: like count for each tweet
- replies: reply count for each tweet
Each "bin" contains around 100 threads, so in total there are ~500 threads.
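Given the file and column lists above, the data can be loaded with nothing but the Python standard library. This is a sketch under assumptions: the CSV headers are taken to be spelled exactly like the field names listed, each file is assumed to list a thread's tweets in order, and thread numbers are not assumed to be unique across files. `BIN_FILES`, `parse_count`, `add_rows` and `load_threads` are hypothetical helper names, not part of the dataset.

```python
import csv
from pathlib import Path

# File names from the list above; column headers are ASSUMED to match the
# field names (id, thread_number, timestamp, text, retweets, likes, replies).
BIN_FILES = [
    "five_ten.csv",
    "ten_fifteen.csv",
    "fifteen_twenty.csv",
    "twenty_twentyfive.csv",
    "twentyfive_thirty.csv",
]


def parse_count(value):
    """Parse a count cell such as "12" or "12.0"; blank cells become 0 (an assumption)."""
    value = (value or "").strip()
    return int(float(value)) if value else 0


def add_rows(threads, bin_name, lines):
    """Append tweets from CSV `lines` to `threads`, keyed by (bin file, thread_number).

    The bin name is part of the key in case thread numbers repeat across files.
    Tweets keep their file order, which is assumed to be thread order.
    """
    for row in csv.DictReader(lines):
        key = (bin_name, row["thread_number"])
        threads.setdefault(key, []).append({
            "id": row["id"],
            "timestamp": row["timestamp"],
            "text": row["text"],
            "retweets": parse_count(row["retweets"]),
            "likes": parse_count(row["likes"]),
            "replies": parse_count(row["replies"]),
        })
    return threads


def load_threads(data_dir="."):
    """Read all five bin files from `data_dir` into one dict of threads."""
    threads = {}
    for name in BIN_FILES:
        with open(Path(data_dir) / name, newline="", encoding="utf-8") as f:
            add_rows(threads, name, f)
    return threads
```

Opening the files with `newline=""` lets the `csv` module handle tweet texts that contain line breaks inside quoted fields.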
Acknowledgements
The threads were manually gathered using Thread Reader (both the web page and the bot).
Disclaimer
The content of the threads/tweets did not have any influence on whether a thread was chosen. The only criterion was the length of the thread (5-30 tweets at most). The collected tweets date from October 2017 to May 2018.
Inspiration
One thing I noticed while gathering the data was that political threads have steadier engagement than, say, art threads. So context (what a thread is about) might influence thread engagement, and it'd be interesting to do some NLP to figure that out.
It would also be cool to find a "formula" for better engagement in Twitter threads. How long should a thread be? Or is there a probability of engagement based on the success of the initial tweet?
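The last question can be turned into a first, rough number. The Python sketch below splits threads at the median like count of their opening tweet and compares how often the follow-up tweets receive at least one like in each group. The median split, the "at least one like" threshold and the function name `followup_like_rates` are assumptions chosen for illustration; this is a simple empirical frequency, not a fitted model of engagement.

```python
from statistics import median


def followup_like_rates(threads):
    """Share of follow-up tweets (position 2 onward) with at least one like,
    split by whether the thread's first tweet reached the median first-tweet likes.

    `threads` is a list of threads; each is a list of per-tweet like counts in
    thread order. A group with no follow-up tweets gets None.
    """
    first_likes = [likes[0] for likes in threads if likes]
    if not first_likes:
        return {"strong_start": None, "weak_start": None}
    cutoff = median(first_likes)
    hits = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for likes in threads:
        if len(likes) < 2:
            continue
        # True if the opening tweet did at least as well as the median opener.
        strong_start = likes[0] >= cutoff
        for count in likes[1:]:
            totals[strong_start] += 1
            if count > 0:
                hits[strong_start] += 1
    return {
        "strong_start": hits[True] / totals[True] if totals[True] else None,
        "weak_start": hits[False] / totals[False] if totals[False] else None,
    }
```

A clear gap between the two rates would hint that a strong opening tweet carries engagement further into the thread; thread length and topic would still need to be controlled for.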
Finally, this whole issue reminds me of the headline problem: most people don't go beyond the headline. Maybe Twitter threads suffer from that too.

License

CC0
Original Data Source: Twitter Threads

Listing Stats

VIEWS

0

DOWNLOADS

0

LISTED

17/06/2025

REGION

GLOBAL

QUALITY (Universal Data Quality Score, UDQS)

5 / 5

VERSION

1.0
