Opendatabay APP

Reddit vs Twitter Discourse Data

Social Media and Posts

Tags and Keywords: Misinformation, Twitter, Reddit, NLP, Tweets


Free

About

This dataset is a weekly refreshed collection of tweets from the account @reddit_lies, gathered using a Tweepy notebook. It forms a chronological archive of posts that highlight or critique Reddit content on the Twitter platform. It is particularly suited to natural language processing (NLP), sentiment analysis, and studying how information and misinformation propagate between Reddit and Twitter.

Columns

  • date: The timestamp indicating when the tweet was posted (DateTime format).
  • text: The full content of the tweet, which typically includes commentary and links to Reddit threads.
  • hashtags: Any hashtags included in the tweet (Note: This column has a high percentage of missing values, approximately 98%).
  • source: The specific client or device used to post the tweet (e.g., Twitter for iPhone, Twitter Web App).
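As a minimal sketch of how these four columns might be read, the snippet below parses CSV rows with the Python standard library. The sample rows are invented for illustration, and the timestamp layout (ISO-style with a UTC offset) and the way hashtags are serialised are assumptions; the listing only states that `date` is a DateTime.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample rows; the real file's values and date format may differ.
SAMPLE_CSV = """\
date,text,hashtags,source
2022-05-10 14:03:22+00:00,"Example commentary, with a comma",,Twitter Web App
2022-05-11 09:15:00+00:00,Another example tweet,['example'],Twitter for iPhone
"""

def load_tweets(handle):
    """Read tweets from an open CSV handle into a list of dicts."""
    rows = []
    for raw in csv.DictReader(handle):
        rows.append({
            # Assumes ISO-style timestamps; adjust if the file uses another layout.
            "date": datetime.fromisoformat(raw["date"]),
            "text": raw["text"],
            # Most rows have no hashtags, so map empty cells to None.
            "hashtags": raw["hashtags"] or None,
            "source": raw["source"],
        })
    return rows

tweets = load_tweets(io.StringIO(SAMPLE_CSV))
print(len(tweets), tweets[0]["source"])
```

For the downloaded file, the same function could be given a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved.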

Distribution

The dataset is provided in CSV format with a file size of approximately 1.05 MB. It contains 7,341 valid records (rows), each representing an individual tweet.

Usage

  • Misinformation Analysis: Tracking patterns in how misinformation is identified and flagged across platforms.
  • Natural Language Processing (NLP): Training models on short-form social media text and external link structures.
  • Social Media Behaviour: Analysing the frequency and timing of posts to understand user engagement cycles.
  • Sentiment Analysis: Evaluating the tone of commentary regarding Reddit threads.
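To illustrate the posting frequency and timing use case, here is a small sketch that counts tweets per weekday and per hour of day. The timestamps are made up and stand in for parsed values of the `date` column; `posting_profile` is a hypothetical helper name, not part of the dataset or of any library.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical timestamps standing in for the parsed `date` column.
dates = [
    datetime(2022, 5, 10, 14, 3),   # a Tuesday
    datetime(2022, 5, 10, 18, 30),  # the same Tuesday
    datetime(2022, 5, 14, 14, 45),  # a Saturday
]

def posting_profile(timestamps):
    """Return (tweets per weekday name, tweets per hour of day)."""
    by_weekday = Counter(WEEKDAYS[ts.weekday()] for ts in timestamps)
    by_hour = Counter(ts.hour for ts in timestamps)
    return by_weekday, by_hour

by_weekday, by_hour = posting_profile(dates)
print(by_weekday.most_common(1), by_hour.most_common(1))
```

Hours are taken as stored; if the timestamps carry a time zone, converting them to a single zone first would make the hourly profile easier to interpret.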

Coverage

  • Time Range: The data covers the period from 10 May 2022 to 25 Jan 2023.
  • Geographic Scope: Global (Internet/Social Media).
  • Data Availability: The dataset represents a continuous collection with no missing values in the 'date', 'text', or 'source' columns. However, the 'hashtags' column is largely empty (approximately 98% missing).
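The coverage claims above can be checked after loading the file. The sketch below computes the share of missing values per column and the earliest and latest timestamps; the rows are invented placeholders, and `missing_share` and `date_range` are hypothetical helper names.

```python
from datetime import datetime

# Hypothetical rows; in practice these would come from the downloaded CSV.
rows = [
    {"date": datetime(2022, 5, 10, 14, 3), "text": "first",
     "hashtags": None, "source": "Twitter Web App"},
    {"date": datetime(2022, 9, 1, 8, 0), "text": "second",
     "hashtags": "['example']", "source": "Twitter for iPhone"},
    {"date": datetime(2022, 11, 20, 12, 0), "text": "third",
     "hashtags": None, "source": "Twitter Web App"},
    {"date": datetime(2023, 1, 25, 20, 15), "text": "fourth",
     "hashtags": None, "source": "Twitter Web App"},
]

def missing_share(rows, column):
    """Fraction of rows where `column` is None or an empty string."""
    missing = sum(1 for row in rows if row.get(column) in (None, ""))
    return missing / len(rows)

def date_range(rows):
    """Earliest and latest timestamp in the `date` column."""
    dates = [row["date"] for row in rows]
    return min(dates), max(dates)

for column in ("date", "text", "hashtags", "source"):
    print(column, missing_share(rows, column))
print(date_range(rows))
```

On the real data, a result near 0.98 for `hashtags` and 0.0 for the other three columns would be consistent with the description in this listing.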

License

CC0: Public Domain

Who Can Use It

  • Data Scientists: For NLP projects and text classification tasks.
  • Academic Researchers: Studying social media ecosystems and cross-platform interactions.
  • Journalists: Investigating the discourse surrounding Reddit content on other social networks.
  • Social Media Analysts: Monitoring engagement metrics and content virality.

Dataset Name Suggestions

  • Reddit Lies Tweet Archive
  • Cross-Platform Misinformation Tweets
  • Reddit vs Twitter Discourse Data
  • Social Media Content Tracking Log

Attributes

Original Data Source: Reddit vs Twitter Discourse Data

Listing Stats

  • Views: 2
  • Downloads: 1
  • Listed: 07/12/2025
  • Region: Global
  • UDQS Quality (Universal Data Quality Score): 5 / 5
  • Version: 1.0
