Opendatabay APP

WallStreetBets Reddit Posts

Reddit & Forum Data

Tags and Keywords

Reddit

Stocks

Investing

Wallstreetbets

Trading



Free

About

This dataset captures posts from WallStreetBets (r/wallstreetbets, also known as WSB), a Reddit community where participants discuss stock and option trading. The subreddit is known for its often profane tone and has been associated with allegations of users manipulating securities, particularly during events such as the surge in GameStop shares. The data offers insight into the community's conversations, recent trends, and shared sentiment around various financial instruments. The posts were downloaded via the Python Reddit API Wrapper (PRAW) and were not filtered, so they may contain harsh language.

Columns

  • title: The title of the Reddit post. This column contains 52,049 unique values out of 53.2k valid entries, with "AMC" being a frequently occurring term.
  • score: The score of the Reddit post, indicating its popularity or upvote count. Scores range from 0 to 348,241. The mean score is approximately 1.38k, with a standard deviation of approximately 8k.
  • id: The unique identifier for each Reddit post. There are 53,187 unique IDs across 53.2k valid entries.
  • url: The URL linking directly to the Reddit post. This column has 53,172 unique URLs across 53.2k valid entries.
  • comms_num: The number of comments associated with each post. Values range from 0 to 93,268. The average number of comments is 263, with a standard deviation of 2.53k.
  • created: The creation timestamp of the post, provided in Unix epoch format (seconds). Timestamps range from 1601340416 to 1629095180.
  • body: The main message content or body of the Reddit post. Approximately 53% of entries in this column are missing, while 24.7k entries are valid. The most common entry refers to a daily trading discussion thread.
  • timestamp: A human-readable date and time representation of the post's creation. This column covers posts from 29 September 2020 to 16 August 2021.
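The column descriptions above can be turned into a small parsing routine. The following is a minimal sketch using only the Python standard library, assuming the CSV header uses exactly the eight column names listed above and that `score`, `comms_num`, and `created` parse as numbers. The function name `parse_posts` and the two sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from datetime import datetime, timezone

# Two invented rows in the documented column layout (not real dataset content).
SAMPLE_CSV = (
    "title,score,id,url,comms_num,created,body,timestamp\n"
    "Example title one,5,abc001,https://example.com/1,2,1601340416.0,,2020-09-29 00:46:56\n"
    "Example title two,12,abc002,https://example.com/2,7,1629095180.0,Some text,2021-08-16 06:26:20\n"
)

def parse_posts(csv_text):
    """Parse CSV text into a list of dicts with typed fields."""
    posts = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        posts.append({
            "title": row["title"],
            "id": row["id"],
            "url": row["url"],
            # Numeric columns may be stored as floats such as "5.0".
            "score": int(float(row["score"])),
            "comms_num": int(float(row["comms_num"])),
            # Interpret the Unix epoch value as a UTC datetime.
            "created": datetime.fromtimestamp(float(row["created"]), tz=timezone.utc),
            # About half of the posts have no body; map empty strings to None.
            "body": row["body"] or None,
            # Kept as text; its time zone is not stated in the listing.
            "timestamp": row["timestamp"],
        })
    return posts

if __name__ == "__main__":
    for post in parse_posts(SAMPLE_CSV):
        print(post["id"], post["created"].isoformat(), post["body"] is None)
```

To run this on the real file, the text of reddit_wsb.csv could be read with `open("reddit_wsb.csv", newline="", encoding="utf-8")` and passed to `parse_posts`; the UTF-8 encoding is an assumption.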

Distribution

The dataset is provided as a CSV file named reddit_wsb.csv. It has a total size of 43.73 MB and comprises 8 columns. Most columns contain 53.2k valid records, offering a substantial collection of posts.
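As a quick sanity check on a downloaded copy, the number of valid (non-empty) entries per column can be counted and compared with the figures in this listing. This is a sketch under the assumption that the header contains the eight column names from the Columns section; `valid_counts` and `EXPECTED_COLUMNS` are hypothetical names, and the in-memory rows are invented.

```python
import csv
import io

# Column names as described in this listing (assumed to match the CSV header).
EXPECTED_COLUMNS = ["title", "score", "id", "url", "comms_num", "created", "body", "timestamp"]

def valid_counts(csv_file):
    """Return {column: number of non-empty entries} for an open CSV file object."""
    reader = csv.DictReader(csv_file)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    counts = dict.fromkeys(EXPECTED_COLUMNS, 0)
    for row in reader:
        for name in EXPECTED_COLUMNS:
            # Short rows yield None for absent fields; treat those as empty.
            if (row[name] or "").strip():
                counts[name] += 1
    return counts

if __name__ == "__main__":
    # Invented rows; the first one has an empty body.
    demo = io.StringIO(
        "title,score,id,url,comms_num,created,body,timestamp\n"
        "A,1,x1,https://example.com/1,0,1601340416.0,,2020-09-29 00:46:56\n"
        "B,2,x2,https://example.com/2,3,1601340500.0,text,2020-09-29 00:48:20\n"
    )
    print(valid_counts(demo))
```

For the real file, pass the object returned by `open("reddit_wsb.csv", newline="", encoding="utf-8")` (encoding assumed). The `body` count should come out well below the others, since roughly half of its entries are missing.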

Usage

This dataset is ideal for:
  • Performing sentiment analysis on financial discussions to gauge market mood.
  • Identifying discussion topics and popular narratives within the WallStreetBets community.
  • Following trends and observing the appearance of specific keywords, such as GME, AMC, and NOK, that reflect market interest.
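The keyword-tracking use case can be sketched with a simple counter over post titles. This is one possible approach, not a method prescribed by the dataset: it counts how many titles mention each ticker as a whole, upper-case word (optionally prefixed with `$`). The function name `count_ticker_mentions`, the ticker list, and the example titles are all illustrative assumptions.

```python
import re
from collections import Counter

def count_ticker_mentions(titles, tickers):
    """Count the titles that mention each ticker as a standalone upper-case word."""
    # \b word boundaries prevent matches inside longer words;
    # matching is case-sensitive, so "nok" does not count as NOK.
    patterns = {t: re.compile(rf"\b{re.escape(t)}\b") for t in tickers}
    counts = Counter({t: 0 for t in tickers})
    for title in titles:
        for ticker, pattern in patterns.items():
            if pattern.search(title):
                counts[ticker] += 1
    return counts

if __name__ == "__main__":
    # Invented titles for demonstration only.
    titles = ["GME to the moon", "Holding $GME and NOK", "game over"]
    print(count_ticker_mentions(titles, ["GME", "AMC", "NOK"]))
```

Each title is counted at most once per ticker, so a post that repeats a symbol does not inflate the total. Short tickers that are also ordinary words would need extra filtering.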

Coverage

The dataset covers posts from the WallStreetBets subreddit made between 29 September 2020 and 16 August 2021. No geographic or demographic coverage is specified; the data represents discussions among participants worldwide who are interested in stock and option trading. The posts were not pre-filtered, so the dataset may contain strong language.

License

CC0: Public Domain

Who Can Use It

This dataset is suitable for:
  • Researchers studying online financial communities, social media's influence on markets, and investor behaviour.
  • Data analysts and financiers seeking to understand retail investor sentiment and emerging market trends.
  • Academics interested in natural language processing (NLP) applications for highly informal and domain-specific text.
  • Developers building applications for trend prediction or sentiment monitoring in online trading communities.

Dataset Name Suggestions

  • WallStreetBets Reddit Posts
  • WSB Community Discussions
  • Reddit Stock & Option Trading Data
  • GameStop Era WallStreetBets Posts

Attributes

Original Data Source: WallStreetBets Reddit Posts

Listing Stats

  • Views: 2
  • Downloads: 0
  • Listed: 08/07/2025
  • Region: Global
  • UDQS quality score: 5 / 5
  • Version: 1.0
