WallStreetBets Reddit Posts
Reddit & Forum Data
Free
About
This dataset captures discussions from WallStreetBets (r/wallstreetbets, also known as WSB), a Reddit community where participants discuss stock and options trading. The subreddit is known for its often profane tone and has been linked to allegations that users manipulated securities prices, most prominently during the surge in GameStop shares. The data offers a view of the community's conversations, trending topics, and shared sentiment about various financial instruments. It consists of Reddit posts downloaded with the Python Reddit API Wrapper (praw); the posts were not filtered, so they may contain harsh language.
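The listing does not publish the script used to download the posts. As a rough illustration only, the sketch below shows one plausible way PRAW submission attributes (`title`, `score`, `id`, `url`, `num_comments`, `created_utc`, `selftext`) could map onto the dataset's columns. The mapping itself, the helper name `submission_to_row`, and the placeholder credentials are assumptions, and the `timestamp` column is left out because its derivation is not documented. The demo uses a stand-in object, so it runs without network access.

```python
from types import SimpleNamespace

# Hypothetical mapping from PRAW Submission attributes to the dataset's
# columns; the listing does not publish the original collection script.
COLUMN_MAP = {
    "title": "title",
    "score": "score",
    "id": "id",
    "url": "url",
    "comms_num": "num_comments",
    "created": "created_utc",
    "body": "selftext",
}


def submission_to_row(submission):
    """Flatten one PRAW-style submission into a dict keyed by dataset column."""
    row = {column: getattr(submission, attr) for column, attr in COLUMN_MAP.items()}
    # Posts without self-text (e.g. link posts) have selftext == "";
    # record those as missing rather than as an empty string.
    row["body"] = row["body"] or None
    return row


# With real API credentials (placeholders below), submissions could come from PRAW:
#   import praw
#   reddit = praw.Reddit(client_id="...", client_secret="...", user_agent="...")
#   rows = [submission_to_row(s) for s in reddit.subreddit("wallstreetbets").new(limit=100)]

# Offline demo with a stand-in object; all field values are made up.
fake = SimpleNamespace(
    title="Hypothetical post", score=42, id="abc123",
    url="https://example.com/abc123", num_comments=7,
    created_utc=1601340416.0, selftext="",
)
print(submission_to_row(fake))
```

Keeping the conversion in a plain function separates the column mapping from the API call, so the mapping can be checked offline against stand-in objects before spending any API quota.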
Columns
- title: The title of the Reddit post. This column contains 52,049 unique values out of 53.2k valid entries, with "AMC" being a frequently occurring term.
- score: The score of the Reddit post, i.e. its net vote count (upvotes minus downvotes), a rough measure of popularity. Scores range from 0 to 348,241. The mean score is approximately 1.38k, with a standard deviation of 8k.
- id: The unique identifier for each Reddit post. There are 53,187 unique IDs across 53.2k valid entries.
- url: The URL linking directly to the Reddit post. This column has 53,172 unique URLs across 53.2k valid entries.
- comms_num: The number of comments on each post. Values range from 0 to 93,268. The average number of comments is 263, with a standard deviation of 2.53k.
- created: The creation time of the post, as a Unix epoch timestamp in seconds. Values range from 1,601,340,416 to 1,629,095,180.
- body: The text body of the Reddit post. Approximately 53% of entries in this column are missing; 24.7k entries are valid. The most common value is the text of a daily trading discussion thread.
- timestamp: A human-readable date and time representation of the post's creation. This column covers posts from 29 September 2020 to 16 August 2021.
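Because `created` is stored as epoch seconds and roughly half of `body` is empty, a little parsing helps before analysis. The sketch below uses only Python's standard library and assumes the file is UTF-8 encoded with a header row matching the column names above; the helper names (`load_posts`, `summarize`, `missing_fraction`) are illustrative, and the two sample rows are invented. The listing does not say which time zone the `timestamp` column uses, so the sketch derives its own UTC time from `created`. It reports the sample standard deviation; the listing does not state which variant its figures use.

```python
import csv
import io
import statistics
from datetime import datetime, timezone


def load_posts(fh):
    """Parse rows of reddit_wsb.csv from an open text file handle."""
    posts = []
    for row in csv.DictReader(fh):
        # Numeric columns may be serialised as floats ("1601340416.0"),
        # so go through float() before converting to int.
        row["score"] = int(float(row["score"]))
        row["comms_num"] = int(float(row["comms_num"]))
        # `created` holds Unix epoch seconds; attach an explicit UTC timezone.
        row["created_utc"] = datetime.fromtimestamp(float(row["created"]), tz=timezone.utc)
        # DictReader yields "" for empty fields; store a missing body as None.
        row["body"] = row["body"] or None
        posts.append(row)
    return posts


def summarize(values):
    """Min, max, mean and sample standard deviation (needs >= 2 values)."""
    values = list(values)
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
    }


def missing_fraction(posts, column="body"):
    """Share of posts whose given column is missing (0.0 for no posts)."""
    if not posts:
        return 0.0
    return sum(post[column] is None for post in posts) / len(posts)


# Two made-up rows in the documented column order (not real dataset content).
SAMPLE = """title,score,id,url,comms_num,created,body,timestamp
Hypothetical post A,10,aaa111,https://example.com/aaa111,2,1601340416.0,,
Hypothetical post B,0,bbb222,https://example.com/bbb222,0,1629095180.0,"Body text, with a comma",
"""

posts = load_posts(io.StringIO(SAMPLE))
print(summarize(p["score"] for p in posts))
print(missing_fraction(posts))
print(posts[0]["created_utc"].isoformat())
```

For the real file, the same functions can be fed an open handle, e.g. `with open("reddit_wsb.csv", newline="", encoding="utf-8") as fh: posts = load_posts(fh)`; `newline=""` lets the csv module handle line breaks inside quoted post bodies.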
Distribution
The dataset is provided as a CSV file named reddit_wsb.csv. It has a total size of 43.73 MB and comprises 8 columns. Most columns contain 53.2k valid records, offering a substantial collection of posts.
Usage
This dataset is ideal for:
- Performing sentiment analysis on financial discussions to gauge market mood.
- Identifying discussion topics and popular narratives within the WallStreetBets community.
- Tracking trends and the appearance of specific tickers, such as GME, AMC, and NOK, that reflect market interest.
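For keyword tracking, one simple approach is to count, per calendar month, how many post titles mention each ticker. The sketch below is a hypothetical helper (`monthly_ticker_counts`) that works on `(title, created)` pairs taken from the `title` and `created` columns. It assumes a case-sensitive whole-word match is an acceptable definition of a "mention" and bins months in UTC; the sample titles are made up.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timezone

TICKERS = ("GME", "AMC", "NOK")


def monthly_ticker_counts(posts, tickers=TICKERS):
    """Count, per UTC calendar month, the titles that mention each ticker.

    `posts` is an iterable of (title, created) pairs, where `created` is
    Unix epoch seconds as in the dataset's `created` column.
    """
    # Case-sensitive whole-word match: "$GME" counts, "GMED" does not.
    patterns = {t: re.compile(rf"\b{re.escape(t)}\b") for t in tickers}
    counts = defaultdict(Counter)
    for title, created in posts:
        month = datetime.fromtimestamp(float(created), tz=timezone.utc).strftime("%Y-%m")
        for ticker, pattern in patterns.items():
            if pattern.search(title):  # each title counts at most once per ticker
                counts[month][ticker] += 1
    return dict(counts)


# Made-up titles with timestamps inside the dataset's date range.
sample = [
    ("GME to the moon", 1611862661),
    ("Holding AMC and GME", 1611862661),
    ("No tickers here", 1611862661),
    ("$NOK calls", 1629095180),
]
for month, counter in sorted(monthly_ticker_counts(sample).items()):
    print(month, dict(counter))
# 2021-01 {'GME': 2, 'AMC': 1}
# 2021-08 {'NOK': 1}
```

Case-sensitive matching trades recall for precision: lowercase mentions such as "gme" are missed, but the counts are less likely to pick up unrelated words. Lowercasing titles and tickers before matching is the obvious alternative if recall matters more.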
Coverage
The dataset covers posts from the WallStreetBets subreddit between 29 September 2020 and 16 August 2021. No geographic or demographic coverage is specified; the posts come from participants in a global online forum devoted to stock and options trading. Because the posts were not pre-filtered, they may contain strong language.
License
CC0: Public Domain
Who Can Use It
This dataset is suitable for:
- Researchers studying online financial communities, social media's influence on markets, and investor behaviour.
- Data analysts and financiers seeking to understand retail investor sentiment and emerging market trends.
- Academics interested in natural language processing (NLP) applications for highly informal and domain-specific text.
- Developers building applications for trend prediction or sentiment monitoring in online trading communities.
Dataset Name Suggestions
- WallStreetBets Reddit Posts
- WSB Community Discussions
- Reddit Stock & Option Trading Data
- GameStop Era WallStreetBets Posts
Attributes
Original Data Source: WallStreetBets Reddit Posts