WallStreetBets Market Sentiment Data
Stock & Market Data
Free
About
This dataset contains posts and comments collected from the r/wallstreetbets subreddit, primarily focusing on content from 2022. It provides a valuable resource for understanding trends initiated by the Reddit community, performing sentiment analysis on user-generated content, and extracting key topics discussed within the subreddit.
Columns
- title: The title of a post.
- score: The score (upvotes minus downvotes) of a post or comment, with values ranging from -152 to 105k.
- id: A unique identifier for each post or comment, with approximately 1.1 million unique values.
- url: The URL associated with a post; approximately 91% of values are null, with around 95.2k unique URLs.
- comms_num: The number of comments a post has received, ranging from 0 to approximately 39.9k.
- created: A timestamp indicating when the post or comment was created, represented as a Unix timestamp.
- body: The main body content of a post or comment; approximately 4% of values are null or contain image emotes.
- timestamp: A datetime representation of when the post or comment was created, ranging from 27th January 2021 to 28th March 2025.
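The column descriptions above can be turned into a small typed loader. The following is a minimal sketch using only the Python standard library. It assumes that null values appear as empty strings in the CSV, that `created` is stored as a numeric Unix timestamp in seconds, and that `score` and `comms_num` parse as numbers; none of these details are confirmed by the listing. The two sample rows are invented for illustration and are not records from the dataset.

```python
import csv
import io
from datetime import datetime, timezone

# Two invented rows in the documented column order (NOT real dataset records).
SAMPLE_CSV = """title,score,id,url,comms_num,created,body,timestamp
Example post,12,abc123,,3,1643673600.0,Some text,2022-02-01 00:00:00
,-2,def456,,0,1643760000.0,,2022-02-02 00:00:00
"""


def parse_row(row):
    """Convert one CSV row (a dict of strings) into typed Python values.

    Assumption: empty strings stand for nulls; `created` is seconds since epoch.
    """
    return {
        "title": row["title"] or None,
        "score": int(float(row["score"])),
        "id": row["id"],
        "url": row["url"] or None,
        "comms_num": int(float(row["comms_num"])),
        "created": datetime.fromtimestamp(float(row["created"]), tz=timezone.utc),
        "body": row["body"] or None,
        # Kept as a string because the listing does not state its exact format.
        "timestamp": row["timestamp"],
    }


def load_rows(handle):
    """Read every row from an open text handle containing the CSV."""
    return [parse_row(r) for r in csv.DictReader(handle)]


rows = load_rows(io.StringIO(SAMPLE_CSV))
```

To read the real file, the same function could be called with `open("wallstreetbets_2022.csv", newline="", encoding="utf-8")` in place of the in-memory sample, assuming the file is UTF-8 encoded.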
Distribution
The dataset is provided as a CSV file named wallstreetbets_2022.csv, with a size of 221.47 MB. It consists of 8 columns and approximately 1.1 million records. The data is collected and merged daily.
Usage
This dataset is ideal for various applications, including:
- Understanding Market Trends: Analyse the collective sentiment and discussions to identify emerging trends among the "Reddit educated crowd".
- Sentiment Analysis: Perform sentiment analysis on posts and comments to gauge public mood towards specific stocks or market events.
- Topic Modelling: Extract significant topics from the extensive text data to uncover key areas of interest and discussion.
- Social Network Analysis: Investigate interactions and influence within the WallStreetBets community.
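As a rough illustration of the sentiment analysis use case, the sketch below scores a piece of text with a tiny word list. The `POSITIVE` and `NEGATIVE` sets and the `lexicon_sentiment` function are hypothetical and chosen only for demonstration; they are not part of the dataset, and a real analysis would more likely rely on a trained sentiment model or an established lexicon.

```python
import re

# Hypothetical toy lexicons, chosen only for illustration.
POSITIVE = {"moon", "bullish", "calls", "gain", "gains"}
NEGATIVE = {"crash", "bearish", "puts", "loss", "losses"}


def lexicon_sentiment(text):
    """Return (positive - negative) / matched words, in the range [-1.0, 1.0].

    Returns 0.0 when the text is None, empty, or contains no lexicon words.
    """
    tokens = re.findall(r"[a-z]+", (text or "").lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return (pos - neg) / total if total else 0.0
```

The function accepts `None` so that it can be applied directly to the `body` column, where some values are null.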
Coverage
This dataset primarily covers content from the WallStreetBets subreddit from 2022. However, the timestamps within the data indicate a broader collection period, ranging from 27th January 2021 to 28th March 2025. The data reflects contributions from the Reddit community within this specific subreddit. It is updated daily.
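Because the stated focus (2022) and the stated timestamp range (2021 to 2025) differ, it may be worth checking how records are distributed across years before analysis. The sketch below counts records per calendar year from Unix timestamps such as those in the `created` column. It assumes the timestamps are in seconds and interprets them as UTC; the sample values are invented for illustration.

```python
from collections import Counter
from datetime import datetime, timezone


def records_per_year(unix_timestamps):
    """Count how many timestamps fall in each calendar year (UTC)."""
    years = (
        datetime.fromtimestamp(ts, tz=timezone.utc).year for ts in unix_timestamps
    )
    return Counter(years)


def share_in_year(unix_timestamps, year):
    """Return the fraction of timestamps that fall in the given year."""
    counts = records_per_year(unix_timestamps)
    total = sum(counts.values())
    return counts[year] / total if total else 0.0


# Invented example timestamps (NOT taken from the dataset).
sample = [1611705600, 1643673600, 1672531199, 1743120000]
counts = records_per_year(sample)
share_2022 = share_in_year(sample, 2022)
```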
License
CC0: Public Domain
Who Can Use It
This dataset is suitable for:
- Financial Analysts and Researchers: To study social media influence on market behaviour and investment trends.
- Data Scientists and NLP Practitioners: To develop and test sentiment analysis models, topic modelling algorithms, and text classification systems.
- Academics: To research online communities, collective intelligence, and financial phenomena driven by social media.
- Market Strategists: To gain insights into retail investor sentiment and potential market movements.
Attributes
Original Data Source: WallStreetBets Market Sentiment Data