Ask Reddit Community Conversations
Social Media and Networking
Free
About
This dataset contains posts and comments from r/AskReddit, one of Reddit's largest online communities. It is a collection of English-language questions and answers on a wide range of topics. The data is unfiltered, which makes it useful for sentiment analysis and for identifying discussion themes in social media content.
Columns
- title: The title of the post (relevant for posts).
- score: The score of the post (on Reddit, upvotes minus downvotes), an indicator of its popularity; the comment count is stored separately in comms_num (relevant for posts).
- id: A unique identifier for each post or comment.
- url: The URL of the post thread (relevant for posts).
- comms_num: The total number of comments associated with a particular post (relevant for posts).
- created: The date on which the post or comment was created.
- body: The main text content of the post or comment.
- timestamp: A numerical timestamp indicating the time of creation.
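As a rough illustration of how the columns above might be read, here is a minimal sketch using only the Python standard library. The sample rows are made up for the example and are not taken from the dataset. The date format of created, the helper names load_rows and is_post, and the rule that comments have an empty title are all assumptions to check against the real file.

```python
import csv
import io

# Made-up sample rows for illustration only; they are NOT taken from the dataset.
# The ISO date format in `created` is an assumption.
SAMPLE_CSV = """title,score,id,url,comms_num,created,body,timestamp
What is your favourite book?,12,abc123,https://example.com/post/abc123,2,2021-09-01,,1630454400
,3,def456,,0,2021-09-01,Probably Dune.,1630458000
"""


def load_rows(handle):
    """Read CSV rows into dicts, converting the numeric columns to int."""
    rows = []
    for raw in csv.DictReader(handle):
        row = dict(raw)
        for col in ("score", "comms_num", "timestamp"):
            # Go through float first in case the file stores values like "12.0".
            row[col] = int(float(row[col])) if row[col] else None
        rows.append(row)
    return rows


def is_post(row):
    # Assumption: comments carry no title. Some exports may use a placeholder
    # string instead of an empty value, so verify this against the real file.
    return bool(row["title"].strip())


rows = load_rows(io.StringIO(SAMPLE_CSV))
posts = [r for r in rows if is_post(r)]
comments = [r for r in rows if not is_post(r)]
print(len(posts), "post(s),", len(comments), "comment(s)")
```

To use this on the actual download, pass an open file handle to load_rows in place of the in-memory sample.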
Distribution
The dataset is typically provided in CSV format and contains both posts and comments from the r/AskReddit subreddit. The overall record count is not stated, but the data includes over 44,900 unique entries for 'score', and timestamps range from approximately 1.63 billion to 1.64 billion, which is consistent with Unix epoch seconds. Data is collected daily.
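To make the timestamp range easier to read, the sketch below converts the two approximate bounds to calendar dates. It assumes the timestamp column holds Unix epoch seconds in UTC, which the listing does not state explicitly; epoch_to_utc is a hypothetical helper name.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds):
    """Interpret a numeric timestamp as Unix epoch seconds (assumption) in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Approximate bounds quoted in the listing.
low = epoch_to_utc(1_630_000_000)
high = epoch_to_utc(1_640_000_000)

print(low.date().isoformat(), high.date().isoformat())
# prints: 2021-08-26 2021-12-20
```

Under that assumption, the rounded bounds fall in late August and late December 2021, which is broadly in line with the date range given under Coverage.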
Usage
This dataset is ideally suited for various applications, including:
- Performing sentiment analysis on user-generated content.
- Identifying trending discussion topics and popular themes.
- Developing and testing Natural Language Processing (NLP) models.
- Analysing patterns in online communities and social networks.
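As one possible starting point for the topic-identification use case, the following sketch counts frequent words across a list of post titles. It is a simple word-frequency baseline, not a full topic model. The titles are invented for the example, and the stopword list and the top_terms helper are assumptions.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, assumed for illustration.
STOPWORDS = {
    "what", "which", "your", "the", "you", "and", "for",
    "that", "this", "with", "are", "was", "have",
}


def top_terms(texts, n=3):
    """Return the n most common non-stopword terms across the given texts."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            # Skip stopwords and very short tokens such as "is" or "a".
            if word not in STOPWORDS and len(word) > 2:
                counts[word] += 1
    return counts.most_common(n)


# Invented titles, not taken from the dataset.
titles = [
    "What movie changed your life?",
    "Which movie is overrated?",
    "What is the best advice you ever got?",
]

print(top_terms(titles, n=1))
# prints: [('movie', 2)]
```

On real data, the same function could be applied to the title column for posts or the body column for comments, ideally with a fuller stopword list.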
Coverage
The dataset is global in regional coverage, and data is collected daily. All content is in English. Sample data includes dates from at least September 2021 to January 2022. The dataset captures contributions from a wide variety of users in the r/AskReddit community, with no demographic filtering mentioned.
License
CC0
Who Can Use It
This dataset is beneficial for:
- Data scientists and researchers focused on text analysis and NLP.
- Social media analysts aiming to understand community dynamics and public sentiment.
- AI and Machine Learning developers creating models for content classification or topic extraction.
- Academics studying online communication patterns and user behaviour.
Dataset Name Suggestions
- Ask Reddit Community Conversations
- Reddit Public Discourse Dataset
- AskReddit NLP Data Collection
- Global Forum Q&A Archive
Attributes
Original Data Source: Ask Reddit