Ask Reddit Community Conversations

Social Media and Networking

Tags and Keywords

Text, Online, Social, NLP, Reddit, AskReddit

Free

About

This dataset contains posts and comments from r/AskReddit, one of Reddit's largest communities. It is a large collection of English-language questions and answers on a wide range of topics. The data is unfiltered, which makes it useful for sentiment analysis and for identifying discussion themes in social media.

Columns

title: The title of the post (relevant for posts).
score: The score of the post, an indicator of its popularity and impact (relevant for posts).
id: A unique identifier for each post or comment.
url: The URL of the post thread (relevant for posts).
comms_num: The total number of comments associated with a particular post (relevant for posts).
created: The date on which the post or comment was created.
body: The main text content of the post or comment.
timestamp: A numerical timestamp indicating the time of creation.
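
As a rough illustration of how these columns might be used, the sketch below reads rows with Python's standard csv module and separates posts from comments. The sample rows are invented, and the rule that comment rows leave title empty is an assumption based on the "relevant for posts" notes above, not something the listing confirms.

```python
import csv
import io

# Header follows the column list above; the two data rows are invented
# purely for illustration and are not taken from the dataset.
SAMPLE_CSV = """title,score,id,url,comms_num,created,body,timestamp
What is your favourite book?,12,abc123,https://example.com/abc123,2,2021-09-30,,1633000000
,3,def456,,,2021-09-30,Probably Dune.,1633000500
"""

def load_rows(handle):
    """Read every row into a dict keyed by column name."""
    return list(csv.DictReader(handle))

def split_posts_and_comments(rows):
    """Split rows into posts and comments.

    Assumption: comment rows have an empty title, post rows do not.
    """
    posts = [r for r in rows if (r.get("title") or "").strip()]
    comments = [r for r in rows if not (r.get("title") or "").strip()]
    return posts, comments

rows = load_rows(io.StringIO(SAMPLE_CSV))
posts, comments = split_posts_and_comments(rows)
print(len(posts), "post(s),", len(comments), "comment(s)")
```

With the downloaded file, the same functions could be called on an open file handle instead of the in-memory sample.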

Distribution

The dataset is typically provided in CSV format and contains both posts and comments from the r/AskReddit subreddit. The overall record count is not stated, but the 'score' column has over 44,900 unique entries, and timestamps range from approximately 1.63 billion to 1.64 billion. Read as Unix epoch seconds, that range falls roughly in late 2021 and points to a significant volume of data collected over time. Data is collected daily.
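
The timestamp range quoted above can be turned into calendar dates with a short conversion. This is a minimal sketch that assumes the values are seconds since the Unix epoch in UTC; the listing does not state the unit, and the helper name to_utc is hypothetical.

```python
from datetime import datetime, timezone

def to_utc(ts):
    """Convert a numeric timestamp to an aware UTC datetime.

    Assumption: the value is seconds since the Unix epoch.
    """
    return datetime.fromtimestamp(float(ts), tz=timezone.utc)

# The approximate bounds mentioned in the Distribution section.
low = to_utc(1.63e9)
high = to_utc(1.64e9)
print(low.date().isoformat(), "to", high.date().isoformat())
```

If the converted dates do not line up with the created column, the unit assumption is probably wrong for that file.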

Usage

This dataset is ideally suited for various applications, including:

Performing sentiment analysis on user-generated content.
Identifying trending discussion topics and popular themes.
Developing and testing Natural Language Processing (NLP) models.
Analysing patterns in online communities and social networks.
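
For the theme-identification use case, one simple starting point is plain word-frequency counting over the body text. The sketch below uses only the standard library; the sample sentences and the tiny stopword list are invented for illustration, and this is a basic counting approach rather than a full topic model.

```python
import re
from collections import Counter

# A deliberately small, illustrative stopword list; a real analysis
# would use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "what",
             "your", "you", "i", "it", "in", "my"}

def top_terms(texts, n=3):
    """Return the n most frequent non-stopword terms across texts."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            # Skip stopwords and very short tokens.
            if word not in STOPWORDS and len(word) > 2:
                counts[word] += 1
    return counts.most_common(n)

# Invented stand-ins for values of the body column.
sample_bodies = [
    "My favourite book is Dune.",
    "Dune is a great book, and the film is good too.",
    "I prefer the film to the book.",
]
print(top_terms(sample_bodies, n=3))
```

The same function could be applied to the title column to see which question themes recur most often.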

Coverage

Coverage is global, and data is collected daily. All content is in English. Sample data includes content dated from at least September 2021 to January 2022. The dataset captures contributions from a wide variety of users in the r/AskReddit community; no demographic filtering is mentioned.

License

CC0

Who Can Use It

This dataset is beneficial for:

Data scientists and researchers focused on text analysis and NLP.
Social media analysts aiming to understand community dynamics and public sentiment.
AI and Machine Learning developers creating models for content classification or topic extraction.
Academics studying online communication patterns and user behaviour.

Dataset Name Suggestions

Ask Reddit Community Conversations
Reddit Public Discourse Dataset
AskReddit NLP Data Collection
Global Forum Q&A Archive

Attributes

Original Data Source: Ask Reddit

Listing Stats

Listed: 17/06/2025
Region: Global
Quality: 5 / 5
Version: 1.0
