Cryptocurrency Discussion Sentiment Dataset
Data Science and Analytics
Free
About
This dataset captures public opinion on Bitcoin as expressed in comments posted to the /r/Bitcoin subreddit during June 2022 [1, 2]. It is designed to help users track current trends and developments within the cryptocurrency world [2]. Each record contains the body text of a comment together with its assigned sentiment, which makes the dataset a useful resource for studying how discussion of Bitcoin evolves [1, 2].
Columns
The dataset includes several key columns for each comment:
- type: Describes the type of post, stored as a String [1-3].
- subreddit.name: The name of the subreddit, which is "/r/Bitcoin" in this case, stored as a String [1-3].
- subreddit.nsfw: Indicates whether the subreddit is Not Safe For Work (NSFW), a Boolean value [1-4]. The sources indicate that almost all entries (170,032 out of 170,036) are marked as 'false' for NSFW [4].
- created_utc: The timestamp when the post was created, allowing for chronological analysis [1-8]. The values cited under Distribution (for example, 1655544958.04) are Unix epoch seconds.
- permalink: The permanent link to the original post or comment on Reddit, a String [1-3].
- score: The score of the post, an Integer value, typically reflecting upvotes minus downvotes [1, 2].
- body: The main text content of the comment, stored as a String [1-3]. Notably, about 7% of comments are "[removed]" and 3% are "[deleted]" [8].
- sentiment: The sentiment assigned to the post. The column is typed as a String, but its values appear to be numeric scores ranging from -1.00 (most negative) to 1.00 (most positive), and the sources report label counts across many score ranges [1, 3, 8-10]. A significant portion of comments, 32,903, fall into the -0.04 to 0.00 sentiment range [9].
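The column descriptions above suggest a small preparation step before analysis. The following is a minimal sketch in Python with pandas, not the publisher's own pipeline. It assumes that created_utc holds Unix epoch seconds and that sentiment holds numeric text. The helper name clean_comments and the sample rows are made up for illustration.

```python
import pandas as pd

# Placeholder bodies the dataset description reports for moderated or withdrawn comments.
PLACEHOLDER_BODIES = {"[removed]", "[deleted]"}


def clean_comments(df: pd.DataFrame) -> pd.DataFrame:
    """Return a typed copy of the comments without placeholder bodies."""
    out = df.copy()
    # Assumption: created_utc holds Unix epoch seconds.
    out["created_utc"] = pd.to_datetime(out["created_utc"], unit="s", utc=True)
    # Assumption: sentiment is numeric text; unparseable values become NaN.
    out["sentiment"] = pd.to_numeric(out["sentiment"], errors="coerce")
    keep = ~out["body"].isin(PLACEHOLDER_BODIES)
    return out[keep].reset_index(drop=True)


# Made-up rows for illustration only; they are not taken from the dataset.
sample = pd.DataFrame(
    {
        "created_utc": [1655544958.04, 1655596797.80, 1655600000.0],
        "score": [3, 1, -2],
        "body": ["Holding for the long term.", "[removed]", "Fees are too high today."],
        "sentiment": ["0.42", "0.00", "-0.31"],
    }
)

cleaned = clean_comments(sample)
print(cleaned[["created_utc", "body", "sentiment"]])
```

Dropping the placeholder rows is a design choice rather than a requirement: their body text carries no linguistic signal, but their timestamps may still be useful when counting discussion volume.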
Distribution
This dataset focuses on comments from the /r/Bitcoin subreddit from June 2022 [1, 2]. It contains approximately 170,035 unique comment entries [4]. The timestamps for created_utc are distributed across June 2022, with varying numbers of comments per time interval, for example, 12,392 comments were recorded between 1655544958.04 and 1655596797.80 [6]. The sentiment analysis is detailed across numerous bins, showing a wide spread of positive, negative, and neutral sentiments [8-10].
Usage
This dataset is ideal for data science and analytics [2]. Potential uses include:
- Tracking cryptocurrency trends: Staying up-to-date with the latest developments in Bitcoin [2].
- Sentiment analysis: Analysing public opinion and sentiment towards Bitcoin over time [1].
- Natural Language Processing (NLP) research: Utilising the comment body text for linguistic analysis [2].
- Market research: Understanding community discussions and concerns related to Bitcoin.
- Time-series analysis: Observing how sentiment and discussion volume change over the month of June 2022.
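As one way to approach the time-series use case listed above, the sketch below groups comments by UTC day and reports the comment count and mean sentiment for each day. It is an illustrative example under stated assumptions: created_utc is taken to be Unix epoch seconds, sentiment is taken to be numeric text, and the function name daily_summary and the sample rows are hypothetical.

```python
import pandas as pd


def daily_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Comment count and mean sentiment per UTC day, from raw columns."""
    # Assumption: created_utc holds Unix epoch seconds.
    timestamps = pd.to_datetime(df["created_utc"], unit="s", utc=True)
    # Assumption: sentiment is numeric text; unparseable values become NaN.
    sentiment = pd.to_numeric(df["sentiment"], errors="coerce")
    frame = pd.DataFrame({"day": timestamps.dt.floor("D"), "sentiment": sentiment})
    return frame.groupby("day").agg(
        comments=("sentiment", "size"),        # counts rows, including NaN sentiment
        mean_sentiment=("sentiment", "mean"),  # NaN values are skipped by mean
    )


# Made-up rows for illustration only; they are not taken from the dataset.
sample = pd.DataFrame(
    {
        "created_utc": [1655544958, 1655596797, 1655600000],
        "sentiment": ["0.5", "-0.1", "0.2"],
    }
)

summary = daily_summary(sample)
print(summary)
```

The same grouping works at finer resolution by changing the floor frequency, for example to hourly buckets.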
Coverage
The dataset covers content from the Reddit /r/Bitcoin subreddit [1, 2].
- Time Range: Specifically the month of June 2022 [1, 2].
- Geographic Scope: While Reddit is global, the specific geographic origin of users is not detailed in the dataset columns. However, it can be considered a global snapshot of online discussion [11].
- Demographic Scope: Reflects the opinions and discussions of Reddit users who actively participate in the /r/Bitcoin subreddit.
License
CC0
Who Can Use It
- Data Scientists and Analysts: For conducting sentiment analysis, trend tracking, and NLP projects [2].
- Researchers: Studying online communities, cryptocurrency market dynamics, and public discourse.
- Cryptocurrency Enthusiasts and Investors: To gain insights into community perception and market sentiment.
- Developers: To train and test NLP models related to financial or cryptocurrency text.
Dataset Name Suggestions
- Bitcoin Subreddit Comments: June 2022 Sentiment Analysis
- Reddit r/Bitcoin Public Opinion Data (June 2022)
- Cryptocurrency Discussion Sentiment Dataset
Attributes
Original Data Source: Viral Fads and Cryptocurrency