Bitcoin Global News Archive
Finance & Banking Analytics
Free
About
The dataset consists of news articles scraped from various websites. A notable trend reported in the articles is the growing number of long-term Bitcoin holders, with a significant portion of Bitcoin remaining unmoved for at least a year. This reduces the supply available for trading or sale, which contributes to the asset's scarcity. The articles also cover topics such as Bitcoin mining operations and the launch of Bitcoin NFT marketplaces.
Columns
The dataset is structured with the following columns:
- article_id: A unique identifier for each news article.
- title: The headline or title of the news article.
- author: The individual or entity credited as the author of the article.
- published_date: The date and time when the article was published.
- link: The direct URL to the original news article.
- clean_url: A simplified or cleaned version of the article's URL.
- excerpt: A brief, introductory summary of the article's content.
- summary: The text of the article, typically truncated to a maximum of 250 words.
- rights: The owner or rights holder of the article content.
- article_rank: A numerical rank indicating the article's engagement or relevance.
- topic: The subject or category of the article (e.g., finance).
- country: The country associated with the article or its publisher (e.g., US).
- language: The language in which the article is written (e.g., en for English).
- authors: A list of authors for the article, potentially including multiple names.
- media: A link to associated media, such as an image.
- twitter_account: The Twitter handle related to the article's author or source.
- article_score: A numerical score indicating the article's quality or relevance.
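As a minimal sketch of how rows with these columns might be read, the example below parses a small in-memory CSV using only the Python standard library. The sample rows, the subset of columns shown, the `load_articles` helper, and the date format `%Y-%m-%d %H:%M:%S` are all assumptions made for illustration; the actual files may use a different timestamp layout.

```python
import csv
import io
from datetime import datetime

# Hypothetical two-row sample using a subset of the columns listed above.
# The values are invented for illustration and are not taken from the dataset.
SAMPLE_CSV = """article_id,title,author,published_date,clean_url,article_rank,language
a1,Example headline one,Jane Doe,2022-03-15 08:30:00,example.com,120,en
a2,Example headline two,,2022-04-02 17:05:00,example.org,98000,en
"""


def load_articles(handle, date_format="%Y-%m-%d %H:%M:%S"):
    """Read article rows, converting dates and ranks into typed values."""
    articles = []
    for row in csv.DictReader(handle):
        # Assumed timestamp layout; adjust date_format to match the real files.
        row["published_date"] = datetime.strptime(row["published_date"], date_format)
        # Ranks may be written as "31" or "31.00", so go through float first.
        row["article_rank"] = int(float(row["article_rank"]))
        # Treat an empty author field as missing.
        row["author"] = row["author"] or None
        articles.append(row)
    return articles


articles = load_articles(io.StringIO(SAMPLE_CSV))
print(len(articles), articles[0]["published_date"].date(), articles[1]["author"])
```

To read a real file, the same function could be given an open file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points at one of the CSV files.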
Distribution
The dataset is provided as CSV files and contains 2,402 unique article IDs. The articles span 10 March 2022 to 11 September 2022, with article counts varying across date intervals: for instance, 471 articles fall between 3 March 2022 and 28 March 2022, and 503 between 28 March 2022 and 15 April 2022. There are 2,324 unique values for authors; PRNewswire accounts for 11% of entries and 9% are null. By cleaned URL, forextv.com contributes 7% of articles, stl.news 5%, and bitrss.com 4%. Article ranks range from 31.00 to 993,405.00, with the majority (1,777 articles) falling in the 31.00 to 99,368.40 range.
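The rank interval quoted above is consistent with splitting the full rank range into ten equal-width bins, although the source does not state how the intervals were produced. The sketch below works through that arithmetic and shows one way to compute value shares such as the per-source percentages. The bin count of ten, the helper names `bin_index` and `value_shares`, and the sample URLs are assumptions for illustration.

```python
from collections import Counter

RANK_MIN, RANK_MAX = 31.0, 993_405.0  # rank range stated in the text
NUM_BINS = 10  # assumption: inferred from the 31.00 to 99,368.40 interval

bin_width = (RANK_MAX - RANK_MIN) / NUM_BINS
first_bin_upper = RANK_MIN + bin_width


def bin_index(rank):
    """Return the 0-based equal-width bin for a rank; the maximum falls in the last bin."""
    return min(int((rank - RANK_MIN) // bin_width), NUM_BINS - 1)


def value_shares(values):
    """Return each distinct value's share of all entries, as a fraction of 1."""
    counts = Counter(values)
    return {value: count / len(values) for value, count in counts.items()}


print(f"first bin upper edge: {first_bin_upper:,.2f}")

# Hypothetical clean_url values, for illustration only.
sample_urls = ["example.com", "example.org", "example.com", "example.net"]
print(value_shares(sample_urls))
```

With these inputs the first bin's upper edge comes out at 99,368.40, matching the interval in the text, which supports (but does not prove) the ten-bin assumption.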
Usage
This dataset is ideally suited for:
- Text mining and text analytics: Extracting patterns, insights, and structured information from news content.
- Sentiment analysis: Determining the sentiment (positive, negative, neutral) expressed in Bitcoin-related news.
- Topic modelling: Identifying and tracking key themes and topics discussed in Bitcoin news over time.
- Word embeddings: Generating numerical representations of words for natural language processing tasks.
- Market trend analysis: Understanding the impact of news on Bitcoin's perceived scarcity due to long-term holding patterns.
- Research on cryptocurrency adoption and behaviour: Analysing how news coverage reflects or influences investor behaviour.
- Tracking industry-specific news: Monitoring developments related to Bitcoin mining operations and NFT marketplaces.
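To make the sentiment analysis use case concrete, here is a deliberately simple lexicon-based sketch that labels a piece of text as positive, negative, or neutral. The word lists, the `tokenize` and `sentiment_label` helpers, and the example headlines are hypothetical and chosen only for illustration; serious work would more likely rely on an established sentiment lexicon or a trained model applied to the title, excerpt, or summary columns.

```python
import re

# Tiny hypothetical lexicons, for illustration only.
POSITIVE = {"gain", "gains", "surge", "rally", "record", "growth"}
NEGATIVE = {"loss", "losses", "crash", "drop", "ban", "fraud"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_label(text):
    """Label text by comparing counts of positive and negative lexicon words."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Invented headlines, not taken from the dataset.
for headline in [
    "Bitcoin miners report record growth",
    "Exchange hit by fraud after price crash",
    "Bitcoin NFT marketplace launches",
]:
    print(sentiment_label(headline), "|", headline)
```

A word-count approach like this ignores negation and context, so it is best treated as a baseline against which better models can be compared.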
Coverage
The dataset covers the period from at least March 2022 to September 2022. Geographically, the articles include US-relevant content, and the dataset as a whole is global in scope. All articles in the dataset are in English.
License
CC0
Who Can Use It
This dataset is beneficial for a wide range of users, including:
- Data analysts and scientists: For conducting in-depth text analysis, pattern recognition, and predictive modelling based on news sentiment.
- Financial researchers: To study the dynamics between news cycles and cryptocurrency market behaviour, particularly regarding Bitcoin's value and investor holding trends.
- Developers and AI/ML engineers: For training and validating natural language processing (NLP) models, such as those for sentiment analysis, topic extraction, or news summarisation in the financial domain.
- Business intelligence professionals: To gain insights into public perception, emerging trends, and key events influencing the Bitcoin ecosystem.
- Academics and students: For educational purposes, research projects, and case studies on digital currencies and blockchain technology.
Dataset Name Suggestions
- Bitcoin News Articles Corpus 2022
- Cryptocurrency News Text Data
- Bitcoin Market & Mining News
- Digital Currency News Dataset
- Bitcoin Global News Archive
Attributes
Original Data Source: Bitcoin - News articles text corpora