Sherlock Holmes Stories Text Data
Entertainment & Media Consumption
Free
About
This dataset applies modern data analysis techniques to "The Adventures of Sherlock Holmes", drawing a parallel between Sherlock Holmes's meticulous detective methods and the systematic approach of contemporary data analysts. Holmes's cry of "Data! data! data!" treats evidence as the bedrock of any investigation, and the project likewise holds that conclusions drawn without robust data are merely speculative. The dataset includes the full text of the collection's twelve stories, such as "A Scandal in Bohemia" and "The Copper Beeches", in a format suitable for text and sentiment analysis. It supports analysis of word usage, emotional tone, key themes, and readability across the collection.
Columns
- Text: This column contains the raw textual content from "The Project Gutenberg ebook of The Adventures of Sherlock Holmes". It features 2527 unique values, representing distinct segments or entries from the stories.
Distribution
The dataset is provided in CSV format. The total number of rows is not stated; only the count of unique values in the 'Text' column is given. It is available as a free dataset on the Opendatabay platform.
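As a minimal sketch of reading the file, the snippet below loads the 'Text' column with Python's standard `csv` module and compares the number of segments with the number of unique values. The helper name `load_text_column`, the file name in the closing comment, and the sample rows are assumptions for illustration; they do not reflect how the real file is segmented.

```python
import csv
import io

def load_text_column(handle, column="Text"):
    """Return the non-empty values of one column from an open CSV file."""
    reader = csv.DictReader(handle)
    # Missing or empty cells are skipped rather than kept as blank segments.
    return [(row.get(column) or "").strip()
            for row in reader
            if (row.get(column) or "").strip()]

# Tiny in-memory stand-in for the real file so the sketch runs on its own.
sample = io.StringIO(
    'Text\n'
    '"To Sherlock Holmes she is always the woman."\n'
    '""\n'
    '"I had seen little of Holmes lately."\n'
)
segments = load_text_column(sample)
print(len(segments), "segments,", len(set(segments)), "unique")

# With the downloaded file (hypothetical name), the call would look like:
# with open("sherlock_holmes.csv", newline="", encoding="utf-8") as f:
#     segments = load_text_column(f)
```

Comparing `len(segments)` with `len(set(segments))` on the real file is one way to find the row count that the listing does not state.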
Usage
This dataset is ideal for:
- Performing sentiment analysis to understand emotional tones using lexicons like Bing and NRC.
- Conducting text analysis, including word counts, topic modelling, and common bigram identification.
- Developing and testing Natural Language Processing (NLP) models.
- Creating data visualisations such as word clouds to represent word frequencies.
- Assessing the readability of classic literature using metrics like the Flesch-Kincaid score.
- Academic research into literary analysis, digital humanities, and the application of data science to qualitative data.
- Uncovering hidden patterns and deriving meaningful conclusions from textual information.
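The word-count and bigram tasks listed above can be sketched with the standard library alone. This is an illustrative outline, not the original project's method: the function names and the simple regular-expression tokenizer are assumptions, and a real analysis would likely use a fuller stopword list and tokenizer.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return word tokens, keeping inner apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def top_words(tokens, n=5, stopwords=frozenset()):
    """Return the n most frequent tokens that are not in the stopword set."""
    return Counter(t for t in tokens if t not in stopwords).most_common(n)

def top_bigrams(tokens, n=5):
    """Return the n most frequent pairs of adjacent tokens."""
    return Counter(zip(tokens, tokens[1:])).most_common(n)

tokens = tokenize("Data! data! data! I can't make bricks without clay.")
print(top_words(tokens, n=3))
print(top_bigrams(tokens, n=3))
```

The word frequencies returned by `top_words` are also the usual input for a word cloud, and the same token list could be matched against a sentiment lexicon such as Bing or NRC once that lexicon has been obtained separately.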
Coverage
The dataset covers the complete text of the twelve stories in "The Adventures of Sherlock Holmes" by Arthur Conan Doyle. As a literary dataset, its scope is limited to the content of these fictional narratives. No geographic, temporal, or demographic coverage is specified beyond the context of the stories themselves.
License
CC0
Who Can Use It
- Data Analysts and Data Scientists: To practise and apply text analysis, NLP, and sentiment analysis techniques.
- Students and Academics: For literary studies, linguistics, and digital humanities projects. The text's general readability suits readers aged 11-13 (Year 7/8 in the English school system).
- Developers: To build applications that require text processing, sentiment understanding, or literary data exploration.
- Researchers: Interested in comparing detective methodologies to data analysis processes or exploring classic literature through a data-driven lens.
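The reading-age remark in the list above rests on a readability metric such as the Flesch-Kincaid score mentioned under Usage. The sketch below computes the Flesch-Kincaid grade level, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59, which is expressed as a US school grade. The syllable counter is a rough vowel-group heuristic assumed here for illustration, so results are approximate and may differ from those of dedicated readability tools.

```python
import re

def count_syllables(word):
    """Rough heuristic: count vowel groups, ignoring a likely silent final 'e'."""
    word = word.lower()
    if len(word) > 2 and word.endswith("e") and not word.endswith("le"):
        word = word[:-1]
    return max(1, len(re.findall(r"[aeiouy]+", word)))

def flesch_kincaid_grade(text):
    """Return the Flesch-Kincaid grade level of the text (approximate)."""
    # Sentences are split on runs of ., ! or ?; words are runs of letters.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

grade = flesch_kincaid_grade("The cat sat. The dog ran.")
print(round(grade, 2))
```

Applied to the joined 'Text' column, the function gives a single grade for the whole collection; applying it per story would show how readability varies between them.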
Dataset Name Suggestions
- Sherlock Holmes Stories Text Data
- Adventures of Sherlock Holmes: Text Analysis Dataset
- Classic Detective Fiction Sentiment Data
- Holmesian NLP Dataset
- Project Gutenberg Sherlock Holmes Text
Attributes
Original Data Source: Adventures of Sherlock Holmes: Sentiment Analysis.