Sherlock Holmes Stories Text Data

Entertainment & Media Consumption

Tags and Keywords

Arts

Entertainment

Text

Data

Visualization

Literature

NLP

R

Free

About

This dataset explores "The Adventures of Sherlock Holmes", a classic literary work, through modern data analysis techniques. It draws a parallel between Sherlock Holmes's meticulous detective methods and the systematic approach of contemporary data analysts: just as Holmes insisted on "Data! data! data!" as the bedrock of any investigation, the project shows that conclusions drawn without robust data are merely speculative. The dataset includes the full text of the twelve stories in the collection, such as "A Scandal in Bohemia" and "The Copper Beeches", in a format suitable for text and sentiment analysis. It is intended to support analysis of word usage, emotional tone, key themes, and readability across the collection.

Columns

Text: This column contains the raw textual content from "The Project Gutenberg ebook of The Adventures of Sherlock Holmes". It features 2527 unique values, representing distinct segments or entries from the stories.

Distribution

The dataset is provided in CSV format. The listing does not state the total number of rows; the only figure given is the 2527 unique values in the 'Text' column. It is available as a free dataset on the Opendatabay platform.
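Since the listing gives only a unique-value count, the row count can be checked after download. The following is a minimal sketch using Python's standard library. It assumes the CSV has a header row with a column named "Text" (as described under Columns); the file name `sherlock_holmes.csv` is hypothetical, as the listing does not give one.

```python
import csv
from pathlib import Path


def load_text_column(path, column="Text"):
    """Return the non-empty values of `column` from a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return [row[column] for row in reader if row.get(column)]


def summarise(values):
    """Count total and distinct entries in a list of text segments."""
    return {"rows": len(values), "unique": len(set(values))}


if __name__ == "__main__":
    csv_path = Path("sherlock_holmes.csv")  # hypothetical file name, not from the listing
    if csv_path.exists():
        print(summarise(load_text_column(csv_path)))
```

If the reported "unique" figure differs from the 2527 stated in the listing, the column name or encoding assumption above is the first thing to check.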

Usage

This dataset is ideal for:

Performing sentiment analysis to understand emotional tones using lexicons like Bing and NRC.
Conducting text analysis, including word counts, topic modelling, and common bigram identification.
Developing and testing Natural Language Processing (NLP) models.
Creating data visualisations such as word clouds to represent word frequencies.
Assessing the readability of classic literature using metrics like the Flesch-Kincaid score.
Academic research into literary analysis, digital humanities, and the application of data science to qualitative data.
Uncovering hidden patterns and deriving meaningful conclusions from textual information.
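
Several of the analyses listed above (word counts, common bigrams, and lexicon-based sentiment scoring) can be sketched with the Python standard library alone. This is an illustrative outline, not the method used by the original project. The stop-word list and `TOY_LEXICON` are made-up stand-ins; the Bing and NRC lexicons are not bundled here and would need to be obtained separately.

```python
import re
from collections import Counter

# Lowercase words, optionally with one internal apostrophe (e.g. "can't").
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")

# Tiny illustrative lists; these are NOT the Bing or NRC lexicons.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that", "was", "i"}
TOY_LEXICON = {"good": 1, "happy": 1, "brilliant": 1, "bad": -1, "dreadful": -1, "fear": -1}


def tokenize(text):
    """Split text into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def word_counts(segments, stop_words=STOP_WORDS):
    """Count word frequencies across segments, skipping stop words."""
    counts = Counter()
    for segment in segments:
        counts.update(t for t in tokenize(segment) if t not in stop_words)
    return counts


def bigram_counts(segments):
    """Count adjacent word pairs; pairs never span two segments."""
    counts = Counter()
    for segment in segments:
        tokens = tokenize(segment)
        counts.update(zip(tokens, tokens[1:]))
    return counts


def sentiment_score(segment, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of the tokens in one segment (0 for unknown words)."""
    return sum(lexicon.get(t, 0) for t in tokenize(segment))
```

The frequencies returned by `word_counts` are also the input a word cloud would need; `Counter.most_common(n)` gives the top `n` entries for either words or bigrams.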

Coverage

The dataset covers the complete text of twelve stories from "The Adventures of Sherlock Holmes" by Arthur Conan Doyle. As a literary dataset, its scope is limited to the content of these specific fictional narratives. No geographic, temporal, or demographic coverage is provided beyond the inherent context of the stories themselves.

License

CC0

Who Can Use It

Data Analysts and Data Scientists: To practice and apply text analysis, NLP, and sentiment analysis techniques.
Students and Academics: For literary studies, linguistics, and digital humanities projects. The listing rates the text's general readability as suitable for readers aged 11-13 (Year 7/8 in the English school system).
Developers: To build applications that require text processing, sentiment understanding, or literary data exploration.
Researchers: Interested in comparing detective methodologies to data analysis processes or exploring classic literature through a data-driven lens.
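
The readability rating mentioned above can be checked with the Flesch-Kincaid grade level, which combines average sentence length with average syllables per word. The sketch below uses the standard formula, but the syllable counter is a rough vowel-group heuristic (an assumption of this example, not a dictionary lookup), so results are approximate. The score is expressed as a US school grade level.

```python
import re


def count_syllables(word):
    """Rough heuristic: count vowel groups and discount a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text):
    """Return 0.39 * words/sentences + 11.8 * syllables/words - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59
```

Very short or very simple passages can produce scores below zero, so the measure is more informative when applied to a whole story than to a single segment.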

Dataset Name Suggestions

Sherlock Holmes Stories Text Data
Adventures of Sherlock Holmes: Text Analysis Dataset
Classic Detective Fiction Sentiment Data
Holmesian NLP Dataset
Project Gutenberg Sherlock Holmes Text

Attributes

Original Data Source: Adventures of Sherlock Holmes: Sentiment Analysis.

Listing Stats

Listed: 22/06/2025
Region: Global
Quality: 5 / 5
Version: 1.0
