British Literary Phrases Dataset
About
This dataset is a curated collection of labelled phrases from renowned British authors, spanning the 14th to the 21st centuries. The text was extracted sentence by sentence using Natural Language Processing (NLP) techniques. Each entry is labelled with the writer's name, the title of the book, and the century in which it was written. Each writer is also assigned a unique index number, which makes it easier to analyse sequential phrase patterns. The dataset contains no missing values and includes works by authors such as William Shakespeare, Charles Dickens, Virginia Woolf, Jane Austen, and J. K. Rowling. It is suited to a range of analytical and model-building tasks.
Columns
The dataset comprises four primary columns:
- Sentence: A single meaningful sentence extracted from a literary work.
- Name of writer: The name of the British author from whom the sentence was extracted.
- Name of Book: The title of the book from which the sentence originates.
- Century: The century to which the literary work belongs.
The columns are combined into a single table with no missing values.
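As a rough illustration of how the four columns might be read and checked for missing values, here is a minimal sketch using only the Python standard library. The header spellings in `COLUMNS` follow the column descriptions above and are an assumption, as is the format of the century values; the actual file may differ. The rows in `SAMPLE` are placeholders written for this example and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Assumed header names, based on the column descriptions above;
# the real CSV may spell them differently.
COLUMNS = ["Sentence", "Name of writer", "Name of Book", "Century"]


def load_rows(handle):
    """Read rows from an open CSV handle and reject any row with an empty field."""
    reader = csv.DictReader(handle)
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        # A short row yields None for absent fields, so treat None as empty.
        empty = [c for c in COLUMNS if not (row[c] or "").strip()]
        if empty:
            raise ValueError(f"line {reader.line_num}: empty fields {empty}")
        rows.append(row)
    return rows


# Placeholder rows for illustration only; the sentences are not from the dataset,
# and the century format ("19", "20") is an assumption.
SAMPLE = io.StringIO(
    "Sentence,Name of writer,Name of Book,Century\n"
    "placeholder sentence one,Jane Austen,Pride and Prejudice,19\n"
    "placeholder sentence two,Charles Dickens,Great Expectations,19\n"
    "placeholder sentence three,Virginia Woolf,Mrs Dalloway,20\n"
)

rows = load_rows(SAMPLE)
per_century = Counter(row["Century"] for row in rows)
print(per_century)
```

To run this against the real file, replace `SAMPLE` with an open file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points to the downloaded CSV.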
Distribution
The data files are typically provided in CSV format. A sample file will be made available separately on the platform. The dataset is free of charge. The number of rows is not currently specified.
Usage
This dataset is highly suitable for a variety of applications, including:
- Developing NLP models capable of identifying the century to which an English phrase belongs.
- Creating NLP models that can determine which British author an English phrase is similar to.
- Training NLP models on informal, non-scientific phrases.
- Predicting whether a sentence is literary or non-literary, when the dataset is combined with newspaper data.
- Building NLP models designed to detect romantic and literary phrases.
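The century-identification task listed above could be approached with a simple baseline before trying anything heavier. The sketch below is one possible starting point, not a method prescribed by the dataset: a small multinomial naive Bayes classifier with add-one smoothing, written with the standard library only. The class name `NaiveBayes`, the tokenizer, and the four training sentences are all hypothetical and invented for this example; in practice the texts and labels would come from the Sentence and Century columns.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over word counts."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)      # label -> number of sentences
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy sentences invented for this sketch; they are not from the dataset.
texts = [
    "thou art kind and thy heart is true",
    "wherefore dost thou weep",
    "she checked her phone on the train",
    "the phone rang in the office",
]
labels = ["16", "16", "21", "21"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("thou dost speak"))
```

The same class works for the author-similarity task by passing writer names as labels instead of centuries. When evaluating either task, it may be worth holding out whole books rather than random sentences, so that the model is not rewarded for memorising a single book's vocabulary.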
Coverage
The dataset focuses on famous British writers and covers the 14th to the 21st centuries. Although the content is drawn from British literature, the dataset can be used anywhere in the world.
License
CC0
Who Can Use It
This dataset is ideal for:
- Data scientists and machine learning engineers working on text classification, author attribution, or literary style analysis.
- Researchers in literary studies and the humanities seeking to apply computational methods to analyse historical texts.
- Developers creating educational tools or applications related to British literature.
- Anyone interested in Natural Language Processing and building models for text understanding.
Dataset Name Suggestions
- British Literary Phrases Dataset
- Historical British Literature NLP
- UK Author Text Corpus
- Century-Labelled British Sentences
- English Literary Phrases (NLP)
Attributes
Original Data Source: British_literature_NLP_Labelled_Phrase