William Shakespeare's Play Script Dataset
Entertainment & Media Consumption
Free
About
This dataset contains every line of dialogue from all of William Shakespeare's plays, organised for analysis. It comprises 108,093 lines, each tagged with the play name, genre, character, act, scene, and sentence number, together with the dialogue text and the gender of the character speaking it. It supports literary analysis, natural language processing, and the historical study of one of English literature's most significant figures, from simple queries to complex linguistic analyses.
Columns
- play_name: The name of the play.
- genre: The genre of the play, such as Comedy, History, or Tragedy.
- character: The name of the character delivering the line.
- act: The act number within the play.
- scene: The scene number within the act.
- sentence: The number of the line within the scene (the sentence number).
- text: The actual text of the dialogue.
- sex: The gender of the character.
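As a minimal sketch of how the columns above might be read, the example below parses CSV rows into a small record type using only the Python standard library. It assumes the file's header row uses the column names exactly as listed and that act, scene, and sentence hold integers; neither is confirmed by this listing. The `DialogueLine` class, the `read_lines` helper, and the sample rows (including their numeric values and sex labels) are hypothetical placeholders, not excerpts from the actual file.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class DialogueLine:
    """One row of the dataset, mirroring the eight listed columns."""
    play_name: str
    genre: str
    character: str
    act: int
    scene: int
    sentence: int
    text: str
    sex: str


# Illustrative rows only: numeric values and sex labels are placeholders,
# not copied from the real CSV.
SAMPLE_CSV = """play_name,genre,character,act,scene,sentence,text,sex
Hamlet,Tragedy,Hamlet,3,1,1,"To be, or not to be, that is the question:",male
Twelfth Night,Comedy,Viola,1,2,1,"What country, friends, is this?",female
"""


def read_lines(handle):
    """Parse CSV rows into DialogueLine records.

    Assumes act, scene, and sentence are stored as integers.
    """
    records = []
    for row in csv.DictReader(handle):
        records.append(
            DialogueLine(
                play_name=row["play_name"],
                genre=row["genre"],
                character=row["character"],
                act=int(row["act"]),
                scene=int(row["scene"]),
                sentence=int(row["sentence"]),
                text=row["text"],
                sex=row["sex"],
            )
        )
    return records


records = read_lines(io.StringIO(SAMPLE_CSV))
```

To read the real file instead, pass an open file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved.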
Distribution
The dataset is provided as a CSV file with 108,093 rows, one per line of dialogue. It covers all of William Shakespeare's plays, with lines attributed to over 950 unique characters. The gender distribution of characters delivering lines is approximately 82% male and 18% female. By genre, Comedy accounts for about 42%, Tragedy for 29%, and History for 28%.
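Percentages like these could be recomputed from the rows themselves. The sketch below counts each distinct value in a column and converts the counts to shares. It counts per row (per line of dialogue); whether the figures quoted above were computed per line or per unique character is not stated here, so a per-character figure would need the rows deduplicated by character first. The `share_by` helper and the sample rows are hypothetical and do not reflect real counts.

```python
from collections import Counter


def share_by(rows, column):
    """Return each distinct value's share of the rows, as a percentage."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: 100 * n / total for value, n in counts.items()}


# Placeholder rows standing in for the CSV; not real data from the dataset.
rows = [
    {"genre": "Comedy", "sex": "male"},
    {"genre": "Comedy", "sex": "female"},
    {"genre": "Tragedy", "sex": "male"},
    {"genre": "History", "sex": "male"},
]

genre_share = share_by(rows, "genre")
sex_share = share_by(rows, "sex")
```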
Usage
This dataset is ideal for:
- Textual Analysis: Applying natural language processing techniques to examine Shakespeare’s language, themes, and character development.
- Gender Studies: Investigating the representation of gender across various plays and genres.
- Educational Tools: Developing educational content and analytical tools for students and scholars studying Shakespeare.
- Sentiment Analysis: Determining the sentiment of dialogues and observing its variation across different play types and characters.
- Topic Modelling: Identifying prevalent themes and topics throughout different plays.
- Network Analysis: Analysing character interactions to map social networks within plays.
- Machine Learning Applications: Predicting the play or character based on a given line of text.
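As one illustration of the network analysis use case, the sketch below builds a weighted edge list in which two characters are linked once for every scene where both speak. Treating shared scenes as a proxy for interaction is an assumption of this example, not something the dataset encodes. The `scene_cooccurrence` helper is hypothetical, and the rows are hand-written stand-ins using the listed column names.

```python
from collections import defaultdict
from itertools import combinations


def scene_cooccurrence(rows):
    """Count, for each pair of characters, the scenes in which both speak."""
    # Group the set of speakers by (play, act, scene).
    speakers = defaultdict(set)
    for row in rows:
        key = (row["play_name"], row["act"], row["scene"])
        speakers[key].add(row["character"])

    # Every unordered pair of speakers in a scene contributes one to its edge.
    edges = defaultdict(int)
    for cast in speakers.values():
        for a, b in combinations(sorted(cast), 2):
            edges[(a, b)] += 1
    return dict(edges)


# Hand-written placeholder rows; not excerpts from the actual file.
rows = [
    {"play_name": "Hamlet", "act": 3, "scene": 1, "character": "Hamlet"},
    {"play_name": "Hamlet", "act": 3, "scene": 1, "character": "Ophelia"},
    {"play_name": "Hamlet", "act": 3, "scene": 1, "character": "Hamlet"},
    {"play_name": "Hamlet", "act": 1, "scene": 5, "character": "Hamlet"},
    {"play_name": "Hamlet", "act": 1, "scene": 5, "character": "Ghost"},
]

edges = scene_cooccurrence(rows)
```

Pairs are stored in alphabetical order so that each unordered pair maps to a single key. The resulting dictionary can be passed to a graph library as a weighted edge list.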
Coverage
This dataset covers all plays written by William Shakespeare, with dialogue from over 950 unique characters and gender information for each. It has no geographic or temporal scope beyond the author's historical period; the data reflects the language and themes of the plays themselves.
License
CC0
Who Can Use It
This dataset is suited for:
- Educators, students, researchers, and enthusiasts who want to explore Shakespeare’s works in depth.
- Individuals focused on literary analysis, natural language processing, and the historical study of English literature.
- Anyone keen to delve into the linguistic and thematic aspects of Shakespeare's plays for various levels of analysis.
Dataset Name Suggestions
- Shakespearean Dialogue Collection
- William Shakespeare's Play Script Dataset
- Shakespearean Character Dialogue Database
- Bard's Lines Dataset
- Shakespeare's Complete Works Dialogue
Attributes
Original Data Source: Shakespeare's Plays: Dialogues & Characters