Opendatabay APP

Game of Thrones Dialogue Dataset

Entertainment & Media Consumption

Tags and Keywords

Business, Arts, NLP, Text


Free

About

This dataset provides the complete script for all eight seasons of the Game of Thrones television series, compiled into a single CSV file. It was built by scraping script pages on Genius.com and cleaning the extracted text, which makes it a rich resource for textual analysis. It is intended to support analytical tasks such as identifying which characters contribute the most dialogue and tracing narrative patterns across the series.

Columns

The dataset contains six columns:
  • Release Date: The original air date for each episode.
  • Season: The specific season number for the episode.
  • Episode: The episode number within its respective season.
  • Episode Title: The official title of each episode.
  • Name: The name of the character speaking a line.
  • Sentence: The actual sentence spoken by the character.
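The six columns above can be checked when the file is loaded. The sketch below is a minimal, hypothetical helper (`load_dialogue` is not part of the dataset) that reads the file with Python's standard `csv` module and raises an error if any expected header is missing. It assumes the header names in the downloaded file match this listing exactly; the sample rows are invented for illustration.

```python
import csv
import io
from typing import Dict, List, TextIO

# Column names as listed on this page; assumed to match the file's header row exactly.
EXPECTED_COLUMNS = ["Release Date", "Season", "Episode", "Episode Title", "Name", "Sentence"]


def load_dialogue(fileobj: TextIO) -> List[Dict[str, str]]:
    """Read the dialogue CSV into a list of row dicts after checking the header."""
    reader = csv.DictReader(fileobj)
    header = reader.fieldnames or []  # None for an empty file
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return list(reader)


if __name__ == "__main__":
    # Invented rows for illustration only; they are not taken from the dataset.
    sample = io.StringIO(
        "Release Date,Season,Episode,Episode Title,Name,Sentence\n"
        "2011-04-17,Season 1,Episode 1,Sample Episode,character a,First sample line.\n"
        '2011-04-17,Season 1,Episode 1,Sample Episode,character b,"A line, with a comma."\n'
    )
    rows = load_dialogue(sample)
    print(len(rows), rows[1]["Sentence"])
```

To read the real file, open it with `newline=""` as the `csv` documentation recommends, for example `with open(path, newline="", encoding="utf-8") as f: rows = load_dialogue(f)`, where `path` is wherever the download was saved and UTF-8 is an assumed encoding.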

Distribution

The dataset is provided as a single CSV (comma-separated values) file containing the entire Game of Thrones script across all seasons. The listing does not state a total row count. Release dates span 17 April 2011 to 19 May 2019, and the per-date-range counts shown on the listing (for example 3,179, 3,914 and 2,206 sentences in different ranges) indicate that the file holds thousands of records. The Sentence column contains 22,300 unique values.
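Because the listing gives no single row total, one can be computed after download. This minimal sketch (the `summarize` helper and the `DATE_FORMAT` value are assumptions, not documented properties of the file) counts rows, counts distinct `Sentence` strings, and finds the earliest and latest air dates. The distinct count uses exact string matching, so it may differ from the 22,300 figure if the listing counted values differently.

```python
from datetime import date, datetime
from typing import Dict, List

# Assumed format of the Release Date column; adjust it to match the downloaded file.
DATE_FORMAT = "%Y-%m-%d"


def summarize(rows: List[Dict[str, str]]) -> Dict[str, object]:
    """Return the row count, the number of distinct sentences, and the air-date range."""
    dates: List[date] = [
        datetime.strptime(row["Release Date"], DATE_FORMAT).date()
        for row in rows
        if row.get("Release Date")  # skip rows with a blank date
    ]
    return {
        "rows": len(rows),
        "unique_sentences": len({row["Sentence"] for row in rows}),  # exact-match distinct values
        "first_air_date": min(dates) if dates else None,
        "last_air_date": max(dates) if dates else None,
    }


if __name__ == "__main__":
    # Invented rows for illustration only, not real dataset content.
    sample = [
        {"Release Date": "2011-04-17", "Sentence": "Sample line."},
        {"Release Date": "2019-05-19", "Sentence": "Sample line."},
        {"Release Date": "2011-04-24", "Sentence": "Another line."},
    ]
    print(summarize(sample))
```

On the real file, pass the rows produced by `csv.DictReader` over the downloaded CSV.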

Usage

This dataset is ideal for a wide range of applications, including:
  • Natural Language Processing (NLP): Analysing dialogue, character sentiment, and thematic development.
  • Character Analysis: Identifying dominant speaking roles, character interactions, and narrative arcs.
  • Script Breakdown: Detailed examination of episode content, dialogue structure, and plot progression.
  • Academic Research: Studies on storytelling, media consumption, and popular culture.
  • Data Visualisation: Creating visual representations of dialogue frequency, character appearances, and temporal trends.
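For the character-analysis use case, a simple proxy for dominant speaking roles is the number of lines attributed to each name. The sketch below is hypothetical: `top_speakers` is an illustrative helper, it counts rows rather than words, and it assumes `Name` values may differ only in case or surrounding whitespace. The optional `season` filter must match whatever labels the `Season` column actually uses.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def top_speakers(
    rows: Iterable[Dict[str, str]], n: int = 10, season: Optional[str] = None
) -> List[Tuple[str, int]]:
    """Rank characters by number of lines, optionally restricted to one season."""
    counts: Counter = Counter()
    for row in rows:
        if season is not None and row["Season"] != season:
            continue
        # Normalise case and whitespace so variant spellings of one name are merged
        # (an assumption about how names appear in the file).
        name = row["Name"].strip().lower()
        if name:
            counts[name] += 1
    return counts.most_common(n)
```

For example, `top_speakers(rows, n=5)` returns the five most frequent speakers as `(name, line_count)` pairs, which can feed directly into a bar chart for the visualisation use case.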

Coverage

The dataset covers the entire broadcast run of Game of Thrones, from 17 April 2011 to 19 May 2019, and includes script content for every season and episode. It is available globally and includes lines from all speaking characters in the series, so it reflects the full range of the show's fictional cast.
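The claim that every season and episode is covered can be spot-checked by counting distinct episodes per season. The broadcast run had 73 episodes across eight seasons, so the totals returned by this hypothetical `episodes_per_season` helper should add up to 73 if every episode contains at least one line of dialogue. Season and episode labels are compared as plain strings, which assumes they are written consistently throughout the file.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def episodes_per_season(rows: Iterable[Dict[str, str]]) -> Dict[str, int]:
    """Count the distinct episodes that have at least one line of dialogue, per season."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        # Labels are used verbatim; inconsistent spellings would be counted separately.
        seen[row["Season"]].add(row["Episode"])
    return {season: len(episodes) for season, episodes in seen.items()}
```

An episode with no dialogue rows would not appear at all, so a total below 73 points either to missing episodes or to episodes without scraped lines.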

License

CC0

Who Can Use It

This dataset is suitable for a broad audience, including:
  • Data Scientists and Analysts: For text mining, NLP tasks, and machine learning model training.
  • Researchers: In fields such as media studies, literature, and computational linguistics.
  • Entertainment Industry Professionals: For content analysis, script insights, and audience engagement strategies.
  • Game of Thrones Enthusiasts: For personal exploration, fan theories, and detailed character studies.
  • Educators and Students: For teaching and learning about data analysis, text processing, and narrative structures.

Dataset Name Suggestions

  • Game of Thrones Dialogue Data
  • Westeros Script Archive
  • GoT Complete Script Dataset
  • Game of Thrones Series Dialogue
  • Full Game of Thrones Text Data

Attributes

Listing Stats

Views: 1
Downloads: 0
Listed: 16/06/2025
Region: Global
Universal Data Quality Score (UDQS): 5 / 5
Version: 1.0
