Hockey Player & Coach Speech Data
Sports & Recreation
Free
About
This dataset is a collection of interview transcripts from the National Hockey League, focusing primarily on the Stanley Cup Final. It is a resource for work on sports communication, natural language processing, or the evolution of speech patterns in professional hockey. The data was scraped from a sports website. The scraper handled several page formats, but some pages with more complex layouts were excluded. Each record identifies the two teams involved, the interview date, and the role of the person interviewed: player, coach, or other official. The dataset suits tasks such as training conversational AI models or analysing linguistic differences between roles within the NHL.
Columns
- RowId: A distinct identifier for each interview record.
- team1: One of the two teams participating in the Stanley Cup Final. The assignment to 'team1' or 'team2' is based on the listing order on the original website.
- team2: The other team involved in the Stanley Cup Final.
- date: The specific date when the interview took place.
- name: The name of the person being interviewed.
- job: Describes the interviewee's role at the time of the interview. Values include 'player', 'coach', and 'other'. The 'other' category typically encompasses general managers, league officials, and commentators. Some job titles were assigned automatically, while others were determined manually.
- text: The transcribed interview content. This column only contains speech from the interviewee, as interviewer questions were not collected. Responses are separated by periods, which are the only punctuation present.
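As a minimal sketch of how these columns might be read, the snippet below loads rows with Python's standard `csv` module and splits the `text` column on periods, which the description says are the only punctuation. The sample rows are made up for illustration and are not taken from the dataset, and the date format shown in them is an assumption; the helper names `load_interviews` and `split_sentences` are hypothetical.

```python
import csv
import io

# Column names as listed in the Columns section.
EXPECTED_COLUMNS = ["RowId", "team1", "team2", "date", "name", "job", "text"]


def load_interviews(fileobj):
    """Read interview rows from an open CSV file object into a list of dicts."""
    reader = csv.DictReader(fileobj)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def split_sentences(text):
    """Split a transcript on periods and drop empty fragments."""
    return [s.strip() for s in text.split(".") if s.strip()]


# Made-up sample rows standing in for the real file (not actual dataset content).
SAMPLE_CSV = """RowId,team1,team2,date,name,job,text
1,Team A,Team B,2019-06-10,Example Player,player,we played hard. the guys battled all night
2,Team A,Team B,2019-06-10,Example Coach,coach,i thought our group stayed with it. credit to them
"""

rows = load_interviews(io.StringIO(SAMPLE_CSV))
sentences = split_sentences(rows[0]["text"])
print(sentences)
```

To use the real file, pass an open file handle (for example `open(path, newline="", encoding="utf-8")`) to `load_interviews` in place of the in-memory sample.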
Distribution
The dataset is provided as a CSV (comma-separated values) file containing approximately 2,095 records. The interviews span 30th May 1997 to 10th June 2019. The file size is not specified.
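A quick sanity check on the documented date range could look like the sketch below, which counts interviews per calendar year and rejects dates outside 30th May 1997 to 10th June 2019. The `DATE_FORMAT` value is an assumption, since the listing does not state how dates are written in the file, and the sample rows are invented for illustration.

```python
from collections import Counter
from datetime import date, datetime

# Assumption: dates are ISO formatted; adjust to the format used in the file.
DATE_FORMAT = "%Y-%m-%d"

# Range taken from the dataset description.
FIRST_DAY = date(1997, 5, 30)
LAST_DAY = date(2019, 6, 10)


def interviews_per_year(rows):
    """Count rows per calendar year, raising if a date is outside the documented range."""
    counts = Counter()
    for row in rows:
        day = datetime.strptime(row["date"], DATE_FORMAT).date()
        if not FIRST_DAY <= day <= LAST_DAY:
            raise ValueError(f"RowId {row['RowId']}: {day} is outside the documented range")
        counts[day.year] += 1
    return dict(sorted(counts.items()))


# Invented rows for illustration only.
sample_rows = [
    {"RowId": "1", "date": "1997-05-30"},
    {"RowId": "2", "date": "2019-06-10"},
    {"RowId": "3", "date": "2019-06-10"},
]
per_year = interviews_per_year(sample_rows)
print(per_year)
```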
Usage
This dataset is ideal for a range of analytical and machine learning applications, including:
- Training RNN-based chatbots to simulate hockey player responses.
- Analysing speech patterns of NHL coaches and players.
- Investigating whether coaches exhibit more positive or team-oriented language than players.
- Studying how hockey interview responses have evolved over different eras.
- Developing AI models that can generate text resembling NHL interview dialogue.
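As one hedged illustration of the coach versus player comparison, the sketch below measures how often each role uses first-person plural words compared with first-person singular words. The word lists are a crude proxy chosen here for illustration, not a validated measure of team-oriented language, and the sample rows are invented rather than drawn from the dataset.

```python
from collections import Counter, defaultdict

# Assumption: these small word lists stand in for "team-oriented" and "self-oriented" language.
TEAM_WORDS = {"we", "us", "our", "ours"}
SELF_WORDS = {"i", "me", "my", "mine"}


def pronoun_rates(rows):
    """Return {job: (team_rate, self_rate)} as fractions of all tokens for that job."""
    counts = defaultdict(Counter)
    for row in rows:
        # Periods are the only punctuation, so replacing them leaves plain words.
        tokens = row["text"].lower().replace(".", " ").split()
        c = counts[row["job"]]
        c["total"] += len(tokens)
        c["team"] += sum(t in TEAM_WORDS for t in tokens)
        c["self"] += sum(t in SELF_WORDS for t in tokens)
    return {
        job: (c["team"] / c["total"], c["self"] / c["total"])
        for job, c in counts.items()
        if c["total"]
    }


# Invented rows for illustration only.
sample_rows = [
    {"job": "player", "text": "i thought we played well. i like our group"},
    {"job": "coach", "text": "we stuck to our plan. the guys gave us a chance"},
]
rates = pronoun_rates(sample_rows)
for job, (team_rate, self_rate) in rates.items():
    print(f"{job}: team={team_rate:.2f} self={self_rate:.2f}")
```

On the real data, any difference between roles would need a significance test and a check of how contractions such as "I'm" were transcribed before drawing conclusions.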
Coverage
The dataset covers National Hockey League interviews, with a primary focus on the Stanley Cup Final. The geographic locations of the interviews are not recorded; the NHL's teams are based in the United States and Canada. The interviews span 30th May 1997 to 10th June 2019. Interviewees are categorised by job role as players, coaches, or others (such as general managers, league officials, and commentators).
License
CC0
Who Can Use It
This dataset is suitable for:
- Natural Language Processing (NLP) researchers looking for domain-specific text data.
- Data scientists and analysts interested in sports analytics and communication trends.
- Machine learning engineers developing conversational AI or text generation models.
- Academics and students studying linguistics, sports history, or media studies.
- Sports enthusiasts curious about the language used by NHL figures.
Dataset Name Suggestions
- NHL Interview Transcripts 1997-2019
- Stanley Cup Final Interviews
- Hockey Player & Coach Speech Data
- NHL Media Conference Transcripts
- ASAPSports NHL Interview Archive
Attributes
Original Data Source: National Hockey League Interviews