Definitive TED Talk Metadata and Transcripts

Data Science and Analytics

Tags and Keywords

TED

Transcripts

NLP

Metadata

Talks

Definitive TED Talk Metadata and Transcripts Dataset on Opendatabay data marketplace

Free

About

Detailed metadata and transcripts for talks hosted on TED.com, captured up to June 2020. The collection supports analysis of public speaking trends, audience engagement (measured by comment counts), and content mining of full transcripts. Talks span diverse tags such as Business, Earth and Nature.

Columns

  • transcript: Transcript of the Talk.
  • duration: Duration of the Talk in seconds.
  • comment_count: Number of Comments on the Talk page.
  • Additional metadata columns: Generic, speaker, and talk-related information (detailed in the dataset structure).
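
As a rough illustration of working with the columns above, the following sketch reads rows with Python's standard csv module and converts duration and comment_count to integers. The column names come from this listing; the sample rows are made up for demonstration, and the assumption that blank cells mean "missing" has not been checked against the actual file.

```python
import csv
import io


def to_int(value):
    """Convert a CSV cell to int, returning None for blank cells."""
    value = (value or "").strip()
    return int(float(value)) if value else None


def load_talks(handle):
    """Read talk rows from an open CSV handle into a list of dicts."""
    talks = []
    for row in csv.DictReader(handle):
        transcript = (row.get("transcript") or "").strip()
        talks.append({
            "transcript": transcript or None,  # assumption: blank means missing
            "duration": to_int(row.get("duration")),  # seconds
            "comment_count": to_int(row.get("comment_count")),
        })
    return talks


# Made-up sample standing in for the real download (values are illustrative).
SAMPLE = io.StringIO(
    "transcript,duration,comment_count\n"
    "Hello everyone.,600,40\n"
    ",900,120\n"
)
talks = load_talks(SAMPLE)
print(len(talks), talks[1]["transcript"])
```

With the real download, the same function could be given a file opened with newline="" and encoding="utf-8"; the encoding is an assumption, since the listing does not state it.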

Distribution

The dataset contains 4,609 valid entries. The structure is tabular, with each row corresponding to a specific talk.
  • Duration: Ranges from 411 seconds to 5,257 seconds, with a mean of approximately 727 seconds.
  • Comments: Ranges from 37 to 6,460 comments, with a mean of 161.
  • Completeness: Core metrics such as duration are 100% valid; transcripts are approximately 89% valid.
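
The figures above could be re-derived from the rows themselves. The sketch below is one possible way to do so, assuming each talk is a dict with "duration" and "transcript" keys and that a missing transcript is stored as None or an empty string; the four sample rows are invented and do not reproduce the listing's numbers.

```python
from statistics import mean


def summarize(talks):
    """Return row count, duration range and mean, and transcript validity."""
    durations = [t["duration"] for t in talks]
    with_transcript = sum(1 for t in talks if t["transcript"])
    return {
        "rows": len(talks),
        "duration_min": min(durations),
        "duration_max": max(durations),
        "duration_mean": mean(durations),
        # Share of rows that have a non-empty transcript.
        "transcript_validity": with_transcript / len(talks),
    }


# Invented rows for demonstration only.
sample = [
    {"transcript": "text", "duration": 600},
    {"transcript": None, "duration": 900},
    {"transcript": "more", "duration": 750},
    {"transcript": "words", "duration": 750},
]
stats = summarize(sample)
print(stats)
```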

Usage

  • Natural Language Processing (NLP): Analyse transcripts for sentiment, topic modelling, or linguistic patterns.
  • Engagement Analysis: Correlate comment counts and duration to understand audience interaction.
  • Public Speaking Research: Study the structure and length of successful talks.
  • Content Recommendation: Build systems based on talk metadata and tags.
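
For the engagement analysis idea, a minimal sketch of a Pearson correlation between duration and comment_count is shown here. It is written by hand with the standard library rather than taken from any tooling mentioned in this listing, and the input values are invented to make the result easy to check.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented values: comment counts rise in step with duration.
durations = [600, 900, 1200]
comment_counts = [40, 80, 120]
r = pearson(durations, comment_counts)
print(round(r, 3))
```

A coefficient near 1 or -1 indicates a strong linear relationship and a value near 0 indicates little linear relationship; it says nothing about causation.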

Coverage

  • Time Range: Includes talks available on TED.com up to the scrape date of 24 June 2020.
  • Scope: Covers the entire TED.com repository available at the time.
  • Update Frequency: Quarterly (expected).

License

CC0: Public Domain

Who Can Use It

  • Data Scientists: For NLP projects and predictive modelling.
  • Linguists: For analysing speech patterns and rhetorical devices.
  • Public Speaking Coaches: To derive data-driven insights on effective presentation styles.
  • Educators: For finding content relevant to specific topics like Business or Nature.

Dataset Name Suggestions

  • Definitive TED Talk Metadata and Transcripts
  • TED.com Complete Talks Collection
  • TED Talks: Transcripts and Engagement Metrics

Listing Stats

  • Views: 1
  • Downloads: 0
  • Listed: 03/12/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0

Download Dataset in CSV Format