
PM Modi Speeches Transcript Collection

Data Science and Analytics

Tags and Keywords

Modi

Speeches

India

Politics

Transcript


Free

About

A structured collection of public address transcripts delivered by Narendra Damodardas Modi, the 14th Prime Minister of India, covering speeches given between 15 August 2014 and 30 August 2020. This resource provides insight into political rhetoric and policy communication. The transcripts were scraped from the official PM India website using tools such as Selenium and Python's Beautiful Soup package.

Columns

The collection includes six fields detailing aspects of each address:
  • date: The specific date on which the speech was given. The range covered is 15 August 2014 to 30 August 2020.
  • title: The designated heading or title assigned to the speech on the originating website.
  • url: The unique link providing access to the original HTML version of the speech on the official website.
  • lang: The detected language of the speech text, categorized as either English (en) or Hindi (hi).
  • words: The total word count of the speech transcript. Counts range from a minimum of 753 to a maximum of about 74,200, with a mean of about 12,400 words per speech.
  • text: The full, unabridged transcript of the Prime Minister's address.
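
As a minimal sketch of how the six fields above might be loaded and checked, the snippet below reads a CSV with pandas, verifies that the expected column names are present, and parses the date column. The helper name load_speeches and the two sample rows are hypothetical and only stand in for the real file; the date format of the actual CSV is not stated in the listing, so unparseable dates are coerced to NaT rather than assumed.

```python
import io

import pandas as pd

# Column names as described in the listing.
EXPECTED_COLUMNS = ["date", "title", "url", "lang", "words", "text"]


def load_speeches(source):
    """Read a speeches CSV and check it has the six documented fields.

    `source` may be a file path or a file-like object.
    """
    df = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # The on-disk date format is an assumption; bad values become NaT.
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    return df


# Made-up rows for illustration only; they are not taken from the dataset.
SAMPLE_CSV = io.StringIO(
    "date,title,url,lang,words,text\n"
    "2014-08-15,Sample title A,https://example.org/a,en,5,one two three four five\n"
    "2020-08-30,Sample title B,https://example.org/b,hi,3,एक दो तीन\n"
)

sample = load_speeches(SAMPLE_CSV)
print(sample[["date", "lang", "words"]])
```

With the real file, the call would presumably be load_speeches("PM_Modi_speeches.csv"), assuming the CSV header matches the field names listed above.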

Distribution

The data is distributed in two machine-readable formats: CSV (the main file, PM_Modi_speeches.csv, is approximately 20.65 MB) and JSON. The collection consists of 922 valid records, each representing one speech transcript. The data is intended to be updated monthly.

Usage

This collection is ideally suited for advanced Natural Language Processing (NLP) activities, including:
  • Topic Modelling: Identifying evolving political agendas and shifts in governmental focus over the six-year period.
  • Sentiment Analysis: Measuring the tone and emotional content of high-level political discourse.
  • Linguistic Research: Studying the syntax, vocabulary, and stylistic evolution of political communication in India.
  • Time-Series Rhetoric Analysis: Tracking the frequency of specific keywords or policy mentions over the stated timeframe.
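
The time-series idea in the last item can be sketched as follows: count whole-word occurrences of a keyword in each transcript and sum the counts per calendar year. The function name keyword_counts_by_year and the three demo rows are hypothetical. The regex word boundary is designed around English text, so this sketch filters on the lang column by default; Hindi transcripts would likely need a different tokenization approach.

```python
import re

import pandas as pd


def keyword_counts_by_year(df, keyword, lang="en"):
    """Sum case-insensitive whole-word matches of `keyword` per year."""
    subset = df[df["lang"] == lang]
    pattern = r"\b" + re.escape(keyword) + r"\b"
    counts = subset["text"].str.count(pattern, flags=re.IGNORECASE)
    years = pd.to_datetime(subset["date"]).dt.year
    return counts.groupby(years).sum()


# Made-up rows for illustration only; they are not taken from the dataset.
demo = pd.DataFrame(
    {
        "date": ["2014-08-15", "2014-10-02", "2015-01-26"],
        "lang": ["en", "en", "en"],
        "text": [
            "Clean India is a goal. India will rise.",
            "A speech about farmers.",
            "India and the world.",
        ],
    }
)

result = keyword_counts_by_year(demo, "india")
print(result)
```

Raw counts favor long speeches, so dividing by the words column before aggregating may give a fairer comparison across years.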

Coverage

The temporal scope runs from 15 August 2014 to 30 August 2020. The content reflects the political communication of the Prime Minister of India, addressing both domestic and international subjects. The transcripts are split between English (53%) and Hindi (47%).
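
The stated language split and date range could be checked against the file with a few lines of pandas. This is a sketch only: the helper name language_shares and the four demo rows are hypothetical, so the printed shares reflect the demo data and not the 53%/47% figures above.

```python
import pandas as pd


def language_shares(df):
    """Return the fraction of records per language code."""
    return df["lang"].value_counts(normalize=True)


# Made-up rows for illustration only; they are not taken from the dataset.
coverage_demo = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2014-08-15", "2016-03-01", "2018-07-20", "2020-08-30"]
        ),
        "lang": ["en", "en", "hi", "en"],
    }
)

shares = language_shares(coverage_demo)
first_date = coverage_demo["date"].min()
last_date = coverage_demo["date"].max()
print(shares)
print(first_date.date(), "to", last_date.date())
```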

License

Attribution 4.0 International (CC BY 4.0)

Who Can Use It

  • Academics and Students: For research into Indian political science, communication studies, and contemporary history.
  • Data Scientists: For training custom language models focusing on political and governmental texts.
  • Media Analysts: To conduct textual analysis of policy pronouncements and public relations strategies.
  • Government Researchers: To benchmark public communication and messaging strategies.

Dataset Name Suggestions

  1. PM Modi Speeches Transcript Collection (2014–2020)
  2. Narendra Modi Public Addresses Archive
  3. Indian Political Rhetoric Corpus (2014–2020)
  4. PM India Official Transcripts

Attributes

Listing Stats

  • Views: 2
  • Downloads: 0
  • Listed: 23/11/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
