King James Bible Text Dataset

Tags and Keywords

Religion

Belief Systems

Text

NLP

English

Free

About

This dataset provides the full text of the King James Bible, a sacred book for Christians with a rich and varied history. The Old Testament, originally written mostly in Hebrew, recounts the story of the Israelite people and includes religious law, poetry, and prophecy. The New Testament, originally written in Greek, details the life of Jesus Christ and the early development of the Christian church. Commissioned in 1604 by King James I of England for the Church of England and first published in 1611, this translation became one of the most widely read English versions of the Bible. It is well suited to Natural Language Processing (NLP) work, offering opportunities to explore linguistic features such as Hebrew parallelism and chiasmus, or to examine the "riddles" referenced by King Solomon in the book of Proverbs.

Columns

version_name: The name of the Bible version.
version_abbr: The abbreviation for the Bible version.
testament_abbr: An abbreviation for the Bible section, either Old Testament (OT) or New Testament (NT).
testament_name: The full name of the Bible section, Old Testament or New Testament.
book_name: The name of the book within the Bible.
book_number: The numerical order of the book within the Bible.
chapter_number: The chapter number within a book.
verse_number: The verse number within a chapter.
verse_text: The actual text of the verse.
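
As a minimal sketch of how these columns might be read, the Python below parses rows into typed records using only the standard library. It assumes the CSV has a header row whose names match the column names listed above. The two sample rows and the `Verse` and `read_verses` names are illustrative and not part of the dataset.

```python
import csv
import io
from dataclasses import dataclass

# Two illustrative rows in the column layout described above.
# The real file's exact values (e.g. version_name) may differ.
SAMPLE_CSV = """version_name,version_abbr,testament_abbr,testament_name,book_name,book_number,chapter_number,verse_number,verse_text
King James Version,KJV,OT,Old Testament,Genesis,1,1,1,In the beginning God created the heaven and the earth.
King James Version,KJV,NT,New Testament,John,43,11,35,Jesus wept.
"""


@dataclass(frozen=True)
class Verse:
    """One row of the dataset, with numeric columns converted to int."""
    testament_abbr: str
    book_name: str
    book_number: int
    chapter_number: int
    verse_number: int
    verse_text: str

    @property
    def reference(self) -> str:
        # Conventional "Book chapter:verse" citation string.
        return f"{self.book_name} {self.chapter_number}:{self.verse_number}"


def read_verses(handle):
    """Parse CSV rows from an open text handle into Verse records."""
    verses = []
    for row in csv.DictReader(handle):
        verses.append(
            Verse(
                testament_abbr=row["testament_abbr"],
                book_name=row["book_name"],
                book_number=int(row["book_number"]),
                chapter_number=int(row["chapter_number"]),
                verse_number=int(row["verse_number"]),
                verse_text=row["verse_text"],
            )
        )
    return verses


verses = read_verses(io.StringIO(SAMPLE_CSV))
for verse in verses:
    print(verse.reference, "|", verse.verse_text)
```

To read the downloaded file instead of the sample, pass a handle opened with `open(path, newline="", encoding="utf-8")`; the encoding is an assumption and should be checked against the actual file.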

Distribution

The dataset is typically provided in CSV format. It contains 30,833 unique verse values. Approximately 74% of the verses belong to the Old Testament and the remaining 26% to the New Testament. The book of Psalms accounts for about 8% of the verses and Genesis for about 5%; the other books together make up the remaining 87%. The listing also reports binned counts that it describes as verse text lengths in characters: 4,893 verses in the 1.00-4.25 range, 4,446 in the 17.25-20.50 range, and 3,779 in the 40.00-43.25 range. These ranges are too low to be character counts (the shortest verse, "Jesus wept.", already has 11 characters), so check which column they describe before relying on them.
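
Figures like these can be recomputed from the file itself. The sketch below shows one possible way to derive percentage shares per testament or book, plus a character-length histogram, from rows already loaded as dictionaries. The helper names `share_by` and `length_histogram`, the bin width, and the four sample rows are assumptions made for illustration.

```python
from collections import Counter


def share_by(rows, column):
    """Return the percentage of rows for each distinct value in a column."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return {value: round(100 * n / total, 1) for value, n in counts.items()}


def length_histogram(texts, bin_width):
    """Count texts per character-length bin, keyed by the lower bin edge."""
    bins = Counter((len(text) // bin_width) * bin_width for text in texts)
    return dict(sorted(bins.items()))


# Illustrative rows only; the real dataset has one row per verse.
rows = [
    {"testament_abbr": "OT", "book_name": "Genesis",
     "verse_text": "In the beginning God created the heaven and the earth."},
    {"testament_abbr": "OT", "book_name": "Psalms",
     "verse_text": "The LORD is my shepherd; I shall not want."},
    {"testament_abbr": "OT", "book_name": "Psalms",
     "verse_text": "Praise ye the LORD."},
    {"testament_abbr": "NT", "book_name": "John",
     "verse_text": "Jesus wept."},
]

print(share_by(rows, "testament_abbr"))
print(share_by(rows, "book_name"))
print(length_histogram([row["verse_text"] for row in rows], bin_width=20))
```

Run against the full file, `share_by(rows, "testament_abbr")` gives a direct check on the Old Testament and New Testament split quoted in the listing.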

Usage

This dataset suits a range of applications, especially those involving Natural Language Processing (NLP). Potential uses include identifying instances of Hebrew literary techniques such as parallelism, detecting chiastic structures that span chapters, and exploring the "riddles" mentioned in the book of Proverbs. It can also be used for linguistic analysis, text mining, and training or fine-tuning language models.
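
As a rough starting point for the parallelism use case, the sketch below splits a verse into clauses at semicolons and colons and measures how evenly the first two clauses are balanced by word count. This is a crude heuristic chosen for illustration, not an established method for detecting Hebrew parallelism: it only flags candidates for closer reading. The function names `split_cola` and `balance` are hypothetical.

```python
import re


def split_cola(verse_text):
    """Split a verse into clauses (cola) at semicolons and colons."""
    parts = re.split(r"[;:]", verse_text)
    return [part.strip() for part in parts if part.strip()]


def balance(verse_text):
    """Ratio of word counts of the first two clauses, between 0 and 1.

    Returns None when the verse has fewer than two clauses. Values near 1
    mean the two halves are of similar length, which is a weak hint of a
    parallel structure and nothing more.
    """
    cola = split_cola(verse_text)
    if len(cola) < 2:
        return None
    first, second = len(cola[0].split()), len(cola[1].split())
    return min(first, second) / max(first, second)


PSALM_19_1 = ("The heavens declare the glory of God; "
              "and the firmament sheweth his handywork.")

print(split_cola(PSALM_19_1))
print(balance(PSALM_19_1))
print(balance("Jesus wept."))
```

A fuller approach would compare the meaning of the two halves rather than their length, since synonymous parallelism typically restates an idea in different words.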

Coverage

The dataset has global relevance, providing a foundational text for users worldwide. The content spans the historical periods covered by the Old Testament (focusing on the Israelite people) and the New Testament (covering the life of Jesus Christ and the early Christian church). The translation itself was commissioned in 1604 and first published in 1611.

License

CC0

Who Can Use It

This dataset is suitable for:

Researchers and academics: For studies in theology, linguistics, literary analysis, and digital humanities.
Developers and data scientists: For building NLP models, text generation, and historical text analysis tools.
Educators: For teaching about biblical texts, history, and language.
Individuals interested in religious texts: For personal study or exploration of the King James Bible.

Dataset Name Suggestions

King James Bible Text Dataset
KJV Verses Collection
Biblical Text (King James Version)
Sacred Scripture Dataset

Attributes

Original Data Source: The King James Bible

Listing Stats

Listed: 21/06/2025
Region: Global
Quality: 5 / 5
Version: 1.0