King James Bible Text Dataset
Free
About
This dataset provides the full text of the King James Bible, a sacred book for Christians with a rich and varied history. The Old Testament, originally written in Hebrew, recounts the story of the Israelite people and includes religious law, poetry, and prophecy. The New Testament, originally written in Greek, details the life of Jesus Christ and the early development of the Christian church. Commissioned in 1604 by King James I of England for the Church of England and first published in 1611, this translation became one of the most widely read English versions of the Bible. It is an excellent resource for Natural Language Processing (NLP), offering opportunities to explore distinctive linguistic features such as Hebrew parallelism and chiasmus, or to examine the "riddles" attributed to King Solomon in the Book of Proverbs.
Columns
- version_name: The name of the Bible version.
- version_abbr: The abbreviation for the Bible version.
- testament_abbr: The abbreviation for the testament: OT (Old Testament) or NT (New Testament).
- testament_name: The full name of the testament: Old Testament or New Testament.
- book_name: The name of the book within the Bible.
- book_number: The position of the book in canonical order (1 = Genesis, 66 = Revelation).
- chapter_number: The chapter number within a book.
- verse_number: The verse number within a chapter.
- verse_text: The actual text of the verse.
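The column list above maps naturally onto a typed record. The following is a minimal sketch using only the Python standard library; it assumes the CSV has a header row with exactly these column names, and the helpers `read_verses`, `reference`, and `canonical_key` (and the file name in the usage comment) are hypothetical, not part of the dataset.

```python
import csv
from dataclasses import dataclass, fields
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Verse:
    """One dataset row; field names mirror the CSV columns listed above."""
    version_name: str
    version_abbr: str
    testament_abbr: str  # "OT" or "NT"
    testament_name: str
    book_name: str
    book_number: int
    chapter_number: int
    verse_number: int
    verse_text: str


# Columns stored as text in the CSV that should be compared numerically.
INT_COLUMNS = ("book_number", "chapter_number", "verse_number")


def read_verses(lines: Iterable[str]) -> List[Verse]:
    """Parse CSV text (header row first) into Verse records.

    Accepts any iterable of lines, such as an open file object.
    Assumes the header uses exactly the column names listed above.
    """
    verses = []
    for row in csv.DictReader(lines):
        values = {f.name: row[f.name] for f in fields(Verse)}
        for name in INT_COLUMNS:
            values[name] = int(values[name])
        verses.append(Verse(**values))
    return verses


def reference(v: Verse) -> str:
    """Human-readable citation such as 'John 11:35'."""
    return f"{v.book_name} {v.chapter_number}:{v.verse_number}"


def canonical_key(v: Verse) -> Tuple[int, int, int]:
    """Sort key that restores canonical order regardless of row order."""
    return (v.book_number, v.chapter_number, v.verse_number)


# Usage (the file name "kjv.csv" is hypothetical):
# with open("kjv.csv", newline="", encoding="utf-8") as f:
#     verses = sorted(read_verses(f), key=canonical_key)
```

Converting the three numeric columns to `int` matters for ordering: compared as strings, chapter "10" would sort before chapter "2".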
Distribution
The dataset is typically provided in CSV format. It contains 30,833 unique values in the verse_text column. Approximately 74% of the verses belong to the Old Testament and the remaining 26% to the New Testament. The Book of Psalms accounts for about 8% of the verses and Genesis for about 5%; all other books together make up the remaining 87%. The distribution of verses across book_number (equal-width bins of 3.25 over the range 1 to 66) is uneven: for example, the 1.00-4.25 bin (Genesis to Numbers) holds 4,893 verses, the 17.25-20.50 bin (Job, Psalms, Proverbs) holds 4,446, and the 40.00-43.25 bin (Matthew to John) holds 3,779.
Usage
This dataset suits a range of applications, especially in Natural Language Processing (NLP). Potential uses include identifying instances of Hebrew literary techniques such as parallelism, detecting chiastic structures that span chapters, and exploring the "riddles" mentioned in the Book of Proverbs. It can also be used for linguistic analysis, text mining, and as training or evaluation data for language models.
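As a starting point for the parallelism use case, the sketch below flags candidate two-line couplets. It is a deliberately crude, hypothetical heuristic, not an established parallelism detector: it assumes that KJV colons and semicolons roughly mark half-line boundaries and that parallel halves have similar word counts. Genuine synonymous or antithetic parallelism requires comparing meaning, so its output is best treated as a list of candidates for manual review.

```python
import re
from typing import Optional, Tuple

# Assumption: a colon or semicolon marks the boundary between half-lines.
_HALF_LINE_SPLIT = re.compile(r"[;:]\s*")


def couplet_candidate(
    verse_text: str, min_ratio: float = 0.6
) -> Optional[Tuple[str, str]]:
    """Return the two half-lines if the verse looks like a balanced couplet.

    A verse is flagged only when it splits into exactly two clauses at ';' or ':'
    and the shorter clause has at least `min_ratio` times as many words as the
    longer one. Otherwise returns None.
    """
    parts = [p.strip() for p in _HALF_LINE_SPLIT.split(verse_text.strip()) if p.strip()]
    if len(parts) != 2:
        return None
    a, b = (len(p.split()) for p in parts)
    if min(a, b) / max(a, b) < min_ratio:
        return None
    return parts[0], parts[1]
```

For Psalm 19:1 ("The heavens declare the glory of God; and the firmament sheweth his handywork.") the halves have 7 and 6 words, so the verse is flagged; Genesis 1:3 ("And God said, Let there be light: and there was light.") splits 7 to 4 and falls below the default ratio. Lowering `min_ratio` trades precision for recall.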
Coverage
The dataset has global relevance, providing a foundational text for users worldwide. The content spans the historical periods covered by the Old Testament (focusing on the Israelite people) and the New Testament (covering the life of Jesus Christ and the early Christian church). The translation itself was commissioned in 1604 and first published in 1611.
License
CC0
Who Can Use It
This dataset is suitable for:
- Researchers and academics: For studies in theology, linguistics, literary analysis, and digital humanities.
- Developers and data scientists: For building NLP models, text generation, and historical text analysis tools.
- Educators: For teaching about biblical texts, history, and language.
- Individuals interested in religious texts: For personal study or exploration of the King James Bible.
Dataset Name Suggestions
- King James Bible Text Dataset
- KJV Verses Collection
- Biblical Text (King James Version)
- Sacred Scripture Dataset
Attributes
Original Data Source: The King James Bible