First World War Primary Sources

Data Science and Analytics

Tags and Keywords

WWI

Letters

History

Military

Text

First World War Primary Sources Dataset on Opendatabay data marketplace

Free

About

This dataset documents letters and extracts exchanged during the First World War. The material offers invaluable first-hand insight into the daily lives and emotional realities of soldiers, and occasionally their families, throughout the conflict. Researchers should note that censorship and self-censorship influenced the content. Each record carries detailed metadata: the author, the date and place of writing, and the language used. The data helps illuminate the hardships faced by individuals forced into service during WWI.

Columns

  • letter_key: The identification key used to locate the full text of the letter within an associated JSON file. This column contains 60 total values.
  • author: The name of the individual who authored the correspondence. Note that 17% of values are missing. R.C.S. Frost is the most frequently occurring author, making up 8% of the records.
  • year: The year the letter was written, ranging from 1914 to 1918. Approximately 8% of entries are missing this value.
  • month: The specific month the correspondence was dated. Roughly 12% of entries are missing.
  • day: The day of the month the correspondence was dated, ranging from 1 to 31. Approximately 13% of entries are missing.
  • place: The location where the letter was composed, which could be a locality, region, or country. France is the most frequently recorded location (38%), and 13% of values are missing.
  • language: The language employed in the correspondence, which is either English (67%) or French (33%).
  • source: The origin, mainly websites, that supplied the digital transcription of the letter. The National Archives website is the most common source (67%).
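
As a rough illustration of working with the columns above, the following Python sketch reads an index in this layout and reports the share of missing values per column. The sample rows are invented placeholders, not records from the dataset, and the value encodings (numeric months, lowercase language names, empty cells for missing values) are assumptions that should be checked against the real index.csv.

```python
import csv
import io

# Hypothetical sample rows in the documented 8-column layout.
# The values are placeholders, not taken from the dataset.
SAMPLE_INDEX = """letter_key,author,year,month,day,place,language,source
k1,A. Writer,1915,3,12,France,english,example.org
k2,,1916,,,France,french,example.org
k3,B. Writer,,,,,english,example.org
"""

COLUMNS = ["letter_key", "author", "year", "month", "day",
           "place", "language", "source"]


def load_index(handle):
    """Read index rows as dicts, turning empty cells into None."""
    reader = csv.DictReader(handle)
    return [{col: (row.get(col) or None) for col in COLUMNS} for row in reader]


def missing_share(rows, column):
    """Fraction of rows in which `column` has no value."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r[column] is None) / len(rows)


rows = load_index(io.StringIO(SAMPLE_INDEX))
for col in COLUMNS:
    print(f"{col}: {missing_share(rows, col):.0%} missing")
```

To run this against the real file, pass `open("index.csv", newline="", encoding="utf-8")` to `load_index` in place of the in-memory sample; the UTF-8 encoding is also an assumption.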

Distribution

The data is distributed in a structured file format, typically CSV. The primary index file, index.csv, occupies 5.3 kB and has 8 columns and 60 records. A sample file will be uploaded to the platform separately.

Usage

This collection is suitable for several research applications:
  • Historical Analysis: Examining the personal impact of WWI on the morale and daily life of soldiers.
  • Linguistic Studies: Applying Natural Language Processing (NLP) techniques for historical text analysis, focusing on sentiment or linguistic evolution.
  • Primary Source Research: Utilising first-hand accounts to understand wartime communication and emotional expression.
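
A first step for the text-analysis uses listed above might look like the Python sketch below, which joins index rows to letter texts by letter_key and counts word frequencies for one language. It assumes the associated JSON file is a flat object mapping each letter_key to the letter text; that layout, the lowercase language values, and the sample sentences are all hypothetical.

```python
import json
from collections import Counter

# Hypothetical stand-ins for the index rows and the associated JSON file.
# The letter texts are invented placeholders, not quotations from the dataset.
index_rows = [
    {"letter_key": "k1", "language": "english"},
    {"letter_key": "k2", "language": "french"},
]
letters_json = (
    '{"k1": "Dear Mother, the mud is everywhere.",'
    ' "k2": "Chère maman, il pleut encore."}'
)


def texts_by_language(rows, texts, language):
    """Return the texts of all indexed letters recorded in `language`."""
    return [
        texts[r["letter_key"]]
        for r in rows
        if r["language"] == language and r["letter_key"] in texts
    ]


def word_counts(documents):
    """Count lowercase tokens after stripping surrounding punctuation."""
    counts = Counter()
    for doc in documents:
        for token in doc.lower().split():
            token = token.strip(".,;:!?\"'()")
            if token:
                counts[token] += 1
    return counts


texts = json.loads(letters_json)
english = texts_by_language(index_rows, texts, "english")
print(word_counts(english).most_common(3))
```

Whitespace tokenisation is used only to keep the sketch dependency-free; a real study of sentiment or linguistic change would likely need language-specific tokenisation for the English and French letters.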

Coverage

The data spans the duration of the conflict, with recorded writing dates ranging from 1914 through 1918. The authors are primarily First World War soldiers, with occasional letters from their family members. The correspondence is in two languages: English and French. The place of writing varies, with France the most frequently recorded location.

License

Attribution 3.0 Unported (CC BY 3.0)

Who Can Use It

  • Historians and Academics: For detailed research using primary source materials related to the First World War centenary and beyond.
  • Data Scientists and NLP Practitioners: Professionals seeking structured, digitised historical text data for training models and conducting text analysis.
  • Social Scientists: Researchers interested in the social impact and personal communications of the era.

Dataset Name Suggestions

  • World War I Letters
  • First World War Soldier Correspondence
  • WWI Wartime Communications
  • Historical Soldier Letters 1914-1918

Attributes

Original Data Source: First World War Primary Sources

Listing Stats

VIEWS

0

DOWNLOADS

0

LISTED

08/11/2025

REGION

GLOBAL

UNIVERSAL DATA QUALITY SCORE (UDQS)

5 / 5

VERSION

1.0
