Opendatabay APP

European Healthcare Machine Translation Registry

Patient Health Records & Digital Health

Tags and Keywords

  • Medical
  • Translation
  • Portuguese
  • Healthcare
  • Corpus


Free

About

The ELRC-Medical-V2 collection is a parallel corpus tailored to the healthcare sector and intended as a specialised resource for neural machine translation. The project is funded by the European Commission and coordinated by the German Research Center for Artificial Intelligence, and it supports the development of high-quality translation systems for medical and administrative contexts. Its 13,148 aligned English and Portuguese records support work on medical linguistics and digital health communication across the European Union.

Columns

  • id: A unique numeric identifier assigned to each record within the collection.
  • lang: The language pair code for the record, specified as "en-pt" for English and Portuguese.
  • source_text: The original textual content in the source language, containing healthcare-related phrases or administrative documentation.
  • target_text: The translated version of the source text in the target language, providing the linguistic equivalent for the parallel corpus.
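As a minimal sketch of how the four columns above might be read, the following uses Python's standard csv module. It assumes the file's header row uses exactly the column names listed here and that the file is UTF-8 encoded; neither is confirmed by the listing. The function name read_pairs and the sample rows are hypothetical and only illustrate the layout.

```python
import csv
import io

# Column names as given in the listing; assumed to match the CSV header exactly.
EXPECTED_COLUMNS = ["id", "lang", "source_text", "target_text"]


def read_pairs(handle):
    """Yield one dict per record after checking the header against the listed columns."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    for row in reader:
        yield row


# Made-up placeholder rows for illustration only; not taken from the dataset.
SAMPLE = io.StringIO(
    "id,lang,source_text,target_text\n"
    "1,en-pt,Wash your hands.,Lave as mãos.\n"
    "2,en-pt,Stay at home.,Fique em casa.\n"
)

pairs = list(read_pairs(SAMPLE))
print(len(pairs), pairs[0]["target_text"])
```

For the real file, the same function could be called on open("csv_en-pt.csv", newline="", encoding="utf-8"), with the encoding adjusted if the file turns out to use a different one.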

Distribution

The data is delivered as a single CSV file named csv_en-pt.csv, with a total file size of 4.59 MB. It contains 13,148 records, and the listing reports 100% validity across all four columns, meaning there are no missing or mismatched entries. The listing also gives the resource a usability score of 10.00.
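The validity figure can be spot-checked after download. The sketch below takes one plausible reading of "validity": a value counts as valid when it is non-empty, and a record counts as mismatched when its lang code is not "en-pt". The listing does not define the metric, so this interpretation, the function name validity_report, and the sample rows are all assumptions.

```python
from collections import Counter

COLUMNS = ("id", "lang", "source_text", "target_text")


def validity_report(rows, expected_lang="en-pt"):
    """Return (row count, share of non-empty values per column, rows with another lang code)."""
    total = 0
    filled = Counter()
    mismatched = 0
    for row in rows:
        total += 1
        for col in COLUMNS:
            # Treat None, empty strings, and whitespace-only strings as missing.
            if (row.get(col) or "").strip():
                filled[col] += 1
        if row.get("lang") != expected_lang:
            mismatched += 1
    share = {col: (filled[col] / total if total else 0.0) for col in COLUMNS}
    return total, share, mismatched


# Made-up placeholder rows; the second one has an empty target on purpose.
rows = [
    {"id": "1", "lang": "en-pt", "source_text": "Wash your hands.", "target_text": "Lave as mãos."},
    {"id": "2", "lang": "en-pt", "source_text": "Stay at home.", "target_text": ""},
]
total, share, mismatched = validity_report(rows)
print(total, share, mismatched)
```

If the listing's figures hold under this reading, running the report on the full file should give a total of 13,148, a share of 1.0 for every column, and no mismatched records.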

Usage

This resource is ideal for training and fine-tuning neural machine translation (NMT) models within the medical domain. It is well-suited for linguistic researchers conducting cross-lingual analysis of healthcare terminology. Additionally, the corpus can be used as a benchmarking tool for evaluating the accuracy of translation software in specific technical and governmental contexts.
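For fine-tuning or benchmarking, some records usually need to be held back for evaluation. The following is a minimal sketch of a reproducible split, assuming the records have already been loaded as (source, target) pairs. The function name split_pairs, the 10% held-out fraction, and the seed are illustrative choices, not recommendations from the dataset's authors.

```python
import random


def split_pairs(pairs, held_out_fraction=0.1, seed=13):
    """Shuffle a copy of the pairs with a fixed seed and split off a held-out set."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_held_out = int(len(shuffled) * held_out_fraction)
    # The first n_held_out shuffled items are kept for evaluation, the rest for training.
    return shuffled[n_held_out:], shuffled[:n_held_out]


# Placeholder pairs standing in for (source_text, target_text) records.
pairs = [(f"source {i}", f"target {i}") for i in range(20)]
train, held_out = split_pairs(pairs)
print(len(train), len(held_out))
```

Fixing the seed keeps the split identical between runs, so scores from different models stay comparable. If the corpus contains repeated sentences, deduplicating before the split avoids the same pair appearing on both sides.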

Coverage

The scope is focused on the European healthcare and medical sectors, specifically addressing the English and Portuguese language pair. The text reflects the linguistic standards and administrative terminology used by European Union bodies and medical institutions. It is a static archive with no further updates expected, capturing a specific state of the European healthcare translation landscape.

License

Attribution 4.0 International (CC BY 4.0)

Who Can Use It

Computational linguists and AI researchers can leverage these records to improve the performance of language models in technical domains. Healthcare software developers may utilise the data to build translation features for medical platforms. Furthermore, professional translators and terminologists can use the corpus as a reference for consistent translation of medical and administrative phrases.

Dataset Name Suggestions

  • ELRC-Medical-V2: Portuguese-English Parallel Corpus
  • European Healthcare Machine Translation Registry
  • Portuguese-English Medical Linguistic Archive
  • Healthcare Domain Translation Database (V2)
  • EU Funded Portuguese-English Medical Text Collection

Attributes

Listing Stats

  • Views: 1
  • Downloads: 1
  • Listed: 26/12/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
