European Healthcare Machine Translation Registry
Patient Health Records & Digital Health
Free
About
The ELRC-Medical-V2 collection is a parallel corpus for neural machine translation, tailored specifically to the healthcare sector. Funded by the European Commission and coordinated by the German Research Center for Artificial Intelligence, the project supports the development of high-quality translation systems for medical and administrative contexts. Its large set of aligned English and Portuguese phrases supports work on medical linguistics and digital health communication across the European Union.
Columns
- id: A unique numeric identifier assigned to each record within the collection.
- lang: The language pair code for the record, specified as "en-pt" for English and Portuguese.
- source_text: The original textual content in the source language, containing healthcare-related phrases or administrative documentation.
- target_text: The translated version of the source text in the target language, providing the linguistic equivalent for the parallel corpus.
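The four columns above can be read and checked with the Python standard library. The following is a minimal sketch, not an official loader. It assumes the CSV has a header row using exactly these column names, that id parses as an integer, and that the file is UTF-8 encoded. The function name load_pairs and the sample rows are illustrative and are not taken from the corpus.

```python
import csv
import io

EXPECTED_COLUMNS = ["id", "lang", "source_text", "target_text"]


def load_pairs(handle):
    """Read parallel records from an open CSV handle and check the schema."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    pairs = []
    for row in reader:
        source = (row["source_text"] or "").strip()
        target = (row["target_text"] or "").strip()
        # Guard only: the listing reports no missing entries, so this
        # branch is not expected to trigger on the real file.
        if not source or not target:
            continue
        pairs.append(
            {
                "id": int(row["id"]),  # assumption: ids are plain integers
                "lang": row["lang"],
                "source_text": source,
                "target_text": target,
            }
        )
    return pairs


# Invented sample rows for illustration; they are NOT from the corpus.
SAMPLE = io.StringIO(
    "id,lang,source_text,target_text\n"
    '1,en-pt,"Wash your hands, then dry them.","Lave as mãos e depois seque-as."\n'
    "2,en-pt,Keep a safe distance.,Mantenha uma distância segura.\n"
)
sample_pairs = load_pairs(SAMPLE)
```

For the real file, the same function would be called on a handle opened with open("csv_en-pt.csv", newline="", encoding="utf-8"). Passing newline="" lets the csv module handle quoted fields that contain line breaks.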
Distribution
The data is delivered as a single tabular CSV file named csv_en-pt.csv, with a total file size of 4.59 MB. The collection contains 13,148 records with 100% validity across all four columns, indicating there are no missing or mismatched entries. This resource has achieved a perfect usability score of 10.00.
Usage
This resource is ideal for training and fine-tuning neural machine translation (NMT) models within the medical domain. It is well-suited for linguistic researchers conducting cross-lingual analysis of healthcare terminology. Additionally, the corpus can be used as a benchmarking tool for evaluating the accuracy of translation software in specific technical and governmental contexts.
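When the corpus is used for both fine-tuning and benchmarking, the evaluation records need to stay out of the training data. The listing does not define an official split, so the sketch below shows one possible approach: a deterministic, hash-based assignment keyed on the id column. The function names, the 5% dev and 5% test shares, and the placeholder ids are all assumptions made for illustration.

```python
import hashlib


def assign_split(record_id, dev_percent=5, test_percent=5):
    """Map a record id to 'train', 'dev', or 'test' deterministically."""
    digest = hashlib.sha256(str(record_id).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    if bucket < test_percent:
        return "test"
    if bucket < test_percent + dev_percent:
        return "dev"
    return "train"


def split_pairs(pairs, dev_percent=5, test_percent=5):
    """Group records (dicts with an 'id' key) into the three splits."""
    splits = {"train": [], "dev": [], "test": []}
    for pair in pairs:
        name = assign_split(pair["id"], dev_percent, test_percent)
        splits[name].append(pair)
    return splits


# Placeholder records with hypothetical ids; real rows would come from the CSV.
demo_pairs = [{"id": i} for i in range(1, 1001)]
demo_splits = split_pairs(demo_pairs)
```

Because the assignment depends only on the id, re-running the split on the same file gives the same partition without storing a random seed. The realised shares are approximate, not exact.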
Coverage
The scope is focused on the European healthcare and medical sectors, specifically addressing the English and Portuguese language pair. The text reflects the linguistic standards and administrative terminology used by European Union bodies and medical institutions. It is a static archive with no further updates expected, capturing a specific state of the European healthcare translation landscape.
License
Attribution 4.0 International (CC BY 4.0)
Who Can Use It
Computational linguists and AI researchers can leverage these records to improve the performance of language models in technical domains. Healthcare software developers may utilise the data to build translation features for medical platforms. Furthermore, professional translators and terminologists can use the corpus as a reference for consistent translation of medical and administrative phrases.
Dataset Name Suggestions
- ELRC-Medical-V2: Portuguese-English Parallel Corpus
- European Healthcare Machine Translation Registry
- Portuguese-English Medical Linguistic Archive
- Healthcare Domain Translation Database (V2)
- EU Funded Portuguese-English Medical Text Collection
Attributes
Original Data Source: European Healthcare Machine Translation Registry
