English-ASL Language Interoperability Dataset
Health Information Systems & Technology
About
This dataset is a synthetic parallel corpus, generated in 2012, that pairs English sentences with American Sign Language (ASL) gloss. By linking written English to a written representation of ASL, it aims to connect the two linguistic communities and bring together forms of communication that are often treated separately. The data supports work on machine translation between English and ASL gloss and can help uncover insights into bridging linguistic divides.
Columns
The dataset consists of two primary columns:
- gloss: The ASL gloss for each entry, i.e. a written transcription of ASL signs that uses English words as labels for the signs. It helps users see how written English corresponds to a sequence of ASL signs.
- text: The English sentence corresponding to the gloss in the same row.
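A minimal way to load the two columns and check their counts uses Python's standard `csv` module. This sketch assumes the file has a header row named exactly `gloss,text` and is UTF-8 encoded; the function names are hypothetical, and the sample rows are invented for illustration, not taken from the dataset.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple


def load_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Read (gloss, text) pairs from CSV lines whose header is 'gloss,text'."""
    reader = csv.DictReader(lines)
    return [(row["gloss"], row["text"]) for row in reader]


def summarize(pairs: List[Tuple[str, str]]) -> Dict[str, int]:
    """Count all rows and the distinct values in each column."""
    return {
        "rows": len(pairs),
        "unique_gloss": len({gloss for gloss, _ in pairs}),
        "unique_text": len({text for _, text in pairs}),
    }


# Invented rows for illustration only; they are not from the dataset.
SAMPLE = (
    "gloss,text\n"
    "HELLO HOW YOU,hello how are you\n"
    "HELLO HOW YOU,hello how are you\n"
    "I GO STORE,i am going to the store\n"
)

if __name__ == "__main__":
    pairs = load_pairs(io.StringIO(SAMPLE))
    print(summarize(pairs))
    # For the real file (the path and UTF-8 encoding are assumptions):
    # with open("train.csv", newline="", encoding="utf-8") as f:
    #     print(summarize(load_pairs(f)))
```

A duplicated row counts twice in `rows` but only once in each distinct-value count, so a column's number of unique values is a lower bound on the number of records, not the record count itself.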
Distribution
The dataset is typically provided as a single CSV file, train.csv, with two columns: gloss and text. The gloss column contains 81,123 unique values and the text column contains 81,016 unique values. Because a value can repeat across rows, these counts are a lower bound: the dataset contains at least 81,123 records.
Usage
This dataset can be used for a variety of applications and use cases, including:
- Building practice scenarios around everyday conversation topics, such as greetings, family activities, or household chores, by selecting English sentences on those topics together with their ASL glosses.
- Helping learners build proficiency, over time, in holding coherent conversations in both spoken English and signed languages such as American Sign Language (ASL).
- Developing generative ASL-English bilingual chatbots.
- Benchmarking different translation models to measure their accuracy.
- Comparing translation techniques to determine which best translates English into ASL gloss.
- Applying predictive models to complex linguistic problems that often underlie cross-cultural communication barriers.
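For the benchmarking and technique-comparison uses listed above, a model is usually scored on held-out pairs. The sketch below is one possible setup under stated assumptions: pairs are already loaded as `(gloss, text)` tuples, the random hold-out split is hypothetical (not an official split of this dataset), and word error rate (WER) is used as the metric, which is only one common choice (BLEU is another). `uppercase_baseline` is a deliberately naive, hypothetical "model" used only to exercise the harness.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]  # (gloss, text)


def split_pairs(pairs: Sequence[Pair], test_fraction: float = 0.1,
                seed: int = 0) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle a copy of the pairs with a fixed seed, then split into train/test."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over tokens (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: ref token missing from hyp
                curr[j - 1] + 1,         # insertion: extra token in hyp
                prev[j - 1] + (r != h),  # substitution, or free match
            )
        prev = curr
    return prev[-1]


def corpus_wer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Total token edits divided by total reference tokens (whitespace tokens)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    edits = 0
    ref_tokens = 0
    for ref, hyp in zip(references, hypotheses):
        r, h = ref.split(), hyp.split()
        edits += word_edit_distance(r, h)
        ref_tokens += len(r)
    return edits / ref_tokens if ref_tokens else 0.0


def uppercase_baseline(text: str) -> str:
    """Hypothetical trivial English-to-gloss 'model': uppercase the English words."""
    return text.upper()


if __name__ == "__main__":
    # Invented pairs for illustration only; they are not from the dataset.
    pairs = [("HELLO HOW YOU", "hello how are you"),
             ("I GO STORE", "i am going to the store")] * 5
    train, test = split_pairs(pairs, test_fraction=0.2)
    refs = [gloss for gloss, _ in test]
    hyps = [uppercase_baseline(text) for _, text in test]
    print(f"train={len(train)} test={len(test)} WER={corpus_wer(refs, hyps):.3f}")
```

WER here is computed at corpus level (total edits over total reference tokens) rather than as an average of per-sentence scores, so longer sentences carry proportionally more weight; fixing the seed keeps the split reproducible when comparing models.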
Coverage
The dataset focuses on the linguistic relationship between English and American Sign Language. No demographic details are provided, and its availability is listed as global. The data was generated in 2012 and reflects that point in time.
License
CC0
Who Can Use It
This dataset is ideal for:
- Researchers interested in linguistics, natural language processing (NLP), and machine translation.
- Individuals seeking to learn and practise American Sign Language, aiming to improve their proficiency in coherent conversations using both spoken and signed communication.
- Developers and data scientists working on AI models, chatbots, or translation systems that involve ASL and English.
- Anyone interested in cross-cultural communication and bridging linguistic divides through language interoperability.
Dataset Name Suggestions
- ASL-English Parallel Gloss Corpus 2012
- American Sign Language Translation Data
- English-ASL Language Interoperability Dataset
- ASL Gloss Representation Corpus
- Bilingual ASL-English Communication Data
Attributes
Original Data Source: AslgPc12 (English-ASL Gloss Parallel Corpus 2012)