English-ASL Language Interoperability Dataset

Health Information Systems & Technology

Tags and Keywords

Health

Social

Text

Linguistics

NLP

English

Translation

Free

About

This dataset is a synthetic parallel corpus of English text and American Sign Language (ASL) glosses, generated in 2012. It pairs written English with ASL gloss representations so that the relationship between the two languages can be studied and modelled, with the aim of connecting linguistic communities that are often treated as separate. The data supports the development of machine translation models between English and ASL and further research into bridging the two languages.

Columns

The dataset consists of two primary columns:
  • gloss: This column contains the ASL gloss for a word or phrase signed in ASL. A gloss writes ASL signs using English labels, which helps users relate written English to ASL signs.
  • text: This column contains the written English translation or interpretation of the corresponding entry in the gloss column.

Distribution

The dataset is provided as a CSV file, referenced as train.csv, with two columns: gloss and text. The gloss column contains 81,123 unique values, while the text column contains 81,016 unique values. The listing does not state a total row count; the unique-value counts imply at least 81,123 records, and approximately that many if few gloss values repeat.
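The row count and the per-column unique counts can be checked directly once the file is downloaded. The following is a minimal sketch using only the Python standard library. The function name `summarise_pairs` is hypothetical, and the sample rows are invented for illustration; they are not taken from the dataset.

```python
import csv
import io


def summarise_pairs(csv_file):
    """Count rows and distinct values in the gloss and text columns."""
    reader = csv.DictReader(csv_file)  # expects a header row: gloss,text
    rows = 0
    glosses, texts = set(), set()
    for record in reader:
        rows += 1
        glosses.add(record["gloss"])
        texts.add(record["text"])
    return {"rows": rows, "unique_gloss": len(glosses), "unique_text": len(texts)}


# Invented sample rows, for illustration only (not real dataset content).
SAMPLE = io.StringIO(
    "gloss,text\n"
    "ME GO STORE,I am going to the store.\n"
    "ME GO STORE,I go to the store.\n"
    "YOU NAME WHAT,What is your name?\n"
)

summary = summarise_pairs(SAMPLE)
print(summary)
```

To run this against the real file, pass an open file handle instead of the sample, for example `open("train.csv", newline="", encoding="utf-8")`. The UTF-8 encoding is an assumption; the listing does not state the file's encoding.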

Usage

This dataset can be used for a variety of applications and use cases, including:
  • Creating scenarios that emulate everyday conversation topics, such as greetings, family activities, or home chores, by pairing individual words with their translations into ASL signs.
  • Helping users build proficiency over time in holding coherent conversations in both spoken languages and signed languages such as American Sign Language (ASL).
  • Developing generative ASL-English bilingual chat bots.
  • Benchmarking different translation models to measure their accuracy.
  • Assessing translation techniques to determine which is the most successful at translating from English to ASL.
  • Using predictive models to explore the linguistic problems that underlie cross-cultural communication barriers.
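For the benchmarking use case, one simple way to score an English-to-gloss model is a token-level word error rate (WER): the number of token insertions, deletions, and substitutions needed to turn the model output into the reference gloss, divided by the number of reference tokens. The sketch below is one possible metric, not a method prescribed by the dataset. It assumes that gloss tokens are separated by whitespace, and the function names and example strings are hypothetical.

```python
def token_edit_distance(reference, hypothesis):
    """Levenshtein distance between two lists of tokens."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_token in enumerate(reference, start=1):
        current = [i]
        for j, hyp_token in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_token != hyp_token)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def word_error_rate(references, hypotheses):
    """Total token edits divided by total reference tokens (lower is better)."""
    edits = 0
    reference_tokens = 0
    for ref, hyp in zip(references, hypotheses):
        ref_list, hyp_list = ref.split(), hyp.split()  # assumed whitespace tokens
        edits += token_edit_distance(ref_list, hyp_list)
        reference_tokens += len(ref_list)
    return edits / reference_tokens if reference_tokens else 0.0


# Invented reference glosses and model outputs, for illustration only.
references = ["ME GO STORE", "YOU NAME WHAT"]
hypotheses = ["ME GO STORE", "YOU NAME"]

score = word_error_rate(references, hypotheses)
print(f"WER: {score:.3f}")
```

Because the unique-value counts for gloss and text differ, some entries may repeat across rows. When holding out an evaluation set, it may be worth checking that the same pair does not appear in both the training and evaluation portions.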

Coverage

The dataset focuses on the linguistic relationship between English and American Sign Language. No demographic details are provided, and its availability is listed as global. The data was generated in 2012 and reflects that point in time.

License

CC0

Who Can Use It

This dataset is ideal for:
  • Researchers interested in linguistics, natural language processing (NLP), and machine translation.
  • Individuals seeking to learn and practise American Sign Language, aiming to improve their proficiency in coherent conversations using both spoken and signed communication.
  • Developers and data scientists working on AI models, chat bots, or translation systems that involve ASL and English.
  • Anyone interested in cross-cultural communication and bridging linguistic divides through language interoperability.

Dataset Name Suggestions

  • ASL-English Parallel Gloss Corpus 2012
  • American Sign Language Translation Data
  • English-ASL Language Interoperability Dataset
  • ASL Gloss Representation Corpus
  • Bilingual ASL-English Communication Data

Attributes

Listing Stats

VIEWS

1

DOWNLOADS

2

LISTED

17/06/2025

REGION

GLOBAL

UNIVERSAL DATA QUALITY SCORE (UDQS)

5 / 5

VERSION

1.0
