Bd Indigenous Languages Dataset

Tags and Keywords

Tabular

Classification

NLP

LSTM

SVM

Bd Indigenous Languages Dataset on the Opendatabay data marketplace

Free

About

This dataset contains text entries in various ethnic languages of Bangladesh. It is valuable for tasks related to multilingual Natural Language Processing (NLP), language classification, and the preservation of underrepresented languages. The dataset includes 4,713 entries.

Columns

Converted Text: This column holds short text samples in various ethnic languages.
Language: This column provides the corresponding language label for each text sample. It includes six distinct languages: Chakma, Marma, Tripura, Santali, Garo, and Rakhine.
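A minimal sketch of reading the two columns with Python's standard `csv` module. It assumes the CSV header uses exactly the names `Converted Text` and `Language` and that the file is UTF-8 encoded; neither is confirmed by the listing. The demo rows are placeholders, not real samples from the dataset.

```python
import csv
import io

# The six language labels named in the column description.
EXPECTED_LANGUAGES = {"Chakma", "Marma", "Tripura", "Santali", "Garo", "Rakhine"}

def load_samples(handle):
    """Read (text, language) pairs from an open CSV file handle."""
    reader = csv.DictReader(handle)
    samples = []
    for row in reader:
        # Header names are an assumption based on the column list above.
        text = (row.get("Converted Text") or "").strip()
        language = (row.get("Language") or "").strip()
        if not text or language not in EXPECTED_LANGUAGES:
            continue  # skip blank texts and unexpected labels
        samples.append((text, language))
    return samples

# Placeholder rows only; these are NOT real entries from the dataset.
demo_csv = io.StringIO(
    "Converted Text,Language\n"
    "placeholder text one,Chakma\n"
    "placeholder text two,Marma\n"
    ",Garo\n"
)
demo_samples = load_samples(demo_csv)
print(len(demo_samples))
```

With the downloaded file, the same function could be called as `load_samples(open(path, newline="", encoding="utf-8"))`, where `path` is wherever the CSV was saved locally.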

Distribution

The dataset consists of 4,713 rows and two columns. By language, Chakma accounts for about 22% of entries and Marma for about 20%; the remaining four languages (Tripura, Santali, Garo, and Rakhine) together make up about 57%. These rounded shares sum to 99%. The data is available in CSV format.
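As a rough check on these figures, the sketch below converts the reported percentages into approximate row counts and shows how per-language shares could be recomputed from the `Language` column. The helper name `language_shares` and the grouping key `"Other"` are illustrative, and the row counts are estimates derived from rounded percentages, not values stated by the listing.

```python
from collections import Counter

def language_shares(labels):
    """Return each label's share of the total, as a rounded percentage."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {lang: round(100 * n / total) for lang, n in counts.items()}

TOTAL_ROWS = 4713
# Percentages as reported in the listing; "Other" groups the remaining languages.
reported = {"Chakma": 22, "Marma": 20, "Other": 57}

# Approximate row counts implied by the rounded percentages.
approx_rows = {name: round(TOTAL_ROWS * pct / 100) for name, pct in reported.items()}
print(approx_rows)

# Rows not covered by the rounded shares (they sum to 99%, not 100%).
unaccounted = TOTAL_ROWS - sum(approx_rows.values())
print(unaccounted)

# Toy label list, only to show the helper; not real dataset labels.
toy_shares = language_shares(["Chakma", "Chakma", "Marma", "Garo"])
print(toy_shares)
```

Running `language_shares` on the real `Language` column would give the exact per-language breakdown, including the four languages the listing groups together.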

Usage

Ideal applications and use cases for this dataset include:

Developing and testing language classification models.
Research in multilingual NLP, particularly for less-resourced languages.
Projects focused on the preservation and analysis of indigenous languages.
Training machine learning algorithms for text analysis in specific Bangladeshi ethnic languages.
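For the language classification use case, a simple baseline can be sketched with the standard library alone. The example below is a character-trigram naive Bayes classifier with add-one smoothing. It is a swapped-in illustration, not the SVM or LSTM approaches suggested by the listing's tags and not a method attributed to the dataset's authors. The class name `NgramNaiveBayes` is hypothetical, and the training strings are synthetic placeholders, not real text in any of these languages.

```python
import math
from collections import Counter, defaultdict

def char_ngrams(text, n=3):
    """Split text into overlapping character n-grams, padded with spaces."""
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

class NgramNaiveBayes:
    """Multinomial naive Bayes over character n-grams (illustrative baseline)."""

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = defaultdict(Counter)  # label -> n-gram counts
        self.doc_counts = Counter()               # label -> number of samples
        self.vocab = set()

    def fit(self, samples):
        for text, label in samples:
            grams = char_ngrams(text, self.n)
            self.ngram_counts[label].update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        grams = char_ngrams(text, self.n)
        best_label, best_score = None, -math.inf
        for label, counts in self.ngram_counts.items():
            # Log prior plus add-one smoothed log likelihood of each n-gram.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + vocab_size
            for gram in grams:
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Synthetic placeholder strings; NOT real Chakma or Marma text.
toy_training = [("aaa aaa aab", "Chakma"), ("zzz zzy zzz", "Marma")]
model = NgramNaiveBayes().fit(toy_training)
print(model.predict("aaa"))
print(model.predict("zzz"))
```

Character n-grams avoid any need for word tokenization, which may be convenient for languages without established tokenizers. Any real evaluation would need a held-out split of the 4,713 entries.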

Coverage

This dataset covers text samples from several ethnic languages spoken in Bangladesh, specifically Chakma, Marma, Tripura, Santali, Garo, and Rakhine. The focus is on the linguistic diversity within Bangladesh. No specific time range for data collection is provided.

License

CC0

Who Can Use It

This dataset is suitable for:

NLP Researchers: To build and evaluate language models for underrepresented languages.
Linguists: For academic study and documentation of Bangladeshi ethnic languages.
Machine Learning Engineers: To train and deploy language classification systems.
Academics: For educational and research purposes related to language diversity and digital humanities.
Developers: To integrate language awareness into applications targeting diverse linguistic groups.

Dataset Name Suggestions

Bangladeshi Ethnic Languages Text Collection
Bd Indigenous Languages Dataset
Bangladesh Minority Languages Text Corpus
Chakma Marma Tripura Santali Garo Rakhine Text Dataset

Attributes

Original Data Source: Bd Ethnic Languages Classification

Listing Stats

Listed: 16/06/2025
Region: Asia
Quality: 5 / 5
Version: 1.0
