Bulgarian PoS and Lemma Dataset

Education & Learning Analytics

Tags and Keywords

NLP

Languages

Bulgarian


Listed on the Opendatabay data marketplace.


Free

About

This dataset addresses the limited availability of Bulgarian-language data, particularly on platforms such as Kaggle. It contains scraped Bulgarian words in their various forms, each paired with its lemma, grammatical form, and part of speech, and is intended as a foundational resource for natural language processing. Its primary purpose is to support part-of-speech tagging, lemmatisation, and exploratory data analysis for Bulgarian.

Columns

word: The Bulgarian word form itself, as scraped.
lemma: The lemma, i.e. the base (dictionary) form of the word.
form: The specific grammatical form of the word as it appears in the 'word' column.
pos: The part of speech assigned to the word.
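A minimal sketch of loading the file with Python's standard `csv` module, assuming bg-pos.csv has a header row named exactly `word`, `lemma`, `form`, `pos` (the names above; the actual header spelling and column order are not confirmed here). The helper name `load_rows` is hypothetical, and the sample rows, including the `singular`/`plural` and `noun` labels, are invented only to illustrate the shape of the data.

```python
import csv
import io

# Column names as listed in the dataset description; the exact header
# spelling in bg-pos.csv is an assumption.
EXPECTED_COLUMNS = ("word", "lemma", "form", "pos")


def load_rows(handle):
    """Read bg-pos.csv-style text into a list of dicts, one per word form.

    Raises ValueError if any expected column is missing from the header.
    """
    reader = csv.DictReader(handle)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # Keep only the four documented columns, in a fixed order.
    return [{col: row[col] for col in EXPECTED_COLUMNS} for row in reader]


# Hypothetical two-row sample; the real "form" and "pos" label sets may differ.
SAMPLE = "word,lemma,form,pos\nкотка,котка,singular,noun\nкотки,котка,plural,noun\n"

rows = load_rows(io.StringIO(SAMPLE))
print(rows[1]["lemma"])  # котка
```

To read the actual file, pass `open("bg-pos.csv", encoding="utf-8", newline="")` to `load_rows`; the UTF-8 encoding is an assumption, while `newline=""` is what the `csv` module documentation recommends for file handles.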

Distribution

The dataset is provided as a single CSV file, bg-pos.csv (downloadable as a ZIP archive), containing a large collection of Bulgarian word forms. The exact number of rows is not published.
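Because row counts are not published, they can be computed after loading. The hypothetical `summarize` helper below is a sketch that counts rows, distinct words, distinct lemmas, and rows per part-of-speech label, assuming each row is a dict keyed by the column names listed above. The sample rows are illustrative only and are not taken from the dataset.

```python
from collections import Counter


def summarize(rows):
    """Return basic size statistics for a list of row dicts.

    Each row is assumed to have "word", "lemma" and "pos" keys,
    matching the documented columns.
    """
    return {
        "rows": len(rows),
        "distinct_words": len({r["word"] for r in rows}),
        "distinct_lemmas": len({r["lemma"] for r in rows}),
        # Number of rows carrying each part-of-speech label.
        "pos_counts": Counter(r["pos"] for r in rows),
    }


# Hypothetical rows for illustration only.
rows = [
    {"word": "котка", "lemma": "котка", "pos": "noun"},
    {"word": "котки", "lemma": "котка", "pos": "noun"},
    {"word": "чета", "lemma": "чета", "pos": "verb"},
]
stats = summarize(rows)
print(stats["rows"], stats["distinct_lemmas"])  # 3 2
```

Comparing `distinct_words` with `rows` is a quick way to see how many word forms appear with more than one analysis.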

Usage

This dataset is suited to:

Exploratory data analysis (EDA) of Bulgarian language structures.
Developing and training part-of-speech tagging models for Bulgarian text.
Implementing and improving lemmatisation algorithms for the Bulgarian language.
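A dictionary-lookup baseline is a natural first step for both lemmatisation and part-of-speech tagging with a word list like this one. The sketch below is not a trained tagger: the hypothetical `build_lookup` and `analyse` helpers map each lower-cased word form to every (lemma, pos) pair recorded for it, so homographs keep all their candidates. Choosing among candidates in running text would need a context-aware model, since the documented columns contain no sentence context. The rows are invented for illustration; "чета" is used because it can be a verb ("read") or a noun ("band").

```python
from collections import defaultdict


def build_lookup(rows):
    """Map each lower-cased word form to its candidate (lemma, pos) analyses.

    A form listed with several analyses (a homograph) keeps all of them,
    in the order they first appear; exact duplicates are dropped.
    """
    lookup = defaultdict(list)
    for r in rows:
        analysis = (r["lemma"], r["pos"])
        key = r["word"].lower()
        if analysis not in lookup[key]:
            lookup[key].append(analysis)
    return dict(lookup)


def analyse(tokens, lookup):
    """Return (token, candidates) pairs; unknown tokens get an empty list."""
    return [(tok, lookup.get(tok.lower(), [])) for tok in tokens]


# Hypothetical rows, not taken from bg-pos.csv.
rows = [
    {"word": "котки", "lemma": "котка", "pos": "noun"},
    {"word": "чета", "lemma": "чета", "pos": "verb"},
    {"word": "чета", "lemma": "чета", "pos": "noun"},
]
lookup = build_lookup(rows)
for token, candidates in analyse(["Котки", "чета", "xyz"], lookup):
    print(token, candidates)
```

Lower-casing lets sentence-initial words match, at the cost of merging entries that differ only in capitalisation, such as proper nouns.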

Coverage

The dataset covers only the Bulgarian language and is described as including almost all Bulgarian words in their various forms. It has no regional restriction and is relevant to anyone working with Bulgarian linguistic data. No time range or demographic scope is specified.

License

CC0 (Creative Commons public domain dedication)

Who Can Use It

Linguists and researchers studying Bulgarian morphology and syntax.
Data scientists and machine learning engineers developing NLP applications for the Bulgarian language.
Academics and students in fields such as computational linguistics, artificial intelligence, and language studies.
Anyone interested in the structural analysis of the Bulgarian language.

Dataset Name Suggestions

Bulgarian PoS and Lemma Dataset
Bulgarian Word Forms Collection
Bulgarian Language Morphology Data
Bulgarian NLP Starter Pack

Attributes

Original Data Source: Bulgarian Part Of Speech Dataset

Listing Stats

Listed: 27/06/2025
Region: Global
Quality: 5 / 5
Version: 1.0
