Bulgarian Poetry Dataset for NLP

Entertainment & Media Consumption

Tags and Keywords: Literature, NLP

Languages: Bulgarian

Free

About

This dataset offers a collection of poems in Bulgarian, originally scraped from Chitanka.info. It serves as a valuable resource for text generation and author categorisation tasks within natural language processing.

Columns

The dataset is structured with three key columns:
  • author: The name of the poem's author.
  • title: The specific title of the poem.
  • poem: The full text of the poem, where a special token denotes a newline.
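As a rough illustration of the column layout above, the sketch below reads rows with Python's standard csv module and restores line breaks in the poem text. It assumes the CSV header names match the column names listed here. NEWLINE_TOKEN is a hypothetical placeholder, since the listing does not say which token marks a newline, and read_poems is an illustrative helper name, not part of the dataset.

```python
import csv
import io

# Hypothetical placeholder: the listing does not name the actual token.
NEWLINE_TOKEN = "<nl>"

def read_poems(source, newline_token=NEWLINE_TOKEN):
    """Yield one dict per row, with newline tokens turned into real line breaks."""
    for row in csv.DictReader(source):
        yield {
            "author": row["author"],
            "title": row["title"],
            "poem": row["poem"].replace(newline_token, "\n"),
        }

# Tiny in-memory stand-in for the real file, using placeholder values only.
sample = io.StringIO(
    "author,title,poem\n"
    "Author A,Title A,first line<nl>second line\n"
)
poems = list(read_poems(sample))
print(poems[0]["poem"])
```

With the real file, the same function could be fed an open file handle, for example open("chitanka-corpus.csv", encoding="utf-8", newline=""). The UTF-8 encoding is an assumption and should be checked against the download.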

Distribution

This dataset is provided as a single CSV file, chitanka-corpus.csv, with three columns. The exact number of rows is not specified, but the file covers over 15,000 unique authors and over 17,000 unique poem titles.
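Because the row count is not stated, it may be worth checking the figures directly after download. The following is a minimal sketch, assuming the header names author, title and poem. The helper name summarise and the sample rows are illustrative only.

```python
import csv
import io

def summarise(source):
    """Count rows and distinct author/title values in a CSV stream."""
    rows = 0
    authors, titles = set(), set()
    for row in csv.DictReader(source):
        rows += 1
        authors.add(row["author"])
        titles.add(row["title"])
    return {
        "rows": rows,
        "unique_authors": len(authors),
        "unique_titles": len(titles),
    }

# Placeholder rows standing in for chitanka-corpus.csv.
sample = io.StringIO(
    "author,title,poem\n"
    "Author A,Title 1,text\n"
    "Author A,Title 2,text\n"
    "Author B,Title 1,text\n"
)
summary = summarise(sample)
print(summary)
```

Streaming the file row by row keeps memory use low, which can matter if the poem column is large.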

Usage

Ideal applications for this dataset include:
  • Developing and training text generation models for Bulgarian poetry.
  • Implementing and evaluating author categorisation algorithms.
  • Linguistic research and analysis of Bulgarian literary styles.
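For the author categorisation use case, one possible preparation step is to drop authors with too few poems and map the remaining authors to integer labels. The sketch below is only one way to do this. The function name build_author_labels, the min_poems threshold and the sample records are assumptions for illustration, not part of the dataset.

```python
from collections import Counter

def build_author_labels(records, min_poems=2):
    """Keep authors with at least min_poems poems and assign integer labels.

    records is a list of (author, poem) pairs.
    Returns (examples, label_of), where examples is a list of (poem, label).
    """
    counts = Counter(author for author, _ in records)
    kept = sorted(a for a, n in counts.items() if n >= min_poems)
    label_of = {author: i for i, author in enumerate(kept)}
    examples = [
        (poem, label_of[author])
        for author, poem in records
        if author in label_of
    ]
    return examples, label_of

# Placeholder records; real ones would come from the author and poem columns.
records = [
    ("Author A", "poem one"),
    ("Author B", "poem two"),
    ("Author A", "poem three"),
    ("Author C", "poem four"),
    ("Author B", "poem five"),
]
examples, label_of = build_author_labels(records)
print(label_of)
```

Filtering rare authors is a design choice: a classifier has little to learn from an author represented by a single poem, and the listing suggests there are many distinct authors.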

Coverage

The dataset focuses exclusively on Bulgarian poems and authors. Specific time ranges are not detailed. Notable authors include Борис Младенов-Young (5% of the values in the author column) and Иван Вазов (4%), with the remaining 91% attributed to various other authors.
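The author shares quoted above can be recomputed from the author column. The sketch below assumes the percentages are shares of rows, which the listing does not state explicitly. The function name author_shares and the sample values are illustrative.

```python
from collections import Counter

def author_shares(authors, top_n=2):
    """Return (top, other): percentage of rows for the top_n authors and the rest."""
    counts = Counter(authors)
    total = sum(counts.values())
    if total == 0:
        return {}, 0.0
    top = {a: 100.0 * n / total for a, n in counts.most_common(top_n)}
    other = 100.0 - sum(top.values())
    return top, other

# Placeholder author column: 5 rows for A, 3 for B, 1 each for C and D.
authors = ["A"] * 5 + ["B"] * 3 + ["C", "D"]
top, other = author_shares(authors)
print(top, other)
```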

License

CC0

Who Can Use It

This dataset is particularly suitable for:
  • Data scientists and machine learning engineers working on natural language processing tasks.
  • Linguists and literary scholars interested in Bulgarian language and poetry.
  • Researchers developing new algorithms for text analysis and generation.

Dataset Name Suggestions

  • Bulgarian Poems Corpus
  • Chitanka Literary Collection
  • Bulgarian Poetry Dataset for NLP
  • Bulgarian Authorial Verse

Attributes

Original Data Source: Bulgarian Poems Dataset

Listing Stats

Views: 0
Downloads: 0
Listed: 24/06/2025
Region: Global
Universal Data Quality Score (UDQS): 5 / 5
Version: 1.0