Classical Bengali Poetry Collection

Knowledge Bundles

Tags and Keywords

  • Text
  • Literature
  • NLP
  • Deep Learning
  • Transformers

Classical Bengali Poetry Collection Dataset on Opendatabay data marketplace

Free

About

This dataset is a corpus for fine-tuning pre-trained language models on Bengali text, and it is particularly useful for poetry generation. With the emergence of large language models such as GPT-2, GPT-3, GPT-Neo, and GPT-J, it can support downstream applications including text generation, masked word prediction, and sentiment classification. The dataset contains 2,686 poems by prominent Bengali poets, all of which are in the public domain.

Columns

The dataset includes metadata about the poets alongside the poems themselves. The poet metadata contains the following columns:
  • poet: The name of the Bengali poet.
  • wikipedia_link: A direct URL to the poet's Wikipedia page, offering further biographical and literary context.
  • filename: The name of the text file where the poet's collected poems are stored.
  • number_of_poems: The total number of poems by that poet in the dataset.

The poem text itself is stored in separate files, as indicated by the filename column.
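
The column layout above can be read with Python's standard csv module. The following is a minimal sketch, not the publisher's own loader. It assumes that poets.csv sits in the dataset root, that its header row uses exactly the column names listed above, and that filename is a path relative to that root. The function names load_poets and read_poems_text are hypothetical.

```python
import csv
from pathlib import Path


def load_poets(dataset_dir):
    """Read poets.csv and return one dict per poet.

    Assumption: the header row uses the column names from the listing
    (poet, wikipedia_link, filename, number_of_poems).
    """
    csv_path = Path(dataset_dir) / "poets.csv"
    # utf-8-sig also reads plain UTF-8 and drops a byte-order mark if present.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        # Convert the count from text to an integer for later arithmetic.
        row["number_of_poems"] = int(row["number_of_poems"])
    return rows


def read_poems_text(dataset_dir, poet_row):
    """Return the raw text of the file named in the row's filename column.

    Assumption: filename is relative to the dataset root.
    """
    return (Path(dataset_dir) / poet_row["filename"]).read_text(encoding="utf-8")
```

Calling load_poets on the unpacked dataset directory and passing one of the returned rows to read_poems_text gives that poet's collected poems as a single string. How individual poems are delimited inside each file is not described in the listing, so the sketch leaves the text unsplit.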

Distribution

The dataset consists of a CSV file (poets.csv) containing poet metadata, individual text files containing the poems, and a README.md file with additional information. The collection comprises 2,686 poems. The number of rows in poets.csv is not specified, but the file lists multiple distinct poets. The dataset has a quality rating of 5 out of 5 and is currently at version 1.0.
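
Because the listing states a total of 2,686 poems but not the number of rows in poets.csv, a quick consistency check after download may be useful. The sketch below sums the number_of_poems column and compares it with the stated total. It assumes the column names given in the listing. The names total_poems and check_total are hypothetical, and whether the per-poet counts actually sum to 2,686 is not confirmed by the listing.

```python
import csv

EXPECTED_TOTAL = 2686  # poem count stated in the listing


def total_poems(csv_path):
    """Sum the number_of_poems column of the metadata file."""
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        return sum(int(row["number_of_poems"]) for row in csv.DictReader(f))


def check_total(csv_path, expected=EXPECTED_TOTAL):
    """Return (matches, total) so the caller can report any mismatch."""
    total = total_poems(csv_path)
    return total == expected, total
```

A mismatch does not necessarily mean the download is corrupt. It may simply mean the metadata counts poems differently from the headline figure.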

Usage

This dataset is well suited to training and fine-tuning models for poetry generation. It can also be used for other natural language processing tasks on Bengali text, including general text generation, masked word prediction, and sentiment analysis. Users are encouraged to apply the dataset responsibly, given the capabilities of the models it can be used to train.
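
Before fine-tuning, the raw text usually has to be cut into training examples and divided into training and validation sets. The following sketch shows one way to do that with the standard library only. It assumes that blank lines separate meaningful blocks (stanzas or poems) in the text files, which the listing does not confirm. The names split_blocks and train_val_split are hypothetical.

```python
import random
import re


def split_blocks(text):
    """Split raw text into blocks separated by one or more blank lines.

    Assumption: blank lines mark stanza or poem boundaries in the files.
    """
    blocks = re.split(r"\n\s*\n", text)
    return [b.strip() for b in blocks if b.strip()]


def train_val_split(blocks, val_fraction=0.1, seed=0):
    """Shuffle blocks reproducibly and hold out a fraction for validation.

    At least one block is held out whenever the input is non-empty, so a
    single-block input yields an empty training list.
    """
    shuffled = list(blocks)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction)) if shuffled else 0
    return shuffled[n_val:], shuffled[:n_val]
```

The two returned lists can then be passed to whatever tokenizer and training loop the chosen model requires. Fixing the seed keeps the split reproducible across runs.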

Coverage

The dataset's geographic scope is global. It features works by prominent Bengali poets, and every poem is in the public domain, which suggests that the collection is historical. No data availability limitations are noted for particular groups or timeframes.

License

CC-BY-SA

Who Can Use It

This dataset is intended for data scientists, machine learning engineers, and researchers working in natural language processing, particularly those focused on the Bengali language. It is also suitable for developers building applications that require text generation, such as creative writing tools, and for those adapting existing language models to Bengali poetry.

Dataset Name Suggestions

  • Free Bengali Poetry Dataset
  • Bengali Public Domain Poetry
  • Bengali Language Model Corpus
  • Classical Bengali Poetry Collection

Attributes

Original Data Source: Free Bengali Poetry

Listing Stats

  • VIEWS: 0
  • DOWNLOADS: 0
  • LISTED: 21/06/2025
  • REGION: GLOBAL
  • Universal Data Quality Score (UDQS) QUALITY: 5 / 5
  • VERSION: 1.0