Classical Bengali Poetry Collection
Free
About
This dataset is a source of Bengali text for fine-tuning pre-trained language models, and is particularly useful for tasks such as poetry generation. With the emergence of large language models like GPT-2, GPT-3, GPT-Neo, and GPT-J, it can support various downstream applications, including text generation, masked word prediction, and sentiment classification. The dataset contains 2,686 poems from prominent Bengali poets, all of which are in the public domain.
Columns
The dataset includes metadata about the poets alongside the poems themselves. The poet metadata contains columns such as:
- poet: The name of the Bengali poet.
- wikipedia_link: A direct URL to the poet's Wikipedia page, offering further biographical and literary context.
- filename: The name of the text file where the poet's collected poems are stored.
- number_of_poems: The total count of poems contributed by that specific poet within the dataset.
The actual poem content is organised within separate files, as indicated by the 'filename' column.
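As an illustration of how the metadata columns above might be read, here is a minimal Python sketch. It assumes that the metadata file has a header row using exactly the column names listed above and that it is UTF-8 encoded; neither detail is confirmed by this listing. The function names `load_poets` and `total_poems` are hypothetical helpers, not part of the dataset.

```python
import csv
from pathlib import Path


def load_poets(csv_path):
    """Read poet metadata rows from a CSV file into a list of dicts.

    Assumes a header row with the columns: poet, wikipedia_link,
    filename, number_of_poems (an assumption based on the listing).
    """
    rows = []
    # UTF-8 is assumed because the poet names may be written in Bengali script.
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Convert the poem count to an int so it can be summed or sorted.
            row["number_of_poems"] = int(row["number_of_poems"])
            rows.append(row)
    return rows


def total_poems(rows):
    """Sum the per-poet poem counts across all metadata rows."""
    return sum(row["number_of_poems"] for row in rows)
```

If the per-poet counts are complete, `total_poems(load_poets("poets.csv"))` would be expected to match the collection size stated in this listing, which makes it a simple consistency check after download.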
Distribution
The dataset is structured with a CSV file (poets.csv) for poet metadata and individual text files for the poems. It includes a README.md file for additional information. The collection encompasses 2,686 poems. While the exact number of records in the metadata file (poets.csv) is not specified, it lists details for multiple distinct poets. The dataset maintains a quality rating of 5 out of 5 and is currently at version 1.0.
Usage
This dataset is ideally suited for training and fine-tuning models specifically designed for poetry generation. Beyond this, it can be leveraged for various natural language processing tasks, including general text generation, predicting masked words, and sentiment analysis within Bengali text. Users are encouraged to apply the dataset responsibly, recognising the potential power of the tools it facilitates.
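As a rough sketch of preparing the poem files for fine-tuning, the Python below reads a list of text files and makes a reproducible train/validation split. It assumes the files are UTF-8 encoded and, because this listing does not say how poems are delimited within a file, it splits on non-empty lines rather than whole poems. The helper names `read_poem_files` and `split_train_validation`, the 10% validation fraction, and the seed are all illustrative choices.

```python
import random
from pathlib import Path


def read_poem_files(data_dir, filenames):
    """Return {filename: text} for each listed file found under data_dir."""
    texts = {}
    for name in filenames:
        path = Path(data_dir) / name
        # Skip entries whose file is missing instead of raising an error.
        if path.is_file():
            texts[name] = path.read_text(encoding="utf-8")
    return texts


def split_train_validation(texts, validation_fraction=0.1, seed=0):
    """Shuffle all non-empty lines and split them into (train, validation)."""
    lines = [line.strip() for text in texts.values() for line in text.splitlines()]
    lines = [line for line in lines if line]
    # A seeded Random instance keeps the split reproducible across runs.
    random.Random(seed).shuffle(lines)
    # Hold out at least one line for validation when any lines exist.
    n_val = max(1, int(len(lines) * validation_fraction)) if lines else 0
    return lines[n_val:], lines[:n_val]
```

Splitting by line loses poem-level structure, so for poetry generation it may be preferable to split by poem or by poet once the file layout has been checked against the README.md.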
Coverage
The dataset's geographic scope is global. It features works by prominent Bengali poets. All included poems are in the public domain, which suggests the collection is largely historical. No limitations on data availability for particular groups or timeframes are noted beyond the public domain status.
License
CC-BY-SA
Who Can Use It
This dataset is highly beneficial for data scientists, machine learning engineers, and researchers engaged in natural language processing, particularly those with a focus on the Bengali language. It is also suitable for developers building applications that require text generation, such as creative writing tools, or those looking to enhance existing language models with Bengali poetry.
Dataset Name Suggestions
- Free Bengali Poetry Dataset
- Bengali Public Domain Poetry
- Bengali Language Model Corpus
- Classical Bengali Poetry Collection
Attributes
Original Data Source: Free Bengali Poetry