Indonesian Poetry Collection Dataset
Free
About
This dataset provides a collection of Indonesian poems (puisi) together with their titles and authors. It comprises 7,223 unique poems scraped from a dedicated poetry website using BeautifulSoup. The scraped text of each poem included header information, and the title and author fields were then extracted from that text using regular expressions. The dataset is intended to support natural language processing (NLP) tasks, literary analysis, and the development of machine learning models for text generation or understanding in Indonesian.
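The regular-expression step described above can be illustrated with a minimal sketch. The header layout assumed here (a title line followed by an "Oleh: <author>" line) is hypothetical, as are the names HEADER_PATTERN and split_header; the actual layout of the scraped pages and the patterns used by the dataset author are not documented.

```python
import re

# ASSUMPTION: the header layout (title line, then "Oleh: <author>") is
# hypothetical. The real layout of the scraped pages is not documented.
HEADER_PATTERN = re.compile(
    r"\A\s*(?P<title>[^\n]+)\n"             # first non-empty line: title
    r"\s*Oleh\s*:\s*(?P<author>[^\n]+)\n"   # next line: "Oleh: <author>"
    r"(?P<puisi>.*)\Z",                     # everything else: poem body
    re.DOTALL,
)


def split_header(puisi_with_header):
    """Split header text into title, author, and poem body.

    Returns None when the text does not follow the assumed layout.
    """
    match = HEADER_PATTERN.match(puisi_with_header)
    if match is None:
        return None
    return {
        "title": match.group("title").strip(),
        "author": match.group("author").strip(),
        "puisi": match.group("puisi").strip(),
    }


# Made-up sample text, not taken from the dataset.
sample = (
    "HUJAN PAGI\n"
    "Oleh: Penulis Contoh\n"
    "\n"
    "Rintik jatuh di atap,\n"
    "sunyi pun terjaga."
)
parsed = split_header(sample)
```

Returning None for text that does not match makes it easy to count how many records fall outside the assumed layout before trusting the extracted fields.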
Columns
- puisi: Contains the full text of the poem.
- title: Represents the title of the individual poem.
- author: Identifies the author of the poem.
- puisi_with_header: Includes the poem text combined with its title and author information as originally scraped.
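As a rough sketch of how these columns might be read, the following uses Python's standard csv module and checks that all four columns are present. The helper name read_poems and the sample row are illustrative assumptions, and the in-memory buffer stands in for the real file, whose name is not given here.

```python
import csv
import io

EXPECTED_COLUMNS = ["puisi", "title", "author", "puisi_with_header"]


def read_poems(handle):
    """Read poem records from an open CSV handle and validate the columns."""
    reader = csv.DictReader(handle)
    present = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in present]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Build a made-up one-row CSV in memory; it stands in for the real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=EXPECTED_COLUMNS)
writer.writeheader()
writer.writerow({
    "puisi": "Baris satu\nBaris dua",
    "title": "Judul Contoh",
    "author": "Penulis Contoh",
    "puisi_with_header": "Judul Contoh\nPenulis Contoh\nBaris satu\nBaris dua",
})
buffer.seek(0)

poems = read_poems(buffer)
```

For a file on disk, the csv documentation recommends opening it with newline="" so that line breaks inside quoted poem text are preserved; an explicit encoding such as "utf-8" is a reasonable guess but is not confirmed by the source.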
Distribution
The dataset is typically provided as a CSV file. It contains 7,223 distinct records, each representing a single poem with its associated metadata. The data is tabular, with one row per poem. The file size is not specified.
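One way to check the stated record count after loading is sketched below. It assumes that "distinct" means distinct poem text after trimming surrounding whitespace, which the source does not define; the function name summarize and the sample rows are hypothetical.

```python
def summarize(rows):
    """Count rows and distinct poem texts in a list of record dicts.

    ASSUMPTION: two records are duplicates when their "puisi" text is
    identical after trimming whitespace. The source does not define this.
    """
    texts = [row["puisi"].strip() for row in rows]
    return {"rows": len(texts), "unique_poems": len(set(texts))}


# Made-up rows, not taken from the dataset; the third repeats the first.
sample_rows = [
    {"puisi": "Rintik jatuh di atap"},
    {"puisi": "Malam sunyi"},
    {"puisi": "Rintik jatuh di atap  "},
]
summary = summarize(sample_rows)
```

On the full file, both values would be expected to equal 7,223 if the description above holds under this definition of distinctness.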
Usage
This dataset is ideally suited for:
- Developing and training Natural Language Processing (NLP) models for text analysis, sentiment analysis, or language generation in Indonesian.
- Conducting literary research and studies on Indonesian poetry.
- Building AI applications that can generate or understand poetic text.
- Exploratory data analysis on linguistic patterns and themes within Indonesian literature.
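As a small illustration of the exploratory analysis mentioned in the list above, the sketch below counts word frequencies across poem texts using only the standard library. The tokenizer (lowercased runs of letters) is a simplifying assumption and ignores Indonesian-specific concerns such as affixes and reduplicated forms; the sample lines are made up.

```python
import re
from collections import Counter

# Runs of Unicode letters only (no digits, underscores, or punctuation).
TOKEN_PATTERN = re.compile(r"[^\W\d_]+")


def word_frequencies(poems):
    """Count lowercased word occurrences across an iterable of poem texts."""
    counts = Counter()
    for text in poems:
        counts.update(TOKEN_PATTERN.findall(text.lower()))
    return counts


# Made-up sample lines, not taken from the dataset.
sample_poems = [
    "Hujan turun, hujan reda.",
    "Malam sunyi; hujan lagi.",
]
frequencies = word_frequencies(sample_poems)
top = frequencies.most_common(1)
```

With real data, the input would be the values of the puisi column, and a stop-word list would likely be needed before the most common words become informative.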
Coverage
The dataset covers Indonesian puisi exclusively, so its linguistic and cultural scope is specific to Indonesia. Although the source material is described as global, the content itself is in Indonesian only. The provided information includes no explicit notes on geographic coverage, time range, or demographics.
License
CC0
Who Can Use It
- AI and Machine Learning Researchers: For training and evaluating models on Indonesian text data, particularly for creative text generation or understanding.
- Linguists and Literary Scholars: To analyse the structure, themes, and authorship of Indonesian poetry.
- Data Scientists: For projects involving text mining, natural language processing, or building recommendation systems based on literary content.
- Developers: Interested in integrating poetic data into applications or services.
Dataset Name Suggestions
- Indonesian Poetry Collection
- Puisi Indonesia Dataset
- Indonesian Poem Corpus
- Bahasa Indonesia Puisi Data
Attributes
Original Data Source: Puisi Indonesia