Pantun Poem Corpus
Free
About
This dataset is a collection of 440 unique Indonesian pantun poems, designed for various applications including natural language processing and linguistic studies. It offers a rich resource for understanding traditional Indonesian poetry and can be used to develop models for text generation or analysis. The poems are categorised by type, providing structured content for diverse research and development needs.
Columns
- teks: This column contains the full text of each individual pantun poem.
- tipe: This column gives the type or category of the pantun, such as Pantun Cinta (Love Pantun) or Pantun Jenaka (Humorous Pantun). There are 18 distinct types in the collection.
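As a minimal sketch of how the two columns might be read, the Python snippet below parses CSV data with the standard library and tallies rows per `tipe`. Only the column names `teks` and `tipe` come from the description above; the function name `count_types` and the sample rows (placeholders, not poems from the dataset) are assumptions for illustration.

```python
import csv
import io
from collections import Counter


def count_types(csv_file):
    """Return a Counter of pantun types from a CSV with 'teks' and 'tipe' columns."""
    reader = csv.DictReader(csv_file)
    counts = Counter()
    for row in reader:
        # Skip rows with an empty poem text rather than counting them.
        if row["teks"].strip():
            counts[row["tipe"].strip()] += 1
    return counts


# Placeholder rows for illustration only; these are not poems from the dataset.
sample = io.StringIO(
    "teks,tipe\n"
    '"<placeholder pantun 1>",Pantun Cinta\n'
    '"<placeholder pantun 2>",Pantun Jenaka\n'
    '"<placeholder pantun 3>",Pantun Cinta\n'
)
sample_counts = count_types(sample)
print(sample_counts.most_common())
```

For a downloaded file, the same function could be called on `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved. Opening with `newline=""` lets the `csv` module handle quoted fields that span several lines, which may matter if each poem is stored with its line breaks.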
Distribution
The dataset comprises 440 unique records, each representing a single Indonesian pantun, and is typically distributed as a CSV file. The records span 18 pantun types. The five largest are Pantun Cinta (83), Pantun Jenaka (63), Pantun Agama (43), Pantun Nasihat (41), and Pantun Teka-Teki (36). Row counts are stated only for these major categories, not for the remaining types.
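The stated figures can be checked with a little arithmetic. The sketch below uses only the numbers given in this section and works out how many records and types are left once the five named categories are accounted for; the variable names are illustrative.

```python
TOTAL_RECORDS = 440
TOTAL_TYPES = 18

# Counts stated in the description for the five named categories.
stated_counts = {
    "Pantun Cinta": 83,
    "Pantun Jenaka": 63,
    "Pantun Agama": 43,
    "Pantun Nasihat": 41,
    "Pantun Teka-Teki": 36,
}

stated_total = sum(stated_counts.values())
remaining_records = TOTAL_RECORDS - stated_total
remaining_types = TOTAL_TYPES - len(stated_counts)

for name, count in stated_counts.items():
    share = 100 * count / TOTAL_RECORDS
    print(f"{name}: {count} poems ({share:.1f}%)")
print(f"Other {remaining_types} types combined: {remaining_records} poems")
```

If the stated counts are accurate, the five named categories account for 266 of the 440 poems, which leaves 174 poems spread across 13 types. That is an average of roughly 13 poems per remaining type, so class imbalance is worth keeping in mind for any classification work.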
Usage
This dataset is ideally suited for:
- Natural Language Processing (NLP) research and model training, particularly for text generation, classification, and sentiment analysis tasks related to poetry.
- Linguistic studies focusing on Indonesian language, poetic structures, and cultural expressions.
- Educational purposes, providing authentic examples of Indonesian literature for students and scholars.
- Art and cultural projects, where insights into traditional poetry are valuable.
- AI and Machine Learning development, for creating applications that understand or generate poetic text.
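For the classification use case, the uneven category sizes suggest holding out test data per type rather than at random. The following is a minimal sketch of one way to do that with the standard library, not a prescribed method; `stratified_split`, its parameters, and the placeholder rows are all assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_split(rows, test_fraction=0.2, seed=0):
    """Split (teks, tipe) pairs so each type appears in both parts where possible."""
    by_type = defaultdict(list)
    for teks, tipe in rows:
        by_type[tipe].append((teks, tipe))

    rng = random.Random(seed)
    train, test = [], []
    for tipe in sorted(by_type):
        group = by_type[tipe]
        rng.shuffle(group)
        if len(group) > 1:
            # Hold out at least one example, but always leave one for training.
            n_test = max(1, round(len(group) * test_fraction))
            n_test = min(len(group) - 1, n_test)
        else:
            # A type with a single poem stays entirely in the training part.
            n_test = 0
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Placeholder rows for illustration only; these are not poems from the dataset.
rows = [(f"<placeholder {i}>", "Pantun Cinta") for i in range(10)]
rows += [(f"<placeholder {i}>", "Pantun Jenaka") for i in range(10, 15)]
train_rows, test_rows = stratified_split(rows)
print(len(train_rows), len(test_rows))
```

Splitting within each type keeps small categories from vanishing from the test set by chance, at the cost of very small per-type test samples for the rarer types.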
Coverage
The dataset covers Indonesian pantun poems only, reflecting the cultural and linguistic nuances of the region. No time range is given for when the poems were composed or collected, and no demographic information about authors or sources is provided. The dataset is available worldwide.
License
CC0
Who Can Use It
- AI and ML Engineers: For developing and training models for text generation, understanding, and classification specific to poetic forms.
- Linguists and Academics: To conduct research on Indonesian language, poetic structures, and cultural contexts.
- Content Creators and Artists: To inspire new works or analyse traditional poetic forms.
- Educators: For teaching Indonesian language, literature, and cultural studies.
- Data Scientists: For exploring structured textual data and applying various analytical techniques.
Dataset Name Suggestions
- Indonesian Pantun Collection
- Pantun Poem Corpus
- Indonesian Traditional Poetry Dataset
- Pantun Anthology for NLP
- Indonesian Rhyme Scheme Dataset
Attributes
Original Data Source: Pantun Indonesia