Multi-Source Indonesia News 2024
Entertainment & Media Consumption
Free
About
This dataset offers a snapshot of news articles from three prominent Indonesian media outlets: Detik, Tempo, and Kompas. It captures key narratives and events from January 2024 to 5 September 2024. Each entry provides the news title, the publication date, a direct link to the original article, the complete content, up to five categorisation tags to aid analysis, and the name of the source outlet. The collection is suited to educational data science projects and to in-depth research into news trends and media consumption patterns.
Columns
- Judul (Indonesian for "title"): The title of the news article.
- Waktu (Indonesian for "time"): The date the news article was published.
- Link: A direct URL to the original news article.
- Content: The full text content of the news piece.
- tag1 to tag5: Up to five categorisation tags associated with the article, facilitating sorting and analysis.
- source: The name of the news outlet from which the article was aggregated (e.g., Detik, Tempo, Kompas).
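As a minimal sketch of how these columns might be read, the snippet below parses a CSV with Python's standard `csv` module and merges the non-empty `tag1` to `tag5` values into a single list per article. The column names come from the list above; the sample rows, the `load_articles` helper, the output key names, and the date format shown are made-up placeholders for illustration, not values from the dataset.

```python
import csv
import io

# Made-up placeholder rows; a real run would open the downloaded CSV file instead.
SAMPLE_CSV = """Judul,Waktu,Link,Content,tag1,tag2,tag3,tag4,tag5,source
Example title A,2024-01-15,https://example.com/a,Example body A,politik,pemilu,,,,Detik
Example title B,2024-03-02,https://example.com/b,Example body B,ekonomi,,,,,Tempo
"""

TAG_COLUMNS = ["tag1", "tag2", "tag3", "tag4", "tag5"]


def load_articles(handle):
    """Read rows and merge the non-empty tag1..tag5 values into one list."""
    articles = []
    for row in csv.DictReader(handle):
        # A missing or blank tag column is skipped rather than kept as "".
        cleaned = ((row.get(col) or "").strip() for col in TAG_COLUMNS)
        tags = [tag for tag in cleaned if tag]
        articles.append({
            "title": row["Judul"],
            "published": row["Waktu"],
            "link": row["Link"],
            "content": row["Content"],
            "tags": tags,
            "source": row["source"],
        })
    return articles


articles = load_articles(io.StringIO(SAMPLE_CSV))
print(articles[0]["tags"])
```

For the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by something like `open(path, newline="", encoding="utf-8")`, where the path and the encoding are assumptions to verify against the download.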
Distribution
The dataset is typically provided in CSV format. It aggregates articles from three distinct news sources: Detik, Tempo, and Kompas. It contains just over 45,000 unique records, split almost evenly between Detik and Tempo, with a smaller share from Kompas. An exact row count is not stated; the unique value counts for the 'Judul', 'Link', and 'Content' columns each exceed 45,000.
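The distribution described above can be checked with a short script. The sketch below counts articles per outlet and the number of unique values in 'Judul', 'Link', and 'Content'; the four sample rows are invented placeholders, so the printed shares say nothing about the real dataset.

```python
import csv
import io
from collections import Counter

# Invented placeholder rows standing in for the real CSV file.
SAMPLE_CSV = """Judul,Link,Content,source
Title A,https://example.com/a,Body A,Detik
Title B,https://example.com/b,Body B,Tempo
Title C,https://example.com/c,Body C,Kompas
Title D,https://example.com/d,Body D,Detik
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Number of articles contributed by each outlet.
per_source = Counter(row["source"] for row in rows)

# Number of distinct values in each column used to gauge record uniqueness.
unique_counts = {
    col: len({row[col] for row in rows})
    for col in ("Judul", "Link", "Content")
}

for source, count in per_source.most_common():
    print(f"{source}: {count} ({count / len(rows):.1%})")
print(unique_counts)
```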
Usage
This dataset is well-suited for various applications, including:
- Data science educational projects: Providing a practical dataset for students to learn data cleaning, analysis, and modelling.
- Research on news trends: Analysing evolving narratives, popular topics, and media biases over time.
- Natural Language Processing (NLP) tasks: Developing and testing models for text classification, summarisation, sentiment analysis, and topic modelling.
- Media studies: Investigating content patterns and reporting styles across different news outlets.
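As one possible starting point for the trend and media-studies uses listed above, the sketch below tallies the most frequent tags per outlet. It assumes each record has already been reduced to a `source` string and a `tags` list (hypothetical field names); the records shown are placeholders, and `top_tags_by_source` is an illustrative helper, not part of the dataset.

```python
from collections import Counter, defaultdict

# Placeholder records; real ones would be built from the source and tag1..tag5 columns.
records = [
    {"source": "Detik", "tags": ["politik", "pemilu"]},
    {"source": "Detik", "tags": ["politik"]},
    {"source": "Tempo", "tags": ["ekonomi"]},
]


def top_tags_by_source(records, n=3):
    """Return the n most common tags for each outlet."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record["source"]].update(record["tags"])
    return {source: counter.most_common(n) for source, counter in counts.items()}


top_tags = top_tags_by_source(records)
for source, tags in top_tags.items():
    print(source, tags)
```

Comparing these per-outlet tallies is only a rough proxy for editorial focus, since each outlet may apply tags differently.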
Coverage
The dataset's content is geographically focused on Indonesia, drawing from major national news outlets. It covers the period from January 2024 to 5 September 2024, which gives analyses a defined temporal scope. No demographic information is provided beyond what the news content itself conveys.
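The stated time window can be used as a sanity check on the 'Waktu' column. The sketch below is written under two assumptions that should be verified against the actual file: that 'Waktu' values are ISO-formatted dates (`YYYY-MM-DD`), and that "January 2024" means 1 January 2024. The `in_coverage` helper is hypothetical.

```python
from datetime import date

COVERAGE_START = date(2024, 1, 1)  # assumption: "January 2024" taken as the 1st
COVERAGE_END = date(2024, 9, 5)    # 5 September 2024, as stated in the listing


def in_coverage(waktu):
    """Return True if an ISO date string falls inside the stated coverage window."""
    try:
        published = date.fromisoformat(waktu.strip())
    except ValueError:
        # Unparseable values are treated as out of range in this sketch.
        return False
    return COVERAGE_START <= published <= COVERAGE_END


print(in_coverage("2024-09-05"))
print(in_coverage("2024-09-06"))
```

If the real values use a different format, `date.fromisoformat` would need to be swapped for `datetime.strptime` with a matching format string.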
License
CC-BY-NC
Who Can Use It
- Students: For academic projects, dissertations, and skill development in data science and NLP.
- Researchers: For studies on media, linguistics, social science, and current events in Indonesia.
- Data Scientists: To build and refine models for text analysis, information retrieval, and content categorisation.
- Journalists and Analysts: For historical context or trend analysis within Indonesian news.
Dataset Name Suggestions
- Indonesian News Archive 2024
- Multi-Source Indonesia News 2024
- Nusantara News Articles
- ID News Collection (Jan-Sep 2024)
Attributes
Original Data Source: Indonesia News Dataset (2024)