openHPI Course Modality Prediction
Education & Learning Analytics
About
This curated collection describes Massive Open Online Courses (MOOCs) hosted on the openHPI platform, which is run by the Hasso Plattner Institute (HPI) in Potsdam, Germany. Its core value lies in the manual classification of each course into one of two categories: 'Theory' or 'Hands-On'. The goal is to provide a reliable dataset for predicting course modality from textual information, supporting machine learning solutions that classify new course offerings. The content is multilingual, with courses offered in both English and German.
Columns
The dataset structure includes four key fields:
- title: The course name, a short description of the course content.
- url: The direct link where the course can be accessed on the openHPI platform.
- description: A detailed summary explaining what the course is about.
- label: The manually assigned binary class, either 'Theory' (60% of records) or 'Hands-On' (40% of records).
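As a quick check of the schema and the class balance described above, the following minimal sketch parses the CSV with Python's standard library and reports the share of each label. The function name `label_shares` is hypothetical, and it assumes the header row uses exactly the column names listed here and that the label strings appear verbatim as `Theory` and `Hands-On`; neither is confirmed by the card.

```python
import csv
import io
from collections import Counter

# Column names as listed in the dataset card (assumed to match the CSV header).
EXPECTED_COLUMNS = ["title", "url", "description", "label"]


def label_shares(csv_text: str) -> dict[str, float]:
    """Return the fraction of records carrying each label value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # Count label values; csv handles quoted descriptions containing commas.
    counts = Counter(row["label"] for row in reader)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}
```

To run it on the real file, read it as text first, e.g. `label_shares(open("courses.csv", encoding="utf-8").read())`, where the filename is a placeholder and UTF-8 encoding is an assumption (the German descriptions may contain umlauts). If the card's 60/40 figures are exact, the result should show shares of 0.6 and 0.4.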
Distribution
The course details are supplied as a single, fixed CSV file of approximately 85 kilobytes containing exactly 100 records. No missing or mismatched values are reported in any field, and the structure is suitable for immediate use in machine learning environments.
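Because the card states a fixed record count and no missing values, it can be worth confirming both after download. The sketch below is one way to do that with the standard `csv` module; the function name `find_problems` is hypothetical, and it treats any cell that is empty or whitespace-only as missing, which is an assumption about what "missing" means here. Records are numbered from 1 after the header rather than by file line, since quoted descriptions may span several lines.

```python
import csv
import io


def find_problems(csv_text: str, expected_rows: int = 100) -> list[str]:
    """List integrity issues: wrong record count, ragged rows, or empty fields."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header, records = rows[0], rows[1:]
    problems = []
    if len(records) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(records)}")
    for index, row in enumerate(records, start=1):
        # A row with the wrong number of fields counts as "mismatched".
        if len(row) != len(header):
            problems.append(f"record {index}: {len(row)} fields, expected {len(header)}")
        # Empty or whitespace-only cells count as "missing".
        elif any(not cell.strip() for cell in row):
            problems.append(f"record {index}: empty field")
    return problems
```

An empty list means the file matches the card's description; otherwise each entry points at the offending record.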
Usage
The data suits binary classification tasks that rely on text features. Primary applications include:
- Training models to automatically categorise educational content based on titles and descriptions.
- Exploring the challenges of classification in a multilingual context (English and German).
- Developing recommender systems that filter courses based on learning style preference (practical versus abstract).
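For the first application above, a simple bag-of-words baseline is a reasonable starting point before trying larger NLP models. The sketch below implements multinomial Naive Bayes with Laplace smoothing using only the standard library; the choice of Naive Bayes is ours, not something the dataset authors prescribe, and the names `tokenize_text` and `NaiveBayes` are hypothetical. The toy training texts in the demo are invented for illustration and are not records from the dataset. In practice one would likely concatenate `title` and `description` as the input text and evaluate with cross-validation, since 100 records is a small sample.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize_text(text: str) -> list[str]:
    # \w matches Unicode letters for str patterns, so umlauts and ß stay intact.
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with Laplace (add-one) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize_text(text))
        # One shared vocabulary across English and German words.
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, n in self.class_counts.items():
            score = math.log(n / n_docs)  # log prior
            for w in tokenize_text(text):
                if w in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


if __name__ == "__main__":
    # Invented toy examples, not taken from the dataset.
    texts = [
        "Introduction to the theory of computation and formal languages",
        "Hands-on workshop: build a web app with Python",
        "Grundlagen der Informatik und Theorie",
        "Programmieren lernen: praktische Übungen mit Python",
    ]
    labels = ["Theory", "Hands-On", "Theory", "Hands-On"]
    model = NaiveBayes().fit(texts, labels)
    print(model.predict("Praktische Übungen: eine App mit Python bauen"))
```

Because the vocabulary is shared across languages, the model cannot relate "Theorie" to "theory"; that limitation is one concrete form of the multilingual challenge listed above.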
Coverage
The data covers only courses offered on the openHPI platform. The dataset is static, so no updates or changes to the records are expected. Course descriptions are written in German and English.
License
CC0: Public Domain
Who Can Use It
Intended users include:
- Machine Learning Engineers: For developing and testing natural language processing (NLP) models.
- Academics and Researchers: For studying the characteristics of MOOC content and course design.
- Data Scientists: For examining how subjective human labelling aligns with objective text features.
Dataset Name Suggestions
- openHPI Course Modality Prediction
- MOOC Content Classification (Theory/Hands-On)
- HPI E-Learning Text Data
Attributes
Original Data Source: openHPI Course Modality Prediction
