Higher Education Outcome Forecasting Dataset
Education & Learning Analytics
Free
About
This collection of student records is designed to support the early identification of students at risk of academic failure or dropping out in higher education. Its primary purpose is to help reduce academic dropout rates by providing data suited to machine learning techniques. The dataset captures information known at the time of enrollment, including academic history, demographics, and socio-economic standing. It is formulated as a multi-class classification task: predicting whether each student will be a dropout, still enrolled, or a graduate at the end of the standard course duration. The dataset's creation was supported by the SATDAP - Capacitação da Administração Pública programme in Portugal.
Columns
Each row (instance) in the dataset represents a single student.
- id: A unique numerical identifier assigned to each student record.
- Target: The student's final outcome class, recorded as 'Graduate', 'Dropout', or 'Other' (students still enrolled).
- Predictive Features: Additional columns describing the student's academic path, demographic details, and socio-economic factors, used as inputs for predictive models.
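The column layout above can be read with a short Python sketch that uses only the standard library. It assumes a CSV export with an `id` column, a `Target` column, and every remaining column treated as a predictive feature; the column name `example_feature` and the in-memory sample are hypothetical stand-ins for the real file. Because the third class is described both as 'Other' and as 'Enrolled', the sketch maps both labels to `Enrolled`, which is an assumption about how the labels are stored.

```python
import csv
import io
from collections import Counter

# Assumption: 'Other' and 'Enrolled' both denote students still enrolled.
TARGET_ALIASES = {
    "Graduate": "Graduate",
    "Dropout": "Dropout",
    "Enrolled": "Enrolled",
    "Other": "Enrolled",
}


def load_records(fileobj):
    """Read rows into (id, features, target) tuples.

    Feature values are left as strings; convert them to numbers as needed.
    """
    records = []
    for row in csv.DictReader(fileobj):
        student_id = int(row.pop("id"))
        target = TARGET_ALIASES[row.pop("Target").strip()]
        records.append((student_id, row, target))
    return records


def class_counts(records):
    """Count students per outcome class."""
    return Counter(target for _, _, target in records)


# Hypothetical in-memory sample standing in for the real CSV file.
sample = io.StringIO(
    "id,example_feature,Target\n"
    "0,18,Graduate\n"
    "1,25,Dropout\n"
    "2,19,Other\n"
)
records = load_records(sample)
print(sorted(class_counts(records).items()))
# [('Dropout', 1), ('Enrolled', 1), ('Graduate', 1)]
```

Checking the class counts early is a cheap way to see how imbalanced the three outcomes are before choosing a model or metric.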
Distribution
The dataset currently contains 51,000 valid records, each representing one student. Before publication, the raw data underwent rigorous preprocessing to handle anomalies, unexplainable outliers, and missing values; as a result, the dataset is verified to contain no missing values. The data is structured for machine learning training and testing, with a recommended split of 80% for training and 20% for testing. The dataset is expected to be updated weekly.
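A stratified version of the recommended 80/20 split keeps the proportions of Dropout, Enrolled, and Graduate students similar in both partitions, which helps when the classes are unevenly sized. The sketch below is a minimal standard-library version, assuming records are `(id, features, target)` tuples; the function names, the fixed seed, and the synthetic records are illustrative and not part of any tooling shipped with the dataset. It also includes a simple check for empty feature values, which should find none if the no-missing-values claim holds.

```python
import random
from collections import Counter, defaultdict


def has_missing_values(records):
    """Return True if any feature value is None or an empty string."""
    return any(
        value is None or value == ""
        for _, features, _ in records
        for value in features.values()
    )


def stratified_split(records, test_fraction=0.2, seed=42):
    """Split (id, features, target) tuples into train and test lists,
    sampling the test share separately within each target class."""
    by_class = defaultdict(list)
    for record in records:
        by_class[record[2]].append(record)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for label in sorted(by_class):  # sorted for a deterministic order
        group = list(by_class[label])
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Synthetic records: 50 Graduate, 30 Dropout, 20 Enrolled.
labels = ["Graduate"] * 50 + ["Dropout"] * 30 + ["Enrolled"] * 20
records = [(i, {"example_feature": str(i)}, label) for i, label in enumerate(labels)]

print(has_missing_values(records))  # False
train, test = stratified_split(records)
print(len(train), len(test))  # 80 20
print(sorted(Counter(t for _, _, t in test).items()))
# [('Dropout', 6), ('Enrolled', 4), ('Graduate', 10)]
```

Because the test share is rounded per class, the overall split can differ from exactly 80/20 by a record or two on real class sizes.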
Usage
This data is well suited to developing predictive models that forecast student outcomes, with a primary focus on identifying early indicators of attrition risk. It supports a three-class classification task (Dropout, Enrolled, Graduate). Findings from analysing this data can directly inform the strategies educational institutions use to support vulnerable students.
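Since the stated goal is early identification of at-risk students, overall accuracy alone can hide a model that rarely flags dropouts. One common option, not prescribed by the dataset description, is to report per-class recall alongside macro-averaged F1 and to compare against a majority-class baseline. The sketch below computes these from plain label lists; the toy labels and predictions are hypothetical.

```python
from collections import Counter

CLASSES = ("Dropout", "Enrolled", "Graduate")


def per_class_metrics(y_true, y_pred):
    """Return {class: (precision, recall, f1)} for each outcome class."""
    metrics = {}
    pairs = list(zip(y_true, y_pred))
    for c in CLASSES:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = (precision, recall, f1)
    return metrics


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    metrics = per_class_metrics(y_true, y_pred)
    return sum(f1 for _, _, f1 in metrics.values()) / len(CLASSES)


def majority_baseline(y_train, n):
    """Predict the most frequent training class for all n test instances."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return [most_common] * n


# Hypothetical toy labels.
y_train = ["Graduate"] * 5 + ["Dropout"] * 3 + ["Enrolled"] * 2
y_test = ["Graduate", "Dropout", "Enrolled", "Graduate"]
y_pred = majority_baseline(y_train, len(y_test))

accuracy = sum(t == p for t, p in zip(y_test, y_pred)) / len(y_test)
print(accuracy)                                      # 0.5
print(per_class_metrics(y_test, y_pred)["Dropout"])  # (0.0, 0.0, 0.0)
print(round(macro_f1(y_test, y_pred), 3))            # 0.222
```

In this toy example the baseline reaches 50% accuracy yet has zero Dropout recall, which is the failure an early warning model most needs to avoid.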
Coverage
The data covers students in higher education and includes detailed variables about their background known at the time of enrollment. Given its Portuguese funding context, the data is closely tied to public administration and educational structures in Portugal. Its scope spans academic records, demographic profiles, and socio-economic inputs.
License
Attribution 4.0 International (CC BY 4.0)
Who Can Use It
- Academic Researchers: To study the dynamics of student success and failure in tertiary education and publish findings.
- Data Scientists and Machine Learning Engineers: To train and benchmark classification models for predictive accuracy.
- University Administrators: To implement data-driven early warning systems and tailor student support programmes based on identified risk factors.
Dataset Name Suggestions
- Predict Students' Dropout from UCI
- Higher Education Outcome Forecasting Dataset
- Academic Success and Dropout Predictors
Attributes
Original Data Source: Higher Education Outcome Forecasting Dataset
