Student Academic Achievement in Portugal
Education & Learning Analytics
Free
About
This dataset is designed to predict student performance in secondary education (high school) within two Portuguese schools [1]. It incorporates a range of attributes including student grades, demographic, social, and school-related features [1]. The data was collected using school reports and questionnaires [1]. There are two distinct datasets available, one for Mathematics (mat) and another for Portuguese language (por) performance [1]. These datasets have been previously modelled for binary or five-level classification and regression tasks [1]. An important note is that the final year grade (G3) has a strong correlation with the first and second period grades (G1 and G2) [1]. While predicting G3 without G2 and G1 is more challenging, such a prediction is considered much more useful [1].
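The point about G1 and G2 can be made concrete with a minimal sketch of how a record might be split into features and target, with the earlier period grades left out by default. The helper name `split_features_target`, the comma delimiter, and the two sample rows are illustrative assumptions, not part of the dataset.

```python
import csv
import io

TARGET = "G3"
PRIOR_GRADES = ("G1", "G2")


def split_features_target(row, use_prior_grades=False):
    """Split one student record into (features, target).

    By default G1 and G2 are dropped, which is the harder but more
    useful setting described above.
    """
    excluded = {TARGET} if use_prior_grades else {TARGET, *PRIOR_GRADES}
    features = {name: value for name, value in row.items() if name not in excluded}
    return features, int(row[TARGET])


# Two made-up rows with a handful of the documented columns, for illustration only.
# The real files may use a different delimiter; csv.DictReader accepts delimiter=...
sample = io.StringIO(
    "school,age,studytime,G1,G2,G3\n"
    "GP,16,2,11,12,12\n"
    "MS,17,1,8,7,8\n"
)
rows = list(csv.DictReader(sample))

features, target = split_features_target(rows[0])
features_with_grades, _ = split_features_target(rows[0], use_prior_grades=True)
```

Keeping the switch as a single flag makes it easy to compare models trained with and without the earlier grades.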
Columns
- school: Student's school (binary: 'GP' - Gabriel Pereira or 'MS' - Mousinho da Silveira) [2]
- sex: Student's sex (binary: 'F' - female or 'M' - male) [2]
- age: Student's age (numeric: from 15 to 22) [2]
- address: Student's home address type (binary: 'U' - urban or 'R' - rural) [2]
- famsize: Family size (binary: 'LE3' - less or equal to 3 or 'GT3' - greater than 3) [2]
- Pstatus: Parent's cohabitation status (binary: 'T' - living together or 'A' - apart) [2]
- Medu: Mother's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education) [3]
- Fedu: Father's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education) [3]
- Mjob: Mother's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other') [3]
- Fjob: Father's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other') [3]
- reason: Reason to choose this school (nominal: close to 'home', school 'reputation', 'course' preference or 'other') [4]
- guardian: Student's guardian (nominal: 'mother', 'father' or 'other') [4]
- traveltime: Home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour) [4]
- studytime: Weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours) [4]
- failures: Number of past class failures (numeric: n if 1<=n<3, else 4) [4]
- schoolsup: Extra educational support (binary: yes or no) [4]
- famsup: Family educational support (binary: yes or no) [5]
- paid: Extra paid classes within the course subject (Math or Portuguese) (binary: yes or no) [5]
- activities: Extra-curricular activities (binary: yes or no) [5]
- nursery: Attended nursery school (binary: yes or no) [5]
- higher: Wants to take higher education (binary: yes or no) [5]
- internet: Internet access at home (binary: yes or no) [5]
- romantic: With a romantic relationship (binary: yes or no) [5]
- famrel: Quality of family relationships (numeric: from 1 - very bad to 5 - excellent) [5]
- freetime: Free time after school (numeric: from 1 - very low to 5 - very high) [6]
- goout: Going out with friends (numeric: from 1 - very low to 5 - very high) [6]
- Dalc: Workday alcohol consumption (numeric: from 1 - very low to 5 - very high) [6]
- Walc: Weekend alcohol consumption (numeric: from 1 - very low to 5 - very high) [6]
- health: Current health status (numeric: from 1 - very bad to 5 - very good) [6]
- absences: Number of school absences (numeric: from 0 to 93) [6]
- G1: First period grade (numeric: from 0 to 20) [7]
- G2: Second period grade (numeric: from 0 to 20) [7]
- G3: Final grade (numeric: from 0 to 20, output target) [7]
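Many of the columns above are text-valued, so most models need them encoded as numbers first. The following is a minimal sketch of one possible encoding: yes/no columns become 0/1 and the two job columns are one-hot encoded. It assumes the values are stored exactly as the literal strings listed above; the function name `encode_row` and the example record are hypothetical.

```python
# Columns documented above as "binary: yes or no".
YES_NO_COLUMNS = (
    "schoolsup", "famsup", "paid", "activities",
    "nursery", "higher", "internet", "romantic",
)
# Levels documented above for Mjob and Fjob.
JOB_LEVELS = ("teacher", "health", "services", "at_home", "other")


def encode_row(row):
    """Encode the yes/no and job columns of one record as 0/1 integers."""
    encoded = {}
    for column in YES_NO_COLUMNS:
        encoded[column] = 1 if row[column] == "yes" else 0
    for column in ("Mjob", "Fjob"):
        for level in JOB_LEVELS:
            encoded[f"{column}_{level}"] = 1 if row[column] == level else 0
    return encoded


# A made-up record, for illustration only.
example = {
    "schoolsup": "no", "famsup": "yes", "paid": "no", "activities": "yes",
    "nursery": "yes", "higher": "yes", "internet": "no", "romantic": "no",
    "Mjob": "teacher", "Fjob": "other",
}
encoded_example = encode_row(example)
```

The ordinal columns (for example Medu, studytime, famrel) are already numeric and can be passed through unchanged or treated as categories, depending on the model.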
Distribution
The dataset is provided as two separate files, Maths.csv and Portuguese.csv, corresponding to student performance in Mathematics and Portuguese language subjects respectively [1]. The Maths.csv file has a size of 50.67 kB [7]. While specific row counts for the full datasets are not explicitly stated, the data indicates columns with hundreds of valid entries (e.g., 347 valid for one column, 116 for another) [8, 9], suggesting the dataset contains several hundred records. Data files are typically in CSV format [10].
Usage
This dataset is ideal for predictive analytics in education, specifically for forecasting student performance [1]. It can be employed for:
- Developing classification models to predict whether a student will pass or fail, or to categorise performance into multiple levels (e.g., five-level classification) [1].
- Building regression models to predict exact final grades (G3) [1].
- Analysing the impact of demographic, social, and school-related factors on student achievement [1].
- Identifying students at risk of academic failure for early intervention strategies [1].
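The classification tasks listed above need G3 converted into class labels. The sketch below shows one way this might be done. The pass mark of 10 out of 20 and the five-level cut-offs at 10, 12, 14 and 16 are assumptions chosen for illustration, since this description does not state the thresholds, and the function names are hypothetical.

```python
def pass_fail(g3, pass_mark=10):
    """Binary label from the final grade (pass_mark is an assumed threshold)."""
    return "pass" if g3 >= pass_mark else "fail"


def five_level(g3, cutoffs=(10, 12, 14, 16)):
    """Map a 0-20 grade to a level from 1 (lowest) to 5 (highest).

    The level is one plus the number of assumed cut-offs the grade reaches.
    """
    return 1 + sum(g3 >= cutoff for cutoff in cutoffs)


grades = [0, 9, 10, 13, 15, 20]
binary_labels = [pass_fail(g) for g in grades]
level_labels = [five_level(g) for g in grades]
```

Both thresholds are exposed as parameters so they can be replaced with whatever banding a given study actually uses.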
Coverage
The dataset covers secondary education students from two Portuguese schools: Gabriel Pereira (GP) and Mousinho da Silveira (MS) [1, 2]. The students are aged 15 to 22 [2]. The attributes describe the students' home life (address type, family size, parents' cohabitation status, parents' education and jobs, family relationships), school life (reason for choosing the school, guardian, travel time, study time, past failures, extra support, extra-curricular activities), and personal background and habits (nursery attendance, desire for higher education, internet access, romantic relationships, free time, socialising, alcohol consumption, health, and absences) [2-6]. The dataset also includes the students' period and final grades for the Mathematics and Portuguese language courses [7].
License
Attribution 4.0 International (CC BY 4.0)
Who Can Use It
This dataset is suitable for a diverse group of users, including:
- Data Scientists and Machine Learning Practitioners: For building and testing predictive models for student outcomes [1].
- Educational Researchers: To gain insights into factors influencing academic success and to study educational trends [1].
- Policy Makers and School Administrators: To inform decisions regarding curriculum development, resource allocation, and support programmes for students [1].
- Academics and Students: As a valuable resource for research projects, thesis work, and learning about educational data mining [1].
Dataset Name Suggestions
- Portuguese High School Performance
- Student Academic Achievement in Portugal
- Educational Outcome Prediction
- Portuguese Secondary Education Dataset
- Student Performance Factors
Attributes
Original Data Source: Student Academic Achievement in Portugal