Opendatabay APP

Student Academic Achievement in Portugal

Education & Learning Analytics

Tags and Keywords

Student

Performance

Education

Grades

Portugal


Free

About

This dataset supports predicting student performance in secondary education (high school) at two Portuguese schools [1]. It includes student grades along with demographic, social, and school-related features [1]. The data was collected from school reports and questionnaires [1]. Two distinct datasets are available, one for performance in Mathematics (mat) and one for Portuguese language (por) [1]. Both have previously been modelled in binary classification, five-level classification, and regression tasks [1]. Note that the final year grade (G3) correlates strongly with the first and second period grades (G1 and G2) [1]. Predicting G3 without G1 and G2 is harder, but such a prediction is considered much more useful [1].
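The relationship between the period grades and the final grade can be checked directly once the data is loaded. The following is a minimal sketch using only the Python standard library; the helper name `pearson` and the six sample rows are invented for illustration and are not taken from the dataset files.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Illustrative (G1, G2, G3) rows only: these values are invented, not real records.
sample = [(5, 6, 6), (7, 8, 8), (10, 10, 11), (12, 13, 13), (15, 14, 15), (18, 18, 19)]
g1, g2, g3 = zip(*sample)

print("corr(G1, G3):", round(pearson(g1, g3), 3))
print("corr(G2, G3):", round(pearson(g2, g3), 3))
```

On the real files, the same function would be applied to the G1, G2, and G3 columns after converting them to integers.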

Columns

  • school: Student's school (binary: 'GP' - Gabriel Pereira or 'MS' - Mousinho da Silveira) [2]
  • sex: Student's sex (binary: 'F' - female or 'M' - male) [2]
  • age: Student's age (numeric: from 15 to 22) [2]
  • address: Student's home address type (binary: 'U' - urban or 'R' - rural) [2]
  • famsize: Family size (binary: 'LE3' - less than or equal to 3 or 'GT3' - greater than 3) [2]
  • Pstatus: Parents' cohabitation status (binary: 'T' - living together or 'A' - apart) [2]
  • Medu: Mother's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education) [3]
  • Fedu: Father's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education) [3]
  • Mjob: Mother's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other') [3]
  • Fjob: Father's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other') [3]
  • reason: Reason to choose this school (nominal: close to 'home', school 'reputation', 'course' preference or 'other') [4]
  • guardian: Student's guardian (nominal: 'mother', 'father' or 'other') [4]
  • traveltime: Home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour) [4]
  • studytime: Weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours) [4]
  • failures: Number of past class failures (numeric: n if 1<=n<3, else 4) [4]
  • schoolsup: Extra educational support (binary: yes or no) [4]
  • famsup: Family educational support (binary: yes or no) [5]
  • paid: Extra paid classes within the course subject (Math or Portuguese) (binary: yes or no) [5]
  • activities: Extra-curricular activities (binary: yes or no) [5]
  • nursery: Attended nursery school (binary: yes or no) [5]
  • higher: Wants to take higher education (binary: yes or no) [5]
  • internet: Internet access at home (binary: yes or no) [5]
  • romantic: In a romantic relationship (binary: yes or no) [5]
  • famrel: Quality of family relationships (numeric: from 1 - very bad to 5 - excellent) [5]
  • freetime: Free time after school (numeric: from 1 - very low to 5 - very high) [6]
  • goout: Going out with friends (numeric: from 1 - very low to 5 - very high) [6]
  • Dalc: Workday alcohol consumption (numeric: from 1 - very low to 5 - very high) [6]
  • Walc: Weekend alcohol consumption (numeric: from 1 - very low to 5 - very high) [6]
  • health: Current health status (numeric: from 1 - very bad to 5 - very good) [6]
  • absences: Number of school absences (numeric: from 0 to 93) [6]
  • G1: First period grade (numeric: from 0 to 20) [7]
  • G2: Second period grade (numeric: from 0 to 20) [7]
  • G3: Final grade (numeric: from 0 to 20, output target) [7]
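The column descriptions above double as a simple validation schema. As a rough sketch, the check below covers a subset of the columns; the names `ALLOWED`, `RANGES`, and `validate_row` are hypothetical, and it assumes each row arrives as a dict of strings (as `csv.DictReader` would produce).

```python
# Allowed codes and numeric ranges copied from the column list (a subset, for brevity).
ALLOWED = {
    "school": {"GP", "MS"},
    "sex": {"F", "M"},
    "address": {"U", "R"},
    "famsize": {"LE3", "GT3"},
    "Pstatus": {"T", "A"},
}
RANGES = {
    "age": (15, 22),
    "Medu": (0, 4),
    "Fedu": (0, 4),
    "traveltime": (1, 4),
    "studytime": (1, 4),
    "famrel": (1, 5),
    "absences": (0, 93),
    "G1": (0, 20),
    "G2": (0, 20),
    "G3": (0, 20),
}

def validate_row(row):
    """Return a list of problems found in one row (dict of column name -> string)."""
    problems = []
    for col, allowed in ALLOWED.items():
        if col in row and row[col] not in allowed:
            problems.append(f"{col}: unexpected value {row[col]!r}")
    for col, (low, high) in RANGES.items():
        if col not in row:
            continue  # columns absent from the row are not checked
        try:
            value = int(row[col])
        except ValueError:
            problems.append(f"{col}: not an integer: {row[col]!r}")
            continue
        if not low <= value <= high:
            problems.append(f"{col}: {value} outside {low}..{high}")
    return problems

print(validate_row({"school": "GP", "age": "17", "G3": "12"}))
print(validate_row({"school": "XX", "age": "30", "G1": "n/a"}))
```

Returning a list of messages rather than raising on the first problem makes it easy to summarise all issues across a file.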

Distribution

The dataset is provided as two separate files, Maths.csv and Portuguese.csv, covering student performance in Mathematics and Portuguese language respectively [1]. The Maths.csv file is 50.67 kB [7]. Row counts for the full datasets are not stated explicitly, but column summaries report hundreds of valid entries (e.g., 347 valid for one column, 116 for another) [8, 9], which suggests the dataset contains several hundred records. The data files are in CSV format [10].
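Since the row counts are not stated, the simplest check is to load each file and count. The sketch below uses the standard `csv` module; the helper names `load_rows` and `load_file` are hypothetical, and the field delimiter is an assumption (a comma by default, exposed as a parameter in case the files use another separator). The inline demo data is invented.

```python
import csv
import io

def load_rows(stream, delimiter=","):
    """Read a CSV text stream into a list of dicts keyed by the header row."""
    return list(csv.DictReader(stream, delimiter=delimiter))

def load_file(path, delimiter=","):
    """Open a CSV file from disk and load it with load_rows."""
    with open(path, newline="", encoding="utf-8") as handle:
        return load_rows(handle, delimiter)

# Inline stand-in for a real file; these two rows are invented.
demo = io.StringIO("school,sex,age,G1,G2,G3\nGP,F,17,10,11,11\nMS,M,16,8,7,8\n")
rows = load_rows(demo)
print("rows:", len(rows))
print("columns:", list(rows[0].keys()))
```

With the real files in the working directory, `len(load_file("Maths.csv"))` and `len(load_file("Portuguese.csv"))` would give the record counts for each subject.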

Usage

This dataset is ideal for predictive analytics in education, specifically for forecasting student performance [1]. It can be employed for:
  • Developing classification models to predict whether a student will pass or fail, or to categorise performance into multiple levels (e.g., five-level classification) [1].
  • Building regression models to predict exact final grades (G3) [1].
  • Analysing the impact of demographic, social, and school-related factors on student achievement [1].
  • Identifying students at risk of academic failure for early intervention strategies [1].
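For the classification tasks listed above, G3 first has to be converted into class labels. The listing does not give the cut-offs, so the pass mark of 10 and the five bands below are assumptions chosen for illustration; the function names and band labels are likewise hypothetical.

```python
PASS_MARK = 10  # assumption: the listing does not state the pass threshold

# (lower bound, label) pairs, highest band first. Assumed bands, not from the listing.
FIVE_LEVEL_BANDS = [(16, "A"), (14, "B"), (12, "C"), (10, "D"), (0, "F")]

def _check_grade(g3):
    """Reject grades outside the documented 0 to 20 range."""
    if not 0 <= g3 <= 20:
        raise ValueError(f"G3 must be between 0 and 20, got {g3}")

def pass_fail(g3, pass_mark=PASS_MARK):
    """Binary target: 'pass' when G3 reaches the pass mark, otherwise 'fail'."""
    _check_grade(g3)
    return "pass" if g3 >= pass_mark else "fail"

def five_level(g3):
    """Five-level target: label of the first band whose lower bound G3 reaches."""
    _check_grade(g3)
    for lower, label in FIVE_LEVEL_BANDS:
        if g3 >= lower:
            return label

for grade in (4, 10, 13, 15, 19):
    print(grade, pass_fail(grade), five_level(grade))
```

Keeping the thresholds in named constants makes it straightforward to swap in different cut-offs if a study defines its classes differently.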

Coverage

The dataset covers secondary education students from two Portuguese schools: Gabriel Pereira (GP) and Mousinho da Silveira (MS) [1, 2]. Students range in age from 15 to 22 years [2]. The attributes describe the students' home life (address, family size, parents' cohabitation status, parents' education and jobs, and family relationships), school life (reason for choosing the school, guardian, travel time, study time, failures, extra support, and extra-curricular activities), and personal habits (nursery attendance, desire for higher education, internet access, romantic relationships, free time, socialising, alcohol consumption, health, and absences) [2-6]. The dataset also includes the students' grades for the Mathematics and Portuguese language courses [7].
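The three attribute groups described above can be written down as a column mapping, which is handy for experiments that use only one group of features. This is a sketch under assumptions: the names `FEATURE_GROUPS` and `select_groups` are hypothetical, and placing `schoolsup`, `famsup`, and `paid` under "extra support" in the school group is an interpretation of the text.

```python
# Column names grouped as in the coverage description; school, sex, age and the
# grades G1 to G3 are left out because the text does not assign them to a group.
FEATURE_GROUPS = {
    "home": ["address", "famsize", "Pstatus", "Medu", "Fedu", "Mjob", "Fjob", "famrel"],
    "school": ["reason", "guardian", "traveltime", "studytime", "failures",
               "schoolsup", "famsup", "paid", "activities"],
    "personal": ["nursery", "higher", "internet", "romantic", "freetime",
                 "goout", "Dalc", "Walc", "health", "absences"],
}

def select_groups(row, groups):
    """Return only the columns of `row` that belong to the named groups."""
    wanted = [col for group in groups for col in FEATURE_GROUPS[group]]
    return {col: row[col] for col in wanted if col in row}

# Invented example row with a handful of columns.
row = {"address": "U", "studytime": "2", "health": "3", "G1": "10"}
print(select_groups(row, ["home", "school"]))
```

Because G1 and G2 sit outside every group, selecting by group also yields the feature set for the harder task of predicting G3 without the earlier period grades.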

License

Attribution 4.0 International (CC BY 4.0)

Who Can Use It

This dataset is suitable for a diverse group of users, including:
  • Data Scientists and Machine Learning Practitioners: For building and testing predictive models for student outcomes [1].
  • Educational Researchers: To gain insights into factors influencing academic success and to study educational trends [1].
  • Policy Makers and School Administrators: To inform decisions regarding curriculum development, resource allocation, and support programmes for students [1].
  • Academics and Students: As a valuable resource for research projects, thesis work, and learning about educational data mining [1].

Dataset Name Suggestions

  • Portuguese High School Performance
  • Student Academic Achievement in Portugal
  • Educational Outcome Prediction
  • Portuguese Secondary Education Dataset
  • Student Performance Factors

Attributes

Listing Stats

VIEWS

0

DOWNLOADS

0

LISTED

14/07/2025

REGION

GLOBAL

UNIVERSAL DATA QUALITY SCORE (UDQS)

5 / 5

VERSION

1.0
