Secondary School Performance Factors
Education & Learning Analytics
About
This dataset provides detailed information on high school student performance, encompassing academic results alongside demographic, social, and parental data. It includes student achievement data collected from two Portuguese high schools through school reports and questionnaires. Specifically, it offers two distinct datasets focused on student performance in Mathematics and the Portuguese language. The original data has been cleaned to enhance readability and ease of use.
Columns
- student_id: A unique numerical identifier for each student within a specific subject dataset. (e.g., 1 to 395 for student_math_cleaned.csv)
- school: The student's school, a binary attribute indicating either "GP" (Gabriel Pereira) or "MS" (Mousinho da Silveira). (GP represents 88% of students in the math dataset, MS 12%).
- sex: The student's gender, a binary attribute "F" (female) or "M" (male). (Female students make up 53% of the math dataset).
- age: The student's age, a numerical value ranging from 15 to 22. (Average age is 16.7 years).
- address_type: The type of student's home address, either "Urban" or "Rural". (78% of students reside in urban areas).
- family_size: The size of the student's family, categorised as "Less or equal to 3" or "Greater than 3". (71% of students are from families larger than 3).
- parent_status: The cohabitation status of the student's parents, indicating "Living together" or "Apart". (90% of parents are living together).
- mother_education: The mother's education level, an ordinal attribute from "none" to "higher education". (33% of mothers have higher education).
- father_education: The father's education level, an ordinal attribute from "none" to "higher education". (29% of fathers have 5th to 9th grade education).
- mother_job: The mother's profession, a nominal attribute with the values "teacher", "health" (health care related), "services" (civil services), "at_home", or "other". (36% of mothers have "other" jobs).
- father_job: The father's profession, a nominal attribute with the values "teacher", "health" (health care related), "services" (civil services), "at_home", or "other". (55% of fathers have "other" jobs).
- reason: The student's reason for choosing their school, a nominal attribute with the values "home" (proximity to home), "reputation" (school reputation), "course" (course preference), or "other". (37% chose their school for "course" preference).
- guardian: The student's guardian, a nominal attribute: "mother", "father", or "other". (69% of students have their mother as guardian).
- travel_time: The time taken to travel from home to school, an ordinal attribute ranging from "<15 min." to ">1 hour". (65% travel less than 15 minutes).
- study_time: Weekly study time, an ordinal attribute from "<2 hours" to ">10 hours". (50% study 2 to 5 hours weekly).
- class_failures: The number of past class failures, a numerical attribute. (Average failures is 0.33).
- school_support: A binary attribute indicating whether the student receives extra educational support from the school (yes/no). (87% do not receive school support).
- family_support: A binary attribute indicating whether the student receives educational support from family (yes/no). (61% receive family support).
- extra_paid_classes: A binary attribute indicating whether the student takes extra paid classes within the course subject (yes/no). (54% do not take extra paid classes).
- activities: A binary attribute indicating whether the student participates in extra-curricular activities (yes/no). (51% participate in activities).
- nursery: A binary attribute indicating whether the student attended nursery school (yes/no). (79% attended nursery school).
- higher_ed: A binary attribute indicating whether the student intends to pursue higher education (yes/no). (95% want to pursue higher education).
- internet: A binary attribute indicating whether the student has Internet access at home (yes/no). (83% have internet access).
- romantic_relationship: A binary attribute indicating whether the student is in a romantic relationship (yes/no). (67% are not in a romantic relationship).
- family_relationship: The quality of family relationships, a numerical scale from 1 (very bad) to 5 (excellent). (Average quality is 3.94).
- free_time: The amount of free time after school, a numerical scale from 1 (very low) to 5 (very high). (Average free time is 3.24).
- social: Frequency of going out with friends, a numerical scale from 1 (very low) to 5 (very high). (Average social activity is 3.11).
- weekday_alcohol: Workday alcohol consumption, a numerical scale from 1 (very low) to 5 (very high). (Average weekday consumption is 1.48).
- weekend_alcohol: Weekend alcohol consumption, a numerical scale from 1 (very low) to 5 (very high). (Average weekend consumption is 2.29).
- health: Current health status, a numerical scale from 1 (very bad) to 5 (very good). (Average health status is 3.55).
- absences: Number of school absences, a numerical value from 0 to 93. (Average number of absences is 5.71).
- grade_1: First period grade, a numerical value from 0 to 20. (Average grade is 10.9).
- grade_2: Second period grade, a numerical value from 0 to 20. (Average grade is 10.7).
- final_grade: Final grade, a numerical value from 0 to 20, serving as the target variable. (Average final grade is 10.4). This attribute is strongly correlated with grade_1 and grade_2, since it is the final grade for the same year in which those two period grades were awarded.
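The correlation between the period grades and final_grade can be checked directly. The following is a minimal sketch using only the Python standard library; the helper names `pearson` and `grade_correlations` are illustrative, and the sample rows are made up for demonstration rather than taken from the dataset.

```python
import csv
import io
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def grade_correlations(csv_file):
    """Correlate grade_1 and grade_2 with final_grade for an open CSV file."""
    rows = list(csv.DictReader(csv_file))
    final = [float(r["final_grade"]) for r in rows]
    return {
        col: pearson([float(r[col]) for r in rows], final)
        for col in ("grade_1", "grade_2")
    }


# Made-up rows for illustration only. With the real data, pass
# open("student_math_cleaned.csv", newline="") instead.
sample = io.StringIO(
    "student_id,grade_1,grade_2,final_grade\n"
    "1,5,6,6\n"
    "2,7,5,6\n"
    "3,15,14,15\n"
    "4,10,10,10\n"
)
correlations = grade_correlations(sample)
print(correlations)
```

Because grade_1 and grade_2 carry so much information about final_grade, it may be worth deciding up front whether a predictive model is allowed to use them as inputs.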
Distribution
The dataset is provided in CSV format and includes two separate files: student_math_cleaned.csv and student_portuguese_cleaned.csv. The student_math_cleaned.csv file contains 395 records across 34 columns. Details regarding the exact number of rows for the Portuguese language dataset are not specified, but it's noted that 382 students are present in both datasets.
Usage
This dataset is ideal for various applications, including:
- Predicting secondary school student performance based on a variety of social, demographic, and educational factors.
- Analysing the influence of non-academic factors on student grades and educational outcomes.
- Conducting demographic studies related to high school populations and their academic achievements.
- Developing educational policy recommendations by identifying key determinants of student success.
- Social science research exploring relationships between family background, lifestyle, and academic attainment.
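As a starting point for analysing non-academic factors, one simple approach is to compare the mean final_grade across the categories of a single column. The sketch below uses only the Python standard library; the helper name `mean_final_grade_by` is illustrative, the "yes"/"no" encoding is assumed from the column descriptions, and the sample rows are made up for demonstration.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_final_grade_by(csv_file, column):
    """Return {category: mean final_grade} for one categorical column."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row[column]].append(float(row["final_grade"]))
    return {category: mean(grades) for category, grades in groups.items()}


# Made-up rows for illustration only. With the real data, pass
# open("student_math_cleaned.csv", newline="") instead.
sample = io.StringIO(
    "student_id,internet,final_grade\n"
    "1,yes,12\n"
    "2,no,9\n"
    "3,yes,14\n"
    "4,no,11\n"
)
by_internet = mean_final_grade_by(sample, "internet")
print(by_internet)
```

A difference in group means is descriptive only; it does not by itself show that the factor influences grades, since other attributes may differ between the groups.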
Coverage
- Geographic: Data pertains to students from two high schools in Portugal: Gabriel Pereira and Mousinho da Silveira.
- Time Range: The data was collected in 2008 and pertains to that year.
- Demographic Scope: Students aged 15 to 22, with detailed information on sex, family characteristics (size, parental status, education, jobs), and various social attributes (free time, socialising, alcohol consumption, relationships).
- Data Availability Notes: While there are two distinct datasets for Maths and Portuguese, 382 students are common to both, though their unique identifiers do not directly match across files. These common students can be identified by matching their shared attributes.
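Matching on shared attributes can be sketched as follows, using only the Python standard library. The source does not state which attributes to match on, so the `MATCH_COLUMNS` tuple is an assumption, as are the helper name `match_students` and the made-up sample rows. Keys that appear more than once in either file are skipped, so this sketch may recover fewer than the 382 common students.

```python
import csv
import io

# Assumed matching key: the source does not specify which attributes to use.
MATCH_COLUMNS = (
    "school", "sex", "age", "address_type", "family_size", "parent_status",
    "mother_education", "father_education", "mother_job", "father_job",
    "reason", "nursery", "internet",
)


def match_students(math_file, portuguese_file, columns=MATCH_COLUMNS):
    """Pair rows whose values agree on every column in `columns`.

    Returns a list of (math_student_id, portuguese_student_id) tuples.
    Keys that occur more than once in either file are skipped, because
    such rows cannot be paired unambiguously.
    """
    def index(csv_file):
        by_key = {}
        for row in csv.DictReader(csv_file):
            key = tuple(row[c] for c in columns)
            by_key.setdefault(key, []).append(row["student_id"])
        return by_key

    math_index = index(math_file)
    portuguese_index = index(portuguese_file)
    pairs = []
    for key, math_ids in math_index.items():
        portuguese_ids = portuguese_index.get(key, [])
        if len(math_ids) == 1 and len(portuguese_ids) == 1:
            pairs.append((math_ids[0], portuguese_ids[0]))
    return pairs


# Made-up rows with a reduced key, for illustration only.
math_sample = io.StringIO(
    "student_id,school,sex,age\n1,GP,F,18\n2,GP,M,16\n3,MS,F,17\n"
)
portuguese_sample = io.StringIO(
    "student_id,school,sex,age\n10,MS,F,17\n11,GP,F,18\n12,GP,M,15\n"
)
pairs = match_students(
    math_sample, portuguese_sample, columns=("school", "sex", "age")
)
print(pairs)
```

With the real files, the two arguments would be the opened student_math_cleaned.csv and student_portuguese_cleaned.csv, and the default `MATCH_COLUMNS` key would apply.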
License
Attribution 4.0 International (CC BY 4.0)
Who Can Use It
- Researchers and Academics: For studies on educational attainment, social determinants of health and performance, and data mining applications in education.
- Educators and School Administrators: To understand student demographics, identify at-risk students, and tailor support programmes.
- Data Scientists and Analysts: For developing predictive models for student performance and exploring complex relationships within educational data.
- Policymakers: To inform decisions regarding educational funding, support systems, and curriculum development.
Dataset Name Suggestions
- Portuguese High School Student Outcomes
- Secondary School Performance Factors
- Student Grades and Demographics Portugal
- Educational Success Determinants
- High School Student Analytics
Attributes
Original Data Source: Secondary School Performance Factors