Youth Alcohol and Study Impact
About
This dataset focuses on student achievement within the secondary education system of two distinct Portuguese schools. It captures a range of attributes including student grades, demographic details, social habits, and school-related features. The information was compiled from school reports and student questionnaires. The dataset provides two separate files, one for Mathematics (mat) course performance and another for Portuguese language (por) course performance. It is particularly valuable for modelling tasks, such as binary/five-level classification and regression, to understand factors influencing academic outcomes. The final year grade (G3) shows a strong relationship with the first (G1) and second (G2) period grades, as G3 is issued at the end of the academic year following G1 and G2.
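The binary and five-level classification targets mentioned above are not columns in the files; they would have to be derived from G3. The following is a minimal sketch of one way to do that. The pass mark of 10 out of 20 and the five grade bands (16-20, 14-15, 12-13, 10-11, 0-9) are assumptions that this description does not state, and the function names and band labels are hypothetical.

```python
# ASSUMPTION: a final grade of 10 or more (out of 20) counts as a pass.
DEFAULT_PASS_MARK = 10

# ASSUMPTION: (lower bound, label) pairs for five bands, highest band first.
FIVE_LEVEL_BANDS = [(16, "A"), (14, "B"), (12, "C"), (10, "D"), (0, "F")]


def _check_grade(g3: int) -> None:
    """Reject grades outside the documented 0-20 range."""
    if not 0 <= g3 <= 20:
        raise ValueError(f"G3 must be between 0 and 20, got {g3}")


def binary_target(g3: int, pass_mark: int = DEFAULT_PASS_MARK) -> str:
    """Map a final grade to 'pass' or 'fail'."""
    _check_grade(g3)
    return "pass" if g3 >= pass_mark else "fail"


def five_level_target(g3: int) -> str:
    """Map a final grade to one of five bands."""
    _check_grade(g3)
    for lower_bound, label in FIVE_LEVEL_BANDS:
        if g3 >= lower_bound:
            return label
    raise AssertionError("unreachable: the last band starts at 0")
```

Keeping the thresholds in module-level constants makes it easy to swap in a different pass mark or banding if the intended definition differs.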
Columns
- school: Indicates the student's school, with options 'GP' (Gabriel Pereira) or 'MS' (Mousinho da Silveira). For Maths.csv, 88% attend GP.
- sex: The student's sex, binary 'F' (female) or 'M' (male). For Maths.csv, 53% are female.
- age: The student's age, a numeric value ranging from 15 to 22. For Maths.csv, the mean age is 16.7.
- address: Type of student's home address, binary 'U' (urban) or 'R' (rural). For Maths.csv, 78% reside in urban areas.
- famsize: Family size, binary 'LE3' (less than or equal to 3) or 'GT3' (greater than 3). For Maths.csv, 71% have family sizes greater than 3.
- Pstatus: Parents' cohabitation status, binary 'T' (living together) or 'A' (apart). For Maths.csv, 90% of parents live together.
- Medu: Mother's education level, numeric from 0 (none) to 4 (higher education). For Maths.csv, the mean is 2.75.
- Fedu: Father's education level, numeric from 0 (none) to 4 (higher education). For Maths.csv, the mean is 2.52.
- Mjob: Mother's job, nominal ('teacher', 'health' (health care), 'services', 'at_home', 'other'). For Maths.csv, 'other' is the most common at 36%.
- Fjob: Father's job, nominal ('teacher', 'health' (health care), 'services', 'at_home', 'other'). For Maths.csv, 'other' is the most common at 55%.
- reason: Reason for choosing the school, nominal ('home', 'reputation', 'course' (course preference), 'other'). For Maths.csv, 'course' is the most common at 37%.
- guardian: Student's guardian, nominal ('mother', 'father', 'other'). For Maths.csv, 'mother' is the most common at 69%.
- traveltime: Home to school travel time, numeric from 1 (<15 min) to 4 (>1 hour). For Maths.csv, the mean is 1.45, with 257 students travelling less than 15 minutes.
- studytime: Weekly study time, numeric from 1 (<2 hours) to 4 (>10 hours). For Maths.csv, the mean is 2.04, with 198 students studying 2 to 5 hours.
- failures: Number of past class failures, numeric (n if 0 <= n < 3, else 4). For Maths.csv, the mean is 0.33, with 312 students having 0 failures.
- schoolsup: Extra educational support, binary 'yes' or 'no'. For Maths.csv, 13% receive extra support.
- famsup: Family educational support, binary 'yes' or 'no'. For Maths.csv, 61% receive family support.
- paid: Extra paid classes for the course subject, binary 'yes' or 'no'. For Maths.csv, 46% attend paid classes.
- activities: Extra-curricular activities, binary 'yes' or 'no'. For Maths.csv, 51% participate in activities.
- nursery: Attended nursery school, binary 'yes' or 'no'. For Maths.csv, 79% attended nursery.
- higher: Wants to take higher education, binary 'yes' or 'no'. For Maths.csv, 95% desire higher education.
- internet: Internet access at home, binary 'yes' or 'no'. For Maths.csv, 83% have internet access.
- romantic: In a romantic relationship, binary 'yes' or 'no'. For Maths.csv, 33% are in a relationship.
- famrel: Quality of family relationships, numeric from 1 (very bad) to 5 (excellent). For Maths.csv, the mean is 3.94, with most ratings being 4 or 5.
- freetime: Free time after school, numeric from 1 (very low) to 5 (very high). For Maths.csv, the mean is 3.24, with most ratings being 3 or 4.
- goout: Frequency of going out with friends, numeric from 1 (very low) to 5 (very high). For Maths.csv, the mean is 3.11, with most ratings being 2, 3 or 4.
- Dalc: Workday alcohol consumption, numeric from 1 (very low) to 5 (very high). For Maths.csv, the mean is 1.48, with most students having very low consumption.
- Walc: Weekend alcohol consumption, numeric from 1 (very low) to 5 (very high). For Maths.csv, the mean is 2.29, with common values being 1, 2, and 3.
- health: Current health status, numeric from 1 (very bad) to 5 (very good). For Maths.csv, the mean is 3.55, with most ratings being 5.
- absences: Number of school absences, numeric from 0 to 93. For Maths.csv, the mean is 5.71, with a median of 4.
- G1: First period grade, numeric from 0 to 20. For Maths.csv, the mean is 10.9.
- G2: Second period grade, numeric from 0 to 20. For Maths.csv, the mean is 10.7.
- G3: Final grade, numeric from 0 to 20 (output target attribute). For Maths.csv, the mean is 10.4.
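The codes and ranges listed above can be turned into a simple record check. This is a sketch under stated assumptions: the lookup tables are transcribed from the column list, `validate_row` is a hypothetical helper, values are assumed to arrive as strings (as they would from a CSV reader), and `failures` is left out because its coding is described less precisely than the other columns.

```python
# Allowed codes, transcribed from the column descriptions above.
CATEGORICAL = {
    "school": {"GP", "MS"},
    "sex": {"F", "M"},
    "address": {"U", "R"},
    "famsize": {"LE3", "GT3"},
    "Pstatus": {"T", "A"},
    "Mjob": {"teacher", "health", "services", "at_home", "other"},
    "Fjob": {"teacher", "health", "services", "at_home", "other"},
    "reason": {"home", "reputation", "course", "other"},
    "guardian": {"mother", "father", "other"},
}
YES_NO = ["schoolsup", "famsup", "paid", "activities",
          "nursery", "higher", "internet", "romantic"]
# Inclusive (minimum, maximum) bounds for the numeric columns.
NUMERIC_RANGES = {
    "age": (15, 22), "Medu": (0, 4), "Fedu": (0, 4),
    "traveltime": (1, 4), "studytime": (1, 4),
    "famrel": (1, 5), "freetime": (1, 5), "goout": (1, 5),
    "Dalc": (1, 5), "Walc": (1, 5), "health": (1, 5),
    "absences": (0, 93), "G1": (0, 20), "G2": (0, 20), "G3": (0, 20),
}


def validate_row(row: dict) -> list:
    """Return a list of problems in one record; columns not present are skipped."""
    problems = []
    for col, allowed in CATEGORICAL.items():
        if col in row and row[col] not in allowed:
            problems.append(f"{col}: unexpected code {row[col]!r}")
    for col in YES_NO:
        if col in row and row[col] not in ("yes", "no"):
            problems.append(f"{col}: expected 'yes' or 'no', got {row[col]!r}")
    for col, (low, high) in NUMERIC_RANGES.items():
        if col not in row:
            continue
        try:
            value = int(row[col])
        except (TypeError, ValueError):
            problems.append(f"{col}: not an integer: {row[col]!r}")
            continue
        if not low <= value <= high:
            problems.append(f"{col}: {value} outside {low}-{high}")
    return problems
```

For example, `validate_row({"school": "GP", "age": "17", "G3": "11"})` returns an empty list, while a record with an unknown school code or an out-of-range age yields one message per problem.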
Distribution
The dataset is available in CSV format as two files, Maths.csv and Portuguese.csv. The Maths.csv file is 42.38 kB in size and contains 395 records. The number of records in the Portuguese.csv file is not detailed in the provided information.
Usage
This dataset is well-suited for a variety of analytical and machine learning applications. It can be used for:
- Predicting student academic performance (final grades - G3) based on a wide array of personal, social, and school-related factors.
- Investigating the correlation between alcohol consumption habits (workday and weekend) and academic outcomes.
- Building classification models to identify students at risk of academic failure or those likely to pursue higher education.
- Performing regression analysis to understand the influence of different attributes on grades.
- Conducting educational research to understand socio-economic impacts on learning and inform school policies or support programmes.
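As a starting point for the correlation analyses listed above, the sketch below computes the Pearson correlation between selected columns and G3 using only the standard library. The helper names are hypothetical, the comma delimiter is an assumption (pass a different `delimiter` if the files use another separator), and the inline sample is made up for illustration; it does not contain real records from Maths.csv.

```python
import csv
import io
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def column_correlations(csv_text, target="G3",
                        features=("Dalc", "Walc", "G1", "G2"), delimiter=","):
    """Correlate each feature column with the target column of a CSV document."""
    rows = list(csv.DictReader(io.StringIO(csv_text), delimiter=delimiter))
    target_values = [float(row[target]) for row in rows]
    return {
        name: pearson([float(row[name]) for row in rows], target_values)
        for name in features
    }


# Tiny made-up sample, NOT real records from the dataset.
SAMPLE = """Dalc,Walc,G1,G2,G3
1,1,14,15,15
1,2,12,12,13
2,3,10,9,9
3,4,8,8,7
1,1,16,16,17
"""

result = column_correlations(SAMPLE)
for name, value in result.items():
    print(f"{name} vs G3: {value:+.2f}")
```

To run this on the actual data, read the file contents (for example with `open("Maths.csv", encoding="utf-8").read()`) and pass the text to `column_correlations`. A correlation describes association only and does not by itself show that alcohol consumption causes lower grades.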
Coverage
The dataset covers students enrolled in secondary education across two specific schools in Portugal: Gabriel Pereira and Mousinho da Silveira. It includes demographic information such as age (15 to 22), sex, home address type (urban/rural), family size, and parental cohabitation status. Social and lifestyle factors like family relationships, free time, social outings, and internet access are also included. The data was collected through school reports and questionnaires, but a specific time range for data collection is not mentioned.
License
Attribution 4.0 International (CC BY 4.0)
Who Can Use It
- Education Professionals: For insights into student behaviour and academic drivers.
- Machine Learning Engineers: To develop predictive models for student success or classification tasks.
- Social Scientists: For exploring the impact of social and family backgrounds on educational attainment.
- Public Health Researchers: To study the relationship between alcohol consumption and academic performance.
- Data Analysts: For exploratory data analysis and visualising educational trends.
Dataset Name Suggestions
- Portuguese Student Performance Factors
- Student Academic Achievement in Portugal
- Secondary Education Performance Study
- Youth Alcohol and Study Impact
- Portuguese School Grades Dataset
Attributes
Original Data Source: Youth Alcohol and Study Impact