Donor Response Prediction Dataset
Data Science and Analytics
Free
About
This dataset, titled Finding Donors for CharityML, was created to examine how potential donors react to new information about a charity's effectiveness. It originates from an experiment by Freedom from Hunger, which varied direct marketing letters to include or exclude details of its programme's impact, as measured by scientific research. Adding scientific impact information did not change the average likelihood or amount of donations, but it did reveal notable differences in donor behaviour: large prior donors were more likely to donate and gave more, whereas small prior donors were less likely to give. This pattern suggests two distinct donor types: "warm glow" donors, who may react negatively to analytical effectiveness information, and "altruism" donors, who respond positively. The dataset is currently used by Nancy for the Udacity ML Charity Competition and is primarily intended for students to practise data analysis on a sizeable dataset, although it may not be entirely up to date.
Columns
The dataset, presented as census.csv, contains 14 columns:
- age: A continuous variable indicating age, ranging from 17 to 90 years. The mean age is approximately 38.5.
- workclass: A categorical variable detailing the type of employment, including values such as Private, Self-emp-not-inc, Federal-gov, Local-gov, State-gov, Without-pay, and Never-worked. Private is the most frequent category, accounting for 74%.
- education_level: A categorical variable representing the highest level of education achieved, such as Bachelors, Some-college, HS-grad, Masters, and Doctorate. HS-grad is the most common at 33%.
- education-num: A continuous numerical representation of the education level, ranging from 1 to 16, with a mean of 10.1.
- marital-status: A categorical variable describing the marital status, including Married-civ-spouse, Divorced, Never-married, Separated, and Widowed. Married-civ-spouse is the most prevalent status at 47%.
- occupation: A categorical variable indicating the individual's occupation, with categories like Tech-support, Craft-repair, Prof-specialty, Sales, and Exec-managerial. Both Craft-repair and Prof-specialty are commonly occurring occupations, each at 13%.
- relationship: A categorical variable describing the individual's family relationship, such as Wife, Own-child, Husband, Not-in-family, and Unmarried. Husband is the most frequent relationship at 41%.
- race: A categorical variable representing the individual's race, including Black, White, Asian-Pac-Islander, Amer-Indian-Eskimo, and Other. White is the dominant race at 86%.
- sex: A categorical variable indicating gender, with categories Female and Male. Male is more frequent at 68%.
- capital-gain: A continuous variable representing capital gains, with values ranging from 0 to 100k and a mean of 1.1k.
- capital-loss: A continuous variable representing capital losses, with values ranging from 0 to 4.36k and a mean of 88.6.
- hours-per-week: A continuous variable indicating the number of hours worked per week, ranging from 1 to 99 hours, with a mean of 40.9.
- native-country: A categorical variable listing the individual's native country, with 41 unique values including United-States, Mexico, Canada, and England. United-States is by far the most common, accounting for 91%.
- income: A binary categorical variable indicating income level, either <=50K (75% of records) or >50K (25% of records).
All columns have 45.2 thousand valid records, with no mismatched or missing values reported.
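The figures quoted for each column (ranges, means, most frequent categories) can be checked against the file with a short script. The following is a minimal sketch using only the Python standard library. It assumes the file has a header row with the column names listed above and that the numeric columns are the five described as continuous. The `summarise` helper and the three-row sample are hypothetical and for illustration only, not part of the dataset.

```python
import csv
import io
from collections import Counter

# Columns described above as continuous (an assumption based on the listing).
NUMERIC_COLUMNS = ["age", "education-num", "capital-gain", "capital-loss", "hours-per-week"]


def summarise(csv_file, numeric_columns=NUMERIC_COLUMNS):
    """Return per-column summaries for a census.csv-style file object."""
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    numeric = {name: [] for name in numeric_columns}
    categorical = {}
    for row in reader:
        for name, value in row.items():
            value = value.strip()
            if name in numeric:
                numeric[name].append(float(value))
            else:
                categorical.setdefault(name, Counter())[value] += 1

    summary = {}
    # Numeric columns: range and mean (columns absent from the file are skipped).
    for name, values in numeric.items():
        if values:
            summary[name] = {
                "min": min(values),
                "max": max(values),
                "mean": sum(values) / len(values),
            }
    # Categorical columns: most frequent value and its share of all records.
    for name, counts in categorical.items():
        top, n = counts.most_common(1)[0]
        summary[name] = {"most_common": top, "share": n / sum(counts.values())}
    return summary


# Hypothetical three-row sample standing in for census.csv.
sample = io.StringIO(
    "age,workclass,hours-per-week,income\n"
    "39, State-gov, 40, <=50K\n"
    "50, Private, 13, <=50K\n"
    "31, Private, 50, >50K\n"
)
result = summarise(sample)
print(result["age"])
print(result["workclass"])
```

To run it on the real file, pass an open file handle instead of the sample, for example `summarise(open("census.csv", newline=""))`.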
Distribution
The dataset is provided as a CSV file (census.csv) and has a size of 5.36 MB. It is structured with 14 distinct columns and contains 45.2 thousand records. An annual update frequency is expected for this dataset.
Usage
This dataset is ideal for machine learning and data analysis projects aimed at understanding and predicting donor behaviour. Specific applications include:
- Modelling donor responses to charity solicitations.
- Identifying and segmenting different donor types, such as "warm glow" versus "altruism" donors.
- Practising data analysis techniques, particularly for students.
- Developing predictive models for charity fundraising effectiveness.
- Research into social issues and advocacy, leveraging deep learning or statistical analysis methods.
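Before building any predictive model on this data, it helps to know the score a trivial model would achieve. The listed columns contain no donation field, so the sketch below assumes `income` is the prediction target. With 75% of records labelled <=50K, a model that always predicts the majority class is already 75% accurate, and any real classifier should be compared against that figure. The function names and the synthetic label list are hypothetical and only mirror the reported split.

```python
from collections import Counter


def encode_income(label):
    """Map the two income categories to 0/1 (1 means income above 50K)."""
    label = label.strip()
    if label == ">50K":
        return 1
    if label == "<=50K":
        return 0
    raise ValueError(f"unexpected income label: {label!r}")


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    encoded = [encode_income(label) for label in labels]
    counts = Counter(encoded)
    return counts.most_common(1)[0][1] / len(encoded)


# Synthetic labels mirroring the reported 75% / 25% split (not real records).
labels = ["<=50K"] * 75 + [">50K"] * 25
baseline = majority_baseline_accuracy(labels)
print(baseline)  # 0.75
```

Because the classes are imbalanced, accuracy alone can flatter a weak model. Metrics such as precision, recall, or an F-score on the >50K class are usually more informative here.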
Coverage
The dataset focuses on demographic and socio-economic attributes of individuals, along with their capital gains and losses and weekly work hours, which are used to infer income and donor propensity. The collection years are not given in the available information, so the data may not be current, but it remains useful for training and analysis. The geographic scope is predominantly the United States, which 91% of individuals list as their native country.
License
CC0 1.0 Public Domain
Who Can Use It
This dataset is particularly useful for:
- Students and educators seeking real-world data for data analysis and machine learning exercises.
- Machine learning practitioners participating in competitions or building predictive models related to philanthropy.
- Charity organisations and fundraising professionals interested in understanding donor motivations and optimising outreach strategies.
- Researchers in fields such as behavioural economics, social science, and non-profit studies.
Dataset Name Suggestions
- CharityML Donor Behaviour
- Donor Response Prediction Dataset
- Philanthropic Giving Study
- Charity Effectiveness Impact Data
- Freedom from Hunger Donor Experiment
Attributes
Original Data Source: Donor Response Prediction Dataset