Animal Lifestyle and Traits Dataset
About
This dataset is an animal-focused collection designed for analytical and machine learning tasks, including classification, regression, clustering, visualisation, and exploratory data analysis. It contains approximately 1,000 records describing cats of three breeds: Maine Coon, Ragdoll, and Angora. Each record includes breed, age, gender, neutered/spayed status, body length, weight, fur colour and pattern, eye colour, preferred food, outdoor access, sleep time, owner play time, and geographical location (country, latitude, and longitude). The data was artificially generated and is available in both clean and dirty versions, making it suitable for data cleaning exercises.
Columns
- Breed: Specifies the cat's breed (Ragdoll, Maine Coon, or Angora). Ragdoll accounts for 41% of the records, Maine Coon for 32%, and Angora for the remaining 27%.
- Age_in_years: The cat's age, ranging from 0.08 to 11.3 years, with a mean of 4.85 years.
- Age_in_months: The cat's age in months, spanning 1 to 135 months, with a mean of 58.1 months.
- Gender: Indicates if the cat is male or female, with an equal distribution of 50% each.
- Neutered_or_spayed: A boolean field showing whether the cat has been neutered or spayed (58% true, 42% false).
- Body_length: The body length of the cat, ranging from 10.00 to 102.00, with a mean of 44.
- Weight: The cat's weight, from 0.50 to 12.10, with a mean of 5.49.
- Fur_colour_dominant: The primary fur colour, with 'seal' being most common at 28% and 'white' at 25%.
- Fur_pattern: Describes the fur pattern, with 'solid' being the most prevalent at 43% and 'colorpoint' at 32%.
- Eye_colour: The colour of the cat's eyes, with 'blue' being most common at 51% and 'yellow' at 24%.
- Allowed_outdoor: A boolean field indicating if the cat is allowed outdoors (9% true, 91% false).
- Preferred_food: The cat's preferred food type, either 'wet' (70%) or 'dry' (30%).
- Owner_play_time_minutes: The amount of time in minutes the owner plays with the cat, ranging from 0 to 60 minutes, with a mean of 23 minutes.
- Sleep_time_hours: The cat's sleep duration in hours, ranging from 8 to 22 hours, with a mean of 15.9 hours.
- Country: The country of residence, with USA (62%) and UK (13%) being the most frequent.
- Latitude: The geographical latitude of the cat's location, ranging from 37.8 to 53.8, with a mean of 44.4.
- Longitude: The geographical longitude of the cat's location, ranging from -123 to 13.4, with a mean of -60.2.
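The column list above doubles as a schema. The sketch below reads the CSV with Python's standard csv module, checks that every documented column is present, and converts the numeric and boolean fields. It assumes the file header uses exactly the column names listed here and that boolean fields are written as True/False (or 1/0, yes/no); neither detail is confirmed by the source, and the helper names parse_cats, parse_bool, and load_cats are hypothetical.

```python
import csv

# Column names as documented above (assumed to match the CSV header exactly).
EXPECTED_COLUMNS = [
    "Breed", "Age_in_years", "Age_in_months", "Gender", "Neutered_or_spayed",
    "Body_length", "Weight", "Fur_colour_dominant", "Fur_pattern", "Eye_colour",
    "Allowed_outdoor", "Preferred_food", "Owner_play_time_minutes",
    "Sleep_time_hours", "Country", "Latitude", "Longitude",
]
NUMERIC_COLUMNS = [
    "Age_in_years", "Age_in_months", "Body_length", "Weight",
    "Owner_play_time_minutes", "Sleep_time_hours", "Latitude", "Longitude",
]
BOOLEAN_COLUMNS = ["Neutered_or_spayed", "Allowed_outdoor"]


def parse_bool(text):
    """Convert a boolean-like string; the accepted spellings are an assumption."""
    lowered = text.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise ValueError(f"not a boolean: {text!r}")


def parse_cats(lines):
    """Parse CSV lines (header first) into a list of dicts with typed values."""
    reader = csv.DictReader(lines)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    records = []
    for row in reader:
        for column in NUMERIC_COLUMNS:
            row[column] = float(row[column])
        for column in BOOLEAN_COLUMNS:
            row[column] = parse_bool(row[column])
        records.append(row)
    return records


def load_cats(path):
    """Read a file such as cat_breeds_clean.csv from disk."""
    with open(path, newline="", encoding="utf-8") as handle:
        return parse_cats(handle)
```

If these assumptions hold, len(load_cats("cat_breeds_clean.csv")) should match the 1071 valid records stated for the clean file. On the dirty version, float() or parse_bool() raising an error is a quick way to locate malformed values.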
Distribution
The dataset is available in CSV format and comprises approximately 1,000 records. The clean version, cat_breeds_clean.csv, is 104.39 kB in size and contains 1071 valid records across 17 columns. Two versions are provided: a clean version and a dirty version intended specifically for data cleaning practice.
Usage
This dataset is ideally suited for:
- Machine Learning: Enabling tasks such as classification (e.g., breed identification), regression (e.g., predicting weight or age), and clustering.
- Exploratory Data Analysis (EDA): For gaining insights into cat characteristics and behaviours.
- Data Visualisation: To graphically represent trends and patterns within the feline data.
- Data Cleaning: The 'dirty' version offers a practical challenge for refining data preprocessing skills.
- Geospatial Analysis: Utilising the country, latitude, and longitude data to explore geographical distributions.
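For the data cleaning exercise, the ranges documented in the Columns list can serve as plausibility checks on the dirty version. This is a minimal sketch that assumes records have already been read into dicts (for example with csv.DictReader). The bounds come from rounded summary statistics, so a 1% margin is allowed, and the one-month tolerance between Age_in_years and Age_in_months is an assumed rule, not a documented one. The function name find_problems is hypothetical.

```python
# Approximate bounds taken from the rounded summary statistics in the Columns list.
# Units for Body_length and Weight are not stated in the source.
NUMERIC_RANGES = {
    "Age_in_years": (0.08, 11.3),
    "Age_in_months": (1, 135),
    "Body_length": (10.0, 102.0),
    "Weight": (0.5, 12.1),
    "Owner_play_time_minutes": (0, 60),
    "Sleep_time_hours": (8, 22),
    "Latitude": (37.8, 53.8),
    "Longitude": (-123.0, 13.4),
}
BREEDS = {"Ragdoll", "Maine Coon", "Angora"}


def find_problems(row, margin=0.01):
    """Return readable problems for one record; an empty list means it looks plausible."""
    problems = []
    if row.get("Breed") not in BREEDS:
        problems.append(f"Breed: unexpected value {row.get('Breed')!r}")
    values = {}
    for column, (low, high) in NUMERIC_RANGES.items():
        try:
            value = float(row[column])
        except (KeyError, TypeError, ValueError):
            problems.append(f"{column}: missing or non-numeric")
            continue
        values[column] = value
        slack = margin * (high - low)  # the published bounds are rounded
        if not low - slack <= value <= high + slack:
            problems.append(f"{column}: {value} outside [{low}, {high}]")
    # Assumed consistency rule: months should be about twelve times years.
    if "Age_in_years" in values and "Age_in_months" in values:
        if abs(values["Age_in_months"] - 12 * values["Age_in_years"]) > 1.0:
            problems.append("Age_in_months inconsistent with Age_in_years")
    return problems
```

Because Breed is compared exactly, stray whitespace or inconsistent capitalisation in the dirty file is flagged along with out-of-range numbers.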
Coverage
- Geographic Scope: The dataset includes data from various countries, predominantly the USA (62%) and UK (13%), with corresponding latitude and longitude values provided.
- Demographic Scope: It covers three specific cat breeds: Maine Coon, Ragdoll, and Angora. Information on gender (male and female) and neutered/spayed status is also included.
- Time Range: While no specific time range for data collection is given, individual cat ages range from 0.08 to 11.3 years, providing an age distribution.
- Data Origin: The data is artificially generated, which provides a controlled environment for analytical tasks. Some attributes are evenly balanced (gender is split 50/50), while others are not (for example, breed and outdoor access).
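The percentages quoted in the Columns and Coverage lists (for example USA 62%, UK 13%, and the 50/50 gender split) can be recomputed from the loaded records as a sanity check. A minimal standard-library sketch, assuming each record is a dict keyed by column name; category_shares is a hypothetical helper.

```python
from collections import Counter


def category_shares(records, column):
    """Percentage of records per category in `column`, most frequent first."""
    counts = Counter(record[column] for record in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: 100 * count / total for value, count in counts.most_common()}
```

If the description is accurate, category_shares(records, "Country") should list USA first at roughly 62%. On the dirty version, any inconsistent spellings of the same category show up as separate keys.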
License
CC0: Public Domain
Who Can Use It
This dataset is beneficial for a wide range of users, including:
- Data Scientists and Machine Learning Engineers for building and testing models (e.g., identifying breed or predicting weight from other attributes).
- Data Analysts and Researchers for exploring animal behaviours, characteristics, and distributions.
- Students and Beginners in data science for learning data cleaning, EDA, visualisation, and fundamental ML techniques.
- Anyone interested in geospatial studies related to animal populations or characteristics.
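For those building classifiers such as breed identification, the breeds are imbalanced (41%, 32%, and 27%), so a stratified train/test split keeps those proportions in both parts. The sketch below uses only the standard library and assumes records are dicts. stratified_split is a hypothetical helper; scikit-learn's train_test_split with its stratify argument is a common alternative.

```python
import random
from collections import defaultdict


def stratified_split(records, label_column="Breed", test_fraction=0.2, seed=0):
    """Split records into (train, test) so each label keeps roughly its share in both."""
    by_label = defaultdict(list)
    for record in records:
        by_label[record[label_column]].append(record)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for label in sorted(by_label):  # process labels in a fixed order
        group = list(by_label[label])
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

Splitting each breed separately means a small class such as Angora cannot end up under-represented in the test set by chance.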
Dataset Name Suggestions
- Feline Breed Characteristics Dataset
- Domestic Cat Attributes for ML
- Artificially Generated Cat Data
- Animal Lifestyle and Traits Dataset
- Cat Population Metrics
Attributes
Original Data Source: Animal Lifestyle and Traits Dataset