Veterinary Horse Prognosis Dataset

Data Science and Analytics

Tags and Keywords

Health, Horse, Survival, Medical, Prediction

Free

About

This dataset is designed for predicting whether a horse will survive, given its past medical conditions; the target is the 'outcome' variable [1]. It contains a variety of clinical and physical attributes that describe the horse's health status, presenting a classification problem suitable for machine learning models [1, 2]. The binary (coded) representations in the data have been converted into readable words. The dataset also contains a significant number of missing values (NA's), which poses a challenge for data analysis and model building [1].

Columns

  • surgery?: Indicates whether the horse underwent surgery (Yes) or was treated without surgery (No) [3, 4].
  • Age: Specifies if the horse is an Adult (over 6 months) or Young (under 6 months) [3, 4].
  • Hospital Number: A numeric case identifier assigned to the horse. It is not guaranteed to be unique, because a horse treated more than once appears under the same number [3, 5].
  • rectal temperature: Measures the horse's temperature in degrees Celsius (linear). Normal is 37.8°C; elevated temperatures suggest infection, while reduced temperatures may indicate late shock [3, 5]. This parameter typically changes as the problem progresses [3].
  • pulse: The heart rate in beats per minute (linear). A normal adult horse's pulse is 30-40 bpm; elevated rates suggest painful lesions or circulatory shock [6, 7].
  • respiratory rate: The breathing rate (linear). A normal rate is 8-10 breaths per minute, though its usefulness is debatable due to fluctuations [6, 8].
  • temperature of extremities: A subjective indicator of peripheral circulation, categorised as Normal, Warm, Cool, or Cold. Cool to cold extremities can suggest shock, while hot extremities should correlate with an elevated rectal temperature [6, 8, 9].
  • peripheral pulse: A subjective measure of pulse quality, categorised as normal, increased, reduced, or absent. Normal or increased pulses indicate adequate circulation, while reduced or absent suggest poor perfusion [9, 10].
  • mucous membranes: A subjective measurement of colour, with possible values including normal pink, bright pink, pale pink, pale cyanotic, bright red/injected, and dark cyanotic. These colours indicate different circulatory conditions, from normal to serious compromise or septicaemia [9-11].
  • capillary refill time: A clinical judgement indicating circulation. Values are < 3 seconds or >= 3 seconds, with longer refill times indicating poorer circulation [11, 12].
  • pain: A subjective judgement of the horse's pain level, ranging from alert (no pain) to continuous severe pain. This variable should not be treated as ordered or discrete, and higher pain levels often correlate with the need for surgery [11-13].
  • peristalsis: Indicates gut activity, with possible values including hypermotile, normal, hypomotile, or absent. Activity typically decreases with distension or toxicity [13, 14].
  • abdominal distension: An important parameter indicating abdominal swelling, categorised as none, slight, moderate, or severe. Severe distension often requires surgery to relieve pressure [13-15].
  • nasogastric tube: Refers to the amount of gas coming out of the tube (none, slight, significant). A large gas cap in the stomach is likely to cause the horse discomfort [15, 16].
  • nasogastric reflux: Indicates the amount of fluid reflux, with values of none, > 1 litre, or < 1 litre. Greater reflux suggests a serious obstruction [17, 18].
  • nasogastric reflux pH: A linear scale from 0 to 14 (7 is neutral), with normal values in the 3-4 range [17, 18].
  • rectal examination - feces: Describes the state of feces during rectal examination (normal, increased, decreased, absent). Absent feces often indicate an obstruction [17, 19, 20].
  • abdomen: Describes findings from abdominal examination, including normal, other, firm feces in the large intestine, distended small intestine, or distended large intestine. Distension often indicates a surgical lesion [19, 20].
  • packed cell volume: The percentage of red blood cells by volume (linear). Normal range is 30-50; higher levels suggest compromised circulation or dehydration [19, 21].
  • total protein: Total protein level in g/dL (linear). Normal values are 6-7.5 g/dL; higher values indicate greater dehydration [19, 21, 22].
  • abdominocentesis appearance: Describes the appearance of fluid obtained from the abdominal cavity (clear, cloudy, serosanguinous). Cloudy or serosanguinous fluid indicates a compromised gut [22, 23].
  • abdominocentesis total protein: Total protein level in g/dL in the abdominal fluid (linear). Higher levels suggest a compromised gut [22, 23].
  • outcome: The final result for the horse: lived, died, or was euthanised [22, 24].
  • surgical lesion?: Retrospectively determines if the problem (lesion) was surgical (Yes or No). This value is always known [24, 25].
  • type of lesion (lesion_1, lesion_2, lesion_3): A composite of three numbers describing the lesion.
    • lesion_1 (site): Indicates the anatomical site of the lesion (e.g., gastric, sm intestine, lg colon, cecum, uterus, bladder, etc.) [25, 26].
    • lesion_2 (type): Specifies the type of lesion (simple, strangulation, inflammation, other) [26, 27].
    • lesion_3 (subtype): Details the subtype (mechanical, paralytic, n/a) and a specific code for the lesion (e.g., obturation, volvulus/torsion, hernia, displacement) [27, 28].
  • cp_data: Indicates if pathology data is present (Yes or No). This variable is noted as being of no significance as pathology data is not included for these cases [27, 29].
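
As an illustration of how the normal ranges quoted in the column descriptions might be used during exploration, the following minimal Python sketch flags a measurement as low, normal, or high. The dictionary keys (pulse, respiratory_rate, packed_cell_volume, total_protein) and the function names are hypothetical and should be checked against the actual header of the CSV. The example record is made up for demonstration and is not taken from the dataset.

```python
from typing import Dict, Optional

# Normal ranges as quoted in the column descriptions above.
# The keys are hypothetical column names, not confirmed CSV headers.
NORMAL_RANGES = {
    "pulse": (30.0, 40.0),               # beats per minute (adult horse)
    "respiratory_rate": (8.0, 10.0),     # breaths per minute
    "packed_cell_volume": (30.0, 50.0),  # percent red cells by volume
    "total_protein": (6.0, 7.5),         # g/dL
}


def flag_value(column: str, value: Optional[float]) -> str:
    """Classify one measurement as 'missing', 'low', 'normal', or 'high'."""
    if value is None:
        return "missing"
    low, high = NORMAL_RANGES[column]
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


def flag_record(record: Dict[str, Optional[float]]) -> Dict[str, str]:
    """Flag every column that has a known normal range."""
    return {col: flag_value(col, record.get(col)) for col in NORMAL_RANGES}


# Made-up record for demonstration only.
example = {
    "pulse": 66.0,
    "respiratory_rate": 9.0,
    "packed_cell_volume": None,
    "total_protein": 8.4,
}
print(flag_record(example))
```

Flags of this kind are only a convenience for exploration. The pulse range applies to adult horses, so it would need adjusting for records marked Young.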

Distribution

This dataset is provided as a CSV file named horse.csv, with a file size of 53.42 kB. It comprises 28 distinct columns and contains 299 records or rows [2, 4, 5, 7, 8, 10, 12, 14, 16, 18, 20, 21, 23, 24, 26, 28, 29]. A significant characteristic is the presence of numerous missing values across several columns, requiring careful handling during analysis [1].
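
A first step with a file like this is usually to count missing values per column. The sketch below does so with Python's standard csv module. It assumes that missing entries appear as the literal text NA or as empty cells; that is an assumption based on the description above, not a confirmed property of horse.csv. The inline sample rows and column names are invented so the example runs without the file.

```python
import csv
import io

# Assumption: missing entries are written as "NA" or left empty.
MISSING_TOKENS = {"", "NA"}


def count_missing(csv_file):
    """Return the row count and the number of missing cells per column."""
    reader = csv.DictReader(csv_file)
    missing = {name: 0 for name in reader.fieldnames}
    rows = 0
    for row in reader:
        rows += 1
        for name in missing:
            value = row.get(name)
            if value is None or value.strip() in MISSING_TOKENS:
                missing[name] += 1
    return {"rows": rows, "missing": missing}


# Invented sample with hypothetical column names, for demonstration only.
SAMPLE = """surgery,age,rectal_temp,pulse,outcome
no,adult,38.5,66,died
yes,adult,NA,88,euthanized
no,young,38.3,NA,lived
"""

result = count_missing(io.StringIO(SAMPLE))
print(result["rows"], result["missing"])
```

To run it on the real file, the same function could be called as count_missing(open("horse.csv", newline="")), after which the row count can be compared with the 299 records stated above.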

Usage

This dataset is ideal for classification problems, particularly for building models to predict horse survival [1, 2]. It can be used for tasks such as:
  • Developing and evaluating machine learning algorithms to predict the outcome (lived, died, euthanised) of horses based on medical conditions [2, 22].
  • Hyperparameter tuning and comparing evaluation metrics of various classification algorithms [2].
  • Practising data cleaning techniques, especially for handling missing values through imputation or other methods [1].
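
The cleaning and evaluation tasks listed above can be sketched in a few lines of standard-library Python. The example shows one simple option among many: median imputation for a numeric column, most-frequent-value imputation for a categorical column, and a majority-class baseline whose accuracy any trained classifier should beat. The function names and the toy values are hypothetical and are not taken from the dataset.

```python
from collections import Counter
from statistics import median


def impute_median(values):
    """Replace None with the median of the observed numeric values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def impute_mode(values):
    """Replace None with the most frequent observed category."""
    observed = [v for v in values if v is not None]
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


def majority_baseline(train_labels, test_labels):
    """Predict the most common training label for every test row."""
    majority = Counter(train_labels).most_common(1)[0][0]
    hits = sum(1 for label in test_labels if label == majority)
    return majority, hits / len(test_labels)


# Toy values for demonstration only.
pulse = impute_median([60.0, None, 40.0, 100.0])
extremities = impute_mode(["cool", None, "normal", "cool"])
label, accuracy = majority_baseline(
    ["lived", "lived", "died", "euthanized"],
    ["lived", "died", "lived", "lived"],
)
print(pulse, extremities, label, accuracy)
```

In practice the fill values would be computed on the training split only and then applied to the test split, so that information from the test rows does not leak into the model.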

Coverage

The sources do not specify the geographic area, time range, or demographic scope of the horses included in this dataset.

License

CC0: Public Domain

Who Can Use It

This dataset is suitable for:
  • Data scientists and machine learning practitioners interested in binary and multi-class classification problems.
  • Students and beginners in data science looking for a real-world dataset with missing values to practice data cleaning and model building [1, 2].
  • Veterinary researchers or professionals interested in predictive modelling for animal health outcomes.

Dataset Name Suggestions

  • Horse Survival Prediction Dataset
  • Equine Outcome Prediction Data
  • Horse Colic Survival Data
  • Veterinary Horse Prognosis Dataset

Listing Stats

  • Views: 1
  • Downloads: 0
  • Listed: 03/08/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
