Predicting survival in dairy cattle using machine learning

Esther Margaretha Maria van der Heide

Research output: Thesis, internal PhD, WU


Although cows can live to be twenty years old, the average lifespan for a dairy cow is only five to six years. Improving the lifespan of dairy cows would have several benefits such as increasing farm profitability and reducing the environmental impact of milk production. However, the complexity of survival makes it difficult to improve this trait in practice. In this thesis, I proposed using phenotypic prediction of survival to select young cows for the dairy herd, improving survival through increased lifespan of selected cows and better heifer management. The aim of this thesis was to investigate if it was possible to predict survival phenotype accurately enough to be of use in selection. I investigated three different methods to predict survival: multiple logistic regression, random forest and naive Bayes. In chapters two to four of this thesis I predicted the survival trait “survival to second lactation” using all three aforementioned methods. In chapter five, I predicted the survival trait “number of parities reached” using only the random forest method. Random forest and naive Bayes proved the best methods for predicting survival to second lactation, although predictive performance overall was low. The correlations between predictions for individual cows were much lower than expected, which indicated that the models predicted individual cows differently. Therefore, in chapter four I investigated if combining the results into an ensemble could improve predictive performance. An ensemble using multiple logistic regression resulted in the largest increase in performance, although none of the explored ensemble methods improved performance consistently across datasets. I further investigated if there was a benefit in including genomic information or a farm-specific effect. In chapter two, I investigated the benefit of combining genomic and phenotypic information. 
Genomic breeding values especially improved the prediction of survival early in life, and breeding values for fertility and longevity remained informative even after first calving. In chapter five, I tested several different methods to include a farm effect and described the advantages and disadvantages of the various approaches. The results of this thesis provide valuable insights into the challenges of predicting survival traits and the suitability of various (machine learning) methods for the prediction of survival in dairy cattle.

Original language: English
Qualification: Doctor of Philosophy
Awarding Institution:
  • Wageningen University
  • Veerkamp, Roel, Promotor
  • Ducro, Bart, Co-promotor
  • Kamphuis, Claudia, Co-promotor
Award date: 11 Sept 2020
Place of Publication: Wageningen
Print ISBNs: 9789463954273
Publication status: Published - 11 Sept 2020

