Prediction of metabolic status of dairy cows in early lactation with on-farm cow data and machine learning algorithms

Wei Xu, Ariette T.M. van Knegsel, Jacques J.M. Vervoort, Rupert M. Bruckmaier, Renny J. van Hoeij, Bas Kemp, Edoardo Saccenti*

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review



Metabolic status of dairy cows in early lactation can be evaluated using the concentrations of plasma β-hydroxybutyrate (BHB), free fatty acids (FFA), glucose, insulin, and insulin-like growth factor 1 (IGF-1). These plasma metabolites and metabolic hormones, however, are difficult to measure on farm. Instead, easily obtained on-farm cow data, such as milk production traits, have the potential to predict metabolic status. Here we aimed (1) to investigate whether metabolic status of individual cows in early lactation could be clustered based on their plasma values and (2) to evaluate machine learning algorithms to predict metabolic status using on-farm cow data. Through lactation wk 1 to 7, plasma metabolites and metabolic hormones of 334 cows were measured weekly and used to cluster each cow into 1 of 3 clusters per week. The cluster with the greatest plasma BHB and FFA and the lowest plasma glucose, insulin, and IGF-1 was defined as poor metabolic status; the cluster with the lowest plasma BHB and FFA and the greatest plasma glucose, insulin, and IGF-1 was defined as good metabolic status; and the intermediate cluster was defined as average metabolic status. Most dairy cows were classified as having average or good metabolic status, and a limited number of cows had poor metabolic status (10–50 cows per lactation week). On-farm cow data, including dry period length, parity, milk production traits, and body weight, were used to predict good or average metabolic status with 8 machine learning algorithms. Random Forest (error rate ranging from 12.4 to 22.6%) and Support Vector Machine (SVM; error rate ranging from 12.4 to 20.9%) were the top 2 best-performing algorithms to predict metabolic status using on-farm cow data. Random Forest had a higher sensitivity (range: 67.8–82.9% during wk 1 to 7) and negative predictive value (range: 89.5–93.8%) but lower specificity (range: 76.7–88.5%) and positive predictive value (range: 58.1–78.4%) than SVM. 
In Random Forest, milk yield, fat yield, protein percentage, and lactose yield had important roles in prediction, but their rank of importance differed across lactation weeks. In conclusion, dairy cows could be clustered for metabolic status based on plasma metabolites and metabolic hormones. Moreover, on-farm cow data can predict cows in good or average metabolic status, with Random Forest and SVM performing best of all algorithms.
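The abstract compares classifiers by error rate, sensitivity, specificity, positive predictive value, and negative predictive value. As a reminder of how these five quantities relate to a two-class confusion matrix, the following is a minimal sketch using the standard textbook definitions. It is not the authors' code: the names `ConfusionCounts`, `confusion_counts`, and `classification_metrics` are hypothetical, the counts in the usage note are invented for illustration, and which metabolic status class is treated as "positive" is an assumption, since the abstract does not state it.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a two-class confusion matrix (hypothetical helper type)."""
    tp: int  # positives predicted as positive
    fp: int  # negatives predicted as positive
    tn: int  # negatives predicted as negative
    fn: int  # positives predicted as negative


def confusion_counts(y_true: Sequence[Hashable],
                     y_pred: Sequence[Hashable],
                     positive: Hashable) -> ConfusionCounts:
    """Tally the four cells, treating `positive` as the positive class.

    Which class counts as positive (for example "good" versus "average"
    metabolic status) is an assumption made by the caller.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp=tp, fp=fp, tn=tn, fn=fn)


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Standard definitions of the five metrics named in the abstract."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "error_rate": _ratio(c.fp + c.fn, total),    # misclassified / all
        "sensitivity": _ratio(c.tp, c.tp + c.fn),    # true positive rate
        "specificity": _ratio(c.tn, c.tn + c.fp),    # true negative rate
        "ppv": _ratio(c.tp, c.tp + c.fp),            # positive predictive value
        "npv": _ratio(c.tn, c.tn + c.fn),            # negative predictive value
    }
```

For example, with invented counts `ConfusionCounts(tp=40, fp=10, tn=80, fn=20)`, `classification_metrics` gives an error rate of 0.20, a sensitivity of about 0.667, a specificity of about 0.889, and a PPV and NPV of 0.80 each. A classifier can therefore match another on error rate while trading sensitivity against specificity, which is the pattern the abstract reports for Random Forest and SVM.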

Original language: English
Pages (from-to): 10186-10201
Journal: Journal of Dairy Science
Issue number: 11
Early online date: 30 Aug 2019
Publication status: Published - Nov 2019



  • cattle
  • cluster analysis
  • energy metabolism
  • Random Forest
