Abstract
This study explored the potential of using decision-tree induction to develop models for the detection of clinical mastitis with automatic milking. Sensor data (including electrical conductivity and colour) of over 711,000 quarter milkings were collected from December 2006 to August 2007 at six Dutch dairy herds milking automatically. Farmer recordings of quarter milkings with visible signs of mastitis were considered gold standard positive cases (n = 97), and quarter milkings recorded as visually normal were considered gold standard negatives (n = 339). Randomly chosen quarter milkings that were not visually checked, that were outside a 2-week range before or after a gold standard positive case, and that were not manually or automatically separated were added to bring the total to 3000 gold standard negatives. Decision trees were developed with the probability of clinical mastitis for each quarter milking as output; confidence factors and cost matrices were varied to study their effect on performance characteristics. Detection performance of the decision trees was estimated using 10-fold cross-validation. The performance characteristics evaluated were sensitivity and specificity, both calculated at a threshold value of 0.50 for the probability estimate for clinical mastitis. The transformed partial area under the curve was used to summarise the diagnostic ability of decision trees within a specified range of interest (specificity ≥ 97%). Receiver operating characteristic curves visualised all combinations of sensitivity and specificity of decision trees within this range. Results showed that decision trees are easy to interpret when visualised. The lower the confidence factor, the smaller the decision tree: a cost-insensitive decision tree with a confidence factor of 0.05 needed only eleven test nodes to classify all 3097 records with a sensitivity of 23.7% and a specificity of 99.2%.
The decision tree with default parameter settings showed a transformed partial area under the curve value of 0.6420. Introducing costs for false-negative classifications increased this value to 0.6476. At a specificity level of 99%, the decision tree with the highest transformed partial area under the curve value showed a sensitivity of 29.8%. Detection performance of the different decision trees was comparable with that of models currently used by automatic milking systems. As it was possible to achieve these results with a rather simple decision-tree algorithm, we believe that decision-tree induction shows potential for detecting clinical mastitis with automatic milking.
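
The performance measures named above can be illustrated with a minimal sketch. It assumes binary labels (1 = clinical mastitis, 0 = normal) with both classes present, and a probability estimate per quarter milking. The abstract does not state how the partial area was transformed, so the sketch assumes McClish's standardisation, which rescales the partial area so that 0.5 corresponds to chance and 1.0 to perfect discrimination. The function names are hypothetical and are not taken from the study.

```python
from typing import List, Tuple


def sensitivity_specificity(labels: List[int], probs: List[float],
                            threshold: float = 0.50) -> Tuple[float, float]:
    """Sensitivity and specificity when probs >= threshold counts as positive."""
    tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(labels, probs) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(labels, probs) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(labels, probs) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(labels: List[int], probs: List[float]) -> List[Tuple[float, float]]:
    """ROC points (false positive rate, sensitivity), one per distinct score."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(probs, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Records with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def transformed_partial_auc(labels: List[int], probs: List[float],
                            min_specificity: float = 0.97) -> float:
    """Partial AUC over specificity >= min_specificity, rescaled to [0.5, 1].

    Assumption: McClish's standardisation is used as the transformation.
    """
    max_fpr = 1.0 - min_specificity
    pts = roc_points(labels, probs)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid rule
    min_area = 0.5 * max_fpr ** 2  # area under the chance diagonal
    max_area = max_fpr             # area under a perfect ROC curve
    return 0.5 * (1.0 + (area - min_area) / (max_area - min_area))
```

Under this assumed transformation, a value such as 0.6420 would sit between chance (0.5) and perfect discrimination (1.0) within the high-specificity range, which is the range that matters when false alerts must stay rare.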
- detection model
- ROC curve