A large amount of data is collected routinely in meat inspection in pig slaughterhouses. A time series clustering approach is presented and applied that groups farms based on similar statistical characteristics of meat inspection data over time. A three step characteristic-based clustering approach was used from the idea that the data contain more info than the incidence figures. A stratified subset containing 511,645 pigs was derived as a study set from 3.5 years of meat inspection data. The monthly averages of incidence of pleuritis and of pneumonia of 44 Dutch farms (delivering 5149 batches to 2 pig slaughterhouses) were subjected to 1) derivation of farm level data characteristics 2) factor analysis and 3) clustering into groups of farms. The characteristic-based clustering was able to cluster farms for both lung aberrations. Three groups of data characteristics were informative, describing incidence, time pattern and degree of autocorrelation. The consistency of clustering similar farms was confirmed by repetition of the analysis in a larger dataset. The robustness of the clustering was tested on a substantially extended dataset. This confirmed the earlier results, three data distribution aspects make up the majority of distinction between groups of farms and in these groups (clusters) the majority of the farms was allocated comparable to the earlier allocation (75% and 62% for pleuritis and pneumonia, respectively). The difference between pleuritis and pneumonia in their seasonal dependency was confirmed, supporting the biological relevance of the clustering. Comparison of the identified clusters of statistically comparable farms can be used to detect farm level risk factors causing the health aberrations beyond comparison on disease incidence and trend alone.
- Big data
- Characteristic-based clustering
- Meat inspection data
- Time series