Machine learning to further improve the decision which boar ejaculates to process into artificial insemination doses

Claudia Kamphuis*, Pascal Duenk, Roel Franciscus Veerkamp, Bram Visser, Gurnoor Singh, Annette Nigsch, Rudi Maria De Mol, Marleen Leonarda Wilhelmina Johanna Broekhuijse

*Corresponding author for this work

Research output: Contribution to journal, Article, Academic, peer-review

Abstract

Current artificial insemination (AI) laboratory practices assess the semen quality of each boar ejaculate to decide which ones to process into AI doses. This decision is aided by two motility parameters, used worldwide, that become available through computer-assisted semen analysis (CASA). This decision process, however, still results in AI doses with variable and sometimes suboptimal fertility outcomes (e.g., small litter size). The hypothesis was that the decision on which ejaculates to process into AI doses can be improved by adding more data from CASA systems and from other sources, in combination with a data-driven model. Available data consisted of ejaculates that passed the initial decision and were thus processed into AI doses and used to inseminate sows. Data were divided into a training set (6793 records) and a validation set (1191 records) for model development, and an independent test set (1434 records) for performance assessment. Gradient Boosting Machine (GBM) models were developed to predict four fertility phenotypes of interest (gestation length, total number born, number born alive, and number of stillborn piglets). Each fertility phenotype was considered both as a numeric and as a binary outcome parameter, for a total of eight fertility phenotypes. Data used to further improve the decision process originated from four sources: 1) CASA information, 2) boar ejaculate information, 3) breeding value estimations, and 4) weather information. These data were used to create seven prediction sets, where each new set added parameters to the ones included in the previous set. On the test data, the GBM models predicted fertility phenotypes with low correlations (for numeric phenotypes) and low area under the curve values (for binary phenotypes). Hence, the results demonstrated that a combination of more data and GBM did not enable further improvement of the AI dose quality checks, resulting in the rejection of our hypothesis.
However, our study revealed parameters affecting boar ejaculate fertility that are not used in today's decision process. These parameters (listed in the top 10 in at least four GBM models) included one parameter associated with boar ejaculate information, two with breeding value estimations, five with CASA information, and one with weather information. These parameters should therefore be further investigated for their potential value when assessing the quality of boar ejaculates in routine daily AI dose processing.
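
As a rough illustration of the evaluation setup described in the abstract, the sketch below builds cumulative prediction sets (each set adds parameters to the previous one) and computes the two kinds of test-set metrics mentioned: a correlation for numeric phenotypes and the area under the ROC curve for binary ones. It is not the authors' code. It assumes Pearson correlation (the abstract only says "correlations"), the parameter names are hypothetical, and the GBM models themselves are not reproduced; `predicted` and `scores` stand in for model output.

```python
from itertools import accumulate
from math import sqrt


def pearson(observed, predicted):
    """Pearson correlation between observed and predicted numeric values."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = sqrt(sum((o - mo) ** 2 for o in observed))
    sp = sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (so * sp)


def auc(labels, scores):
    """Area under the ROC curve: the share of (positive, negative) pairs
    in which the positive record has the higher score; ties count half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def nested_sets(groups):
    """Build cumulative parameter sets: set k contains groups 1..k."""
    return list(accumulate(groups, lambda acc, g: acc + g))


# Hypothetical parameter groups, for illustration only.
groups = [["motility"], ["ejaculate_volume"], ["breeding_value"], ["temperature"]]
for k, params in enumerate(nested_sets(groups), start=1):
    print(f"prediction set {k}: {params}")

# Toy observed values and model outputs, not data from the study.
print(pearson([114, 115, 116, 117], [115.2, 115.0, 115.4, 115.9]))
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

An AUC near 0.5 or a correlation near 0 on the independent test set would correspond to the low predictive performance the abstract reports.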

Original language: English
Pages (from-to): 112-121
Number of pages: 10
Journal: Theriogenology
Volume: 144
DOI: 10.1016/j.theriogenology.2019.12.017
Publication status: Published - Mar 2020

Fingerprint

Artificial Insemination
artificial intelligence
boars
artificial insemination
Semen Analysis
Fertility
semen
Phenotype
phenotype
dosage
Weather
breeding value
meteorological data
Breeding
Litter Size
Information Storage and Retrieval
Machine Learning
litter size
Area Under Curve
sows

Keywords

  • Boar semen
  • Fertility phenotypes
  • Machine learning
  • Prediction model

Cite this

@article{4ee709fb2d3d4ccfabe75dc0e14bdf53,
title = "Machine learning to further improve the decision which boar ejaculates to process into artificial insemination doses",
keywords = "Boar semen, Fertility phenotypes, Machine learning, Prediction model",
author = "Claudia Kamphuis and Pascal Duenk and Veerkamp, {Roel Franciscus} and Bram Visser and Gurnoor Singh and Annette Nigsch and {De Mol}, {Rudi Maria} and Broekhuijse, {Marleen Leonarda Wilhelmina Johanna}",
year = "2020",
month = "3",
doi = "10.1016/j.theriogenology.2019.12.017",
language = "English",
volume = "144",
pages = "112--121",
journal = "Theriogenology",
issn = "0093-691X",
publisher = "Elsevier",

}

Machine learning to further improve the decision which boar ejaculates to process into artificial insemination doses. / Kamphuis, Claudia; Duenk, Pascal; Veerkamp, Roel Franciscus; Visser, Bram; Singh, Gurnoor; Nigsch, Annette; De Mol, Rudi Maria; Broekhuijse, Marleen Leonarda Wilhelmina Johanna.

In: Theriogenology, Vol. 144, 03.2020, p. 112-121.

TY - JOUR

T1 - Machine learning to further improve the decision which boar ejaculates to process into artificial insemination doses

AU - Kamphuis, Claudia

AU - Duenk, Pascal

AU - Veerkamp, Roel Franciscus

AU - Visser, Bram

AU - Singh, Gurnoor

AU - Nigsch, Annette

AU - De Mol, Rudi Maria

AU - Broekhuijse, Marleen Leonarda Wilhelmina Johanna

PY - 2020/3

Y1 - 2020/3

KW - Boar semen

KW - Fertility phenotypes

KW - Machine learning

KW - Prediction model

U2 - 10.1016/j.theriogenology.2019.12.017

DO - 10.1016/j.theriogenology.2019.12.017

M3 - Article

VL - 144

SP - 112

EP - 121

JO - Theriogenology

JF - Theriogenology

SN - 0093-691X

ER -