Using a data lake in animal sciences

D. Schokker*, I.N. Athanasiadis, B. Visser, R.F. Veerkamp, C. Kamphuis

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › Academic › peer-review

Abstract

In the livestock domain, Big Data is becoming more common and is increasingly anchored in the mind-set of researchers. With the increasing availability of large amounts of data of varying nature comes the challenge of how to store, combine, and analyse these data efficiently. In this study, we explored the possibility of using a data lake for storing and analysing sensor data, with an animal experiment as the use case, to improve scalability and interoperability. The use case was an experiment within Breed4Food (a public-private partnership) in which the gait score of 200 turkeys was determined. In the experiment, a gait score was assigned to each animal in the traditional way, by a highly skilled person who visually inspected the animal walking. In addition, a set of sensor data streams was recorded for each animal, specifically from inertial measurement units (IMUs), a 3D-video camera, and a force plate, with the aim of exploring the effectiveness of these data streams as predictors of the gait score. The resulting sensor output, i.e. the raw data, was successfully stored in its original format in the data lake. Subsequently, for each sensor output we performed extract, transform, and load (ETL) activities by executing custom-made scripts that generate tab- or comma-separated files. Lastly, Apache Spark made it possible to process the data in parallel, allowing for fast computing. In conclusion, we managed to set up a data lake, load animal experimental data, and run preliminary analyses. The data lake allowed for easy scale-up of both data loading and analyses, which is desirable for dynamic analysis pipelines, especially when more data are collected in the future.
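
The extract, transform, and load step described in the abstract can be illustrated with a minimal sketch. The authors' custom scripts are not shown in this record, so everything below is an assumption made for illustration: the semicolon-delimited raw layout, the field names (`animal_id`, `timestamp_ms`, `force_n`), the sample values, and the per-animal mean used as a stand-in for a preliminary analysis. The sketch uses only the Python standard library and does not reproduce the Apache Spark step.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical raw force-plate export, one "animal_id;timestamp_ms;force_n"
# record per line. The layout and the values are invented for illustration.
RAW_FORCE_PLATE = """\
T001;0;41.2
T001;10;43.8
T002;0;39.5
T002;10;bad
T002;20;40.1
"""

FIELDS = ["animal_id", "timestamp_ms", "force_n"]


def extract(raw_text):
    """Yield the non-empty lines of a raw sensor file."""
    for line in raw_text.splitlines():
        if line.strip():
            yield line


def transform(lines):
    """Parse lines into typed rows, skipping records that fail to parse."""
    for line in lines:
        parts = line.split(";")
        if len(parts) != 3:
            continue
        animal_id, timestamp, force = parts
        try:
            yield {
                "animal_id": animal_id,
                "timestamp_ms": int(timestamp),
                "force_n": float(force),
            }
        except ValueError:
            continue  # malformed numeric field, e.g. "bad"


def load(rows, out_file):
    """Write rows as comma-separated text and return the number written."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


def mean_force_per_animal(csv_text):
    """Read the comma-separated output back and average force per animal."""
    forces = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        forces[row["animal_id"]].append(float(row["force_n"]))
    return {animal: statistics.mean(values) for animal, values in forces.items()}


if __name__ == "__main__":
    buffer = io.StringIO()
    written = load(transform(extract(RAW_FORCE_PLATE)), buffer)
    print(f"rows written: {written}")
    print(mean_force_per_animal(buffer.getvalue()))
```

Keeping the three stages as separate generators means the raw text stays untouched, in line with storing sensor output in its original format, while only the derived comma-separated file is passed on to analysis.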

Original language: English
Title of host publication: Precision Livestock Farming 2019
Subtitle of host publication: Papers Presented at the 9th European Conference on Precision Livestock Farming, ECPLF 2019
Editors: Bernadette O'Brien, Deirdre Hennessy, Laurence Shalloo
Pages: 140-144
Number of pages: 5
ISBN (Electronic): 9781841706542
Publication status: Published - Aug 2019
Event: 9th European Conference on Precision Livestock Farming, ECPLF 2019 - Cork, Ireland
Duration: 26 Aug 2019 to 29 Aug 2019

Publication series

Name: Precision Livestock Farming 2019 - Papers Presented at the 9th European Conference on Precision Livestock Farming, ECPLF 2019

Conference

Conference: 9th European Conference on Precision Livestock Farming, ECPLF 2019
Country: Ireland
City: Cork
Period: 26/08/19 to 29/08/19

Fingerprint

animal science
sensors (equipment)
gait
lakes
video cameras
animal experimentation
walking
laboratory animals
animals
livestock
researchers
extracts

Keywords

  • Animal experiment
  • Data lake
  • Scalability
  • Sensor data

Cite this

Schokker, D., Athanasiadis, I. N., Visser, B., Veerkamp, R. F., & Kamphuis, C. (2019). Using a data lake in animal sciences. In B. O'Brien, D. Hennessy, & L. Shalloo (Eds.), Precision Livestock Farming 2019: Papers Presented at the 9th European Conference on Precision Livestock Farming, ECPLF 2019 (pp. 140-144). (Precision Livestock Farming 2019 - Papers Presented at the 9th European Conference on Precision Livestock Farming, ECPLF 2019).