TY - GEN
T1 - Using a data lake in animal sciences
AU - Schokker, D.
AU - Athanasiadis, I.N.
AU - Visser, B.
AU - Veerkamp, R.F.
AU - Kamphuis, C.
PY - 2019/8
Y1 - 2019/8
N2 - In the livestock domain, Big Data is becoming more common and is being anchored in the mindset of researchers. With the increasing availability of large amounts of data of varying nature, there is the challenge of how to store, combine, and analyse these data efficiently. In this study, we explored the possibility of using a data lake for storing and analysing sensor data, using an animal experiment as the use case, to improve scalability and interoperability. The use case was an experiment within Breed4Food (a public-private partnership), in which the gait score of 200 turkeys was determined. In the experiment, a gait score was traditionally assigned to each animal by a highly skilled person who visually inspected the animal while it walked. In addition, a set of sensor data streams was recorded for each animal, specifically from inertial measurement units (IMUs), a 3D-video camera, and a force plate, with the ambition to explore the effectiveness of these data streams as predictors for estimating the gait score. The resulting sensor output, i.e. the raw data, was successfully stored in its original format in the data lake. Subsequently, for each sensor output we performed extract, transform, and load activities by executing custom-made scripts to generate tab- or comma-separated files. Lastly, Apache Spark made it possible to easily perform parallel processing of the data, allowing for fast computing. In conclusion, we managed to set up a data lake, load animal experimental data, and run preliminary analyses. The data lake allowed for easy scale-up of both data loading and analyses, which is desirable for dynamic analysis pipelines, especially when more data are collected in the future.
AB - In the livestock domain, Big Data is becoming more common and is being anchored in the mindset of researchers. With the increasing availability of large amounts of data of varying nature, there is the challenge of how to store, combine, and analyse these data efficiently. In this study, we explored the possibility of using a data lake for storing and analysing sensor data, using an animal experiment as the use case, to improve scalability and interoperability. The use case was an experiment within Breed4Food (a public-private partnership), in which the gait score of 200 turkeys was determined. In the experiment, a gait score was traditionally assigned to each animal by a highly skilled person who visually inspected the animal while it walked. In addition, a set of sensor data streams was recorded for each animal, specifically from inertial measurement units (IMUs), a 3D-video camera, and a force plate, with the ambition to explore the effectiveness of these data streams as predictors for estimating the gait score. The resulting sensor output, i.e. the raw data, was successfully stored in its original format in the data lake. Subsequently, for each sensor output we performed extract, transform, and load activities by executing custom-made scripts to generate tab- or comma-separated files. Lastly, Apache Spark made it possible to easily perform parallel processing of the data, allowing for fast computing. In conclusion, we managed to set up a data lake, load animal experimental data, and run preliminary analyses. The data lake allowed for easy scale-up of both data loading and analyses, which is desirable for dynamic analysis pipelines, especially when more data are collected in the future.
KW - Animal experiment
KW - Data lake
KW - Scalability
KW - Sensor data
M3 - Conference paper
T3 - Precision Livestock Farming 2019 - Papers Presented at the 9th European Conference on Precision Livestock Farming, ECPLF 2019
SP - 140
EP - 144
BT - Precision Livestock Farming 2019
A2 - O'Brien, Bernadette
A2 - Hennessy, Deirdre
A2 - Shalloo, Laurence
T2 - 9th European Conference on Precision Livestock Farming, ECPLF 2019
Y2 - 26 August 2019 through 29 August 2019
ER -