Interactive machine vision for wildlife conservation

Benjamin Kellenberger

Research output: Thesis (internal PhD, WU)


The loss rate of endangered animal species has reached levels critical enough for our time to be called the sixth mass extinction. Families of vertebrates and large mammals, such as the Rhinocerotidae, are likely to become extinct within a few years unless countermeasures are taken. Before such measures can be designed, however, it is imperative to assess current animal population sizes through wildlife censuses. Furthermore, conservation efforts require animal populations to be monitored over time, which implies repeating censuses over multiple years.

Recent developments in technology have paved the way for animal censuses of unprecedented accuracy and scale, predominantly through the employment of Unmanned Aerial Vehicles (UAVs). UAVs allow for acquiring aerial imagery of vast areas, such as an entire wildlife reserve, and thereby provide evidence of the abundance and location of individuals in a safe manner. Hitherto, the main challenge of UAV-based animal censuses has been the stage of manual photo-interpretation, in which animals have to be tediously identified and annotated by hand in potentially tens of thousands of aerial images.

In this regard, automated image understanding through Machine Learning (ML) and Computer Vision (CV) offers exciting potential for accelerating applications that rely on large-scale datasets, such as image-based aerial animal censuses. Employing machines to detect animals could greatly reduce the effort required by humans, and therefore vastly increase the efficiency of the census process overall.

This thesis aims to advance wildlife conservation efforts by means of automated machine vision methodologies. In a first step, this entails finding new ways to optimize CV algorithms for the task of animal detection in UAV imagery. In a second step, it requires procedures to reuse such detection models on new image data in the context of census repetitions for population monitoring. However, the benefit of machine vision reaches beyond a mere automation of photo-interpretation: a recurrent key principle of this thesis is the concept of interactivity, in which CV models and humans work hand-in-hand and reinforce each other. The result is a census monitoring environment for UAV images in which machine vision technology actively assists humans in the process. Effectively, when all methodologies proposed throughout this thesis are combined, human annotation efforts are reduced to a fraction and simplified in complexity.

Chapter 2 addresses the challenges of employing state-of-the-art CV models, known as Convolutional Neural Networks (CNNs), for aerial wildlife detection. Multiple heuristics are presented to train such models properly, each targeting a different obstacle of the model training process. Experiments show a significant increase in animal prediction quality when the CNN is optimized appropriately.
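One obstacle of this kind can be sketched in a few lines (a hedged illustration with hypothetical names; the heuristic shown, re-weighting the loss against the extreme background/animal imbalance typical of aerial imagery, is a common example of such a measure and not necessarily one of the chapter's own):

```python
import numpy as np

def class_weighted_loss(probs, labels, n_background, n_animal):
    """Re-weight the binary cross-entropy so the rare animal class is not
    drowned out by the overwhelming background class.

    probs:  predicted animal probabilities per sample.
    labels: ground truth (1 = animal, 0 = background).
    """
    total = n_background + n_animal
    # Rare animal samples receive the large weight, frequent background the small one.
    weights = np.where(labels == 1, n_background / total, n_animal / total)
    eps = 1e-9  # numerical safety for log(0)
    ce = -(labels * np.log(probs + eps) + (1 - labels) * np.log(1 - probs + eps))
    return float(np.mean(weights * ce))

# Toy example: 990 background pixels vs. 10 animals in the training pool.
loss = class_weighted_loss(np.array([0.9, 0.1]), np.array([1, 0]),
                           n_background=990, n_animal=10)
```

Without such a correction, a CNN can reach low loss by predicting "background" everywhere, which is exactly the failure mode the weighting counteracts.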

Chapter 3 employs this CNN for reuse on new data acquisitions, e.g. in a census monitoring setting. Simply running the CNN over a new dataset to predict animals directly is often not possible, due to differences in characteristics between the datasets known as domain shifts. This chapter presents methodologies to adapt CNNs to new datasets with minimal effort, and does so by involving humans in the process in an interactive manner. Results show that humans need to review less than half a percent of the images to find more than 80% of the animals in the new campaign.
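The idea of spending scarce human attention where it pays off most can be illustrated with a minimal sketch (function and variable names are hypothetical; the thesis' actual selection criteria are more elaborate): images from the new campaign are ranked by the detector's confidence, and only the top-ranked fraction is handed to human reviewers.

```python
def select_images_for_review(confidences, budget_fraction=0.005):
    """Rank new-campaign images by detector confidence and return the indices
    of the small fraction humans should review first.

    confidences:     per-image maximum detection scores.
    budget_fraction: share of images annotators can afford to check
                     (0.005 mirrors the "less than half a percent" above).
    """
    n_review = max(1, int(len(confidences) * budget_fraction))
    # High-scoring images most likely contain animals, so reviewing them
    # first yields the most found individuals per unit of annotation effort.
    ranking = sorted(range(len(confidences)),
                     key=lambda i: confidences[i], reverse=True)
    return ranking[:n_review]

# Toy campaign: 1000 images, three with confident animal detections.
scores = [0.02] * 1000
for i in (3, 250, 999):
    scores[i] = 0.9
picked = select_images_for_review(scores, budget_fraction=0.005)
```

Here `picked` contains five image indices, headed by the three high-confidence detections, so a reviewer checking only 0.5% of the campaign already sees all likely animals.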

In Chapter 4, human annotation efforts themselves are addressed and reduced in complexity. Traditional settings require human annotators to draw bounding boxes around animals, which may become prohibitively expensive for large image datasets. This chapter instead explores the concept of weakly-supervised object detection, in which only simple presence/absence information about animals per image is requested from the annotators. Unlike a bounding box, such an image-wide annotation can be provided in a second. It was found that a CNN trained on this simpler information alone is already able to localize animals to a certain degree. Moreover, if spatial bounding boxes are added for just three training images, the CNN predicts animals with the same accuracy as its fully-supervised sibling from Chapter 2.
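Why localization emerges as a by-product of presence/absence supervision can be sketched as follows (a simplified illustration with hypothetical names, not the chapter's actual architecture): the CNN internally produces a spatial map of per-location animal scores; training only supervises the pooled image-level score, yet the position of the maximum comes for free.

```python
import numpy as np

def image_label_and_location(score_map):
    """Weak supervision sketch: a CNN emits a spatial map of per-location
    animal scores. Only the pooled image-level score (animal present or not)
    is compared against the annotator's label, but the same map also reveals
    where the strongest evidence sits."""
    image_score = score_map.max()  # global max pooling -> presence score
    location = np.unravel_index(score_map.argmax(), score_map.shape)
    return image_score, location

# Hypothetical 4x4 score map with one strong activation.
m = np.zeros((4, 4))
m[1, 2] = 0.8
score, loc = image_label_and_location(m)
```

The presence score `score` is what the image-level label trains; the coordinate `loc` is the localization obtained without any box annotation.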

Finally, Chapter 5 combines all findings and models into an integrated census software environment, denoted Annotation Interface that Does Everything (AIDE). To the best of the author's knowledge, AIDE is the first software solution that explicitly integrates machine vision technology into the labeling process in an interactive manner: in AIDE, CNNs are used to predict animals in large sets of unlabeled data, and in turn learn directly from the annotations humans provide on the images. The result is a positive feedback loop in which humans and machine reinforce each other. A user study shows that machine vision support provides a four-fold increase in the number of animals found in a given time, compared to an unassisted annotation setting on the same dataset. At the time of writing, AIDE is actively employed by conservation agencies in Tanzania and under consideration for use by other organizations around the globe.
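The feedback loop at the core of this design can be caricatured in a few lines (all names are hypothetical stand-ins; AIDE itself orchestrates web-based annotation interfaces and deep detectors): the model pre-labels a batch, a human corrects the predictions, and the corrections are fed back before the next batch is shown.

```python
class TinyModel:
    """Stand-in for a CNN detector (illustrative only)."""
    def __init__(self):
        self.known = {}
    def predict(self, img):
        return self.known.get(img, "unknown")
    def train(self, labeled):
        self.known.update(labeled)

def interactive_loop(model, unlabeled, annotate, rounds=2):
    """Positive feedback loop: model pre-labels images, humans correct the
    predictions, and the model retrains on the corrections each round."""
    labeled = {}
    for _ in range(rounds):
        batch, unlabeled = unlabeled[:2], unlabeled[2:]
        predictions = {img: model.predict(img) for img in batch}
        labeled.update(annotate(predictions))  # human reviews/corrects
        model.train(labeled)                   # model learns from feedback
    return labeled

# Toy run: the human confirms every shown image as containing an animal.
labeled = interactive_loop(TinyModel(), ["a", "b", "c", "d"],
                           lambda preds: {img: "animal" for img in preds})
```

Each round leaves the model better informed, so its pre-labels improve and the human's correction effort shrinks, which is the reinforcement the abstract describes.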

This thesis highlights the importance of interactive machine vision for wildlife conservation, and provides solutions that not only advance the field in a scientific context, but also have a direct impact on wildlife conservation through population monitoring.

Original language: English
Qualification: Doctor of Philosophy
Awarding institution: Wageningen University
Supervisor: Tuia, Devis (Promotor)
Award date: 6 Apr 2020
Place of publication: Wageningen
Print ISBNs: 9789463952736
Publication status: Published - 6 Apr 2020


  • Cum laude


