Automated object detectors on Unmanned Aerial Vehicles (UAVs) are increasingly employed for a wide range of tasks. However, to be accurate in their specific task they need expensive ground truth in the form of bounding boxes or positional information. Weakly-Supervised Object Detection (WSOD) overcomes this hindrance by localizing objects with only image-level labels, which are faster and cheaper to obtain, but it is not on par with fully-supervised models in terms of performance. In this study we propose to combine both approaches in a model that is principally apt for WSOD, but receives full position ground truth for a small number of images. Experiments show that with just 1% of densely annotated images, and simple image-level counts as the remaining ground truth, we effectively match the performance of fully-supervised models on a challenging dataset with scarcely occurring wildlife in UAV images from the African savanna. As a result, with a very limited amount of precise annotations our model can be trained with ground truth that is orders of magnitude cheaper and faster to obtain while still providing the same detection performance.
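The hybrid supervision described above can be sketched as a per-batch loss that applies a full detection objective to the few densely annotated images and a count-based objective to the weakly annotated remainder. This is a minimal illustrative sketch, not the paper's actual implementation: the function and field names (`mixed_batch_loss`, `"boxes"`, `alpha`) and the dummy loss functions are hypothetical.

```python
# Hedged sketch of mixed supervision: images with box annotations (~1% of the
# data in the paper's setting) contribute a detection loss; images with only an
# image-level count contribute a count loss. All names are illustrative.

def mixed_batch_loss(batch, detection_loss, count_loss, alpha=1.0):
    """Sum the appropriate loss for each sample depending on its annotation.

    batch          -- list of dicts; a sample has "boxes" (list or None)
                      and always an image-level "count"
    detection_loss -- callable on a densely annotated sample
    count_loss     -- callable on a weakly (count-only) annotated sample
    alpha          -- weight balancing the weak-supervision term (assumed)
    """
    total = 0.0
    for sample in batch:
        if sample.get("boxes") is not None:
            total += detection_loss(sample)       # full positional supervision
        else:
            total += alpha * count_loss(sample)   # weak count supervision
    return total


if __name__ == "__main__":
    # Dummy losses standing in for real detection / counting objectives.
    det = lambda s: 2.0
    cnt = lambda s: 0.5
    batch = [
        {"boxes": [[0, 0, 10, 10]], "count": 1},  # densely annotated image
        {"boxes": None, "count": 3},              # count-only image
    ]
    print(mixed_batch_loss(batch, det, cnt))
```

In a real training loop both terms would be differentiable network outputs; the design point this sketch conveys is simply that the two annotation regimes are mixed within one objective rather than trained in separate stages.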
Title of host publication: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Place of publication: Long Beach, CA, USA
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 9
Publication status: Published - 2019