
Towards Automatic Wild Animal Monitoring: Identification of Animal Species in Camera-trap Images using Very Deep Convolutional Neural Networks (1603.06169v2)

Published 20 Mar 2016 in cs.CV

Abstract: Non-intrusive monitoring of animals in the wild is possible using a camera-trapping framework, in which sensor-triggered cameras take bursts of images of animals in their habitat. However, camera trapping produces a high volume of data (on the order of thousands or millions of images) that must be analyzed by a human expert. In this work, a method for animal species identification in the wild using very deep convolutional neural networks is presented. Multiple versions of the Snapshot Serengeti dataset were used in order to probe the ability of the method to cope with the different challenges that camera-trap images pose. The method reached 88.9% Top-1 and 98.1% Top-5 accuracy on the evaluation set using a residual network topology. The results also show that the proposed method outperforms previous approaches, demonstrating that recognition in camera-trap images can be automated.

Citations (284)

Summary

  • The paper demonstrates that deep ConvNets effectively automate species classification in camera-trap images, reducing the need for manual analysis.
  • It employs state-of-the-art architectures like ResNet-101 with transfer learning and several dataset balancing strategies to tackle class imbalance.
  • The study reports Top-1 and Top-5 accuracies of 88.9% and 98.1% respectively on segmented images, underscoring deep learning's potential in conservation.

Automatic Wild Animal Monitoring Using Deep ConvNets: A Summary

The paper advances the domain of non-intrusive wildlife monitoring by applying deep learning for the automatic classification of animal species in images captured by camera traps. This approach is highly relevant due to the enormous volume of data generated by such traps, which traditionally necessitates manual analysis by experts. Here, deep Convolutional Neural Networks (ConvNets), specifically designed for image recognition tasks, provide a solution to this problem, demonstrating their applicability in ecological monitoring.

Methodology and Datasets

The research leverages very deep ConvNets to tackle the problem of animal identification in camera-trap images, adapting several state-of-the-art architectures, including AlexNet, VGGNet, GoogLeNet, and ResNets of several depths. These networks are either fine-tuned from pre-trained models or used as fixed feature extractors, applying transfer-learning principles to enhance performance on the specific task of species classification.
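The paper predates today's common deep learning frameworks, so the following is only a minimal PyTorch sketch of the transfer-learning recipe described above: replace the classifier head of an ImageNet-pre-trained network, then either fine-tune the whole network or freeze the backbone and train only the new head. `NUM_CLASSES` and the optimizer settings are illustrative placeholders, not values taken from the paper.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 26  # hypothetical: set to the number of species in your dataset

# Load a ResNet-101 pre-trained on ImageNet and replace its classifier head.
model = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# Option A: fine-tune the whole network at a small learning rate.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

# Option B: use the network as a fixed feature extractor by freezing
# every layer except the newly added head.
for name, param in model.named_parameters():
    if not name.startswith("fc"):
        param.requires_grad = False
```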

The experiments are built on the Snapshot Serengeti dataset, a rich collection of camera-trap images captured in Tanzania and annotated by citizen scientists and experts. The paper highlights the dataset's unbalanced class distribution, showing that such skewness hurts model performance. To address this, the researchers crafted several dataset versions: unbalanced (D1), balanced (D2), conditioned on the presence of animals in the foreground (D3), and a manually segmented version (D4).
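As a sketch of one way to obtain the effect of a balanced split like D2 at training time, and not the authors' procedure (which built separate dataset versions), class-weighted sampling draws each class with roughly equal probability without discarding images. The `dataset.targets` attribute is an assumption matching torchvision's `ImageFolder`.

```python
from collections import Counter
from torch.utils.data import DataLoader, WeightedRandomSampler

# `dataset` is assumed to expose integer class labels via dataset.targets,
# as torchvision's ImageFolder does.
counts = Counter(dataset.targets)
weights = [1.0 / counts[label] for label in dataset.targets]

# Sample each class with roughly equal probability instead of physically
# subsampling the majority classes.
sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
loader = DataLoader(dataset, batch_size=128, sampler=sampler)
```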

Results

The experiments show that dataset D4, the manually segmented images, yields the highest accuracy: ResNet-101 achieved Top-1 and Top-5 accuracies of 88.9% and 98.1%, respectively. The model's capacity to classify species even from partial animal images underscores the strength of deep learning in dealing with the low-quality input typical of camera-trap datasets.
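For reference, Top-k accuracy counts a prediction as correct when the true label appears among the k highest-scoring classes. A small self-contained sketch (not from the paper) of computing both metrics from a batch of logits:

```python
import torch

def topk_accuracy(logits: torch.Tensor, labels: torch.Tensor, ks=(1, 5)):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    max_k = max(ks)
    # Indices of the top-k predicted classes per sample, shape (batch, max_k).
    _, pred = logits.topk(max_k, dim=1)
    correct = pred.eq(labels.unsqueeze(1))  # broadcast compare against true labels
    return {k: correct[:, :k].any(dim=1).float().mean().item() for k in ks}
```

Calling `topk_accuracy(model(images), labels)` returns a dict such as `{1: 0.889, 5: 0.981}`.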

The researchers also tested their models on an additional dataset from Panama, comparing their results to a previous method. The findings consistently demonstrate the superiority of deeper ConvNet architectures over earlier approaches, substantiating that such complexity enhances the model's generalization abilities for camera-trap image recognition tasks.

Analysis and Implications

The paper draws attention to several issues inherent in camera-trap classification, such as fine-grained distinctions between visually similar classes (e.g., similar gazelle species) and the strong impact of image quality and conditions on classification accuracy. Furthermore, the results suggest that models require large amounts of diverse and ideally balanced data, or robust segmentation preprocessing, for optimal performance.

A noteworthy point is the paper's demonstration that deep learning models are robust to potential annotation errors in the dataset, particularly where crowdsourced annotations are used. This is crucial for large-scale ecological data annotation and paves the way for leveraging citizen science for data processing without compromising accuracy.

Conclusion and Future Directions

The paper concludes that the camera-trap species recognition problem can be automated effectively using deep learning, contingent on adequate data preparation and model sophistication. For future work, the authors suggest improving species recognition by incorporating sequential image analysis, given that camera traps often capture bursts of images. Additionally, ongoing research aims to refine segmentation algorithms, addressing one of the critical preprocessing steps.
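As an illustration of the burst idea, and not something implemented in the paper, a simple baseline would average per-frame class probabilities across a burst before taking the argmax:

```python
import torch

def classify_burst(model: torch.nn.Module, frames: torch.Tensor) -> int:
    """Assign one label to a camera-trap burst by averaging per-frame
    class probabilities. `frames` has shape (num_frames, C, H, W)."""
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(frames), dim=1)  # (num_frames, num_classes)
    return int(probs.mean(dim=0).argmax())
```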

Overall, this research marks a significant stride towards automating wildlife monitoring, highlighting both the challenges and the promise of applying deep learning to ecological datasets. Its performance on complex real-world data exemplifies the technology's potential to support biodiversity conservation efforts on a global scale.