
Exploring Different Levels of Supervision for Detecting and Localizing Solar Panels on Remote Sensing Imagery

Published 19 Sep 2023 in cs.CV | (2309.10421v1)

Abstract: This study investigates object presence detection and localization in remote sensing imagery, focusing on solar panel recognition. We explore different levels of supervision, evaluating three models: a fully supervised object detector, a weakly supervised image classifier with CAM-based localization, and a minimally supervised anomaly detector. The classifier excels in binary presence detection (0.79 F1-score), while the object detector (0.72) offers precise localization. The anomaly detector requires more data for viable performance. Fusion of model results shows potential accuracy gains. CAM impacts localization modestly, with GradCAM, GradCAM++, and HiResCAM yielding superior results. Notably, the classifier remains robust with less data, in contrast to the object detector.


Summary

  • The paper shows that CAM-based classifiers achieve superior presence detection (F1=0.791) compared to fully supervised methods, offering scalable annotation efficiency.
  • It demonstrates that fully supervised object detection achieves the highest localization accuracy (DICE up to 0.810) with detailed box-level annotations using Faster R-CNN.
  • The study finds that anomaly detection underperforms due to high false positives, underscoring the need for improved normality modeling and potential fusion approaches.

Comparative Analysis of Supervision Levels for Solar Panel Detection and Localization in Remote Sensing Imagery

Introduction

The proliferation of remote sensing data enables scalable environmental assessments but necessitates efficient extraction of actionable information regarding objects of interest, such as solar panels. This work presents a rigorous comparative evaluation of three recognition frameworks—object detection, image classification utilizing CAM, and anomaly detection—each operating under distinct supervision regimes. The explicit focus is to quantify and analyze the trade-offs between label granularity, detection/localization accuracy, and annotation cost for solar panel recognition in high-resolution satellite imagery.

Figure 1: Examples of remote sensing imagery with prominent distractors complicating object detection and localization.

Dataset and Preprocessing

The evaluation leverages the "Distributed Solar Photovoltaic Array Location and Extent Data Set for Remote Sensing Object Identification," composed of 601 RGB satellite images (0.33 m GSD, 5000×5000 pixels) from four Californian cities. Solar panel instances are polygonally annotated, affording fine-grained ground truth for localization. Preprocessing includes non-overlapping cropping into 200×200 pixel tiles, aggressive class balancing, polygon-to-box conversion for detection, and binary image labeling for classification tasks.

Figure 3: Dataset illustration showing the progression from large high-res images to tile-level annotations with and without solar panels.
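To make the tiling step concrete, the following is a minimal sketch of non-overlapping cropping into 200×200 tiles; the NumPy interface and the handling of border remainders are assumptions, as the paper does not specify them.

```python
import numpy as np

def tile_image(image: np.ndarray, tile: int = 200) -> list[np.ndarray]:
    """Split a large scene (H, W, C) into non-overlapping tile x tile crops.

    Edge remainders that do not fill a whole tile are discarded here;
    how the paper handles borders is not stated, so this is an assumption.
    """
    h, w = image.shape[:2]
    crops = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            crops.append(image[y:y + tile, x:x + tile])
    return crops

# Example: a 5000x5000 RGB scene yields 25 x 25 = 625 tiles of 200x200 pixels.
scene = np.zeros((5000, 5000, 3), dtype=np.uint8)
tiles = tile_image(scene)
assert len(tiles) == 625
```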

Methods

Fully-Supervised Object Detection

The Faster R-CNN architecture receives box-level supervision, optimized using the Adam optimizer with weighted cross-entropy and Smooth L1 localization losses. Post-processing converts predictions into binary segmentation masks for metric computation.
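A minimal fine-tuning sketch using torchvision's Faster R-CNN is shown below; the learning rate, the data loader interface, and the box-to-mask helper are assumptions, and the paper's weighted cross-entropy classification loss would require modifying the ROI head, which is omitted here.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Two classes: background and solar panel.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # learning rate is an assumption

def train_one_epoch(model, data_loader, optimizer):
    """One epoch of standard Faster R-CNN training (losses summed and back-propagated)."""
    model.train()
    for images, targets in data_loader:  # assumed: lists of image tensors and {"boxes", "labels"} dicts
        loss_dict = model(images, targets)   # torchvision returns classification and box regression losses
        loss = sum(loss_dict.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def boxes_to_mask(boxes, scores, height=200, width=200, score_thr=0.5):
    """Rasterize predicted boxes into a binary mask so DICE/IoU can be computed."""
    mask = torch.zeros((height, width), dtype=torch.uint8)
    for box, score in zip(boxes, scores):
        if score >= score_thr:
            x1, y1, x2, y2 = box.int().tolist()
            mask[y1:y2, x1:x2] = 1
    return mask
```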

Weakly-Supervised Classification with CAM

A ResNet-50-based classifier operates with binary image-level presence labels. Localization is approximated via several CAM variants (GradCAM, GradCAM++, HiResCAM, FullGrad, EigenCAM, EigenGradCAM), converting activation heatmaps to segmentation maps through thresholding.
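A minimal sketch of this pipeline, assuming the pytorch-grad-cam library and a fixed 0.5 threshold for converting heatmaps into masks (the paper calibrates thresholds rather than fixing them):

```python
import numpy as np
import torch
from torchvision.models import resnet50
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

# Binary presence classifier: ResNet-50 with a 2-way head (no panels / panels).
model = resnet50(weights="IMAGENET1K_V2")
model.fc = torch.nn.Linear(model.fc.in_features, 2)

# GradCAM on the last convolutional block; GradCAMPlusPlus, HiResCAM, etc. are drop-in alternatives.
cam = GradCAM(model=model, target_layers=[model.layer4[-1]])

input_tensor = torch.randn(1, 3, 200, 200)  # placeholder for a normalized 200x200 tile
heatmap = cam(input_tensor=input_tensor,
              targets=[ClassifierOutputTarget(1)])[0]  # (H, W) activation map scaled to [0, 1]

# Threshold the heatmap into a rough localization mask; 0.5 is an assumed cut-off.
mask = (heatmap >= 0.5).astype(np.uint8)
```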

Minimally-Supervised Anomaly Detection

A VAE is parameterized with a ResNet-50 encoder and a symmetric deconvolutional decoder. Trained solely on negatives (images without solar panels), anomaly maps are derived from reconstruction errors, post-processed using CAM for spatial localization of potential panels.
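As a hedged illustration, the sketch below derives a pixel-wise anomaly map from reconstruction error; the `vae` interface (returning reconstruction, mean, and log-variance) is an assumption, and the paper's ResNet-50 encoder, deconvolutional decoder, and CAM post-processing are not reproduced.

```python
import torch
import torch.nn.functional as F

def anomaly_map(vae: torch.nn.Module, image: torch.Tensor) -> torch.Tensor:
    """Pixel-wise reconstruction error as an anomaly heatmap.

    `vae` is assumed to return (reconstruction, mu, logvar) with the
    reconstruction matching the input shape (B, C, H, W).
    """
    vae.eval()
    with torch.no_grad():
        reconstruction, _, _ = vae(image)
    # Squared error averaged over channels gives one score per pixel.
    error = F.mse_loss(reconstruction, image, reduction="none").mean(dim=1)
    # Min-max normalize to [0, 1] so a single threshold can flag panel-like regions.
    error = (error - error.min()) / (error.max() - error.min() + 1e-8)
    return error  # shape: (B, H, W)
```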

Evaluation Protocol

Presence detection uses F1-score, while localization utilizes DICE and IoU restricted to true positives. The analysis encompasses hyperparameter sensitivity, threshold calibrations, and systematic decrease in training data to probe data efficiency.
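A minimal sketch of the localization metrics on binary masks, assuming NumPy arrays; per the protocol above, it would be applied only to true-positive tiles.

```python
import numpy as np

def dice_and_iou(pred_mask: np.ndarray, gt_mask: np.ndarray, eps: float = 1e-8):
    """DICE and IoU between a predicted and a ground-truth binary mask."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    dice = 2.0 * intersection / (pred.sum() + gt.sum() + eps)
    iou = intersection / (np.logical_or(pred, gt).sum() + eps)
    return float(dice), float(iou)
```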

Results

Detection and Localization Performance

Classification with CAM yields superior presence detection (F1 = 0.791), even outperforming fully supervised detection (F1 = 0.720). Conversely, Faster R-CNN achieves the most precise localization (DICE = 0.722 with polygons, 0.810 with boxes), benefiting from spatially explicit supervision.

Figure 5: Localization results from the object detector, with both correct (a-c) and incorrect (d-f) predictions.

CAM-based classifiers, particularly with GradCAM++ and HiResCAM, localize solar panels with moderate accuracy (DICE ≈ 0.39). Qualitative analysis reveals that they reliably activate on solar panel regions but may focus on only a subset of panels when multiple instances exist per image, a limitation rooted in the global image-level supervision.

Figure 7: Detection explanations via GradCAM for correct (a-c) and incorrect (d-f) classifications.

Figure 2: False positive classifier activations where strong visual similarities to solar panels result in errors likely to confound both models and humans.

Figure 4: CAM-based heatmap visualizations with diverse explanatory methods for imagery containing numerous solar panels.

The anomaly detector exhibits poor detection and localization performance (F1 = 0.168, DICE = 0.174) and produces considerable false positives, routinely flagging swimming pools and other rare objects as anomalies, reflecting the intrinsic ambiguity of one-class learning without positive exemplars.

Figure 6: Anomaly detection heatmaps highlight both solar panels (a-c) and frequent spurious anomalies such as swimming pools (d-f).

Error Asymmetry and Model Complementarity

Error analysis reveals that the classifier and the detector often err on different samples, while the anomaly detector's failures are largely independent of both. This asymmetry suggests that complementary decision fusion could modestly enhance performance.
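A minimal sketch of such a fusion rule is given below; the OR combination and the 0.5 thresholds are illustrative assumptions, not the fusion scheme evaluated in the paper.

```python
import numpy as np

def fused_presence(cls_scores, det_scores, cls_thr=0.5, det_thr=0.5):
    """OR-style decision fusion: flag a tile if either model is confident enough."""
    cls_pred = np.asarray(cls_scores) >= cls_thr
    det_pred = np.asarray(det_scores) >= det_thr
    return np.logical_or(cls_pred, det_pred)
```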

Data Efficiency

The classification model demonstrates remarkable data efficiency, retaining moderate accuracy with significantly reduced training data, whereas the detection and anomaly models degrade rapidly below 40% of the data. At 40% of the data, classification attains F1 = 0.643, compared to 0.314 for detection and 0.140 for anomaly detection.

Computational Efficiency

Classification models train and evaluate substantially faster than detection or anomaly systems. The choice of CAM method directly impacts test-time cost—HiResCAM, EigenCAM, and FullGrad increase evaluation time significantly compared to GradCAM-derived explanations.

Implications and Future Directions

The empirical findings underscore the practical value of weakly supervised classification with CAM-based explainability for large-scale solar panel mapping: this approach substantially lowers annotation cost and remains effective with less labeled data, though at the cost of less precise localization. The modest influence of CAM variability indicates that the choice among the top methods (GradCAM, GradCAM++, HiResCAM) is not critical for remote sensing imagery.

Fully supervised detection remains preferable when fine localization is paramount, with the caveat of increased annotation and computational demands. Anomaly detection, without access to positive exemplars or fine-grained negative class constraints, underperforms and demands methodological advances (e.g., robust normality modeling, better out-of-distribution discrimination) to be viable for practical remote sensing.

Future exploration should consider semi-supervised or active learning to further reduce annotation burden, improved fusion strategies for leveraging complementary model strengths, and domain-adaptive explainability tailored for diverse object morphologies inherent in remote sensing.

Conclusion

This comparative study rigorously delineates the operational regimes, strengths, and limitations of detection, classification-with-CAM, and anomaly detection for solar panel identification in satellite imagery. Classification with CAMs offers a compelling balance of accuracy and annotation cost for detection-oriented tasks, while fully supervised detection provides the highest localization fidelity. The experiments reinforce the notion that supervision granularity should be dictated by application-driven localization needs and available labeling resources; multi-method approaches, potentially incorporating model fusion or semi-supervision, represent productive avenues for future research in scalable remote object recognition.

