Metamorphic Testing of Image Captioning Systems via Image-Level Reduction

Published 20 Nov 2023 in cs.SE | (2311.11791v3)

Abstract: Image Captioning (IC) techniques are widely used to describe images in natural language. Several testing methods for IC systems have recently been proposed, but they have three limitations. First, they still rely on pre-annotated information and hence do not really alleviate the oracle problem in testing. Second, they artificially manipulate objects, which may produce unrealistic images as test cases and thus lead to less meaningful testing results. Third, they impose various eligibility requirements on source test cases and hence cannot fully utilize the given images for testing. To tackle these issues, we propose REIC, which performs metamorphic testing of IC systems using image-level reduction transformations such as image cropping and stretching. Instead of relying on pre-annotated information, REIC uses a localization method to align the objects in the caption with the corresponding objects in the image, and checks whether each object is correctly described in, or deleted from, the caption after the transformation. Because the transformations operate at the image level, REIC does not artificially manipulate any objects and hence avoids generating unrealistic follow-up images. It also eliminates the eligibility requirements on source test cases in the metamorphic transformation process, decreases the ambiguity of the follow-up test cases, and boosts their diversity, which enables testing to be performed on any test image and reveals more distinct valid violations. We employ REIC to test five popular IC systems. The results demonstrate that REIC can make full use of the provided test images to generate realistic follow-up cases and can effectively detect a large number of distinct violations, without the need for any pre-annotated information.
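The check described in the abstract can be illustrated with a minimal sketch. It is not REIC's implementation: the `Box` type, the `find_violations` helper, the two thresholds, and the rule of skipping partially cropped objects are all assumptions made here for illustration. The sketch also assumes that a localization step has already mapped each object named in the source caption to a bounding box, and that the objects named in the follow-up caption have already been extracted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates (hypothetical helper type)."""
    left: float
    top: float
    right: float
    bottom: float

    def area(self) -> float:
        width = max(0.0, self.right - self.left)
        height = max(0.0, self.bottom - self.top)
        return width * height

    def intersect(self, other: "Box") -> "Box":
        # May yield an "inverted" box when there is no overlap; area() is then 0.
        return Box(max(self.left, other.left), max(self.top, other.top),
                   min(self.right, other.right), min(self.bottom, other.bottom))


def retained_fraction(obj: Box, crop: Box) -> float:
    """Fraction of the object's area that survives the crop."""
    total = obj.area()
    return obj.intersect(crop).area() / total if total > 0 else 0.0


def find_violations(located, crop, followup_objects,
                    keep_thresh=0.9, drop_thresh=0.1):
    """Compare objects located in the source image with the follow-up caption.

    located:          dict mapping object name -> Box in the source image
    crop:             Box kept by the cropping transformation
    followup_objects: set of object names found in the follow-up caption
    The thresholds are illustrative assumptions, not values from the paper.
    """
    violations = []
    for name, box in located.items():
        frac = retained_fraction(box, crop)
        mentioned = name in followup_objects
        if frac >= keep_thresh and not mentioned:
            violations.append((name, "retained but missing"))
        elif frac <= drop_thresh and mentioned:
            violations.append((name, "removed but still described"))
        # Partially cropped objects are treated as ambiguous and skipped.
    return violations


located = {"dog": Box(10, 10, 50, 50), "frisbee": Box(80, 10, 100, 30)}
crop = Box(0, 0, 60, 60)  # keeps the dog, removes the frisbee
print(find_violations(located, crop, {"dog", "frisbee"}))
```

In this toy run the crop removes the frisbee entirely, so a follow-up caption that still mentions it is flagged. Exact string matching of object names is a simplification; a real oracle would need to handle synonyms and related words.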

