
Mobile App Crowdsourced Test Report Consistency Detection via Deep Image-and-Text Fusion Understanding (2108.07401v3)

Published 17 Aug 2021 in cs.SE

Abstract: Crowdsourced testing, as a distinct testing paradigm, has attracted much attention in software testing, especially in the mobile application (app) testing field. Compared with in-house testing, crowdsourced testing shows superiority through its diverse testing environments when faced with the mobile testing fragmentation problem. However, crowdsourced testing also encounters the low-quality test report problem caused by unprofessional crowdworkers with varying expertise. In order to handle the submitted reports of uneven quality, app developers have to distinguish high-quality reports from low-quality ones to aid bug inspection. One typical kind of low-quality test report is the inconsistent test report, in which the textual description does not focus on the attached bug-occurring screenshots. According to our empirical survey, only 18.07% of crowdsourced test reports are consistent. Inconsistent reports cause wasted effort in mobile app testing. To solve the inconsistency problem, we propose ReCoDe to detect the consistency of crowdsourced test reports via deep image-and-text fusion understanding. ReCoDe is a two-stage approach that first classifies the reports into different categories according to the bug features described in their textual descriptions. In the second stage, ReCoDe builds a deep understanding of the GUI image features of the app screenshots and then applies different strategies for different types of bugs to detect the consistency of the crowdsourced test reports. We conduct an experiment on a dataset with over 22k test reports to evaluate ReCoDe, and the results show its effectiveness in detecting the consistency of crowdsourced test reports. In addition, a user study is conducted to demonstrate the practical value of ReCoDe in helping app developers improve the efficiency of reviewing crowdsourced test reports.
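The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's actual implementation: the keyword rules stand in for ReCoDe's text classifier, and the feature sets stand in for GUI features extracted from screenshots; all function names and categories are illustrative assumptions.

```python
# Hypothetical sketch of a two-stage consistency check in the spirit of
# ReCoDe. Category names and features are illustrative placeholders, not
# the paper's actual taxonomy or models.

def classify_bug_category(description: str) -> str:
    """Stage 1: map the textual description to a coarse bug category.
    Simple keyword rules stand in for the paper's text classifier."""
    text = description.lower()
    if "crash" in text or "exit" in text:
        return "crash"
    if "display" in text or "overlap" in text:
        return "display"
    return "functional"


def screenshot_matches(category: str, screenshot_features: set) -> bool:
    """Stage 2: apply a category-specific strategy to decide whether the
    attached screenshot plausibly shows the described bug.
    `screenshot_features` stands in for GUI features a vision model would
    extract from the image (error dialogs, rendering artifacts, widgets)."""
    expected = {
        "crash": {"error_dialog"},
        "display": {"rendering_artifact"},
        "functional": {"interactive_widget"},
    }
    # Consistent if the screenshot exhibits at least one feature the
    # described bug category would be expected to produce.
    return bool(expected[category] & screenshot_features)


def is_consistent(description: str, screenshot_features: set) -> bool:
    """End-to-end check: text classification, then image-side strategy."""
    category = classify_bug_category(description)
    return screenshot_matches(category, screenshot_features)
```

For example, a report saying "App crash on login" paired with a screenshot showing an error dialog would be judged consistent, while the same text paired with an ordinary UI screenshot would not. The real approach replaces both stages with learned models over text and GUI images.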

Authors (8)
  1. Shengcheng Yu (16 papers)
  2. Chunrong Fang (71 papers)
  3. Quanjun Zhang (36 papers)
  4. Zhihao Cao (8 papers)
  5. Yexiao Yun (4 papers)
  6. Zhenfei Cao (2 papers)
  7. Kai Mei (30 papers)
  8. Zhenyu Chen (91 papers)
Citations (8)
