On (Mis)perceptions of testing effectiveness: an empirical study

Published 11 Feb 2024 in cs.SE (arXiv:2402.07222v1)

Abstract: A recurring problem in software development is incorrect decision making on the techniques, methods and tools to be used. Mostly, these decisions are based on developers' perceptions about them. A factor influencing people's perceptions is past experience, but it is not the only one. In this research, we aim to discover how well the perceptions of the defect detection effectiveness of different techniques match their real effectiveness in the absence of prior experience. To do this, we conduct an empirical study plus a replication. During the original study, we conduct a controlled experiment with students applying two testing techniques and a code review technique. At the end of the experiment, they take a survey to find out which technique they perceive to be most effective. The results show that participants' perceptions are wrong and that this mismatch is costly in terms of quality. In order to gain further insight into the results, we replicate the controlled experiment and extend the survey to include questions about participants' opinions on the techniques and programs. The results of the replicated study confirm the findings of the original study and suggest that participants' perceptions might be based not on their opinions about complexity or preferences for techniques but on how well they think that they have applied the techniques.


Summary

  • The paper reveals a 31 percentage point defect detection gap due to misaligned perceptions of testing techniques among developers.
  • The study employs a controlled experiment with students using Equivalence Partitioning, Branch Testing, and Code Reading to compare perceived versus actual effectiveness.
  • The findings advocate for integrating empirical feedback tools and enhanced training to mitigate bias in selecting testing techniques.

An Empirical Study on Developers' Perceptions and the Reality of Testing Techniques

Introduction

The paper "On (Mis)perceptions of testing effectiveness: an empirical study" (2402.07222) addresses a critical issue in software development: the accuracy of developers' perceptions of how effective testing techniques are. By examining the extent to which these perceptions align with actual efficacy, the study aims to mitigate decision-making errors that compromise software quality. A controlled experiment with students, followed by a replication, investigates how preconceived notions of testing effectiveness affect the application of defect detection techniques.

Methodology

The study employs a controlled experiment with a cohort of computer science students, chosen to avoid bias from prior professional experience. Participants applied two testing techniques (Equivalence Partitioning and Branch Testing) and a code review technique (Code Reading by Stepwise Abstraction) to software artifacts. Perceived effectiveness was gauged through a post-experiment survey and compared against actual effectiveness, measured as the defect detection rate. A crossover design controlled for ordering effects and individual variance.
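As a reminder of what the first of these techniques involves, Equivalence Partitioning divides a program's input domain into classes whose members are expected to behave alike, so one representative test per class suffices. The sketch below is a generic illustration, not taken from the paper's experimental materials; the `classify_triangle` function and its partitions are hypothetical:

```python
def classify_triangle(a, b, c):
    """Classify a triangle by its side lengths."""
    if a <= 0 or b <= 0 or c <= 0:
        return "invalid"
    if a + b <= c or a + c <= b or b + c <= a:
        return "invalid"  # fails the triangle inequality
    if a == b == c:
        return "equilateral"
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"

# One representative test case per equivalence class of the input domain.
partition_tests = {
    "non-positive side":       ((0, 3, 4), "invalid"),
    "violates inequality":     ((1, 2, 10), "invalid"),
    "all sides equal":         ((5, 5, 5), "equilateral"),
    "exactly two sides equal": ((5, 5, 8), "isosceles"),
    "all sides different":     ((3, 4, 5), "scalene"),
}

for name, (args, expected) in partition_tests.items():
    actual = classify_triangle(*args)
    assert actual == expected, f"{name}: got {actual}"
```

Branch Testing, by contrast, would derive test cases from the code itself, requiring inputs that exercise both outcomes of every `if` above; the two techniques can therefore find different defects on the same program.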

Findings

Perceptions and Reality

The results reveal a significant disconnect between perceived and actual technique effectiveness. Strikingly, roughly 50% of participants held incorrect perceptions of which technique was most effective for them. This misalignment carries a tangible cost: relying on the perceived-best rather than the actual-best technique reduced defect detection by an average of 31 percentage points. Notably, the misperceptions show no systematic bias toward any single technique, indicating that perceptions are broadly unreliable and vary from one individual to the next.
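The percentage-point cost can be understood as a per-participant comparison: take the effectiveness of the technique each participant perceived as best, subtract it from the effectiveness of the technique that actually detected the most defects for them, and average. A minimal sketch of this calculation follows; all scores below are invented for illustration and are not the paper's data:

```python
# Each participant: defect-detection effectiveness (%) per technique
# (EP = Equivalence Partitioning, BT = Branch Testing, CR = Code Reading),
# plus the technique they perceived as most effective.
# All numbers are synthetic, for illustration only.
participants = [
    {"scores": {"EP": 80, "BT": 55, "CR": 40}, "perceived_best": "CR"},
    {"scores": {"EP": 45, "BT": 70, "CR": 30}, "perceived_best": "BT"},
    {"scores": {"EP": 60, "BT": 85, "CR": 50}, "perceived_best": "EP"},
]

losses = []
for p in participants:
    actual_best = max(p["scores"].values())
    chosen = p["scores"][p["perceived_best"]]
    losses.append(actual_best - chosen)  # 0 when the perception is correct

avg_loss = sum(losses) / len(losses)
print(f"Average cost of misperception: {avg_loss:.1f} percentage points")
```

In this toy data one participant's perception is correct (loss 0) while the other two pay 40 and 25 points respectively; the paper reports an average cost of this kind of 31 percentage points across its participants.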

Opinions and Bias

Further analyses explored potential drivers of these misperceptions by examining participants' opinions. Surprisingly, neither technique preference (Equivalence Partitioning was the clear favorite) nor perceived complexity accounted for perceptions of effectiveness. Instead, perceptions appear to be driven by participants' self-assessed performance, underscoring a psychological tendency to equate how thoroughly one believes one applied a technique with how effective it actually was. An opinion bias was identifiable only for Equivalence Partitioning, a technique participants favored despite it not consistently being the most effective.

Implications and Recommendations

This study highlights critical implications for both novice developers and the broader software engineering community. Developers should be cautious of relying on personal judgment to select testing techniques, as perceptions are not reliable indicators of technique effectiveness. The findings suggest several strategic actions to ameliorate misperceptions: develop tools to provide immediate feedback to developers on technique effectiveness, enhance access to empirical evidence from studies, and further investigate the specific conditions under which different techniques perform optimally.

Conclusion

This research provides foundational evidence that developers' perceptions of testing effectiveness are often misaligned with real performance. This insight has considerable implications for training and technique selection within the industry. By identifying this gap, the study advocates for systematic integration of empirical evidence into development workflows, ultimately enhancing software quality. Future work should aim to refine the profiling of effective testing strategies depending on code characteristics and defect types, enriching the decision-making toolkit available to practitioners.
