Apples, Oranges, and Software Engineering: Study Selection Challenges for Secondary Research on Latent Variables (2402.08706v1)

Published 13 Feb 2024 in cs.SE

Abstract: Software engineering (SE) is full of abstract concepts that are crucial for both researchers and practitioners, such as programming experience, team productivity, code comprehension, and system security. Secondary studies aimed at summarizing research on the influences and consequences of such concepts would therefore be of great value. However, the inability to measure abstract concepts directly poses a challenge for secondary studies: primary studies in SE can operationalize such concepts in many ways. Standardized measurement instruments are rarely available, and even if they are, many researchers do not use them or do not even provide a definition for the studied concept. SE researchers conducting secondary studies therefore have to decide a) which primary studies intended to measure the same construct, and b) how to compare and aggregate vastly different measurements for the same construct. In this experience report, we discuss the challenge of study selection in SE secondary research on latent variables. We report on two instances where we found it particularly challenging to decide which primary studies should be included for comparison and synthesis, so as not to end up comparing apples with oranges. Our report aims to spark a conversation about developing strategies to address this issue systematically and pave the way for more efficient and rigorous secondary studies in software engineering.
