An Empirical Study on the Characteristics of Database Access Bugs in Java Applications (2405.15008v1)
Abstract: Database-backed applications rely on the database access code to interact with the underlying database management systems (DBMSs). Although many prior studies aim at database access issues like SQL anti-patterns or SQL code smells, there is a lack of study of database access bugs during the maintenance of database-backed applications. In this paper, we empirically investigate 423 database access bugs collected from seven large-scale Java open source applications that use relational database management systems (e.g., MySQL or PostgreSQL). We study the characteristics (e.g., occurrence and root causes) of the bugs by manually examining the bug reports and commit histories. We find that the number of reported database and non-database access bugs share a similar trend but their modified files in bug fixing commits are different. Additionally, we generalize categories of the root causes of database access bugs, containing five main categories (SQL queries, Schema, API, Configuration, SQL query result) and 25 unique root causes. We find that the bugs pertaining to SQL queries, Schema, and API cover 84.2% of database access bugs across all studied applications. In particular, SQL queries bug (54%) and API bug (38.7%) are the most frequent issues when using JDBC and Hibernate, respectively. Finally, we provide a discussion on the implications of our findings for developers and researchers.
- [n. d.]. channable/dbcritic. https://github.com/channable/dbcritic. (Accessed on 03/25/2023).
- [n. d.]. DB OPTIMIZATION SERVICE - Holistic.dev. https://holistic.dev/. (Accessed on 03/25/2023).
- 2009. What Java ORM do you prefer, and why? https://stackoverflow.com/questions/452385/what-java-orm-do-you-prefer-and-why
- 2012. JPA or JDBC, how are they different? https://stackoverflow.com/questions/11881548/jpa-or-jdbc-how-are-they-different/
- 2021. Replication Package. https://github.com/SPEAR-SE/empirical-db-issue-data
- 2022. PYPL PopularitY of Programming Language index. https://pypl.github.io/PYPL.html. (Accessed on 12/21/2022).
- ADempiere. 2021. ADempiere Business Suite. https://github.com/adempiere/adempiere
- Students’ Semantic Mistakes in Writing Seven Different Types of SQL Queries. In Proceedings of the 2016 ACM Conference on Innovation and Technology in Computer Science Education (Arequipa, Peru) (ITiCSE ’16). Association for Computing Machinery, New York, NY, USA, 272–277.
- A survey of problematic database code fragments in software systems. Engineering Reports 3, 10 (2021), e12441.
- Andrea Arcuri. 2019. RESTful API Automated Test Case Generation with EvoMaster. ACM Trans. Softw. Eng. Methodol. 28, 1, Article 3 (jan 2019), 37 pages.
- Andrea Arcuri and Juan P. Galeotti. 2020. Handling SQL Databases in Automated System Test Generation. ACM Trans. Softw. Eng. Methodol. 29, 4, Article 22 (jul 2020), 31 pages.
- Cleaning Antipatterns in an SQL Query Log. In 2018 IEEE 34th International Conference on Data Engineering (ICDE). 1751–1752.
- S. Brass and C. Goldberg. 2004. Semantic errors in SQL queries: a quite complete list. In Fourth International Conference onQuality Software, 2004. QSIC 2004. Proceedings. 250–257.
- BroadleafCommerce. 2021. BroadleafCommerce - Enterprise eCommerce framework based on Spring. https://github.com/BroadleafCommerce/BroadleafCommerce
- JRebel by Perforce. 2014. Java Tools and Technologies Landscape 2014. https://www.jrebel.com/resources/java-tools-and-technologies-landscape-2014
- Search-Based Test Data Generation for SQL Queries. In Proceedings of the 40th International Conference on Software Engineering (Gothenburg, Sweden) (ICSE ’18). Association for Computing Machinery, New York, NY, USA, 1220–1230.
- Fault-based testing of database application programs with conceptual data model. In Fifth International Conference on Quality Software (QSIC’05). 187–196.
- CacheOptimizer: Helping Developers Configure Caching Frameworks for Hibernate-Based Database-Centric Web Applications. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (Seattle, WA, USA) (FSE 2016). Association for Computing Machinery, New York, NY, USA, 666–677.
- Detecting Problems in the Database Access Code of Large Scale Systems - An Industrial Experience Report. In 2016 IEEE/ACM 38th International Conference on Software Engineering Companion (ICSE-C). 71–80.
- Detecting Performance Anti-Patterns for Applications Developed Using Object-Relational Mapping. In Proceedings of the 36th International Conference on Software Engineering (Hyderabad, India) (ICSE 2014). Association for Computing Machinery, New York, NY, USA, 1001–1012.
- Finding and Evaluating the Performance Impact of Redundant Data Access for Applications that are Developed Using Object-Relational Mapping Frameworks. IEEE Transactions on Software Engineering 42, 12 (2016), 1148–1161.
- An Empirical Study on the Practice of Maintaining Object-Relational Mapping Code in Java Systems. In Proceedings of the 13th International Conference on Mining Software Repositories (Austin, Texas) (MSR ’16). Association for Computing Machinery, New York, NY, USA, 165–176.
- Optimizing Database-Backed Applications with Query Synthesis. SIGPLAN Not. 48, 6 (jun 2013), 3–14.
- Jacob Cohen. 1960. A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement 20, 1 (1960), 37–46.
- C. Coronel and S. Morris. 2018. Database Systems: Design, Implementation, & Management (13 ed.). Cengage Learning, Chapter 15.
- Schema Evolution in Wikipedia - Toward a Web Information System Benchmark.. In ICEIS (1) (2008-08-13), José Cordeiro and Joaquim Filipe (Eds.). 3231–332.
- Christine Dancey. 2007. Statistics without maths for psychology. Pearson/Prentice Hall, Harlow, England New York.
- DBeaver. 2021. Free universal database tool and SQL client. https://github.com/dbeaver/dbeaver
- SQLCheck: Automated Detection and Diagnosis of SQL Anti-Patterns. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (Portland, OR, USA) (SIGMOD ’20). Association for Computing Machinery, New York, NY, USA, 2331–2345.
- dotCMS. 2021. Source Code for dotCMS Hybrid Content Management System. https://github.com/dotCMS/core
- Dynamic Test Input Generation for Database Applications. In Proceedings of the 2007 International Symposium on Software Testing and Analysis (London, United Kingdom) (ISSTA ’07). Association for Computing Machinery, New York, NY, USA, 151–162.
- GitHub. 2022. The top programming languages. https://octoverse.github.com/2022/top-programming-languages Last accessed Nov. 2022.
- X-data: Generating test data for killing SQL mutants. In 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010). 876–879.
- William G.J. Halfond and Alessandro Orso. 2006. Command-Form Coverage for Testing Database Applications. In 21st IEEE/ACM International Conference on Automated Software Engineering (ASE’06). 69–80.
- Hibernate. 2021a. Built-in constraints. https://docs.jboss.org/hibernate/validator/6.0/reference/en-US/html_single/#section-builtin-constraints
- Hibernate. 2021b. HQL and JPQL. https://docs.jboss.org/hibernate/orm/5.3/userguide/html_single/Hibernate_User_Guide.html#hql
- Taxonomy of Real Faults in Deep Learning Systems. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering (Seoul, South Korea) (ICSE ’20). Association for Computing Machinery, New York, NY, USA, 1110–1121.
- Yue Jia and Mark Harman. 2011. An Analysis and Survey of the Development of Mutation Testing. IEEE Transactions on Software Engineering 37, 5 (2011), 649–678.
- Search-Based Testing of Relational Schema Integrity Constraints Across Multiple Database Management Systems. In 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation. 31–40.
- B. Karwin. 2010. SQL Antipatterns: Avoiding the Pitfalls of Database Programming. Pragmatic Bookshelf.
- J. Richard Landis and Gary G. Koch. 1977. The Measurement of Observer Agreement for Categorical Data. Biometrics 33, 1 (1977), 159–174.
- Quantifying the Performance Impact of SQL Antipatterns on Mobile Applications. In 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME). 53–64.
- ManageForce. 2016. System Failure - The Cost of Database Downtime. http://www.manageforce.com/blog/dba-suffering-from-system-failure-infographic Last accessed Nov. 2021.
- Impact Analysis of Database Schema Changes. In Proceedings of the 30th International Conference on Software Engineering (Leipzig, Germany) (ICSE ’08). Association for Computing Machinery, New York, NY, USA, 451–460.
- The Effectiveness of Test Coverage Criteria for Relational Database Schema Integrity Constraints. ACM Trans. Softw. Eng. Methodol. 25, 1, Article 8 (dec 2015), 49 pages.
- Automatic Detection and Removal of Ineffective Mutants for the Mutation Analysis of Relational Database Schemas. IEEE Transactions on Software Engineering 45, 5 (2019), 427–463.
- metasfresh. 2021. We do Open Source ERP - Fast, Flexible & Free Software to scale your Business. https://github.com/metasfresh/metasfresh
- Detecting and Preventing Program Inconsistencies under Database Schema Evolution. In 2016 IEEE International Conference on Software Quality, Reliability and Security (QRS). 262–273.
- Microsoft. 2019. getMoreResults Skips resultsets[BUG] #969. https://github.com/microsoft/mssql-jdbc/issues/969
- On the Prevalence, Impact, and Evolution of SQL Code Smells in Data-Intensive Systems. In Proceedings of the 17th International Conference on Mining Software Repositories (Seoul, Republic of Korea) (MSR ’20). Association for Computing Machinery, New York, NY, USA, 327–338.
- MySQL. 2022. MySQL Connectors. https://www.mysql.com/products/connector/
- Csaba Nagy and Anthony Cleve. 2018. SQLInspect: A Static Analyzer to Inspect Database Usage in Java Applications. In 2018 IEEE/ACM 40th International Conference on Software Engineering: Companion (ICSE-Companion). 93–96.
- Openfire. 2021. An XMPP server licensed under the Open Source Apache License. https://github.com/igniterealtime/Openfire
- OpenMRS. 2021. OpenMRS API and web application code. https://github.com/openmrs/openmrs-core
- Oracle. 2021a. Java Software. https://www.oracle.com/java/ Last accessed Nov. 2021.
- Oracle. 2021b. MySQL. https://www.mysql.com/
- Guided Test Generation for Database Applications via Synthesized Database Interactions. ACM Trans. Softw. Eng. Methodol. 23, 2, Article 12 (apr 2014), 27 pages.
- PostgreSQL. 2022. Software Catalogue - Drivers and interfaces. https://www.postgresql.org/download/products/2-drivers-and-interfaces/
- An Empirical Analysis of the Co-Evolution of Schema and Code in Database Applications. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering (Saint Petersburg, Russia) (ESEC/FSE 2013). Association for Computing Machinery, New York, NY, USA, 125–135.
- Redgate. [n. d.]. 119 SQL Code Smells. https://www.red-gate.com/library/119-sql-code-smells
- Tanmoy Sarkar. 2013. Testing database applications using coverage analysis and mutation analysis. Ph. D. Dissertation. Iowa State University.
- Database-Access Performance Antipatterns in Database-Backed Web Applications. In 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME). 58–69.
- Smelly Relations: Measuring and Understanding Database Schema Quality. In 2018 IEEE/ACM 40th International Conference on Software Engineering: Software Engineering in Practice Track (ICSE-SEIP). 55–64.
- SlashData. 2022. State of the Developer Nation. https://slashdata-website-cms.s3.amazonaws.com/sample_reports/VZtJWxZw5Q9NDSAQ.pdf
- María José Suárez-Cabal and Javier Tuya. 2004. Using an SQL Coverage Measurement for Testing Database Applications. In Proceedings of the 12th ACM SIGSOFT Twelfth International Symposium on Foundations of Software Engineering (Newport Beach, CA, USA) (SIGSOFT ’04/FSE-12). Association for Computing Machinery, New York, NY, USA, 253–262.
- SQLMutation: A tool to generate mutants of SQL database queries. In Second Workshop on Mutation Analysis (Mutation 2006 - ISSRE Workshops 2006). 1–1.
- Full Predicate Coverage for Testing SQL Database Queries. Softw. Test. Verif. Reliab. 20, 3 (sep 2010), 237–288.
- Mutating database queries. Information and Software Technology 49, 4 (2007), 398–417.
- Synthesizing Database Programs for Schema Refactoring. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (Phoenix, AZ, USA) (PLDI 2019). Association for Computing Machinery, New York, NY, USA, 286–300.
- The Impact of Equivalent, Redundant and Quasi Mutants on Database Schema Mutation Analysis. In 2014 14th International Conference on Quality Software. 57–66.
- Understanding Database Performance Inefficiencies in Real-World Web Applications. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM ’17). 1299–1308.
- Managing Data Constraints in Database-Backed Web Applications. Association for Computing Machinery, New York, NY, USA, 302–303.
- How Not to Structure Your Database-Backed Web Applications: A Study of Performance Bugs in the Wild. In Proceedings of the 40th International Conference on Software Engineering (Gothenburg, Sweden) (ICSE ’18). Association for Computing Machinery, New York, NY, USA, 800–810.
- Chixiang Zhou and Phyllis Frankl. 2011. JDAMA: Java Database Application Mutation Analyser. Softw. Test. Verif. Reliab. 21, 3 (sep 2011), 241–263.
- Daniel Zwillinger and Stephen Kokoska. 2000. CRC standard probability and statistics tables and formulae. Chapman & Hall/CRC, Boca Raton.