Architecting Data-Intensive Applications: From Data Architecture Design to Its Quality Assurance (2401.12011v3)
Abstract:
Context - The exponential growth of data is becoming a significant concern. Managing this data has become very challenging, especially when it comes from various sources in different formats and at different speeds. Moreover, ensuring data quality has become increasingly crucial for effective decision-making and operational processes. Data architecture is crucial in describing how data is collected, stored, processed, and analyzed to meet business needs. Providing an abstract view of data-intensive applications is essential to ensure that the data is transformed into valuable information. These challenges must be addressed so that data can be managed effectively and used to advantage.
Objective - To establish an architecture framework that enables a comprehensive description of the data architecture and streamlines data quality monitoring.
Method - The architecture framework uses Model-Driven Engineering (MDE) techniques. Its support for data-intensive architecture descriptions enables the automated generation of data quality checks.
Result - The framework offers a comprehensive solution that lets data-intensive applications model their architecture efficiently and monitor the quality of their data. It automates the entire process and ensures precision and consistency in the data. With the Data Architecture Modeling Tool (DAT), architects and analysts gain a tool that simplifies their workflow and helps them make informed decisions based on reliable data insights.
Conclusion - We have evaluated DAT on more than five cases across various industry domains, demonstrating its adaptability and effectiveness.