
A cooperative strategy for diagnosing the root causes of quality requirement violations in multiagent systems (2404.12226v1)

Published 18 Apr 2024 in cs.SE

Abstract: Many modern software systems are built as a set of autonomous software components (also called agents) that collaborate with each other and are situated in an environment. To keep these multiagent systems operational under abnormal circumstances, it is crucial to make them resilient. Existing solutions are often centralised and rely on information manually provided by experts at design time, making such solutions rigid and limiting the autonomy and adaptability of the system. In this work, we propose a cooperative strategy focused on the identification of the root causes of quality requirement violations in multiagent systems. This strategy allows agents to cooperate with each other in order to identify whether these violations come from service providers, associated components, or the communication infrastructure. From this identification process, agents are able to adapt their behaviour in order to mitigate and solve existing abnormalities with the aim of normalising system operation. This strategy consists of an interaction protocol that, together with the proposed algorithms, allows agents playing the protocol roles to diagnose problems to be repaired. We evaluate our proposal with the implementation of a service-oriented system. The results demonstrate that our solution enables the correct identification of different sources of failures, favouring the selection of the most suitable actions to be taken to overcome abnormal situations.

