K-SHAP: Policy Clustering Algorithm for Anonymous Multi-Agent State-Action Pairs (2302.11996v5)
Abstract: Learning agent behaviors from observational data has been shown to improve our understanding of their decision-making processes, advancing our ability to explain their interactions with the environment and other agents. While multiple learning techniques have been proposed in the literature, one particular setting has not yet been explored: multi-agent systems where agent identities remain anonymous. For instance, in financial markets, labeled data that identifies market participant strategies is typically proprietary, and only the anonymous state-action pairs that result from the interaction of multiple market participants are publicly available. As a result, sequences of agent actions are not observable, restricting the applicability of existing work. In this paper, we propose a Policy Clustering algorithm, called K-SHAP, that learns to group anonymous state-action pairs according to the agent policies. We frame the problem as an Imitation Learning (IL) task, and we learn a world-policy able to mimic all the agent behaviors across different environmental states. We leverage the world-policy to explain each anonymous observation through an additive feature attribution method called SHAP (SHapley Additive exPlanations). Finally, by clustering the explanations, we show that we are able to identify different agent policies and group observations accordingly. We evaluate our approach on simulated market data and a real-world financial dataset. We show that our proposal significantly and consistently outperforms the existing methods, identifying different agent strategies.
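The pipeline outlined in the abstract (world-policy, per-observation attribution, clustering of the attributions) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The `world_policy` function is a hand-coded, hypothetical stand-in for a learned model, so the imitation-learning step is omitted. The attributions are exact Shapley values computed against an assumed all-zero baseline rather than with the SHAP library, and the clustering is a plain k-means with an assumed farthest-point seeding. The two synthetic "agents" and their state ranges are invented for the example.

```python
import itertools
import math
import random


def world_policy(state):
    """Hypothetical stand-in for a learned world-policy (state -> action score)."""
    x0, x1 = state
    # Follows feature 0 when it dominates, otherwise acts against feature 1.
    return 2.0 * x0 if abs(x0) > abs(x1) else -2.0 * x1


def shapley_values(f, state, baseline):
    """Exact Shapley attributions of f(state) relative to f(baseline)."""
    n = len(state)

    def value(coalition):
        # Features outside the coalition are replaced by their baseline value.
        mixed = [state[i] if i in coalition else baseline[i] for i in range(n)]
        return f(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm with deterministic farthest-point seeding."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


rng = random.Random(0)
# Anonymous states visited by two invented strategies. The first 20 rows come
# from one strategy and the last 20 from the other; this ordering is used only
# to check the clustering afterwards, never by the pipeline itself.
states = ([[rng.uniform(1.0, 2.0), rng.uniform(-0.2, 0.2)] for _ in range(20)]
          + [[rng.uniform(-0.2, 0.2), rng.uniform(1.0, 2.0)] for _ in range(20)])

baseline = [0.0, 0.0]  # assumed reference state for the attributions
explanations = [shapley_values(world_policy, s, baseline) for s in states]
clusters = kmeans(explanations, k=2)
```

Each row of `explanations` sums to the world-policy output minus its output at the baseline, which is the additivity property that makes Shapley attributions comparable across observations. In a realistic setting the stand-in function would be replaced by a model fitted to the anonymous state-action pairs, and the brute-force enumeration by an efficient SHAP estimator, since the exact computation grows exponentially with the number of features.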