Dynamic Knowledge Injection for AIXI Agents (2312.16184v1)
Abstract: Prior approximations of AIXI, a Bayesian optimality notion for general reinforcement learning, can only approximate AIXI's Bayesian environment model using an a priori defined set of models. This is a fundamental source of epistemic uncertainty for the agent in settings where systematic bias in the predefined model class cannot be resolved simply by collecting more data from the environment. We address this issue in the context of human-AI teaming by considering a setup where additional knowledge for the agent, in the form of new candidate models, arrives from a human operator in an online fashion. We introduce a new agent, DynamicHedgeAIXI, that maintains an exact Bayesian mixture over a dynamically changing set of models via a time-adaptive prior constructed from a variant of the Hedge algorithm. DynamicHedgeAIXI is the richest direct approximation of AIXI known to date and comes with good performance guarantees. Experimental results on epidemic control on contact networks validate the agent's practical utility.
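The core mechanism described above can be sketched as a Hedge-style multiplicative-weights mixture over a set of models that is allowed to grow online. This is a minimal illustrative sketch only: the class name, the log-loss update, and the uniform-share rule for newly arriving models are assumptions for exposition, not the paper's exact time-adaptive prior.

```python
import math

class DynamicHedgeMixture:
    """Illustrative Hedge-style mixture over a growing set of models.

    Each model keeps an unnormalised weight; normalising the weights
    yields the mixture posterior used for prediction.
    """

    def __init__(self, learning_rate=1.0):
        self.eta = learning_rate   # Hedge learning rate
        self.weights = {}          # model name -> unnormalised weight

    def add_model(self, name, share=0.5):
        # A model arriving online takes a fraction `share` of the current
        # total weight (one common growing-expert convention); the first
        # model starts with weight 1.
        total = sum(self.weights.values())
        self.weights[name] = share * total if total > 0 else 1.0

    def update(self, losses):
        # Multiplicative Hedge update: w_i <- w_i * exp(-eta * loss_i),
        # where loss_i is e.g. the model's log-loss on the latest percept.
        for name, loss in losses.items():
            self.weights[name] *= math.exp(-self.eta * loss)

    def posterior(self):
        # Normalised mixture weights over the current model set.
        total = sum(self.weights.values())
        return {n: w / total for n, w in self.weights.items()}
```

For example, after adding two models and observing one round of losses, the posterior shifts toward the model with the smaller loss, while new models can still enter the mixture at any later step.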