Risk-averse Learning with Non-Stationary Distributions (2404.02988v1)
Abstract: Considering non-stationary environments in online optimization enables the decision-maker to adapt effectively to changes and improve its performance over time. In such cases, it is preferable to adopt a strategy that minimizes the negative impact of change and avoids potentially risky situations. In this paper, we investigate risk-averse online optimization where the distribution of the random cost changes over time. We minimize a risk-averse objective function using the Conditional Value at Risk (CVaR) as the risk measure. Because the exact CVaR gradient is difficult to obtain, we employ a zeroth-order optimization approach that queries the cost function multiple times at each iteration and estimates the CVaR gradient from the sampled values. To facilitate the regret analysis, we use a variation metric based on the Wasserstein distance to capture the time-varying distributions. Given that the distribution variation is sub-linear in the total number of episodes, we show that the designed learning algorithm achieves sub-linear dynamic regret with high probability for both convex and strongly convex functions. Moreover, the theoretical results suggest that increasing the number of samples reduces the dynamic regret bound until the number of samples reaches a specific limit. Finally, we provide numerical experiments on dynamic pricing in a parking lot to illustrate the efficacy of the designed algorithm.
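
To make the sampling-based CVaR gradient estimate concrete, below is a minimal sketch of one update step, assuming the empirical Rockafellar–Uryasev plug-in estimate of CVaR and a one-point sphere-smoothing gradient estimator in the style of Flaxman et al.; the function names (`empirical_cvar`, `zeroth_order_cvar_step`) and parameters (`alpha`, `delta`, `eta`, `n_samples`) are illustrative and not taken from the paper.

```python
import numpy as np

def empirical_cvar(costs, alpha):
    """Rockafellar-Uryasev plug-in estimate of CVaR_alpha under the empirical
    cost distribution (roughly, the average of the worst alpha-fraction of costs)."""
    costs = np.sort(np.asarray(costs, dtype=float))[::-1]   # descending order
    k = max(1, int(np.ceil(alpha * len(costs))))             # size of the alpha-tail
    var = costs[k - 1]                                        # empirical VaR_alpha
    return var + np.mean(np.maximum(costs - var, 0.0)) / alpha

def zeroth_order_cvar_step(x, sample_cost, alpha, n_samples, delta, eta, rng):
    """One gradient-descent-style update using a one-point, sphere-smoothed
    zeroth-order estimate of the CVaR gradient (illustrative sketch)."""
    d = x.shape[0]
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)                                    # uniform direction on the unit sphere
    x_perturbed = x + delta * u
    # Query the random cost n_samples times at the perturbed decision.
    costs = [sample_cost(x_perturbed) for _ in range(n_samples)]
    # One-point estimate of the gradient of the delta-smoothed CVaR objective.
    g = (d / delta) * empirical_cvar(costs, alpha) * u
    return x - eta * g                                        # project onto the feasible set if needed

# Example usage with a hypothetical noisy quadratic cost:
# rng = np.random.default_rng(0)
# x = np.zeros(2)
# cost = lambda x: float(x @ x + rng.normal())
# x = zeroth_order_cvar_step(x, cost, alpha=0.1, n_samples=50, delta=0.1, eta=0.01, rng=rng)
```

In a non-stationary setting one would additionally project the iterate back onto the feasible set and tune `delta` and `eta` (and the number of samples) against the Wasserstein variation budget; those choices are omitted here for brevity.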