
Stochastic Online Instrumental Variable Regression: Regrets for Endogeneity and Bandit Feedback (2302.09357v3)

Published 18 Feb 2023 in cs.LG and stat.ML

Abstract: Endogeneity, i.e. the dependence between noise and covariates, is a common phenomenon in real data due to omitted variables, strategic behaviours, measurement errors, etc. In contrast, the existing analyses of stochastic online linear regression with unbounded noise and of linear bandits depend heavily on exogeneity, i.e. the independence of noise and covariates. Motivated by this gap, we study over- and just-identified Instrumental Variable (IV) regression, specifically Two-Stage Least Squares, for stochastic online learning, and propose an online variant of Two-Stage Least Squares, namely O2SLS. We show that O2SLS achieves $\mathcal{O}(d_{x}d_{z}\log^{2} T)$ identification and $\widetilde{\mathcal{O}}(\gamma \sqrt{d_{z} T})$ oracle regret after $T$ interactions, where $d_{x}$ and $d_{z}$ are the dimensions of the covariates and IVs, and $\gamma$ is the bias due to endogeneity. For $\gamma=0$, i.e. under exogeneity, O2SLS exhibits $\mathcal{O}(d_{x}^{2} \log^{2} T)$ oracle regret, which is of the same order as that of stochastic online ridge regression. Then, we leverage O2SLS as an oracle to design OFUL-IV, a stochastic linear bandit algorithm that tackles endogeneity. OFUL-IV yields $\widetilde{\mathcal{O}}(\sqrt{d_{x}d_{z}T})$ regret, which matches the regret lower bound under exogeneity. On different datasets with endogeneity, we experimentally demonstrate the efficiency of O2SLS and OFUL-IV.
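The online Two-Stage Least Squares idea in the abstract can be illustrated with a minimal numerical sketch. This is not the paper's O2SLS algorithm or its analysis, only an assumed recursive implementation of the two stages: each round, a first online ridge regression maps the instrument z to the endogenous covariate x, and a second online ridge regression maps the instrumented prediction x̂ to the outcome y. All dimensions, noise levels, and the regularisation constant `lam` are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_z, d_x, T = 3, 3, 5000
lam = 1.0  # ridge regularisation for both stages (illustrative choice)

# Ground-truth structural parameter and first-stage mixing (illustrative values)
beta = np.array([1.0, -2.0, 0.5])
Theta = rng.normal(size=(d_z, d_x))

# Sufficient statistics for the two online ridge regressions
V1 = lam * np.eye(d_z); S1 = np.zeros((d_z, d_x))  # stage 1: x ~ z
V2 = lam * np.eye(d_x); S2 = np.zeros(d_x)         # stage 2: y ~ x_hat

for t in range(T):
    z = rng.normal(size=d_z)                 # instrument, independent of u
    u = rng.normal()                         # confounder shared by x and y
    x = Theta.T @ z + 0.5 * u + 0.1 * rng.normal(size=d_x)  # endogenous covariate
    y = beta @ x + u + 0.1 * rng.normal()                   # outcome

    # Stage 1: update statistics, then predict the exogenous part of x
    V1 += np.outer(z, z)
    S1 += np.outer(z, x)
    Theta_hat = np.linalg.solve(V1, S1)
    x_hat = Theta_hat.T @ z

    # Stage 2: regress y on the instrumented covariate x_hat
    V2 += np.outer(x_hat, x_hat)
    S2 += x_hat * y

beta_hat = np.linalg.solve(V2, S2)
print(beta_hat)  # close to beta; a plain online ridge on (x, y) would be biased
```

Because x and y share the confounder u, ordinary online least squares on (x, y) converges to a biased parameter; regressing on x̂, which depends only on z, removes that bias. The paper's regret bounds concern this kind of estimator, but the update schedule above is only a plausible sketch.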

