No-Regret Learning in Bilateral Trade via Global Budget Balance (2310.12370v2)

Published 18 Oct 2023 in cs.GT and cs.LG

Abstract: Bilateral trade models the problem of intermediating between two rational agents -- a seller and a buyer -- both characterized by a private valuation for an item they want to trade. We study the online learning version of the problem, in which at each time step a new seller and buyer arrive and the learner has to set prices for them without any knowledge about their (adversarially generated) valuations. In this setting, known impossibility results rule out the existence of no-regret algorithms when budget balance has to be enforced at each time step. In this paper, we introduce the notion of \emph{global budget balance}, which only requires the learner to fulfill budget balance over the entire time horizon. Under this natural relaxation, we provide the first no-regret algorithms for adversarial bilateral trade under various feedback models. First, we show that in the full-feedback model, the learner can guarantee $\tilde O(\sqrt{T})$ regret against the best fixed prices in hindsight, and that this bound is optimal up to poly-logarithmic terms. Second, we provide a learning algorithm guaranteeing a $\tilde O(T^{3/4})$ regret upper bound with one-bit feedback, which we complement with an $\Omega(T^{5/7})$ lower bound that holds even in the two-bit feedback model. Finally, we introduce and analyze an alternative benchmark that is provably stronger than the best fixed prices in hindsight and is inspired by the literature on bandits with knapsacks.
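The abstract contrasts per-round budget balance (the buyer's price must cover the seller's price at every step) with the relaxed global budget balance constraint (only the cumulative surplus over the horizon must stay nonnegative). The following Python sketch is a toy illustration of that distinction, not the paper's algorithm: prices here are posted at random, valuations are sampled rather than adversarial, and the solvency check is a hypothetical safeguard that simply refuses price pairs the running budget could not cover.

```python
# Toy illustration (not the paper's algorithm) of global vs. per-round
# budget balance in repeated bilateral trade.
import random

random.seed(0)
T = 10_000
budget = 0.0           # cumulative intermediary surplus: buyer payments minus seller payments
gain_from_trade = 0.0  # cumulative welfare generated by completed trades

for t in range(T):
    s = random.random()  # seller's private valuation (sampled here; adversarial in the paper)
    b = random.random()  # buyer's private valuation

    # The learner posts a price q to the seller and a price p to the buyer.
    # Per-round budget balance would force p >= q at every step; under global
    # budget balance the learner may run a deficit (q > p) on some rounds,
    # e.g. to explore, as long as the running total stays nonnegative.
    q = random.random()
    p = random.random()
    if budget + (p - q) < 0:
        # Illustrative safeguard: skip price pairs the current budget cannot cover.
        continue

    # A trade happens only if both agents accept their posted prices.
    if s <= q and b >= p:
        budget += p - q
        gain_from_trade += b - s

print(f"final budget: {budget:.2f}, total gain from trade: {gain_from_trade:.2f}")
```

Under these assumptions the final budget is nonnegative by construction, while individual rounds may have subsidized trades (p < q); that is the extra freedom the global budget balance relaxation buys the learner.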

Authors (4)
  1. Martino Bernasconi (19 papers)
  2. Matteo Castiglioni (60 papers)
  3. Andrea Celli (39 papers)
  4. Federico Fusco (29 papers)
Citations (9)
