Automatic Outlier Rectification via Optimal Transport (2403.14067v2)

Published 21 Mar 2024 in stat.ML, cs.LG, math.OC, and stat.ME

Abstract: In this paper, we propose a novel conceptual framework for detecting outliers using optimal transport with a concave cost function. Conventional outlier detection approaches typically use a two-stage procedure: first, outliers are detected and removed, and then estimation is performed on the cleaned data. However, this approach does not use the estimation task to inform outlier removal, which leaves room for improvement. To address this limitation, we propose an automatic outlier rectification mechanism that integrates rectification and estimation within a joint optimization framework. We take the first step toward using the optimal transport distance with a concave cost function to construct a rectification set in the space of probability distributions. We then select the best distribution within the rectification set to perform the estimation task. Notably, the concave cost function introduced in this paper is the key to enabling our estimator to identify outliers effectively during the optimization process. We demonstrate the advantages of our approach over conventional approaches in simulations and empirical analyses for mean estimation, least absolute deviation regression, and the fitting of option implied volatility surfaces.
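
The abstract does not spell out the estimator. Under one natural reading, it solves min over theta of min over Q of E_Q[loss(theta, Z)], where Q ranges over distributions within optimal-transport distance delta of the empirical data. The transport cost is concave, c(z, z') = |z - z'|^r with 0 < r < 1. The code below is a minimal sketch of that idea for one-dimensional location estimation with the absolute loss |z - theta|.

Everything specific in it is an assumption made for illustration, not the authors' algorithm. This covers the helper names `rectified_loss` and `rectified_mean`, the choice r = 0.5, the greedy inner solver and the grid search over theta. The sketch is meant to show why concavity matters. Moving a point a distance d costs d^r but removes d of loss, so the saving per unit of budget grows with d. The budget is therefore best spent relocating the far-away points entirely.

```python
import numpy as np


def rectified_loss(theta, z, delta, r=0.5):
    """Greedy approximation of min over Q in the OT ball of E_Q|Z - theta|.

    Assumptions of this sketch: each sample z_i carries mass 1/n, the
    transport cost is the concave c(z, z') = |z - z'|**r with 0 < r < 1,
    and mass is only moved toward theta (where it reduces the loss).
    """
    resid = np.abs(z - theta)
    n = len(z)
    budget = delta
    loss = resid.sum() / n
    # Concave cost: moving a point a distance d costs d**r / n but saves d / n,
    # so the saving per unit of budget grows with d. Spend the budget on the
    # largest residuals first (a greedy heuristic, not an exact solver).
    for i in np.argsort(-resid):
        cost = resid[i] ** r / n
        if cost <= budget:
            budget -= cost                 # relocate sample i all the way to theta
            loss -= resid[i] / n
        else:
            d = (budget * n) ** (1.0 / r)  # partial move with the leftover budget
            loss -= d / n
            break
    return loss


def rectified_mean(z, delta, r=0.5, grid_size=1001):
    """Outer minimization over theta by a simple grid search (illustrative)."""
    z = np.asarray(z, dtype=float)
    grid = np.linspace(z.min(), z.max(), grid_size)
    losses = [rectified_loss(t, z, delta, r) for t in grid]
    return grid[int(np.argmin(losses))]


# Usage: 95 standard-normal inliers plus 5 gross outliers at 50.
rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(0.0, 1.0, size=95), np.full(5, 50.0)])
print(np.mean(z))                           # pulled upward by the outliers
print(rectified_mean(z, delta=0.5, r=0.5))  # stays close to the inlier center
```

In this sketch, setting delta = 0 recovers the plain least-absolute-deviation location estimate (the median, up to grid resolution). With the linear cost r = 1, the budget lowers the loss one-for-one no matter which points move, so the estimate again coincides with the median. Only the concave choice r < 1 makes the procedure single out the distant points.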
