ProbSAINT: Probabilistic Tabular Regression for Used Car Pricing (2403.03812v1)

Published 6 Mar 2024 in cs.LG and cs.AI

Abstract: Used car pricing is a critical aspect of the automotive industry, influenced by many economic factors and market dynamics. With the recent surge in online marketplaces and increased demand for used cars, accurate pricing would benefit both buyers and sellers by ensuring fair transactions. However, moving to automated pricing algorithms based on machine learning requires an understanding of model uncertainty, specifically the ability to flag predictions the model is unsure about. Although recent literature proposes boosting algorithms or nearest-neighbor approaches for fast and precise price predictions, capturing model uncertainty with such algorithms is a difficult challenge. We introduce ProbSAINT, a model that offers a principled approach to quantifying the uncertainty of its price predictions, along with accurate point predictions comparable to those of state-of-the-art boosting techniques. Furthermore, because the business prefers to price used cars based on the number of days a vehicle is listed for sale, we show how ProbSAINT can be used as a dynamic forecasting model that predicts price probabilities for different expected offer durations. Our experiments further indicate that ProbSAINT is especially accurate on instances where it is highly certain. This demonstrates the applicability of its probabilistic predictions in real-world scenarios where trustworthiness is crucial.
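The idea of flagging predictions the model is unsure about, and then checking whether confident predictions are more accurate, can be illustrated with a minimal sketch. It assumes, for illustration only, that each listing receives a Gaussian predictive distribution over its price (a mean and a standard deviation); this is not necessarily the distribution ProbSAINT uses. The function names (`price_interval`, `flag_uncertain`, `mae_confident_vs_flagged`), the relative-spread threshold `max_rel_std`, and the toy prices are hypothetical.

```python
from statistics import NormalDist
from typing import List, Tuple

# Hypothetical setup: each listing has a Gaussian predictive distribution
# N(mean, std**2) over its price. This is an illustrative assumption, not
# the output distribution specified by ProbSAINT.


def price_interval(mean: float, std: float, coverage: float = 0.9) -> Tuple[float, float]:
    """Central prediction interval containing `coverage` of the probability mass."""
    dist = NormalDist(mu=mean, sigma=std)
    tail = (1.0 - coverage) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


def flag_uncertain(means: List[float], stds: List[float], max_rel_std: float = 0.1) -> List[bool]:
    """Flag predictions whose relative spread std/mean exceeds a hypothetical threshold."""
    return [std / mean > max_rel_std for mean, std in zip(means, stds)]


def mae(y_true: List[float], y_pred: List[float]) -> float:
    """Mean absolute error between true prices and point predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mae_confident_vs_flagged(
    y_true: List[float], means: List[float], stds: List[float], max_rel_std: float = 0.1
) -> Tuple[float, float]:
    """Return (MAE on confident predictions, MAE on flagged predictions)."""
    flags = flag_uncertain(means, stds, max_rel_std)
    confident = [(t, m) for t, m, f in zip(y_true, means, flags) if not f]
    flagged = [(t, m) for t, m, f in zip(y_true, means, flags) if f]

    def _mae(pairs: List[Tuple[float, float]]) -> float:
        # NaN signals an empty group rather than a misleading zero error.
        if not pairs:
            return float("nan")
        return mae([t for t, _ in pairs], [m for _, m in pairs])

    return _mae(confident), _mae(flagged)


if __name__ == "__main__":
    # Made-up toy listings, not data from the paper.
    y_true = [20000, 15000, 30000, 12000]
    means = [20500, 14800, 26000, 13500]
    stds = [800, 600, 5000, 2400]
    print(flag_uncertain(means, stds))                  # [False, False, True, True]
    print(mae_confident_vs_flagged(y_true, means, stds))  # (350.0, 2750.0)
    print(price_interval(20000, 1000, coverage=0.9))
```

The threshold is applied to the relative spread std/mean rather than the raw standard deviation, since a given absolute uncertainty means much more for a cheap car than for an expensive one. In the toy data, the two flagged listings carry most of the error, which is the kind of pattern the abstract reports for highly certain versus uncertain instances.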
