
Multi-Modal Financial Time-Series Retrieval Through Latent Space Projections (2309.16741v2)

Published 28 Sep 2023 in cs.LG, cs.AI, and cs.HC

Abstract: Financial firms commonly process and store billions of time series, generated continuously and at high frequency. To support efficient data storage and retrieval, specialized time-series databases and systems have emerged. These databases support indexing and querying of time series through a constrained, Structured Query Language (SQL)-like format, enabling queries such as "Stocks with monthly price returns greater than 5%" that must be expressed in rigid formats. However, such queries do not capture the intrinsic complexity of high-dimensional time-series data, which can often be better described by images or language (e.g., "A stock in low volatility regime"). Moreover, the storage, computational time, and retrieval complexity required to search in the time-series space are often non-trivial. In this paper, we propose and demonstrate a framework to store multi-modal data for financial time series in a lower-dimensional latent space using deep encoders, such that the latent-space projections capture not only the time-series trends but also other desirable information or properties of the financial time-series data (such as price volatility). Moreover, our approach allows user-friendly query modalities, namely natural language text or sketches of time series, for which we have developed intuitive interfaces. We demonstrate the advantages of our method in terms of computational efficiency and accuracy on real historical data as well as synthetic data, and highlight the utility of latent-space projections in the storage and retrieval of financial time-series data with intuitive query modalities.
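
The retrieval idea in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the abstract describes learned deep encoders, whereas the `encode` function below is a hypothetical hand-written stand-in that maps a price series to a two-dimensional vector (mean log return, volatility of log returns). The names `encode`, `build_index`, `retrieve`, and the toy series in `store` are assumptions made for illustration only. The point is that once every series is stored as a short latent vector, a query (here a sketched series, or a point describing a "low volatility regime") is answered by nearest-neighbour search in that small space instead of in the raw time-series space.

```python
import math
import statistics


def encode(prices):
    """Hypothetical stand-in for a learned deep encoder.

    Maps a price series to a 2-D latent vector:
    (mean log return, population std. dev. of log returns).
    """
    returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return (statistics.fmean(returns), statistics.pstdev(returns))


def build_index(series_by_name):
    """Store only the latent vector for each named series."""
    return {name: encode(prices) for name, prices in series_by_name.items()}


def retrieve(index, query_vec, k=1):
    """Return the k series names closest to query_vec (Euclidean distance)."""
    ranked = sorted(index, key=lambda name: math.dist(index[name], query_vec))
    return ranked[:k]


# Toy data (assumed for illustration, not from the paper).
store = {
    "calm_up": [100 * 1.001 ** t for t in range(30)],    # steady upward drift
    "calm_flat": [100.0] * 30,                           # no movement at all
    "choppy": [100 * (1.05 if t % 2 else 1.0) for t in range(30)],  # oscillating
}
index = build_index(store)

# Query by "sketch": a drawn series with a similar shape at a different price level.
sketch = [50 * 1.001 ** t for t in range(30)]
result = retrieve(index, encode(sketch), k=1)

# Query by property: a latent point meaning "zero drift, zero volatility".
low_vol = retrieve(index, (0.0, 0.0), k=2)

print(result, low_vol)
```

Because the stand-in encoder works on log returns, the sketch drawn at a price level of 50 still matches the stored series that starts at 100. A learned encoder would have to acquire that kind of invariance from training data rather than have it built in.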
