
Shotit: compute-efficient image-to-video search engine for the cloud (2404.12169v1)

Published 18 Apr 2024 in cs.MM and cs.IR

Abstract: With the rapid growth of information technology, users are exposed to a massive amount of online data, including images, music, and video. This has created a strong need for effective search services for each of these media. Most such services are keyword-based: users enter keywords to find related images, music, or videos. There are also image-to-image search services that let users find similar images from a single input image. Because videos are essentially sequences of image frames, similar videos can likewise be found from a single input image or screenshot. This paper targets that scenario and provides an efficient method and implementation. We present Shotit, a cloud-native image-to-video search engine that addresses this scenario in a compute-efficient way. One main limitation in this scenario is the scale of the dataset. A typical image-to-image search engine handles only one-to-one relationships: one image corresponds to a single other image. Image-to-video search, by contrast, is one-to-many. A 24-minute video, for example, generates roughly 20,000 image frames, so the dataset grows rapidly as the number of videos increases. A compute-efficient approach is therefore needed, and the system design should follow the cloud-native trend. By choosing an emerging technology, the vector database, as its backbone, Shotit meets both requirements. Experiments on two datasets, a 50-thousand-scale Blender Open Movie dataset and a 50-million-scale proprietary TV genre dataset, run on a 4-core, 32 GB RAM Intel Xeon Gold 6271C cloud machine with object storage, demonstrate the effectiveness of Shotit. A demo on the Blender Open Movie dataset is illustrated in this paper.
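
The scale argument and the search scenario in the abstract can be illustrated with a minimal Python sketch. It is not Shotit's implementation: the abstract does not state a frame rate or a feature extractor, so the function names, the toy three-dimensional vectors, and the brute-force cosine-similarity scan (standing in for a vector database query) are all assumptions made for illustration.

```python
import math


def frames_in_video(minutes, frames_per_second):
    """Number of frames produced by a video of the given length."""
    return round(minutes * 60 * frames_per_second)


def implied_fps(total_frames, minutes):
    """Frame rate implied by a frame count and a video length."""
    return total_frames / (minutes * 60)


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(index, query, top_k=1):
    """Brute-force scan: score every frame vector, return the best matches.

    `index` is a list of (video_id, timestamp_seconds, vector) tuples.
    A vector database would replace this linear scan with an
    approximate nearest-neighbour lookup.
    """
    scored = [
        (cosine_similarity(vector, query), video_id, timestamp)
        for video_id, timestamp, vector in index
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Hypothetical toy index: one feature vector per frame, two videos.
index = [
    ("video_a", 0.0, [1.0, 0.0, 0.0]),
    ("video_a", 1.0, [0.9, 0.1, 0.0]),
    ("video_b", 0.0, [0.0, 1.0, 0.0]),
    ("video_b", 1.0, [0.0, 0.8, 0.6]),
]

# Hypothetical feature vector of the query screenshot.
query = [0.0, 0.7, 0.7]

best_score, best_video, best_timestamp = search(index, query)[0]
print(best_video, best_timestamp)

# Scale check: a 24-minute video at an assumed 24 frames per second.
print(frames_in_video(24, 24))
# Frame rate implied by the abstract's "roughly 20,000 frames" figure.
print(round(implied_fps(20000, 24), 1))
```

Each indexed frame carries its video identifier and timestamp, so a single matching frame is enough to locate both the video and the position within it. The linear scan costs one similarity computation per stored frame, which is why a corpus with tens of millions of frames motivates an indexed vector store instead.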

