Large Language Model Informed Patent Image Retrieval (2404.19360v1)

Published 30 Apr 2024 in cs.CV, cs.CL, and cs.IR

Abstract: In patent prosecution, image-based retrieval systems that identify similarities between current patent images and prior art are pivotal to ensuring the novelty and non-obviousness of patent applications. Despite their growing popularity in recent years, existing approaches, while effective at recognizing images within the same patent, fail to deliver practical value because they generalize poorly when retrieving relevant prior art. Moreover, the task inherently involves three challenges: the abstract visual features of patent images, the skewed distribution of image classifications, and the semantic information carried by image descriptions. We therefore propose a language-informed, distribution-aware multimodal approach to patent image feature learning, which enriches the semantic understanding of patent images by integrating LLMs and improves the performance of underrepresented classes with our proposed distribution-aware contrastive losses. Extensive experiments on the DeepPatent2 dataset show that our proposed method achieves state-of-the-art or comparable performance in image-based patent retrieval, with mAP +53.3%, Recall@10 +41.8%, and MRR@10 +51.9%. Furthermore, through an in-depth user analysis, we explore how our model can aid patent professionals in their image retrieval efforts, highlighting the model's real-world applicability and effectiveness.
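The abstract reports results in terms of mAP, Recall@10, and MRR@10. As a rough illustration of what these retrieval metrics measure, the following is a minimal sketch for a single query, using common textbook definitions. The function names and the toy data are hypothetical, and the paper's exact evaluation protocol is not stated here, so its definitions may differ (some retrieval benchmarks, for instance, count Recall@K as 1 whenever any relevant item appears in the top K). The sketch assumes the ranked list contains no duplicate items.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def reciprocal_rank_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """1 / rank of the first relevant item within the top-k, or 0 if none is found."""
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@rank taken at each rank where a relevant item occurs."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Toy query (hypothetical IDs): two of the three relevant images are retrieved.
ranked_ids = ["a", "x", "b", "y"]
relevant_ids = {"a", "b", "c"}

print(recall_at_k(ranked_ids, relevant_ids))           # 2 of 3 relevant items found
print(reciprocal_rank_at_k(ranked_ids, relevant_ids))  # first hit is at rank 1
print(average_precision(ranked_ids, relevant_ids))     # (1/1 + 2/3) / 3
```

Averaging `average_precision` and `reciprocal_rank_at_k` over all queries would give mAP and MRR@10 under these assumed definitions.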

Authors
  1. Hao-Cheng Lo
  2. Jung-Mei Chu
  3. Jieh Hsiang
  4. Chun-Chieh Cho