Large Language Model Informed Patent Image Retrieval (2404.19360v1)
Abstract: In patent prosecution, image-based retrieval systems that identify similarities between current patent images and prior art are pivotal to ensuring the novelty and non-obviousness of patent applications. Although such systems have grown popular in recent years, existing approaches, while effective at recognizing images within the same patent, fail to deliver practical value because they generalize poorly when retrieving relevant prior art. Moreover, the task inherently involves challenges posed by the abstract visual features of patent images, the skewed distribution of image classifications, and the semantic information of image descriptions. We therefore propose a language-informed, distribution-aware multimodal approach to patent image feature learning, which enriches the semantic understanding of patent images by integrating large language models (LLMs) and improves performance on underrepresented classes with our proposed distribution-aware contrastive losses. Extensive experiments on the DeepPatent2 dataset show that our proposed method achieves state-of-the-art or comparable performance in image-based patent retrieval, with gains of +53.3% in mAP, +41.8% in Recall@10, and +51.9% in MRR@10. Furthermore, through an in-depth user analysis, we explore how our model can aid patent professionals in their image retrieval efforts, highlighting the model's real-world applicability and effectiveness.
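For readers unfamiliar with the three metrics reported in the abstract, the following is a minimal Python sketch of how mAP, Recall@10, and MRR@10 are commonly computed for a ranked retrieval task. It is not the authors' evaluation code. The function names, the input format (a ranked list of candidate IDs plus a set of relevant IDs per query), and the choice to define recall as the fraction of relevant items found in the top k are all assumptions made for illustration; the paper may define or truncate these metrics differently.

```python
from typing import Dict, List, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant items that appear in the top-k results (assumed definition)."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def reciprocal_rank_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """1 / rank of the first relevant item within the top k, or 0 if none is found."""
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision values taken at each rank where a relevant item occurs."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def evaluate(runs: List[Tuple[Sequence[str], Set[str]]], k: int = 10) -> Dict[str, float]:
    """Average the per-query metrics over all queries (hypothetical helper)."""
    n = len(runs)
    return {
        "mAP": sum(average_precision(r, rel) for r, rel in runs) / n,
        f"Recall@{k}": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
        f"MRR@{k}": sum(reciprocal_rank_at_k(r, rel, k) for r, rel in runs) / n,
    }


if __name__ == "__main__":
    # Two toy queries with made-up IDs; these are not data from the paper.
    toy_runs = [
        (["a", "b", "c", "d"], {"b", "d"}),
        (["x", "y"], {"x"}),
    ]
    print(evaluate(toy_runs))
```

Each query contributes one score per metric, and the reported figure is the mean over queries. Note that some works instead define Recall@k as the share of queries with at least one relevant hit in the top k, so the definition should be checked against the paper before comparing numbers.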
Authors: Hao-Cheng Lo, Jung-Mei Chu, Jieh Hsiang, Chun-Chieh Cho