Multimodal Search on Iconclass using Vision-Language Pre-Trained Models (2306.16529v1)
Abstract: Terminology sources, such as controlled vocabularies, thesauri and classification systems, play a key role in digitizing cultural heritage. However, Information Retrieval (IR) systems that allow users to query and explore these lexical resources often lack an adequate representation of the semantics behind the user's search, which can be conveyed through multiple expression modalities (e.g., images, keywords or textual descriptions). This paper presents the implementation of a new search engine for one of the most widely used iconography classification systems, Iconclass. The novelty of this system is the use of a pre-trained vision-language model, namely CLIP, to retrieve and explore Iconclass concepts using visual or textual queries.
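The retrieval approach the abstract describes (embedding concept labels and user queries into a shared space and ranking concepts by similarity) can be sketched as follows. This is a minimal illustration, not the paper's implementation: toy random vectors stand in for CLIP embeddings, the Iconclass notations are illustrative examples, and the brute-force cosine search would in practice be replaced by CLIP's text/image encoders and an approximate-nearest-neighbor index such as FAISS.

```python
import numpy as np

def normalize(v):
    # L2-normalize so that dot product equals cosine similarity,
    # matching the way CLIP embeddings are usually compared.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Toy stand-ins for embeddings of Iconclass concept labels.
# In the real system these would come from CLIP's text encoder.
concepts = ["25F23(LION)", "11D", "71A"]  # example Iconclass notations
rng = np.random.default_rng(0)
concept_embs = normalize(rng.normal(size=(len(concepts), 512)))

def search(query_emb, k=2):
    # Rank all concepts by cosine similarity to the query embedding;
    # the query could be either a text or an image embedding.
    sims = normalize(query_emb) @ concept_embs.T
    top = np.argsort(-sims)[:k]
    return [(concepts[i], float(sims[i])) for i in top]

# A query embedding (produced by CLIP's encoders in practice).
query = rng.normal(size=512)
for notation, score in search(query):
    print(notation, round(score, 3))
```

Because CLIP places images and text in the same embedding space, the same `search` function serves both query modalities, which is what enables the multimodal exploration of Iconclass described above.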
- Patricia Harpring. 2010. Development of the Getty Vocabularies: AAT, TGN, ULAN, and CONA. Art Documentation: Journal of the Art Libraries Society of North America 29, 1 (2010), 67–72. https://www.jstor.org/stable/27949541
- Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2017. Billion-scale similarity search with GPUs. https://doi.org/10.48550/arXiv.1702.08734 arXiv:1702.08734 [cs].
- Etienne Posthumus and Harald Sack. 2022. The Art Historian’s Bicycle Becomes an E-Bike. (2022).
- Alec Radford et al. 2021. Learning Transferable Visual Models From Natural Language Supervision. https://doi.org/10.48550/arXiv.2103.00020 arXiv:2103.00020 [cs].
- H. van de Waal. 1968. Decimal index of the art of the Low Countries; D.I.A.L (abridged ed. of the iconclass system ed.). Rijksbureau voor Kunsthistorische Documentatie, The Hague. OCLC: 27696.