Contrastive Language-Image Pre-training for the Italian Language (2108.08688v1)

Published 19 Aug 2021 in cs.CL and cs.CV

Abstract: CLIP (Contrastive Language-Image Pre-training) is a very recent multi-modal model that jointly learns representations of images and texts. The model is trained on a massive amount of English data and shows impressive performance on zero-shot classification tasks. Training the same model on a different language is not trivial, since data in other languages might not be sufficient and the model needs high-quality translations of the texts to guarantee good performance. In this paper, we present the first CLIP model for the Italian language (CLIP-Italian), trained on more than 1.4 million image-text pairs. Results show that CLIP-Italian outperforms the multilingual CLIP model on the tasks of image retrieval and zero-shot classification.
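
The contrastive objective the abstract refers to trains the image and text encoders so that, within a batch of matching image-text pairs, each image scores highest against its own caption and vice versa. Below is a minimal, hypothetical PyTorch sketch of that symmetric contrastive (InfoNCE-style) loss; it is not code from the paper, and the embeddings, function name, and `temperature` default are illustrative stand-ins for actual encoder outputs and hyperparameters.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of matching image-text pairs.

    image_emb, text_emb: (batch, dim) encoder outputs, where row i of each
    tensor is assumed to describe the same example (the positive pair).
    """
    # L2-normalize so dot products become cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix; the diagonal holds the positives.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0))

    # Cross-entropy in both directions: image-to-text and text-to-image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Toy usage with random tensors standing in for encoder outputs.
imgs = torch.randn(8, 512)
txts = torch.randn(8, 512)
print(clip_contrastive_loss(imgs, txts))
```

Zero-shot classification then reuses the same similarity: class labels are written as captions, embedded with the text encoder, and an image is assigned the label whose caption embedding it matches most closely.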

Authors (6)
  1. Federico Bianchi (47 papers)
  2. Giuseppe Attanasio (21 papers)
  3. Raphael Pisoni (3 papers)
  4. Silvia Terragni (8 papers)
  5. Gabriele Sarti (21 papers)
  6. Sri Lakshmi (1 paper)
Citations (29)