UAlberta at SemEval-2023 Task 1: Context Augmentation and Translation for Multilingual Visual Word Sense Disambiguation (2306.14067v1)

Published 24 Jun 2023 in cs.CL

Abstract: We describe the systems of the University of Alberta team for the SemEval-2023 Visual Word Sense Disambiguation (V-WSD) Task. We present a novel algorithm that leverages glosses retrieved from BabelNet, in combination with text and image encoders. Furthermore, we compare language-specific encoders against the application of English encoders to translated texts. As the contexts given in the task datasets are extremely short, we also experiment with augmenting these contexts with descriptions generated by an LLM. This yields substantial improvements in accuracy. We describe and evaluate additional V-WSD methods which use image generation and text-conditioned image segmentation. Overall, the results of our official submission rank us 18th out of 56 teams. Some of our unofficial results are even better than the official ones. Our code is publicly available at https://github.com/UAlberta-NLP/v-wsd.
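The abstract describes scoring candidate images against both the short context and sense glosses retrieved from BabelNet, using text and image encoders. A minimal sketch of that idea, with toy vectors standing in for real encoder outputs (the `alpha` blending weight, the best-gloss selection rule, and the function names are illustrative assumptions, not the paper's exact algorithm):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_images(context_emb, gloss_embs, image_embs, alpha=0.5):
    """Rank candidate images for a context, aided by sense glosses."""
    # Pick the gloss most similar to the context (a proxy for the intended sense).
    best_gloss = max(gloss_embs, key=lambda g: cosine(context_emb, g))
    # Blend direct context-image similarity with gloss-image similarity.
    scores = [alpha * cosine(context_emb, img) + (1 - alpha) * cosine(best_gloss, img)
              for img in image_embs]
    # Return candidate indices, best first.
    return sorted(range(len(image_embs)), key=lambda i: -scores[i])

# Toy 3-d embeddings; in practice these would come from CLIP-style encoders.
context = np.array([1.0, 0.2, 0.0])
glosses = [np.array([0.9, 0.1, 0.0]), np.array([0.0, 0.1, 1.0])]
images = [np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.1])]
print(rank_images(context, glosses, images))  # [1, 0]: second image matches the sense
```

In this toy setup the first gloss is closest to the context, so the second image, which aligns with both the context and that gloss, is ranked first.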
