The Geometry of Multilingual Language Models: An Equality Lens (2305.07839v1)

Published 13 May 2023 in cs.CL

Abstract: Understanding the representations of different languages in multilingual LLMs is essential for comprehending their cross-lingual properties, predicting their performance on downstream tasks, and identifying any biases across languages. In our study, we analyze the geometry of three multilingual LLMs in Euclidean space and find that each language is represented by a unique geometry. Using a geometric separability index, we find that although languages tend to lie closer to others in the same linguistic family, they are almost separable from languages of other families. We also introduce a Cross-Lingual Similarity Index to measure the distance between languages in the semantic space. Our findings indicate that low-resource languages are not represented as well as high-resource languages in any of the models.
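
To make the idea of a geometric separability index concrete, here is a minimal sketch of one common nearest-neighbour formulation: the fraction of points whose closest neighbour (by Euclidean distance) carries the same label. The abstract does not give the paper's exact definition, so this formulation is an assumption and may differ from the one the authors use. The function name `separability_index` and the toy vectors and language labels are hypothetical, chosen only for illustration.

```python
import math

def separability_index(embeddings, labels):
    """Fraction of points whose nearest neighbour shares their label.

    Assumed formulation (not confirmed by the abstract): 1.0 means every
    point's closest neighbour has the same label, 0.0 means none does.
    """
    matches = 0
    for i, point in enumerate(embeddings):
        # Find the closest other point by Euclidean distance.
        nearest = min(
            (j for j in range(len(embeddings)) if j != i),
            key=lambda j: math.dist(point, embeddings[j]),
        )
        if labels[nearest] == labels[i]:
            matches += 1
    return matches / len(embeddings)

# Hypothetical toy data: two well-separated clusters, one per language.
separated = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.2, 5.0)]
separated_labels = ["en", "en", "hi", "hi"]
gsi_separated = separability_index(separated, separated_labels)

# Hypothetical toy data: labels alternate along a line, so every
# nearest neighbour belongs to the other language.
interleaved = [(0.0,), (1.0,), (2.5,), (4.5,)]
interleaved_labels = ["en", "hi", "en", "hi"]
gsi_interleaved = separability_index(interleaved, interleaved_labels)
```

Under this assumed definition, a value near 1 for a set of sentence embeddings labelled by language would correspond to the "almost separable" behaviour the abstract describes, while values near 0 would indicate heavily interleaved languages.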

Authors (3)
  1. Cheril Shah (2 papers)
  2. Yashashree Chandak (1 paper)
  3. Manan Suri (32 papers)
Citations (1)