
Phylogeny and geometry of languages from normalized Levenshtein distance (1104.4426v3)

Published 22 Apr 2011 in cs.CL and q-bio.PE

Abstract: The idea that the distance between pairs of languages can be evaluated from lexical differences seems to have its roots in the work of the French explorer Dumont D'Urville. He collected comparative word lists of various languages during his voyages aboard the Astrolabe from 1826 to 1829 and, in his work on the geographical division of the Pacific, he proposed a method to measure the degree of relation between languages. The method used by modern lexicostatistics, developed by Morris Swadesh in the 1950s, measures distances from the percentage of shared cognates, which are words with a common historical origin. The weak point of this method is that subjective judgment plays a relevant role. Recently, we have proposed a new automated method which is motivated by the analogy with genetics. The new approach avoids any subjectivity, and its results can be easily replicated by other scholars. The distance between two languages is defined by considering a renormalized Levenshtein distance between pairs of words with the same meaning and averaging over the words contained in a list. The renormalization, which takes into account the length of the words, plays a crucial role, and no sensible results can be found without it. In this paper we give a short review of our automated method and we illustrate it by considering the cluster of Malagasy dialects. We show that it sheds new light on their kinship relation and also that it furnishes a lot of new information concerning the modalities of the settlement of Madagascar.
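The distance described in the abstract can be illustrated with a minimal Python sketch. It rests on one assumption the abstract does not spell out: the renormalization is taken here to be division of the Levenshtein distance by the length of the longer of the two words. The function names (`levenshtein`, `normalized_distance`, `language_distance`) are hypothetical and chosen only for this illustration; this is not the authors' code.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter word in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Levenshtein distance divided by the longer word's length (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty strings are treated as identical
    return levenshtein(a, b) / longest


def language_distance(words_a: Sequence[str], words_b: Sequence[str]) -> float:
    """Average normalized distance over two lists aligned by meaning."""
    if len(words_a) != len(words_b) or len(words_a) == 0:
        raise ValueError("word lists must be non-empty and aligned by meaning")
    total = sum(normalized_distance(x, y) for x, y in zip(words_a, words_b))
    return total / len(words_a)
```

Under this normalization each word pair contributes a value between 0 and 1, so long words do not dominate the average simply because they leave room for more edits. Entry `i` of each list is expected to carry the same meaning in both languages.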

Citations (9)
