Detect Language of Transliterated Texts

Published 26 Apr 2020 in eess.AS, cs.CL, cs.LG, cs.SD, and stat.ML | (2004.13521v1)

Abstract: Informal transliteration from other languages to English is prevalent in social media threads, instant messaging, and discussion forums. Without identifying the language of such transliterated text, users who do not speak that language cannot understand its content using translation tools. We propose a Language Identification (LID) system, with an approach for feature extraction, which can detect the language of transliterated texts reasonably well even with limited training data and computational resources. We tokenize the words into phonetic syllables and use a simple Long Short-Term Memory (LSTM) network architecture to detect the language of transliterated texts. With intensive experiments, we show that the tokenization of transliterated words as phonetic syllables effectively represents their causal sound patterns. Phonetic syllable tokenization, therefore, makes it easier for even simpler model architectures to learn the characteristic patterns to identify any language.
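
The abstract does not spell out how words are split into phonetic syllables, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a hypothetical heuristic in which a syllable is a run of consonants followed by a run of vowels, with any trailing consonants attached to the last syllable. The names `syllabify` and `tokenize` are illustrative and do not come from the paper.

```python
import re
from typing import List

# Assumed heuristic (not from the paper): a syllable is zero or more
# non-vowel letters followed by one or more vowels.
_SYLLABLE = re.compile(r"[^aeiou]*[aeiou]+")


def syllabify(word: str) -> List[str]:
    """Split one romanized word into rough phonetic syllables."""
    # Keep only ASCII letters, lowercased.
    word = re.sub(r"[^a-z]", "", word.lower())
    pieces = _SYLLABLE.findall(word)
    if not pieces:
        # No vowels at all: keep the word whole (or nothing if it is empty).
        return [word] if word else []
    # Attach leftover trailing consonants to the final syllable.
    consumed = sum(len(p) for p in pieces)
    tail = word[consumed:]
    if tail:
        pieces[-1] += tail
    return pieces


def tokenize(text: str) -> List[str]:
    """Flatten a sentence into a sequence of syllable tokens."""
    return [syl for word in text.split() for syl in syllabify(word)]


print(syllabify("namaste"))     # ['na', 'ma', 'ste']
print(syllabify("dhanyavaad"))  # ['dha', 'nya', 'vaad']
print(tokenize("kaise ho"))     # ['kai', 'se', 'ho']
```

In a pipeline like the one the abstract describes, each syllable token would presumably be mapped to an integer ID and the resulting sequence fed to an embedding layer followed by an LSTM classifier. That part is omitted here because the abstract gives no architectural details.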

Authors (1)

Sourav Sen
