
Part of speech tagging for code switched data (1909.13006v2)

Published 28 Sep 2019 in cs.CL

Abstract: We address the problem of part-of-speech (POS) tagging in the context of linguistic code switching (CS). CS is the phenomenon where a speaker switches between two languages or variants of the same language within or across utterances, known as intra-sentential or inter-sentential CS, respectively. Processing CS data is especially challenging in the intra-sentential case, since state-of-the-art monolingual NLP technology is geared toward processing one language at a time. In this paper we explore multiple strategies for applying state-of-the-art POS taggers to CS data. We investigate the landscape in two CS language pairs, Spanish-English and Modern Standard Arabic-Arabic dialects. We compare the use of two POS taggers vs. a unified tagger trained on CS data. Our results show that applying a machine learning framework using two state-of-the-art POS taggers achieves better performance than all other approaches that we investigate.

Authors (7)
  1. Giovanni Molina (3 papers)
  2. Mona Diab (71 papers)
  3. Thamar Solorio (67 papers)
  4. Abdelati Hawwari (3 papers)
  5. Victor Soto (6 papers)
  6. Julia Hirschberg (37 papers)
  7. Fahad Alghamdi (7 papers)
Citations (36)
