Papers
Topics
Authors
Recent
Search
2000 character limit reached

Context based lemmatizer for Polish language

Published 23 Jul 2022 in cs.CL and cs.AI | (2207.11565v1)

Abstract: Lemmatization is the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word's lemma, or dictionary form. In computational linguistics, lemmatisation is the algorithmic process of determining the lemma of a word based on its intended meaning. Unlike stemming, lemmatisation depends on correctly identifying the intended part of speech and meaning of a word in a sentence, as well as within the larger context surrounding that sentence. As a result, developing efficient lemmatisation algorithm is the complex task. In recent years it can be observed that deep learning models used for this task outperform other methods including machine learning algorithms. In this paper the polish lemmatizer based on Google T5 model is presented. The training was run with different context lengths. The model achieves the best results for polish language lemmatisation process.

Citations (1)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.