Comparability of per‑word all‑category accuracy with unspecified category sets
Determine whether the 97.6% per‑word “all categories” accuracy reported for the DeepPavlov BERT‑based Russian morphological tagger is directly comparable to the 95.3559% per‑word “all categories” accuracy reported for the multi‑head attention–based tagger. The latter figure was obtained on the specific category set {upos, Mood, VerbForm, Person, Animacy, Degree, Variant, Number, Gender, NumForm, Case, Tense, Voice}, whereas the DeepPavlov evaluation does not specify which morphological categories it covers, and accuracy varies substantially with the chosen category set.
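To illustrate why the category set matters, the following minimal Python sketch computes a per‑word “all categories” accuracy, assuming it means the share of words whose predicted values match the gold values on every category in the chosen set. The function name, the dictionary representation of a tag, and the toy data are all hypothetical; neither evaluation's actual scoring code is reproduced here.

```python
from typing import Dict, Iterable, List

# Category set named in the problem statement above.
CATEGORIES = [
    "upos", "Mood", "VerbForm", "Person", "Animacy", "Degree", "Variant",
    "Number", "Gender", "NumForm", "Case", "Tense", "Voice",
]

# A tag maps category names to values; absent categories are simply missing keys.
Tag = Dict[str, str]


def all_category_accuracy(
    gold: List[Tag], pred: List[Tag], categories: Iterable[str]
) -> float:
    """Fraction of words correct on every category in `categories`."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of words")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")
    cats = list(categories)
    # A word counts as correct only if all listed categories agree;
    # a category missing on both sides (None == None) counts as agreement.
    correct = sum(
        all(g.get(c) == p.get(c) for c in cats) for g, p in zip(gold, pred)
    )
    return correct / len(gold)


# Hypothetical toy data: four words, two of them with one wrong feature each.
gold = [
    {"upos": "NOUN", "Case": "Nom", "Number": "Sing", "Animacy": "Inan"},
    {"upos": "VERB", "Tense": "Past", "Number": "Sing", "Gender": "Masc"},
    {"upos": "ADJ", "Case": "Acc", "Number": "Plur", "Degree": "Pos"},
    {"upos": "NOUN", "Case": "Gen", "Number": "Sing", "Animacy": "Anim"},
]
pred = [
    {"upos": "NOUN", "Case": "Nom", "Number": "Sing", "Animacy": "Anim"},  # Animacy wrong
    {"upos": "VERB", "Tense": "Past", "Number": "Sing", "Gender": "Masc"},
    {"upos": "ADJ", "Case": "Nom", "Number": "Plur", "Degree": "Pos"},     # Case wrong
    {"upos": "NOUN", "Case": "Gen", "Number": "Sing", "Animacy": "Anim"},
]

full = all_category_accuracy(gold, pred, CATEGORIES)
no_animacy = all_category_accuracy(gold, pred, ["upos", "Case", "Number"])
coarse = all_category_accuracy(gold, pred, ["upos", "Number"])
print(full, no_animacy, coarse)
```

On this toy data the same predictions score differently depending only on which categories are counted, which is why two “all categories” figures cannot be compared directly unless both category sets are known.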
References
Though it is clearly better than the result obtained by the proposed architecture, we cannot be sure that these results are comparable, because the authors do not give the exact list of the categories being analyzed.