Explaining Conformer-1’s underperformance on the Numbers domain
Determine why Conformer-1 underperforms other providers on the in-house Numbers domain, and test whether the use of pseudo-labels and the applied filtering strategy causally contribute to the gap.
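One way to approach the causal question is an ablation: train otherwise identical models with and without pseudo-labels, and with and without the filtering step, then compare their error on the Numbers test set. The sketch below shows only the scoring side of such a comparison. It assumes word error rate (WER) as the metric and that references and hypotheses have already been passed through the same text normalizer, which matters in this domain because "5" and "five" would otherwise count as an error. The function names and the idea of keying results by training variant are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def word_errors(reference: str, hypothesis: str) -> Tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # Before processing reference word i, prev[j] is the edit distance
    # between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            )
        prev = curr
    return prev[-1], len(ref)


def corpus_wer(pairs: List[Tuple[str, str]]) -> float:
    """Corpus-level WER: total word errors over total reference words."""
    errors = words = 0
    for reference, hypothesis in pairs:
        e, n = word_errors(reference, hypothesis)
        errors += e
        words += n
    return errors / words if words else 0.0


def wer_by_variant(
    outputs: Dict[str, List[Tuple[str, str]]]
) -> Dict[str, float]:
    """Map each (hypothetical) training variant to its WER on one test set.

    `outputs` maps a variant name, e.g. "no_pseudo_labels", to the list of
    (reference, hypothesis) pairs that variant produced on the same
    Numbers-domain utterances.
    """
    return {name: corpus_wer(pairs) for name, pairs in outputs.items()}
```

Scoring every variant on the same utterances keeps the comparison paired, so a difference in WER between two variants reflects the training change rather than a change in test data. A difference that persists across several training seeds would be stronger evidence than a single run.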
References
Conformer-1 outperforms other providers on all domains except Numbers: we hypothesize that the difference between Conformer-1 and other models in this domain can be attributed to the use of pseudo-labels and our filtering strategy. We leave this discrepancy open as a future area of research.
— Conformer-1: Robust ASR via Large-Scale Semisupervised Bootstrapping
(2404.07341 - Zhang et al., 10 Apr 2024) in Subsection 'Comparison with other Speech APIs' in the Experiments section