
Quantification of stylistic differences in human- and ASR-produced transcripts of African American English (2409.03059v1)

Published 4 Sep 2024 in cs.CL

Abstract: Common measures of accuracy used to assess the performance of automatic speech recognition (ASR) systems, as well as human transcribers, conflate multiple sources of error. Stylistic differences, such as verbatim vs. non-verbatim transcription, can play a significant role in ASR performance evaluation when the training and test datasets differ in style. The problem is compounded for speech from underrepresented varieties, where the speech-to-orthography mapping is less standardized. We categorize the kinds of stylistic differences between 6 transcription versions, 4 human- and 2 ASR-produced, of 10 hours of African American English (AAE) speech. Focusing on verbatim features and AAE morphosyntactic features, we investigate how these categories interact with how well transcripts can be compared via word error rate (WER). The results and overall analysis help clarify how ASR outputs are a function of the decisions made by the training data's human transcribers.
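To make the WER point concrete, here is a minimal sketch of the standard word error rate (word-level edit distance divided by reference length), applied to two transcription styles of the same utterance. It is not the paper's evaluation code: the function name `word_error_rate`, the whitespace tokenization, the lowercasing, and both example sentences are assumptions made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are lowercased, whitespace-separated words; real
    evaluations usually apply more elaborate text normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the same utterance in two transcription styles.
verbatim = "um i i was gonna go to the store"
non_verbatim = "i was going to go to the store"

# Neither transcript misrecognizes a word, yet WER is nonzero because
# the styles disagree on fillers, repetitions, and spelling conventions.
style_only_wer = word_error_rate(verbatim, non_verbatim)
```

In this sketch every counted "error" comes from a stylistic choice rather than a recognition mistake, which is the kind of conflation the abstract describes.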

