On the adequacy of L2 pronunciation feedback from automatic speech recognition: A focus on Google Translate
Paul John, Walcir Cardoso, Carol Johnson

DOI: https://doi.org/10.29140/9781914291050-20

In: Proceedings of the XXIst International CALL Research Conference (pp. 146–154)
Edited by Jozef Colpaert, Yijen Wang, Glenn Stockwell (2022)


Abstract

This study investigates automatic speech recognition (ASR) in Google Translate as a source of L2 pronunciation feedback. To be effective, ASR should transcribe learner errors accurately and perform equally well on male and female voices, avoiding gender bias. We assess Google Translate on three segmental errors that Quebec francophones (QFs) produce in English: th-substitution (think → [t]ink), h-deletion (happy → appy), and h-epenthesis (ice → [h]ice). Eight QFs (4 female, 4 male) recorded 120 sentences with and without an error on the final item (e.g., I don’t know who to *tank/thank). Errors were equally divided between real-word output (*tank) and nonword output (e.g., My sister is afraid of *tunder). We anticipate that real-word errors, which correspond to entries in the Google Translate lexicon, will be accurately transcribed, whereas nonwords, by definition absent from the lexicon, will be erroneously matched to similar-sounding real words (here, the intended output “thunder”), constituting misleading feedback. Forthcoming data analyses will determine the relative contributions of error type, real-word versus nonword output, and gender to final-word transcription and feedback accuracy. Preliminary findings suggest a hierarchy of accuracy (h-deletion, h-epenthesis > th-substitution) specific to real-word output. Indeed, ASR shows a clear inability to flag nonword errors. A gender bias effect is not apparent; in fact, ASR generally transcribed the sentences recorded by female speakers more accurately. Mistranscriptions unrelated to final items have yet to be examined. Our presentation will address the implications of these findings for L2 teachers and learners and for developers seeking to design ASR specifically for L2 use.

Suggested citation:

John, P., Cardoso, W., & Johnson, C. (2022). On the adequacy of L2 pronunciation feedback from automatic speech recognition: A focus on Google Translate. In J. Colpaert, Y. Wang, & G. Stockwell (Eds.), Proceedings of the XXIst International CALL Research Conference (pp. 146–154). London: Castledown Publishers. https://doi.org/10.29140/9781914291050-20