A corpus of short YouTube news videos to inform course design and materials development in an EFL university setting in Japan
Christopher Robert Cooper

DOI: https://doi.org/10.29140/9781914291050-6

In: Proceedings of the XXIst International CALL Research Conference (pp. 33–46)
Edited by Jozef Colpaert, Yijen Wang, Glenn Stockwell (2022)


Abstract

The aim of the current study was to inform the course design of an elective English news listening course at a private university in Japan. YouTube channels from twelve countries across Asia, Africa, Europe and North America were manually selected by the researcher as a source of in-class materials and for course participants to view outside the classroom. To create materials for language-focused learning and assess the vocabulary demands of the videos, transcripts from the channels were extracted using the YouTube Data API and Python, and a corpus of 8,286 video transcripts uploaded in 2021 was randomly sampled to represent the channels. The transcripts were cleaned, and frequency lists were created for adjectives, nouns, and verbs at CEFR B1 level and above. In addition, proper noun and multi-word unit frequency lists were created. An online concordancer was created using the open-source tool ShinyConc, so that learners could investigate the usage of the words in the frequency lists by themselves. A Python script was written to assess the lexical coverage of the videos using the CEFR-J wordlist, and the results suggested that learners may need to be at the CEFR B2 level or above to comfortably comprehend short YouTube news videos. Suggestions for future research are made in the paper, and Python code and supplementary data are available at the author’s GitHub page (https://github.com/cooperchris17/yt_short_news).
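
As an illustration of the lexical coverage analysis mentioned in the abstract, the following is a minimal sketch and not the author's script, which is available at the GitHub page above. It assumes a hypothetical dictionary, `toy_word_levels`, mapping word forms to CEFR levels in place of the full CEFR-J wordlist, and it uses simple regular-expression tokenization without lemmatization, which a real analysis would likely need. The function name `cumulative_coverage` and the example sentence are likewise invented for illustration.

```python
import re
from collections import Counter

# CEFR levels in ascending order of difficulty.
LEVELS = ["A1", "A2", "B1", "B2"]


def cumulative_coverage(text, word_levels):
    """Return the proportion of tokens covered at or below each CEFR level.

    word_levels is a dict mapping a lowercase word form to its CEFR level.
    Tokens that are not in the dict count as off-list and are never covered.
    """
    # Assumption: a simple tokenizer that keeps letters and apostrophes only.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    counts = Counter(word_levels.get(token, "off-list") for token in tokens)

    coverage = {}
    running = 0
    for level in LEVELS:
        running += counts[level]
        coverage[level] = running / total if total else 0.0
    return coverage


# Hypothetical stand-in for the CEFR-J wordlist (illustrative levels only).
toy_word_levels = {
    "the": "A1",
    "news": "A1",
    "government": "A2",
    "announced": "B1",
    "sanctions": "B2",
}

transcript = "The government announced the sanctions today"
result = cumulative_coverage(transcript, toy_word_levels)
for level in LEVELS:
    print(f"{level}: {result[level]:.1%}")
```

Under this approach, the level at which cumulative coverage first reaches a chosen comprehension threshold gives a rough estimate of the proficiency needed to follow a transcript. The threshold itself is a decision for the analyst and is not specified here.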

Suggested citation:

Cooper, C.R. (2022). A corpus of short YouTube news videos to inform course design and materials development in an EFL university setting in Japan. In J. Colpaert, Y. Wang, & G. Stockwell (Eds.), Proceedings of the XXIst International CALL Research Conference (pp. 33–46). London: Castledown Publishers. https://doi.org/10.29140/9781914291050-6