Developing a discipline-specific corpus and high-frequency word list for science and engineering students in graduate school

Published

2022-12-31

Section: Articles

Authors

  • Suwako Uehara, The University of Electro-Communications, Japan
  • Hibiya Haraki, The University of Electro-Communications, Japan
  • Stuart McLean, Momoyama Gakuin University, Japan
DOI: https://doi.org/10.7820/vli.v11.2.uehara

Abstract

Japanese graduate students in science and engineering need to read academic research in their second language (L2), a task that can be challenging. Research has shown a strong correlation (0.78) between vocabulary size and reading comprehension (McLean et al., 2020), which suggests that providing high-frequency word lists could enhance comprehension. In this work in progress, 1.35 million tokens of professor-recommended reading materials were used to investigate three things: a method for creating a vocabulary list that would benefit graduate science majors, procedures for creating a corpus and a high-frequency word list efficiently, and the steps required to create a cleaner corpus. This paper outlines a systematic, literature-informed method that includes input from professors in the field; the combined use of a tailored MATLAB script and AntConc (Anthony, 2022) to generate the corpus and high-frequency word list efficiently; and repeated comparison of the original PDFs with the matching text files, followed by additional MATLAB code to deal with specific issues that arose in producing cleaner text. The proposed method can be applied in other contexts to enhance the generation of high-frequency word lists.
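The abstract describes the workflow only at a high level, and the authors' MATLAB scripts are not reproduced here. As a rough illustration of the two core steps, cleaning PDF-extracted text and ranking words by frequency, the following Python sketch may help. The cleanup rules (rejoining words hyphenated across line breaks and dropping lines that hold only a page number) and the function names `clean_extracted_text` and `word_frequency_list` are hypothetical assumptions, not the authors' method or AntConc's behavior, and the sketch counts raw lowercase word forms rather than lemmas or word families.

```python
import re
from collections import Counter


def clean_extracted_text(raw: str) -> str:
    """Hypothetical cleanup of PDF-extracted text (assumed rules, not the authors' script)."""
    # Rejoin words hyphenated across a line break: "fre-" + newline + "quency" -> "frequency".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Drop lines that contain only a page number, then join the remaining lines.
    kept = [line for line in text.splitlines() if not line.strip().isdigit()]
    return " ".join(kept)


def word_frequency_list(text: str, top_n: int = 10) -> list[tuple[str, int]]:
    """Rank lowercase word forms by raw frequency (no lemmatization or word families)."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(tokens)
    # Sort by descending count, breaking ties alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


if __name__ == "__main__":
    sample = (
        "The corpus was built from papers.\n"
        "12\n"
        "The word list ranks each word by fre-\n"
        "quency in the corpus."
    )
    cleaned = clean_extracted_text(sample)
    print(word_frequency_list(cleaned, top_n=3))
    # -> [('the', 3), ('corpus', 2), ('word', 2)]
```

Rejoining every end-of-line hyphen is a blunt rule: a genuine compound that happens to break at its hyphen ("high-" at the end of one line, "frequency" on the next) becomes "highfrequency". Errors of this kind are one reason the repeated comparison of original PDFs with their text files, as described above, is useful.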


Keywords: corpus, graduate school, high-frequency word list, science majors

Suggested Citation:

Uehara, S., Haraki, H., & McLean, S. (2022). Developing a discipline-specific corpus and high-frequency word list for science and engineering students in graduate school. Vocabulary Learning and Instruction, 11(2), 57–68. https://doi.org/10.7820/vli.v11.2.uehara
