The Contested Role of Technology in Building Better Language Tests

New technologies are part of the fabric of life today, and accordingly, they intersect with language testing in ways that have changed practices in the field. Both high- and low-stakes tests are administered and scored by computer, providing testers with flexibility in controlling the administration conditions as well as the power to analyze test takers’ constructed linguistic responses and to present feedback and detailed score reports. Test tasks can be administered in multimodal formats to assess listening within particular contexts visible to test takers, in adaptive formats depending on test takers’ needs, and with options such as scaffolding for particular test takers. Natural language processing technologies, including syntactic parsing, automatic speech recognition, and machine learning, are used for automated scoring of test takers’ constructed written and spoken responses as well as for analysis of the language of written texts appearing in language tests. Interactive technologies and databases create opportunities for a testing system to learn about the test takers and construct models of test takers’ knowledge that can be used to provide feedback and recommendations.

Many of these possibilities are being put into practice in language testing, but each of the affordances offered by technology also raises a new set of issues to be tackled, not the least of which is grasping the meaning and value of new practices operationalized in language tests and applied in validation research. This paper will illustrate some of the promising technological advances appearing in language tests and assessments today but will also identify three problems that they present for our profession. First, technology prompts the use of mischievous metaphors, i.e., expressions used to describe language assessment concepts that mislead people about the virtues of technologies as they are currently implemented. Second, the availability of natural language processing tools has resulted in an expanded number of applied linguists analyzing test takers’ constructed responses on language tests and offering perplexing interpretations of their research results. Third, the use of artificial intelligence techniques in test construction and delivery can result in testing processes that are difficult to justify in a validity argument because they are not transparent, even if they appear to work by some measure. These challenges add technological dimensions to the existing impetus to increase assessment literacy both within the field and beyond.

Prof. Carol A. Chapelle, Iowa State University, USA

Carol A. Chapelle is Distinguished Professor of Liberal Arts and Sciences at Iowa State University. She is editor of the Encyclopedia of Applied Linguistics (Wiley, 2013) as well as co-editor of Language Testing and of the Cambridge Applied Linguistics Series. She is past president of the American Association for Applied Linguistics and former editor of TESOL Quarterly. Her research investigates the use of technology in language learning and assessment, the topic of many of her books and research articles.

Exploring the Potential of Natural Language Processing for Language Testing and Assessment

Natural Language Processing (NLP) is a research area at the intersection of artificial intelligence and linguistics that is concerned with programming computers to understand, process, and generate human language. In this talk, we will explore what role NLP can play in the area of Language Testing and Assessment. We will focus on two tasks in particular: the automatic development and the automatic scoring of tests. We will put special emphasis on the multilingual aspects of both tasks.

Test development can be assisted by NLP methods that automatically predict and adapt test difficulty. We illustrate this with C-tests, where we predict the difficulty of individual gaps as well as of the whole text in different languages. We also present gap-fill bundles, a new test format that combines a high potential for automation with interesting, and as yet largely unexplored, test characteristics.
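
For readers unfamiliar with the format, a C-test is conventionally built by leaving the first sentence of a text intact and then deleting the second half of every second word. The sketch below illustrates only that classical construction rule. It is not the difficulty-prediction method presented in the talk, and the function name make_c_test, the naive sentence splitter, and the underscore gap marker are assumptions made for illustration.

```python
import re


def make_c_test(text: str, gap_interval: int = 2) -> tuple[str, list[str]]:
    """Return (gapped_text, solutions) for a simple C-test.

    Illustrative assumptions: sentences end in ., ! or ?; one-letter words,
    numbers, and tokens with internal punctuation are never gapped.
    """
    # Naive sentence split on whitespace that follows end punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    output = [sentences[0]]  # the first sentence stays intact as context
    solutions: list[str] = []
    eligible_seen = 0  # counts gappable words after the first sentence

    for sentence in sentences[1:]:
        gapped_tokens = []
        for token in sentence.split():
            match = re.fullmatch(r"(\W*)(\w+)(\W*)", token)
            if match is None or len(match.group(2)) < 2 or not match.group(2).isalpha():
                gapped_tokens.append(token)
                continue
            eligible_seen += 1
            if eligible_seen % gap_interval != 0:
                gapped_tokens.append(token)
                continue
            lead, word, trail = match.groups()
            # Keep the first half; for odd lengths the larger half is removed,
            # which is the commonly described convention.
            keep = len(word) // 2
            solutions.append(word[keep:])
            gapped_tokens.append(lead + word[:keep] + "_" * (len(word) - keep) + trail)
        output.append(" ".join(gapped_tokens))

    return " ".join(output), solutions


if __name__ == "__main__":
    gapped, answers = make_c_test("The cat sat. A big dog ran home fast.")
    print(gapped)
    print(answers)
```

Each entry in the returned solutions list corresponds to one gap, which is the unit at which gap-level difficulty would be predicted.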

Scoring test responses can be supported by automatic content assessment models. We review the current state of the art in the field and discuss the constraints under which the models are expected to perform well. The relevant factors include spelling errors, the task domain, and the language of the test.

Prof. Torsten Zesch, University of Duisburg-Essen, Germany

Torsten Zesch leads the Language Technology Lab at the University of Duisburg-Essen, Germany. He holds a doctoral degree in Computer Science from Technische Universität Darmstadt and has worked as a substitute professor at the German Institute for International Pedagogical Research (Frankfurt, Germany). His research interests include the processing of non-standard, error-prone language as found in social media or learner language. In the area of language testing and assessment, he focuses on using natural language processing for generating language exercises as well as the automatic scoring of essays and free-text answers.