Batty, Aaron



Faculty of Nursing and Medical Care (Shonan Fujisawa)



Research Areas

  • Humanities & Social Sciences / English linguistics

Research Keywords

  • Assessment

  • Item Response Theory (IRT)

  • Language Testing


Papers

  • An eye-tracking study of attention to visual cues in L2 listening tests

    Batty A.O.

    Language Testing 38(4), 511-535, 2021.10

    ISSN 0265-5322

    Nonverbal and other visual cues are well established as a critical component of human communication. Under most circumstances, visual information is available to aid in the comprehension and interpretation of spoken language. Citing these facts, many L2 assessment researchers have studied video-mediated listening tests through score comparisons with audio tests, by measuring the amount of time spent watching, and by attempting to determine examinee viewing behavior through self-reports. However, the specific visual cues to which examinees attend have heretofore not been measured objectively. The present research employs eye-tracking methodology to determine the amounts of time 12 participants viewed specific visual cues on a six-item, video-mediated L2 listening test. Seventy-two scanpath-overlaid videos of viewing behavior were manually coded for visual cues at 0.10-second intervals. Cued retrospective interviews based on eye-tracking data provided reasons for the observed behaviors. Faces were found to occupy the majority (81.74%) of visual dwell time, with participants largely splitting their time between the speaker’s eyes and mouth. Detected gesture viewing was negligible. The reason given for most viewing behavior was determining characters’ emotional states. These findings suggest that the primary difference between audio- and video-mediated L2 listening tests of conversational content is the absence or presence of facial expressions.

  • Predicting L2 reading proficiency with modalities of vocabulary knowledge: A bootstrapping approach

    McLean S., Stewart J., Batty A.

    Language Testing 37(3), 389-411, 2020

    ISSN 0265-5322

    Vocabulary’s relationship to reading proficiency is frequently cited as a justification for the assessment of L2 written receptive vocabulary knowledge. However, to date, there has been relatively little research regarding which modalities of vocabulary knowledge have the strongest correlations to reading proficiency, and observed differences have often been statistically non-significant. The present research employs a bootstrapping approach to reach a clearer understanding of relationships between various modalities of vocabulary knowledge to reading proficiency. Test-takers (N = 103) answered 1000 vocabulary test items spanning the third 1000 most frequent English words in the New General Service List corpus (Browne, Culligan, & Phillips, 2013). Items were answered under four modalities: Yes/No checklists, form recall, meaning recall, and meaning recognition. These pools of test items were then sampled with replacement to create 1000 simulated tests ranging in length from five to 200 items and the results were correlated to the Test of English for International Communication (TOEIC®) Reading scores. For all examined test lengths, meaning-recall vocabulary tests had the highest average correlations to reading proficiency, followed by form-recall vocabulary tests. The results indicated that tests of vocabulary recall are stronger predictors of reading proficiency than tests of vocabulary recognition, despite the theoretically closer relationship of vocabulary recognition to reading.

  • Validity evidence for a sentence repetition test of Swiss German Sign Language

    Haug T., Batty A., Venetz M., Notter C., Girard-Groeber S., Knoch U., Audeoud M.

    Language Testing 37(3), 412-434, 2020

    ISSN 0265-5322

    In this study we seek evidence of validity according to the socio-cognitive framework (Weir, 2005) for a new sentence repetition test (SRT) for young Deaf L1 Swiss German Sign Language (DSGS) users. SRTs have been developed for various purposes for both spoken and sign languages to assess language development in children. In order to address the need for tests to assess the grammatical development of Deaf L1 DSGS users in a school context, we developed an SRT. The test targets young learners aged 6–17 years, and we administered it to 46 Deaf students aged 6.92–17.33 (M = 11.17) years. In addition to the young learner data, we collected data from Deaf adults (N = 14) and from a sub-sample of the children (n = 19) who also took a test of DSGS narrative comprehension, serving as a criterion measure. We analyzed the data with many-facet Rasch modeling, regression analysis, and analysis of covariance. The results show evidence of scoring, criterion, and context validity, suggesting the suitability of the SRT for the intended purpose, and will inform the revision of the test for future use as an instrument to assess the sign language development of Deaf children.

  • Going online: The effect of mode of delivery on performances and perceptions on an English L2 writing test suite

    Brunfaut T., Harding L., Batty A.O.

    Assessing Writing 36, 3-18, 2018.04

    Research paper (scientific journal), Joint Work, Accepted, ISSN 1075-2935

    In response to changing stakeholder needs, large-scale language test providers have increasingly considered the feasibility of delivering paper-based examinations online. Evidence is required, however, to determine whether online delivery of writing tests results in changes to writing performance reflected in differential test scores across delivery modes, and whether test-takers hold favourable perceptions of online delivery. The current study aimed to determine the effect of delivery mode on the two writing tasks (reading-into-writing and extended writing) within the Trinity College London Integrated Skills in English (ISE) test suite across three proficiency levels (CEFR B1-C1). 283 test-takers (107 at ISE I/B1, 109 at ISE II/B2, and 67 at ISE III/C1) completed both writing tasks in paper-based and online mode. Test-takers also completed a questionnaire to gauge perceptions of the impact, usability and fairness of the delivery modes. Many-facet Rasch measurement (MFRM) analysis of scores revealed that delivery mode had no discernible effect, apart from the reading-into-writing task at ISE I, where the paper-based mode was slightly easier. Test-takers generally held more positive perceptions of the online delivery mode, although technical problems were reported. Findings are discussed with reference to the need for further research into interactions between delivery mode, task and level.

  • Investigating the impact of nonverbal communication cues on listening item types

    Batty A.

    Language Learning and Language Teaching (John Benjamins) 50, 161-175, 2018

    Research paper (other academic), Single Work, ISSN 1569-9471

Research Projects of Competitive Funds, etc.

  • An Objective Test of Communicative English Proficiency


    Keio University, BATTY Aaron Olaf, JEFFREY Stewart, Grant-in-Aid for Challenging Exploratory Research

    The researchers developed a new test of communicative speaking proficiency, called the Objective Communicative Speaking Test (OCST). The OCST is a timed information-gap task-based test delivered via tablet computers. The OCST measures the time required for a speaker to relate a piece of information unknown to the rater, on the assumption that more traditional components of oral proficiency will contribute to time to completion. The test was administered to a sample of 86 first- (L1s) and second-language (L2s) speakers of English, and their task completion times were assigned an L1-referenced score. The data were analyzed via many-facet Rasch analysis. As hypothesized, the objective design of the test reduced rater effects, and raters could be excluded from the model. An examinee reliability coefficient of 0.88 was observed, surpassing that of most subjective tests of speaking proficiency.

