Categorising speakers’ language background: Theoretical assumptions and methodological challenges for learner corpus research

Lopopolo O; Glaznieks A; Spina S
2024-01-01

Abstract

In this article, we investigate how speakers can be categorised based on their language background in the field of Learner Corpus Research (LCR). Specifically, we discuss three key aspects: first, the theoretical assumptions and methodological choices made in learner corpus design; second, the integration of a holistic perspective on speaker categorisation in LCR; and third, the consequences that different categorisations might have for study outcomes. Through a comprehensive review of corpora used in the field, we identify the most common terms, definitions and categorisation criteria used to describe a speaker’s language background. Focusing on the most central metadata encoding language background, the L1 metadata, we inspect the different operationalisations used and scrutinise the theoretical assumptions underlying them. Drawing on research on plurilingualism, we propose a holistic view of speakers’ language backgrounds for Learner Corpus Research, combining various aspects of speakers’ language use through methods inspired by the Dominant Language Constellation framework. We apply this methodology to re-evaluate the language categorisation system in LEONIDE, a multilingual corpus of Italian, German and English texts written by secondary school students of diverse language backgrounds. We use the same corpus to evaluate how different categorisations of the students affect the outcomes of possible linguistic studies. Despite a generally high overlap between study results across categorisations, we observe that variables combining multiple aspects of the speakers’ language backgrounds seem to explain group differences for more of the linguistic features investigated.
Keywords: Learner corpora, Metadata, Multilingualism, Dominant language constellation, Cluster analysis
Files in this record:
File: 1-s2.0-S2772766124000764-main.pdf
Access: open access
Type: Publisher's version (PDF)
Licence: Creative Commons
Size: 1.43 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12071/44128