In this article, we investigate how speakers can be categorised based on their language background in the field of Learner Corpus Research (LCR). Specifically, we discuss three key aspects: first, the theoretical assumptions and methodological choices made in learner corpus design, second the integration of a holistic perspective for speaker categorisation in LCR and third the consequences that different categorisations might have on study outcomes. Through a comprehensive review of corpora used in the field, we identify the most common terms, definitions and criteria of categorisation used to describe a speaker's language background. Focusing on the most central metadata encoding language backgrounds, the L1 metadata, we inspect different operationalisations made and scrutinise the theoretical assumptions underlying them. Drawing on research on plurilingualism, we propose a holistic view of speaker's language background for Learner Corpus Research, combining various aspects of speaker's language use by methods inspired from the Dominant Language Constellation framework. We apply this methodology to re-evaluate the language categorisation system in LEONIDE, a multilingual corpus of Italian, German and English texts from secondary school students of diverse language backgrounds. We use the same corpus to evaluate the consequences of using different categorisations of the students on the outcome of possible linguistic studies. Despite a generally high overlap between study results across categorisations, we observe that variables combining multiple aspects of the speakers’ language backgrounds seem to explain group differences for more of the linguistic features investigated.
Categorising speakers’ language background: Theoretical assumptions and methodological challenges for learner corpus research
Lopopolo O
;Glaznieks A;Spina S
2024-01-01
Abstract
In this article, we investigate how speakers can be categorised based on their language background in the field of Learner Corpus Research (LCR). Specifically, we discuss three key aspects: first, the theoretical assumptions and methodological choices made in learner corpus design, second the integration of a holistic perspective for speaker categorisation in LCR and third the consequences that different categorisations might have on study outcomes. Through a comprehensive review of corpora used in the field, we identify the most common terms, definitions and criteria of categorisation used to describe a speaker's language background. Focusing on the most central metadata encoding language backgrounds, the L1 metadata, we inspect different operationalisations made and scrutinise the theoretical assumptions underlying them. Drawing on research on plurilingualism, we propose a holistic view of speaker's language background for Learner Corpus Research, combining various aspects of speaker's language use by methods inspired from the Dominant Language Constellation framework. We apply this methodology to re-evaluate the language categorisation system in LEONIDE, a multilingual corpus of Italian, German and English texts from secondary school students of diverse language backgrounds. We use the same corpus to evaluate the consequences of using different categorisations of the students on the outcome of possible linguistic studies. Despite a generally high overlap between study results across categorisations, we observe that variables combining multiple aspects of the speakers’ language backgrounds seem to explain group differences for more of the linguistic features investigated.File | Dimensione | Formato | |
---|---|---|---|
1-s2.0-S2772766124000764-main.pdf
accesso aperto
Tipologia:
Versione Editoriale (PDF)
Licenza:
Creative commons
Dimensione
1.43 MB
Formato
Adobe PDF
|
1.43 MB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.