Automated Essay Scoring e valutazione umana: un’indagine comparativa / Sbardella, Talia. - (2025 Dec 22).

Automated Essay Scoring e valutazione umana: un’indagine comparativa

SBARDELLA, TALIA
2025-12-22

Abstract

This study explores the applicability of Automated Essay Scoring (AES) systems to the assessment of written production in Italian as a second or foreign language, with a particular focus on ChatGPT. In response to the growing demand for scalable and efficient assessment methods, the research presents a comparative analysis of automated and human scoring practices. It investigates potential discrepancies between the two approaches, examining how these may vary across CEFR proficiency levels and linguistic dimensions. The research is based on a subcorpus of the CELI Corpus, which comprises authentic learner texts from CELI certification exams at CEFR levels B1 to C2, and is structured in two phases. The first is a pilot study in which a sample of texts was repeatedly evaluated with a ChatGPT-based model. The model was guided by a carefully crafted prompt that incorporated the CEFR levels and the CELI scoring criteria so as to mirror human evaluation practices in terms of content and reasoning. This phase assessed the model's internal consistency and reliability before proceeding to the comparison with human raters. The second phase involved a comparative analysis of 800 texts, evaluated by both ChatGPT and human raters across four linguistic dimensions: lexical, grammatical, sociolinguistic, and discourse coherence and cohesion. The pilot phase showed high internal consistency in ChatGPT's scoring, whereas the main study revealed partial alignment with human scores on some linguistic dimensions, with significant discrepancies observed at different CEFR levels. The study concludes by proposing a hybrid approach to language assessment, in which AES tools may complement human judgment, particularly in large-scale contexts. The findings aim to contribute to the broader discourse on the integration of AES in language testing and to suggest directions for future research.
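The abstract does not reproduce the thesis's actual prompt or statistics, but the pipeline it describes (rubric-guided LLM scoring, repeated runs to gauge internal consistency, and agreement with human raters) can be illustrated with a minimal Python sketch. The model name (`gpt-4o`), the prompt wording, the 0–5 score scale, the JSON output format, and the use of quadratically weighted Cohen's kappa are illustrative assumptions, not details taken from the study.

```python
# Minimal sketch of rubric-guided LLM essay scoring and agreement analysis.
# Assumptions (not from the thesis): model name, prompt wording, 0-5 scale,
# JSON output format, and quadratic weighted kappa as the agreement statistic.
import json
from statistics import stdev

from openai import OpenAI                      # pip install openai
from sklearn.metrics import cohen_kappa_score  # pip install scikit-learn

client = OpenAI()  # reads OPENAI_API_KEY from the environment

DIMENSIONS = ["lexical", "grammatical", "sociolinguistic", "coherence_cohesion"]

RUBRIC_PROMPT = """You are an examiner for the CELI certification of Italian L2.
The candidate's target level is {level} (CEFR B1-C2). Score the essay below on a
0-5 scale for each criterion: lexical range and accuracy, grammatical accuracy,
sociolinguistic appropriateness, and discourse coherence and cohesion.
Return only a JSON object with the keys "lexical", "grammatical",
"sociolinguistic" and "coherence_cohesion".

Essay:
{essay}"""


def score_essay(essay: str, level: str, model: str = "gpt-4o") -> dict:
    """Ask the model for one rubric-guided evaluation of a learner text."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # reduce run-to-run variation
        response_format={"type": "json_object"},
        messages=[{"role": "user",
                   "content": RUBRIC_PROMPT.format(level=level, essay=essay)}],
    )
    return json.loads(response.choices[0].message.content)


def internal_consistency(essay: str, level: str, runs: int = 5) -> dict:
    """Repeat the evaluation and report the score spread per dimension."""
    scores = [score_essay(essay, level) for _ in range(runs)]
    return {dim: stdev(s[dim] for s in scores) for dim in DIMENSIONS}


def agreement(human_scores: list[int], model_scores: list[int]) -> float:
    """Quadratically weighted Cohen's kappa between human and automated scores."""
    return cohen_kappa_score(human_scores, model_scores, weights="quadratic")
```

A score spread close to zero across repeated runs would correspond to the high internal consistency reported for the pilot phase, while an agreement statistic such as the weighted kappa quantifies alignment with human raters per dimension and CEFR level; the thesis itself may rely on different metrics.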
22-Dec-2025
Italian
36
2022/2023
Linguistica e didattica delle lingue
Dottorato di ricerca in Scienze linguistiche filologico-letterarie e politico-sociali 
DOLCI, Roberto
SANTUCCI, Valentino
Files in this item:
No files are associated with this item.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12071/50168
Warning: the data displayed have not been validated by the university.
