Automated Essay Scoring e valutazione umana: un’indagine comparativa / Sbardella, Talia. - (2025 Dec 22).

Automated Essay Scoring e valutazione umana: un’indagine comparativa

SBARDELLA, TALIA
2025-12-22

Abstract

This study explores the applicability of Automated Essay Scoring (AES) systems to the assessment of written production in Italian as a second or foreign language, with a particular focus on ChatGPT. In response to the growing demand for scalable and efficient assessment methods, the research presents a comparative analysis of automated and human scoring practices. It investigates potential discrepancies between the two approaches, examining how these may vary across CEFR proficiency levels and linguistic dimensions. The research is based on a subcorpus of the CELI Corpus, which comprises authentic learner texts from CELI certification exams at CEFR levels B1 to C2, and is structured in two phases. The first is a pilot study in which a sample of texts was repeatedly evaluated with a ChatGPT-based model. The model was guided by a carefully crafted prompt that incorporated the CEFR levels and the CELI scoring criteria so as to mirror human evaluation practices in terms of content and reasoning. This phase assessed the model's internal consistency and reliability before proceeding to the comparison with human raters. The second phase involved a comparative analysis of 800 texts, evaluated by both ChatGPT and human raters across four linguistic dimensions: lexical, grammatical, sociolinguistic, and discourse coherence and cohesion. The pilot phase showed high internal consistency in ChatGPT's scoring, whereas the main study revealed partial alignment with human scores on some linguistic dimensions, with significant discrepancies observed at different CEFR levels. The study concludes by proposing a hybrid approach to language assessment, in which AES tools may complement human judgment, particularly in large-scale contexts. The findings aim to contribute to the broader discourse on the integration of AES in language testing and to suggest directions for future research.
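The abstract does not reproduce the thesis's actual prompt or statistics, but the pipeline it describes (rubric-guided LLM scoring, repeated runs to gauge internal consistency, and agreement with human raters) can be illustrated with a minimal Python sketch. The model name (`gpt-4o`), the prompt wording, the 0–5 score scale, the JSON output format, and the use of quadratically weighted Cohen's kappa are illustrative assumptions, not details taken from the study.

```python
# Minimal sketch of rubric-guided LLM essay scoring and agreement analysis.
# Assumptions (not from the thesis): model name, prompt wording, 0-5 scale,
# JSON output format, and quadratic weighted kappa as the agreement statistic.
import json
from statistics import stdev

from openai import OpenAI                      # pip install openai
from sklearn.metrics import cohen_kappa_score  # pip install scikit-learn

client = OpenAI()  # reads OPENAI_API_KEY from the environment

DIMENSIONS = ["lexical", "grammatical", "sociolinguistic", "coherence_cohesion"]

RUBRIC_PROMPT = """You are an examiner for the CELI certification of Italian L2.
The candidate's target level is {level} (CEFR B1-C2). Score the essay below on a
0-5 scale for each criterion: lexical range and accuracy, grammatical accuracy,
sociolinguistic appropriateness, and discourse coherence and cohesion.
Return only a JSON object with the keys "lexical", "grammatical",
"sociolinguistic" and "coherence_cohesion".

Essay:
{essay}"""


def score_essay(essay: str, level: str, model: str = "gpt-4o") -> dict:
    """Ask the model for one rubric-guided evaluation of a learner text."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # reduce run-to-run variation
        response_format={"type": "json_object"},
        messages=[{"role": "user",
                   "content": RUBRIC_PROMPT.format(level=level, essay=essay)}],
    )
    return json.loads(response.choices[0].message.content)


def internal_consistency(essay: str, level: str, runs: int = 5) -> dict:
    """Repeat the evaluation and report the score spread per dimension."""
    scores = [score_essay(essay, level) for _ in range(runs)]
    return {dim: stdev(s[dim] for s in scores) for dim in DIMENSIONS}


def agreement(human_scores: list[int], model_scores: list[int]) -> float:
    """Quadratically weighted Cohen's kappa between human and automated scores."""
    return cohen_kappa_score(human_scores, model_scores, weights="quadratic")
```

A score spread close to zero across repeated runs would correspond to the high internal consistency reported for the pilot phase, while an agreement statistic such as the weighted kappa quantifies alignment with human raters per dimension and CEFR level; the thesis itself may rely on different metrics.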
22-Dec-2025
Italian
36
2022/2023
Linguistica e didattica delle lingue
Dottorato di ricerca in Scienze linguistiche filologico-letterarie e politico-sociali 
DOLCI, Roberto
SANTUCCI, Valentino
Files in this item:
No files are associated with this item.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12071/50168
Warning: the data displayed have not been validated by the university.
