
Learning to Classify Text Complexity for the Italian Language Using Support Vector Machines

Santucci, Valentino; Forti, Luciana; Spina, Stefania
2020

Abstract

Natural language processing is one of the most active research fields in the machine learning community. In this work we propose a supervised classification system that, given as input a text written in Italian, predicts its linguistic complexity as a level of the Common European Framework of Reference for Languages (CEFR). The system was built by (i) collecting a dataset of texts labeled by linguistic experts, (ii) applying vectorisation procedures that transform any text into a numerical representation, and (iii) training a support vector machine (SVM) model. Experiments were conducted following a statistically sound design, and the experimental results show that the system achieves good prediction accuracy.
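The pipeline in the abstract (vectorise each text, then train an SVM that maps vectors to CEFR levels) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract does not specify their features or hyperparameters, so this sketch assumes plain bag-of-words term frequencies as the vectorisation, a linear SVM trained with the Pegasos stochastic subgradient method, and a one-vs-rest scheme for the multi-class case. All function names, the regularisation value `lam`, and the toy sentences and labels are hypothetical, invented for illustration only.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; in Python 3, \w also matches accented Italian letters.
    return re.findall(r"\w+", text.lower())


def build_vocabulary(texts):
    # Map every distinct training token to a column index, in order of first appearance.
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorise(text, vocab):
    # Hypothetical vectorisation: L2-normalised term frequencies over the vocabulary,
    # plus a constant 1.0 feature in the last position acting as a (regularised) bias.
    vec = [0.0] * (len(vocab) + 1)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocab:  # out-of-vocabulary tokens are ignored
            vec[vocab[tok]] = float(count)
    norm = math.sqrt(sum(v * v for v in vec))
    if norm > 0:
        vec = [v / norm for v in vec]
    vec[-1] = 1.0
    return vec


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_binary_svm(X, y, lam=0.1, epochs=50, seed=0):
    # Pegasos: stochastic subgradient descent on the L2-regularised hinge loss.
    # Labels in y must be +1 or -1; lam (regularisation strength) is a placeholder value.
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1  # margin checked with the current w
            w = [(1 - eta * lam) * wj for wj in w]  # shrink step from the regulariser
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def train_one_vs_rest(texts, levels, **svm_kwargs):
    # One binary SVM per CEFR level: that level (+1) against all other levels (-1).
    vocab = build_vocabulary(texts)
    X = [vectorise(text, vocab) for text in texts]
    models = {}
    for level in sorted(set(levels)):
        y = [1 if lab == level else -1 for lab in levels]
        models[level] = train_binary_svm(X, y, **svm_kwargs)
    return vocab, models


def predict(text, vocab, models):
    # Return the level whose binary classifier gives the highest score.
    x = vectorise(text, vocab)
    scores = {level: dot(w, x) for level, w in models.items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Invented toy examples with made-up labels, not data from the paper.
    texts = [
        "io sono Marco",
        "ho un gatto nero",
        "nonostante la crisi economica",
        "il governo ha approvato una riforma",
    ]
    levels = ["A1", "A1", "B2", "B2"]
    vocab, models = train_one_vs_rest(texts, levels)
    print(predict("io sono Marco", vocab, models))
```

A real system would replace the bag-of-words features with richer linguistic indicators of complexity and would evaluate on held-out texts; the one-vs-rest design is used here only because it reduces the multi-level CEFR problem to standard binary SVMs.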
ISBN: 978-3-030-58801-4; 978-3-030-58802-1
Files in this item:

Santucci2020_Chapter_LearningToClassifyTextComplexi.pdf
Not available
Description: Publisher's version
Type: Publisher's version (PDF)
License: Not public, closed access
Size: 927.54 kB
Format: Adobe PDF

_ICCSA2020__Malt.pdf
Open access
Description: Preprint
Type: Preprint document
License: Creative Commons
Size: 455.55 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/20.500.12071/20968