Learning to Classify Text Complexity for the Italian Language Using Support Vector Machines
Santucci, Valentino; Forti, Luciana; Spina, Stefania
2020-01-01
Abstract
Natural language processing is one of the most active research fields in the machine learning community. In this work we propose a supervised classification system that, given a text written in Italian as input, predicts its linguistic complexity as a level of the Common European Framework of Reference for Languages (CEFR). The system was built in three steps: (i) collecting a dataset of texts labeled by linguistic experts, (ii) defining vectorisation procedures that transform any text into a numerical representation, and (iii) training a support vector machine model. Experiments were conducted following a statistically sound design, and the results show that the system reaches good prediction accuracy.

File | Description | Type | License | Size | Format | Access
---|---|---|---|---|---|---
Santucci2020_Chapter_LearningToClassifyTextComplexi.pdf | Publisher's version | Publisher's version (PDF) | Not public (closed access) | 927.54 kB | Adobe PDF | Not available
_ICCSA2020__Malt.pdf | Preprint | Pre-print document | Creative Commons | 455.55 kB | Adobe PDF | Open access
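The pipeline outlined in the abstract (vectorise a text, then classify it with a support vector machine) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation: the character n-gram TF-IDF vectoriser stands in for the paper's vectorisation procedures, which the abstract does not specify, and the four sentences, their CEFR labels, and the hyperparameters are hypothetical placeholders chosen only to make the example run.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy examples, NOT taken from the authors' expert-labeled dataset.
texts = [
    "Ciao, io sono Marco.",
    "Il gatto dorme sul letto.",
    "Nonostante le difficoltà economiche, il governo ha approvato la riforma.",
    "L'epistemologia contemporanea problematizza la nozione stessa di oggettività.",
]
# Hypothetical CEFR labels assigned for illustration only.
labels = ["A1", "A1", "B2", "C1"]

# Step (ii): turn each text into a numerical vector.
# Assumption: character n-gram TF-IDF replaces the paper's own features.
# Step (iii): train a support vector machine on those vectors.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    SVC(kernel="rbf", C=1.0, gamma="scale"),
)
model.fit(texts, labels)

# Predict a CEFR level for an unseen text.
predicted = model.predict(["Il cane mangia."])
print(predicted[0])
```

With a real corpus, the same pipeline object could be passed to `sklearn.model_selection.cross_val_score` to estimate accuracy over repeated train/test splits, which is one common way to obtain a statistically sound evaluation.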
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.