Natural language processing is one of the most active research fields in the machine learning community. In this work we propose a supervised classification system that, given a text written in Italian as input, predicts its linguistic complexity as a level of the Common European Framework of Reference for Languages (CEFR). The system was built from three components: (i) a dataset of texts labeled by linguistic experts, (ii) vectorisation procedures that transform any text into a numerical representation, and (iii) the training of a support vector machine model. Experiments were conducted following a statistically sound design, and the results show that the system reaches good prediction accuracy.
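The pipeline summarised in the abstract (labeled texts, vectorisation, support vector machine) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two surface features in `vectorise`, the four toy sentences and their labels, the restriction to two levels (`A1` and `C1`) instead of the full CEFR scale, and the hand-written linear SVM trained by subgradient descent on the hinge loss.

```python
import re


def vectorise(text):
    """Map a text to two surface features (an illustrative choice, not the paper's)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text.lower())
    if not sentences or not words:
        return [0.0, 0.0]
    avg_sentence_len = len(words) / len(sentences)
    avg_word_len = sum(len(w) for w in words) / len(words)
    # Divide by 10 so that both features lie on a similar scale.
    return [avg_sentence_len / 10.0, avg_word_len / 10.0]


def train_linear_svm(X, y, lam=0.01, eta=0.05, epochs=500):
    """Subgradient descent on the L2-regularised hinge loss; labels are -1 or +1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # The regulariser shrinks the weights at every step.
            w = [wj * (1.0 - eta * lam) for wj in w]
            if margin < 1.0:
                # Hinge loss is active: move the boundary away from this example.
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
                b += eta * label
    return w, b


def predict(model, text, labels=("A1", "C1")):
    """Return the first label for a negative score, the second otherwise."""
    w, b = model
    score = sum(wj * xj for wj, xj in zip(w, vectorise(text))) + b
    return labels[1] if score >= 0 else labels[0]


# Hypothetical toy data: -1 stands for a simple text, +1 for a complex one.
TRAIN = [
    ("Il gatto dorme. Io mangio il pane.", -1),
    ("Mi chiamo Anna. Ho un cane.", -1),
    ("Nonostante le considerevoli difficoltà interpretative, la giurisprudenza "
     "costituzionale garantisce sostanzialmente una tutela adeguata dei diritti "
     "fondamentali.", 1),
    ("Qualora le condizioni macroeconomiche peggiorassero ulteriormente, le "
     "istituzioni dovrebbero intervenire tempestivamente per scongiurare "
     "conseguenze irreversibili.", 1),
]

model = train_linear_svm([vectorise(t) for t, _ in TRAIN], [y for _, y in TRAIN])
print(predict(model, "Io ho un gatto."))
```

A realistic system would need a multi-class setup covering all CEFR levels, a much larger expert-labeled corpus, and richer linguistic features; a library implementation of support vector machines would normally replace the hand-written training loop.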
|Title:||Learning to Classify Text Complexity for the Italian Language Using Support Vector Machines|
SANTUCCI, Valentino (Corresponding)
|Publication date:||2020|
|Appears in types:||4.1 Contribution in conference proceedings|