AIWL: una lista di frequenza dell’italiano accademico (AIWL: a frequency list of academic Italian)
SPINA S
2010-01-01
Abstract
In this paper I describe the Academic Italian Word List (AIWL), a frequency list of the most common non-technical words used in written academic communication. The project arises from the need to expand the academic vocabulary of non-native students at Italian universities. The AIWL is a corpus-based list, extracted from a balanced, POS-tagged and lemmatized corpus of written academic Italian (the AIC, Academic Italian Corpus). The AIC contains 1 million words and comprises 240 texts from different subject areas and text types. The lexical units extracted from the AIC (single words as well as word combinations) are ordered by frequency and selected using a statistical measure of dispersion across the subject areas. The AIWL aims to provide a computational and lexicographical resource to support the development of natural language processing applications for use in an online learning environment. This paper describes in detail the theoretical assumptions and the extraction methodology of the frequency list, and outlines the main features of the lexical units included in the AIWL.

File | Access | License | Size | Format
---|---|---|---|---
JADT2010.pdf | Not available | Not specified | 1.31 MB | Adobe PDF