AIWL: una lista di frequenza dell’italiano accademico (AIWL: a frequency list of academic Italian)

SPINA S
2010-01-01

Abstract

In this paper I describe the Academic Italian Word List (AIWL), a frequency list of the most common non-technical words used in written academic communication. The project arises from the need to expand the academic vocabulary of non-native students at Italian universities. The AIWL is a corpus-based list, extracted from a balanced, POS-tagged and lemmatized corpus of written academic Italian (the AIC, Academic Italian Corpus). The AIC contains 1 million words and consists of 240 texts belonging to different subject areas and text types. The lexical units extracted from the AIC (single words as well as word combinations) are ordered by frequency and selected by a statistical measure of dispersion across the different subject areas. The AIWL aims to provide a computational and lexicographical resource to support the development of natural language processing applications for use in an online learning environment. This paper describes in detail the theoretical assumptions and the extraction methodology behind the frequency list, and it outlines the main features of the lexical units included in the AIWL.
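The selection step summarized above (order by frequency, keep only units that are well dispersed across subject areas) can be sketched roughly as follows. The abstract does not name the dispersion measure, so this sketch assumes Juilland's D purely for illustration; the function names, the 0.6 threshold, and the toy counts are all hypothetical and are not taken from the AIC.

```python
from math import sqrt
from statistics import mean, pstdev


def juilland_d(freqs_per_area):
    """Juilland's D for one lemma (assumed measure, not confirmed by the abstract).

    freqs_per_area: frequencies of the lemma in each subject area,
    assumed to come from subcorpora of equal size.
    Returns about 1.0 for a perfectly even spread and about 0.0 when
    the lemma occurs in a single area only.
    """
    n = len(freqs_per_area)
    m = mean(freqs_per_area)
    if n < 2 or m == 0:
        return 0.0
    variation = pstdev(freqs_per_area) / m  # coefficient of variation
    return 1 - variation / sqrt(n - 1)


def select_lemmas(counts, min_dispersion=0.6):
    """Keep lemmas whose dispersion reaches the (hypothetical) threshold,
    ordered by total frequency, highest first."""
    scored = [
        (lemma, sum(freqs), juilland_d(freqs))
        for lemma, freqs in counts.items()
    ]
    kept = [row for row in scored if row[2] >= min_dispersion]
    return sorted(kept, key=lambda row: row[1], reverse=True)


# Invented toy counts over four subject areas (not real AIC data).
toy_counts = {
    "analisi": [30, 28, 32, 30],  # evenly spread: general academic word
    "ipotesi": [12, 10, 14, 12],  # evenly spread, less frequent
    "enzima": [0, 0, 0, 40],      # confined to one area: technical term
}

ranked = select_lemmas(toy_counts)
for lemma, total, d in ranked:
    print(f"{lemma}\t{total}\t{d:.3f}")
```

In this toy run the technical term is filtered out because all of its occurrences fall in one subject area, while the two evenly distributed lemmas are kept and ranked by total frequency. Any other dispersion statistic could be substituted for `juilland_d` without changing the rest of the sketch.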
2010
9788879164509
File: JADT2010.pdf (Adobe PDF, 1.31 MB), not available; license: not specified.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12071/2442