A corpus based statistical analysis of the peculiar lexicon of Italian written and spoken academic discourse through the usage of two parameters: frequency and keyness

Peppoloni D
2014-01-01

Abstract

This research is based on the actual learning and linguistic needs observed among non-native students, who must become increasingly aware of how to act and how to communicate in everyday academic situations. Our aim is to disseminate the lexicon of spoken and written academic Italian through computational methods based on corpora and statistical tools. The academic lexicon is closely tied to the learning and teaching activities that students encounter daily in lessons, exams, conferences and so on. Immediate access to the meaning of a word allows students to focus on the content they are trying to explain rather than on how to express it. Both the frequency with which certain words are used in a text and their salience in defining its meaning can provide important cues about the text and about its author, because the choices made are never fortuitous (Archer 2009). After developing a corpus of spoken and a corpus of written academic Italian, we compiled two frequency lists of the non-technical Italian words widely used in written and oral academic communication. The corpora comprise over one million words drawn from different subject areas, text types and communicative situations. The lexical units extracted from them are ordered by frequency and selected by means of a statistical measure of dispersion across these subject areas. The frequency of the lexical items in a corpus is a key element in all tasks involving the recognition and comprehension of words, whether spoken or written. This factor influences reading, writing and productive skills, as well as the processes of language acquisition and development, especially when dealing with a specific, non-generic variety such as the academic lexicon. The lists respond directly to the need to expand the academic lexicon of non-native students learning Italian as a second language at university. To extract relevant contextual cues, we added to the frequency parameter the index of keyness, which identifies the keywords of a text, that is, the words that best typify it. The project evaluates the validity of these statistical measures, reflecting on their ability to interpret and describe the linguistic context in which textual data are embedded, in this case the academic one.
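As a rough illustration of the three measures named in the abstract, the sketch below computes a frequency list, a dispersion score and a keyness score on toy data. The abstract does not say which dispersion measure or keyness statistic was used, so Juilland's D and the log-likelihood statistic are assumptions chosen here because they are common in corpus linguistics. The function names and the tiny Italian token lists are hypothetical, and each subject area is assumed to be non-empty.

```python
import math
from collections import Counter


def frequency_list(sections):
    """Count word occurrences over all sections, most frequent first."""
    total = Counter()
    for tokens in sections.values():
        total.update(tokens)
    return total.most_common()


def juilland_d(word, sections):
    """Juilland's D (assumed measure) on per-section relative frequencies.

    Values near 1 mean the word is spread evenly across sections;
    values near 0 mean it is concentrated in a single section.
    """
    rel = [tokens.count(word) / len(tokens) for tokens in sections.values()]
    mean = sum(rel) / len(rel)
    if mean == 0 or len(rel) < 2:
        return 0.0
    # Population standard deviation of the relative frequencies.
    sd = math.sqrt(sum((r - mean) ** 2 for r in rel) / len(rel))
    return 1 - (sd / mean) / math.sqrt(len(rel) - 1)


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood keyness (assumed statistic) of one word,
    comparing a target corpus with a reference corpus."""
    total = freq_target + freq_ref
    combined_size = size_target + size_ref
    expected_target = size_target * total / combined_size
    expected_ref = size_ref * total / combined_size
    score = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


# Hypothetical toy data: one token list per subject area.
sections = {
    "law": "la analisi del testo mostra la ipotesi".split(),
    "physics": "la ipotesi del modello e la analisi".split(),
    "history": "la analisi del periodo e la fonte".split(),
}

if __name__ == "__main__":
    for word, count in frequency_list(sections)[:5]:
        print(word, count, round(juilland_d(word, sections), 3))
    print(round(log_likelihood(30, 1000, 10, 1000), 2))
```

In a pipeline of this kind, a word would be kept for the academic list only if it is both frequent and well dispersed across subject areas, while keyness would be computed against a separate reference corpus of general Italian. Any cut-off values would have to come from the study itself.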
Keywords: Quantitative analysis; Corpus Linguistics; Statistical Index


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12071/11212