Developing a learner dictionary of collocations: description and evaluation of a multi-method approach
Spina, Stefania; Fioravanti, Irene; Zanda, Fabio; Forti, Luciana; Gervasi, Osvaldo
2026-01-01
Abstract
This study describes and evaluates a multi-method approach to identifying and extracting collocations for a learner dictionary of Italian collocations. The approach integrates part-of-speech tagging and dependency parsing to extract six syntactic relations from a reference corpus of Italian. The initial set of candidates was progressively reduced using frequency, dispersion, and association measures. The resulting set was then evaluated by comparing it with existing collocation dictionaries and by gathering expert judgments on which collocations should be included. Combining these two evaluations further refined the list. The effect of the statistical measures on expert judgments was also investigated. Results revealed that dispersion and association measures positively influenced human evaluations, while higher frequency often correlated with negative ratings. This triangulation of corpus-based and statistical methods, human judgments, and comparison with existing dictionaries captures collocations that are widely used across genres and suitable for inclusion in a learner dictionary, offering a useful tool for learners while contributing to corpus-based collocation research.

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| 10.1515_cllt-2025-0008.pdf | Open access | Publisher's version (PDF) | Creative Commons | 2.36 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
