This article reports experiments on the automatic induction of Italian semantic verb classes using k-Means, a standard clustering technique, in order to verify whether a tight connection between the meaning components of a verb and its syntactic behaviour can plausibly be found. The theoretical foundation was laid by extensive work on semantic verb classes, such as Levin (1993) for English and Schulte im Walde (2002, 2003, 2004, 2006) for German: each verb class contains verbs that are similar both in their meaning and in their syntactic properties. Building on this hypothesis, we conducted a corpus-based study on the “La Repubblica” corpus, one of the leading freely available corpora for Italian, and used it to obtain an automatic classification of a sample of Italian verbs. Using probability distributions over verb subcategorisation frames, we obtained an intuitively plausible clustering of 200 verbs into 40, 24 and 10 classes. The automatic clustering was evaluated against independently motivated, hand-constructed semantic verb classes. A series of post-hoc cluster analyses explored the influence of specific frames and frame groups on the coherence of the verb classes, and supported the validity of the syntactic-semantic hypothesis.
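The abstract does not give implementation details. The following is therefore only a minimal sketch of the general pipeline it describes: each verb is represented as a probability distribution over subcategorisation frames, the verbs are clustered with k-Means, and the clusters are compared with hand-constructed classes. Everything concrete here is hypothetical. The frame labels (`n`, `na`, `nad`, `np`), the counts and the gold classes are invented toy data. Squared Euclidean distance, deterministic farthest-first initialisation and purity as the evaluation measure are assumptions. They are not necessarily the choices made in the study.

```python
from collections import Counter

# Hypothetical frame inventory: n = subject only, na = subject + direct object,
# nad = subject + direct + indirect object, np = subject + prepositional phrase.
FRAMES = ["n", "na", "nad", "np"]

# Invented toy frame counts per verb (NOT data from the "La Repubblica" study).
FRAME_COUNTS = {
    "dare":     {"n": 5, "na": 30, "nad": 60, "np": 5},
    "regalare": {"na": 25, "nad": 70, "np": 5},
    "mangiare": {"n": 30, "na": 65, "np": 5},
    "bere":     {"n": 35, "na": 60, "np": 5},
    "andare":   {"n": 30, "np": 70},
    "arrivare": {"n": 45, "np": 55},
}

# Hypothetical hand-constructed gold classes used only for evaluation.
GOLD = {"dare": "possession", "regalare": "possession",
        "mangiare": "consumption", "bere": "consumption",
        "andare": "motion", "arrivare": "motion"}


def to_distribution(counts):
    """Turn raw frame counts into a probability distribution over FRAMES."""
    total = sum(counts.values())
    return [counts.get(f, 0) / total for f in FRAMES]


def sq_dist(a, b):
    """Squared Euclidean distance (an assumed choice of distance measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100):
    """Plain k-Means with deterministic farthest-first initialisation."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each verb goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def purity(labels, gold_classes):
    """Fraction of verbs that belong to the majority gold class of their cluster."""
    clusters = {}
    for lab, g in zip(labels, gold_classes):
        clusters.setdefault(lab, []).append(g)
    majority = sum(Counter(members).most_common(1)[0][1] for members in clusters.values())
    return majority / len(labels)


verbs = list(FRAME_COUNTS)
vectors = [to_distribution(FRAME_COUNTS[v]) for v in verbs]
labels = kmeans(vectors, k=3)
for verb, lab in zip(verbs, labels):
    print(verb, lab)
print("purity:", purity(labels, [GOLD[v] for v in verbs]))
```

On this deliberately well-separated toy data, the three clusters coincide with the three invented gold classes (purity 1.0). Real frame distributions are far noisier. A study like the one summarised above would also compare several values of k, as the abstract does with 40, 24 and 10 classes.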
|Title:|Analisi e classificazione automatica dei verbi italiani: uno studio sul corpus “La Repubblica” (Analysis and automatic classification of Italian verbs: a study on the “La Repubblica” corpus)|
|Publication date:|2013|
|Publication type:|1.1 Journal article|