Analisi e classificazione automatica dei verbi italiani: uno studio sul corpus “La Repubblica” (Analysis and automatic classification of Italian verbs: a study on the “La Repubblica” corpus)

Peppoloni D
2013-01-01

Abstract

This article reports experiments on the automatic induction of Italian semantic verb classes using k-Means, a standard clustering technique, in order to test whether a tight connection can plausibly be found between the meaning components of a verb and its syntactic behaviour. The theoretical foundation has been established in extensive work on semantic verb classes, such as Levin (1993) for English and Schulte im Walde (2002, 2003, 2004, 2006) for German: each verb class contains verbs that are similar in both their meaning and their syntactic properties. Building on this hypothesis, we conducted a corpus-based study on the “La Repubblica” corpus, one of the leading freely available corpora for Italian, to obtain an automatic classification of a sample of Italian verbs. Using probability distributions over verb subcategorisation frames, we obtained an intuitively plausible clustering of 200 verbs into 40, 24 and 10 classes. The automatic clustering was evaluated against independently motivated, hand-constructed semantic verb classes. A series of post-hoc cluster analyses explored the influence of specific frames and frame groups on the coherence of the verb classes and supported the validity of the syntactic-semantic hypothesis.
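As a rough illustration of the approach summarised in the abstract, the sketch below clusters verbs by their probability distributions over subcategorisation frames with a plain k-Means loop. It is not the study's implementation: the six verbs, the three frame types, the probabilities, the squared Euclidean distance and the farthest-first initialisation are all assumptions chosen to keep the example small and deterministic.

```python
# Hypothetical toy data: each verb maps to a probability distribution over
# three invented frame types (intransitive, transitive, clausal complement).
# None of these figures come from the "La Repubblica" corpus.
VERB_FRAMES = {
    "dormire":  [0.90, 0.05, 0.05],
    "correre":  [0.85, 0.10, 0.05],
    "mangiare": [0.20, 0.75, 0.05],
    "leggere":  [0.15, 0.80, 0.05],
    "dire":     [0.05, 0.25, 0.70],
    "pensare":  [0.10, 0.15, 0.75],
}


def squared_distance(p, q):
    """Squared Euclidean distance between two frame distributions."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def farthest_first(data, names, k):
    """Deterministic initialisation (an assumption of this sketch):
    start from the first verb, then repeatedly add the verb that lies
    farthest from all centroids chosen so far."""
    centroids = [list(data[names[0]])]
    while len(centroids) < k:
        far = max(
            names,
            key=lambda n: min(squared_distance(data[n], c) for c in centroids),
        )
        centroids.append(list(data[far]))
    return centroids


def k_means(data, k, max_iter=100):
    """Return a dict mapping each verb to a cluster index in range(k)."""
    names = sorted(data)
    centroids = farthest_first(data, names, k)
    assignment = {}
    for _ in range(max_iter):
        # Assignment step: attach each verb to its nearest centroid.
        new_assignment = {
            n: min(range(k), key=lambda c: squared_distance(data[n], centroids[c]))
            for n in names
        }
        if new_assignment == assignment:
            break  # no verb changed cluster, so the loop has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [data[n] for n in names if assignment[n] == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return assignment


clusters = k_means(VERB_FRAMES, k=3)
for verb in sorted(clusters, key=lambda v: (clusters[v], v)):
    print(clusters[verb], verb)
```

On this toy input, verbs with similar frame preferences end up in the same cluster. A study of the kind described would presumably use far more frame types and a corpus-derived distribution for each of the 200 verbs, and might use a different distance measure.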
2013
syntactic/semantic interface; automatic verbal classification; Italian verbal system

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12071/11177