Analisi e classificazione automatica dei verbi italiani: uno studio sul corpus “La Repubblica”

Peppoloni, Diana

AUC PHILOLOGICA

AUC Philologica (Acta Universitatis Carolinae Philologica) is an academic journal that publishes linguistic as well as literary-historical and theoretical studies. Reviews of scholarly books and reports from the academic community are also an integral part of the journal.

The journal is indexed in the CEEOL, DOAJ, EBSCO, and ERIH PLUS databases.

AUC PHILOLOGICA, Vol 2013 No 2 (2013), 91–108

Published: 29 December 2014

Abstract

An Analysis and Automatic Classification of Italian Verbs: A Study Based on the “La Repubblica” Corpus

This article reports experiments on the automatic induction of Italian semantic verb classes using k-Means, a standard clustering technique, in order to test whether a direct connection can plausibly be found between the meaning-bearing components of a verb and its syntactic behaviour. The theoretical foundation comes from extensive work on semantic verb classes, such as Levin (1993) for English and Schulte im Walde (2002, 2003, 2004, 2006) for German: each verb class contains verbs that are similar in their meaning and in their syntactic properties. Starting from this hypothesis, we studied the “La Repubblica” corpus, one of the leading freely available corpora for the Italian language, and then derived an automatic classification of a sample of Italian verbs. Using probability distributions over verb subcategorisation frames, we obtained an intuitively plausible clustering of 200 verbs into 40, 24, and 10 classes. The automatic clustering was evaluated against independently motivated, hand-constructed semantic verb classes. A series of post-hoc cluster analyses explored the influence of specific frames and frame groups on the coherence of the verb classes, and supported the validity of the syntactic-semantic hypothesis.
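The clustering step described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the six verbs, the three frame types, the probability values, the squared Euclidean distance, and the deterministic farthest-first seeding (chosen only so the toy run is reproducible).

```python
# Hypothetical toy data: each verb maps to a probability distribution over
# three invented frame types (intransitive, transitive, sentential complement).
# The values are illustrative only and do not come from the corpus.
FRAME_DISTRIBUTIONS = {
    "correre":   [0.80, 0.15, 0.05],
    "camminare": [0.85, 0.10, 0.05],
    "mangiare":  [0.20, 0.75, 0.05],
    "bere":      [0.25, 0.70, 0.05],
    "dire":      [0.05, 0.35, 0.60],
    "pensare":   [0.10, 0.25, 0.65],
}


def squared_distance(p, q):
    """Squared Euclidean distance (an assumed similarity measure)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def farthest_first_seeds(data, names, k):
    """Deterministic seeding: start from the first name, then repeatedly
    add the item farthest from all seeds chosen so far."""
    seeds = [names[0]]
    while len(seeds) < k:
        seeds.append(max(
            names,
            key=lambda n: min(squared_distance(data[n], data[s]) for s in seeds),
        ))
    return [list(data[s]) for s in seeds]


def k_means(data, k, max_iter=100):
    """Cluster `data` (name -> vector) into k groups; returns name -> cluster id."""
    names = sorted(data)
    centroids = farthest_first_seeds(data, names, k)
    assignment = {}
    for _ in range(max_iter):
        # Assignment step: attach each verb to its nearest centroid.
        new_assignment = {
            n: min(range(k), key=lambda c: squared_distance(data[n], centroids[c]))
            for n in names
        }
        if new_assignment == assignment:
            break  # converged: no verb changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [data[n] for n in names if assignment[n] == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_vector(members)
    return assignment


clusters = k_means(FRAME_DISTRIBUTIONS, k=3)
for verb in sorted(clusters, key=lambda v: (clusters[v], v)):
    print(clusters[verb], verb)
```

On this toy input, verbs with similar frame distributions end up in the same cluster. Because k-Means is sensitive to its starting centroids, a different seeding strategy can yield a different partition on real data.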

Keywords: syntactic-semantic hypothesis; clustering; automatic classification; subcategorization frames; written Italian corpus

Keywords (Italian): ipotesi sintattico-semantica; clustering; classificazione automatica; frames di sottocategorizzazione; corpus dell’italiano scritto

157 x 230 mm
published: 3 times a year
price of print issue: 150 CZK
ISSN: 0567-8269
E-ISSN: 2464-6830

Download

Philologica_2_2013_final_07_Peppoloni.pdf
