AUC Philologica (Acta Universitatis Carolinae Philologica) is an academic journal publishing linguistic as well as literary-historical and theoretical studies. Reviews of scholarly books and reports from the academic community are also an integral part of the journal.
The journal is indexed in the CEEOL, DOAJ, EBSCO and ERIH PLUS databases.
AUC PHILOLOGICA, Vol 2013 No 2 (2013), 91–108
Analisi e classificazione automatica dei verbi italiani: uno studio sul corpus “La Repubblica”
Diana Peppoloni
published: 29 December 2014
Abstract
An Analysis and Automatic Classification of Italian Verbs: a Study Based on the “La Repubblica” Corpus
This article is concerned with experiments on the automatic induction of Italian semantic verb classes using k-Means, a standard clustering technique, for the purpose of verifying the plausibility of a direct connection between the meaning-bearing components of a verb and its syntactic behaviour. A theoretical foundation has been established in extensive works on semantic verb classes such as Levin (1993) for English and Schulte im Walde (2002, 2003, 2004, 2006) for German: each verb class contains verbs which are similar in their meaning and in their syntactic properties. Basing our work on this hypothesis, we conducted a study of the “La Repubblica” corpus, one of the leading corpora freely available for the Italian language, in order to obtain an automatic classification of a sample of Italian verbs. Using probability distributions over verb subcategorisation frames, we obtained an intuitively plausible clustering of 200 verbs into 40, 24, and 10 classes. The automatic clustering was evaluated against independently motivated, hand-constructed semantic verb classes. A series of post-hoc cluster analyses explored the influence of specific frames and frame groups on the coherence of the verb classes, and supported the validity of the syntactic-semantic hypothesis.
keywords: syntactic-semantic hypothesis; clustering; automatic classification; subcategorization frames; written Italian corpus
keywords (Italian): ipotesi sintattico-semantica; clustering; classificazione automatica; frames di sottocategorizzazione; corpus dell’italiano scritto
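The approach summarised in the abstract (k-Means over probability distributions of subcategorisation frames) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the four verbs, the three frame labels, the counts, the squared Euclidean distance, and the choice of initial centroids. The article's actual features, distance measure and initialisation may differ.

```python
# Hypothetical toy frame counts per verb (illustrative only, not data from the article).
FRAMES = ["subj", "subj_obj", "subj_pp"]
FRAME_COUNTS = {
    "mangiare": {"subj": 10, "subj_obj": 30, "subj_pp": 2},
    "bere":     {"subj": 12, "subj_obj": 28, "subj_pp": 1},
    "andare":   {"subj": 8,  "subj_obj": 1,  "subj_pp": 35},
    "venire":   {"subj": 10, "subj_obj": 0,  "subj_pp": 30},
}


def to_distribution(counts):
    """Turn raw frame counts into a probability distribution over FRAMES."""
    total = sum(counts.values())
    return [counts.get(frame, 0) / total for frame in FRAMES]


def sq_dist(a, b):
    """Squared Euclidean distance (an assumed choice of distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def kmeans(points, initial_centroids, max_iter=100):
    """Plain k-Means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = [list(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments are stable, so the algorithm has converged
        assignment = new_assignment
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = mean(members)
    return assignment


verbs = list(FRAME_COUNTS)
points = [to_distribution(FRAME_COUNTS[v]) for v in verbs]
# Assumed initialisation: one centroid seeded from "mangiare", one from "andare".
labels = kmeans(points, initial_centroids=[points[0], points[2]])

clusters = {}
for verb, label in zip(verbs, labels):
    clusters.setdefault(label, []).append(verb)
print(clusters)
```

On this toy input the verbs that mostly take a direct object fall into one cluster and those that mostly take a prepositional phrase into the other, which is the kind of syntax-driven grouping the syntactic-semantic hypothesis predicts.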
230 x 157 mm
frequency: 3 issues per year
price per print issue: 150 CZK
ISSN: 0567-8269
E-ISSN: 2464-6830