Disambiguation of Rich Inflection (Computational Morphology of Czech)

Hajič, Jan

subjects: linguistics

paperback, 330 pp., 1st edition
published: July 2004
ISBN: 80-246-0282-2
recommended price: 190 CZK

summary

Jan Hajič specializes in computer software and statistics in linguistics. He works with a wide range of leading Czech and international linguists and programmers on the Czech morphological frequency dictionary. This book is the product of a long-term collective project on the computational morphology of the Czech language. The stochastic disambiguation procedure outlined in this study represents the first systematic treatment of languages of the inflective type, to which most Slavic languages, as well as Latin, Ancient Greek and other ancient Indo-European languages, belong. The monograph may therefore be of broader interest: it is useful not only for the processing of Czech, but also for comparative research and for handling the corresponding phenomena in other languages.

table of contents

1 Introduction
2 Formal Morphology of Czech
2.1 Formal Morphology
2.2 Introduction to the Czech Tag System
2.3 The Positional Tag System for Czech
2.4 The Compact Tag System for Czech
3 The Paradigm System
3.1 The Paradigm Field
3.2 The Negation Field
3.3 The List of Endings
3.4 Paradigm Names
4 The Morphological Dictionary
4.1 The Root Field
4.2 The Paradigm Field
4.3 The Lemma Field
4.4 The Tag Field
4.5 The Comment Field
4.6 The Alternate Part of Speech Field
4.7 The Style Field
4.8 The Term Field
5 Analysis and Generation of Word Forms
5.1 Direct Analysis and Generation
5.2 The Data
5.3 The Analysis Algorithm
5.4 The Generation Algorithm
5.5 Other Algorithms
6 Morphological Disambiguation (Tagging)
6.1 Tagging - a step aside?
6.2 Orthogonality of Morphological Categories
6.3 The Training Data
6.4 The Model
6.5 Training
6.6 Results
6.7 Tagging Other Languages Using Scarce Resources
6.8 Conclusion and Further Research
6.9 Lemmatization and Word Sense Disambiguation
7 Resources & References
A Czech Morphological Data
A.1 Positional Tags - Quick Reference
A.2 The Endings