Comparison of open-source morphological analyzers, in English
Comparison of foma and sfst
Feature | Foma | SFST |
---|---|---|
Documentation | Good | Good |
Installation | Simple | Simple |
Usage | Well documented, easy | Well documented, easy |
Morphological analysis | Well documented | Well documented |
Detailed documentation | Fsmbook applies 100% to foma | Not available |
Internal naming | Strict rules: almost only alphabetic characters are allowed, which makes code easy to read and to syntax-check | Almost all characters are allowed, which can make programs hard to read and makes syntax checking difficult |
Regular expression facilities | Rich (for example, .#. marks a word boundary) | Basic |
Substitution of a variable with multiple characters | Easy | Not possible |
"Otherwise" rule* | Not available | Available, as default |
Code readability | Good | Somewhat reduced by the need to use multiple alphabets and by the extremely permissive internal naming |
Code inclusion | Not available in lexc | Easy, using #include |
Word list in extra file | Not possible in lexc | Easy, using lex files |
Variable over- and underdefinition | Handled well and intuitively | Somewhat cumbersome; requires additional user tools |
Handling of words with multiple stems | Easy and intuitive | Somewhat cumbersome |
Agreement variables | Not available | Available |
Debugging | Good, detailed information | Good, detailed information |
Built-in syntax checking | Good | Poor |
Weighted fst** | Not available | Not available |
Result evaluation | Good | Excellent, using fst-generate |
Support from author or other users | Good | Good |
- *The "otherwise" rule works as follows: if, for example, a character can take three values, a, e and o, and the user defines rules for a and e, the user can declare o as the "otherwise" value. In every case where neither the rule for a nor the rule for e applies, the character then takes the value o.
- **Weighted fst: some grammatical variants are used more often than others. For example, the Hungarian third-person possessive is expressed with a/e or with ja/je: both tor-a and tor-ja can be said for 'his tor'. For translation applications it would help if the more frequent variant carried a weight: the program would then generate the more frequent variant but still understand the less frequent one. Which variant is more frequent differs from word to word and must be set for each word (in some cases, for each word group) individually.
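The two footnoted notions can be illustrated with a minimal sketch in plain Python. It is not foma or sfst code and does not reproduce either tool's syntax. The rule conditions, the function names and the weights are all invented for illustration; in particular, the weights do not claim which Hungarian variant is actually more frequent.

```python
# Illustrative sketch only: plain Python, not foma or sfst syntax.

def realize(context, rules, otherwise):
    """Return the value of the first rule whose condition holds.

    If no rule applies, fall back to the "otherwise" value.
    """
    for condition, value in rules:
        if condition(context):
            return value
    return otherwise


# Hypothetical toy rules for a character that can surface as a, e or o.
# Only a and e get explicit rules; o is the "otherwise" value.
TOY_RULES = [
    (lambda stem: stem.endswith("k"), "a"),
    (lambda stem: stem.endswith("t"), "e"),
]

# Hypothetical weights (lower = preferred); the numbers are made up.
TOY_WEIGHTS = {
    "tor": {"tora": 1.0, "torja": 2.0},
}


def generate(stem):
    """Generate only the preferred (lowest-weight) variant."""
    variants = TOY_WEIGHTS[stem]
    return min(variants, key=variants.get)


def recognize(stem, form):
    """Accept every listed variant, whatever its weight."""
    return form in TOY_WEIGHTS.get(stem, {})


if __name__ == "__main__":
    print(realize("bor", TOY_RULES, otherwise="o"))  # no rule applies
    print(generate("tor"))
    print(recognize("tor", "torja"))
```

In this sketch the weights only affect generation: analysis accepts both variants, which is the behaviour the second footnote asks for.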
Other available free tools
Openfst and hfst.
Openfst is a complete fst implementation. Its documentation does not cover morphology handling, so this tool is not suited to a first morphology implementation.
Hfst is a lexc/twolc implementation with some additional tools; it needs an underlying fst implementation such as openfst, sfst or foma.
Since hfst supports twolc, an obsolete Xerox tool, instead of the more current xfst, its usefulness is questionable. The documentation of hfst is poor: it does not contain a single full, working morphology example. Its usefulness with foma is questionable, since foma is based on the more modern xfst/lexc pair. Support for sfst is half-hearted; for example, the Faroese lexc file cannot be compiled with sfst as the format. Hfst cannot show the intermediate format for sfst, only an internal binary hfst format. Using hfst might make sense in connection with openfst, which is also underdocumented.
Links
- Open fst home page
- Hfst home page
- Hfst wiki
- sfst home page
- foma homepage
- Sfst guide for agglutinating languages in English
- Systems and Frameworks for Computational Morphology, By Cerstin Mahlow, Michael Piotrowski. ISBN 978-3-642-23137-7
- HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers
- Foma for Basque