Comparison of open-source morphological analyzers (in English)

From the Programozás Wiki

Comparison of foma and sfst

| Feature | foma | sfst |
|---|---|---|
| Usage | Well documented, easy | Well documented, easy |
| Morphological analysis | Well documented | Well documented |
| Detailed documentation | Fsmbook applies 100% to foma | Not available |
| Internal naming | Strict rules; almost only alphabetic characters are allowed, which makes reading and syntax checking easy | Almost all characters are allowed, which can make programs hard to read and syntax checking hard |
| Regular expression facilities | Rich (for example, .#. marks a word boundary) | Basic |
| Substitution of a variable with multiple characters | Easy | Not possible |
| "Otherwise" rule* | Not available | Available, as default |
| Code readability | Good | Somewhat reduced by the need to use multiple alphabets and by the extremely permissive internal naming |
| Code inclusion | Not available in lexc | Easy, using #include |
| Word list in a separate file | Not possible in lexc | Easy, using lex files |
| Variable over- and underdefinition | Handled nicely and intuitively | Somewhat cumbersome; requires additional user tools |
| Handling of words with multiple stems | Easy and intuitive | Somewhat cumbersome |
| Agreement variables | Not available | Available |
| Debugging | Good, detailed information | Good, detailed information |
| Built-in syntax checking | Good | Poor |
| Weighted fst** | Not available | Not available |
| Result evaluation | Good | Excellent, using fst-generate |
| Support from author or other users | Good | Good |
  • *"Otherwise" rule: if, for example, a character can take three values (a, e and o) and the user defines rules for a and e, the user can declare o as the "otherwise" value. The character then takes the value o in every case where neither the rule for a nor the rule for e applies.
  • **Weighted fst: some grammatical variants are used often, others only rarely. For example, the Hungarian third-person possessive is expressed with a/e or with ja/je: one can say tor-a, but also tor-ja, for 'his tor'. For translation applications it would help if the more frequent variant carried a weight: the program would then generate the more frequent variant but still understand the less frequent one. Which variant is more frequent differs from word to word, so it must be set for each word (in some cases for each word group) individually.
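The two footnotes can be illustrated with a small sketch in plain Python. This is not foma or sfst syntax, and neither tool is being called here. The rule conditions in RULES, the function names, and the numeric weights are all invented for illustration; in particular, the weights make no claim about which of tor-a and tor-ja is really more common.

```python
# Illustrative sketch only: plain Python, not foma or sfst code.

def choose_vowel(stem, rules, otherwise="o"):
    """Return the vowel of the first rule whose condition holds for the stem.

    If no rule applies, fall back to the 'otherwise' value.
    """
    for condition, vowel in rules:
        if condition(stem):
            return vowel
    return otherwise


# Hypothetical conditions for 'a' and 'e'; 'o' needs no rule of its own.
RULES = [
    (lambda stem: stem.endswith("k"), "a"),
    (lambda stem: stem.endswith("t"), "e"),
]

# Hypothetical per-word weights (lower means preferred). The numbers are
# made up; real weights would have to be set word by word.
POSSESSIVE_VARIANTS = {
    "tor": [("tora", 0.0), ("torja", 1.0)],
}


def generate(stem, lexicon):
    """Generation: emit only the lowest-weight variant of the stem."""
    form, _weight = min(lexicon[stem], key=lambda pair: pair[1])
    return form


def analyze(form, lexicon):
    """Analysis: accept every listed variant, regardless of its weight."""
    return [
        stem
        for stem, variants in lexicon.items()
        if any(form == variant for variant, _weight in variants)
    ]
```

With these made-up values, choose_vowel("ham", RULES) falls through both rules and returns the "otherwise" value o, generate("tor", POSSESSIVE_VARIANTS) returns only the lower-weight form, and analyze accepts both tora and torja.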

Other available free tools

Openfst and hfst.

Openfst is a complete fst implementation. Its documentation does not cover morphology handling, so it is not a good choice for a first morphology implementation.

Hfst is a lexc/twolc implementation with some additional tools. It needs an underlying fst implementation such as openfst, sfst or foma.

Hfst supports twolc, an obsolete Xerox tool, rather than the more current xfst, so its usefulness is questionable. The hfst documentation is poor: it does not contain a single complete, working morphology example. Using hfst with foma is of doubtful value, since foma is based on the more modern xfst/lexc pair. Support for sfst is half-hearted; for example, the Faroese lexc file cannot be compiled with sfst as the format. Hfst cannot show the intermediate format for sfst, only an internal binary hfst format. Using hfst might make sense in connection with openfst, which is also underdocumented.