Difference between revisions of "Comparison of open source morphological analyzers in English"

From: Programozás Wiki
* [http://code.google.com/p/foma/ foma homepage]
* [[SFST útmutató agglutináló nyelvekhez angolul|Sfst guide for agglutinating languages in English]]
* Systems and Frameworks for Computational Morphology, By Cerstin Mahlow, Michael Piotrowski. ISBN 978-3-642-23138-4
[[category: Hfst]]
[[category: Sfst]]
[[category: Morfológia]]

Revision as of 12:11, 6 February 2012

Comparison of foma and sfst

{| class="wikitable"
! Feature !! foma !! sfst
|-
| Usage || Well documented, easy || Well documented, easy
|-
| Morphological analysis || Well documented || Well documented
|-
| Detailed documentation || Fsmbook applies 100% to foma || Not available
|-
| Internal naming || Strict rules: almost only alphabetic characters are allowed, which makes code easy to read and to syntax-check || Almost all characters are allowed, which can make programs hard to read and makes syntax checking difficult
|-
| Regular expression facilities || Rich (for example, .#. marks a word boundary, such as the beginning of a word) || Basic
|-
| Code readability || Good || Somewhat reduced by the need for multiple alphabets and by the extremely permissive internal naming
|-
| Code inclusion || Not available || Easy, using #include
|-
| Word list in a separate file || Not possible || Easy, using lex files
|-
| Variable over- and underdefinition || Nicely handled, intuitive || Somewhat cumbersome; requires additional user tools
|-
| Handling of multiple-stem words || Easy and intuitive || Somewhat cumbersome
|-
| Agreement variables || Not available || Available
|-
| Debugging || Good, detailed information || Good, detailed information
|-
| Built-in syntax checking || Good || Poor
|-
| Weighted fst* || Not available || Not available
|-
| Result evaluation || Good || Excellent, using fst-generate
|-
| Support from author || Good || Good
|}
  • *Weighted fst: Some grammatical variants are used more often than others. For example, the Hungarian third-person possessive is expressed with a/e or with ja/je: both tor-a and tor-ja mean 'his tor'. For translation applications it would help if the more frequent variant carried a weight: the program would then generate the more frequent variant but still understand the less frequent one. Which variant is more frequent differs from word to word, so the weights must be set individually for each word (in some cases for each word group).
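The weighting idea in the note above can be sketched in a few lines of Python. This is only an illustration of the behaviour, not a finite-state implementation and not something foma or sfst provides. The lexicon `POSS_3SG`, the tag `+Poss3Sg` and the weight values are all hypothetical; in particular, the numbers say nothing about which of tor-a and tor-ja is really more common. Lower weight is assumed to mean "preferred", following the usual tropical-semiring convention of weighted transducers.

```python
from typing import Optional

# Hypothetical lexicon: stem -> {surface form: weight}.
# Lower weight = preferred variant. The numbers are invented for
# illustration and must be chosen per word in a real system.
POSS_3SG: dict[str, dict[str, float]] = {
    "tor": {"tora": 0.0, "torja": 1.0},
}

TAG = "+Poss3Sg"  # hypothetical analysis tag


def generate(stem: str) -> Optional[str]:
    """Return the lowest-weight (preferred) possessive form of a stem."""
    variants = POSS_3SG.get(stem)
    if not variants:
        return None
    return min(variants, key=variants.get)


def analyze(surface: str) -> list[tuple[str, float]]:
    """Return all analyses of a surface form, preferred variants first."""
    results = []
    for stem, variants in POSS_3SG.items():
        if surface in variants:
            results.append((stem + TAG, variants[surface]))
    return sorted(results, key=lambda r: r[1])
```

With these assumed weights, `generate("tor")` picks only the preferred form, while `analyze` accepts both `"tora"` and `"torja"` and reports the weight of the variant it was given.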

Other available free tools

Openfst and hfst.

Openfst is a complete fst implementation. Its documentation does not cover morphology, so it is not a good choice for a first morphology implementation.

Hfst is a lexc/twolc implementation with some additional tools. It needs an underlying fst implementation such as openfst, sfst or foma.

Since hfst supports twolc, an obsolete Xerox tool, rather than the more current xfst, its usefulness is questionable. The documentation of hfst is poor: it does not contain a single full, working morphology example. Its usefulness with foma is doubtful, since foma is based on the more modern xfst/lexc pair. Support for sfst is half-hearted; for example, the Faroese lexc file cannot be compiled with sfst as the format. Hfst also cannot show the intermediate format for sfst, only an internal binary hfst format. Using hfst might make sense in connection with openfst, which is likewise underdocumented.