tagger [2019-06-21 13:32:06] (current) |
+ | ====== Tagger and lemmatizer HOWTO ====== | ||
+ | ===== Installation ===== | ||
+ | |||
+ | <code> | ||
+ | > git clone https://github.com/ufal/morphodita | ||
+ | > cd morphodita/src/ | ||
+ | > vim Makefile.builtem | ||
+ | - C_FLAGS += -std=c++11 -W -Wall -mtune=generic -msse -msse2 -mfpmath=sse -fvisibility=hidden -U_FORTIFY_SOURCE | ||
+ | + C_FLAGS += -std=c++11 -W -Wall -march=native -fvisibility=hidden -U_FORTIFY_SOURCE | ||
+ | > make | ||
+ | </code> | ||
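The manual edit in vim can also be scripted. The following is a minimal sketch, assuming the `C_FLAGS` line in `Makefile.builtem` still contains exactly the flags shown above; `patch_cflags` is a hypothetical helper name, not something shipped with MorphoDiTa. Keep in mind that `-march=native` tunes the binary for the CPU of the build machine, so the result may not run on other machines.

```shell
# Hypothetical helper (not part of MorphoDiTa): replace the generic SSE flags
# with -march=native in the given makefile, without opening an editor.
patch_cflags() {
    makefile=$1
    tmp=$(mktemp) || return 1
    # Plain sed plus a temporary file is used instead of "sed -i",
    # whose syntax differs between GNU and BSD sed.
    sed 's/-mtune=generic -msse -msse2 -mfpmath=sse/-march=native/' "$makefile" > "$tmp" \
        && cat "$tmp" > "$makefile"   # overwrite in place, keeping file permissions
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage, from the src/ directory of the checkout:
#   patch_cflags Makefile.builtem && make
```

If the flags in the makefile differ from the ones assumed here, the substitution simply does not match and the file is left unchanged, so it is worth checking the `C_FLAGS` line afterwards.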
+ | |||
+ | ===== Models ===== | ||
+ | |||
+ | Download and unzip a model: | ||
+ | |||
+ | Czech: https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0023-68D8-1 | ||
+ | |||
+ | English: https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0023-68D9-0 | ||
+ | |||
+ | (The download link is at the bottom of each page.) | ||
+ | |||
+ | (Beware: the models may have a non-free license.) | ||
+ | |||
+ | ===== Run tagger ===== | ||
+ | |||
+ | <code>echo "Červený střízlíček a střapatá žluva ďobali šťavnaté ocúny" \ | ||
+ | | ./run_tagger czech-morfflex-pdt-131112-raw_lemmas.tagger-best_accuracy</code> | ||
+ | |||
+ | ===== Run lemmatizer ===== | ||
+ | |||
+ | <code>echo "Červený střízlíček a střapatá žluva ďobali šťavnaté ocúny." \ | ||
+ | | ./run_tagger --input=untokenized --output=vertical \ | ||
+ | czech-morfflex-pdt-131112-pos_only-raw_lemmas.tagger 2>/dev/null \ | ||
+ | | cut -f 2 | tr "\n" " " | ||
+ | </code> | ||
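The pipeline above joins the lemmas of all sentences into a single line. When the input contains several sentences, it can be handy to keep one line of lemmas per sentence. The sketch below assumes that the vertical output has tab-separated columns with the lemma in the second column (as the `cut -f 2` above implies) and an empty line after each sentence; `lemmas_from_vertical` is a hypothetical helper name.

```shell
# Hypothetical helper: read vertical tagger output on stdin and print
# the lemmas of each sentence on one line, separated by spaces.
# Assumes tab-separated columns (lemma in column 2) and an empty line
# between sentences.
lemmas_from_vertical() {
    awk -F '\t' '
        # An empty line ends the current sentence: flush the collected lemmas.
        NF == 0 { if (line != "") print line; line = ""; next }
        # Otherwise append the lemma (second column) to the current sentence.
        { line = (line == "" ? $2 : line " " $2) }
        # Flush the last sentence if the input does not end with an empty line.
        END { if (line != "") print line }
    '
}

# Usage, replacing the "cut | tr" part of the pipeline above:
#   ... | ./run_tagger --input=untokenized --output=vertical MODEL 2>/dev/null \
#       | lemmas_from_vertical
```

Here `MODEL` stands for the model file used in the command above.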
+ | |||
+ | ===== Problems ===== | ||
+ | |||
+ | Loading a large model takes several seconds, but the tagging itself is very fast. Newer versions of MorphoDiTa include a REST server, so the model can be loaded once and then serve many requests. |