Tagger and lemmatizer HOWTO


> git clone https://github.com/ufal/morphodita
> cd morphodita/src/
> vim Makefile.builtem
-  C_FLAGS += -std=c++11 -W -Wall -mtune=generic -msse -msse2 -mfpmath=sse -fvisibility=hidden -U_FORTIFY_SOURCE
+  C_FLAGS += -std=c++11 -W -Wall -march=native -fvisibility=hidden -U_FORTIFY_SOURCE
> make
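
The manual edit of the flags line can also be scripted. The following is only a sketch: it applies the substitution to a temporary file that holds a copy of the original line, so it runs without a checkout. In a real checkout the same sed expression would be pointed at Makefile.builtem; the temporary file and the variable names are assumptions made for illustration.

```shell
# Work on a temporary copy of the original flags line (illustration only).
tmp=$(mktemp)
printf '%s\n' 'C_FLAGS += -std=c++11 -W -Wall -mtune=generic -msse -msse2 -mfpmath=sse -fvisibility=hidden -U_FORTIFY_SOURCE' > "$tmp"

# Replace the generic SSE flags with -march=native, as in the diff above.
# Output goes to a second file, which avoids the non-portable "sed -i".
sed 's/-mtune=generic -msse -msse2 -mfpmath=sse/-march=native/' "$tmp" > "$tmp.new"

patched=$(cat "$tmp.new")
printf '%s\n' "$patched"

# Clean up the temporary files.
rm -f "$tmp" "$tmp.new"
```

Note that a binary built with -march=native is tuned for the build machine and may not run on older CPUs.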


Download and unzip a model:

Czech: https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0023-68D8-1

English: https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0023-68D9-0

(download link is at the bottom of the page)

(beware, the models may have a non-free license)

Run the tagger:

echo "Červený střízlíček a střapatá žluva ďobali šťavnaté ocúny" \
| ./run_tagger czech-morfflex-pdt-131112-raw_lemmas.tagger-best_accuracy

Run the lemmatizer:

echo "Červený střízlíček a střapatá žluva ďobali šťavnaté ocúny." \
| ./run_tagger --input=untokenized --output=vertical \
czech-morfflex-pdt-131112-pos_only-raw_lemmas.tagger 2>/dev/null \
| cut -f 2 | tr "\n" " "
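
To see what the last stage of that pipeline does, the post-processing can be tried on its own. This sketch assumes the vertical output has one token per line with tab-separated form, lemma and tag columns. The two sample lines are hand-written placeholders, not real tagger output, and the tag values are made up.

```shell
# Hand-written sample in the assumed vertical layout: form<TAB>lemma<TAB>tag.
# The lemmas and tags are illustrative placeholders, not real tagger output.
sample=$(printf 'Červený\tčervený\tA\nstřízlíček\tstřízlíček\tN\n')

# Same post-processing as above: keep column 2 (the lemma)
# and join the lines with spaces.
lemmas=$(printf '%s\n' "$sample" | cut -f 2 | tr "\n" " ")
printf '%s\n' "$lemmas"
```

The joined result ends with a trailing space, because tr also converts the final newline.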


Loading a big model takes several seconds, but the tagging itself is very fast. The new version includes a REST server, so the model can be loaded once and then serve multiple requests.
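
Without the server, the load time can still be paid only once by sending many sentences through a single run_tagger call. This is a rough sketch under several assumptions: run_tagger and the model file sit in the current directory, the input file (created here with mktemp) is a hypothetical batch, and the untokenized input mode is assumed to accept several lines of plain text. The tagger step is skipped when the binary or model is missing.

```shell
# Hypothetical batch file with one sentence per line (the example sentence, twice).
batch=$(mktemp)
printf '%s\n' \
  "Červený střízlíček a střapatá žluva ďobali šťavnaté ocúny." \
  "Červený střízlíček a střapatá žluva ďobali šťavnaté ocúny." > "$batch"

# Model file name taken from the lemmatizer example above.
model=czech-morfflex-pdt-131112-pos_only-raw_lemmas.tagger

# One invocation, one model load, many sentences.
if [ -x ./run_tagger ] && [ -f "$model" ]; then
  ./run_tagger --input=untokenized --output=vertical "$model" \
    < "$batch" 2>/dev/null | cut -f 2
else
  echo "run_tagger or model not found, skipping the tagging step" >&2
fi

# Number of sentences in the batch.
count=$(wc -l < "$batch" | tr -d ' ')
rm -f "$batch"
```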
