Tagger and lemmatizer HOWTO

Installation

> git clone https://github.com/ufal/morphodita
> cd morphodita/src/
> vim Makefile.builtem
-  C_FLAGS += -std=c++11 -W -Wall -mtune=generic -msse -msse2 -mfpmath=sse -fvisibility=hidden -U_FORTIFY_SOURCE
+  C_FLAGS += -std=c++11 -W -Wall -march=native -fvisibility=hidden -U_FORTIFY_SOURCE
> make
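
The flag change above can also be applied without opening an editor. The following is a minimal sketch, assuming the C_FLAGS line in Makefile.builtem still reads exactly as in the "-" line above; the function name patch_cflags is hypothetical, not part of MorphoDiTa.

```shell
# Hypothetical helper (not part of MorphoDiTa): swap the generic SSE flags
# for -march=native, as in the manual edit shown above.
patch_cflags() {
    # $1 is the path to Makefile.builtem; sed keeps the original as "$1.bak".
    sed -i.bak 's/-mtune=generic -msse -msse2 -mfpmath=sse/-march=native/' "$1"
}

# Usage, from the src/ directory:
#   patch_cflags Makefile.builtem && make
```

If the flags in your checkout differ from the ones shown, the substitution matches nothing and the file is left unchanged, so compare against the .bak copy before running make.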

Models

Download and unzip a model:

Czech: https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0023-68D8-1

English: https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0023-68D9-0

(download link is at the bottom of the page)

(beware: the models may be under a non-free license, so check the terms on the download page)

Run tagger

echo "Červený střízlíček a střapatá žluva ďobali šťavnaté ocúny" \
| ./run_tagger czech-morfflex-pdt-131112-raw_lemmas.tagger-best_accuracy

Run lemmatizer

echo "Červený střízlíček a střapatá žluva ďobali šťavnaté ocúny." \
| ./run_tagger --input=untokenized --output=vertical \
czech-morfflex-pdt-131112-pos_only-raw_lemmas.tagger 2>/dev/null \
| cut -f 2 | tr "\n" " "
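
The post-processing half of this pipeline can be tried without a model. The sketch below assumes the vertical output has one token per line as form, lemma and tag separated by tabs, with an empty line between sentences. The function name lemmas_from_vertical and the sample data (including the placeholder TAG values) are made up for illustration and are not real tagger output. It uses paste instead of tr so that the result has no trailing space.

```shell
# Hypothetical helper: pull the lemma column out of vertical-format output.
lemmas_from_vertical() {
    # Column 2 is the lemma; grep drops the empty lines that separate
    # sentences; paste joins the remaining lemmas with single spaces.
    cut -f 2 | grep -v '^$' | paste -sd ' ' -
}

# Mock input: form<TAB>lemma<TAB>tag, with placeholder tags.
printf 'Červený\tčervený\tTAG\nstřízlíček\tstřízlíček\tTAG\n\na\ta\tTAG\n' \
| lemmas_from_vertical
```

With a real model, the same function could replace the final cut and tr stages of the command above.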

Problems

Loading large models takes several seconds, but the tagging itself is very fast. Newer versions include a REST server, so the model can be loaded once and then serve multiple requests.
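
Without the REST server, the load time can still be paid only once per batch by sending all input through a single run_tagger process on standard input, as the examples above do. This is a minimal sketch; the function name tag_files_once and the RUN_TAGGER override variable are hypothetical conveniences, not part of MorphoDiTa.

```shell
# Hypothetical helper: tag several text files with one tagger process,
# so the model is loaded a single time.
# $1 is the model file; the remaining arguments are input text files.
tag_files_once() {
    model=$1
    shift
    # RUN_TAGGER is an assumed override; it defaults to ./run_tagger.
    cat -- "$@" | "${RUN_TAGGER:-./run_tagger}" "$model"
}

# Usage, from the directory containing run_tagger:
#   tag_files_once czech-morfflex-pdt-131112-raw_lemmas.tagger-best_accuracy a.txt b.txt
```

Compared with calling run_tagger once per file, this trades per-file output separation for a single model load.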

tagger.txt · Last modified: 2019-06-21 13:32:06 (external edit)