tagger [2019-06-21 13:32:06] (current)
====== Tagger and lemmatizer HOWTO ======

===== Installation =====

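Clone the repository and build the binaries in ''src/''. The ''Makefile.builtem'' edit, shown as a diff, replaces the generic SSE flags with ''-march=native'', so the compiler targets the CPU of the build machine (the resulting binaries may not run on other CPUs):

<code>
> git clone https://github.com/ufal/morphodita
> cd morphodita/src/
> vim Makefile.builtem
-  C_FLAGS += -std=c++11 -W -Wall -mtune=generic -msse -msse2 -mfpmath=sse -fvisibility=hidden -U_FORTIFY_SOURCE
+  C_FLAGS += -std=c++11 -W -Wall -march=native -fvisibility=hidden -U_FORTIFY_SOURCE
> make
</code>
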
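The manual edit can also be scripted. The following is a minimal sketch, assuming the ''C_FLAGS'' line in ''Makefile.builtem'' still contains exactly the flags quoted above; ''use_native_flags'' is a hypothetical helper name, not part of MorphoDiTa.

```shell
# Hypothetical helper: read Makefile text on stdin and swap the generic
# SSE tuning flags for -march=native. Lines without those flags pass
# through unchanged.
use_native_flags() {
  sed 's/-mtune=generic -msse -msse2 -mfpmath=sse/-march=native/'
}

# Demonstration on the line quoted above (no file is modified here).
echo 'C_FLAGS += -std=c++11 -W -Wall -mtune=generic -msse -msse2 -mfpmath=sse -fvisibility=hidden -U_FORTIFY_SOURCE' \
  | use_native_flags
```

To apply it to the real file, one option is to redirect ''Makefile.builtem'' through the function into a temporary file and move the result back over the original before running ''make''.
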
===== Models =====

Download and unzip a model:

Czech: https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0023-68D8-1

English: https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0023-68D9-0

The download link is at the bottom of each page. Beware: the models may have a non-free license.

===== Run tagger =====

<code>echo "Červený střízlíček a střapatá žluva ďobali šťavnaté ocúny" \
| ./run_tagger czech-morfflex-pdt-131112-raw_lemmas.tagger-best_accuracy</code>

===== Run lemmatizer =====

<code>echo "Červený střízlíček a střapatá žluva ďobali šťavnaté ocúny." \
| ./run_tagger --input=untokenized --output=vertical \
czech-morfflex-pdt-131112-pos_only-raw_lemmas.tagger 2>/dev/null \
| cut -f 2 | tr "\n" " "
</code>

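The post-processing at the end of the lemmatizer pipeline can be tried in isolation. This is a sketch under the assumption, implied by ''cut -f 2'', that the vertical output has one token per line with the lemma in the second tab-separated column. The function name ''lemmas_only'' and the input rows are hand-written placeholders, not real tagger output.

```shell
# Keep only the second tab-separated column (assumed to be the lemma)
# and join the lines with spaces, as the pipeline above does.
lemmas_only() {
  cut -f 2 | tr '\n' ' '
}

# Placeholder rows standing in for vertical tagger output.
printf 'form1\tlemma1\ttag1\nform2\tlemma2\ttag2\n' | lemmas_only
```

Because ''tr'' turns every newline into a space, the result ends with a trailing space and no final newline. Any empty line in the input also becomes an extra space.
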
===== Problems =====

Loading a big model takes several seconds, but the tagging itself is very fast. The new version includes a REST server, so it can be started once and handle multiple requests.
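
Without the server, the loading cost can still be paid only once per batch by piping all input through a single ''run_tagger'' process instead of starting one process per sentence. A minimal sketch follows; the ''TAGGER'' and ''MODEL'' defaults and the ''tag_files'' name are assumptions for illustration and should be adjusted to the local build and model.

```shell
# Assumed locations; override via the environment if they differ.
TAGGER="${TAGGER:-./run_tagger}"
MODEL="${MODEL:-czech-morfflex-pdt-131112-raw_lemmas.tagger-best_accuracy}"

# Concatenate all given text files and tag them in one process,
# so the model is loaded once for the whole batch.
tag_files() {
  cat -- "$@" | "$TAGGER" "$MODEL"
}
```

For example, ''tag_files part1.txt part2.txt > tagged.txt'' (hypothetical file names) starts the tagger once for both files.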
tagger.txt · Last modified: 2019-06-21 13:32:06 (external edit)