====== OAQA / OpenQA Setup Guide ======
We consider this guide obsolete, as we think we have nicer software available now - see [[https://github.com/brmson/blanqa|BlanQA]]!
**BlanQA** has nicer source code, is easier to clone and set up (no Indri JNI!), can run on top of a variety of text corpora (e.g. Wikipedia!), features an //interactive question-answering mode// (and an IRC gateway), and we are working to make it even better. Its algorithms are not yet as advanced as those of helloqa-prototype (below), though.
[[project:brmson:start|{{ :project:brmson.png?220}}]]
This is a step-by-step setup guide for your very own IBM Watson-like software!
Here, we will set up a Linux-based, open-source question answering system that can process a free-text corpus and answer free-form English questions (with rather low precision). Note that its performance and precision are obviously not //anything// like DeepQA / IBM Watson, but it's a start, and it is built on the //same foundations// as the IBM project.
Note that the outcome is still an executable that is //not user-friendly at all//: it just spews a lot of cryptic output; it does not load Wikipedia and open a nice conversation window. It answers questions from batch files, and you will need to wade through its debug output to figure out what's going on and what its answers are. It's all very experimental at this point.
[[http://oaqa.github.io/|OpenQA]] is work done mainly at CMU (probably indirectly supported by IBM). It is being tweaked for a more user-friendly setup by Pasky @ brmlab. OpenQA uses UIMA (by IBM and Apache), Indri (by the Lemur project), etc.
See [[project:brmson:start|the brmson project page]] to learn more about the results of our research on state-of-the-art open source QA systems.
===== Installation =====
Anyway, to get this running on your (Debian Wheezy or some-such) machine, it should be enough to follow these steps, command by command:
* apt-get install default-jdk maven uima-utils
to install the required toolchain
* git clone https://github.com/brmson/solr-provider; cd solr-provider; mvn install; cd ..
to install an up-to-date version of one boring OpenQA component
* git clone -b prototype https://github.com/brmson/helloqa helloqa-prototype
to get the current source of helloqa-prototype, already pre-cooked by brmsonners to be usable out-of-the-box
* i?86 users (see your ''uname -a'') may skip this step, but x86_64/amd64 users need to compile their own indri database JNI bindings:
* apt-get install build-essential libz-dev
* wget http://sourceforge.net/projects/lemur/files/lemur/indri-5.0/indri-5.0.tar.gz
the latest version is currently 5.6, but the indices in git were built with 5.0 and the version must match
* tar xf indri-5.0.tar.gz; cd indri-5.0; chmod +x configure
* ./configure --prefix= --enable-java --with-javahome=/usr/lib/jvm/default-java
* make -j3
* cp ./swig/obj/java/libindri_jni.so ../helloqa-prototype/lib/
* cd ..
* cd helloqa-prototype
* mvn verify
to build it; the first time around, this will download several hundred megabytes of dependencies and chew for a while
* mvn exec:exec -Dexec.executable=java \
-Dexec.args="-Djava.library.path=lib/ -classpath %classpath \
edu.cmu.lti.oaqa.ecd.driver.ECDDriver phases.err-analysis-IE-dsoqa"
to run it
* In the massive debug prints, you will be able to fish for questions and proposed answers. Congratulations! The test dataset is:
* Freetext corpus ''src/main/resources/gs/dso-extension-psg.txt''
* Questions ''src/main/resources/input/dso-extension.txt''
* Right answers ''src/main/resources/gs/dso-extension-answerkey.txt''
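
If you would rather not eyeball ''uname -a'' and scroll through the terminal, the two decisions in the steps above (whether the Indri JNI build applies to you, and where the debug output ends up) can be sketched as a pair of small shell helpers. This is only a rough sketch: ''needs_jni_build'' and ''run_and_log'' are hypothetical names made up for this example and are not part of helloqa, and treating every non-i?86 machine like x86_64/amd64 is our assumption.

```shell
#!/bin/sh
# Hypothetical helpers for this guide; neither is part of helloqa itself.

# Print "no" for i?86 machines (the guide says they may skip the Indri
# JNI build) and "yes" for anything else. Treating every other
# architecture like x86_64/amd64 is an assumption.
needs_jni_build() {
    case "$1" in
        i?86) echo no ;;
        *)    echo yes ;;
    esac
}

# Run a command, showing its output on the terminal while also saving
# stdout and stderr to a log file for later digging.
# Usage: run_and_log LOGFILE COMMAND [ARGS...]
run_and_log() {
    log=$1
    shift
    "$@" 2>&1 | tee "$log"
}

arch=$(uname -m)
echo "machine: $arch, build Indri JNI bindings: $(needs_jni_build "$arch")"
```

For the final step, you would put ''run_and_log run.log'' in front of the whole ''mvn exec:exec ...'' command and then search ''run.log'' at your leisure. Note that, because of the pipe, the helper returns the exit status of ''tee'', not of the command itself.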