OAQA / OpenQA Setup Guide
BlanQA has nicer source code, is easier to clone and set up (no Indri JNI!), can run on top of a variety of text corpora (e.g. Wikipedia!), and features an interactive question-answering mode (and an IRC gateway); we are working to make it even better. Its algorithms are not yet as advanced as those of helloqa-prototype (below), though.
This is a step-by-step setup guide for your very own IBM Watson-like software!
Here, we will set up a Linux-based, open-source question answering system that can process a free-text corpus and answer free-form English questions about it (with rather low precision). Note that its performance and precision are obviously nothing like those of DeepQA / IBM Watson, but it's a start, and it is built on the same foundations as the IBM project.
Note that the outcome is still an executable that is not user-friendly at all: it just spews a lot of cryptic output, rather than loading Wikipedia and opening a nice conversation window. It answers questions from batch files, and you will need to wade through its debug output to figure out what's going on and what its answers are. It's all very experimental at this point.
OpenQA was developed mainly at CMU (probably with indirect support from IBM). It is being tweaked for a more user-friendly setup by Pasky @ brmlab. OpenQA uses UIMA by IBM+Apache, Indri by the Lemur project, etc.
See the brmson project page to learn more about the results of our research on state-of-the-art open source QA systems.
Installation
Anyway, to get this running on your (Debian Wheezy or some-such) machine, it should be enough to follow these steps, command by command:
apt-get install default-jdk maven uima-utils
to install the required toolchain
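Before moving on, it may be worth confirming that the tools used in the following steps are actually on your PATH. The snippet below is only a minimal sketch: missing_tools is a hypothetical helper written for this guide, and the assumption that git must be installed separately comes from the fact that the apt-get line above does not list it.

```shell
# Print the name of every listed command that is not found on PATH.
# missing_tools is a hypothetical helper, not part of OpenQA or helloqa.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# java and mvn should come from the packages installed above;
# git is assumed to be installed separately (it is needed for the clone steps).
missing="$(missing_tools java mvn git)"
if [ -n "$missing" ]; then
    echo "still missing:" $missing
else
    echo "toolchain looks complete"
fi
```

If anything is reported as missing, install it before continuing with the clone steps.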
git clone https://github.com/brmson/solr-provider; cd solr-provider; mvn install; cd ..
to install an up-to-date version of one boring OpenQA component
git clone -b prototype https://github.com/brmson/helloqa helloqa-prototype
to get the current source of helloqa-prototype, already pre-cooked by brmsonners to be usable out-of-the-box
- i?86 users (see your uname -a) may skip this step, but x86_64/amd64 users need to compile their own Indri database JNI bindings:
apt-get install build-essential libz-dev
wget http://sourceforge.net/projects/lemur/files/lemur/indri-5.0/indri-5.0.tar.gz
the latest version is currently 5.6, but the indices in git were built with 5.0 and the version must match
tar xf indri-5.0.tar.gz; cd indri-5.0; chmod +x configure
./configure --prefix= --enable-java --with-javahome=/usr/lib/jvm/default-java
make -j3
cp ./swig/obj/java/libindri_jni.so ../helloqa-prototype/lib/
cd ..
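Whether the Indri build step above applies to you depends on your machine type. As a rough sketch, the check can be scripted as follows; needs_indri_build is a hypothetical helper, and answering "unknown" for any other architecture is an assumption, since this guide only covers i?86 and x86_64/amd64.

```shell
# Report whether the Indri JNI bindings must be rebuilt for a given machine type.
# needs_indri_build is a hypothetical helper, not part of helloqa-prototype.
needs_indri_build() {
    case "$1" in
        i?86)         echo no ;;       # the guide says these users may skip the build
        x86_64|amd64) echo yes ;;      # the guide says these users must build
        *)            echo unknown ;;  # not covered by the guide (assumption)
    esac
}

# uname -m prints only the machine type, which is the relevant part of uname -a.
arch="$(uname -m)"
echo "machine type: $arch, Indri rebuild needed: $(needs_indri_build "$arch")"
```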
cd helloqa-prototype
mvn verify
to build it; the first time around, this will download several hundreds of megabytes of dependencies and chew for a while
mvn exec:exec -Dexec.executable=java \
  -Dexec.args="-Djava.library.path=lib/ -classpath %classpath edu.cmu.lti.oaqa.ecd.driver.ECDDriver phases.err-analysis-IE-dsoqa"
to run it
- In the massive debug output, you will be able to fish for the questions and the proposed answers. Congratulations! The test dataset is:
- Freetext corpus
src/main/resources/gs/dso-extension-psg.txt
- Questions
src/main/resources/input/dso-extension.txt
- Right answers
src/main/resources/gs/dso-extension-answerkey.txt
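Since the answers have to be fished out of a large amount of debug output, it may help to keep a copy of the whole run in a file and search it afterwards. The following is only a sketch: run_logged is a hypothetical helper, the log file location is an arbitrary choice, and the echo command is a stand-in for the real mvn exec:exec invocation. The format of the debug output is not described here, so the search term is just a placeholder.

```shell
# Run a command, showing its output (stdout and stderr) and saving it to a log file.
# run_logged is a hypothetical helper; the first argument is the log file path.
run_logged() {
    log="$1"; shift
    "$@" 2>&1 | tee "$log"
}

# Arbitrary log location (assumption); adjust to taste.
log_file="${TMPDIR:-/tmp}/qa-run.log"

# Stand-in command for demonstration; replace it with the mvn exec:exec line from this guide.
run_logged "$log_file" echo "example debug line"

# Afterwards, search the log case-insensitively, e.g. for a word from one of the questions.
grep -i -n "example" "$log_file"
```

With the real command in place, you can compare what you find in the log against the questions and the answer key listed above.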