OAQA / OpenQA Setup Guide

We consider this guide obsolete, as we think we have nicer software available now: see BlanQA!

BlanQA has nicer source code, is easier to clone and set up (no Indri JNI!), can run on top of a variety of text corpora (e.g. Wikipedia!) and features an interactive question-answering mode (and an IRC gateway), and we are working to make it even better. Its algorithms are not yet as advanced as those of helloqa-prototype (below), though.

This is a step-by-step setup guide for your very own IBM Watson-like software!

Here, we will set up a Linux-based, open-source question answering system that can process a free-text corpus and free-form English questions and answer them (with rather low precision). Obviously, its performance and precision are nothing like those of DeepQA / IBM Watson, but it's a start, and it is built on the same foundations as the IBM project (notably UIMA).

Note that the outcome is still an executable that is not user-friendly at all: rather than loading Wikipedia and opening a nice conversation window, it just spews a lot of cryptic output. It answers questions listed in batch files, and you will need to wade through its debug output to figure out what's going on and what its answers are. It's all very experimental at this point.

OpenQA was developed mainly at CMU (probably with indirect support from IBM). Pasky @ brmlab is tweaking it for a more user-friendly setup. OpenQA builds on UIMA (by IBM and Apache), Indri (by the Lemur project) and other components.

See the brmson project page to learn more about the results of our research on state-of-the-art open-source QA systems.

Installation

Anyway, to get this running on your machine (Debian Wheezy or some such), it should be enough to follow these steps, command by command:

```
apt-get install default-jdk maven uima-utils git
```

to install the required toolchain
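Before moving on, it can be worth checking that the commands used in the following steps are actually on your PATH. This is a minimal sketch, not part of OpenQA or brmson: the helper name missing_tools is hypothetical, and the list of commands is simply what this guide invokes later, not an official requirements list.

```shell
# Hypothetical helper: print (one per line) every command name given as an
# argument that cannot be found on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Commands invoked by the steps in this guide.
missing=$(missing_tools java javac mvn git wget)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing tools:" $missing
else
  echo "All required tools found."
fi
```

If anything is reported as missing, install the corresponding Debian package before continuing.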

```
git clone https://github.com/brmson/solr-provider; cd solr-provider; mvn install; cd ..
```

to install an up-to-date version of one boring OpenQA component

```
git clone -b prototype https://github.com/brmson/helloqa helloqa-prototype
```
to get the current source of helloqa-prototype, already pre-cooked by brmsonners to be usable out of the box
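Whether you need the Indri JNI build step that follows depends on your machine architecture. uname -m prints just the machine hardware name, which is easier to match than the full uname -a line. A small sketch, assuming the usual names (i386 to i686 for 32-bit x86, x86_64 for 64-bit x86); jni_build_needed is a hypothetical helper:

```shell
# Hypothetical helper: decide from a `uname -m` value whether the Indri JNI
# bindings have to be compiled locally.
# Assumption: 32-bit x86 reports i386..i686, 64-bit x86 reports x86_64
# (some non-Linux systems say amd64).
jni_build_needed() {
  case "$1" in
    i?86) echo no ;;
    x86_64|amd64) echo yes ;;
    *) echo unknown ;;   # architectures this guide does not cover
  esac
}

echo "Build Indri JNI bindings: $(jni_build_needed "$(uname -m)")"
```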

i?86 users (check your uname -a output) may skip this step, but x86_64/amd64 users need to compile their own Indri JNI bindings:

```
apt-get install build-essential libz-dev
```

```
wget http://sourceforge.net/projects/lemur/files/lemur/indri-5.0/indri-5.0.tar.gz
```
the latest Indri version is currently 5.6, but the indices in the git repository were built with 5.0, and the versions must match

```
tar xf indri-5.0.tar.gz; cd indri-5.0; chmod +x configure
```

```
./configure --prefix= --enable-java --with-javahome=/usr/lib/jvm/default-java
```

```
make -j3
```

```
cp ./swig/obj/java/libindri_jni.so ../helloqa-prototype/lib/
```
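Because the run command later loads the bindings via -Djava.library.path=lib/, it may save a long Maven build to confirm the copy actually landed there first. A minimal sketch; check_jni_lib is a hypothetical helper, and it only checks that the file exists and is non-empty, not that it was built for the right architecture:

```shell
# Hypothetical check: succeed only if libindri_jni.so exists and is non-empty
# in the given lib/ directory. The default path assumes you are still inside
# the indri-5.0 build directory.
check_jni_lib() {
  lib="${1:-../helloqa-prototype/lib}/libindri_jni.so"
  if [ -s "$lib" ]; then
    echo "found $lib"
  else
    echo "missing $lib" >&2
    return 1
  fi
}

# Usage, from indri-5.0/:
#   check_jni_lib
```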

```
cd ..
cd helloqa-prototype
```

```
mvn verify
```
to build it; the first time around, this will download several hundred megabytes of dependencies and chew for a while

```
mvn exec:exec -Dexec.executable=java \
  -Dexec.args="-Djava.library.path=lib/ -classpath %classpath \
    edu.cmu.lti.oaqa.ecd.driver.ECDDriver phases.err-analysis-IE-dsoqa"
```

to run it
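Since the interesting lines are buried in a lot of debug output, it may be handier to keep a copy of everything the run prints in a log file that you can search with grep or less afterwards. A minimal sketch using tee; run_logged and the log file name qa-run.log are hypothetical:

```shell
# Hypothetical helper: run a command, show its combined stdout/stderr on the
# terminal and keep a copy in the given log file.
# Note: without `set -o pipefail` (bash), the exit status is tee's, not the
# command's.
run_logged() {
  log="$1"
  shift
  "$@" 2>&1 | tee "$log"
}

# Usage, from the helloqa-prototype directory:
#   run_logged qa-run.log mvn exec:exec -Dexec.executable=java \
#     -Dexec.args="-Djava.library.path=lib/ -classpath %classpath edu.cmu.lti.oaqa.ecd.driver.ECDDriver phases.err-analysis-IE-dsoqa"
```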

In the massive debug output, you will be able to fish out the questions and the proposed answers. Congratulations! The test dataset consists of:
- Free-text corpus: src/main/resources/gs/dso-extension-psg.txt
- Questions: src/main/resources/input/dso-extension.txt
- Correct answers: src/main/resources/gs/dso-extension-answerkey.txt
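To compare the proposed answers with the answer key, you first need to find these files. A minimal sketch that checks they exist under a helloqa-prototype checkout and reports their line counts; show_dataset is a hypothetical helper, and it makes no assumption about the files' internal format:

```shell
# Hypothetical helper: confirm the three test-dataset files exist under a
# helloqa-prototype checkout (default: current directory) and print their
# line counts. Returns non-zero if any file is missing.
show_dataset() {
  root="${1:-.}"
  status=0
  for f in gs/dso-extension-psg.txt input/dso-extension.txt gs/dso-extension-answerkey.txt; do
    path="$root/src/main/resources/$f"
    if [ -f "$path" ]; then
      echo "$(wc -l < "$path") lines: $path"
    else
      echo "missing: $path" >&2
      status=1
    fi
  done
  return $status
}

# Usage, from the helloqa-prototype directory:
#   show_dataset
```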