OAQA / OpenQA Setup Guide

We now consider this guide obsolete because we have nicer software available: see BlanQA!

BlanQA has nicer source code, is easier to clone and set up (no Indri JNI!), can run on top of a variety of text corpora (e.g. Wikipedia!), and features an interactive question-answering mode (and an IRC gateway); we are working to make it even better. Its algorithms are not yet as advanced as those of helloqa-prototype (below), though.

This is a step-by-step setup guide for your very own IBM Watson-like software!

Here, we will set up a Linux-based, open-source question answering system that can process a free-text corpus and answer free-form English questions about it (albeit with low precision). Obviously, its performance and precision are nothing like DeepQA / IBM Watson, but it's a start, and it is built on the same foundations as the IBM project.

Note that the result is still an executable that is not user-friendly at all: it just spews a lot of cryptic output rather than loading Wikipedia and opening a nice conversation window. It answers questions read from batch files, and you will need to wade through its debug output to figure out what is going on and what its answers are. It's all very experimental at this point.

OpenQA is mainly the work of people at CMU (probably indirectly supported by IBM). Pasky @ brmlab is tweaking it for a more user-friendly setup. OpenQA builds on UIMA (by IBM and Apache), Indri (by the Lemur project) and other components.

See the brmson project page to learn more about the results of our research on state-of-the-art open source QA systems.

Installation

Anyway, to get this running on your machine (Debian Wheezy or some such), it should be enough to follow these steps, command by command:

  • apt-get install default-jdk maven uima-utils

    to install the required toolchain

  • git clone https://github.com/brmson/solr-provider; cd solr-provider; mvn install; cd ..

    to install an up-to-date version of one boring OpenQA component

  • git clone -b prototype https://github.com/brmson/helloqa helloqa-prototype

    to get the current source of helloqa-prototype, already pre-cooked by brmsonners to be usable out of the box

  • i?86 users (check the output of uname -a) may skip this step, but x86_64/amd64 users need to compile their own Indri JNI bindings:
    • apt-get install build-essential libz-dev
    • wget http://sourceforge.net/projects/lemur/files/lemur/indri-5.0/indri-5.0.tar.gz

      the latest version is currently 5.6, but the indices in git were built with 5.0 and the version must match

    • tar xf indri-5.0.tar.gz; cd indri-5.0; chmod +x configure
    • ./configure --prefix= --enable-java --with-javahome=/usr/lib/jvm/default-java
    • make -j3
    • cp ./swig/obj/java/libindri_jni.so ../helloqa-prototype/lib/
    • cd ..
  • cd helloqa-prototype
  • mvn verify

    to build it; the first time around, this will download several hundred megabytes of dependencies and chew for a while

  • mvn exec:exec -Dexec.executable=java \
     -Dexec.args="-Djava.library.path=lib/ -classpath %classpath \
       edu.cmu.lti.oaqa.ecd.driver.ECDDriver phases.err-analysis-IE-dsoqa"

    to run it

  • In the massive debug output, you will be able to fish out the questions and the proposed answers. Congratulations! The test dataset consists of:
    • Free-text corpus: src/main/resources/gs/dso-extension-psg.txt
    • Questions: src/main/resources/input/dso-extension.txt
    • Correct answers: src/main/resources/gs/dso-extension-answerkey.txt
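If you are unsure whether the Indri JNI build step applies to your machine, the architecture check described above can be scripted. The following is only a sketch. The function name indri_jni_advice is hypothetical and is not part of helloqa-prototype. The script reads uname -m, which prints just the machine hardware name, instead of scanning the full uname -a line. It only knows about the architectures this guide mentions.

```shell
#!/bin/sh
# Hypothetical helper (not part of helloqa-prototype): report whether the
# "compile your own Indri JNI bindings" step of this guide applies.
# Takes an architecture string as printed by `uname -m`.
indri_jni_advice() {
  case "$1" in
    i?86)
      # Matches i386, i486, i586, i686: the guide says these may skip the step.
      echo "skip: $1 may skip the Indri JNI build" ;;
    x86_64|amd64)
      # 64-bit PCs need Indri 5.0 built with --enable-java.
      echo "build: $1 needs Indri 5.0 JNI bindings copied to helloqa-prototype/lib/" ;;
    *)
      # Anything else is not covered by this guide.
      echo "unknown: $1 is not covered by this guide" ;;
  esac
}

# Check the machine this script is running on.
indri_jni_advice "$(uname -m)"
```

Run it with sh. On a typical 64-bit PC it should print the "build:" line, meaning you should follow the Indri compilation sub-steps before running mvn verify.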
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported