Installing deka

Getting the source

Get the deka and Kraken sources.

git clone

git clone git://

The original Kraken might not work on recent systems. However, someone published my patched version on GitHub; that version should work on something like Debian Jessie. (Note the Czech comments in Fragment.cpp :-P)

Getting tables

Get the table files (*.dlt) generated by the TMTO Project/SRLabs. There are 40 files, 1.7 TB in total. You can get md5sums at . There is a torrent, or you can find someone to mail you a hard drive. Or, if you happen to live in Prague, you can get a copy at brmlab.
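Before spending hours installing, it is worth verifying the downloaded files against the published md5sums. A self-contained sketch of the check follows; it uses a throwaway file, since the real 1.7 TB of tables are obviously not present here, and the tables.md5 filename is hypothetical.

```shell
# Demonstration of checking *.dlt files against an md5 list with md5sum -c.
tmp=$(mktemp -d) && cd "$tmp"
printf 'demo table data' > 100.dlt     # stand-in for a real table file
md5sum 100.dlt > tables.md5            # in reality: the published md5sum list
md5sum -c tables.md5                   # prints "100.dlt: OK" on success
```

With the real tables, run the `md5sum -c` step in the directory holding the *.dlt files against the downloaded checksum list.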

Installing tables

It is done this way:

./TableConvert      di        /mnt/tables/gsm/100.dlt 100.ins:0            100.idx
#              table format   source file             destination:offset   index destination

if the tables are stored in files. However, to avoid filesystem overhead, a direct installation on a block device is advised. The script should help you with this.
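To install all 40 tables in one pass, a loop along these lines may help. This is only a sketch: the device path is a placeholder, and the offset step is a rough figure derived from 1.7 TB spread over 40 files; take the real offsets from the generated tables.conf.

```shell
# Sketch: install every *.dlt onto one block device at increasing offsets.
# DEV is a placeholder and the ~43 GB step is illustrative, not authoritative.
DEV=/dev/disk/by-id/YOUR-TABLE-DISK
offset=0
for f in /mnt/tables/gsm/*.dlt; do
    id=$(basename "$f" .dlt)
    echo ./TableConvert di "$f" "$DEV:$offset" "$id.idx"   # drop "echo" to really run
    offset=$((offset + 43000000000))
done
```

The `echo` makes this a dry run; compare the printed commands against tables.conf before removing it.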

Configuring tables for deka

Edit delta_config.h and fill in the paths to the devices and index files, and the offsets, taken from the generated tables.conf.
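The authoritative format is whatever delta_config.h already contains; purely as an illustration (every name below is hypothetical, not taken from deka), an entry ties together a device path, an offset, and an index file from tables.conf:

```c
/* Hypothetical sketch only; use the macro/field names delta_config.h
 * actually defines. Values come from the generated tables.conf. */
#define TABLE_100_DEVICE "/dev/disk/by-id/ata-SomeDisk"
#define TABLE_100_OFFSET 0UL
#define TABLE_100_INDEX  "/opt/deka/idx/100.idx"
```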

Protip: do not use /dev/sdX names, but a stable path (e.g. /dev/disk/by-id/ or /dev/disk/by-uuid/) or a UUID; /dev/sdX names tend to get mixed up between boots!

Generating kernel

Run ./ > slice.c, or the 64 variant, to generate a kernel with 4×32-bit or 4×64-bit vectors. One of them will probably be faster; so far it looks like genkernel32 is faster on AMD cards. We have no data for nVidia.

Switching to 64-bit also requires changing “slices” in and vankusconf.h.

Compilation fails with (older?) nVidia compilers due to the unsupported “unsigned long long” type. Replacing it with a “ulong” variable seems to help:

ulong one = 1; mask |= one << i;
  if(diff != all) {

Setting kernel options

In and .h, the number of concurrently launched kernels can also be changed. A good starting value is a small integer multiple of the number of compute cores on your card, minus 1: for example 4095 if your card has 2048 cores.
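The arithmetic behind the example value, as a quick sanity check (the multiplier 2 is just one choice of “small integer”):

```shell
CORES=2048    # compute cores on the card
MULT=2        # small integer multiplier
echo $((MULT * CORES - 1))    # 2*2048 - 1 = 4095
```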

Additionally, QSIZE can be changed; it should hold about twice the number of fragments processed in parallel.

Running deka


Run, once or twice for each OpenCL device. It will ask which device you want to use and tell you to set the PYOPENCL_CTX environment variable so that it does not ask again.

Run, once or twice.

(or use to run all of the above, but running them manually is better the first time, as you can see the debug prints)

Then, connect to the server (for example with telnet) and test it.

~> telnet localhost 1578
Trying ::1...
Connected to localhost.
Escape character is '^]'.
crack 001110001001010111000110000100110100001000011010100001000010000110101100101010100110110100100111110011101110000000
Cracking #0 001110001001010111000110000100110100001000011010100001000010000110101100101010100110110100100111110011101110000000
Found 44D85D82BAF275B4 @ 2 #0 (table:412)
crack #0 took 35586 msec

Congratulations, you have a working setup!

Performance tuning

By entering “stats”, you can view the sizes of the burst queues and see whether your bottleneck is the storage (the “endpoints” queue) or the chain computation.

Possible speedups:

  • tune loop unrolling in kernel
  • tune number of iterations in kernel (currently 3000)
  • tune number of kernels executed
  • use async IO or multiple threads to read blocks
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 4.0 International