Cubical Monolith 2014 09

2014-09-30

I have all 11 systems up and running (on wired network connections). This brings the total number of functioning compute nodes to 43.

2014-09-30-003-small.jpg
mpi@master:/master/mpi_tests$ mpirun -np 43 -f /master/mpi_tests/machinefile /master/mpi_tests/system
node02 20:25:25 up 2 days, 1 min, 0 users, load average: 1.15, 1.10, 1.13
node01 20:25:25 up 2 days, 1 min, 2 users, load average: 1.13, 1.08, 1.11
node03 20:25:25 up 2 days, 1 min, 0 users, load average: 1.17, 1.11, 1.14
node04 20:25:25 up 2 days, 1 min, 0 users, load average: 1.03, 1.06, 1.10
node05 20:25:25 up 2 days, 1 min, 0 users, load average: 1.02, 1.12, 1.13
node07 20:25:25 up 2 days, 1 min, 0 users, load average: 1.08, 1.06, 1.07
node06 20:25:25 up 2 days, 1 min, 1 user, load average: 1.03, 1.06, 1.10
node10 20:25:25 up 2 days, 1 min, 0 users, load average: 1.00, 1.03, 1.07
node09 20:25:25 up 2 days, 1 min, 0 users, load average: 1.13, 1.08, 1.07
node08 20:25:25 up 2 days, 1 min, 0 users, load average: 1.00, 1.07, 1.11
node11 20:25:25 up 2 days, 1 min, 0 users, load average: 1.04, 1.09, 1.12
node12 20:25:25 up 2 days, 1 min, 0 users, load average: 1.06, 1.07, 1.10
node13 20:25:26 up 2 days, 1 min, 0 users, load average: 1.01, 1.08, 1.12
node15 20:25:26 up 2 days, 1 min, 0 users, load average: 1.08, 1.09, 1.12
node14 20:25:26 up 2 days, 1 min, 0 users, load average: 1.11, 1.13, 1.14
node17 20:25:26 up 2 days, 1 min, 1 user, load average: 1.09, 1.13, 1.13
node19 20:25:26 up 2 days, 1 min, 0 users, load average: 1.15, 1.08, 1.11
node16 20:25:26 up 2 days, 1 min, 0 users, load average: 1.05, 1.07, 1.11
node18 20:25:26 up 2 days, 1 min, 1 user, load average: 1.04, 1.10, 1.13
node22 20:25:26 up 2 days, 1 min, 0 users, load average: 1.22, 1.12, 1.13
node23 20:25:26 up 2 days, 1 min, 0 users, load average: 1.01, 1.07, 1.12
node35 20:25:26 up 1 day, 23:59, 0 users, load average: 1.11, 1.10, 1.08
node37 20:25:26 up 1 day, 23:59, 0 users, load average: 1.08, 1.08, 1.05
node36 20:25:26 up 1 day, 23:59, 0 users, load average: 1.03, 1.04, 1.05
node20 20:25:26 up 2 days, 1 min, 1 user, load average: 1.02, 1.09, 1.13
node38 20:25:26 up 1 day, 23:34, 0 users, load average: 1.04, 1.06, 1.05
node39 20:25:26 up 1 day, 22:53, 0 users, load average: 1.00, 1.06, 1.05
node26 20:25:26 up 2 days, 1 min, 0 users, load average: 1.07, 1.07, 1.11
node41 20:25:26 up 1 day, 22:07, 0 users, load average: 1.00, 1.04, 1.05
node40 20:25:26 up 1 day, 22:25, 1 user, load average: 1.01, 1.05, 1.05
node24 20:25:26 up 2 days, 1 min, 0 users, load average: 1.07, 1.09, 1.12
node25 20:25:26 up 2 days, 1 min, 0 users, load average: 1.02, 1.07, 1.12
node43 20:25:26 up 48 min, 1 user, load average: 2.04, 2.07, 1.75
node27 20:25:26 up 2 days, 1 min, 0 users, load average: 1.02, 1.06, 1.08
node42 20:25:26 up 3:10, 1 user, load average: 2.01, 2.03, 1.73
node21 20:25:26 up 2 days, 1 min, 0 users, load average: 1.03, 1.06, 1.11
node30 20:25:26 up 2 days, 1 min, 1 user, load average: 1.03, 1.07, 1.11
node32 20:25:26 up 2 days, 1 min, 0 users, load average: 1.07, 1.08, 1.12
node31 20:25:26 up 2 days, 1 min, 1 user, load average: 1.03, 1.05, 1.09
node33 20:25:26 up 1 day, 23:59, 0 users, load average: 1.06, 1.08, 1.11
node28 20:25:26 up 1 day, 21:08, 0 users, load average: 1.05, 1.09, 1.12
node29 20:25:26 up 2 days, 1 min, 0 users, load average: 1.07, 1.09, 1.12
node34 20:25:26 up 1 day, 23:59, 0 users, load average: 1.03, 1.07, 1.11
mpi@master:/master/mpi_tests$
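
The /master/mpi_tests/system program itself isn't listed here, but a minimal mpi4py sketch of that kind of test (hypothetical, not the actual program) is just each rank running uptime and tagging the output with its hostname:

#!/usr/bin/env python
# Sketch: each MPI rank runs uptime(1) on its node and prints the result
# prefixed with the node's hostname. Ranks print independently, which is
# why the node order in the listing above is not sorted.
import subprocess
from mpi4py import MPI

host = MPI.Get_processor_name()
up = subprocess.check_output(["uptime"]).decode().strip()
print("%s %s" % (host, up))

Launched the same way, e.g. mpirun -np 43 -f /master/mpi_tests/machinefile python system.py (script name hypothetical).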

2014-09-29

I am continuing to build boot cards for each node (33-43). Updated: I have 9 systems up and working with the cluster, but I ran out of 4GB micro SD cards. I have 10 more at the post office, if I can ever manage to go pick them up.
41 nodes

mpi@master:/master/mpi_tests$ mpirun -np 41 -f /master/mpi_tests/machinefile /master/mpi_tests/system
node05 23:41:25 up 3:17, 0 users, load average: 1.01, 1.01, 0.66
node01 23:41:25 up 3:17, 0 users, load average: 1.03, 0.99, 0.65
node06 23:41:25 up 3:17, 0 users, load average: 1.00, 0.99, 0.66
node02 23:41:25 up 3:17, 0 users, load average: 1.01, 0.99, 0.64
node09 23:41:25 up 3:17, 0 users, load average: 1.04, 1.00, 0.66
node07 23:41:25 up 3:17, 0 users, load average: 1.03, 1.02, 0.68
node03 23:41:25 up 3:17, 0 users, load average: 1.02, 1.03, 0.67
node08 23:41:25 up 3:17, 0 users, load average: 1.01, 1.00, 0.67
node04 23:41:25 up 3:17, 0 users, load average: 1.13, 1.05, 0.68
node10 23:41:26 up 3:17, 0 users, load average: 1.19, 1.06, 0.70
node13 23:41:26 up 3:17, 0 users, load average: 1.06, 1.01, 0.68
node17 23:41:26 up 3:17, 0 users, load average: 1.01, 1.03, 0.70
node16 23:41:26 up 3:17, 0 users, load average: 1.00, 1.00, 0.69
node12 23:41:26 up 3:17, 0 users, load average: 1.05, 1.06, 0.71
node37 23:41:26 up 3:15, 0 users, load average: 1.01, 0.97, 0.64
node39 23:41:26 up 2:09, 0 users, load average: 1.14, 1.02, 0.71
node38 23:41:26 up 2:50, 0 users, load average: 1.00, 0.96, 0.66
node36 23:41:26 up 3:15, 0 users, load average: 1.00, 0.96, 0.64
node35 23:41:26 up 3:15, 0 users, load average: 1.12, 1.00, 0.64
node14 23:41:26 up 3:17, 0 users, load average: 1.00, 0.99, 0.67
node41 23:41:26 up 1:23, 0 users, load average: 1.00, 0.99, 0.93
node40 23:41:26 up 1:41, 1 user, load average: 1.00, 0.96, 0.84
node15 23:41:26 up 3:17, 0 users, load average: 1.00, 0.99, 0.67
node11 23:41:26 up 3:17, 0 users, load average: 1.01, 0.99, 0.66
node18 23:41:26 up 3:17, 0 users, load average: 1.05, 1.02, 0.68
node23 23:41:26 up 3:17, 0 users, load average: 1.09, 1.04, 0.71
node22 23:41:26 up 3:17, 0 users, load average: 1.03, 1.04, 0.71
node19 23:41:26 up 3:17, 0 users, load average: 1.07, 1.07, 0.72
node28 23:41:26 up 24 min, 0 users, load average: 1.14, 1.09, 0.73
node21 23:41:27 up 3:17, 0 users, load average: 1.01, 0.99, 0.66
node24 23:41:27 up 3:17, 0 users, load average: 1.01, 0.99, 0.67
node26 23:41:27 up 3:17, 0 users, load average: 1.01, 1.02, 0.71
node25 23:41:27 up 3:17, 0 users, load average: 1.05, 1.03, 0.69
node32 23:41:27 up 3:17, 0 users, load average: 1.02, 0.99, 0.67
node30 23:41:27 up 3:17, 0 users, load average: 1.01, 1.01, 0.70
node31 23:41:27 up 3:17, 0 users, load average: 1.01, 1.00, 0.70
node27 23:41:27 up 3:17, 0 users, load average: 1.01, 1.01, 0.70
node29 23:41:27 up 3:17, 0 users, load average: 1.07, 1.07, 0.74
node34 23:41:27 up 3:15, 0 users, load average: 1.03, 0.99, 0.67
node20 23:41:27 up 3:17, 0 users, load average: 1.00, 1.00, 0.69
node33 23:41:27 up 3:15, 0 users, load average: 1.01, 1.00, 0.69

Until I can get more wifi adapters I'm running them with a wired connection to the switches in the Tower of Pi rack.
The wired performance is always impressive when you've been using wifi. I will not give in to the cable spaghetti just yet, but I very much miss the wired performance.

Wireless Congestion
I set the access point to enable short preambles, and I will tune the RTS threshold later to find some point where 64 wireless nodes on the same access point (within less than 1 foot of each other) will work reliably.

That aspect of using wireless sucks, but handling 64 ethernet switch ports sucks worse in my opinion. If I were to use the 5V gig-e (8-port) switches that I used on the Tower of Pi project, I'd need (8) of them plus 1 port on a 9th switch for the uplink.

I still refuse to use wired unless I hit a brick wall with 64 wifi hosts situated so close together.

I wonder how much RF I am absorbing from them, which led me to muse about dropping their transmit power to some safe level that still allows all 64 nodes to communicate with their access point.

If I can integrate a dedicated access point into the chassis somehow, I may be able to turn the RF (transmit) power down.

We will see; right now I just want to get the next 11 nodes working together as one cluster.

I should be able to test all 44 nodes shortly. First I have to create a new power harness for this new set of (11) systems and connect a 12V-to-5V DC converter to the battery with a separate power run (with a 15A fuse in line).

2014-09-28-005-small.jpg
Updated: I just finished making the input power harness for the second set of 32 nodes. The red and black pair are the 12V input; the yellow and black are the +5V output.

Eventually this power harness will handle the second set of (32) systems entirely.

In parallel, I am also designing a few additions that will let me turn on each tier of systems individually. I may use a time-delay design, or manual switches connected to solid-state relays.

I just don't want to hit my power system with such a huge current spike now that there are so many systems.

I also need to bring up the NFS server (master node) first: when everything powers up at the same time, I have to issue a mass remount to all the compute nodes, as most of them fail to NFS-mount the shared directory because, of course, the master node's nfsd wasn't up yet.
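
The mass remount itself is easy to script; a minimal sketch, assuming passwordless ssh to each node and that /master is the NFS mount point listed in each node's fstab:

#!/usr/bin/env python
# Sketch: after the master's nfsd is up, ssh to every compute node and
# remount the shared directory. Assumes passwordless ssh and that
# /master is listed in each node's /etc/fstab.
import subprocess

nodes = ["node%02d" % n for n in range(1, 44)]  # node01 .. node43

for node in nodes:
    # mount(8) reads fstab, so naming the mount point is enough
    if subprocess.call(["ssh", node, "mount", "/master"]) != 0:
        print("remount failed on %s" % node)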

I was thinking of running the master system on its own power feed with a "master" switch, which would actually help solve this annoying issue of the master NFS service not being ready when all the other cluster nodes come up. It's best to boot the master first.

These changes are all low priority, but they will come as I circle back around to cleaning everything up. I have all the solid-state relays and a few time-delay relay boards as well, but I am a huge fan of manual switches, and those would work if they were connected to the SSRs.

Using SSRs I can bring up one tier at a time, which will be a much kinder and gentler approach for the power system.

I could also bring up only as many nodes as I want; it would be interesting some day to do that via software control and a many-channel relay board that can switch each compute node's input power.

That would be great, because then you could power-cycle any node remotely.
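
A minimal sketch of that software-controlled sequencing, assuming each tier's SSR control input is wired to a GPIO pin (the pin numbers are hypothetical placeholders) and using the RPi.GPIO library:

#!/usr/bin/env python
# Sketch: energize one tier's solid-state relay at a time so the power
# system never sees the whole cluster's inrush current at once.
import time
import RPi.GPIO as GPIO

TIER_PINS = [17, 27, 22, 23]  # one SSR control line per tier (example pins)
DELAY = 5.0                   # seconds to let each tier settle

GPIO.setmode(GPIO.BCM)
for pin in TIER_PINS:
    GPIO.setup(pin, GPIO.OUT, initial=GPIO.LOW)  # everything off at start

for pin in TIER_PINS:
    GPIO.output(pin, GPIO.HIGH)  # switch this tier's SSR on
    time.sleep(DELAY)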

2014-09-28

I am still waiting for the wooden boxes to arrive. Last Friday the next 11 nodes arrived, and I am already staging them.

I got these next systems from David at r0ck.me. They ship very quickly and they have some very cool stuff at very good prices.

2014-09-28-004-small.jpg

I ran out of nylon spacers so I had to order some more. I also ordered 11 more wifi adapters.

Since the combined current load for all 64 nodes will be around 22A, I have to add a second 12V-to-5V DC-DC converter. I also have plans in the works for time-delay relay circuits to power up each group of 11 servers sequentially, so as not to go from zero to twenty-two amps in zero seconds.
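
(Twenty-two amps across 64 nodes works out to roughly 0.34A per node, or about 110W total at 5V.)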

I am building the boot cards for each node now and letting cfengine do all the tedious work of preparing the nodes.

2014-09-20

I spoke with the owner of the company that makes the wooden boxes and ordered a set of 12"x12"x12" boxes, stained black, for the base of the monolith. Then I will decide which size to use.

MPICC

I have been revisiting some issues I had with floating-point benchmarks last year, so this is just me taking some notes.

mpicc was built with the following options (installed from the Debian repo):

mpicc for MPICH2 version 1.4.1
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/4.6/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Debian 4.6.3-14' 
--with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs 
--enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr 
--program-suffix=-4.6 --enable-shared --enable-linker-build-id 
--with-system-zlib --libexecdir=/usr/lib --without-included-gettext 
--enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu 
--enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object 
--enable-plugin --enable-objc-gc --disable-sjlj-exceptions --with-arch=armv7-a 
--with-fpu=vfpv3-d16 --with-float=hard --with-mode=thumb --enable-checking=release 
--build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
Thread model: posix
gcc version 4.6.3 (Debian 4.6.3-14)

This of course mirrors the system gcc.

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/4.6/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Debian 4.6.3-14' 
--with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs 
--enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.6 
--enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib 
--without-included-gettext --enable-threads=posix 
--with-gxx-include-dir=/usr/include/c++/4.6
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu 
--enable-libstdcxx-debug
--enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --enable-objc-gc
--disable-sjlj-exceptions --with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=hard 
--with-mode=thumb --enable-checking=release --build=arm-linux-gnueabihf 
--host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
Thread model: posix
gcc version 4.6.3 (Debian 4.6.3-14)

Excerpt from the GCC manual:

3.17.4 ARM Options

These ‘-m’ options are defined for Advanced RISC Machines (ARM) architectures:

-mabi=name
Generate code for the specified ABI. Permissible values are: ‘apcs-gnu’, ‘atpcs’, ‘aapcs’, ‘aapcs-linux’ and ‘iwmmxt’.
-mapcs-frame
Generate a stack frame that is compliant with the ARM Procedure Call Standard for all functions, even if this is not strictly necessary for correct execution of the code. Specifying -fomit-frame-pointer with this option causes the stack frames not to be generated for leaf functions. The default is -mno-apcs-frame.
-mapcs
This is a synonym for -mapcs-frame.
-mthumb-interwork
Generate code that supports calling between the ARM and Thumb instruction sets. Without this option, on pre-v5 architectures, the two instruction sets cannot be reliably used inside one program. The default is -mno-thumb-interwork, since slightly larger code is generated when -mthumb-interwork is specified. In AAPCS configurations this option is meaningless.
-mno-sched-prolog
Prevent the reordering of instructions in the function prologue, or the merging of those instruction with the instructions in the function's body. This means that all functions start with a recognizable set of instructions (or in fact one of a choice from a small set of different function prologues), and this information can be used to locate the start of functions inside an executable piece of code. The default is -msched-prolog.
-mfloat-abi=name
Specifies which floating-point ABI to use. Permissible values are: ‘soft’, ‘softfp’ and ‘hard’.

Specifying ‘soft’ causes GCC to generate output containing library calls for floating-point operations. ‘softfp’ allows the generation of code using hardware floating-point instructions, but still uses the soft-float calling conventions. ‘hard’ allows generation of floating-point instructions and uses FPU-specific calling conventions.

The default depends on the specific target configuration. Note that the hard-float and soft-float ABIs are not link-compatible; you must compile your entire program with the same ABI, and link with a compatible set of libraries.
-mlittle-endian
Generate code for a processor running in little-endian mode. This is the default for all standard configurations.
-mbig-endian
Generate code for a processor running in big-endian mode; the default is to compile code for a little-endian processor.
-march=name
This specifies the name of the target ARM architecture. GCC uses this name to determine what kind of instructions it can emit when generating assembly code. This option can be used in conjunction with or instead of the -mcpu= option. Permissible names are: ‘armv2’, ‘armv2a’, ‘armv3’, ‘armv3m’, ‘armv4’, ‘armv4t’, ‘armv5’, ‘armv5t’, ‘armv5e’, ‘armv5te’, ‘armv6’, ‘armv6j’, ‘armv6t2’, ‘armv6z’, ‘armv6zk’, ‘armv6-m’, ‘armv7’, ‘armv7-a’, ‘armv7-r’, ‘armv7-m’, ‘armv7e-m’, ‘armv7ve’, ‘armv8-a’, ‘armv8-a+crc’, ‘iwmmxt’, ‘iwmmxt2’, ‘ep9312’.

-march=armv7ve is the armv7-a architecture with virtualization extensions.

-march=armv8-a+crc enables code generation for the ARMv8-A architecture together with the optional CRC32 extensions.

-march=native causes the compiler to auto-detect the architecture of the build computer. At present, this feature is only supported on GNU/Linux, and not all architectures are recognized. If the auto-detect is unsuccessful the option has no effect.
-mtune=name
This option specifies the name of the target ARM processor for which GCC should tune the performance of the code. For some ARM implementations better performance can be obtained by using this option. Permissible names are: ‘arm2’, ‘arm250’, ‘arm3’, ‘arm6’, ‘arm60’, ‘arm600’, ‘arm610’, ‘arm620’, ‘arm7’, ‘arm7m’, ‘arm7d’, ‘arm7dm’, ‘arm7di’, ‘arm7dmi’, ‘arm70’, ‘arm700’, ‘arm700i’, ‘arm710’, ‘arm710c’, ‘arm7100’, ‘arm720’, ‘arm7500’, ‘arm7500fe’, ‘arm7tdmi’, ‘arm7tdmi-s’, ‘arm710t’, ‘arm720t’, ‘arm740t’, ‘strongarm’, ‘strongarm110’, ‘strongarm1100’, ‘strongarm1110’, ‘arm8’, ‘arm810’, ‘arm9’, ‘arm9e’, ‘arm920’, ‘arm920t’, ‘arm922t’, ‘arm946e-s’, ‘arm966e-s’, ‘arm968e-s’, ‘arm926ej-s’, ‘arm940t’, ‘arm9tdmi’, ‘arm10tdmi’, ‘arm1020t’, ‘arm1026ej-s’, ‘arm10e’, ‘arm1020e’, ‘arm1022e’, ‘arm1136j-s’, ‘arm1136jf-s’, ‘mpcore’, ‘mpcorenovfp’, ‘arm1156t2-s’, ‘arm1156t2f-s’, ‘arm1176jz-s’, ‘arm1176jzf-s’, ‘cortex-a5’, ‘cortex-a7’, ‘cortex-a8’, ‘cortex-a9’, ‘cortex-a12’, ‘cortex-a15’, ‘cortex-a53’, ‘cortex-a57’, ‘cortex-r4’, ‘cortex-r4f’, ‘cortex-r5’, ‘cortex-r7’, ‘cortex-m4’, ‘cortex-m3’, ‘cortex-m1’, ‘cortex-m0’, ‘cortex-m0plus’, ‘marvell-pj4’, ‘xscale’, ‘iwmmxt’, ‘iwmmxt2’, ‘ep9312’, ‘fa526’, ‘fa626’, ‘fa606te’, ‘fa626te’, ‘fmp626’, ‘fa726te’.

Additionally, this option can specify that GCC should tune the performance of the code for a big.LITTLE system. Permissible names are: ‘cortex-a15.cortex-a7’, ‘cortex-a57.cortex-a53’.

-mtune=generic-arch specifies that GCC should tune the performance for a blend of processors within architecture arch. The aim is to generate code that run well on the current most popular processors, balancing between optimizations that benefit some CPUs in the range, and avoiding performance pitfalls of other CPUs. The effects of this option may change in future GCC versions as CPU models come and go.

-mtune=native causes the compiler to auto-detect the CPU of the build computer. At present, this feature is only supported on GNU/Linux, and not all architectures are recognized. If the auto-detect is unsuccessful the option has no effect.
-mcpu=name
This specifies the name of the target ARM processor. GCC uses this name to derive the name of the target ARM architecture (as if specified by -march) and the ARM processor type for which to tune for performance (as if specified by -mtune). Where this option is used in conjunction with -march or -mtune, those options take precedence over the appropriate part of this option.

Permissible names for this option are the same as those for -mtune.

-mcpu=generic-arch is also permissible, and is equivalent to -march=arch -mtune=generic-arch. See -mtune for more information.

-mcpu=native causes the compiler to auto-detect the CPU of the build computer. At present, this feature is only supported on GNU/Linux, and not all architectures are recognized. If the auto-detect is unsuccessful the option has no effect.
-mfpu=name
This specifies what floating-point hardware (or hardware emulation) is available on the target. Permissible names are: ‘vfp’, ‘vfpv3’, ‘vfpv3-fp16’, ‘vfpv3-d16’, ‘vfpv3-d16-fp16’, ‘vfpv3xd’, ‘vfpv3xd-fp16’, ‘neon’, ‘neon-fp16’, ‘vfpv4’, ‘vfpv4-d16’, ‘fpv4-sp-d16’, ‘neon-vfpv4’, ‘fp-armv8’, ‘neon-fp-armv8’, and ‘crypto-neon-fp-armv8’.

If -msoft-float is specified this specifies the format of floating-point values.

If the selected floating-point hardware includes the NEON extension (e.g. -mfpu=‘neon’), note that floating-point operations are not generated by GCC's auto-vectorization pass unless -funsafe-math-optimizations is also specified. This is because NEON hardware does not fully implement the IEEE 754 standard for floating-point arithmetic (in particular denormal values are treated as zero), so the use of NEON instructions may lead to a loss of precision.
-mfp16-format=name
Specify the format of the __fp16 half-precision floating-point type. Permissible names are ‘none’, ‘ieee’, and ‘alternative’; the default is ‘none’, in which case the __fp16 type is not defined. See Half-Precision, for more information.
-mstructure-size-boundary=n
The sizes of all structures and unions are rounded up to a multiple of the number of bits set by this option. Permissible values are 8, 32 and 64. The default value varies for different toolchains. For the COFF targeted toolchain the default value is 8. A value of 64 is only allowed if the underlying ABI supports it.

Specifying a larger number can produce faster, more efficient code, but can also increase the size of the program. Different values are potentially incompatible. Code compiled with one value cannot necessarily expect to work with code or libraries compiled with another value, if they exchange information using structures or unions.
-mabort-on-noreturn
Generate a call to the function abort at the end of a noreturn function. It is executed if the function tries to return.
-mlong-calls
-mno-long-calls
Tells the compiler to perform function calls by first loading the address of the function into a register and then performing a subroutine call on this register. This switch is needed if the target function lies outside of the 64-megabyte addressing range of the offset-based version of subroutine call instruction.

Even if this switch is enabled, not all function calls are turned into long calls. The heuristic is that static functions, functions that have the ‘short-call’ attribute, functions that are inside the scope of a ‘#pragma no_long_calls’ directive, and functions whose definitions have already been compiled within the current compilation unit are not turned into long calls. The exceptions to this rule are that weak function definitions, functions with the ‘long-call’ attribute or the ‘section’ attribute, and functions that are within the scope of a ‘#pragma long_calls’ directive are always turned into long calls.

This feature is not enabled by default. Specifying -mno-long-calls restores the default behavior, as does placing the function calls within the scope of a ‘#pragma long_calls_off’ directive. Note these switches have no effect on how the compiler generates code to handle function calls via function pointers.
-msingle-pic-base
Treat the register used for PIC addressing as read-only, rather than loading it in the prologue for each function. The runtime system is responsible for initializing this register with an appropriate value before execution begins.
-mpic-register=reg
Specify the register to be used for PIC addressing. For standard PIC base case, the default will be any suitable register determined by compiler. For single PIC base case, the default is ‘R9’ if target is EABI based or stack-checking is enabled, otherwise the default is ‘R10’.
-mpic-data-is-text-relative
Assume that each data segments are relative to text segment at load time. Therefore, it permits addressing data using PC-relative operations. This option is on by default for targets other than VxWorks RTP.
-mpoke-function-name
Write the name of each function into the text section, directly preceding the function prologue. The generated code is similar to this:

t0
    .ascii "arm_poke_function_name", 0
    .align
t1
    .word 0xff000000 + (t1 - t0)
arm_poke_function_name
    mov     ip, sp
    stmfd   sp!, {fp, ip, lr, pc}
    sub     fp, ip, #4

When performing a stack backtrace, code can inspect the value of pc stored at fp + 0. If the trace function then looks at location pc - 12 and the top 8 bits are set, then we know that there is a function name embedded immediately preceding this location and has length ((pc[-3]) & 0xff000000).
-mthumb
-marm
Select between generating code that executes in ARM and Thumb states. The default for most configurations is to generate code that executes in ARM state, but the default can be changed by configuring GCC with the --with-mode=state configure option.
-mtpcs-frame
Generate a stack frame that is compliant with the Thumb Procedure Call Standard for all non-leaf functions. (A leaf function is one that does not call any other functions.) The default is -mno-tpcs-frame.
-mtpcs-leaf-frame
Generate a stack frame that is compliant with the Thumb Procedure Call Standard for all leaf functions. (A leaf function is one that does not call any other functions.) The default is -mno-apcs-leaf-frame.
-mcallee-super-interworking
Gives all externally visible functions in the file being compiled an ARM instruction set header which switches to Thumb mode before executing the rest of the function. This allows these functions to be called from non-interworking code. This option is not valid in AAPCS configurations because interworking is enabled by default.
-mcaller-super-interworking
Allows calls via function pointers (including virtual functions) to execute correctly regardless of whether the target code has been compiled for interworking or not. There is a small overhead in the cost of executing a function pointer if this option is enabled. This option is not valid in AAPCS configurations because interworking is enabled by default.
-mtp=name
Specify the access model for the thread local storage pointer. The valid models are soft, which generates calls to __aeabi_read_tp, cp15, which fetches the thread pointer from cp15 directly (supported in the arm6k architecture), and auto, which uses the best available method for the selected processor. The default setting is auto.
-mtls-dialect=dialect
Specify the dialect to use for accessing thread local storage. Two dialects are supported—‘gnu’ and ‘gnu2’. The ‘gnu’ dialect selects the original GNU scheme for supporting local and global dynamic TLS models. The ‘gnu2’ dialect selects the GNU descriptor scheme, which provides better performance for shared libraries. The GNU descriptor scheme is compatible with the original scheme, but does require new assembler, linker and library support. Initial and local exec TLS models are unaffected by this option and always use the original scheme.
-mword-relocations
Only generate absolute relocations on word-sized values (i.e. R_ARM_ABS32). This is enabled by default on targets (uClinux, SymbianOS) where the runtime loader imposes this restriction, and when -fpic or -fPIC is specified.
-mfix-cortex-m3-ldrd
Some Cortex-M3 cores can cause data corruption when ldrd instructions with overlapping destination and base registers are used. This option avoids generating these instructions. This option is enabled by default when -mcpu=cortex-m3 is specified.
-munaligned-access
-mno-unaligned-access
Enables (or disables) reading and writing of 16- and 32- bit values from addresses that are not 16- or 32- bit aligned. By default unaligned access is disabled for all pre-ARMv6 and all ARMv6-M architectures, and enabled for all other architectures. If unaligned access is not enabled then words in packed data structures will be accessed a byte at a time.

The ARM attribute Tag_CPU_unaligned_access will be set in the generated object file to either true or false, depending upon the setting of this option. If unaligned access is enabled then the preprocessor symbol __ARM_FEATURE_UNALIGNED will also be defined.
-mneon-for-64bits
Enables using Neon to handle scalar 64-bits operations. This is disabled by default since the cost of moving data from core registers to Neon is high.
-mslow-flash-data
Assume loading data from flash is slower than fetching instruction. Therefore literal load is minimized for better performance. This option is only supported when compiling for ARMv7 M-profile and off by default.
-mrestrict-it
Restricts generation of IT blocks to conform to the rules of ARMv8. IT blocks can only contain a single 16-bit instruction from a select set of instructions. This option is on by default for ARMv8 Thumb mode.

2014-09-16

I have been working on MPI-based code to create 40 million sha-512 hashes from the Openwall mangled.txt.

What I have created reads in a chunk (8M) of the 40-million-line dictionary file and then splits that chunk into NPROC pieces (at the time of this writing, 32 CPUs).

Since there are 32 compute nodes, the code assigns each node only 1/32 of that chunk.

Then I create the SHA-512 hashes line by line from that node's portion. When the node has reached the end of its assigned portion, it reads in another chunk (8M) and selects its portion again. Each node only has to hash 1/32 of each chunk, so all nodes constantly work on only their share.

Since each node works on a unique portion of the chunk, it discards the other portions and outputs only its 1/32 of the work.

I like this design as it allows me to process very large files (40 million lines!) in 8MB chunks, reusing the same 8MB of memory over and over, which means I don't have to worry about memory when working with larger files.
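
A minimal mpi4py sketch of that chunking scheme (not the production code; it uses plain hashlib SHA-512 for brevity, where the real run presumably builds crypt-style $6$ hashes like the ones in the shadow files shown elsewhere in these notes):

#!/usr/bin/env python
# Sketch of the chunked scheme: every rank reads the same ~8MB batch of
# lines, hashes only its own 1/size piece of it, then moves on to the
# next batch, so memory use stays flat no matter how big the wordlist is.
import hashlib
from mpi4py import MPI

CHUNK = 8 * 1024 * 1024  # read roughly 8MB of lines at a time
comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

wordlist = open("mangled.txt")         # the 40-million-line dictionary
out = open("hashes.%02d" % rank, "w")  # each rank outputs its own share

lines = wordlist.readlines(CHUNK)
while lines:
    # carve the chunk into `size` contiguous pieces and keep only ours
    lo = rank * len(lines) // size
    hi = (rank + 1) * len(lines) // size
    for word in lines[lo:hi]:
        word = word.strip()
        out.write("%s:%s\n" % (word, hashlib.sha512(word.encode()).hexdigest()))
    lines = wordlist.readlines(CHUNK)

out.close()
wordlist.close()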

Last night I ran this same operation on the same file using only one linear process, and it took all night to generate about 850K sha-512 hashes. This code has been running only about 300 minutes but has generated over 1.38 million hashes (32 nodes doing the work!).
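
(1.38 million hashes in 300 minutes is roughly 77 hashes per second across the cluster, or about 2.4 per node per second.)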

I am having a lot of fun with mpi4py and python. Some tasks are perfect for parallel processing and this sort of thing seems to be a classic example.

Why am I building a 40-million-line file of sha-512 hashes? The same reason I am building a 40-million-line DES hash file.

Mainly to benchmark the cluster's ability to process all types of crypt operations, but these routines are also building toward a parallel Python cryptoengine I hope to create that can handle extremely large dictionaries by splitting the dictionary among every MPI node… This will be some sort of quick-and-dirty brute-force run before resorting to all the rules and mangling that parallel John the Ripper does.

Parallel John the Ripper is already available, but I am using this to learn as much Python as I can. What really makes it interesting for me is the parallel computing aspect of coding this. If I were really interested in cracking passwords I'd use something a lot more powerful than these CPUs as engines.

For me this design is simply a learning model and not a lot more than that, though I have fun with it.

It will be even more fun when it has 66 systems, but I'll only describe it as 64 compute nodes, as I might work on making the master highly available some day.

I also have some interesting ideas for some projects at my day job using some of the techniques I am using here for related work but those are still in the R&D stage.

2014-09-14

I gave up on waiting for the local carpenter and found a suitable box for the Cubical Monolith base from Tilnic Creations. I ordered (2) of them.

wooden-boxes.jpg

They are about 4" deeper than I had wished but I can either cut them to 12" or simply let the monolith end up being rather tall. It might add to it's "stature" … I will simply bolt the 12"x!2" aluminum plate over the top and probably cut some venting holes into the wood box for the 35A power supply/charger.

I am also investigating adding 32 more CPUs to this design, for a total of 66 systems (64 compute nodes, two masters).

Updated: I am definitely going to add 33 more CPUs to this design.

2014-09-12

"DES encryption sucks!"

Everything has been running for about 18 hours, and there have been some interesting results.

Average cracks/second per node = 153976.25
Total cracks/second for the cluster = 4927240.00

Parallel JtR has guessed 2554 passwords so far after 18 hours… There is an obvious lesson here for everyone: never use DES encryption (using sha-256 or sha-512 instead slows down the attack greatly). You already knew that, of course.

19 60g 0:18:34:11 32.62% 2/3 (ETA: 2014-09-14 09:44) 0.000897g/s 72.40p/s 152934c/s 6465KC/s kuipen6..kulaks6
30 0g 0:18:34:10 51.37% 2/3 (ETA: 12:57:28) 0g/s 69.54p/s 154270c/s 6522KC/s kanteL..kantrY
21 0g 0:18:34:24 36.12% 2/3 (ETA: 2014-09-14 04:13) 0g/s 77.54p/s 152986c/s 6468KC/s lasciai...lassers.
18 30g 0:18:34:12 30.86% 2/3 (ETA: 2014-09-14 12:58) 0.000448g/s 76.08p/s 153181c/s 6476KC/s lomboy8..loners8
31 0g 0:18:34:14 52.92% 2/3 (ETA: 11:53:50) 0g/s 66.89p/s 154559c/s 6534KC/s 2pelzige..2penantt
3 64g 0:18:34:08 3.75% 2/3 (ETA: 2014-10-02 16:13) 0.000957g/s 77.68p/s 154471c/s 6530KC/s Kingfish..Ledzep97
10 12g 0:18:34:10 15.97% 2/3 (ETA: 2014-09-16 21:03) 0.000179g/s 72.23p/s 154119c/s 6516KC/s TRADUCI..TRAESSIM
6 10g 0:18:34:10 9.81% 2/3 (ETA: 2014-09-19 22:05) 0.000149g/s 69.52p/s 153605c/s 6494KC/s Murchy1..Muriez1
2 539g 0:18:34:23 2.26% 2/3 (ETA: 2014-10-16 05:53) 0.008061g/s 72.38p/s 155992c/s 6595KC/s vouwmach..vowmakin
22 0g 0:18:34:14 37.88% 2/3 (ETA: 2014-09-14 01:49) 0g/s 72.44p/s 153773c/s 6501KC/s notifia?..nouent?
11 213g 0:18:34:07 18.58% 2/3 (ETA: 2014-09-16 04:43) 0.003186g/s 91.10p/s 153944c/s 6508KC/s onnera2..onorata2
9 2g 0:18:34:18 14.33% 2/3 (ETA: 2014-09-17 10:25) 0.000029g/s 72.11p/s 154309c/s 6524KC/s 1paieras..1paining
8 18g 0:18:34:11 12.45% 2/3 (ETA: 2014-09-18 05:55) 0.000269g/s 74.58p/s 153895c/s 6506KC/s rednrekc..regigeat
27 2g 0:18:34:13 46.11% 2/3 (ETA: 17:05:10) 0.000029g/s 72.13p/s 154427c/s 6529KC/s einorI..idarrI
20 4g 0:18:34:32 34.37% 2/3 (ETA: 2014-09-14 06:51) 0.000059g/s 76.71p/s 152793c/s 6460KC/s kloekst0..klonten0
1 155g 0:18:34:14 0.13% 2/3 (ETA: 2016-04-25 23:02) 0.002318g/s 78.05p/s 155894c/s 6591KC/s incideru..inclinen
25 23g 0:18:34:12 85.19% 2/3 (ETA: 22:36:32) 0.000344g/s 69.66p/s 153865c/s 6505KC/s 6siareut..6snobmot
5 788g 0:18:34:22 8.06% 2/3 (ETA: 2014-09-21 15:16) 0.01178g/s 69.41p/s 154310c/s 6524KC/s pinatas1..pinenes1
15 55g 0:18:34:21 25.60% 2/3 (ETA: 2014-09-15 01:21) 0.000822g/s 73.75p/s 153988c/s 6510KC/s ordino9..ordos9
29 6g 0:18:34:15 77.37% 2/3 (ETA: 00:48:43) 0.000089g/s 69.20p/s 152899c/s 6464KC/s 3cavalle..3cheah
7 17g 0:18:34:15 77.37% 2/3 (ETA: 00:48:37) 0.000254g/s 75.03p/s 153860c/s 6505KC/s 3ribatti..3ricolme
12 0g 0:18:34:29 20.33% 2/3 (ETA: 2014-09-15 20:09) 0g/s 72.03p/s 152867c/s 6463KC/s kranker!..kratzen!
24 3g 0:18:34:10 40.47% 2/3 (ETA: 2014-09-13 22:41) 0.000044g/s 69.57p/s 154235c/s 6520KC/s trvvkrsp..tsksmmll
23 144g 0:18:34:21 40.05% 2/3 (ETA: 2014-09-13 23:10) 0.002153g/s 72.66p/s 154418c/s 6528KC/s gemotord..gemuisde
32 0g 0:18:34:18 54.68% 2/3 (ETA: 10:46:29) 0g/s 67.02p/s 154180c/s 6518KC/s 4ovenfor..4overaci
28 0g 0:18:34:24 47.54% 2/3 (ETA: 15:52:37) 0g/s 69.17p/s 153821c/s 6503KC/s Giheaffp..Netskrea
13 96g 0:18:34:19 22.09% 2/3 (ETA: 2014-09-15 12:52) 0.001435g/s 72.29p/s 153677c/s 6497KC/s neist3..nenes3
26 3g 0:18:34:25 77.39% 2/3 (ETA: 00:48:34) 0.000044g/s 75.16p/s 154168c/s 6518KC/s 3forcast..3fracmas
4 34g 0:18:34:49 6.31% 2/3 (ETA: 2014-09-24 07:27) 0.000508g/s 72.19p/s 154334c/s 6525KC/s puimens..puisties
14 117g 0:18:34:26 23.85% 2/3 (ETA: 2014-09-15 06:41) 0.001749g/s 85.74p/s 153823c/s 6503KC/s odiate7..odling7
16 105g 0:18:34:34 27.36% 2/3 (ETA: 2014-09-14 20:42) 0.001570g/s 72.37p/s 153795c/s 6502KC/s odso5..oege5
17 57g 0:18:34:58 29.11% 2/3 (ETA: 2014-09-14 16:38) 0.000852g/s 79.78p/s 153855c/s 6504KC/s openen4..operos

2014-09-11

A friend of mine gave me a 175K-line (175,179 to be exact) file full of DES-encrypted passwords, and after I created a boilerplate shadow file with these strings I fed it into JtR.

10 6g 0:00:54:20 15.79% 2/3 (ETA: 06:32:34) 0.001840g/s 746.8p/s 152716c/s 6495KC/s MANDATE1..MANIKIN1
17 31g 0:00:54:54 28.09% 2/3 (ETA: 04:03:24) 0.009410g/s 894.6p/s 152489c/s 6486KC/s brink4..brook4
1 134g 0:00:54:23 0.00% 2/3 (ETA: 2016-12-02 04:01) 0.04106g/s 857.1p/s 154691c/s 6579KC/s Nether1..Neverthe
16 65g 0:00:54:31 26.34% 2/3 (ETA: 04:15:21) 0.01986g/s 749.2p/s 152431c/s 6484KC/s bamboo5..barber5
9 0g 0:00:54:28 14.04% 2/3 (ETA: 07:16:17) 0g/s 741.8p/s 153000c/s 6508KC/s 1falloff..1fantail
6 6g 0:00:54:20 8.80% 2/3 (ETA: 11:06:14) 0.001840g/s 693.7p/s 152467c/s 6486KC/s Arch1..Ark1
22 0g 0:00:54:24 36.87% 2/3 (ETA: 03:16:03) 0g/s 751.8p/s 151573c/s 6447KC/s wrestle?..abbot?
11 115g 0:00:54:18 17.57% 2/3 (ETA: 05:57:40) 0.03529g/s 1135p/s 153399c/s 6525KC/s baby2..bail2
15 15g 0:00:54:28 24.59% 2/3 (ETA: 04:29:58) 0.004588g/s 777.0p/s 153454c/s 6528KC/s brake9..bred9
25 22g 0:00:54:20 56.25% 2/3 (ETA: 02:25:09) 0.006746g/s 694.4p/s 148312c/s 6309KC/s Sumet2..Tezkou2
29 6g 0:00:54:21 49.18% 2/3 (ETA: 02:39:04) 0.001839g/s 689.6p/s 148806c/s 6330KC/s halseesl..hamzaazm
2 98g 0:00:54:33 1.78% 2/3 (ETA: 2014-09-14 03:56) 0.02993g/s 738.1p/s 153487c/s 6529KC/s yraicidu..repinuj
30 0g 0:00:54:20 50.90% 2/3 (ETA: 02:35:18) 0g/s 691.0p/s 153690c/s 6538KC/s naF..reirraF
20 0g 0:00:54:41 33.36% 2/3 (ETA: 03:32:08) 0g/s 839.8p/s 150755c/s 6414KC/s studly0..teresa0
26 0g 0:00:54:25 43.92% 2/3 (ETA: 02:52:23) 0g/s 805.9p/s 153345c/s 6523KC/s KhajurKh..KhotKhot
24 2g 0:00:54:16 40.36% 2/3 (ETA: 03:03:06) 0.000614g/s 692.2p/s 150897c/s 6417KC/s brgbrg..brvbrv
27 0g 0:00:54:22 45.64% 2/3 (ETA: 02:47:39) 0g/s 742.5p/s 152053c/s 6468KC/s debutinG..decocT
3 35g 0:00:54:18 3.52% 2/3 (ETA: 2014-09-13 02:32) 0.01074g/s 857.5p/s 153764c/s 6541KC/s Breast..Brickbat
31 0g 0:00:54:25 52.64% 2/3 (ETA: 02:31:51) 0g/s 634.4p/s 153334c/s 6522KC/s 2fifteen..2film
23 85g 0:00:54:26 38.63% 2/3 (ETA: 03:09:21) 0.02602g/s 753.1p/s 153372c/s 6523KC/s predicab..prehisto
4 18g 0:00:54:48 5.29% 2/3 (ETA: 18:04:28) 0.005473g/s 739.4p/s 152835c/s 6502KC/s cancels..cantors
5 425g 0:00:54:30 7.04% 2/3 (ETA: 13:42:19) 0.1299g/s 686.2p/s 153081c/s 6510KC/s bobcat1..bombast1
18 18g 0:00:54:22 29.85% 2/3 (ETA: 03:50:40) 0.005516g/s 830.1p/s 153239c/s 6519KC/s belfast8..benight8
14 58g 0:00:54:25 22.83% 2/3 (ETA: 04:46:49) 0.01776g/s 1024p/s 152927c/s 6505KC/s awfully7..baboon7
28 0g 0:00:54:26 47.37% 2/3 (ETA: 02:43:21) 0g/s 684.3p/s 151906c/s 6462KC/s Wayward..Weatherw
8 5g 0:00:54:21 12.29% 2/3 (ETA: 08:10:50) 0.001533g/s 796.2p/s 153431c/s 6526KC/s nosilla..gnola
12 0g 0:00:54:35 19.32% 2/3 (ETA: 05:30:48) 0g/s 745.5p/s 150484c/s 6401KC/s harvey!..iguana!
7 13g 0:00:54:21 10.58% 2/3 (ETA: 09:22:08) 0.003985g/s 805.5p/s 153015c/s 6509KC/s juntosju..jynxjynx
13 53g 0:00:54:21 21.08% 2/3 (ETA: 05:06:24) 0.01625g/s 750.3p/s 152821c/s 6500KC/s awfully3..baboon3
32 0g 0:00:54:21 54.39% 2/3 (ETA: 02:28:28) 0g/s 639.7p/s 153359c/s 6524KC/s 4fed..4feminin
19 30g 0:00:54:23 31.60% 2/3 (ETA: 03:40:35) 0.009193g/s 755.4p/s 151777c/s 6455KC/s accusal6..actress6
21 0g 0:00:54:35 35.11% 2/3 (ETA: 03:23:47) 0g/s 857.9p/s 153162c/s 6515KC/s bowleg...braise.

DES seems to be very easy to crack, but so far all these passwords also seem to be very weak (checking john.pot shows very trivial passwords…).

[snip]

gators7          (u139490)
lances4          (u137110)
lances4          (u064355)
loves7           (u169304)
loves7           (u098631)
Finances         (u045859)
Finances         (u099852)
Finances         (u130927)
Finances         (u146381)
Finances         (u149898)
mills2           (u054908)
mills2           (u087662)
nips1            (u075805)
nips1            (u108845)
nobles1          (u118310)
puddles1         (u158396)
puddles1         (u128585)
strings4         (u145044)
strings4         (u031128)
walls1           (u007011)
feelgood         (u088639)
feelgood         (u137392)
reccos4          (u116531)
reccos4          (u141302)
gfdsa4           (u005491)
gfdsa4           (u007908)
gfdsa3           (u002704)
gfdsa3           (u101965)
gfdsa3           (u088290)
gfdsa3           (u069848)
gfdsa3           (u123984)
gfdsa3           (u094618)
gfdsa5           (u018231)
gfdsa5           (u080028)
gfdsa5           (u061219)
gfdsa5           (u077887)
Oceans           (u069644)
Oceans           (u053983)

PENTIUMS         (u086256)
PENTIUMS         (u006852)
etihw1           (u083635)
etihw1           (u143629)
nonsequi         (u071005)
crm1             (u060765)
crm1             (u092417)
semperfi         (u018555)
DODGE1           (u096839)
thankyou         (u104311)

[snip]

So the huge file full of DES-encrypted hashes needed to be converted to work with JtR. In this case I wrote a small script to create (1) shadow entry per encrypted hash, generating a "user" name from the position in the huge file so I could reference it later.

#!/usr/bin/python

BUF_SIZE = 8192

i = 0  # position in the file; becomes the "user" name (u000000, u000001, ...)

# read the hash file in BUF_SIZE batches so memory use stays small
somefile = open('/master/tmp/some', 'r', BUF_SIZE)
somelines = somefile.readlines(BUF_SIZE)
while somelines:
    for password in somelines:
        password = password.strip()
        # one boilerplate shadow(5) entry per DES hash
        print "u%06d:%s:15650:0:99999:7:::" % (i, password)
        i += 1
    somelines = somefile.readlines(BUF_SIZE)
somefile.close()

This is what the above script creates (a 175K-line shadow file):

u000000:..2KaMWN1uQhs:15650:0:99999:7:::
u000001:..4J9TDDtAQsc:15650:0:99999:7:::
u000002:..BrGLyF..NVo:15650:0:99999:7:::
u000003:..C7Zo5dLC9ac:15650:0:99999:7:::
u000004:..CU8tJmh.P7g:15650:0:99999:7:::
u000005:..ElbyGUCpSCA:15650:0:99999:7:::
u000006:..FgHz5UfJRjA:15650:0:99999:7:::
u000007:..GK2BbVM.cAQ:15650:0:99999:7:::
[snip]

2014-09-09

I have been spending some time learning more MPI Python methodology, and I have to say that scipy and numpy, and of course mpi4py, together create something incredibly powerful. I have been having a lot of fun.

I obtained mangled.lst (donation) from Openwall so I am restarting the work on (4) test passwords using this improved "mangled" dictionary from the JtR team.

I doubt if I will ever live long enough to see one of my real passwords broken but GPUs are cheaper now and who knows.

Speaking of GPUs, I have been tempted by the NVidia systems. The NVidia Jetson TK1 is particularly interesting. See MPI Solutions for GPUs as well as Jetson TK1 (elinux) and, last but not least, where to buy hundreds of them. OK, maybe just dozens… Oh well, enough about the NVidia systems.

I wonder if the Mali 400 could be used; I suppose only if someone sponsors Openwall to write drivers for that GPU.

I rebooted node10 today because I thought I had a problem with that one node, but it seems to have been my MPI C code misbehaving. I was also writing some C today to learn how to read and write files using MPI, namely MPI_File_seek and MPI_File_write with offsets based on which rank was writing.

i.e.

#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size, namelen, msgsize;
    MPI_Offset offset;
    MPI_File file;
    MPI_Status status;
    /* sized to hold the hostname plus the fixed text; the original
       message[25] was too small and sprintf overflowed it */
    char message[MPI_MAX_PROCESSOR_NAME + 32];
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Get_processor_name(processor_name, &namelen);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    sprintf(message, "host %s has process: %i \n", processor_name, rank + 1);
    msgsize = strlen(message);

    /* each rank writes at its own fixed offset; this only lines up
       cleanly when every rank's message is the same length */
    offset = (MPI_Offset)msgsize * rank;

    MPI_File_open(MPI_COMM_WORLD, "data.txt", MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &file);
    MPI_File_seek(file, offset, MPI_SEEK_SET);
    MPI_File_write(file, message, msgsize, MPI_CHAR, &status);
    MPI_File_close(&file);
    MPI_Finalize();
    return 0;
}
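
(Built with mpicc and launched with the usual mpirun/machinefile invocation. One subtlety: the msgsize*rank offsets only line up exactly while every rank's message has the same length; "process: 10" is one byte longer than "process: 9", so past 9 ranks there is a small unwritten gap in the file where the rank numbers gain a digit.)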

I had the usual trouble manipulating strings in C. That's OK; I've had the same trouble since the early 80s.

This code produces an ASCII text file that looks like this:

$ cat data.txt
host node01 has process: 1
host node02 has process: 2
host node03 has process: 3
host node04 has process: 4
host node05 has process: 5
host node06 has process: 6
host node07 has process: 7
host node08 has process: 8
host node09 has process: 9
host node10 has process: 10
host node11 has process: 11
host node12 has process: 12
host node13 has process: 13
host node14 has process: 14
host node15 has process: 15
host node16 has process: 16
host node17 has process: 17
host node18 has process: 18
host node19 has process: 19
host node20 has process: 20
host node21 has process: 21
host node22 has process: 22
host node23 has process: 23
host node24 has process: 24
host node25 has process: 25
host node26 has process: 26
host node27 has process: 27
host node28 has process: 28
host node29 has process: 29
host node30 has process: 30
host node31 has process: 31
host node32 has process: 32
$

I was off today, so I also learned how to split up a large file of data and distribute it to each node (using Python with mpi4py). I had a great deal of fun doing that. The implications are unlimited, and I am expending some effort to learn more Python as I feel I can create some very interesting code with it.

I'll clean up some code and paste it here soon. Python is fast replacing perl for me, but I have a lot more work to do before it's as easy for me as writing perl code.
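
In the meantime, here is a minimal sketch of the splitting idea, assuming rank 0 holds the file and scatters one slice of lines to every rank (the file name is hypothetical):

#!/usr/bin/env python
# Sketch: rank 0 reads a file and scatters one slice of its lines to
# every rank; each rank then works on its slice independently.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

slices = None
if rank == 0:
    with open("bigfile.txt") as f:  # file name hypothetical
        lines = f.readlines()
    # cut the list into `size` roughly equal strided pieces
    slices = [lines[i::size] for i in range(size)]

mine = comm.scatter(slices, root=0)  # one slice arrives at each rank
print("rank %d received %d lines" % (rank, len(mine)))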

28 0g 0:00:15:11 47.37% 2/3 (ETA: 20:23:03) 0g/s 3.515p/s 9.987c/s 9.987C/s Yrotinomda
18 0g 0:00:15:11 29.82% 2/3 (ETA: 20:41:54) 0g/s 3.514p/s 9.918c/s 9.918C/s adept8
1 0g 0:00:15:11 0.00% 2/3 (ETA: 2020-06-06 07:45) 0g/s 16.91p/s 50.14c/s 50.14C/s insubstantial
30 0g 0:00:15:11 50.88% 2/3 (ETA: 20:20:50) 0g/s 3.529p/s 10.01c/s 10.01C/s adopT
21 0g 0:00:15:11 35.09% 2/3 (ETA: 20:34:16) 0g/s 3.530p/s 9.939c/s 9.939C/s adequate.
14 0g 0:00:15:11 22.81% 2/3 (ETA: 20:57:34) 0g/s 3.493p/s 9.842c/s 9.842C/s actuary7
27 0g 0:00:15:11 45.61% 2/3 (ETA: 20:24:17) 0g/s 3.464p/s 9.830c/s 9.830C/s egadA
32 0g 0:00:15:11 54.39% 2/3 (ETA: 20:18:55) 0g/s 3.473p/s 9.855c/s 9.855C/s 4addendum
16 0g 0:00:15:11 26.32% 2/3 (ETA: 20:48:41) 0g/s 3.502p/s 9.907c/s 9.907C/s adenoid5
25 0g 0:00:15:11 42.19% 2/3 (ETA: 20:26:59) 0g/s 3.405p/s 9.632c/s 9.632C/s carolinarail
15 0g 0:00:15:11 24.56% 2/3 (ETA: 20:52:49) 0g/s 3.460p/s 9.764c/s 9.764C/s acrobat9
17 0g 0:00:15:11 28.07% 2/3 (ETA: 20:45:05) 0g/s 3.495p/s 9.860c/s 9.860C/s adagio4
11 0g 0:00:15:11 17.54% 2/3 (ETA: 21:17:33) 0g/s 3.478p/s 9.779c/s 9.779C/s acrimony2
10 0g 0:00:15:11 15.79% 2/3 (ETA: 21:27:10) 0g/s 3.503p/s 9.855c/s 9.855C/s ACCLIMATIZE
2 0g 0:00:15:12 1.76% 2/3 (ETA: 10:16:40) 0g/s 16.47p/s 48.83c/s 48.83C/s lightweight
3 0g 0:00:15:11 3.51% 2/3 (ETA: 03:03:43) 0g/s 3.478p/s 9.818c/s 9.818C/s Ambiguity
26 0g 0:00:15:11 43.86% 2/3 (ETA: 20:25:38) 0g/s 3.469p/s 9.817c/s 9.817C/s BunBun
31 0g 0:00:15:11 52.63% 2/3 (ETA: 20:19:51) 0g/s 3.517p/s 9.974c/s 9.974C/s 2adjure
20 0g 0:00:15:11 33.33% 2/3 (ETA: 20:36:33) 0g/s 3.419p/s 9.615c/s 9.615C/s accurate0
9 0g 0:00:15:11 14.04% 2/3 (ETA: 21:39:11) 0g/s 3.457p/s 9.756c/s 9.756C/s 1acrid
29 0g 0:00:15:11 49.12% 2/3 (ETA: 20:21:55) 0g/s 3.442p/s 9.748c/s 9.748C/s bumpyypmub
24 0g 0:00:15:11 40.35% 2/3 (ETA: 20:28:38) 0g/s 3.447p/s 9.782c/s 9.782C/s vtrx
5 0g 0:00:15:11 7.02% 2/3 (ETA: 23:27:22) 0g/s 3.480p/s 9.837c/s 9.837C/s acupuncture1
23 0g 0:00:15:11 38.61% 2/3 (ETA: 20:30:20) 0g/s 3.477p/s 9.816c/s 9.816C/s battlefront
7 0g 0:00:15:11 10.53% 2/3 (ETA: 22:15:15) 0g/s 3.526p/s 9.943c/s 9.943C/s bursaebursae
6 0g 0:00:15:12 8.77% 2/3 (ETA: 22:44:16) 0g/s 3.517p/s 9.948c/s 9.948C/s Adjacent1
13 0g 0:00:15:12 21.05% 2/3 (ETA: 21:03:11) 0g/s 3.496p/s 9.892c/s 9.892C/s addressee3
12 0g 0:00:15:11 19.30% 2/3 (ETA: 21:09:41) 0g/s 3.494p/s 9.840c/s 9.840C/s actually!
19 0g 0:00:15:11 31.58% 2/3 (ETA: 20:39:05) 0g/s 3.410p/s 9.621c/s 9.621C/s accustomed6
22 0g 0:00:15:11 36.84% 2/3 (ETA: 20:32:13) 0g/s 3.486p/s 9.847c/s 9.847C/s acyclic?
4 0g 0:00:15:11 5.26% 2/3 (ETA: 00:39:29) 0g/s 3.500p/s 9.904c/s 9.904C/s adheres
8 0g 0:00:15:12 12.28% 2/3 (ETA: 21:54:45) 0g/s 16.74p/s 49.60c/s 49.60C/s yffij

As you can see, it has only been running for 15 minutes.

2014-09-08

1819: Right now the cluster is producing an average of 13.86 cracks/second per node, with a cluster total of 443.59 cracks/second, so it has only eked out minimal gains in almost 12 hours.

The status report also keeps producing output in differing formats, with the ETA missing completely at times.

16 0g 0:20:11:04 27.15% 2/3 (ETA: 2014-09-11 00:22) 0g/s 3.132p/s 10.13c/s 10.13C/s glassed5
19 0g 0:20:11:07 32.41% 2/3 (ETA: 2014-09-10 12:17) 0g/s 3.136p/s 10.14c/s 10.14C/s glengarries6
28 0g 0:20:11:28 48.20% 2/3 (ETA: 15:53:54) 0g/s 3.131p/s 10.13c/s 10.13C/s Atamotsalboilg
12 0g 0:20:11:30 20.13% 2/3 (ETA: 2014-09-12 02:19) 0g/s 3.129p/s 10.11c/s 10.11C/s glad!
6 0g 0:20:10:58 9.60% 2/3 (ETA: 2014-09-16 16:12) 0g/s 3.131p/s 10.12c/s 10.12C/s Gladite1
10 0g 0:20:10:53 16.64% 2/3 (ETA: 2014-09-12 23:19) 0g/s 3.139p/s 10.15c/s 10.15C/s GROWING
13 0g 0:20:10:52 21.88% 2/3 (ETA: 2014-09-11 18:14) 0g/s 3.132p/s 10.12c/s 10.12C/s gland3
15 0g 0:20:11:33 25.39% 2/3 (ETA: 2014-09-11 05:32) 0g/s 3.120p/s 10.08c/s 10.08C/s gigs9
21 0g 0:20:11:35 35.92% 2/3 (ETA: 2014-09-10 06:13) 0g/s 3.138p/s 10.14c/s 10.14C/s globetrot.
22 0g 0:20:11:25 37.67% 2/3 (ETA: 2014-09-10 03:36) 0g/s 3.133p/s 10.14c/s 10.14C/s glaucous?
27 0g 0:20:10:51 46.45% 2/3 (ETA: 17:28:08) 0g/s 3.137p/s 10.15c/s 10.15C/s sredilG
30 0g 0:20:11:43 51.71% 2/3 (ETA: 13:03:31) 0g/s 3.136p/s 10.15c/s 10.15C/s glottideaN
9 0g 0:20:11:20 14.87% 2/3 (ETA: 2014-09-13 13:48) 0g/s 3.136p/s 10.14c/s 10.14C/s 1glidings
29 0g 0:20:11:26 71.14% 2/3 (ETA: 02:23:31) 0g/s 3.140p/s 10.15c/s 10.15C/s Isotropous8
1 2g 0:20:12:08  3/3 0.000027g/s 18.64p/s 50.50c/s 50.50C/s 190849j
2 0g 0:20:11:35  3/3 0g/s 15.56p/s 50.35c/s 50.35C/s 0separ
24 0g 0:20:11:06 41.05% 2/3 (ETA: 2014-09-09 23:11) 0g/s 3.133p/s 10.13c/s 10.13C/s dsrptrs
32 0g 0:20:11:45 55.22% 2/3 (ETA: 10:34:46) 0g/s 3.128p/s 10.12c/s 10.12C/s 4gladiola
18 0g 0:20:11:01 30.65% 2/3 (ETA: 2014-09-10 15:51) 0g/s 3.130p/s 10.12c/s 10.12C/s glacieret8
7 0g 0:20:10:55 70.13% 2/3 (ETA: 02:47:40) 0g/s 3.040p/s 9.951c/s 9.951C/s Viabilities6
23 0g 0:20:11:41 65.61% 2/3 (ETA: 04:47:09) 0g/s 3.120p/s 10.09c/s 10.09C/s Disruptment7
25 0g 0:20:11:50 81.17% 2/3 (ETA: 22:53:04) 0g/s 3.122p/s 10.09c/s 10.09C/s 9bedground
26 0g 0:20:11:22 70.96% 2/3 (ETA: 02:27:40) 0g/s 3.128p/s 10.11c/s 10.11C/s Fishily8
20 0g 0:20:10:56 34.16% 2/3 (ETA: 2014-09-10 09:05) 0g/s 3.123p/s 10.11c/s 10.11C/s gilia0
14 0g 0:20:11:19 23.64% 2/3 (ETA: 2014-09-11 11:25) 0g/s 3.131p/s 10.12c/s 10.12C/s glaire7
4 0g 0:20:11:42 6.09% 2/3 (ETA: 2014-09-21 17:24) 0g/s 3.127p/s 10.12c/s 10.12C/s glaserites
3 0g 0:20:11:52 4.60% 2/3 (ETA: 2014-09-26 05:20) 0g/s 3.133p/s 10.14c/s 10.14C/s Musicians
17 0g 0:20:11:23 28.90% 2/3 (ETA: 2014-09-10 19:51) 0g/s 3.138p/s 10.14c/s 10.14C/s globated4
31 0g 0:20:11:32 53.46% 2/3 (ETA: 11:46:36) 0g/s 3.130p/s 10.12c/s 10.12C/s 2glams
8 0g 0:20:11:23  3/3 0g/s 15.30p/s 49.45c/s 49.45C/s jujemu
11 0g 0:20:11:05 18.35% 2/3 (ETA: 2014-09-12 11:59) 0g/s 3.040p/s 9.949c/s 9.949C/s frypan2
5 0g 0:20:11:30 7.85% 2/3 (ETA: 2014-09-18 15:16) 0g/s 3.133p/s 10.13c/s 10.13C/s glaurs1

In the example above you can see node01 has located (2) valid passwords.

$ ls -l *.pot
-rw------- 1 mpi gpio 217 Sep  8 08:49 john.pot
$ cat john.pot
$6$mYd9MYsk$0zW0S2HBiKjx2Z4Xzvl7NMQPadxdJUUab97Ali/B3IjVMNPu371XTm/oRO.3MLWshaiuchGihPJjtRwCALnAS0:**password**
$6$.O9ruMEK$ZwKMS7xcGO0Fewww8ay/PEmBp9/eGoTiK5HZrW8UhQlBiGYxfbV59ulJEwUZr3xt2H11qkklC2Jq2uz1nDSV60:**raspberry**

So as you can see it doesn't take very long to find a password if it's in a dictionary.

My complaint about formatting is that the "ETA" fields are missing. For scripting it would be very nice if it simply recorded how many days/hours it took to find it.

The documentation I can locate (which may only refer to an earlier version; this is life with JtR, it seems) says this option is supposed to force the cluster to reload and stop work on items that were already guessed. I am referring to the config file option "ReloadAtCrack = Y", i.e.:

# If set to Y, a session using --fork or MPI will signal to other nodes when
# it has written cracks to the pot file (note that this writing is delayed
# by buffers and the "Save" timer above), so they will re-sync.
ReloadAtCrack = Y

README.mpi also states:

There is also a john.conf option "ReloadAtCrack" that when enabled will make an MPI
node (or --fork process) signal to the others that they should resync.
Unless you have independent jobs running this should be enough.

I'm not sure it did that, but it did print the information in the log file (I redirected stdout/stderr) and seemingly displayed the current status of every node around that time. There are still (2) passwords remaining to be cracked, so I intend to keep going if I can. These last two are "real" passwords and I sincerely doubt they will be cracked. "Show Me!"

0720: Right now the cluster is producing an average of 13.85 cracks/second per node, with a cluster total of 443.16 cracks/second. I really need to write some tool to do this calculation.

2014-09-07

Today I obtained the latest John the Ripper source code and compiled everything. It went very well, and there have been substantial improvements. I also like that they now officially support ARM CPUs (1.8.0.2-bleeding-jumbo_mpi [linux-gnueabihf 32-bit armv7l-autoconf]).

The options have changed a lot and some methods that used to work don't appear to work any more.

A good example of that was how I used to check the status or progress of the cluster running John.

mpirun -n 32 -f /master/mpi_tests/machinefile ./john --status=mpi

That method now produces an error: "Can't use --fork with MPI."

They have changed the status functionality, it seems (I haven't looked at the code to make comparisons yet).

If you simply invoke it as "john --status=mpi" that works fine. Although there is much more information available now, I haven't yet found out how to produce a summary value for the entire cluster.

Here is a snippet from the updated JtR FAQ

Q: What do the various numbers printed on the status line mean?
A: As of version 1.8.0, the status line may include: successful guess count ("g"), session duration (in the D:HH:MM:SS format for days, hours, minutes, and seconds), progress indicator (percent done and optionally pass number out of the total number of passes), up to four speed metrics ("g/s", "p/s", "c/s", and "C/s"), and the current (range of) candidate password(s) being tested (John is often able to test multiple candidate passwords in parallel for better performance, hence a range). The four speed metrics are as follows: g/s is successful guesses per second (so it'll stay at 0 until at least one password is cracked), p/s is candidate passwords tested per second, c/s is "crypts" (password hash or cipher computations) per second, and C/s is combinations of candidate password and target hash per second. Versions of John prior to 1.8.0 displayed only the C/s rate (calling it c/s). When you restore a pre-1.8.0 session with version 1.8.0 or newer, only the g/s and C/s rates will be displayed, because the older .rec file format lacked information needed to compute p/s and c/s.

When I have some time I will write some python tool to create a status report with averages and summaries.
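
A minimal sketch of such a tool (it keys only on the lowercase c/s field, since that appears on every line even when the percentage and ETA are missing; the script name in the usage line is hypothetical):

#!/usr/bin/env python
# Sketch: summarize "john --status=mpi" output with a per-node average
# and cluster total. Only the lowercase c/s field is parsed because the
# percentage and ETA fields are missing from some lines.
import re
import sys

cps = re.compile(r'(\d+(?:\.\d+)?)c/s')  # matches e.g. "9.498c/s", not "KC/s"

rates = []
for line in sys.stdin:
    m = cps.search(line)
    if m:
        rates.append(float(m.group(1)))

if rates:
    total = sum(rates)
    print("%d nodes reporting" % len(rates))
    print("total c/s for the cluster: %.2f" % total)
    print("average c/s per node: %.2f" % (total / len(rates)))

Used as: ./john --status=mpi | python status_summary.py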

There is also a small bug with how the percentage completion and ETA are reported for the first node. It might have something to do with the percent-completed value being less than 1.0 at this point. We will see if it begins reporting properly after a day or so.

Example

1 0g 0:02:28:52  2/3 0g/s 12.48p/s 49.79c/s 49.79C/s
2 0g 0:02:28:43 59.00% 2/3 (ETA: 02:09:40) 0g/s 12.32p/s 49.16c/s 49.16C/s
3 0g 0:02:28:37 3.00% 2/3 (ETA: 2014-09-11 08:31) 0g/s 2.545p/s 10.05c/s 10.05C/s
4 0g 0:02:28:47 5.00% 2/3 (ETA: 2014-09-09 23:33) 0g/s 2.540p/s 10.04c/s 10.04C/s
[snip]

If you notice, node 1 is missing the percentage, which plays hell with any script you want to write that parses defined fields.

$ ./john --status=mpi
1 0g 0:00:21:53  2/3 0g/s 11.94p/s 46.96c/s 46.96C/s
2 0g 0:00:21:51 1.00% 2/3 (ETA: 2014-09-09 10:20) 0g/s 11.96p/s 47.03c/s 47.03C/s
3 0g 0:00:21:38 3.00% 2/3 (ETA: 09:57:14) 0g/s 2.627p/s 9.644c/s 9.644C/s
4 0g 0:00:21:51 5.00% 2/3 (ETA: 05:12:55) 0g/s 2.605p/s 9.601c/s 9.601C/s
5 0g 0:00:21:49 7.00% 2/3 (ETA: 03:07:37) 0g/s 2.566p/s 9.418c/s 9.418C/s
6 0g 0:00:21:48 8.00% 2/3 (ETA: 02:28:28) 0g/s 2.618p/s 9.627c/s 9.627C/s
7 0g 0:00:21:33 10.00% 2/3 (ETA: 01:31:43) 0g/s 2.658p/s 9.731c/s 9.731C/s
8 0g 0:00:21:45 12.00% 2/3 (ETA: 00:57:16) 0g/s 12.03p/s 47.26c/s 47.26C/s
9 0g 0:00:21:49 14.00% 2/3 (ETA: 00:31:47) 0g/s 2.630p/s 9.656c/s 9.656C/s
10 0g 0:00:21:50 15.00% 2/3 (ETA: 00:21:29) 0g/s 2.629p/s 9.619c/s 9.619C/s
11 0g 0:00:21:49 17.00% 2/3 (ETA: 00:04:17) 0g/s 2.611p/s 9.541c/s 9.541C/s
12 0g 0:00:21:53 19.00% 2/3 (ETA: 23:51:03) 0g/s 2.593p/s 9.482c/s 9.482C/s
13 0g 0:00:21:36 21.00% 2/3 (ETA: 23:39:01) 0g/s 2.576p/s 9.489c/s 9.489C/s
14 0g 0:00:21:34 22.00% 2/3 (ETA: 23:34:13) 0g/s 2.633p/s 9.638c/s 9.638C/s
15 0g 0:00:21:45 24.00% 2/3 (ETA: 23:26:38) 0g/s 2.591p/s 9.504c/s 9.504C/s
16 0g 0:00:21:39 26.00% 2/3 (ETA: 23:19:23) 0g/s 2.581p/s 9.498c/s 9.498C/s
17 0g 0:00:21:53 28.00% 2/3 (ETA: 23:14:02) 0g/s 2.638p/s 9.712c/s 9.712C/s
18 0g 0:00:21:31 29.00% 2/3 (ETA: 23:10:26) 0g/s 2.576p/s 9.426c/s 9.426C/s
19 0g 0:00:21:51 31.00% 2/3 (ETA: 23:06:24) 0g/s 2.633p/s 9.693c/s 9.693C/s
20 0g 0:00:21:32 33.00% 2/3 (ETA: 23:01:29) 0g/s 2.603p/s 9.509c/s 9.509C/s
21 0g 0:00:21:51 35.00% 2/3 (ETA: 22:58:20) 0g/s 2.639p/s 9.654c/s 9.654C/s
22 0g 0:00:21:47 36.00% 2/3 (ETA: 22:56:29) 0g/s 2.587p/s 9.501c/s 9.501C/s
23 0g 0:00:21:53 38.00% 2/3 (ETA: 22:53:28) 0g/s 2.614p/s 9.612c/s 9.612C/s
24 0g 0:00:21:40 40.00% 2/3 (ETA: 22:50:16) 0g/s 2.580p/s 9.522c/s 9.522C/s
25 0g 0:00:21:51 56.00% 2/3 (ETA: 22:34:56) 0g/s 2.615p/s 9.646c/s 9.646C/s
26 0g 0:00:21:45 43.00% 2/3 (ETA: 22:46:35) 0g/s 2.603p/s 9.609c/s 9.609C/s
27 0g 0:00:21:34 45.00% 2/3 (ETA: 22:44:07) 0g/s 2.517p/s 9.275c/s 9.275C/s
28 0g 0:00:21:50 47.00% 2/3 (ETA: 22:42:23) 0g/s 2.574p/s 9.535c/s 9.535C/s
29 0g 0:00:21:47 49.00% 2/3 (ETA: 22:40:26) 0g/s 2.615p/s 9.654c/s 9.654C/s
30 0g 0:00:21:48 50.00% 2/3 (ETA: 22:39:34) 0g/s 2.585p/s 9.532c/s 9.532C/s
31 0g 0:00:21:47 52.00% 2/3 (ETA: 22:37:52) 0g/s 2.623p/s 9.706c/s 9.706C/s
32 0g 0:00:21:47 54.00% 2/3 (ETA: 22:36:19) 0g/s 2.592p/s 9.585c/s 9.585C/s
$
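
Here is a rough sketch of the kind of summary tool I have in mind, in python, keyed to the 1.8.0 field layout shown above. The percentage and ETA fields are treated as optional since node 1 omits them, and the script name and field names are my own, not anything official from JtR:

#!/usr/bin/env python
# jstatus.py (hypothetical name): summarize "john --status=mpi" output
import re
import sys

LINE = re.compile(
    r'^\s*(?P<node>\d+)\s+'
    r'(?P<guesses>\d+)g\s+'
    r'(?P<time>\d+:\d+:\d+:\d+)\s+'
    r'(?:(?P<pct>[\d.]+)%\s+)?'          # optional: node 1 lacks this
    r'(?P<passes>\d+/\d+)\s+'
    r'(?:\(ETA: (?P<eta>[^)]+)\)\s+)?'   # optional: key on the parentheses
    r'(?P<gps>[\d.]+)g/s\s+'
    r'(?P<pps>[\d.]+)p/s\s+'
    r'(?P<cps>[\d.]+)c/s\s+'
    r'(?P<Cps>[\d.]+)C/s')

rows = [m.groupdict() for m in (LINE.match(l) for l in sys.stdin) if m]
if not rows:
    sys.exit("no status lines found")

cps = [float(r['cps']) for r in rows]
pct = [float(r['pct']) for r in rows if r['pct'] is not None]
print("nodes reporting: %d" % len(rows))
print("total c/s: %.2f  avg c/s: %.2f" % (sum(cps), sum(cps) / len(cps)))
if pct:
    print("avg %% done: %.2f" % (sum(pct) / len(pct)))

Something like "./john --status=mpi | python jstatus.py" would then give the single cluster-wide summary I am after.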

Detailed Statistics

There also seems to be new and improved status functionality if you send a USR1 signal to the mpirun process: you get a lot of detail, much more than --status produces.

As you may notice, the ETA output below is also missing fields unpredictably.

Some have merely a time, others have a date days into the future with a time as well. The ETA fields can be handled, though, as a parser can key on the parentheses. See the entries for node01, node02, node03 below.

32 0g 0:01:03:45 54.43% 2/3 (ETA: 23:52:24) 0g/s 2.545p/s 9.912c/s 9.912C/s 4bexley
24 0g 0:01:03:38 40.38% 2/3 (ETA: 00:32:58) 0g/s 2.542p/s 9.898c/s 9.898C/s blkn's
2 0g 0:01:03:49 1.96% 2/3 (ETA: 2014-09-10 04:05) 0g/s 12.19p/s 48.49c/s 48.49C/s marilee
31 0g 0:01:03:45 52.67% 2/3 (ETA: 23:56:18) 0g/s 2.565p/s 9.991c/s 9.991C/s 2bianchi
21 0g 0:01:03:49 35.13% 2/3 (ETA: 00:56:53) 0g/s 2.568p/s 9.966c/s 9.966C/s bhotia.
22 0g 0:01:03:45 36.88% 2/3 (ETA: 00:48:07) 0g/s 2.551p/s 9.917c/s 9.917C/s bexhill?
18 0g 0:01:03:29 29.86% 2/3 (ETA: 01:28:07) 0g/s 2.549p/s 9.900c/s 9.900C/s betoya8
12 0g 0:01:03:51 19.34% 2/3 (ETA: 03:25:21) 0g/s 2.542p/s 9.865c/s 9.865C/s betti!
11 0g 0:01:03:47 17.58% 2/3 (ETA: 03:57:59) 0g/s 2.561p/s 9.935c/s 9.935C/s bfrg2
23 0g 0:01:03:51 38.65% 2/3 (ETA: 00:40:23) 0g/s 2.560p/s 9.951c/s 9.951C/s bulgarics
25 0g 0:01:03:49 57.21% 2/3 (ETA: 23:46:45) 0g/s 2.555p/s 9.941c/s 9.941C/s Molifying2
20 0g 0:01:03:30 33.37% 2/3 (ETA: 01:05:48) 0g/s 2.559p/s 9.932c/s 9.932C/s betulites0
8 0g 0:01:03:43 12.49% 2/3 (ETA: 06:25:23) 0g/s 12.31p/s 48.94c/s 48.94C/s oiccasam
10 0g 0:01:03:48 15.83% 2/3 (ETA: 04:38:12) 0g/s 2.564p/s 9.950c/s 9.950C/s BLUEY
1 0g 0:01:03:51 0.11% 2/3 (ETA: 2014-10-16 17:20) 0g/s 12.35p/s 49.15c/s 49.15C/s Evart's
3 0g 0:01:03:36 3.89% 2/3 (ETA: 2014-09-09 01:12) 0g/s 2.566p/s 9.970c/s 9.970C/s Adelaster
27 0g 0:01:03:32 45.65% 2/3 (ETA: 00:14:39) 0g/s 2.527p/s 9.840c/s 9.840C/s remesseB
7 0g 0:01:03:31 10.63% 2/3 (ETA: 07:52:55) 0g/s 2.580p/s 10.01c/s 10.01C/s elburnelburn
5 0g 0:01:03:47 7.06% 2/3 (ETA: 12:59:00) 0g/s 2.545p/s 9.890c/s 9.890C/s bevash1
15 0g 0:01:03:43 24.60% 2/3 (ETA: 02:14:18) 0g/s 2.554p/s 9.925c/s 9.925C/s bexley9
14 0g 0:01:03:32 22.85% 2/3 (ETA: 02:33:34) 0g/s 2.569p/s 9.973c/s 9.973C/s bhabha7
28 0g 0:01:03:48 47.41% 2/3 (ETA: 00:09:48) 0g/s 2.547p/s 9.927c/s 9.927C/s Ztirraib
4 0g 0:01:03:49 5.30% 2/3 (ETA: 17:58:31) 0g/s 2.554p/s 9.938c/s 9.938C/s bhotiyas
29 0g 0:01:03:45 49.23% 2/3 (ETA: 00:04:46) 0g/s 2.560p/s 9.967c/s 9.967C/s elliille
16 0g 0:01:03:37 26.36% 2/3 (ETA: 01:56:47) 0g/s 2.552p/s 9.930c/s 9.930C/s bevon5
9 0g 0:01:03:48 14.08% 2/3 (ETA: 05:28:30) 0g/s 2.564p/s 9.961c/s 9.961C/s 1bhola
17 0g 0:01:03:52 28.11% 2/3 (ETA: 01:42:21) 0g/s 2.572p/s 9.998c/s 9.998C/s bibionidae4
26 0g 0:01:03:43 43.97% 2/3 (ETA: 00:20:14) 0g/s 2.556p/s 9.952c/s 9.952C/s ElaraElara
6 0g 0:01:03:46 8.81% 2/3 (ETA: 09:58:53) 0g/s 2.561p/s 9.956c/s 9.956C/s Bhikku1
30 0g 0:01:03:47 50.92% 2/3 (ETA: 00:00:31) 0g/s 2.551p/s 9.929c/s 9.929C/s biancA
19 0g 0:01:03:50 31.62% 2/3 (ETA: 01:17:05) 0g/s 2.562p/s 9.962c/s 9.962C/s bhudevi6
13 0g 0:01:03:35 21.09% 2/3 (ETA: 02:56:55) 0g/s 2.528p/s 9.835c/s 9.835C/s besselian3

This will also be a challenge to parse, as it is dumped to stderr, so I ran john with stdout and stderr appending to a log file, which will contain more than just statistics.
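
In the meantime, a crude filter can pull the status lines back out of the combined log by keying on the overall shape of the line rather than any one optional field. A sketch (the log file name is just an example):

# pull the per-node status lines out of the mixed stdout/stderr log;
# the detail dump itself is triggered with: kill -USR1 <mpirun pid>
import re

STATUS = re.compile(r'^\s*\d+\s+\d+g\s+\d+:\d+:\d+:\d+\s.*C/s')

with open('john-run.log') as log:   # hypothetical log file name
    for line in log:
        if STATUS.match(line):
            print(line.rstrip())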

I also added a few more test password entries to give the cluster some more work and am benchmarking some test passwords. Overall I don't like all the churn in the options; the release notes for the source I am using even describe one newly added option being superseded by a later option within the same release.

2014-09-06

The cluster has been running parallel John the Ripper since yesterday. The wireless network has been stable for every compute node, and the master node has been flawless since I started using a wired connection again.

+ mpirun -n 32 -f /master/mpi_tests/machinefile ./john --config=/master/tmp/john/john-1.7.9-jumbo-7/run/john.conf --status=mpi
 23: guesses: 0 time: 2:05:45:07 0.00% (3) c/s: 9.51
 12: guesses: 0 time: 2:06:32:34 0.00% (3) c/s: 9.36
 11: guesses: 0 time: 2:06:33:12 0.00% (3) c/s: 9.36
 16: guesses: 0 time: 2:05:45:00 0.00% (3) c/s: 9.50
  6: guesses: 0 time: 2:06:23:08 0.00% (3) c/s: 9.35
  4: guesses: 0 time: 2:06:22:18 0.00% (3) c/s: 9.36
 20: guesses: 0 time: 2:06:29:26 0.00% (3) c/s: 9.36
 30: guesses: 0 time: 2:06:41:36 0.00% (3) c/s: 9.26
 15: guesses: 0 time: 2:05:46:25 0.00% (3) c/s: 9.50
 19: guesses: 0 time: 2:05:44:29 0.00% (3) c/s: 9.49
 24: guesses: 0 time: 2:05:57:22 0.00% (3) c/s: 9.50
 28: guesses: 0 time: 2:05:54:21 0.00% (3) c/s: 9.49
  9: guesses: 0 time: 2:03:18:11 0.00% (3) c/s: 9.84
  7: guesses: 0 time: 2:02:41:55 0.00% (3) c/s: 41.83
 10: guesses: 0 time: 2:03:41:03 0.00% (3) c/s: 9.85
  5: guesses: 0 time: 2:02:39:41 0.00% (3) c/s: 10.05
  3: guesses: 0 time: 2:06:38:43 0.00% (3) c/s: 9.29
 14: guesses: 0 time: 2:06:34:16 0.00% (3) c/s: 9.35
  8: guesses: 0 time: 2:06:21:52 0.00% (3) c/s: 9.34
  1: guesses: 0 time: 2:02:24:29 0.00% (3) c/s: 50.83
 29: guesses: 0 time: 2:06:14:12 0.00% (3) c/s: 9.30
  2: guesses: 0 time: 2:06:23:23 0.00% (3) c/s: 9.37
 26: guesses: 0 time: 2:06:30:21 0.00% (3) c/s: 9.37
 27: guesses: 0 time: 2:06:43:10 0.00% (3) c/s: 9.33
 21: guesses: 0 time: 2:05:45:11 0.00% (3) c/s: 9.50
  0: guesses: 0 time: 2:02:23:58 0.00% (3) c/s: 50.73
 17: guesses: 0 time: 2:05:45:16 0.00% (3) c/s: 9.50
 18: guesses: 0 time: 2:06:33:30 0.00% (3) c/s: 9.29
 13: guesses: 0 time: 2:06:33:12 0.00% (3) c/s: 9.35
 22: guesses: 0 time: 2:06:32:11 0.00% (3) c/s: 9.34
 25: guesses: 0 time: 2:05:44:18 0.00% (3) c/s: 9.51
 31: guesses: 0 time: 2:05:56:04 0.00% (3) c/s: 9.46
SUM: guesses: 0 time: 2:06:43:10 0.00% (3) c/s: 410 avg12.83

2014-09-05

Changing the authentication timeout stopped the deauthentication errors but I am still losing the wlan0 link. I am very frustrated about that now so I have placed the master node back onto a hardwired network connection for now.

This works very well and has solved all the issues I was having with the cluster nodes reliably accessing NFS. Cluster message traffic is very fast again, with wall clock times dropping to 400ms on some measurements that were taking 3-4 seconds when the master node was also on wifi.

NOTE: the individual compute nodes' network access is still solid and wireless is working flawlessly for them, which leads me to believe this is still some kind of congestion issue (on the master node only.)

I will keep investigating. I need to understand what the limiting factor is, and I have a feeling the Airlink-101 is failing and 'resetting' under large numbers of connections. The driver appears to reassociate but you have to restart networking to get the link back up.

This was why I originally preferred to use a wired network connection, purely for stability and performance since NFS is involved as well. Everything is working perfectly on a wired network with very nice NFS performance at each node.

I may have located some 24-port gig-e switches that run on 5VDC but I really don't want all those wires again. I'd have to have two of them in order to attach all 33 systems.

Moving the operating system into nand memory

I decided to move the operating system into the internal nand using the procedure documented here.

NOTE: Make sure you do not have any other local filesystems mounted (e.g. a usb or sata disk.)

This is because the procedure uses rsync and will copy the contents of any other mounted filesystems indiscriminately onto the internal nand.

If the contents of the mounted filesystems are larger than the nand (4GB) then you will run out of space and the procedure will fail.
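
In hindsight, a quick pre-flight check is worth running first; as I note below, a forgotten mount is exactly the mistake I made. A minimal sketch that scans /proc/mounts for anything besides the root filesystem (the set of filesystem types to ignore is my assumption):

# list any extra local mounts whose contents rsync would otherwise
# copy onto the 4GB nand during the procedure
IGNORE = set(['proc', 'sysfs', 'devpts', 'tmpfs', 'devtmpfs', 'nfs', 'nfs4'])

with open('/proc/mounts') as mounts:
    for entry in mounts:
        device, mountpoint, fstype = entry.split()[:3]
        if fstype not in IGNORE and mountpoint != '/':
            print("still mounted: %s on %s (%s)" % (device, mountpoint, fstype))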

If you get any error during the procedure whatsoever, let it complete and then restart the procedure. Whatever you do, just don't reboot.

If you are not prompted to shut down at the end of the procedure then something went wrong and you should investigate before you reboot. Additionally, the procedure already handles resizing the filesystem, so the separate resize step mentioned in the link above seems to be unnecessary.

This procedure would best be performed on a clean install of the distribution image, and most likely in single-user mode.

Other than running out of nand space, everything went smoothly. I DID have another filesystem mounted, and rsync did run out of space during the procedure.

I re-started the procedure, everything went well from that point forward, and I am now running on the operating system installed on internal nand.

My swap is on a sata disk, and all disk writing (other than minimal logging) is done to /dev/sda. All data from the compute nodes is accessed via NFS to that disk as well, both read and write, so that everything (including code) is accessible from the master node.

The ongoing wifi issues (master node)

I have been fighting wifi performance issues with the master node and the Netgear WNDR3700v2 and the stock Netgear software.

I miss having a hard-wired cluster network and the performance on wireless will never compare.

I ended up loading DD-WRT (v24-sp2 (04/18/14) std - build 23919) onto the WNDR3700v2 and that helped a bit, allowing me to tune the access point/router for a large number of clients and work around some congestion issues. These are the conntrack settings I applied:

echo "16384" > /proc/sys/net/ipv4/netfilter/ip_conntrack_max
echo "16384" > /sys/module/nf_conntrack/parameters/hashsize

prefix=/proc/sys/net/ipv4/netfilter/ip_conntrack

echo 600 > ${prefix}_generic_timeout
echo 30 > ${prefix}_udp_timeout
echo 60 > ${prefix}_udp_timeout_stream
echo 1800 > ${prefix}_tcp_timeout_established
echo 120 > ${prefix}_tcp_timeout_syn_sent
echo 60 > ${prefix}_tcp_timeout_syn_recv
echo 120 > ${prefix}_tcp_timeout_fin_wait
echo 120 > ${prefix}_tcp_timeout_time_wait
echo 10 > ${prefix}_tcp_timeout_close
echo 60 > ${prefix}_tcp_timeout_close_wait
echo 30 > ${prefix}_tcp_timeout_last_ack

I was running into a new issue with my original master image for the cluster: it would disconnect (deauthenticate) under heavy network/high packet traffic. I tried several new distributions to see if the issue was something specific to my original A10 Generic Debian distribution's driver for the RealTek, but I realized that everyone is using the same driver.

So, after proving that every Debian Wheezy variant I had was experiencing this symptom, I decided to rebuild the master node using Cubian. Note: the CubieEZ distribution kernel does not support NFS. How disappointing.

Now that there is an "official" image for the Cubie board I might as well get with the program.

Note: the 32 client nodes are still running the CubieEZ image because I like the fact that it uses the "green led" for disk activity. The Cubian image doesn't seem to do that, though I must admit that I have not yet investigated enabling it.

The issue that spurred the total rebuild of the master node was that the wifi kept deauthenticating itself. This would also happen faster when the cluster was busy (600-700 connections between the master and all the nodes - NFS, MPI, cfengine, etc.)

Example:

Sep  5 06:17:31 master kernel: [ 1276.495514] wlan0: authenticate with 74:44:01:7c:ac:1b
Sep  5 06:17:31 master kernel: [ 1276.513668] wlan0: send auth to 74:44:01:7c:ac:1b (try 1/3)
Sep  5 06:17:31 master kernel: [ 1276.518527] wlan0: authenticated
Sep  5 06:17:31 master kernel: [ 1276.544277] wlan0: associate with 74:44:01:7c:ac:1b (try 1/3)
Sep  5 06:17:31 master kernel: [ 1276.554509] wlan0: RX AssocResp from 74:44:01:7c:ac:1b (capab=0x411 status=0 aid=36)
Sep  5 06:17:31 master kernel: [ 1276.574867] wlan0: associated
Sep  5 06:17:41 master kernel: [ 1286.622518] wlan0: no IPv6 routers present
Sep  5 06:30:06 master kernel: [ 2031.016629] wlan0: deauthenticating from 74:44:01:7c:ac:1b by local choice (reason=3)
Sep  5 06:30:08 master kernel: [ 2033.577322] wlan0: authenticate with 74:44:01:7c:ac:1b
Sep  5 06:30:08 master kernel: [ 2033.596299] wlan0: send auth to 74:44:01:7c:ac:1b (try 1/3)
Sep  5 06:30:08 master kernel: [ 2033.600656] wlan0: authenticated
Sep  5 06:30:08 master kernel: [ 2033.626096] wlan0: associate with 74:44:01:7c:ac:1b (try 1/3)
Sep  5 06:30:08 master kernel: [ 2033.637500] wlan0: RX AssocResp from 74:44:01:7c:ac:1b (capab=0x411 status=0 aid=36)
Sep  5 06:30:08 master kernel: [ 2033.658140] wlan0: associated
Sep  5 06:30:19 master kernel: [ 2044.374359] wlan0: no IPv6 routers present
Sep  5 06:59:05 master kernel: [ 3770.150715] wlan0: deauthenticated from 74:44:01:7c:ac:1b (Reason: 3)
Sep  5 07:04:36 master kernel: [ 4101.390627] wlan0: authenticate with 74:44:01:7c:ac:1b
Sep  5 07:04:36 master kernel: [ 4101.408662] wlan0: send auth to 74:44:01:7c:ac:1b (try 1/3)
Sep  5 07:04:36 master kernel: [ 4101.414271] wlan0: authenticated
Sep  5 07:04:36 master kernel: [ 4101.439115] wlan0: associate with 74:44:01:7c:ac:1b (try 1/3)
Sep  5 07:04:36 master kernel: [ 4101.449880] wlan0: RX AssocResp from 74:44:01:7c:ac:1b (capab=0x411 status=0 aid=36)
Sep  5 07:04:36 master kernel: [ 4101.470528] wlan0: associated

I kept seeing references to disabling ipv6 but nothing helped. In fact this was a dead-end and may no longer even be an issue on newer kernels.

I finally seem to have found the issue, and if you ask me it's the router, in the sense that the host doesn't seem to be able to re-authenticate under certain conditions. I have noticed this same behaviour with Debian Wheezy not only on the Cubieboard but also on the Raspberry Pi. When I reboot the WNDR3700v2 I often have to reboot the RPI that I use as a terminal server.

Anyway, to solve my "deauthentication issue" I had to change the re-authentication interval there (99999 seconds.)

It seems that the Netgear (when using DD-WRT) can be set to re-authenticate the key at an N-second interval, and that has at least delayed the problem. I ran into this with the original master image (I forget where I got the base system from), the CubieEZ image (which I liked for the way the led is used for disk activity), and lastly Cubian.

For reference, and as noted earlier, I am using the Airlink-101 brand of wifi interface (RealTek), i.e.

(lsusb output)
Bus 001 Device 002: ID 0bda:8176 Realtek Semiconductor Corp. RTL8188CUS 802.11n WLAN Adapter

I also ran into permissions issues with the python package numpy when installed with cfengine. I have had to manually fix those permissions on 2 systems so far, and I assume all of them are broken, so I will devise some cfengine script to fix that later; the manual fix is sketched below.
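
Until that cfengine script exists, the manual fix amounts to something like this sketch (the dist-packages path is my guess for these Debian images; run as root):

import os
import stat

TOP = '/usr/lib/python2.7/dist-packages/numpy'   # assumed install path

def open_up(path, extra):
    # add the given permission bits on top of whatever is already set
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | extra)

for root, dirs, files in os.walk(TOP):
    for d in dirs:
        # directories also need the execute bit to be traversable
        open_up(os.path.join(root, d),
                stat.S_IRGRP | stat.S_IXGRP | stat.S_IROTH | stat.S_IXOTH)
    for f in files:
        open_up(os.path.join(root, f), stat.S_IRGRP | stat.S_IROTH)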

I am only using the numpy package in one MPI benchmark, "prime.py", which has a separate problem and needs a slight tweak, as it now fails with a python error.

$ python prime.py
Fri Sep  5 12:30:46 2014

PRIME_MPI
  Python/MPI version
  Count the primes between  1 and 131072

  Use MPI to divide the computation among
  multiple processes.
Traceback (most recent call last):
  File "prime.py", line 85, in <module>
    prime_mpi ( )
  File "prime.py", line 58, in prime_mpi
    for i in range ( 2 + id, n + 1, id ):
ValueError: range() step argument must not be zero
$
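
The traceback gives the tweak away: line 58 uses the MPI rank (id) as the range() step, and on rank 0 that step is zero. The intended cyclic split should step by the process count instead. A minimal self-contained sketch of the fix, assuming mpi4py as in the usual versions of this demo:

# prime_fixed.py (hypothetical): the cyclic split prime.py attempts,
# with the step corrected from the rank (id) to the process count (p).
# Run with e.g.: mpirun -n 4 python prime_fixed.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
id = comm.Get_rank()     # 0 .. p-1, so using it as a step breaks rank 0
p = comm.Get_size()
n = 131072

def is_prime(k):
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return k >= 2

# was: for i in range(2 + id, n + 1, id) -- ValueError on rank 0
local = sum(1 for i in range(2 + id, n + 1, p) if is_prime(i))
total = comm.reduce(local, op=MPI.SUM, root=0)
if id == 0:
    print("primes between 1 and %d: %d" % (n, total))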

I will look into fixing that soon, but until I can stabilize the wireless network and solve all the performance issues I have with MPI communications, I don't really have the time.

I have been running John the Ripper (again) to stress test everything. There is a huge packet load on the cluster during initial start-up and when restoring sessions, as my wordlist files are large and distributing them slams all the nodes during the initial loads.

Using DD-WRT has given me some nice tools to watch the network now, and the cluster is pretty much keeping 650-700 connections open at all times, with spikes over 1000.

Parallel John the Ripper (stress testing)

Initialization

john-initializing.jpg

This is a graph from DD-WRT on that network during the initialization, when John is distributing pieces of the dictionary files. As you can see, connections spike very briefly in the beginning as the wordlist data is copied to each node.

Running one instance of John on every node brings the load on each up to about 1.2; each node's CPU is dedicated to John, so it runs at about 90% of the available CPU.

I have interrupted it a lot but you can see what is going on here:

+ mpirun -n 32 -f /master/mpi_tests/machinefile ./john --config=/master/tmp/john/john-1.7.9-jumbo-7/run/john.conf --status=mpi
  19: guesses: 0 time: 1:09:04:29 0.00% (3) c/s: 9.09
 28: guesses: 0 time: 1:09:14:21 0.00% (3) c/s: 9.07
  3: guesses: 0 time: 1:09:58:42 0.00% (3) c/s: 8.76
  0: guesses: 0 time: 1:05:43:58 0.00% (3) c/s: 50.70
 11: guesses: 0 time: 1:09:53:12 0.00% (3) c/s: 8.88
 26: guesses: 0 time: 1:09:50:21 0.00% (3) c/s: 8.89
 22: guesses: 0 time: 1:09:52:10 0.00% (3) c/s: 8.87
 14: guesses: 0 time: 1:09:54:16 0.00% (3) c/s: 8.87
 21: guesses: 0 time: 1:09:05:11 0.00% (3) c/s: 9.09
 15: guesses: 0 time: 1:09:06:25 0.00% (3) c/s: 9.08
 24: guesses: 0 time: 1:09:17:22 0.00% (3) c/s: 9.10
 23: guesses: 0 time: 1:09:05:07 0.00% (3) c/s: 9.10
  5: guesses: 0 time: 1:05:59:40 0.00% (3) c/s: 9.96
 10: guesses: 0 time: 1:07:01:03 0.00% (3) c/s: 9.64
  7: guesses: 0 time: 1:06:01:55 0.00% (3) c/s: 35.89
  6: guesses: 0 time: 1:09:43:07 0.00% (3) c/s: 8.86
 12: guesses: 0 time: 1:09:52:33 0.00% (3) c/s: 8.87
  8: guesses: 0 time: 1:09:41:52 0.00% (3) c/s: 8.85
  1: guesses: 0 time: 1:05:44:29 0.00% (3) c/s: 50.80
 17: guesses: 0 time: 1:09:05:16 0.00% (3) c/s: 9.09
 13: guesses: 0 time: 1:09:53:12 0.00% (3) c/s: 8.86
  4: guesses: 0 time: 1:09:42:17 0.00% (3) c/s: 8.86
  2: guesses: 0 time: 1:09:43:23 0.00% (3) c/s: 8.87
  9: guesses: 0 time: 1:06:38:11 0.00% (3) c/s: 9.63
 16: guesses: 0 time: 1:09:04:59 0.00% (3) c/s: 9.08
 18: guesses: 0 time: 1:09:53:30 0.00% (3) c/s: 8.85
 20: guesses: 0 time: 1:09:49:26 0.00% (3) c/s: 8.88
 27: guesses: 0 time: 1:10:03:10 0.00% (3) c/s: 8.83
 25: guesses: 0 time: 1:09:04:17 0.00% (3) c/s: 9.10
 29: guesses: 0 time: 1:09:34:12 0.00% (3) c/s: 8.78
 31: guesses: 0 time: 1:09:16:04 0.00% (3) c/s: 9.03
 30: guesses: 0 time: 1:10:01:36 0.00% (3) c/s: 8.87
SUM: guesses: 0 time: 1:10:03:10 0.00% (3) c/s: 388 avg12.14

The numbers above are skewed, as the cluster spent much of the last 24 hours hung with network issues (NFS mostly) while I was troubleshooting. I also kept having to reboot each node to clear hung John processes that were impacted by the master server being disconnected from the access point.

I know the master was being disconnected when connections went above 400. The troubleshooting took hours, as I kept trying new distributions to see if there was some kernel network issue (or driver issue.)

Last night I was well above 450 c/s and I just left everything hung before rebooting today.

Power System

Basically, I have the 12V 50AH battery connected to the power supply/charger and have been leaving the cluster running for several days. The 35A "power converter/charger" I obtained is perfect for keeping the battery charged and well conditioned while handling the current demand from the cluster (about 11.5A on the DC side.)
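
For the record, that works out to roughly 12V x 11.5A = 138W for the whole cluster on the DC side, and the 50AH battery alone should carry that load for something like 50AH / 11.5A = 4.3 hours ignoring any derating, so the 35A converter has plenty of headroom for charging on top of the running load.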

If I can ever get the 12"x12"x10" box made, everything will fit nicely. I will need to make sure the intake and exhaust (muffin fan) are vented through the box. The fan only runs periodically under full operating load, but I assume once everything is in the box it will run much longer.

2014-09-05-002-small.jpg