Cubical Monolith 2013 07

2013-07-30

It appears the MPI linking and compiling tools on my systems are not quite 100% working, as I can't build any MPI binary that uses hardware floating point (the FPU or NEON/VFPv3). I can build non-MPI binaries with either the hardware FPU or NEON/VFPv3 and they both perform well, so that part is a non-issue, and there are a few things left for me to try on the MPI side.

First: convert everything to OpenMPI, and use only OpenMPI to repeat all my attempts at building MPI-enabled binaries using the FPU or NEON. Obviously I'd prefer NEON.

I ran Linpack with the FPU and with NEON, and NEON is faster, of course. I'm very pleased with the results on all the benchmarks, so I retract all my uninformed statements about the distribution(s) I have tested with regard to this "floating point" issue.

My only issue is with mpicc being broken in some stupid manner. I can try to build it from source, or switch to OpenMPI first and see if that package is also broken. If it too won't let me create binaries that use the hardware FPU or NEON, then I guess I will have to build it all from scratch from source to see if I can get mpicc to work with all the Cubieboard hardware (the FPU is slightly inferior to NEON since, from what I've read, the Cortex-A8 FPU isn't pipelined anyway).

Here are some notes from my attempts to build John the Ripper to use hardware floating point. I was trying to hard-code the right flags to gcc, since I know it can produce very fast floating point binaries: I've run all the benchmarks, and my 'master' unit is running faster than the published benchmark, I am happy to say.

None of the methods below worked at this point; I tried various permutations of them while troubleshooting.

#JOHN_CFLAGS = -mfloat-abi=hard -marm -mthumb-interwork -mcpu=cortex-a8 \
-mtune=cortex-a8 -march=armv7-a -funsafe-math-optimizations -fomit-frame-pointer \
-ffast-math -funroll-loops -funsafe-loop-optimizations

#JOHN_CFLAGS = -ftree-vectorize -mfpu=neon -mfloat-abi=softfp -marm -mthumb-interwork \
-mcpu=cortex-a8 -mtune=cortex-a8 -march=armv7-a -funsafe-math-optimizations \
-fomit-frame-pointer -ffast-math -funroll-loops -funsafe-loop-optimizations

#CC = mpicc -DHAVE_MPI -DJOHN_MPI_BARRIER -DJOHN_MPI_ABORT -mfloat-abi=hard -marm \
 -mthumb-interwork -mcpu=cortex-a8 -mtune=cortex-a8 -march=armv7-a \
-funsafe-math-optimizations -fomit-frame-pointer -ffast-math -funroll-loops \
-funsafe-loop-optimizations

#CC = mpicc -DHAVE_MPI -DJOHN_MPI_BARRIER -DJOHN_MPI_ABORT -ftree-vectorize \
-mfpu=neon -mfloat-abi=softfp -marm -mthumb-interwork -mcpu=cortex-a8 \
-mtune=cortex-a8 -march=armv7-a -funsafe-math-optimizations -fomit-frame-pointer \
-ffast-math -funroll-loops -funsafe-loop-optimizations

#CFLAGS = -c -Wall -O2 -fomit-frame-pointer -Wdeclaration-after-statement \
-I/usr/local/include $(HAVE_NSS) $(OMPFLAGS) $(JOHN_CFLAGS) $(AMDAPP)

#CFLAGS = -c -Wall -O2 -mcpu=cortex-a8 -march=armv7-a -mfloat-abi=hard \
-marm -mthumb-interwork -funsafe-math-optimizations -fno-fast-math \
-fomit-frame-pointer -Wdeclaration-after-statement -I/usr/local/include $(HAVE_NSS) \
$(OMPFLAGS) $(JOHN_CFLAGS) $(AMDAPP)

#CFLAGS = -c -Wall -O2 -mcpu=cortex-a8 -march=armv7-a -mfpu=neon -mfloat-abi=softfp \
-funsafe-math-optimizations -fno-fast-math -fomit-frame-pointer \
-Wdeclaration-after-statement -I/usr/local/include $(HAVE_NSS) $(OMPFLAGS) \
$(JOHN_CFLAGS) $(AMDAPP)

#CFLAGS = -c -Wall -O3 -funsafe-math-optimizations -fomit-frame-pointer -ffast-math \
-funroll-loops -funsafe-loop-optimizations -fomit-frame-pointer \
-Wdeclaration-after-statement -I/usr/local/include $(HAVE_NSS) $(OMPFLAGS) \
$(JOHN_CFLAGS) $(AMDAPP)

#ASFLAGS = -c $(JOHN_CFLAGS) --float_support=VFPv3 --neon $(OMPFLAGS)
#ASFLAGS = -c $(JOHN_CFLAGS) --neon $(OMPFLAGS)
#ASFLAGS = -c $(JOHN_CFLAGS) $(OMPFLAGS)

#LDFLAGS = -s -mfloat-abi=hard -marm -mthumb-interwork -mcpu=cortex-a8 \
-mtune=cortex-a8 -march=armv7-a -funsafe-math-optimizations -fomit-frame-pointer \
-ffast-math -funroll-loops -funsafe-loop-optimizations -L/usr/local/lib -L/usr/local/ssl/lib \
 -lssl -lcrypto -lm -lz $(JOHN_CFLAGS) $(OMPFLAGS) $(NSS_LDFLAGS)

#LDFLAGS = -s -ftree-vectorize -mfpu=neon -mfloat-abi=softfp -marm -mthumb-interwork \
-mcpu=cortex-a8 -mtune=cortex-a8 -march=armv7-a -funsafe-math-optimizations \
-fomit-frame-pointer -ffast-math -funroll-loops -funsafe-loop-optimizations \
-L/usr/local/lib -L/usr/local/ssl/lib -lssl -lcrypto -lm -lz $(JOHN_CFLAGS) $(OMPFLAGS) \
$(NSS_LDFLAGS)

I need to work on some hardware for the Monolith; I finally got some parts I've been waiting on for almost two weeks.

I also need to order 8 more units shortly to fill out the second tier of CPUs. I was only able to fit 11 across without rebuilding the entire enclosure, so I worked out a method to stack 33 + 1 CPUs without changing the exoskeleton.

The further I dig into this issue with mpicc and the floating point hardware, the more I realize that I must be the only one proceeding in this direction, at least within the English-speaking sphere I am trapped in.

I guess no one is crazy enough to devote this much time and energy to turning a low-performance, low-cost ARM design into some kind of parallel computing cluster.

I am, because I think it makes a lot of sense. I believe I can build a useful tool once I can at least get MY applications working with the floating point hardware, even if I have to rebuild the entire MPI package. I suspect the underlying problem could be worse than I imagine if I can't build MPI to pass the proper FPU/NEON switches to gcc. I know mpicc is supposedly just a wrapper (I will look into that shortly), but I have a premonition that there may not be a way to build it properly unless I have a cross-compilation environment on an INTEL box.
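
Since it is just a wrapper, mpicc should be able to tell me exactly what it passes to the underlying gcc before I start tearing anything apart. As far as I can tell both implementations have an option for this (hello.c below is just a throwaway placeholder source file):

# MPICH2: print the compile/link command that would run, without running it
mpicc -show -O2 -mfpu=neon -mfloat-abi=softfp hello.c -o hello

# OpenMPI: the equivalent option
mpicc --showme -O2 -mfpu=neon -mfloat-abi=softfp hello.c -o hello

If the NEON/float-abi switches show up unmolested in that output, the wrapper isn't the one eating them and the problem is further down the chain.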

Doom and gloom, but at least I am learning more about recent gcc and the toolchain, as I have not had to delve this deeply in many years.

TODAY'S SUMMARY: It is very stupid to have a so-called supercomputer that can't do floating point math using the embedded, built-in hardware floating point units (VFP and NEON). Right now the RPi outperforms the Cubie running "John the Ripper" (an example MPI application that uses math) at 17 c/s vs the Cubieboard's 10.36 c/s (cracks per second). IT HAS BEEN VERY FRUSTRATING to locate the root issue, but I am making progress as I work my way backwards troubleshooting things. I might have to hack on mpicc just to get it working, which is hard to believe. Before I do that I will remove everything and switch to OpenMPI (I've been running vanilla MPICH2, not OpenMPI) …

EARLIER:

See http://stackoverflow.com/questions/17364690/making-use-of-mfloat-abi-hard-and-mfpu-vfp-neon-codesourcery-lite-2013-05-24

I've heard the Ubuntu or Linaro toolchains might handle this properly.

Whenever I try to use NEON support with softfp (which is the recommended way to access the FPU), it fails when I'm using mpicc, so I can't build working MPI-enabled code with hardware float support.

I might be able to build working binaries again if I cross-compile on an Intel server and build the MPI-enabled binaries there rather than on a Cubieboard itself. I am going to try Linaro next.
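
Before I commit to a cross-compilation setup it is worth checking what the native compiler (and whatever Linaro ships) actually defaults to. A quick sanity check, assuming plain gcc:

# show how the compiler was configured (look for --with-float= and --with-fpu=)
gcc -v 2>&1 | grep -i 'configured with'

# __ARM_PCS_VFP is only predefined when the hard-float ABI is in effect
echo | gcc -dM -E - | grep __ARM_PCS_VFP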

When basic things like hardware floating point support don't work, how do you expect end users to use these systems? Don't tell me only engineers use them. Show me working floating point support in the supplied linker and compilers before you tell me that.

2013-07-29

I know for certain that everything was built properly; now I have to see whether performance is normal or still hosed.

CFLAGS = -c -Wall -O2 -mcpu=cortex-a8 -march=armv7-a -mfloat-abi=hard \
-funsafe-math-optimizations -fno-fast-math -fomit-frame-pointer -Wdeclaration-after-statement \
-I/usr/local/include $(HAVE_NSS) $(OMPFLAGS) $(JOHN_CFLAGS) $(AMDAPP)

mpi@node01:/master/tmp/john-hf/john-1.7.9-jumbo-7/run$ ldd john
        libssl.so.1.0.0 => /usr/lib/arm-linux-gnueabihf/libssl.so.1.0.0 (0x40238000)
        libcrypto.so.1.0.0 => /usr/lib/arm-linux-gnueabihf/libcrypto.so.1.0.0 (0x40278000)
        libm.so.6 => /lib/arm-linux-gnueabihf/libm.so.6 (0x40117000)
        libz.so.1 => /lib/arm-linux-gnueabihf/libz.so.1 (0x40044000)
        libmpich.so.3 => /usr/lib/libmpich.so.3 (0x4039e000)
        libopa.so.1 => /usr/lib/libopa.so.1 (0x40039000)
        libmpl.so.1 => /usr/lib/libmpl.so.1 (0x4005d000)
        librt.so.1 => /lib/arm-linux-gnueabihf/librt.so.1 (0x400be000)
        libpthread.so.0 => /lib/arm-linux-gnueabihf/libpthread.so.0 (0x40069000)
        libgcc_s.so.1 => /lib/arm-linux-gnueabihf/libgcc_s.so.1 (0x40182000)
        libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0x40505000)
        libdl.so.2 => /lib/arm-linux-gnueabihf/libdl.so.2 (0x40084000)
        /lib/ld-linux-armhf.so.3 (0x400f8000)
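
The ldd output shows the binary picking up the armhf system libraries, but to confirm that the john binary itself was built for the hard-float ABI (and not just linked against armhf libs) I can also check the ARM attributes directly with readelf from binutils:

# "Tag_ABI_VFP_args: VFP registers" means the binary uses the hard-float calling convention
readelf -A ./john | grep Tag_ABI_VFP_args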

I have benchmarked the Raspberry Pi vs the Cubie using fbench, and if you compile the Cubie version with the right flags to direct gcc to use the hardware floating point support, the two perform comparably.

The Floating Point Benchmark Results: Cubieboard 5.448s vs RPI 5.589s

As you can see the Cubieboard was faster, which you would expect given the CPU speed difference. I suppose I could temporarily overclock the RPi to get them both running at 1 GHz. Maybe next time.

I have to work on the Makefiles for John the Ripper to get it to use the hardware floating point support; I'm going to assume that's where I can solve this. I thought I had already done this, but I must have missed something, since I now KNOW the Cubie can perform floating point math at a comparable level.

This is also why the Cubie needs its own distro, so people don't have to struggle to get the compilers to use the correct configuration by default. I also have to examine mpicc shortly to see if the problem lies there.

Raspberry Pi @ 900MHz

pi@master:/master/tmp$ time ./fbench
Ready to begin John Walker's floating point accuracy
and performance benchmark.  80000 iterations will be made.

Measured run time in seconds should be divided by 80
to normalise for reporting results.  For archival results,
adjust iteration count so the benchmark runs about five minutes.

No errors in results.

real    0m5.589s
user    0m5.530s
sys     0m0.020s
pi@master:/master/tmp$

2013-07-28

I have some new images to upload shortly showing the redesign, but not tonight.

I have been seeing really serious floating point performance issues with parallel John the Ripper compared to the Raspberry Pi, so I have been investigating this.

mpi@master:/master/tmp$ gcc -O fbench.c -Wall -O2 -mcpu=cortex-a8 -march=armv7-a -mfpu=neon -mfloat-abi=hard -funsafe-math-optimizations -fno-fast-math -fomit-frame-pointer -Wdeclaration-after-statement -o fbench

mpi@master:/master/tmp$ time ./fbench
Ready to begin John Walker's floating point accuracy
and performance benchmark.  80000 iterations will be made.

Measured run time in seconds should be divided by 80
to normalise for reporting results.  For archival results,
adjust iteration count so the benchmark runs about five minutes.

No errors in results.

real    0m5.448s
user    0m5.440s
sys     0m0.000s
mpi@master:/master/tmp$

Note the compiler flags. If you compile fbench.c without the above, you will see a serious performance hit.

I've recompiled John the Ripper several times with those flags but it isn't helping.

Here is what fbench looks like with a basic generic compile without any flags.

cubie@node14:/master/tmp$ time ./fbench-01
Ready to begin John Walker's floating point accuracy
and performance benchmark.  80000 iterations will be made.

Measured run time in seconds should be divided by 80
to normalise for reporting results.  For archival results,
adjust iteration count so the benchmark runs about five minutes.

No errors in results.

real    0m9.879s
user    0m9.620s
sys     0m0.010s
cubie@node14:/master/tmp$

The NEON support seems to work. This might be an issue with mpicc not really processing those specific arguments but I will find out more soon.
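
One simple way to test that theory is to push the exact same flags through mpicc on the same fbench.c and compare timings (assuming the compile even goes through); if the wrapper is silently dropping the flags, the run time should fall back to the ~9.9s of the generic build. Something along these lines:

# same flags as the working gcc build above, just via the MPI wrapper
mpicc -O fbench.c -Wall -O2 -mcpu=cortex-a8 -march=armv7-a -mfpu=neon -mfloat-abi=hard -funsafe-math-optimizations -fno-fast-math -fomit-frame-pointer -Wdeclaration-after-statement -o fbench-mpi
time ./fbench-mpi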

2013-07-20

node10 has not died since yesterday when I rewired all of the power harness for that group of (4) systems.

I have to widen the framework of the Monolith in order to accommodate 12 systems across. I will have to widen the interior to at least 10.5"

What might be clear from the drawing below is that the boards will be hanging on threaded rods suspended between the framework on each side, with (3) rows of (12) systems for a total capacity of 36 processing nodes. The rods are 4-40, and I may need to hang them through supports above them if (2) threaded rods can't hold up the weight of all 12 systems per "hanging rack" … This is a terrible drawing, but it will hopefully help explain the larger illustration below.

suspended-rack-monolith.jpg

Instead of using 4-40 rods for the bottom set of holes I might use 3" machine screws and still secure the boards in groups of (4) systems to keep each group "rigid" … I'm not quite sure what I will do yet. The 4-40 rod is on order. Having three racks of 12 systems each all in one location (on one side of the monolith) won't be as cool as having them running in "cubes" of 4, but I'll be able to fit about 36 systems (I plan on only 32+1), which means I will probably have to install a third 16-port ethernet switch just to get two extra ports.

I wanted to keep mounting them in clusters of (4) as I had been, but there is no physical room for that. The new layout will also let me mount the fans just above the racks of systems, since the other components won't have heat issues.

2013-07-19

Updated 2334MT: I have decided to redesign how the cards are installed since I am not going to have enough space the way it's currently designed.

cubical-monolith-redesign.jpg

Updated 2122MT: I have cut off the USB-A connector ends on the latest (4) units and wired them into the 5VDC distribution board. I also cut off the USB-A connector end of the rebuilt cable for node07 and wired it back into the harness.

node10 keeps dying, but it's not power-related, or at least the power LED is still illuminated when it's down. I need to attach a console to it so I can capture any messages when it dies. If it is a silent death I'll have to show it to George for a solution. The power supply handles 5VDC @ 24A and no other node is failing. This is one of the last (4) units I received; I'll know more about it soon. Power cycling it always brings it back, however.

2013-07-17

Updated 2304MT: I am being plagued by intermittent power cables. I need to order a lot more power connectors, as the cables are being damaged by the packaging in China (I assume it's packaged in China), and I have seen some people complaining about their boards shutting down. The issue is the power cable, of course. A while ago I had node11 shut down and then node10, in that order, after touching their power cables.

I have been musing about everything and the only way I am ever going to be able to do the things I need to do is to build a cross-compilation system so I can make changes to the kernel as needed. I just feel it's a damned shame that we can't build kernels on our existing ARM systems.

2013-07-17-001-low.jpg

2013-07-16

I discovered a Python module by Mikkel Oscar Lyderik to control the Cubieboard LEDs (green and blue).

See https://pypi.python.org/pypi/cubieleds/0.1 cubieleds 0.1

It is trivial to build and install using pip.

mpi@master:/master/tmp/src$ sudo su -
root@master:~# pip install cubieleds
Downloading/unpacking cubieleds
  Downloading cubieleds-0.1.tar.gz
  Running setup.py egg_info for package cubieleds

Installing collected packages: cubieleds
  Running setup.py install for cubieleds
    building 'Cubie.leds' extension
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/py_leds.c -o build/temp.linux-armv7l-2.7/src/py_leds.o
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/leds.c -o build/temp.linux-armv7l-2.7/src/leds.o
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-z,relro build/temp.linux-armv7l-2.7/src/py_leds.o build/temp.linux-armv7l-2.7/src/leds.o -o build/lib.linux-armv7l-2.7/Cubie/leds.so
Successfully installed cubieleds
Cleaning up...

You need to follow the procedure here to activate the leds.

NOTE: You will most likely need to rebuild the kernel to get the LEDs supported. Right now I don't have a compatible Linux server at home to build the cross-compilation environment, and I'm not willing to trash the Intel systems I have running Windows (believe it or not) as I have some specialized video editing and encoding software on them.

I will have to put led control on hold for now.

2013-07-16-001-low.jpg

I built one cable last night and am now building an interim cable for node07. I am re-using some of the cut-off ends of the micro-A cables I modified for the "Tower of Pi" project to solder on the replacement power connectors. The cables I am building mimic the original supplied cable, with a USB-A connector on the other end, as I am not ready to cut off the USB-A ends yet.

2013-07-15

I mounted (4) new systems today. George is undoubtedly the fastest shipper I have ever seen.

I checked the post office today and the (4) new units were already here.

I am building images for the micro sd cards now and will work on integrating all 4 units into the cluster today.

2013-07-15-001-low.jpg

Updated: I made a new cable for node08 tonight. I will tackle the other intermittent cable on node07 Tuesday. I'll dress up all the loose cables when things have burned in a bit. Right now I'll just let it keep running John The Ripper (MPI version) for a few days …

Updated: There are now (12) systems operational. It looks like one of the 4 new power cables is also intermittent, though; node08 is the one with the intermittent cable, and it has died 3-4 times in the last 2 hours. However, I received the new power connectors from Radio Shack today, so I will work on fabricating some replacement cables shortly. The part number is 274-1532, probably repeating myself.

Here are some examples of what is running right now.

mpi@master:/master/tmp/john/john-1.7.9-jumbo-7/run$ sh cluster-restore.txt
+ mpirun -f /master/mpi_tests/machinefile -n 11 ./john --restore=mpi
Loaded 1 password hash (sha512crypt [32/32])
MPI: each node loaded 1/11 of wordfile to memory (about 605 KB/node)

[snip]
mpi@master:/master/tmp/john/john-1.7.9-jumbo-7/run$ sh cluster-status.txt
+ mpirun -n 11 -f /master/mpi_tests/machinefile ./john --status=mpi
  0: guesses: 0 time: 0:02:20:00 0.00% (2) c/s: 10.37
  5: guesses: 0 time: 0:02:20:00 0.00% (2) c/s: 10.29
 10: guesses: 0 time: 0:02:20:00 0.00% (2) c/s: 10.38
  1: guesses: 0 time: 0:02:20:00 0.00% (2) c/s: 10.40
  6: guesses: 0 time: 0:02:20:00 0.00% (2) c/s: 10.38
  3: guesses: 0 time: 0:02:20:00 0.00% (2) c/s: 10.38
  8: guesses: 0 time: 0:02:20:00 0.00% (2) c/s: 10.36
  4: guesses: 0 time: 0:02:20:00 0.00% (2) c/s: 10.27
  2: guesses: 0 time: 0:02:20:00 0.00% (2) c/s: 10.33
  9: guesses: 0 time: 0:02:20:00 0.00% (2) c/s: 10.36
  7: guesses: 0 time: 0:01:50:00 0.00% (2) c/s: 10.37
SUM: guesses: 0 time: 0:02:20:00 0.00% (2) c/s: 113 avg10.35
+ mpirun -np 11 -f /master/mpi_tests/machinefile /master/mpi_tests/system
node06 20:44:46 up 2 days, 1:58, 0 users, load average: 1.02, 1.02, 1.05
node09 20:44:46 up 4:29, 0 users, load average: 1.08, 1.04, 1.05
node11 20:44:46 up 2:41, 0 users, load average: 1.04, 1.04, 1.05
node02 20:44:46 up 2 days, 1:58, 0 users, load average: 1.05, 1.04, 1.05
node01 20:44:47 up 2 days, 1:58, 0 users, load average: 1.06, 1.04, 1.05
node07 20:44:47 up 5:15, 0 users, load average: 1.02, 1.02, 1.05
node08 20:44:47 up 34 min, 0 users, load average: 1.16, 1.09, 0.94
node05 20:44:47 up 2 days, 1:58, 0 users, load average: 1.02, 1.03, 1.05
node10 20:44:47 up 3:28, 1 user, load average: 1.04, 1.05, 1.07
node03 20:44:47 up 2 days, 1:58, 0 users, load average: 1.01, 1.04, 1.05
node04 20:44:47 up 2 days, 1:58, 0 users, load average: 1.18, 1.07, 1.06
+ sleep 300
^C
mpi@master:/master/tmp/john/john-1.7.9-jumbo-7/run$
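
For the record, cluster-status.txt isn't anything fancy; going by the set -x trace above it is essentially just the two mpirun calls and a sleep in a loop, roughly:

#!/bin/sh
# poll john's status and each node's load every 5 minutes
set -x
while true
do
    mpirun -n 11 -f /master/mpi_tests/machinefile ./john --status=mpi
    mpirun -np 11 -f /master/mpi_tests/machinefile /master/mpi_tests/system
    sleep 300
done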

2013-07-14

I am building the mpptest suite for the Cubical Monolith project system.

See http://www.mcs.anl.gov/research/projects/mpi/mpptest/ for details.

Updated: I built it, but there seem to be some incompatibilities with my MPICH2 implementation. I haven't been using OpenMPI, and I think it's probably time I rebuilt everything to use it, since that seems to be the most commonly adopted implementation nowadays.

2013-07-13

I finally mounted the second stack of cpus Saturday.

2013-07-10

I ordered (4) more units today, which will bring the total number of CPUs up to 12. I can't wait to add more units!! I also ordered more nuts and bolts, all #4-40.

2013-07-09

I mounted (and tested) the second ethernet switch today. This means I now have 32 ports ready, but the rub is that I wanted 32 compute nodes plus one master plus an uplink, so I might have to expand to 48 ports just to get the 34 I need. The chassis and mounting will certainly accommodate another switch, but it's very annoying to add a 16-port switch just to get (2) more ports.

2013-07-09-002-low.jpg

2013-07-08

The Cubieboard's headphone connector is a lot more fragile than it looks. Today the one on the master node just popped right off the circuit board as I was inserting the audio cable.

I don't know why, I haven't abused it.

2013-07-08-002-low.jpg 2013-07-08-005-low.jpg

I did plug and unplug a set of powered external speakers a few times, though, so maybe there is an issue with how it's soldered, as the connector itself seems sturdy enough… I've had this "master node" the longest; it was one of the original pair I purchased.

I'm guessing you shouldn't touch it much. I am going to have to replace the master node now as I need the audio capability and I'm not sure I am good enough with a soldering iron to get that connector back on safely.

Other than the headphone connector falling off, the master system seems fine and everything appears to be functioning normally.

I have been making custom network cables to help clean up the mess of wiring for the first (4) nodes.

Updated: 2051MT

I have devised a repair procedure involving a soldering vise, a small C-clamp, and of course my soldering iron and some quality solder. I have a magnifying visor, or I can use my magnifying bench lamp (swivel with articulated boom), though I am leaning towards the visor for now. I am also looking into what a small application of epoxy underneath the connector body will do as I re-solder those tabs. I'm hoping the glue will take the stress off the soldered tabs this time by bonding the body of the connector to the PCB. That's the theory, anyway.

Needless to say it will happen when I am ready to do it, which may not be that soon as the system is busy doing things. I keep the Monolith plugged into a small UPS, in case anyone wondered, so it's had great uptime for the most part. The problem child is node05 with its intermittent power cable; it went down twice in the last 12 hours, usually from me bumping the cable while working on something else nearby.

2013-07-05

I'm having trouble with 3 of the 4 power cables shipped with the last set of boards. They are intermittent: one is intermittent if flexed below the molded power connector, and two others are intermittent near the USB end. My theory is that they were bent too sharply in the packaging for the cable to survive, at least on the last shipment received. This is only a minor nuisance in the long term, as I located a replacement connector here: https://www.radioshack.com/product/index.jsp?productId=2102598 4.0x1.7mm Coaxial DC Power Plug (2-Pack)

cubie_power_connector.jpg
I hope to get some soon and make my own power cables so I can resolve this issue; I needed a bunch of these anyway just to have as spares. The goal is to get all 4 new nodes to stay powered up. Right now it looks like cutting off the last few inches below the USB connector fixed 2 of them (I wired them directly to the +5VDC distribution panel, with all (4) systems sharing a 5A fused line, so I don't need those USB connector ends anyway), but the one that is intermittent near the power connector is the immediate problem, as it drops out whenever you touch the cable near the molded connector end. The break seems so close to the molded part that I am not sure I can repair it, and I'm not going to try — I'll just try to get another power cable asap and order some raw connectors as well.

Updated: 2219MT I ordered (3) sets of 2 connectors each from the above link; with shipping it was US $18.44 … I have plenty of spare cable and wire and more than a few soldering irons, so I'll probably just start making custom wiring harnesses eventually anyway.

I hope I've gotten that down to just one of the new nodes with an intermittent power cable. The next 24 hours will tell, and then maybe sometime next week I will get the new connectors and fabricate replacement power cables.

+ mpirun -n 7 -f /master/mpi_tests/machinefile ./john --status=mpi
  1: guesses: 0 time: 0:19:00:41 0.00% (2) c/s: 10.35
  0: guesses: 0 time: 0:19:00:41 0.00% (2) c/s: 10.37
  2: guesses: 0 time: 0:19:00:42 0.00% (2) c/s: 10.35
  5: guesses: 0 time: 0:19:00:41 0.00% (2) c/s: 10.34
  6: guesses: 0 time: 0:14:30:43 0.00% (2) c/s: 10.27
  4: guesses: 0 time: 0:16:40:42 0.00% (2) c/s: 10.37
  3: guesses: 0 time: 0:10:00:44 0.00% (2) c/s: 10.36
SUM: guesses: 0 time: 0:19:00:42 0.00% (2) c/s: 72.45 avg10.35
+ mpirun -np 7 -f /master/mpi_tests/machinefile /master/mpi_tests/system
Fri Jul 5 22:30:42 MDT 2013 node03 22:30:42 up 6 days, 4:50, 0 users, load average: 1.04, 1.04, 1.01
Fri Jul 5 22:30:42 MDT 2013 node06 22:30:42 up 1:25, 0 users, load average: 1.02, 1.04, 1.05
Fri Jul 5 22:30:42 MDT 2013 node04 22:30:42 up 1:28, 1 user, load average: 1.00, 1.03, 1.05
Fri Jul 5 22:30:42 MDT 2013 node07 22:30:42 up 1:26, 1 user, load average: 1.02, 1.02, 1.01
Fri Jul 5 22:30:42 MDT 2013 node01 22:30:42 up 6 days, 4:50, 0 users, load average: 1.01, 1.02, 1.02
Fri Jul 5 22:30:42 MDT 2013 node02 22:30:42 up 6 days, 4:50, 0 users, load average: 1.05, 1.03, 1.00
Fri Jul 5 22:30:42 MDT 2013 node05 22:30:42 up 1:21, 1 user, load average: 1.00, 1.02, 1.01
+ sleep 300

You can also find the power connectors here: http://www.mcmelectronics.com/product/27-4345&scode=GS401&CAWELAID=220317738

cubie_power_connector_alt.jpg

22HP062 4.0 x 1.75mm DC Barrel Connector
Manufacturer’s Part Number: 22HP062

2013-07-04: Independence Day

I received (4) new units a few days ago from George, who once again proves to be one of the fastest people to ship stuff that I've ever seen. My kudos and thanks again to George Ioakimedes at IO Technologies, LLC http://iotllc.com/ Their web store is https://store.iotllc.com/

I finally got a chance to work on this again so I am starting to build the sd cards for the new systems… I ran out of 4gb microsd cards so I have to use 8gb cards until I can get more of the smaller size.

Once I transfer the image to a new microSD card, I bring the board up and change the network settings for the "compute node" image I use. I attach an ethernet cable to the internal switches, powering each board from a USB hub instead of wiring it in yet.

As I bring them up I use a "console" server with wifi (an RPi in one of those clear Adafruit cases with a wifi dongle and a USB TTL serial adapter). I test the MPICH2 software on each new node (including Python and Fortran, and soon MPI-enabled Perl). I sort of burn in the hardware for a few days, and then finally mount (4) nodes at one time.

Right now the preliminary design is to stack 4 nodes and then mount them to an acrylic sheet so they are vertical, letting the hot air rise "through" (between) them using basic convection, hopefully.

Once I build each one, I will resize the file system to something that will hopefully fit onto a 2GB card and then make image backups. (I'll have to calculate how many blocks to copy for a 2GB filesystem from the 8GB card, since I don't want an 8GB copy of a 2GB filesystem, do I?)

cube-stack.jpg
Example (the master and 3 compute nodes)

I've also run out of short yellow ethernet cables so I am going to make some today if I have time.

Updated: 1815MT

root@master:/master/tmp# resize2fs -M -f /dev/sdc2
resize2fs 1.42.5 (29-Jul-2012)
Resizing the filesystem on /dev/sdc2 to 283077 (4k) blocks.
...

Bleh bleh. I am creating a new compute-node image; I seem to have missed doing that last time, and I have (4) more nodes to bring up. node04 is working fine and it's the basis for the new image. I'm just trying to resize it so I can still use it on a 2GB card.
I'll have to specify the proper number of blocks to copy, of course, in order to get the filesystem copied correctly.

I'll get that done when I get a working image ready.
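
The block math is simple once resize2fs reports the new size. Using the 283077 4k-block figure from the output above (about 1.1GB, so it fits a 2GB card with room to spare), the copy I have in mind looks roughly like this, assuming the resized root filesystem is still on /dev/sdc2 (the image filename is just a placeholder):

# copy only the blocks the shrunken filesystem actually occupies
dd if=/dev/sdc2 of=node04-root.img bs=4096 count=283077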

I had to remove all the OpenMPI stuff from this image; it was causing issues when building mpi4py with pip. I also note that you have to manually nuke the mpi4py module in Python if you remove or reinstall OpenMPI, or you will have issues later. Right now I'm using MPICH2 with everything and NOT OpenMPI; OpenMPI just seems broken here and I can't get things working right. I ran into this on the RPi project too.
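
By "manually nuke" I mean something along these lines with plain pip (--force-reinstall makes pip rebuild the extension against whatever MPI is installed now):

# remove the mpi4py that was built against the old MPI
pip uninstall mpi4py
# rebuild it from source against the currently installed MPICH2
pip install --force-reinstall mpi4py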

I also made (5) new short yellow ethernet cables (around 2 feet each) without wasting too many connectors. The last 3 cables were perfect on the first try; the first two had to be redone twice each, as I guess I didn't strip the wires long enough or something, so I overdid it and then trimmed back, making sure the wire ends hit the stop inside the connector before I crimped it. Someday I will get a whiz-bang tester thing, but not today, hah.

Updated: 2305MT

I have all (4) new systems up and working in the cluster.

Thu Jul 4 22:54:41 MDT 2013 node02 22:54:41 up 5 days, 5:14, 0 users, load average: 1.20, 1.20, 1.14
+ mpirun -np 7 -f ./machinefile ./cpi
Process 5 of 7 is on node05
Process 7 of 7 is on node07
Process 3 of 7 is on node03
Process 6 of 7 is on node06
Process 4 of 7 is on node04
Process 1 of 7 is on node01
Process 2 of 7 is on node02
pi is approximately 3.1415926544231239, Error is 0.0000000008333307
wall clock time = 0.010650
+ mpirun -f machinefile python helloworld.py
Hello, World! I am process 4 of 7 on node04.
Hello, World! I am process 7 of 7 on node07.
Hello, World! I am process 6 of 7 on node06.
Hello, World! I am process 5 of 7 on node05.
Hello, World! I am process 2 of 7 on node02.
Hello, World! I am process 1 of 7 on node01.
Hello, World! I am process 3 of 7 on node03.
+ set +x
+ mpirun -np 7 -f ./machinefile ./system
Thu Jul 4 22:54:45 MDT 2013 node06 22:54:45 up 48 min, 0 users, load average: 0.07, 0.10, 0.07
Thu Jul 4 22:54:45 MDT 2013 node04 22:54:45 up 2:10, 0 users, load average: 1.17, 1.15, 1.09
Thu Jul 4 22:54:45 MDT 2013 node05 22:54:45 up 1:48, 1 user, load average: 0.26, 0.23, 0.14
Thu Jul 4 22:54:45 MDT 2013 node03 22:54:45 up 5 days, 5:14, 0 users, load average: 1.14, 1.20, 1.13
Thu Jul 4 22:54:45 MDT 2013 node01 22:54:45 up 5 days, 5:14, 0 users, load average: 1.62, 1.31, 1.16
Thu Jul 4 22:54:45 MDT 2013 node07 22:54:45 up 12 min, 0 users, load average: 0.23, 0.16, 0.09
Thu Jul 4 22:54:45 MDT 2013 node02 22:54:45 up 5 days, 5:14, 0 users, load average: 1.26, 1.22, 1.14

That's just a 'stress test' script to exercise all the nodes.

Below is the output from running "John The Ripper" which directly supports mpi of course.

mpi@master:/master/tmp/john/john-1.7.9-jumbo-7/run$ sh cluster-status.txt
+ mpirun -n 7 -f /master/mpi_tests/machinefile ./john --status=mpi
  0: guesses: 0 time: 0:00:00:43 0.00% (2) c/s: 9.95
  1: guesses: 0 time: 0:00:00:43 0.00% (2) c/s: 9.86
  3: guesses: 0 time: 0:00:00:42 0.00% (2) c/s: 9.95
  2: guesses: 0 time: 0:00:00:43 0.00% (2) c/s: 9.90
  5: guesses: 0 time: 0:00:00:44 0.00% (2) c/s: 9.56
  4: guesses: 0 time: 0:00:00:43 0.00% (2) c/s: 9.69
  6: guesses: 0 time: 0:00:00:45 0.00% (2) c/s: 9.46
SUM: guesses: 0 time: 0:00:00:45 0.00% (2) c/s: 68.83 avg9.83
+ mpirun -np 7 -f /master/mpi_tests/machinefile /master/mpi_tests/system
Thu Jul 4 23:03:28 MDT 2013 node03 23:03:28 up 5 days, 5:23, 0 users, load average: 1.05, 1.02, 1.07
Thu Jul 4 23:03:28 MDT 2013 node07 23:03:28 up 21 min, 0 users, load average: 1.07, 0.51, 0.23
Thu Jul 4 23:03:28 MDT 2013 node01 23:03:28 up 5 days, 5:23, 0 users, load average: 0.99, 1.01, 1.08
Thu Jul 4 23:03:28 MDT 2013 node02 23:03:28 up 5 days, 5:23, 0 users, load average: 1.17, 1.04, 1.08
Thu Jul 4 23:03:28 MDT 2013 node05 23:03:28 up 1:57, 1 user, load average: 0.95, 0.48, 0.26
Thu Jul 4 23:03:28 MDT 2013 node04 23:03:28 up 2:18, 0 users, load average: 1.07, 1.01, 1.05
Thu Jul 4 23:03:28 MDT 2013 node06 23:03:28 up 56 min, 0 users, load average: 1.01, 0.51, 0.23
+ sleep 300
^C
mpi@master:/master/tmp/john/john-1.7.9-jumbo-7/run$
2013-07-04-001-low.jpg
The next 4 nodes…