Cubical Monolith 2013 09

2013-09-16

I am considering using a board like this "SainSmart USB Eight Channel Relay Board for Automation" for software power control, but the way it resets each output every time USB is initialized from the host is unacceptable.

SainSmart-USB-Eight.jpg

I might just build some relay board and use manual switches, or adapt one like this for manual use. I don't want to hit the power system or the power supply with 20-30A at power-up, so I need to sequence the systems on group by group. The Cubical Monolith's power supply will last longer that way.

2013-09-15

I have a lot of ethernet cables to shorten, it seems. I also have to shorten the power harnesses and dress all of that up neatly.

2013-09-15-005-low.jpg

2013-09-11

I have a clamp-on ammeter attached to the +5VDC power supply and I'm seeing 19.6A down to about 16.5A at power-up, as each system boots. When the systems are mostly idle, the current draw is 16.4A.

When the cluster is under load (CPUs busy calculating something) the current draw rises to 25.7A!

I can't measure the AC current right now; I don't have an adapter or an exposed wire to do so.

I need to fabricate some sort of adapter cable that can handle about 30A, just for safety margin.
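To put the clamp-on readings above in perspective, here is a quick back-of-the-envelope calculation. The ~75% supply efficiency and 120 VAC line voltage are my assumptions, not anything measured:

```python
# Rough power figures from the measured +5 VDC current draw.
# Assumptions (mine, not from the meter): ~75% supply efficiency, 120 VAC line.
def dc_watts(volts, amps):
    return volts * amps

def ac_amps(dc_power, efficiency=0.75, line_volts=120.0):
    return dc_power / (efficiency * line_volts)

idle_w = dc_watts(5.0, 16.4)   # ~82 W at idle
load_w = dc_watts(5.0, 25.7)   # ~128.5 W under load
print(idle_w, load_w, round(ac_amps(load_w), 2))  # AC side stays under 2 A
```

So even under full load the AC side should only be drawing a couple of amps; the brutal currents are all on the low-voltage DC side.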

2013-09-09

2013-09-09-003-low.jpg

I still have quite a lot of details left to clean up as you can see.

2013-09-09-006-low.jpg

2013-09-08

0940MT Here are some numbers so far from the 32-node configuration.

+ mpirun -n 32 -f /master/mpi_tests/machinefile ./john --status=root
 11: guesses: 0 time: 0:11:30:03 0.00% (3) c/s: 2330
  8: guesses: 0 time: 0:11:35:38 0.00% (3) c/s: 2283
 12: guesses: 0 time: 0:11:35:39 0.00% (3) c/s: 2321
  2: guesses: 0 time: 0:11:35:40 0.00% (3) c/s: 2274
 16: guesses: 0 time: 0:11:35:40 0.00% (3) c/s: 2310
 15: guesses: 0 time: 0:11:35:38 0.00% (3) c/s: 2274
 24: guesses: 0 time: 0:11:35:40 0.00% (3) c/s: 2323
  5: guesses: 0 time: 0:11:35:38 0.00% (3) c/s: 2330
  9: guesses: 0 time: 0:11:35:39 0.00% (3) c/s: 2322
  0: guesses: 0 time: 0:11:35:37 0.00% (3) c/s: 2277
  3: guesses: 0 time: 0:11:35:38 0.00% (3) c/s: 2368
 14: guesses: 0 time: 0:11:35:40 0.00% (3) c/s: 2299
 18: guesses: 0 time: 0:11:35:40 0.00% (3) c/s: 2351
  6: guesses: 0 time: 0:11:35:41 0.00% (3) c/s: 2345
 19: guesses: 0 time: 0:11:35:37 0.00% (3) c/s: 2323
 17: guesses: 0 time: 0:11:35:37 0.00% (3) c/s: 2314
 13: guesses: 0 time: 0:11:35:42 0.00% (3) c/s: 2238
 10: guesses: 0 time: 0:11:35:39 0.00% (3) c/s: 2327
 22: guesses: 0 time: 0:11:35:40 0.00% (3) c/s: 2325
  7: guesses: 0 time: 0:11:35:37 0.00% (3) c/s: 2220
  4: guesses: 0 time: 0:11:30:03 0.00% (3) c/s: 2351
 21: guesses: 0 time: 0:11:35:36 0.00% (3) c/s: 2306
 29: guesses: 0 time: 0:11:35:39 0.00% (3) c/s: 2225
 20: guesses: 0 time: 0:11:35:40 0.00% (3) c/s: 2322
 30: guesses: 0 time: 0:11:35:40 0.00% (3) c/s: 2319
  1: guesses: 0 time: 0:11:35:37 0.00% (3) c/s: 2342
 23: guesses: 0 time: 0:11:35:39 0.00% (3) c/s: 2359
 27: guesses: 0 time: 0:11:35:38 0.00% (3) c/s: 2275
 31: guesses: 0 time: 0:11:35:38 0.00% (3) c/s: 2291
 26: guesses: 0 time: 0:11:35:40 0.00% (3) c/s: 2352
 28: guesses: 0 time: 0:11:35:38 0.00% (3) c/s: 2328
 25: guesses: 0 time: 0:11:35:40 0.00% (3) c/s: 2295
SUM: guesses: 0 time: 0:11:35:42 0.00% (3) c/s: 73935 avg 2310
+ date
Sun Sep  8 09:48:54 MDT 2013
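One way to sanity-check the SUM line in dumps like the one above is to tally the per-rank c/s figures yourself. A small hedged sketch (the line format is taken from the output above; exact totals can drift a little since John samples each rank at slightly different moments):

```python
import re

# Tally per-rank c/s rates from John the Ripper --status output lines
# like " 11: guesses: 0 time: 0:11:30:03 0.00% (3) c/s: 2330".
LINE = re.compile(r"^\s*(\d+): guesses: \d+ .* c/s: (\d+)\s*$")

def tally(lines):
    rates = [int(m.group(2)) for m in map(LINE.match, lines) if m]
    return sum(rates), sum(rates) // len(rates)

sample = [
    " 11: guesses: 0 time: 0:11:30:03 0.00% (3) c/s: 2330",
    "  8: guesses: 0 time: 0:11:35:38 0.00% (3) c/s: 2283",
    " 12: guesses: 0 time: 0:11:35:39 0.00% (3) c/s: 2321",
]
print(tally(sample))  # (6934, 2311)
```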

2013-09-07

2224MT I am having trouble with the power cables again; 2-3 more have become intermittent, this time on node05, node01, and node14.

mpi@master:/master/tmp/john-hf/john-1.7.9-jumbo-7/run$ sh cluster-status.txt
+ mpirun -n 32 -f /master/mpi_tests/machinefile ./john --status=root
 12: guesses: 0 time: 0:00:05:41 0.00% (3) c/s: 2256
  2: guesses: 0 time: 0:00:05:42 0.00% (3) c/s: 2256
  4: guesses: 0 time: 0:00:05:42 0.00% (3) c/s: 2246
  1: guesses: 0 time: 0:00:05:38 0.00% (3) c/s: 2267
  6: guesses: 0 time: 0:00:05:43 0.00% (3) c/s: 2257
 16: guesses: 0 time: 0:00:05:41 0.00% (3) c/s: 2259
 21: guesses: 0 time: 0:00:05:38 0.00% (3) c/s: 2255
 15: guesses: 0 time: 0:00:05:40 0.00% (3) c/s: 2253
 11: guesses: 0 time: 0:00:05:44 0.00% (3) c/s: 2243
  7: guesses: 0 time: 0:00:05:39 0.00% (3) c/s: 2259
 13: guesses: 0 time: 0:00:05:44 0.00% (3) c/s: 2238
  5: guesses: 0 time: 0:00:05:40 0.00% (3) c/s: 2253
 10: guesses: 0 time: 0:00:05:40 0.00% (3) c/s: 2259
  3: guesses: 0 time: 0:00:05:40 0.00% (3) c/s: 2259
 19: guesses: 0 time: 0:00:05:39 0.00% (3) c/s: 2253
 17: guesses: 0 time: 0:00:05:39 0.00% (3) c/s: 2252
 14: guesses: 0 time: 0:00:05:42 0.00% (3) c/s: 2254
  9: guesses: 0 time: 0:00:05:41 0.00% (3) c/s: 2251
  0: guesses: 0 time: 0:00:05:38 0.00% (3) c/s: 2267
  8: guesses: 0 time: 0:00:05:40 0.00% (3) c/s: 2255
 22: guesses: 0 time: 0:00:05:42 0.00% (3) c/s: 2260
 18: guesses: 0 time: 0:00:05:41 0.00% (3) c/s: 2262
 23: guesses: 0 time: 0:00:05:41 0.00% (3) c/s: 2254
 25: guesses: 0 time: 0:00:05:41 0.00% (3) c/s: 2256
 26: guesses: 0 time: 0:00:05:42 0.00% (3) c/s: 2256
 27: guesses: 0 time: 0:00:05:40 0.00% (3) c/s: 2259
 24: guesses: 0 time: 0:00:05:42 0.00% (3) c/s: 2252
 29: guesses: 0 time: 0:00:05:41 0.00% (3) c/s: 2252
 20: guesses: 0 time: 0:00:05:42 0.00% (3) c/s: 2255
 30: guesses: 0 time: 0:00:05:42 0.00% (3) c/s: 2251
 28: guesses: 0 time: 0:00:05:40 0.00% (3) c/s: 2257
 31: guesses: 0 time: 0:00:05:40 0.00% (3) c/s: 2259
SUM: guesses: 0 time: 0:00:05:44 0.00% (3) c/s: 72349 avg 2260
+ date
Sat Sep  7 22:24:12 MDT 2013
+ sleep 300

1941MT I have all 32 compute nodes working with MPI and passing the stress test.

2013-09-07-006-cropped.jpg 2013-09-07-005-low.jpg
mpi@master:/master/mpi_tests$ ./stress
+ mpirun -np 32 -f /master/mpi_tests/machinefile /master/mpi_tests/cpi
Process 2 of 32 is on node02
Process 4 of 32 is on node04
Process 18 of 32 is on node18
Process 3 of 32 is on node03
Process 8 of 32 is on node08
Process 27 of 32 is on node28
Process 26 of 32 is on node26
Process 15 of 32 is on node15
Process 20 of 32 is on node20
Process 7 of 32 is on node07
Process 19 of 32 is on node19
Process 13 of 32 is on node13
Process 23 of 32 is on node23
Process 12 of 32 is on node12
Process 22 of 32 is on node22
Process 6 of 32 is on node06
Process 25 of 32 is on node25
Process 16 of 32 is on node16
Process 24 of 32 is on node24
Process 21 of 32 is on node21
Process 28 of 32 is on node29
Process 14 of 32 is on node14
Process 11 of 32 is on node11
Process 17 of 32 is on node17
Process 9 of 32 is on node09
Process 31 of 32 is on node32
Process 29 of 32 is on node30
Process 5 of 32 is on node05
Process 30 of 32 is on node31
Process 10 of 32 is on node10
Process 32 of 32 is on node01
Process 1 of 32 is on node01
pi is approximately 3.1415926544231265, Error is 0.0000000008333334
wall clock time = 0.019723

+ mpirun -f /master/mpi_tests/machinefile python /master/mpi_tests/helloworld.py
Hello, World! I am process 3 of 32 on node03.
Hello, World! I am process 13 of 32 on node13.
Hello, World! I am process 11 of 32 on node11.
Hello, World! I am process 14 of 32 on node14.
Hello, World! I am process 22 of 32 on node22.
Hello, World! I am process 24 of 32 on node24.
Hello, World! I am process 23 of 32 on node23.
Hello, World! I am process 1 of 32 on node01.
Hello, World! I am process 21 of 32 on node21.
Hello, World! I am process 12 of 32 on node12.
Hello, World! I am process 19 of 32 on node19.
Hello, World! I am process 26 of 32 on node26.
Hello, World! I am process 18 of 32 on node18.
Hello, World! I am process 20 of 32 on node20.
Hello, World! I am process 28 of 32 on node28.
Hello, World! I am process 9 of 32 on node09.
Hello, World! I am process 31 of 32 on node31.
Hello, World! I am process 27 of 32 on node27.
Hello, World! I am process 30 of 32 on node30.
Hello, World! I am process 32 of 32 on node32.
Hello, World! I am process 4 of 32 on node04.
Hello, World! I am process 5 of 32 on node05.
Hello, World! I am process 17 of 32 on node17.
Hello, World! I am process 16 of 32 on node16.
Hello, World! I am process 29 of 32 on node29.
Hello, World! I am process 15 of 32 on node15.
Hello, World! I am process 25 of 32 on node25.
Hello, World! I am process 2 of 32 on node02.
Hello, World! I am process 6 of 32 on node06.
Hello, World! I am process 7 of 32 on node07.
Hello, World! I am process 10 of 32 on node10.
Hello, World! I am process 8 of 32 on node08.

+ set +x
+ mpirun -np 32 -f /master/mpi_tests/machinefile /master/mpi_tests/cpi
^CCtrl-C caught... cleaning up processes

mpi@master:/master/mpi_tests$
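The cpi test in the session above is the classic MPICH example that distributes a midpoint-rule estimate of pi across the ranks. A serial Python sketch of the same computation (my reconstruction of what cpi does, not the actual C source) gives an error of the same ~1e-9 magnitude as the run logged above:

```python
import math

# Serial sketch of cpi's midpoint-rule pi estimate: integrate
# 4 / (1 + x^2) over [0, 1] using n intervals (cpi's default n = 10000);
# the MPI version just splits the i-loop across ranks.
def estimate_pi(n=10000):
    h = 1.0 / n
    s = sum(4.0 / (1.0 + ((i - 0.5) * h) ** 2) for i in range(1, n + 1))
    return h * s

pi_est = estimate_pi()
print(pi_est, abs(pi_est - math.pi))
```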

1637MT I finally finished building 11 ethernet cables, as yet untested.

2013-09-06

1944MT I am finally starting to build the ethernet cables.

2013-09-06-003-small.jpg

0836MT I have to build (11) ethernet cables today, each about 16" in length. Last night, after completing the power harness, I powered up all 33 nodes and the power supply worked flawlessly.

I need to devise a power-sequencing design that introduces a 2-3 second delay before each system powers up, or at least before each group of systems, so the power supply sees a gradually increasing load.
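The sequencing idea sketched above could look something like this. The `switch_on` hook is hypothetical (whatever relay or switch hardware ends up driving the circuits), and the grouping shown just mirrors the fused harness circuits:

```python
import time

# Hedged sketch of staggered power-up: bring each fused group of nodes
# online, then pause so the supply never sees the full inrush at once.
# switch_on(node) is a hypothetical hook for the eventual relay hardware.
def sequence_power(groups, switch_on, delay=2.5):
    for group in groups:
        for node in group:
            switch_on(node)
        time.sleep(delay)  # let inrush settle before the next group

# Example grouping matching one 5-node and one 6-node harness circuit.
groups = [["node01", "node02", "node03", "node04", "node05"],
          ["node06", "node07", "node08", "node09", "node10", "node11"]]
```

With a 2.5-second delay per group, the whole 33-board stack would be fully up in well under half a minute.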

I need to measure the full load on the +5VDC supply to see what we have; I suspect it will be about 15A.

0027MT I have completed building the power harness and have all 32 compute nodes + 1 master node powered up. I need to make a bunch of ethernet cables shortly, but I still haven't mounted the third tier of systems, so this is all just sort of lashed together. I'll have to take some pictures later. If not for the ethernet cables, everything would be up and in production.

2013-09-05

2055MT I have finished loading the images for all 11 new systems and configuring them for operation in the cluster. I have tested each new board, and so far so good.

I'm building the power harness now, with five systems on one 5A +5VDC circuit and six systems on another 5A +5VDC circuit (fused values).
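A quick headroom check on those fuse values, using the whole-cluster draw figures from the 2013-09-11 clamp-on measurements and assuming the load splits evenly across all 33 boards (an assumption, not a per-node measurement):

```python
# Fuse-headroom estimate for the 5 A harness circuits.
# 25.7 A was the whole-cluster +5 VDC draw under load; assume an even
# split across 33 boards (assumption, not measured per node).
NODES = 33
per_node_load = 25.7 / NODES        # ~0.78 A per node under load
worst_circuit = 6 * per_node_load   # the six-node circuit is the worst case
print(round(worst_circuit, 2))      # ~4.67 A: inside the 5 A fuse, but tight
```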

2013-09-05-005-low.jpg 2013-09-05-011-small.jpg

I also fabricated a right-angle USB-A to USB-B cable from two separate cables. I took great care to keep the shielding intact (both the wire braid and the silver foil, re-wrapped around the splice) since I want to be able to push data without errors. It seems to have gone well: I have been using it all day, burning at least 7 images with the home-made cable connecting the powered USB hub. I didn't want the cable to jut out beyond the confines of the rack, since I am going to place the master node into the stack.

2013-09-05-006-small.jpg

You can also see the right-angle SATA cable I am using.

0957MT I have finally started building the third tier of systems; node23 through node27 are done as of right now.

2013-09-05-003-small.jpg 2013-09-05-002-cropped.jpg