Cubical Monolith 2014 08

2014-08-24

I have all 33 nodes up and powered from (1) of the 12V-24V to 5V converters.

The entire cluster is drawing about 12A on the 5V side (during high cpu activity.)

Earlier I quoted it as 9A, but I realized I was using an older measurement taken before all of the wifi adapters were connected.

I was able to sustain over 6 hours of uptime before I had to shut down the cluster at 12.2V on the battery, but current demand has increased since I added the rest of the wifi adapters. It looks like I might be able to keep it operating for about 3 hours (right now it's been up 2.5 hours and the battery voltage is down to 12.4V.)
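One way to turn the earlier run into a rough prediction is to back out how many amp-hours the battery actually delivered before the 12.2V shutdown, then divide by the new draw. This is a minimal sketch under loudly stated assumptions: it treats the 3.5A 12V-side reading from 2014-08-22 as the average current during the 6-hour run, and it assumes battery-side current scales in proportion to the 5V load (9.0A to the 12.1A stress-test figure from this entry). Because the run lasted "over" 6 hours, the usable capacity it computes is a lower bound.

```python
def usable_ah_from_run(hours, battery_amps):
    """Amp-hours delivered between full charge and the 12.2V shutdown point."""
    return hours * battery_amps


def predicted_hours(usable_ah, battery_amps):
    """Runtime at a new battery-side current, reusing the same usable capacity."""
    return usable_ah / battery_amps


if __name__ == "__main__":
    # Assumption: the ~6 hour run averaged the 3.5A reading taken on 2014-08-22.
    usable = usable_ah_from_run(6.0, 3.5)
    # Assumption: 12V-side current scales with the 5V load (9.0A -> 12.1A).
    new_amps = 3.5 * 12.1 / 9.0
    print(f"usable capacity ~{usable:.0f}Ah, new draw ~{new_amps:.1f}A, "
          f"runtime ~{predicted_hours(usable, new_amps):.1f}h")
```

A real measurement of the 12V-side current with all adapters connected would replace the scaling assumption, and comparing the prediction with the observed uptime is a quick sanity check on both assumptions.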

I have been disconnecting the battery and charging it with a standalone charger. The cooling fins on the DC-DC converter are barely warm at 12A.

I measured the current earlier without the wifi adapters installed and ended up with the wrong values.

I also made measurements during heavy compute activity, and of course they were higher. The 11.5A figure was measured during stress tests that calculated pi and ran some mpi4py code; when the cluster is idle it reads about 500mA lower (which makes sense.) Update: I am now seeing 12.1A during the same cluster stress test.
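For readers who want a comparable load, here is a minimal sketch of an mpi4py pi calculation of the kind described above. It is not the code used for these measurements; it assumes a midpoint-rule integration of 4/(1+x²) over [0, 1], with intervals dealt round-robin to the ranks and the partial sums combined with `comm.reduce` on rank 0. If mpi4py is not installed, it simulates four ranks serially so the arithmetic can be checked on any machine.

```python
import math


def partial_pi(rank, size, n):
    """This rank's share of the midpoint rule for the integral of 4/(1+x^2) on [0, 1].

    Intervals are dealt out round-robin (i = rank, rank+size, ...), so each rank
    does roughly n/size of the work and the partial sums add up to pi.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(rank, n, size):
        x = h * (i + 0.5)  # midpoint of interval i
        total += 4.0 / (1.0 + x * x)
    return total * h


def main(n=200_000):
    try:
        from mpi4py import MPI  # present on the cluster nodes
    except ImportError:
        # No mpi4py here: simulate four ranks serially to check the arithmetic.
        pi = sum(partial_pi(r, 4, n) for r in range(4))
        print(f"simulated 4 ranks: pi ~ {pi:.12f} (error {abs(pi - math.pi):.1e})")
        return
    comm = MPI.COMM_WORLD
    local = partial_pi(comm.Get_rank(), comm.Get_size(), n)
    # Sum the partial results on rank 0; the other ranks receive None.
    pi = comm.reduce(local, op=MPI.SUM, root=0)
    if comm.Get_rank() == 0:
        print(f"{comm.Get_size()} ranks: pi ~ {pi:.12f} (error {abs(pi - math.pi):.1e})")


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as pi_mpi.py on the shared /master mount, it could be launched with something like `mpirun -np 32 -f /master/mpi_tests/machinefile python pi_mpi.py`. Raising `n` lengthens the run without changing the result much.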

2014-08-24-001-low.jpg

Those green LEDs indicate disk activity, not that the nodes are booted into Android; the image I am using drives the green LED to show disk I/O.

I had some trouble with cfengine today: for some very odd reason I had to reset the master server's host key on all the nodes. It was easy, since I have passwordless ssh enabled, so it is trivial to issue ssh <host> "sudo rm /var/lib/cfengine2/ppkeys/root-192.168.1.221.pub" to every node. In this example, 192.168.1.221 is the IP address of the CFEngine2 server.
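Pushing one command to every node is easy to do concurrently instead of one host at a time. This is a minimal Python sketch, assuming passwordless ssh and the node01..node32 hostnames from the machinefile shown later in this log; the helper names are hypothetical. It defaults to a dry run that only prints the commands, and actually runs them only when given `--go`.

```python
import shlex
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hostnames follow the node01..node32 machinefile used on this cluster.
NODES = [f"node{i:02d}" for i in range(1, 33)]
STALE_KEY = "/var/lib/cfengine2/ppkeys/root-192.168.1.221.pub"


def build_commands(hosts, remote_cmd):
    """One ssh argv per host. BatchMode=yes makes ssh fail instead of prompting."""
    return {h: ["ssh", "-o", "BatchMode=yes", h, remote_cmd] for h in hosts}


def run_all(commands, workers=16, timeout=30):
    """Run every ssh command concurrently and return {host: exit status or None}."""
    def run(item):
        host, argv = item
        try:
            result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
            return host, result.returncode
        except (subprocess.TimeoutExpired, OSError):
            return host, None  # no answer in time, or ssh could not be started
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run, commands.items()))


if __name__ == "__main__":
    cmds = build_commands(NODES, f"sudo rm {STALE_KEY}")
    if "--go" not in sys.argv:
        # Dry run by default: show exactly what would be executed.
        for argv in cmds.values():
            print(shlex.join(argv))
    else:
        for host, status in sorted(run_all(cmds).items()):
            print(host, "ok" if status == 0 else f"failed ({status})")
```

The thread pool is enough here because each worker just waits on an ssh process; 16 workers is an arbitrary choice.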

I don't know why that changed.

I also added a few more files to be managed under cfengine, and changed the NFS entry in the master copy of /etc/fstab to include the "_netdev" mount option, because networking wasn't up yet when the NFS client mounts were attempted.
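The change amounts to adding `_netdev` to the fourth (options) field of each NFS line in fstab. As a minimal sketch of that edit, this hypothetical helper adds the option to every nfs/nfs4 entry and leaves comments and local filesystems alone; the sample entries in the test, including `master:/master`, are illustrative and not copied from the cluster's real fstab.

```python
def add_netdev(fstab_text: str) -> str:
    """Return fstab text with _netdev added to the options of every nfs/nfs4 entry.

    fstab fields: spec, mount point, fs type, options, dump, pass.
    Modified lines are re-joined with single spaces, so column alignment is lost.
    """
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_nfs = (len(fields) >= 4 and not fields[0].startswith("#")
                  and fields[2] in ("nfs", "nfs4"))
        if is_nfs:
            opts = fields[3].split(",")
            if "_netdev" not in opts:
                opts.append("_netdev")
                fields[3] = ",".join(opts)
                line = " ".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "master:/master /master nfs rw,hard 0 0\n/dev/mmcblk0p2 / ext4 defaults 0 1\n"
    print(add_netdev(sample), end="")
```

Running it twice gives the same result as running it once, which matters if the edit is applied to a file cfengine keeps copying out.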

Until I can get the right DC power converter/charger for the Cubical Monolith, I have been shutting the cluster down when the battery reaches 12.2V and then recharging it elsewhere on the bench. I am going to order a 35A model that will fit in the base next to the 50AH battery and call all that "completed", as 35A will be sufficient to power the cluster and charge the battery at the same time (if needed.)
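The reason 35A should be enough can be checked with a little arithmetic: convert the 5V load into current on the 12V bus, then see what is left of the charger's output for charging. This is a rough sketch with assumed numbers, not measurements: 85% converter efficiency, a 12.4V bus, and derating the charger to 80% of its nameplate rating. The 12.1A load is the stress-test reading from this entry.

```python
def battery_side_amps(v_out, i_out, v_batt, efficiency=0.85):
    """Current drawn from the 12V bus by a DC-DC converter delivering v_out * i_out watts.

    The 0.85 default efficiency is an assumption, not a measurement.
    """
    return (v_out * i_out) / (v_batt * efficiency)


def charge_headroom(charger_amps, load_amps, derate=0.8):
    """Amps left over to charge the battery once the cluster load is supplied.

    derate (assumed 80%) avoids planning on the charger's full nameplate rating.
    """
    return max(0.0, charger_amps * derate - load_amps)


if __name__ == "__main__":
    # 5V at 12.1A is the stress-test reading; 12.4V is an assumed bus voltage.
    load = battery_side_amps(5.0, 12.1, 12.4)
    for rating in (35, 55, 75):
        spare = charge_headroom(rating, load)
        print(f"{rating}A charger: load ~{load:.1f}A, ~{spare:.1f}A left for charging")
```

With a charger connected the bus sits higher than 12.4V, which lowers the load current slightly, so the sketch errs on the conservative side for that parameter.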

I plan to use a product from the same company that I used in another project to keep a large battery bank ready for use at all times (you can see this here.) I have a 75A model already and the 35A model is the same physical size as that one.

They both should fit into the 12"x12"x10" base according to preliminary measurements but I will definitely have to make sure there is adequate ventilation.

At a minimum, I expect to cut a hole in the side of the box for a muffin fan attached to the power converter/charger, plus some vents in the top of the base to release any hydrogen given off while the battery is charging.

If I need additional fans I will add them, but I also intend to build some temperature monitoring for the base, use a low-voltage cut-off device, and mount a small voltmeter that is visible from outside the base.
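A low-voltage cut-off needs two things to behave well: it should not trip on a single noisy reading, and it should not switch the load back on the moment the voltage bounces back a little. This is a minimal sketch of that logic, independent of whatever hardware reads the battery voltage. Only the 12.2V shutdown threshold comes from this log; the 12.6V resume point and the five-sample requirement are assumptions.

```python
from dataclasses import dataclass


@dataclass
class CutoffMonitor:
    shutdown_at: float = 12.2  # volts: the shutdown threshold used in this log
    resume_at: float = 12.6    # volts: assumed re-enable point (hysteresis)
    samples: int = 5           # assumed number of consecutive low readings to trip
    tripped: bool = False
    _low_count: int = 0

    def update(self, volts: float) -> bool:
        """Feed one voltage reading; return True while the load should be off."""
        if self.tripped:
            # Stay off until the battery has clearly recovered (e.g. on charge).
            if volts >= self.resume_at:
                self.tripped = False
                self._low_count = 0
            return self.tripped
        if volts <= self.shutdown_at:
            self._low_count += 1
            if self._low_count >= self.samples:
                self.tripped = True
        else:
            self._low_count = 0  # a good reading resets the low streak
        return self.tripped


if __name__ == "__main__":
    monitor = CutoffMonitor()
    for reading in (12.5, 12.2, 12.1, 12.1, 12.1, 12.1, 12.4, 12.7):
        print(reading, "OFF" if monitor.update(reading) else "on")
```

On a real build the "off" state would trigger a clean cluster shutdown before any relay opens, so the nodes are not simply cut off.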

I believe that when I am finished with this power system I can leave the cluster up 24/7, but only if I have a useful application running, of course.

I ran into two cases of micro sd card corruption last night and had to reload the operating system on two nodes.

Rebuilding them took only a minimal amount of effort once cfengine was installed, and I used the rebuilds to fine-tune the rules that turn the standard image into a "Cubical Monolith Compute Node".

I hadn't seen this behaviour on micro-sd cards before, but it was as if the filesystem wasn't applying SOME of the writes. Repeated fsck runs did not reveal any issues other than a dirty journal every time I checked, even after a clean filesystem shutdown. For example:

root@master-dev:~# e2fsck /dev/sdc2
e2fsck 1.42.5 (29-Jul-2012)
/dev/sdc2: recovering journal
Superblock needs_recovery flag is clear, but journal has data.
Run journal anyway<y>?
e2fsck: unable to set superblock flags on /dev/sdc2
/dev/sdc2: ********** WARNING: Filesystem still has errors **********
root@master-dev:~#

I had to reset the CFEngine server's keys for those rebuilt nodes of course.

One of the fun things about this cluster is that I am always looking for new ways to make use of its own parallel tools. Here is the quickest safe way I have found to shut down the cluster.

mpi@master:/master/mpi_tests$ mpirun -np 32 -f /master/mpi_tests/machinefile sudo poweroff
shutdown: already running.
shutdown: already running.
shutdown: already running.
shutdown: already running.
mpi@master:/master/mpi_tests$

In the past I just ran a shell script that used ssh to shut down all 32 nodes serially, one at a time, which took a very long time.

By using mpirun it happened all at once.

Yeah. I know it's obvious. I still think it was cool.

Since I use the MPICH2 framework, I recently tried specifying "slots" in the hostfile for the first time, and python works perfectly with it: mpirun allocates (2) processes per node for whatever you execute. I need to start writing more python code, but I don't really have a lot of time nowadays.

node01:2
node02:2
node03:2
node04:2
...
node30:2
node31:2
node32:2

With OpenMPI the format to use "slots" is "hostname slots=X" but I don't use OpenMPI now.
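Typing 32 hostfile lines by hand is error-prone, so here is a minimal sketch that writes them in either format described above: `host:N` for MPICH2 and `host slots=N` for OpenMPI. The function name is hypothetical, and the node list simply assumes the node01..node32 naming shown in the hostfile.

```python
def machinefile(hosts, slots, flavor="mpich"):
    """Render a hostfile with a fixed slot count per host.

    MPICH2 (Hydra) reads "host:N"; OpenMPI reads "host slots=N".
    """
    if flavor == "mpich":
        template = "{host}:{slots}"
    elif flavor == "openmpi":
        template = "{host} slots={slots}"
    else:
        raise ValueError(f"unknown MPI flavor: {flavor!r}")
    return "".join(template.format(host=h, slots=slots) + "\n" for h in hosts)


NODES = [f"node{i:02d}" for i in range(1, 33)]

if __name__ == "__main__":
    print(machinefile(NODES, 2), end="")
```

Redirecting the output to /master/mpi_tests/machinefile would regenerate the file used with `mpirun -f`; with 2 slots on 32 nodes, `-np 64` fills every slot.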

2014-08-23

I received the remainder of the Airlink-101 wifi adapters, so I brought up all 33 nodes. I worked late on it, and by the time I finished everything was up despite losing node01 and node02 (micro-sd card corruption.)

Now that everything is wireless again (I originally tried this on the Tower of Pi project, but RPI usb is lame and falls down under any significant network traffic), I should probably find an access point that lets the wireless cluster function as a stand-alone, independent entity.

I have been looking at Airlink access points/routers that are solely powered by 5VDC.

2014-08-22

I received 8 of the Airlink-101 wifi adapters, so I brought up 8 more nodes and wired up the first and second tiers of systems. Right now I am fully powered from the 12VDC battery using the +5VDC converter. I am waiting on the other 14 wifi adapters and the 12"x12"x10" wooden base. I checked the 12V side and it's only drawing 3.5A; on the 5V side it is 9.0A. That is cool. It means this 50AH battery will last a long time, and I was running some cpu stress tests on 19 nodes at the time. I suspect the current draw will increase to over 10A when the rest are meshed and calculating something or other.

2014-08-22-004-low.jpg 2014-08-22-002-small.jpg

2014-08-19

I received the DC-DC converters today.

2014-07-08-19-017-small.jpg

2014-08-16

Since I am mostly waiting on parts, I spent the time getting cfengine running on the cluster so that I can bootstrap a compute node easily. I will spend the next few days working out all the requisite changes and the packages that should be installed. I should have done this a long time ago. Basically, all I have to do is install cfengine2 on the compute node and run 'cfagent -q' to build a new compute node.

This is the node.conf I use for the cluster nodes.

# Cubical Monolith Node
#
packages:
  locate          action=install
  dnsutils          action=install
  mpich2          action=install
  libmpich2-dev   action=install
  libmpich2-3     action=install
  python-pip      action=install elsedefine=install_mpi4py

#
copy:
  /master/cfengine/etc/hosts dest=/etc/hosts
       owner=root group=root mode=0644
       server=$(cfsrvhost) type=checksum

  /master/cfengine/etc/sudoers dest=/etc/sudoers
        owner=root group=root mode=0440
        server=$(cfsrvhost) type=checksum

  /master/cfengine/etc/crontab dest=/etc/crontab
        owner=root group=root mode=0644
        server=$(cfsrvhost) type=checksum

shellcommands:
  # install mpi4py once pip is installed.
  install_mpi4py::
    "/usr/bin/pip install mpi4py"
# end

I have also arranged for a local carpenter to build a base for the Cubical Monolith. I've decided for aesthetic reasons to mount everything on top of a 12"x12"x10" wooden box. I will finish the box in some way that doesn't detract from the design. I may also cut holes in the base for a fan and cables.

cubical-base.jpg

I am hoping to mount the rack on top of some heavy aluminum plate then mount the plate on top of the box. I will drill holes for the power cables to feed into the base so that the top rack will remain comparatively uncluttered. I will know more when I actually get the box delivered. I ordered 4 pieces of 3003-H14 Aluminum Sheet 1/8" x 12" x 12" today as well.

2014-08-12

I am using one of the used +5V power supplies I ordered from eBay; it arrived in sterling condition. I have tested it with all 33 nodes and everything seemed fine. I was also able to adjust it to 5.5V under load. I have a spare since I ordered (2).

You can see the power supply sitting at the bottom of the cubical monolith. This is only a temporary solution until I can get the 12V-5V DC-DC converters.

2014-07-08-12-006-low.jpg

I ordered 22 more Airlink-101 N-150 WIFI adapters so I can eliminate the ethernet switches. If I can ever find some 16-port switches that run on +5VDC only, I may reconsider, since the latency is pretty high with 33 of these adapters vying for bandwidth.

I haven't connected the other two tiers of systems to the power supply yet as I do not have a way to access them until I can get the wireless adapters. I'd rather not even power them up until then.

I noticed yesterday that the serial display unit had stopped functioning, so I am going to drop it from the design as well. Other than occasionally needing to read/write micro sd cards, I don't really need the USB hub either. I am thinking about eliminating that too, since I can read and write micro sd cards on other systems.

Basically, you are seeing a redesign that has as its goal the ability to power everything from a 12VDC deep cycle storage battery, with everything inside running from 5VDC using DC-DC converters.

It was just too ugly and cluttered. I might install a dedicated access point just to keep this network traffic off my main network here and also make it independent if I have to use it in a location that has no wifi or allows no wifi access. I have found several pieces of equipment that operate solely from 5VDC.

2014-08-01

I ordered (4) of the 12V-5V converters, so I will use (1) per rack of units and have a spare.

I also ordered (2) used 30A +5VDC supplies from eBay for $43.00 including shipping. I will most likely use one of them to temporarily power the Cubical Monolith.

I am thinking I will have to incorporate a storage battery into the design so that I can shift power sources without having to shut down.

This will effectively act as a "battery backup" and will offer a lot of flexibility in charging the built-in battery while the Cubical Monolith is powered up.

1. You can use smaller charge controllers attached to some input source (fed from 16-24VDC unregulated; they can accept very dirty, unregulated DC).
2. You can use a "power converter/charger" from a salvaged RV, plugged into a wall socket. In the RV these ran from "shore power" when it was parked and plugged into a 30A or 60A circuit, or from the RV's built-in generator. This, not the vehicle's alternator, is how the "coach batteries" are charged. They work very well with generators, and they are designed to charge batteries while also providing full power output. They are typically 12VDC units, but you can find 24VDC if you look. Note: you can get them new on eBay for $140 (12VDC@55A).
3. You could just use a stand alone 120VAC-powered (5VDC @30A) regulated supply (This is if you wish to eliminate the battery.) I will use this temporarily until I get a battery working reliably as I don't want the Cubical Monolith to be down any longer and I will need to test the dc-dc converters soon.
4. You can also power/charge it from a vehicle using jumper cables.

cubical-monolith-dc-power.jpg

This will be a 12V battery, of course. I will most likely choose something in the 35AH range.

Then I will use the large Anderson Powerpole connectors and cable to connect it to whatever high-current 12VDC supply I am using.

Running the Cubical Monolith full time from a battery will provide the best stability but I will have to make sure to keep the internal battery charged.

I located a source of 12VDC @55A "converter/chargers" that were designed to work with 120VAC "shore power" or generator power input with a very tolerant input range. The charger will produce up to 55A and should handle the load plus charge the internal battery at all times.

It will remain plugged into an AC source when I am using the system but I will most likely disconnect it when the monolith is powered down. I will use some sort of circuit breaker to isolate the battery from the system as well.