Notes on installing Rocks Cluster Software…..


– Front-end machine is a Dell 2950 with two onboard 1 GigE Broadcom ports and one Myricom 10 GigE card.
– Broadcom port #2 is disabled in the BIOS.
– Broadcom port #1 is enabled and configured for the external network (Internet).
– The Myricom 10 GigE card is hooked into a Foundry 8 x 10 GigE switch that uplinks into our private Class B network. We own a portion of this network, aa.bb.cc.130-190, with netmask 255.255.255.192. The Foundry also downlinks to a Force10 48-port 1 GigE switch (with an optional 10 GigE port); this is where the cluster compute nodes are connected at 1 GigE (soon to be bonded to 4 GigE).
– The Rocks installer on the front-end brings up Broadcom port #1 (external) as eth0, and the Myricom card is not installed by default. So when prompted during the install, I configure the IP addresses as usual (eth0 for the inside, eth1 for the external side). This keeps the config files sane!!!
– Then I install the myri10ge device driver from Myricom’s site.
– Now eth0 and eth1 are backwards: Rocks wants eth0 to be private and eth1 to be public. To swap them, we tell the kernel to rename the devices via udev rules. Edit /etc/udev/rules.d/11-local.rules and insert the following line:
KERNEL=="eth*",SYSFS{address}=="00:60:dd:47:75:a6",NAME="eth0"
This forces the NIC with MAC address 00:60:dd:47:75:a6 (the Myricom card) to come up as eth0. We also have to edit the ifcfg-eth0 and ifcfg-eth1 files in /etc/sysconfig/network-scripts so that each IP goes with the right interface/MAC address.
– Lastly, add “modprobe myri10ge” and “route add -net aa.bb.cc.0/26 gw aa.bb.cc.129 dev eth0” to /etc/rc.d/rc.local to shoehorn in the driver and the route.
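The rename in the steps above boils down to one udev rule per NIC. Here is a small Python sketch (an illustration, not part of Rocks) that builds such lines from a MAC-to-name mapping and rejects malformed MACs. It keeps the SYSFS{address} key used above; newer udev releases spell it ATTR{address}. The second MAC address is a made-up placeholder for the Broadcom port, and pinning eth1 as well is an optional extra that the steps above do not require.

```python
import re

# Matches a lowercase, colon-separated MAC address such as 00:60:dd:47:75:a6.
MAC_RE = re.compile(r"^([0-9a-f]{2}:){5}[0-9a-f]{2}$")


def udev_rename_rule(mac, name):
    """Return one udev rule line that pins the NIC with `mac` to `name`.

    Uses the SYSFS{address} key as in the rule above; newer udev
    releases use ATTR{address} instead.
    """
    mac = mac.lower()
    if not MAC_RE.match(mac):
        raise ValueError("not a MAC address: %r" % mac)
    return 'KERNEL=="eth*",SYSFS{address}=="%s",NAME="%s"' % (mac, name)


def rules_file(mapping):
    """Build a rules-file body from a {mac: ifname} mapping, sorted by ifname."""
    lines = [udev_rename_rule(mac, name)
             for mac, name in sorted(mapping.items(), key=lambda kv: kv[1])]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The Myricom MAC from the post goes to eth0 (private side).
    # 00:1a:2b:3c:4d:5e is a hypothetical placeholder for the Broadcom eth1.
    print(rules_file({
        "00:60:dd:47:75:a6": "eth0",
        "00:1a:2b:3c:4d:5e": "eth1",
    }), end="")
```

The output can be pasted into /etc/udev/rules.d/11-local.rules; the ifcfg-eth* files still have to be matched up by hand.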

This should bring up a sane frontend machine.
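A quick way to sanity-check the addressing is Python’s standard ipaddress module. This sketch assumes the hypothetical prefix 10.20.30 in place of aa.bb.cc. It shows that netmask 255.255.255.192 is a /26 whose usable hosts run from .129 to .190, so the owned range .130-190 and the gateway .129 used in rc.local all sit inside the frontend’s private network.

```python
import ipaddress

# Hypothetical stand-in: 10.20.30 replaces "aa.bb.cc" so the arithmetic runs.
frontend = ipaddress.ip_interface("10.20.30.130/255.255.255.192")
net = frontend.network

print(net)                    # 10.20.30.128/26
print(net.prefixlen)          # 26
print(net.broadcast_address)  # 10.20.30.191

hosts = list(net.hosts())     # excludes the network and broadcast addresses
print(hosts[0], hosts[-1])    # 10.20.30.129 10.20.30.190

# A "route add ... gw" gateway must be directly reachable, i.e. inside
# the same /26 as the frontend's private interface.
gateway = ipaddress.ip_address("10.20.30.129")
print(gateway in net)         # True
```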

– Before running insert-ethers on the frontend, we have to edit /opt/rocks/lib/python2.4/site-packages/rocks/commands/sync/dns/plugin_dns.py, because our private address space is only a portion of a larger subnet. The Python file assumes a private Class C address/mask (/24), which is not the case for me. A small change makes the function look like this (thanks to Scott Hamilton for his post):

def reverseIP(self, addr, mask):
    "Reverses the elements of a dot-decimal address."

    if type(addr) != types.ListType:
        addr = string.split(addr, ".")

    addr.reverse()

    clip = mask/8
    if (mask % 8):
        clip += 1
    # I added this section to fix a bug that breaks the dns configuration when
    # installing on subnets smaller than 255.255.255.0
    if (clip == 4):
        clip = 3
    # Only show the host portion of the address.
    addr = addr[:-clip]

    reversed = addr[0]
    for i in addr[1:]:
        reversed = "%s.%s" % (reversed, i)

    return reversed
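To see why the clip == 4 check matters, here is a standalone Python 3 transcription of reverseIP (a hypothetical reverse_ip function, with a patched flag added for comparison; 10.20.30.130 stands in for an aa.bb.cc address). With a /26 mask, mask/8 rounds up to 4, so the unpatched slice addr[:-4] is empty and addr[0] raises IndexError; clamping clip to 3 keeps the last octet as the host portion.

```python
def reverse_ip(addr, mask, patched=True):
    """Standalone copy of the reverseIP() logic from plugin_dns.py.

    addr: dotted-quad string; mask: prefix length in bits (e.g. 24 or 26).
    patched=False reproduces the code without the clip == 4 fix.
    """
    octets = addr.split(".")
    octets.reverse()

    clip = mask // 8
    if mask % 8:
        clip += 1
    if patched and clip == 4:
        clip = 3
    # Only keep the host portion of the reversed address.
    host = octets[:-clip]

    result = host[0]  # IndexError here when host is empty (unpatched /26)
    for octet in host[1:]:
        result = "%s.%s" % (result, octet)
    return result
```

For example, reverse_ip("10.20.30.130", 26) returns "130", the same as with a /24, while reverse_ip("10.20.30.130", 16) returns "130.30". Calling it with patched=False and a /26 mask fails the same way the unmodified plugin does.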

– This gets insert-ethers going, but there is still the problem of telling it not to start at 190 (the end of my address space) when it counts down to assign an address to each new compute node. I want it to start at 180 and count down, because I want to reserve 180-190 for admin stuff such as the Xserve RAIDs. So the command to issue is:
insert-ethers --baseip=aa.bb.cc.180
– Now I can power on the compute nodes, which have four interfaces each (two onboard Broadcom plus two extra Intel GigE cards), making sure that Broadcom port 1 is connected to the switch on every machine. This is the default PXE port on the Dell 1950 IIIs. If everything is groovy, insert-ethers will detect the machine and hand it aa.bb.cc.179 as its IP address.
– Once the install is done on compute-0-0 (the first machine you turn on), you can check /etc/dhcpd.conf on the frontend and notice that all the interface instances have the same IP. This is something we will have to change once we bond the interfaces (maybe not… not sure yet).
– If something goes wrong during insert-ethers on the frontend, you can get a listing with “rocks list host” or “rocks list host interface”. Once you find the offending node, you can remove it with, for example, “rocks remove host compute-0-0”, followed by “rocks sync config” and “rocks sync dns”.
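The count-down described above can be modelled in a few lines. This is only an illustration of the behaviour reported in this post (a --baseip of .180 yields .179 for the first node), not Rocks’ allocation code. It assumes the hypothetical prefix 10.20.30 in place of aa.bb.cc and that the frontend at .130 is the only other host in the owned .130-190 range; under those assumptions, compute nodes can take .179 down to .131, i.e. 49 addresses.

```python
import ipaddress


def node_addresses(base_ip, lowest_ip):
    """Yield addresses counting down from just below base_ip to lowest_ip.

    Models the behaviour reported in the post (baseip .180 -> first
    node gets .179); it is not Rocks' own allocation logic.
    """
    ip = ipaddress.ip_address(base_ip) - 1
    lowest = ipaddress.ip_address(lowest_ip)
    while ip >= lowest:
        yield ip
        ip -= 1


# Hypothetical prefix 10.20.30 stands in for aa.bb.cc. The frontend owns .130,
# so .131 is assumed to be the lowest address left for compute nodes.
pool = list(node_addresses("10.20.30.180", "10.20.30.131"))
print(pool[0], pool[1])  # 10.20.30.179 10.20.30.178
print(len(pool))         # 49
```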
– I initially ran into a problem where Ganglia would not update the node information. I think this was because Ganglia uses multicast to pass information between the clients (compute nodes) and the server (frontend machine). I changed /etc/gmond.conf on the compute nodes as follows (only the relevant portion is shown here):

/* UDP Channels for Send and Recv */

udp_recv_channel {
  port = 8649
}

udp_send_channel {
  host = aa.bb.cc.130
  port = 8649
}

This way the receiving side of Ganglia on each compute node listens on port 8649, and the collected stats are sent to aa.bb.cc.130, which is my frontend machine. Similarly, on the frontend machine I modified /etc/gmond.conf to look like this:

/* UDP Channels for Send and Recv */

udp_recv_channel {
  /* mcast_join = 236.149.78.5 */
  port = 8649
}

udp_send_channel {
  /* mcast_join = 236.149.78.5 */
  host = aa.bb.cc.130
  port = 8649
}

Note the commented-out multicast address, which is no longer in use. This way all the clients (compute nodes) send their info to the server (frontend), which listens on port 8649. The server also sends its own information to its own IP address (a snake-eating-its-own-tail kind of thing). Once this is done, I run “/etc/init.d/gmond restart” on all the machines (compute nodes and frontend). The Ganglia web page should now be happy and full of info about the nodes.
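Because the same two stanzas end up on every machine, it can help to generate them rather than edit each file by hand. This is a hypothetical helper (not a Ganglia or Rocks tool) that renders the unicast udp_recv_channel/udp_send_channel block shown above for a given frontend address; 10.20.30.130 is a stand-in for aa.bb.cc.130. The rest of gmond.conf is not touched and still has to be merged in by hand.

```python
import ipaddress


def gmond_unicast_channels(collector_ip, port=8649):
    """Render the unicast UDP channel stanzas for gmond.conf.

    Every node, frontend included, sends to collector_ip instead of
    joining a multicast group; each node also listens on `port`.
    """
    ipaddress.ip_address(collector_ip)  # raises ValueError on a malformed address
    return (
        "/* UDP Channels for Send and Recv */\n"
        "\n"
        "udp_recv_channel {\n"
        "  port = %d\n"
        "}\n"
        "\n"
        "udp_send_channel {\n"
        "  host = %s\n"
        "  port = %d\n"
        "}\n"
    ) % (port, collector_ip, port)


if __name__ == "__main__":
    # 10.20.30.130 is a stand-in for the frontend at aa.bb.cc.130.
    print(gmond_unicast_channels("10.20.30.130"), end="")
```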

More later…..


2 responses to “Notes on installing Rocks Cluster Software…..”

  1. Hi! I think I have the same problem with my Dell SC1435; I have 14 nodes + 1 frontend. When I try to install Rocks 5.1 there is a problem with the eth0 and eth1 assignments: eth0 and eth1 seem to be “duplicated” as eth2 and eth3. I had OSCAR installed, and we were using eth0 for the internal cluster network and eth2 for external, but now Rocks wants eth0 for internal and eth1 for external. Do you think that if I do as you posted, I will be able to install Rocks correctly? Thank you!

  2. Hi Ivief,

    I cannot really guarantee it :-). That said, I would try to search for a kernel boot option you can pass when you launch the 5.1 install on your frontend node, something that tells it which MAC address should be assigned eth0 and eth1. Failing that, you can go through the install, let it find eth2 and eth3, and swap them afterwards using my note on editing /etc/udev/rules.d/11-local.rules.

    But I think you should search a bit more (sorry, I don’t have any info on this) for a kernel boot string that would do the swap for you at boot time, prior to installation.

    Good luck,
    TTYL
    Many
