Saturday 6 October 2012

Testing MPICH2 cluster on Ubuntu 12.04

All of this has been explained many times already, and the information is not hard to find. Some parts were not very clear to me, though: in particular, the guides do not show how to test whether the cluster is really doing its job across different hosts. You can run MPICH on a single host, after all, but that is not very useful. So first let's set up MPICH. My sources of information were:

http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager
http://www.petur.eu/blog/?p=59
http://webappl.blogspot.ca/2011/05/setting-up-mpich2-cluster-with-ubuntu.html

So how exactly did I set up my cluster? The steps are below. Detailed information on each of them can be found in the sources listed above; here is just the list:

1. Install MPICH2 - sudo aptitude install mpich2
2. Edit the /etc/hosts file and add entries for the cluster hosts. Don't forget to remove the 127.0.1.1 address for your hostname.
3. Enable SSH key login for a (cluster) user.
4. Create the MPICH host file. I used mpd.hosts, but this is not the default name, and I don't know what the default is (or whether there is one at all). I just ran mpiexec with the -f parameter.
5. Set up .mpd.conf with the password.
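
For steps 2 and 4, the two files could look roughly like the sketch below. The host names d and x are the two machines that appear later in this post. The IP addresses and the :2 process counts are placeholders for illustration only. A Hydra host file takes one host per line, optionally followed by :N for the number of processes to start there.

```text
# /etc/hosts on every node (addresses are hypothetical)
127.0.0.1      localhost
# 127.0.1.1    d          <- the line to remove (step 2)
192.168.1.10   d
192.168.1.11   x

# ~/mpd.hosts, passed to mpiexec with -f (process counts are assumed)
d:2
x:2
```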

That is all it takes to have a functional cluster. But how do you prove that the work is really spread across the hosts? I used three tools to demonstrate this.

1. mpi_hello.c. There are many variations of this simple program; just choose one of them.
2. John the Ripper, as described in the petur.eu blog
3. tcpdump
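
A quick fourth check, alongside the three tools above, is to have every launched process report the host it runs on. Below is a minimal sketch in Python. It is not an MPI program: it assumes that mpiexec can launch an ordinary executable on each host, and that Hydra exports the process rank in the PMI_RANK environment variable. The file name whereami.py and the function placement_line are made up for this example.

```python
import os
import socket


def placement_line(rank, host):
    """Format one report line, e.g. 'rank 0 on d'."""
    return "rank {} on {}".format(rank, host)


def main():
    # PMI_RANK is assumed to be set by Hydra's mpiexec for each process;
    # fall back to "?" when the script is started by hand.
    rank = os.environ.get("PMI_RANK", "?")
    print(placement_line(rank, socket.gethostname()))


if __name__ == "__main__":
    main()
```

Copied to the same path on every host, it could be started with something like mpiexec -f mpd.hosts -n 4 python whereami.py. If more than one host name shows up in the output, the processes are spread across the cluster.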

Now the results. Compile mpi_hello.c and John. On one of the hosts run tcpdump host other_host. On the other host, execute mpi_hello like this:

cluster@d:~$ mpiexec -f mpd.hosts -n 4 ./mpi_hello
Hello from processor 2 of 4
Hello from processor 0 of 4
Hello from processor 3 of 4
Hello from processor 1 of 4

Well, that would work and give the same result even with a single host. The real proof is the tcpdump output on the other host:


tcpdump host host_2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
16:54:34.439518 ARP, Request who-has x.lan tell d.lan, length 46
16:54:34.439588 ARP, Reply x.lan is-at 08:00:27:dc:9d:cc (oui Unknown), length 28
16:54:34.440101 IP d.lan.33199 > x.lan.ssh: Flags [S], seq 922409158, win 14600, options [mss 1460,sackOK,TS val 2205427 ecr 0,nop,wscale 3], length 0
16:54:34.440157 IP x.lan.ssh > d.lan.33199: Flags [S.], seq 1120118912, ack 922409159, win 14480, options [mss 1460,sackOK,TS val 2210524 ecr 2205427,nop,wscale 3], length 0
16:54:34.441520 IP d.lan.33199 > x.lan.ssh: Flags [.], ack 1, win 1825, options [nop,nop,TS val 2205428 ecr 2210524], length 0
16:54:34.456768 IP x.lan.ssh > d.lan.33199: Flags [P.], seq 1:40, ack 1, win 1810, options [nop,nop,TS val 2210528 ecr 2205428], length 39
16:54:34.457608 IP d.lan.33199 > x.lan.ssh: Flags [.], ack 40, win 1825, options [nop,nop,TS val 2205432 ecr 2210528], length 0
16:54:34.458207 IP d.lan.33199 > x.lan.ssh: Flags [P.], seq 1:40, ack 40, win 1825, options [nop,nop,TS val 2205432 ecr 2210528], length 39
16:54:34.458381 IP x.lan.ssh > d.lan.33199: Flags [.], ack 40, win 1810, options [nop,nop,TS val 2210529 ecr 2205432], length 0
16:54:34.460435 IP x.lan.ssh > d.lan.33199: Flags [P.], seq 40:1024, ack 40, win 1810, options [nop,nop,TS val 2210529 ecr 2205432], length 984
16:54:34.462420 IP d.lan.33199 > x.lan.ssh: Flags [P.], seq 40:1312, ack 1024, win 2071, options [nop,nop,TS val 2205433 ecr 2210529], length 1272
16:54:34.502874 IP x.lan.ssh > d.lan.33199: Flags [.], ack 1312, win 2172, options [nop,nop,TS val 2210540 ecr 2205433], length 0
--- cut ---

No need to continue: the first host is clearly talking to the second at the moment mpi_hello runs. The traffic goes to the ssh port because mpiexec starts the remote processes over SSH. Let's do one more test with John. First run the standalone test, then the MPI-enabled one.

cluster@d:~/src/john-1.7.2-bp17-mpi8$ run/john -format=DES -test
Benchmarking: Traditional DES [128/128 BS SSE2]... DONE
Many salts:     1112K c/s real, 1112K c/s virtual

Only one salt:  1011K c/s real, 1013K c/s virtual

cluster@d:~/src/john-1.7.2-bp17-mpi8$ mpiexec -f ~/mpd.hosts -n 4 run/john -format=DES -test
Benchmarking: Traditional DES [128/128 BS SSE2]... DONE
Many salts:     2804K c/s real, 2806K c/s virtual

Only one salt:  3790K c/s real, 3806K c/s virtual

Definitely different results. The first one runs on one core only. In my config the second one runs on two hosts, two cores each. The result is not x4, but it is more than x2, which is more than the two cores of a single host could deliver, so that proves the case. Run tcpdump as well if you want, or better, start top on the second host and watch how john kicks in, like this:

top - 17:11:50 up  2:49,  1 user,  load average: 0.10, 0.06, 0.05
Tasks:  93 total,   3 running,  90 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.5%us,  0.4%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:    507796k total,   462940k used,    44856k free,    52220k buffers
Swap:  1046524k total,      272k used,  1046252k free,   330820k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6106 cluster   20   0 13248 1692 1420 R   96  0.3   0:03.05 john
 6105 cluster   20   0 13248 1720 1448 R   96  0.3   0:03.15 john
  988 lightdm   20   0 99028  11m 9492 S    1  2.3   0:35.98 lightdm-gtk-gre
 5959 root      20   0  2832 1232  984 R    1  0.2   0:00.10 top
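
To put numbers on the two john benchmark runs above, the speedup and the parallel efficiency can be worked out directly from the "real" c/s figures. This is a minimal sketch: the function names are made up for this example, and 4 is the -n value passed to mpiexec.

```python
def speedup(single_cps, parallel_cps):
    """Throughput ratio of the parallel run to the single-process run."""
    return float(parallel_cps) / single_cps


def efficiency(single_cps, parallel_cps, processes):
    """Fraction of the ideal (linear) speedup that was achieved."""
    return speedup(single_cps, parallel_cps) / processes


# "real" c/s figures from the two benchmark runs, in thousands:
# (label, standalone run, mpiexec -n 4 run)
runs = [
    ("Many salts", 1112, 2804),
    ("Only one salt", 1011, 3790),
]

for name, single, parallel in runs:
    print("{}: x{:.2f} speedup, {:.0%} efficiency on 4 processes".format(
        name, speedup(single, parallel), efficiency(single, parallel, 4)))
```

This gives x2.52 (63% efficiency) for many salts and x3.75 (94% efficiency) for one salt.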

That's it. Now you can use your cluster to do some real work.
