
VirtualBox on the ‘Rocks’

By Dale Reagan | October 5, 2011

Rocks is an OS-plus-tools clustering solution that requires front-end and compute nodes with ~1GB of RAM and 30GB of disk.  I decided to give Rocks a whirl using VirtualBox – there was no real need, but it provides an opportunity to explore some current HPC/clustering solutions.

** Since this post is my reaction to what I found along the way, I encourage readers to see the About page – I tend to be a disruptive user, especially when reviewing new solutions…

A little background:

“Rocks is an open-source Linux cluster distribution that enables end users to easily build computational clusters, grid endpoints and visualization tiled-display walls. …

Since May 2000, the Rocks group has been addressing the difficulties of deploying manageable clusters. We have been driven by one goal: make clusters easy. By easy we mean easy to deploy, manage, upgrade and scale.”

After taking a look, my impression is that the goal of making clusters easy for users has been met – as outlined below I did encounter some issues, but they were not difficult to resolve.  Note that I cannot speak directly to the actual use of this solution to solve problems via software running in a cluster; my exploration is limited to a small cluster in an entirely virtual environment.  While I do speculate/touch on a few systems items, in a future post I may explore how all of the components in Rocks actually work (something that is needed if you are actually supporting such an environment or planning to implement your own solution.)


In my case the front-end system will have two NICs, with one NIC connected to ‘the world’ and the other NIC connected to the ‘internal network’ (‘rocks_net’ of course…)  The compute nodes will only have a connection on rocks_net.  Sounds simple so far.

I will follow the install docs from the project web site. [I did have to enable IO APIC on the VMs – found under System–Extended Features – see end of post for VirtualBox machine details.]

I start out by creating two VirtualBox ‘machines’ with the configuration noted above (** note I had to set the ‘Enable IO APIC‘ switch – see details at the end of this post.)  After starting the install process I discover that ‘eth0’ will be used for rocks_net communication and that ‘eth1’ will be the ‘world connection’ – the opposite of how I configured the NICs.  I proceed anyway (manually entering IP information for eth1.)  Note that while using VirtualBox you will need to use the VM menu options to locate CD images for your install (no need to burn CDs with this approach – just download the images and place them in an easy-to-remember location…)
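For reference, the same tweaks can be made from the host command line – a minimal sketch, assuming VirtualBox 4.x and the ‘rocks-base’ VM name shown in the machine details at the end of this post:

VBoxManage modifyvm "rocks-base" --ioapic on      # the 'Enable IO APIC' switch
# NIC 1 -> the internal cluster network, NIC 2 -> the outside world (NAT),
# matching the eth0 = rocks_net / eth1 = world layout the installer expects
VBoxManage modifyvm "rocks-base" --nic1 intnet --intnet1 "rocks_net"
VBoxManage modifyvm "rocks-base" --nic2 nat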

Attempt #1 (actually a series of tries, ~2+ hours during multi-tasking)

During the first install attempt I hit an ‘unhandled exception’ during the hard disk setup (I think) – I decide to re-try after changing the network item noted above.

Network Change:

After some time the install requests the ‘kernel’ CD…

I reach a hang/error – the debug output notes a ‘missing’ package… Hmmm…

Since DHCP did not work during the last attempt I change the network to a ‘bridged’ adapter for ‘eth1’…
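In VirtualBox terms that change looks roughly like this (a sketch – ‘eth0’ here is the host interface to bridge to, not the guest NIC):

VBoxManage modifyvm "rocks-base" --nic2 bridged --bridgeadapter2 eth0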

Hmm, when I enter manual IP info why not ask for host name/domain info instead of ‘trying’ to determine it?

I get to the CD setup section, add the ‘area51 set’ (without Xen) and proceed as before…

Another hang/error… Hmmm.  I revert to NAT for eth1 and re-start.

Error is:   PackageSackError:  No Package Matching kernel

Ok, I search a bit and find that this was a problem back in 2009…  I also find multiple posts indicating that Rocks will work with VirtualBox.  I decide to try a ‘network install’…

I get to the page to select my ‘rolls’ – time passes…  and I get “unable to read package meta-data…”  Abort/Continue – I Continue… and get “Unable to read group…”

Hmm – the debug information indicates that the same problem has been encountered (PackageSackError:  No Package Matching kernel.)

Since I am consistently seeing the same error, I will guess that the problem lies with the version of Rocks that I am working with and this install method.

Time for Plan ‘B’

I locate a DVD install disk that contains multiple ‘rolls’ – using this approach I am able to install a frontend relatively quickly.

I install with the DVD image:  area51+base+bio+condor+ganglia+hpc+java+kernel+os+perl+python+sge+web-server+xen-5.4.3.i386.disk1.iso (yes, it’s one long name showing the rolls included on the DVD…)

Once the frontend is up, adding ‘compute nodes’ is fairly quick:

After the frontend install completes I have a CentOS, GUI-based system. Next I add the VirtualBox extensions that provide relatively seamless video/mouse support and reboot to activate them.  Now I can add ‘compute nodes’.
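For the record, installing the Guest Additions inside the frontend goes roughly like this (a sketch – the mount point is arbitrary and the installer file name may differ by VirtualBox version); attach the Guest Additions ISO via the VM window’s Devices menu first:

mkdir -p /media/cdrom
mount /dev/cdrom /media/cdrom             # the Guest Additions ISO attached above
sh /media/cdrom/VBoxLinuxAdditions.run    # builds/installs the guest kernel modules
reboot                                    # activate the new video/mouse support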

My first nodes failed due to lack of disk space (I tried using 8GB disks) so I restarted and increased disk storage to 30GB.  I am only using 512MB of RAM (or less for compute nodes) since my objective is to explore how the cluster ‘works’ from a management perspective.
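If, like me, you start with disks that are too small, the simplest fix is to create and attach a larger disk from the host – a sketch, assuming VirtualBox 4.x, the ‘SATA Controller’ name from the VM details at the end of this post, and a placeholder .vdi file name:

VBoxManage createhd --filename compute-0-0.vdi --size 30720    # size is in MB (~30GB)
VBoxManage storageattach "compute_node01" --storagectl "SATA Controller" \
    --port 0 --device 0 --type hdd --medium compute-0-0.vdi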

On the frontend I have the following disk results after auto-partitioning:

[root@cluster ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              16G  4.7G  9.7G  33% /
/dev/sda5             9.5G  3.1G  5.9G  34% /state/partition1
/dev/sda2             3.8G  222M  3.4G   7% /var
tmpfs                 379M     0  379M   0% /dev/shm
tmpfs                 185M  1.9M  183M   2% /var/lib/ganglia/rrds

To build compute nodes:

  1. on the frontend I open a terminal window and start the ‘insert-ethers‘ tool, which ‘listens’ for connection requests, creates database entries for new systems, and provides the needed files for auto-provisioning of the nodes (see the sketch after this list)
  2. I start up a new, ’empty’ compute node using the same DVD used to create the frontend – the compute node ‘registers’ with the frontend and the install proceeds; note that in my instance the minimum RAM for installs needed to be at least 768MB for compute nodes – otherwise the install would ‘hang’… After the install I lowered the RAM to 256MB on the compute nodes.
  3. I repeat step two for any needed compute nodes (a total of three in this case.)
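A minimal sketch of that loop from the frontend side (insert-ethers is interactive – you pick the ‘Compute’ appliance type from its menu and leave it running):

# on the frontend, as root
insert-ethers      # choose 'Compute' from the appliance menu and leave it listening
# ...boot each 'empty' VM from the same DVD (or via PXE); it registers and installs itself
rocks list host    # confirm compute-0-0, compute-0-1, ... appear as they are inserted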

I now have a reasonably busy Linux host system – the compute nodes show up as:

┌─┤ Inserted Appliances ├────────────────────────┐
│ 08:00:27:f7:11:56        compute-0-2        (*) │
│ 08:00:27:0a:1a:4c        compute-0-1        (*) │
│ 08:00:27:6a:18:27        compute-0-0        (*) │

Each compute node installs auto-magically (after many minutes on my quad-core AMD system with 6GB of RAM.)  The host CPUs operate at 20-30% and host RAM sits at ~70%.  I successfully built three compute nodes along with the frontend.  As with most automation efforts, things need to be consistent, and replication is really, really nice.  So, of course, I changed the frontend (rebuilt it and included more options/software from the DVD) – and then my compute nodes would ‘hang’ during the build; I rebuilt the frontend without ‘condor’ and now the compute nodes build without issues…

Once you complete your frontend build you can surf to http://localhost/wordpress/ and see summary information as well as explore Ganglia and other resources (what you see depends upon what you have installed/activated.)

Time for some reading and coffee…

The Rocks Base User’s Guide is a good resource.

An OS Level Review of a Rocks frontend

So how does Rocks work?  Python is key – and here is the good/bad news: the solution was not built with change in mind.  There seems to be a dependency on an older version of Python (2.4.3?), while the solution offers a ‘Python Roll’ for users needing a newer version of Python.  If a frontend never has to change then perhaps this type of dependency is OK; chances are good that a new solution will simply replace this one, so it will eventually become a non-issue.  If another solution uses Rocks as a base, it may inherit a similar limitation.
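A quick, hedged way to see which interpreter the tools are tied to is to compare the ‘rocks’ command itself with the system default:

file $(which rocks)       # confirm whether the 'rocks' command is a Python script
head -1 $(which rocks)    # if so, its shebang shows the interpreter it is tied to
python -V                 # the system default interpreter, for comparison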

Rant on (for a short bit)

Given the security improvements for node management with the 5.4.3 release I was surprised to find a significant concern with the frontend – namely how the root account is used…

The root user account uses a somewhat unusual environment – note the PATH outputs below, first for a typical Fedora Linux (Red Hat) system and then for a Rocks frontend.  There are usually a number of reasons for taking such an approach, but I would prefer the standard alternative, i.e. the root account’s PATH is not expanded in this fashion; instead, custom user accounts and groups provide a granular, application-specific way to manage any needed resources – in this case perhaps something like rocksman (a rocks manager account and group.)

Fedora Linux - root account PATH items (sample):
  1. /usr/kerberos/sbin
  2. /usr/kerberos/bin
  3. /usr/lib/ccache
  4. /usr/local/sbin
  5. /sbin
  6. /bin
  7. /usr/sbin
  8. /usr/bin
  9. /root/bin

The potential problem with using a system account (as shown below) for what is essentially application/service use lies in the fact that such privileged accounts have access to more resources than a standard user.  When you provide a PATH like the one shown below, any tool that lies in the PATH could be a vehicle to compromise the entire system…  When you ‘upgrade’ a component you are re-exposing yourself to potential compromise.  Since the frontend provides services/resources for all nodes in a cluster, all nodes are equally open to compromise.  I will guess that the reason this approach (using the root system account for what is essentially a service) was taken is that such a privileged account makes things easier (i.e. you usually won’t run into permission problems – the system ‘sees you’ as a trusted, knowledgeable user and tries to follow all of your requests.)  While it could be argued that clusters will not be running on a public network, as soon as a connection is provided to an outside resource we have a potential security hole.
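A quick audit sketch along those lines – list every directory in root’s PATH and flag any that are group- or world-writable (a classic vehicle for exactly this kind of compromise):

echo "$PATH" | tr ':' '\n' | xargs -I{} ls -ld {} 2>/dev/null | \
    awk '$1 ~ /^d....w/ || $1 ~ /^d.......w/ {print "WRITABLE:", $NF}'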

Essentially, a breach/compromise of your frontend would be comparable to the recent hack (summer 2011) of kernel.org – think for a moment…  With every Linux release relying on kernel.org, that hack could have introduced a compromise into every release/tool/resource downloaded from it – and of course the really bad part is that any compromise then spreads in a viral fashion…

Now the Rocks frontend:
#echo $PATH | tr ":" "\n" | awk '{printf "%3d. %s\n", NR, $0}'
  1. /usr/java/latest/bin
  2. /opt/openmpi/bin
  3. /usr/kerberos/sbin
  4. /usr/kerberos/bin
  5. /usr/java/latest/bin
  6. /usr/local/sbin
  7. /usr/local/bin
  8. /sbin
  9. /bin
 10. /usr/sbin
 11. /usr/bin
 12. /usr/X11R6/bin
 13. /opt/bio/ncbi/bin
 14. ...
 62. /usr/share/pvm3/bin/LINUX
 63. /opt/rocks/bin
 64. /opt/rocks/sbin
 65. /opt/condor/bin
 66. /opt/condor/sbin
 67. /opt/gridengine/bin/lx26-x86

When I rebuilt the frontend, the PATH for the root account was ‘shorter’, so I will assume that it varies based on the roll(s) installed on any given Rocks build.

  1. /usr/java/latest/bin
  2. /opt/openmpi/bin
  3. /usr/kerberos/sbin
  4. /usr/kerberos/bin
  5. /usr/java/latest/bin
  6. /usr/local/sbin
  7. /usr/local/bin
  8. /sbin
  9. /bin
 10. /usr/sbin
 11. /usr/bin
 12. /usr/X11R6/bin
 13. /opt/eclipse
 14. /opt/ganglia/bin
 15. /opt/ganglia/sbin
 16. /opt/pdsh/bin
 17. /usr/share/pvm3/bin/LINUX
 18. /opt/rocks/bin
 19. /opt/rocks/sbin
 20. /opt/gridengine/bin/lx26-x86
 21. /root/bin
 22. /opt/eclipse
 23. /opt/ganglia/bin
 24. /opt/ganglia/sbin
 25. /opt/pdsh/bin
 26. /usr/share/pvm3/bin/LINUX
 27. /opt/rocks/bin
 28. /opt/rocks/sbin
 29. /opt/gridengine/bin/lx26-x86
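(Note that items 13-20 in this listing are repeated as items 22-29.)  One way to view just the unique entries, in the same awk style used above:

echo $PATH | tr ":" "\n" | awk '!seen[$0]++' | awk '{printf "%3d. %s\n", NR, $0}'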

While there is significant effort involved, I do suggest that solutions like Rocks use a user/group approach for providing/managing cluster applications & services (along with sudo as needed.)  Why not take advantage of lessons learned from previous ‘services’ (i.e. Apache usage has been refined over time with a ‘best practices’ approach to security) and apply a best-practices approach from the start, instead of waiting for some outrageous compromise that is easily avoided?  I can’t think of a good reason not to take Rocks to the best-practice level for all components…
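A minimal sketch of that suggestion, using the hypothetical rocksman account/group mentioned earlier (the paths are illustrative – adjust them to whatever a given roll actually installs):

groupadd rocksman
useradd -g rocksman -m -c 'Rocks cluster management account' rocksman
# then, via visudo, grant the group only the cluster tools it actually needs, e.g.:
#   %rocksman ALL=(root) /opt/rocks/bin/rocks, /opt/rocks/sbin/insert-ethers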

Ok, rant off and back to How does Rocks Work?

From the FAQ – my comments within []:
A compute node kickstart (an automated build process) requires the following services to be running on the frontend (a quick status check follows the list):

  1. dhcpd [provides IP assignment & cluster DNS management]
  2. httpd [‘serves’ build image/RPM files to new nodes]
  3. mysqld [tracks cluster entities & properties]
  4. autofs [manages auto-mounting of resources for the cluster]
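A quick status check for those services on the frontend (a sketch using the stock CentOS service tools and the service names listed above):

for s in dhcpd httpd mysqld autofs; do service $s status; done
chkconfig --list dhcpd httpd mysqld autofs    # confirm they come back after a reboot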

Some items that are most likely in the background (i.e. built into the base of the frontend):

  1. tftp – or something similar used during the PXE boot process for new nodes; if PXE booting does not work then you can simply use a CD/DVD – which is what I am doing – since debugging a PXE boot process can be time consuming (see the check after this list)…
  2. firewall management (allowing/denying needed traffic)
  3. a well-defined backend storage for all cluster-related resources
  4. a process to manage building/serving ‘rolls’ (custom node types or resources); the system appears to use something like Cobbler.
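And a similarly quick check for the PXE plumbing (a sketch – tftp is normally run out of xinetd on this class of system):

chkconfig --list tftp              # is the tftp service enabled under xinetd?
netstat -ln | egrep ':(67|69) '    # 67/udp = dhcp and 69/udp = tftp listeners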

A simple overview of getting started with Rocks Clusters

  1. Install the frontend with desired services/rolls
  2. Install desired nodes (compute, appliance, etc.)
  3. Review monitoring of the cluster (via Ganglia)
  4. Start/test running jobs
    • useradd rocksuser ## create a user account to submit jobs with
    • rocks sync users ### ‘sync’ your users across your cluster
    • ssh-agent $SHELL ## start the ssh agent
    • ssh-add ## authenticate with the agent
    • rocks host ps ## now you can run commands via SSH

Once you have a user account you can try out some simple scripts or commands across the cluster.  After logging in and starting a terminal session on your frontend you need to set up ssh keys by running ssh-agent and then adding your key.  Note that simple shell commands seem to work well as arguments to the ‘rocks run host’ command, but complex commands (i.e. those using pipes or requiring quoted arguments may present problems) are probably best saved to a shell script for running on the nodes (see the sketch after the output below)…

[rocksuser@cluster work]$ ssh-agent /bin/bash
[rocksuser@cluster work]$ ssh-add
Enter passphrase for /home/rocksuser/.ssh/id_rsa:
Identity added: /home/rocksuser/.ssh/id_rsa (/home/rocksuser/.ssh/id_rsa)

The first time that you connect to a new node your user-ssh info is exchanged.
[rocksuser@cluster work]$ rocks run host ps
Warning: Permanently added 'compute-0-1' (RSA) to the list of known hosts.
/usr/bin/xauth:  creating new authority file /home/rocksuser/.Xauthority
  PID TTY          TIME CMD
11319 ?        00:00:00 sshd
11324 ?        00:00:00 ps
Warning: Permanently added 'compute-0-0' (RSA) to the list of known hosts.
  PID TTY          TIME CMD
11441 ?        00:00:00 sshd
11446 ?        00:00:00 ps
Warning: Permanently added 'compute-0-2' (RSA) to the list of known hosts.
  PID TTY          TIME CMD
11347 ?        00:00:00 sshd
11352 ?        00:00:00 ps

No ssh 'warnings' are generated on subsequent runs.
[rocksuser@cluster work]$ rocks run host "hostname;uptime;w"
compute-0-0.local
 19:24:59 up  3:51,  0 users,  load average: 0.27, 0.28, 0.27
 19:24:59 up  3:51,  0 users,  load average: 0.27, 0.28, 0.27
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
compute-0-1.local
 19:25:00 up  3:54,  0 users,  load average: 0.36, 0.33, 0.29
 19:25:00 up  3:54,  0 users,  load average: 0.36, 0.33, 0.29
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
compute-0-2.local
 19:24:59 up  3:55,  0 users,  load average: 0.38, 0.36, 0.29
 19:24:59 up  3:55,  0 users,  load average: 0.38, 0.36, 0.29
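
When a one-liner gets awkward (pipes, nested quoting), a small script is easier – a sketch, assuming home directories are auto-mounted on the compute nodes (per the autofs note above) so the script is visible everywhere:

cat > ~/nodecheck.sh <<'EOF'
#!/bin/sh
hostname; uptime; df -h /
EOF
rocks run host 'sh ~/nodecheck.sh'    # the remote shell expands ~ on each node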

Example output from some simple Rocks commands

rocks help | \
   awk '{printf "%3d. %s\n", NR, $0}' | more ## currently ~180 commands

rocks help | \
   awk '{print $1}'|sort -u | \
   awk '{printf "%3d. %s\n", NR, $0}' ### ~16 'base' commands

rocks list host ## list of nodes/frontend
HOST         MEMBERSHIP CPUS RACK RANK RUNACTION INSTALLACTION
cluster:     Frontend   1    0    0    os        install      
compute-0-0: Compute    1    0    0    os        install      
compute-0-1: Compute    1    0    1    os        install      
compute-0-2: Compute    1    0    2    os        install

rocks report host ## local IP table - auto-managed
# Added by rocks report host #
#        DO NOT MODIFY       #
#  Add any modifications to  #
#    /etc/hosts.local file   #

127.0.0.1       localhost.localdomain   localhost

10.1.1.1        cluster.local   cluster
10.1.255.254    compute-0-0.local       compute-0-0
10.1.255.253    compute-0-1.local       compute-0-1
10.1.255.252    compute-0-2.local       compute-0-2
10.0.3.15       cluster.inhouse
The 'base' commands expand into cluster-wide or host specific management.
rocks help | awk '{print $1}'|sort -u | awk '{printf "%3d. %s\n", NR, $0}'
  1. add
  2. config
  3. create
  4. disable
  5. dump
  6. enable
  7. help
  8. iterate
  9. list
 10. remove
 11. report
 12. run
 13. set
 14. swap
 15. sync
 16. update

Overall, Rocks seems to be a good solution for exploring the management and use of cluster resources.  In a future post I will take a more in-depth look at using Rocks with VirtualBox.  As always, your mileage should vary – at least a small bit.  🙂


VirtualBox Configuration (only showing ‘active’, non-local settings)

While using VirtualBox I encountered disk space issues during compute node installs if the RAM was less than 768MB – I will guess that this is due to the use of RAM for temporary file systems during the install.  Once the install completed I reduced the RAM and the nodes seem to be working fine…  Note that VMs using such a small amount of RAM/CPU are probably not very useful for actual ‘cluster computing’, but they should be fine for testing most of the components used in a clustering solution (i.e. finding problems with the automation components of such a solution.)
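The RAM adjustment itself is a one-liner from the host once the VM is powered off (using the compute node VM name shown below):

VBoxManage modifyvm "compute_node01" --memory 256    # MB; 768+ during install, lower afterwards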


Name:            rocks-base
Guest OS:        Linux 2.6
Memory size:     1024MB
VRAM size:       128MB
Chipset:         piix3
Firmware:        BIOS
Number of CPUs:  1
Boot menu mode:  message and menu
Boot Device (1): DVD
Boot Device (2): HardDisk
ACPI:            on
IOAPIC:          on
RTC:             UTC
Hardw. virt.ext: on
Hardw. virt.ext exclusive: on
Nested Paging:   on
VT-x VPID:       on
Monitor count:   1
3D Acceleration: on
Teleporter Port: 0
Storage Controller Name (0):            IDE Controller
Storage Controller Type (0):            PIIX4
Storage Controller Instance Number (0): 0
Storage Controller Max Port Count (0):  2
Storage Controller Port Count (0):      2
Storage Controller Bootable (0):        on
Storage Controller Name (1):            SATA Controller
Storage Controller Type (1):            IntelAhci
Storage Controller Instance Number (1): 0
Storage Controller Max Port Count (1):  30
Storage Controller Port Count (1):      4
Storage Controller Bootable (1):        on
SATA Controller (2, 0): Empty
SATA Controller (3, 0): Empty
NIC 2 Settings:  MTU: 0, Socket( send: 64, receive: 64),
        TCP Window( send:64, receive: 64)
Pointing Device: USB Tablet
Keyboard Device: PS/2 Keyboard
Audio:           enabled (Driver: PulseAudio,
        Controller: AC97)
Clipboard Mode:  Bidirectional
USB:             enabled
OS type:                             Linux26
Additions run level:                 0
Configured memory balloon size:      0 MB

Name:            compute_node01
Guest OS:        Debian
Memory size:     256MB
VRAM size:       12MB
Chipset:         piix3
Firmware:        BIOS
Number of CPUs:  1
Boot menu mode:  message and menu
Boot Device (1): Network
Boot Device (2): HardDisk
ACPI:            on
IOAPIC:          on
RTC:             UTC
Hardw. virt.ext: on
Hardw. virt.ext exclusive: on
Nested Paging:   on
VT-x VPID:       on
Monitor count:   1
3D Acceleration: on
Teleporter Port: 0
Storage Controller Name (0):            IDE Controller
Storage Controller Type (0):            PIIX4
Storage Controller Instance Number (0): 0
Storage Controller Max Port Count (0):  2
Storage Controller Port Count (0):      2
Storage Controller Bootable (0):        on
Storage Controller Name (1):            SATA Controller
Storage Controller Type (1):            IntelAhci
Storage Controller Instance Number (1): 0
Storage Controller Max Port Count (1):  30
Storage Controller Port Count (1):      1
Storage Controller Bootable (1):        on
IDE Controller (1, 0): Empty
Pointing Device: USB Tablet
Keyboard Device: PS/2 Keyboard
Audio:           enabled (Driver: PulseAudio,
        Controller: AC97)
Clipboard Mode:  Bidirectional
USB:             enabled
OS type:                             Debian
Additions run level:                 0
Configured memory balloon size:      0 MB


