Building high-performance clusters with LAM/MPI

On the LAM

© Witold Krasowski, 123RF

Article from Issue 108/2009
The venerable LAM/MPI infrastructure is a stable and practical platform for building high-performance applications.

Nowadays, a large number of commercial applications, especially in the areas of engineering, data mining, scientific/medical research, and oil exploration, are based on the capabilities of parallel computing. In most cases, programmers must design or customize these parallel applications to benefit from the possibilities of the parallel architecture. Message-Passing Interface (MPI) is a set of API functions that pass messages between processes (even between processes that are running on different physical servers) so a program can operate in a completely parallel fashion. MPI has now become a standard in the parallel programming world.

LAM/MPI [1] is an open source implementation of the MPI standard maintained by students and faculty at Indiana University. The LAM/MPI system has been around for 20 years, and it is considered a stable and mature tool. According to the website, LAM/MPI is now in "maintenance mode," with most of the new development work occurring instead on the alternative next-generation Open MPI implementation. However, LAM/MPI is still receiving critical patches and bug fixes, and the large install base of LAM/MPI systems around the world means that developers who specialize in parallel programming are still quite likely to encounter it.

In addition to offering the necessary libraries and files to implement the mandated MPI API functions, LAM/MPI also provides the LAM run-time environment. This run-time environment is based on a user-level daemon that provides many of the services required for MPI programs.

The major components of the LAM/MPI package are well integrated and designed as an extensible component framework with a number of small modules. These modules are selectable and configurable at run time. This component framework is known as the System Services Interface (SSI).

LAM/MPI provides high performance on a variety of platforms, from small off-the-shelf single-CPU clusters to large SMP machines with high-speed networks. In addition to high performance, LAM/MPI comes with a number of usability features that help with developing large-scale MPI applications.

In this article, I take a look at how to get started with parallel programming in LAM/MPI.

The Cluster

The cluster used in this article is based on an Intel blade center with five Intel blades. Each blade has two processors, two Ethernet cards, one Fibre Channel (FC) card, and 8GB of memory. I used Red Hat Enterprise Linux v4 as the operating system for this High-Performance Computing (HPC) example.

For low-cost implementations, you can use SCSI or SAN-based storage attached to the master node only and then make it available to all client nodes through NFS. Another possibility is to use NAS instead of SAN. A cluster filesystem such as GFS, GPFS, or Veritas can provide additional performance improvements; however, note that the vast majority of cluster computing implementations work well with a simple NAS or SAN setup.
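As a rough sketch of the NFS approach, assuming the master node is called master and exports /home to the private cluster network described later (10.100.10.0/24), the setup boils down to a few commands:

# On the master node: export /home to the private cluster network
echo "/home 10.100.10.0/255.255.255.0(rw,sync,no_root_squash)" >> /etc/exports
exportfs -ra
service nfs start

# On each client node: mount the export permanently
echo "master:/home  /home  nfs  defaults 0 0" >> /etc/fstab
mount /home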

Each blade system should have at least two Ethernet interfaces. You might even consider adding extra Ethernet cards to your client or master nodes and then using Ethernet bonding on the public or private network interfaces to improve performance and availability.
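On Red Hat-style systems of this vintage, bonding is configured through the module options and the network scripts. The following is only a sketch; the interface names, IP address, and bonding mode are assumptions you would adapt to your own setup:

# /etc/modprobe.conf -- load the bonding driver for bond0
alias bond0 bonding
options bond0 mode=1 miimon=100

# /etc/sysconfig/network-scripts/ifcfg-bond0 -- the virtual bonded interface
DEVICE=bond0
IPADDR=10.100.10.1
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth1 -- repeat for each slave interface
DEVICE=eth1
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none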

On my network, the addressing for intra-cluster communications uses a private, non-routable address space (e.g., 10.100.10.xx/24). Slave nodes use only a single Gigabit Ethernet port (eth0) with an address in the private network. The master node's eth0 address serves as the gateway for the slave nodes.

SELinux is enabled by default on RHEL 4 systems, but if this enhanced security functionality is not necessary, consider turning it off. The iptables firewall is also enabled by default. If the cluster is on a completely trusted network, you can disable the firewall; if not, run the firewall on the master node only and allow SSH access through it.
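For a node on a trusted network, turning both features off comes down to a couple of commands (the change in /etc/selinux/config only takes full effect after a reboot):

# Stop the firewall and keep it from starting at boot
service iptables stop
chkconfig iptables off

# Switch SELinux to permissive mode now and disable it permanently
setenforce 0
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config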

If your cluster contains a large number of nodes, you will also want some form of administration tool for central management of the cluster. I'll use the C3 open source cluster management toolset for this example. (See the box titled "Setting up C3.")

Setting Up C3

Several open source distributed shell tools provide cluster management capabilities, but I decided to rely on C3 [2] – an excellent, fast, and efficient tool for centralized cluster management. The C3 suite provides a number of tools for managing files and processes on the cluster.

You can download the C3 RPM from the project website [3] and install it as root. The installer places the current version of the package in the /opt/c3-4 directory. Then add the following symbolic links:

ln -s /opt/c3-4/cexec  /usr/bin/cexec
ln -s /opt/c3-4/cpush  /usr/bin/cpush

The next step is to specify the cluster nodes in the /etc/c3.conf file, as shown in Listing 1. (The example in Listing 1 assumes all cluster nodes are able to resolve hostnames through their /etc/hosts files.)
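The c3.conf syntax groups machines into a named cluster block, with the head node listed first and the client nodes after it. A minimal sketch, assuming a head node named master and four clients named node1 through node4, might look like this:

cluster hpc {
    master
    node[1-4]
}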

Now you can test the functionality of the C3 tools. For instance, the cexec tool executes a command on each cluster node. The command:

/home/root> cexec df

outputs the filesystem layout on all client nodes.

Listing 1

Example /etc/c3.conf file

 

Creating a LAM User

Before you start up the LAM run time, create a non-root user to own the LAM daemons. On the master node, execute:

/home/root> useradd -m -d /home/lamuser lamuser
/home/root> passwd lamuser    (enter the lamuser password at the prompt)

To synchronize security information related to lamuser from the master node to all client nodes, use cpush:

/home/root> cpush /etc/passwd
/home/root> cpush /etc/shadow
/home/root> cpush /etc/group
/home/root> cpush /etc/gshadow

You also need to set up passwordless SSH for the lamuser account, because the LAM run time uses SSH to launch its daemons on the remote nodes.
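One straightforward way to do that is to generate a key pair for lamuser on the master node, authorize it, and copy the resulting ~/.ssh directory to the clients. The sketch below assumes the home directory is local to each node (not NFS-shared) and that the clients are named node1 through node4:

# As lamuser on the master node: create and authorize a key pair
ssh-keygen -t rsa                 # accept the default path, empty passphrase
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys

# Copy the prepared ~/.ssh directory to every client node
for node in node1 node2 node3 node4; do
    scp -r /home/lamuser/.ssh $node:/home/lamuser/
done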

Installing LAM/MPI

Conceptually, using LAM/MPI for parallel computation is very easy. First, you have to launch the LAM run-time environment (or daemon) on all master and client nodes before compiling or executing any parallel code. Then you can compile the MPI programs and run them in the parallel environment provided by the LAM daemon. When you are finished, you can shut down the LAM daemons on the nodes participating in the computation.
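In practice, a full session as lamuser follows this pattern (hello.c is just a placeholder program; lamboot and the lamhosts boot schema are described below):

lamboot -v /etc/lam/lamhosts   # start the LAM daemons on all nodes
mpicc -o hello hello.c         # compile an MPI program with the LAM wrapper compiler
mpirun -np 4 ./hello           # run the program with four parallel processes
lamhalt                        # shut down the LAM daemons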

Start by installing LAM/MPI on all nodes in the cluster. Red Hat RPM packages, along with source code, are at the LAM/MPI project website [4]. Debian packages are also available [5].

Before you install LAM/MPI, make sure you have a C compiler and the libaio library on every node. The C3 tool suite described earlier is a convenient aid for installing the LAM/MPI RPM on all the nodes of the cluster. For example, the combination of the cpush, cexec, and rpm commands can install the LAM binaries simultaneously on all cluster nodes:

cd /tmp/lam
cpush /tmp/lam/lam-7.1.4.rpm
cexec rpm -i /tmp/lam/lam-7.1.4.rpm

After successful installation of LAM/MPI, you have to set up the PATH environment variable. In this case, the user owning the LAM daemons is lamuser and the default shell is Bash, so I edit /home/lamuser/.bashrc to modify the PATH variable definition so it contains the path to the LAM binaries:

export PATH=/usr/local/lam/bin:$PATH

If you are using the C shell, add the following line to the /home/lamuser/.cshrc file:

set path = (/usr/local/lam/bin $path)

Note that this path might be different depending upon your distribution of Linux and the version of LAM/MPI.

To transfer the modified .bashrc or .cshrc file to all client nodes, use the C3 cpush command.

Now you are ready to define the LAM/MPI cluster nodes that will participate in your parallel computing cluster. The easiest way to define the nodes is to create a file named lamhosts in /etc/lam on the master node and then use cpush to send this file to all of the client nodes.

The lamhosts file should contain resolvable names of all nodes. A lamhosts file for my high-performance computing cluster is shown in Listing 2.
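The format is simply one resolvable hostname per line, optionally followed by a CPU count. For the five dual-processor blades described earlier, a lamhosts file along these lines (hostnames assumed) would do:

master cpu=2
node1 cpu=2
node2 cpu=2
node3 cpu=2
node4 cpu=2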

Listing 2

/etc/lam/lamhosts

 

The cpu setting in the lamhosts file takes an integer value and indicates how many CPUs are available for LAM to use on a specific node. If this setting is not present, the value of 1 is assumed. Notably, this number does not need to reflect the physical number of CPUs – it can be smaller than, equal to, or greater than the number of physical CPUs in the machine. It is used solely as a shorthand notation for the LAM/MPI mpirun command's C option, which means "launch one process per CPU as specified in the boot schema file."
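For example, once the LAM daemons have been booted from this schema, the following commands (using a hypothetical compiled MPI binary named hello) launch one process per declared CPU and one process per node, respectively:

mpirun C ./hello   # one process for every cpu= slot in the boot schema
mpirun N ./hello   # one process per node, regardless of the cpu= values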
