Shell script parallelization with LAM/MPI and GNU-Darwin:
The CCP4 examples
under construction
LAM/MPI
provides an elegant facility for the parallel execution of standard Unix
binaries from the command line,
which can be simply incorporated into many existing shell scripts in order to take advantage of
multiple processors, computing clusters, and other parallel computing
environments.
Implementing this method also quickly and clearly indicates critical targets for
source-level parallelization.
Moreover, this form of parallelization can often produce profound performance
improvements without resorting to source-level modifications, so that even proprietary
binaries can be incorporated into parallel algorithms if desired.
Parallel method
Csh scripts commonly employ
"goto" statements and labels, which enable users to bypass
redundant steps in their computational algorithms:
#!/bin/csh
#
#
goto fft
.
.   # calculations previously performed, such as solvent flattening,
.   # would appear here and are bypassed by the goto
.
fft:
fft ...
The LAM/MPI utility "lamexec" provides a convenient means to exploit such
structure in existing shell scripts and to direct the
computations to specific processors, while the csh "&" operator and "wait" command
provide an additional level of job control. A first step is to make the script
"self-aware", so that it can spawn itself on multiple processors. In the
following example, the variables my_cwd and my_cur_doc produce this
self-awareness.
#!/bin/csh
#
# Set working directory and script name
#
setenv my_cwd /Users/love/work/szebenyi_cyclase/multi-cryst/lsqkab
setenv my_cur_doc lsqkab.com
source /usr/local/ccp4/include/ccp4.setup
# Do parallel
switch ( $1 )
case "":
    # No argument: spawn the individual jobs across the LAM nodes.
    # Jobs 01a-04a run concurrently, two per node; "wait" holds the
    # script until they finish, then job05a runs on node n1.
    lamexec n1 csh $my_cwd/$my_cur_doc job01a &
    lamexec n1 csh $my_cwd/$my_cur_doc job02a &
    lamexec n0 csh $my_cwd/$my_cur_doc job03a &
    lamexec n0 csh $my_cwd/$my_cur_doc job04a &
    wait
    lamexec n1 csh $my_cwd/$my_cur_doc job05a
    breaksw
default:
    # An argument names the label of the individual job to run.
    goto $1
endsw
exit
job01a:
lsqkab ...
#
exit
#
job02a:
lsqkab ...
#
exit
#
(etc)
(Full example; CCP4 required. All of the examples
were tested on Astrolabe, the G4 minicluster at Cornell University.)
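For orientation, the LAM environment must be booted before lamexec can reach the
nodes. A minimal sketch, assuming a two-host boot schema in a file called
"lamhosts" (the file name and host count are illustrative), might look like the
following, after which n0 and n1 refer to the first and second hosts listed:
# lamboot -v lamhosts
# csh lsqkab.com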
Additional features
Specific jobs can be executed simply by adding an argument when the script is
invoked at the command line, which aids debugging and preserves the ability to
bypass redundancy. This feature also increases the reusability of the code by
providing a convenient means to integrate and parallelize existing scripts
within larger routines.
# source lsqkab.com job05a
Use of wrappers
Wrappers can be quite helpful in this parallelization work,
especially in cases where MPI- or PVM-enabled binaries are already available.
For example, the following wrapper can be substituted for the familiar
"x-povray" command and adds the arguments required for PVM parallelization.
#!/bin/csh
x-pvmpov $* pvm_hosts=a1,a1,a2,a2
It appears that PVM deals with the ssh binaries in a less sophisticated manner
than MPI, but this problem is easily overcome with an ssh wrapper script.
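One possible form for such a wrapper is sketched below, simply handing PVM's
remote-shell invocations to ssh; the file name is illustrative, and pointing PVM
at the wrapper (for instance through the PVM_RSH setting, where the installed
PVM supports it) is left to the local configuration.
#!/bin/csh
# Hypothetical remote-shell wrapper: pass the rsh-style arguments
# straight through to ssh.
exec ssh $*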
Potential of the method
Within the parallelization scheme offered here, such wrappers could
transparently improve the performance of commonly used
commands. For example, a wrapper script for "scp" might be used to
overcome per-process bandwidth limitations by running several transfers in
parallel, along the lines of the sketch below. A wrapper
for the "make" command might be used to parse the job control flags,
spider the relevant directories, and parallelize compilation with lamexec.
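As a purely illustrative sketch of the "scp" idea (the script name, argument
order, and destination form are assumptions, not a tested tool), such a wrapper
might simply fan each named file out to its own transfer:
#!/bin/csh
# Hypothetical parallel-copy wrapper.
# Usage: pscp.com user@host:/destination file1 file2 ...
set dest = $1
shift
# Launch one scp per file so the transfers run concurrently,
# then wait for all of them to finish.
foreach f ($argv)
    scp $f $dest &
end
wait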
For CCP4 users, this parallel processing method could provide particular utility
for NCS and multi-crystal refinement procedures, data integration, and rotation
and translation searches. Clearly, the method also has applications in other
batch processes which are amenable to parallel computation, such as audio/visual
encoding and decoding, decryption, dictionary chases, the
GNU-Darwin/TDC package production process, and the Unix
boot process.
Get started now
GNU-Darwin currently provides
LAM/MPI
binaries and package installation notes for
the following platforms.
Other parallel processing related packages available from GNU-Darwin:
Notes on NFS, user ID's, NAT, and security
Cluster-wide mounting of network disks is a crucial technology underpinning
parallel computing at this time, and much of the software has been
optimized for use with NFS. Darwin is very much like other BSD operating
systems in that network disks can be simply mounted with the
"mount_nfs" command. Serving network disks is quite another matter,
and it requires some knowledge of the NetInfo system. Fortunately, there is a
very helpful example in the
Darwinfo FAQ.
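For example, a work area exported by a master node might be mounted on each
client node along these lines; the host name and paths here are only
placeholders.
# mkdir -p /Network/work
# mount_nfs master:/Users/work /Network/work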
Efficient work with NFS disks is facilitated by having identical user ID
numbers for each user on all of the nodes. After working through the NFS
NetInfo example, it should become apparent that such settings are simply and
securely changed with the NetInfo utilities.
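As a rough illustration only (the UID value is hypothetical, and the user name
merely echoes the one in the example path above), the NetInfo command-line tools
can be used to inspect and then adjust a user's ID on a node:
# nireport / /users name uid
# niutil -createprop / /users/love uid 501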
Any computer serving NFS requires Portmap, and thus security maintenance
becomes a necessity. It is worthwhile to keep your cluster behind a secure
external firewall. An OpenBSD-x86 computer can serve as an inexpensive,
reliable, and secure firewall and
router for your internal network. If that is
not feasible, then your Darwin master node can be configured with a firewall
and Network Address Translation (NAT). This example may
serve to get you started, although there are reports that NAT is broken as of
Darwin-1.4.1.
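Purely as a sketch of the usual BSD-style approach (the interface name is a
placeholder, and this is no substitute for the example just mentioned), NAT on
the master node would normally involve enabling IP forwarding, starting natd on
the external interface, and diverting traffic to it with an ipfw rule:
# sysctl -w net.inet.ip.forwarding=1
# natd -interface en0
# ipfw add divert natd all from any to any via en0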
More to come.
email Dr. Love