xCAT 2 Cookbook
8/7/2008
(Valid for both xCAT 2.0.x and pre-release 2.1)
Table of Contents
1.0 Introduction.........................................................................................................................................3
1.1 Stateless and Stateful Choices.........................................................................................................3
1.2 Scenarios..........................................................................................................................................4
1.2.1 Simple Cluster of Rack-Mounted Servers – Stateful Nodes....................................................4
1.2.2 Simple Cluster of Rack-Mounted Servers – Stateless Nodes..................................................4
1.2.3 Simple BladeCenter Cluster – Stateful or Stateless Nodes......................................................4
1.2.4 Hierarchical Cluster - Stateless Nodes.....................................................................................5
1.3 Other Documentation Available......................................................................................................5
1.4 Cluster Naming Conventions Used in This Document...................................................................5
2.0 Installing the Management Node.........................................................................................................6
2.1 Prepare the Management Node.......................................................................................................6
2.1.1 Set Up Your Networks.............................................................................................................6
2.1.2 Install the Management Node OS.............................................................................................6
2.1.3 Ensure That SELinux is Disabled.............................................................................................6
2.1.4 Prevent DHCP client from overwriting DNS configuration....................................................6
2.1.5 Configure Cluster-Facing NICs................................................................................................6
2.1.6 Configure Hostname.................................................................................................................7
2.1.7 Configure DNS Resolution.......................................................................................................7
2.1.8 Set Up basic hosts file..............................................................................................................7
2.1.9 Restart Management Node.......................................................................................................7
2.1.10 Configure Ethernet Switches..................................................................................................7
2.2 Download Linux Distro ISOs and Create Repository.....................................................................7
2.3 Downloading and Installing xCAT 2.............................................................................................8
2.3.1 Choosing the Version of xCAT You Want to Use...................................................................8
2.3.2 If Your Management Node Has Internet Access:.....................................................................8
2.3.2.1 Download Repo Files......................................................................................................8
2.3.2.2 Set Up Repo File for Fedora Site....................................................................................9
2.3.3 If Your Management Node Does Not Have Internet Access:..................................................9
2.3.3.1 Download xCAT 2 and Its Dependencies.......................................................................9
2.3.3.2 Get Distro Open Source Dependencies from Fedora Site...............................................9
2.3.3.3 Set Up YUM repositories for xCAT and Dependencies...............................................10
2.3.4 Install xCAT 2 software & Its Dependencies.........................................................................10
2.3.5 Test xCAT installation...........................................................................................................10
2.3.6 Update xCAT 2 software........................................................................................................10
2.3.7 Set Up the Install Directory for Fedora8 Node Installs .........................................................11
3.0 xCAT Hierarchy Using Service nodes..............................................................................................11
3.1 Switching to PostgreSQL Database..............................................................................................11
3.2 Define the service nodes in the database......................................................................................14
3.2.1 Add Service Nodes to the nodelist Table...............................................................................14
3.2.2 Set Attributes of the Service Nodes........................................................................................14
3.2.3 Configure the Service Node BMCs and Discover MACs......................................................15
3.2.4 Set Necessary Attributes in site Table....................................................................................15
4.0 Set Up Services on the Management Node.......................................................................................16
4.1 Set Up networks Table..................................................................................................................16
4.2 Set Up DHCP................................................................................................................................16
4.3 Set Up NTP....................................................................................................................................17
4.4 Set Up DNS...................................................................................................................................17
4.5 Define AMMs as Nodes................................................................................................................18
4.6 Set Up AMMs................................................................................................................................18
4.7 Start Up TFTP...............................................................................................................................19
4.8 Other Services...............................................................................................................................19
5.0 Define Compute Nodes in the Database............................................................................................19
5.1 Set Up the nodelist Table..............................................................................................................20
5.2 Set Up the nodehm table................................................................................................................20
5.3 Set Up the mp Table......................................................................................................................21
5.4 Set Up Conserver...........................................................................................................................21
5.5 Set Up the noderes Table...............................................................................................................21
5.6 Set Up nodetype Table..................................................................................................................22
5.7 Set Up Passwords in passwd Table...............................................................................................22
5.8 Verify the Tables...........................................................................................................................22
5.9 Set Up deps Table for Proper Boot Sequence of Triblades...........................................................22
5.10 Set Up Postscripts to be Run on the Nodes.................................................................................23
5.11 Get MAC Addresses for the Blades...........................................................................................23
5.12 Add Compute Nodes to DHCP...................................................................................................23
6.0 Install or Stateless Boot the Service Nodes.......................................................................................23
6.1 Build the Service Node Stateless Image........................................................................................23
6.2 Set Up the Service Nodes for Installation.....................................................................................26
6.3 Boot or Install the Service Nodes..................................................................................................26
6.4 Test Service Node installation......................................................................................................27
7.0 Install the LS21 Blades......................................................................................................................27
8.0 iSCSI Install a QS22 Blade...............................................................................................................27
9.0 Build and Boot the LS21 and QS22 Stateless Images.......................................................................28
9.1 Build the Stateless Image.............................................................................................................29
9.2 Test Boot the Stateless Image .......................................................................................................30
9.3 To Update QS22 Stateless Image..................................................................................................31
9.4 Build the Compressed Image.........................................................................................................31
9.4.1 Build aufs on Your Sample Node...........................................................................................31
9.4.2 Generate the Compressed Image...........................................................................................32
9.4.3 Optionally Use Light Weight Postscript.................................................................................32
9.4.4 Pack and Install the Compressed Image.................................................................................32
9.4.5 Check Memory Usage............................................................................................................33
10.0 Building QS22 Image for 64K pages..............................................................................................33
10.1 Rebuild aufs.................................................................................................................................34
10.2 Test unsquashed:.........................................................................................................................35
10.2.1 Check memory.....................................................................................................................35
10.3 Test squash..................................................................................................................................35
10.3.1 Check memory.....................................................................................................................36
10.4 To Switch Back to 4K Pages.......................................................................................................36
11.0 Using NFS Hybrid for the Diskless Images....................................................................................36
12.0 Install Torque..................................................................................................................................40
12.1 Set Up Torque Server..................................................................................................................40
12.2 Configure Torque........................................................................................................................40
12.3 Define Nodes..............................................................................................................................40
12.4 Set Up and Start Service.............................................................................................................40
12.5 Install pbstop...............................................................................................................................41
12.6 Install Perl Curses for pbstop......................................................................................................41
12.7 Create a Torque Default Queue...................................................................................................41
12.8 Set Up Torque Client ( x86_64 only)..........................................................................................41
12.8.1 Install Torque........................................................................................................................41
12.8.2 Configure Torque.................................................................................................................41
12.8.2.1 Set Up Access..............................................................................................................41
12.8.2.2 Set Up Node to Node ssh for Root .............................................................................42
12.8.3 Pack and Install image..........................................................................................................42
13.0 Set Up Moab....................................................................................................................................42
13.1 Install Moab.................................................................................................................................42
13.2 Configure Moab...........................................................................................................................42
13.2.1 Start Moab............................................................................................................................43
14.0 Appendix: Customizing Your Nodes by Creating Your Own Postscripts.....................................43
1.0 Introduction
xCAT 2 is a complete rewrite of xCAT 1.2/1.3, implementing a new architecture. All commands are
client/server, authenticated, logged and policy driven. The clients can be run on any OS with Perl,
including Windows. The code has been completely rewritten in Perl, and table data is now stored in a
relational database.
This cookbook provides step-by-step instructions on setting up an example stateless cluster. For
completeness, some advanced topics are covered, like hierarchical management (for extremely large
clusters), compute nodes with large pages, NFS hybrid mode, mixed node architectures, and
accelerator nodes. If you do not intend to use some of these features, skip those sections. (Section 1.2
will tell you which sections to skip.) The example cluster in this document is built with Fedora 8, but
the same concepts apply to Fedora 9, RHEL 5, and (to a lesser extent) SLES 10.
● All nodes will have a much greater likelihood of staying consistent. And if the administrator
does suspect that a node is out of sync with the rest of the cluster, they can simply reboot it and
know that it is back in its original, pristine state.
● If a node experiences a hardware problem, the hardware can be pulled from the rack and
replaced with new hardware; when the node is booted again, it comes up in the same state
as before.
● In a provisioning environment, new nodes can be provisioned or moved without the worry of
them losing state.
xCAT 2 provides the choice of either stateless or stateful nodes. A stateful node is one that has the OS
installed on its local hard disk and therefore, changes to the node (configuration changes, software
updates, etc.) can be made over time and those changes will persist.
Stateless nodes in xCAT 2 are implemented by not putting the OS on the local disk of the node. There
are 3 choices for stateless:
1. RAM-root – The entire OS image is contained in a RAM file system that is sent to the node
when it boots. Typical size for a minimal compute node for Linux is 75-160 MB of memory.
2. Compressed RAM-root – The OS image is in a compressed tar file. Individual files are
extracted and cached when read. File writes are done to the cached copy. Typical size for a
minimal compute node for Linux is 30-64 MB of memory.
3. NFS Hybrid – This is more accurately called NFS-root with copy-on-write. A minimal boot
kernel is sent to the node, which NFS-mounts the OS image read-only from the server. Files
that are read are cached in memory, and file writes go to the cached copy. Typical size for a
minimal compute node for Linux is 5 MB of memory.
1.2 Scenarios
The following scenarios are meant to help you navigate through this document and know which
sections to follow and which to ignore for an environment that is similar to yours.
● Follow chapter 5 to define the compute nodes in the xCAT database, except that instead of
using the service node as the conserver and xcatmaster, use the management node hostname.
● If you want the nodes to be stateful (full operating system on their local disks), follow chapter 7
● If you want the nodes to be stateless (diskless) follow the example of booting the LS21 blades
in chapter 9
● Optionally follow chapters 12 and 13 to install Torque and Moab
● The cluster is divided into management sub-domains called connected units (CU). Each CU
has its own subnet (and broadcast domain) and is designated by a single letter, so the first CU is
rra, the second rrb, etc.
● Within each CU, the nodes are grouped into threes (designated by a, b, c) and the groups
are numbered sequentially: rra001a, rra001b, rra001c, rra002a, etc. In this particular example,
the “a” node is an Opteron node, and the “b” and “c” nodes are Cell accelerator nodes for the
Opteron node.
● Each CU has a service node that acts as an assistant management node on behalf of the main
management node. The service node has two Ethernet adapters: the adapter on the management
node side is named, for example, rra000-m, and the adapter on the CU compute node side is
named, for example, rra000.
● The BladeCenter chassis within each CU are numbered sequentially, e.g. bca01, bca02, etc.
Ensure SELinux is disabled by setting the following in /etc/selinux/config:
SELINUX=disabled
Configure the cluster-facing NIC, for example in /etc/sysconfig/network-scripts/ifcfg-eth1:
DEVICE=eth1
ONBOOT=yes
BOOTPROTO=static
IPADDR=11.16.0.1
NETMASK=255.255.0.0
Set up /etc/resolv.conf, for example:
search cluster
nameserver 11.16.0.1
mkdir /root/xcat2
cd /root/xcat2
export BASEURL=ftp://download.fedora.redhat.com/pub/fedora/linux/releases/8
wget $BASEURL/Fedora/x86_64/iso/Fedora-8-x86_64-DVD.iso
wget $BASEURL/Fedora/ppc/iso/Fedora-8-ppc-DVD.iso
mkdir /root/xcat2/fedora8
mount -r -o loop /root/xcat2/Fedora-8-x86_64-DVD.iso /root/xcat2/fedora8
cd /etc/yum.repos.d
mkdir ORIG
mv fedora*.repo ORIG
Create fedora.repo with contents:
[fedora]
name=Fedora $releasever - $basearch
baseurl=file:///root/xcat2/fedora8
enabled=1
gpgcheck=0
On SLES, get access to the SLES RPMs and run “zypper sa <url>” to point to them.
Now use the appropriate links you've chosen above in section 2.3.2 or 2.3.3.
cd /etc/yum.repos.d
wget http://xcat.sf.net/yum/core-snap/xCAT-core-snap.repo
wget http://xcat.sf.net/yum/xcat-dep/rh5/x86_64/xCAT-dep.repo
Or on SLES, also do:
zypper sa http://xcat.sf.net/yum/core-snap
zypper sa http://xcat.sf.net/yum/xcat-dep/sles10/x86_64
Create fedora-internet.repo:
[fedora-everything]
name=Fedora $releasever - $basearch
failovermethod=priority
#baseurl=http://download.fedora.redhat.com/pub/fedora/linux/releases/$releasever/Everything/$basearch/os/
mirrorlist=http://mirrors.fedoraproject.org/mirrorlist?repo=fedora-$releasever&arch=$basearch
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-fedora file:///etc/pki/rpm-gpg/RPM-GPG-KEY
Continue now at step 2.3.4, Install xCAT 2 software & Its Dependencies.
cd /root/xcat2
wget http://xcat.sf.net/yum/core-rpms-snap.tar.bz2
wget http://downloads.sourceforge.net/xcat/xcat-dep-2.0.1.tar.bz2?use_mirror=osdn
# choose the latest version available by browsing https://sourceforge.net/project/showfiles.php?group_id=208749&package_id=258529
tar jxvf core-rpms-snap.tar.bz2
tar jxvf xcat-dep-2*.tar.bz2
cd /root/xcat2/xcat-dep/rh5/x86_64
export BASEURL=http://download.fedora.redhat.com/pub/fedora/linux/releases/8/Everything/x86_64/os/Packages/
wget $BASEURL/perl-Net-SNMP-5.2.0-1.fc8.1.noarch.rpm
wget $BASEURL/perl-XML-Simple-2.17-1.fc8.noarch.rpm
wget $BASEURL/perl-Crypt-DES-2.05-4.fc7.x86_64.rpm
wget $BASEURL/net-snmp-perl-5.4.1-4.fc8.x86_64.rpm
wget $BASEURL/ksh-20070628-1.1.fc8.x86_64.rpm
wget $BASEURL/perl-IO-Socket-INET6-2.51-2.fc8.1.noarch.rpm
wget $BASEURL/dhcp-3.0.6-10.fc8.x86_64.rpm
wget $BASEURL/syslinux-3.36-7.fc8.x86_64.rpm
wget $BASEURL/mtools-3.9.11-2.fc8.x86_64.rpm
wget $BASEURL/expect-5.43.0-9.fc8.x86_64.rpm
wget $BASEURL/perl-DBD-SQLite-1.12-2.fc8.1.x86_64.rpm
wget $BASEURL/perl-Expect-1.20-1.fc8.1.noarch.rpm
wget $BASEURL/perl-IO-Tty-1.07-2.fc8.1.x86_64.rpm
wget $BASEURL/scsi-target-utils-0.0-1.20070803snap.fc8.x86_64.rpm
wget $BASEURL/perl-Net-Telnet-3.03-5.1.noarch.rpm
createrepo .
cd /root/xcat2/xcat-dep/rh5/x86_64
./mklocalrepo.sh
cd /root/xcat2/core-snap
./mklocalrepo.sh
yum install xCAT
Or on SLES, do:
zypper install xCAT
source /etc/profile.d/xcat.sh
tabdump site
Then run:
yum update '*xCAT*'
Or on SLES, do:
zypper update '*xCAT*'
If you have a service node stateless image, don't forget to update the image with the new xCAT rpms
(see chapter 6.1, Build the Service Node Stateless Image):
umount /root/xcat2/fedora8
cd /root/xcat2
copycds Fedora-8-x86_64-DVD.iso
copycds Fedora-8-ppc-DVD.iso
The copycds commands will copy the contents of the DVDs to /install/fedora8/<arch>.
The service nodes need to communicate with the xCAT 2 database on the Management Node and run
xCAT commands to install the nodes. The service node will be installed with the xCAT code and
requires that the PostgreSQL database be set up instead of the default SQLite database. PostgreSQL
allows a remote client to be set up on the service node, so that the service node can access
(read/write) the database on the Management Node.
If you do not plan on using service nodes, you can skip chapter 3 and continue to use the
default SQLite database.
To set up the PostgreSQL database on the Management Node, follow these steps.
This example assumes:
● 11.16.0.1: IP of management node (cluster-facing NIC)
● xcatdb: database name
● xcatadmin: database role (aka user)
● cluster: database password
● 11.16.1.230 & 11.16.2.230: service nodes (mgmt node facing NIC)
Substitute your addresses and desired userid, password, and database name as appropriate.
The following rpms should be installed from the Fedora 8 media on the Management Node (and on each
service node when it is installed). They are required for PostgreSQL.
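Assuming the standard Fedora 8 package names (the PostgreSQL server and client plus the Perl DBD::Pg driver that xCAT uses), the installation looks something like:
yum install postgresql postgresql-server perl-DBD-Pg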
Lines should look like this (with your IP addresses substituted). This allows the service nodes to
access the DB.
local all all ident sameuser
# IPv4 local connections:
host all all 127.0.0.1/32 md5
host all all 11.16.0.1/32 md5
host all all 11.16.1.230/32 md5
host all all 11.16.2.230/32 md5
where 11.16.0.1 is the MN and 11.16.1.230 and 11.16.2.230 are service nodes.
10. vi postgresql.conf
Set listen_addresses to '*' to allow remote access. Be sure to uncomment the line:
listen_addresses = '*'
13. Back up your data so it can be migrated to the new database. (This is required even if you have not added
anything to your xCAT database yet. Required default entries were created when the xCAT RPMs
were installed on the management node, and they must be migrated to the new PostgreSQL
database.)
mkdir -p ~/xcat-dbback
dumpxCATdb -p ~/xcat-dbback
14. /etc/xcat/cfgloc should contain the following line, substituting your specific info. This points the
xCAT database access code to the new database.
Pg:dbname=xcatdb;host=11.16.0.1|xcatadmin|cluster
16. Restore your database to postgresql (bypass mode runs the command without xcatd):
XCATBYPASS=1 restorexCATdb -p ~/xcat-dbback
18. Run this command to get the correct management node name known by ssl:
openssl x509 -text -in /etc/xcat/cert/server-cert.pem -noout|grep Subject:
this will display something like:
Subject: CN=mgt.cluster
19. Update the policy table with mgt.cluster output from the command:
chtab priority=5 policy.name=<mgt.cluster> policy.rule=allow
Note: this name must be an MN name that is known by the service nodes.
20. Make sure the site table has at least the following settings (using tabdump, tabedit, chtab):
#key,value,comments,disable
"xcatiport","3002",,
"xcatdport","3001",,
"master","mn20",,
where mn20 is the hostname of the management node as known by the service nodes.
The policy table should now look similar to this (tabdump policy):
#priority,name,host,commands,noderange,parameters,time,rule,comments,disable
"1","root",,,,,,"allow",,
"2",,,"getbmcconfig",,,,"allow",,
"3",,,"nextdestiny",,,,"allow",,
"4",,,"getdestiny",,,,"allow",,
"5","mn20",,,,,,"allow",,
Note: For table attribute descriptions, run “tabdump -d <table name>”. Also, in some of the following
table commands, regular expressions are used so that a single row in the table can represent many
nodes. See http://xcat.sf.net/man5/xcatdb.5.html for a description of how to use regular expressions in
xCAT tables, and see http://www.perl.com/doc/manual/html/pod/perlre.html for an explanation of perl
regular expression syntax.
Create a stanza file, for example service.attributes, with the following contents:
service:
objtype=group
# nodehm attributes (for hw control)
mgt=ipmi
cons=ipmi
serialport=0
serialspeed=19200
serialflow=hard
# ipmi attributes (the reg expression means remove "-m" and add "-bmc")
bmc="|^(.+)-m$|($1)-bmc|"
bmcpassword=PASSW0RD
bmcusername=USERID
# nodetype attributes (what OS image to use for deployment)
os=fedora8
arch=x86_64
profile=service
nodetype=osi
# noderes attributes (controls how deployment is done)
netboot=pxe
installnic=eth0
primarynic=eth0
# chain attributes (controls what happens when a new node is discovered)
chain="runcmd=bmcsetup,standby"
ondiscover=nodediscover
# servicenode attributes (what services get started/configured on the SNs)
# turn off any you don't need, just make sure your compute nodes don't refer
# to them in their noderes attributes
setupnameserver=1
setupdhcp=1
setuptftp=1
setupnfs=1
setupconserver=1
setupldap=1
setupntp=1
setupftp=1
# postscript attributes (customization scripts to run after deployment)
# configeth is a sample script to configure the 2nd ethernet NIC on the service
# node. It should be modified to fit your specific environment.
postscripts=configeth,servicenode,xcatserver,xcatclient
Then run:
cat service.attributes | chdef -z
You can also provide attribute values directly as command line arguments to chdef, if you are only
changing a few. To list the attributes of the service group, run:
lsdef -t group -l service
To add your own postscripts to further customize the service nodes, see chapter 14, Appendix: Customizing
Your Nodes by Creating Your Own Postscripts.
If you are not using the NFS-hybrid method of stateless booting your compute nodes, set the installloc
attribute to “/install”. This instructs the service nodes to mount /install from the management node. (If
you don't do this, you have to manually sync /install between the management node and the service
nodes.)
chtab key=installloc site.value=/install
Disable the entry for the public network (connected to the outside world):
chtab net=9.114.88.160 networks.netname=public networks.disable=1
4.3 Set Up NTP
To enable the NTP services on the cluster, first configure NTP on the management node and start
ntpd.
Next set the ntpservers attribute in the site table. Whatever time servers are listed in this attribute will
be used by all the nodes that boot directly from the management node (i.e. service nodes and compute
nodes not being managed by a service node).
If your nodes have access to the internet you can use the global servers:
If the nodes do not have a connection to the internet (or you just want them to get their time from the
management node for another reason), you can use your Management Node as the NTP server.
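For example (a sketch using the chtab syntax shown elsewhere in this document; substitute your own time servers or your management node hostname, mn20 in this document):
chtab key=ntpservers site.value=0.pool.ntp.org,1.pool.ntp.org,2.pool.ntp.org
chtab key=ntpservers site.value=mn20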
To set up NTP on the nodes, add the setupntp postinstall script to the postscripts table. See section
5.10, Set Up Postscripts to be Run on the Nodes. Assuming you have a group named compute:
If using Service Nodes, ensure that the NTP server will be set up on the Service Nodes (see section
3.2.2, Set Attributes of the Service Nodes), and add the setupntp postscript to the service nodes:
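For example (a sketch; assumes your compute group is named compute — for the service group, append setupntp to its existing postscripts list, e.g. with tabedit, rather than overwriting it):
chtab node=compute postscripts.postscripts=setupntp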
Add your cluster nodes and management modules to /etc/hosts, for example:
192.168.100.11 blade2
192.168.100.12 blade3
172.30.101.133 amm3
Run:
makedns
Set up /etc/resolv.conf:
search cluster.net
nameserver 11.16.0.1
Start DNS:
service named start
For example, running these mkrrbc commands will create the following definitions in the nodelist
table. (These node groups will be used in additional xCAT Table setup so that an entry does not have
to be made for every management module or switch.)
/opt/xcat/share/xcat/tools/mkrrbc -C a -L 2 -R 1,4
/opt/xcat/share/xcat/tools/mkrrbc -C b -L 2 -R 1,4
"bca01","mm,cud,rack02",,,
"swa01","nortel,switch,cud,rack02",,,
After running mkrrbc, define the hardware control attributes for the management modules:
chtab node=mm nodehm.mgt=blade
chtab node=mm mp.mpa='|(.*)|($1)|'
Note: currently the network settings on the MM (both for the MM itself and for the switch module)
need to be set up with your own customized script. Eventually xCAT will do this through lsslp:
finding the MM on the switch, looking it up in the switch table, and then setting the configuration in
the MM. But for now, you must do it yourself.
Tip: for SOL to work best, telnet to the Nortel switch (the default password is “admin”) and type:
/cfg/port int1/gig/auto off
Do this for each port (i.e. int2, int3, etc.).
5.1 Set Up the nodelist Table
The nodelist table contains a node definition for each node in the cluster. For simple clusters, nodes
can be added to the nodelist table using nodeadd and a node range. For example:
nodeadd blade01-blade40 groups=all,blade
For more complicated clusters, in which you want subsets of nodes assigned to different groups, we
have provided a sample script to automate these definitions.
For example, running these mkrrnodes commands will define the following nodes with the assigned
groups in the nodelist table. (These node groups will be used in additional xCAT Table setup so that
an entry does not have to be made for every node.)
"rra001a","rra001,ls21,cua,opteron,compute,tb,all,rack01",,,
"rra001b","rra001,qs22,cua,cell,cell-b,compute,all,tb,rack01",,,
"rra001c","rra001,qs22,cua,cell,cell-c,compute,all,tb,rack01",,,
"rra002a","rra002,ls21,cua,opteron,compute,tb,all,rack01",,,
"rra002b","rra002,qs22,cua,cell,cell-b,compute,all,tb,rack01",,,
"rra002c","rra002,qs22,cua,cell,cell-c,compute,all,tb,rack01",,,
5.3 Set Up the mp Table
Specify (via regular expressions) the BladeCenter management module (mpa) that controls each blade
and the slot (id) that each blade is in. (For example, the regular expression in the 1st line below would
calculate for node rrd032a an mpa of bcd11 and an id of 5.)
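As a simpler hypothetical illustration (flat cluster, chassis amm1, node group blade, slot id taken from the digits in the node name so that blade03 maps to slot 3, using the xCAT table regular-expression syntax referenced above):
chtab node=blade mp.mpa=amm1 mp.id='|\D+0*(\d+)$|($1)|'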
makeconservercf
service conserver stop
service conserver start
Note: for each service you refer to here, you must ensure that service is started on the service
node; see section 3.2.2, Set Attributes of the Service Nodes.
chtab node=opteron noderes.netboot=pxe noderes.xcatmaster=mn20 nodehm.serialport=1 noderes.installnic=eth0 noderes.primarynic=eth0 noderes.nfsserver=mn20
chtab node=cell noderes.netboot=yaboot noderes.xcatmaster=mn20 nodehm.serialport=0 noderes.installnic=eth0 noderes.primarynic=eth0
The following is an example of how you can set up the deps table to ensure the triblades boot up in the
proper sequence. The first row tells xCAT that the opteron blades should not be powered on until the
corresponding cell blades are powered on. The second row tells xCAT that the cell blades should not be
powered off until the corresponding opteron blades are powered off.
nodels rra001a deps.nodedep
nodels rra001b deps.nodedep
To add your own postscripts to further customize the nodes, see chapter 14, Appendix: Customizing Your
Nodes by Creating Your Own Postscripts.
getmacs tb
(“tb” is the group of all the blades.) To verify mac addresses in table:
tabdump mac
Configure DHCP:
makedhcp -a
Note: this section assumes you can build the stateless image on the management node because the
service nodes are the same OS and architecture as the management node. If this is not the case, you
need to build the image on a machine that matches the service node's OS/architecture.
1. Check the service node packaging to see if it has all the rpms required:
cd /opt/xcat/share/xcat/netboot/fedora/
vi service.pkglist service.exlist
Make sure service.pkglist has the following packages (these packages should all be there by
default).
bash
stunnel
dhclient
kernel
openssh-server
openssh-clients
busybox-anaconda
vim-minimal
rpm
bind
bind-utils
ksh
nfs-utils
dhcp
bzip2
rootfiles
vixie-cron
wget
vsftpd
rsync
Edit service.exlist and verify that nothing is excluded that you want on the service nodes.
While you are here, edit compute.pkglist and compute.exlist, adding and removing as necessary.
Ensure that the pkglist contains bind-utils so that name resolution will work during boot.
2. Run image generation:
rm -rf /install/netboot/fedora8/x86_64/service
cd /opt/xcat/share/xcat/netboot/fedora/
./genimage -i eth0 -n tg3,bnx2 -o fedora8 -p service
3. Install the xCAT service node software (xCATsn) into the image:
rm -f /install/netboot/fedora8/x86_64/service/rootimg/etc/yum.repos.d/*
cp -pf /etc/yum.repos.d/*.repo /install/netboot/fedora8/x86_64/service/rootimg/etc/yum.repos.d
yum --installroot=/install/netboot/fedora8/x86_64/service/rootimg install xCATsn
4. Prevent DHCP from starting up until xcatd has had a chance to configure it:
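For example (a sketch, assuming the dhcpd service inside the image):
chroot /install/netboot/fedora8/x86_64/service/rootimg chkconfig dhcpd off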
5. Edit fstab:
cd /install/netboot/fedora8/x86_64/service/rootimg/etc/
cp fstab fstab.ORIG
Put in fstab:
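A typical stateless-image fstab looks something like the following sketch (adjust to your image):
proc     /proc      proc    rw                 0 0
sysfs    /sys       sysfs   rw                 0 0
devpts   /dev/pts   devpts  rw,gid=5,mode=620  0 0
tmpfs    /dev/shm   tmpfs   rw                 0 0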
6. (Because we do not set site.installloc to anything, the service nodes will NOT mount /install. This
is what you want if the compute nodes are going to mount /install from the service nodes using the
NFS-hybrid mode. If you are going to use RAM-root mode for the compute nodes, you can set
site.installloc to “/install”. This will cause the service nodes to mount /install from the
management node, and then you won't have to manually sync /install to the service nodes.)
cd /install/netboot/fedora8/x86_64/service/rootimg/etc
echo '/install *(ro,no_root_squash,sync,fsid=13)' >exports
Note: The service nodes are set up as NFS-root servers for the compute nodes. Any time changes are
made to any compute image on the mgmt node it will be necessary to sync all changes to all service
nodes. After any service node reboot a sync must also be done. This is covered in chapter 11,
Using NFS Hybrid for the Diskless Images.
6.2 Set Up the Service Nodes for Installation
Note: If you are using stateless service nodes, skip this section.
To prepare for installing the service nodes, you must copy the xCAT software and necessary prereqs
into /install/postscripts, so it can be installed during node installation by the servicenode postscript.
mkdir -p /install/postscripts/xcat/RPMS/noarch
mkdir -p /install/postscripts/xcat/RPMS/x86_64
Then:
rpower service boot
wcons service                # make sure DISPLAY is set to your X server/VNC
rcons <one-node-at-a-time>   # or do rcons for each node
tail -f /var/log/messages
Now that you have installed your LS21 blades, you don't need to follow chapter 9, Build and Boot the
LS21 and QS22 Stateless Images for your LS21 blades. (Although, if you have QS22 blades, you will
still need to follow that chapter to diskless boot them.)
Note: in these instructions, substitute your management node hostname for mn20.
Note: edit the kickstart file and make sure /boot has at least 200 MB of space for kernel installs.
Pick a QS22 blade for the iSCSI install that can access the management node. Add it as a node (and its
management module, if necessary). In our example, the blade is called mvqs21b and the management
module of the chassis it is in is called bca2:
nodeadd mvqs21b groups=compute,iscsi
nodeadd bca2 groups=mm2
Make sure the root userid and password are in the iscsi table
chtab node=mvqs21b iscsi.userid=root iscsi.passwd=cluster iscsi.server=mn20
getmacs mvqs21b
If you want to just boot it to its already installed iSCSI disk (maybe to add a few packages):
nodech mvqs21b nodetype.profile=iscsi
nodeset mvqs21b iscsiboot
rpower mvqs21b boot
9.0 Build and Boot the LS21 and QS22 Stateless Images
You are now ready to build the stateless images and then boot nodes with them. In our example, we
have 2 types of compute nodes: qs22 (ppc64) blades and ls21 (x86_64) blades. The steps for each are
very similar, so we have combined them. Go through these instructions once for each type.
9.1 Build the Stateless Image
1. On the management node, check the compute node package list to see if it has all the rpms
required.
cd /opt/xcat/share/xcat/netboot/fedora/
vi compute.pkglist compute.exlist # for ppc64, edit compute.ppc64.pkglist
For example, to have vi installed on the node, add the name of the vi rpm to compute.pkglist.
Make sure nothing is excluded in compute.exlist that you need. For example, if you require Perl on
your nodes, remove ./usr/lib/perl5 from compute.exlist. Ensure that the pkglist contains bind-utils
so that name resolution will work during boot.
2. If the stateless image you are building doesn't match the OS/architecture of the management node,
logon to the node you installed in the previous chapter and do the following. (If you are building
your stateless image on the management node, skip this step.)
ssh mvqs21b
mkdir /install
mount mn20:/install /install
Create fedora.repo:
cd /etc/yum.repos.d
rm -f *.repo
Copy the executables and files needed from the Management Node:
mkdir /root/netboot
cd /root/netboot
scp mn20:/opt/xcat/share/xcat/netboot/fedora/genimage .
scp mn20:/opt/xcat/share/xcat/netboot/fedora/geninitrd .
scp mn20:/opt/xcat/share/xcat/netboot/fedora/compute.ppc64.pkglist .
scp mn20:/opt/xcat/share/xcat/netboot/fedora/compute.exlist .
If you are building the image on a sample node, continue the steps above by running:
./genimage -i eth0 -n tg3 -o fedora8 -p compute
Even though we aren't done customizing the image yet, you can boot a node with it, just for
fun:
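For example (a sketch of the usual pack-and-boot sequence, shown later in this chapter for the ppc64 image; substitute one of your x86_64 node names):
packimage -a x86_64 -o fedora8 -p compute -m cpio
nodeset <nodename> netboot
rpower <nodename> boot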
9.3 To Update QS22 Stateless Image
If you need to update the image at any point with additional packages:
1. Set $ARCH:
export ARCH=x86_64 # or...
export ARCH=ppc64
export ROOTIMG=/install/netboot/fedora8/$ARCH/compute/rootimg
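For example, to add a package directly into the image using the $ROOTIMG variable set above (a sketch; <package> is a placeholder):
yum --installroot=$ROOTIMG install <package>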
3. To update the image by running genimage, add packages to compute.ppc64.pkglist and rerun
genimage as described in the previous section.
strip -g aufs.ko
Note: If you have a need to unsquash the image:
cd /install/netboot/fedora8/x86_64/compute
rm -f rootimg.sfs
packimage -a x86_64 -o fedora8 -p compute -m cpio
The maximum size for / is 100 MB, but only 220 KB is in use (down from 225 MB uncompressed). So where is
the OS? Look at the cached value: the compressed OS image is 61 MB, about 3.5x smaller.
As files in the hidden OS image are changed, they are copied to the tmpfs (compute_ppc64) with copy-on-write;
to reclaim that space, reboot the node. The /tmp and /var/tmp filesystems are for MPI, Torque, and other
user-related files; if 10 MB is too small, you can increase it. To reclaim this space, put the following in the
epilogue:
umount /tmp /var/tmp; mount -a
On Management Node:
cd /opt/xcat/share/xcat/netboot/fedora
cp compute.exlist compute.exlist.4k
echo "./lib/modules/2.6.23.1-42.fc8/*" >>compute.exlist
cd /tmp
wget http://download.fedora.redhat.com/pub/fedora/linux/releases/8/Fedora/source/SRPMS/kernel-2.6.23.1-42.fc8.src.rpm
scp kernel-2.6.23.1-42.fc8.src.rpm mvqs21b:/tmp
nodech mvqs21b nodetype.profile=iscsi
nodeset mvqs21b iscsiboot
rpower mvqs21b boot
ssh mvqs21b
mkdir /install
mount mgmt:/install /install
yum install rpm-build redhat-rpm-config ncurses ncurses-devel kernel-devel gcc squashfs-tools
cd /tmp
rpm -Uivh kernel-2.6.23.1-42.fc8.src.rpm
rpmbuild -bp --target ppc64 /usr/src/redhat/SPECS/kernel.spec
cd /usr/src/redhat/BUILD/kernel-2.6.23
cp -r linux-2.6.23.ppc64 /usr/src/
cd /usr/src/kernels/$(uname -r)-$(uname -m)
find . -print | cpio -dump /usr/src/linux-2.6.23.ppc64/
cd /usr/src/linux-2.6.23.ppc64
make mrproper
cp configs/kernel-2.6.23.1-ppc64.config .config
make -j4
make modules_install
strip vmlinux
mv vmlinux /boot/vmlinuz-2.6.23.1-42.fc8-64k
cd /lib/modules/2.6.23.1-42.fc8-64k/kernel
find . -name "*.ko" -type f -exec strip -g {} \;
Rebuild aufs.ko:
rm -rf aufs
tar jxvf aufs-2-6-2008.tar.bz2
cd aufs
mv include/linux/aufs_type.h fs/aufs/
cd fs/aufs/
patch -p1 < ../../../aufs-standalone.patch
chmod +x build.sh
./build.sh 2.6.23.1-42.fc8-64k
strip -g aufs.ko
cp aufs.ko /root
On sample blade:
cd /root
./genimage -i eth0 -n tg3 -o fedora8 -p compute -k 2.6.23.1-42.fc8-64k
On sample blade:
cd /root
./geninitrd -i eth0 -n tg3 -o fedora8 -p compute -k 2.6.23.1-42.fc8-64k
On Management Node:
rm -f /install/netboot/fedora8/ppc64/compute/rootimg.sfs
packimage -a ppc64 -o fedora8 -p compute -m cpio
nodech mvqs21b nodetype.profile=compute nodetype.os=fedora8
nodeset mvqs21b netboot
rpower mvqs21b boot
On sample blade:
cd /root
./geninitrd -i eth0 -n tg3,squashfs,aufs,loop -o fedora8 -p compute -k 2.6.23.1-42.fc8-64k -l $(expr 100 \* 1024 \* 1024)
On Management Node:
rm -f /install/netboot/fedora8/ppc64/compute/rootimg.sfs
packimage -a ppc64 -o fedora8 -p compute -m squashfs   # due to a bug, the old rootimg.sfs must be removed first (done above)
nodech left nodetype.profile=compute nodetype.os=fedora8
nodeset left netboot
rpower left boot
10.3.1 Check memory
# ssh left "echo 3 > /proc/sys/vm/drop_caches;free -m;df -h"
total used free shared buffers cached
Mem: 4012 127 3885 0 0 65
-/+ buffers/cache: 61 3951
Swap: 0 0 0
Filesystem Size Used Avail Use% Mounted on
compute_ppc64 100M 1.7M 99M 2% /
none 10M 0 10M 0% /tmp
none 10M 0 10M 0% /var/tmp
On sample blade:
cd /root
./geninitrd -i eth0 -n tg3 -o fedora8 -p compute
OR
1. Get stateless cpio or squashfs set up and test (see previous notes).
2. Patch kernel and build new aufs.ko:
Install the build prerequisites:
yum install rpm-build redhat-rpm-config ncurses ncurses-devel kernel-devel gcc squashfs-tools
Edit fs/Kconfig so that the filesystem Kconfig list includes aufs:
source "fs/nls/Kconfig"
source "fs/dlm/Kconfig"
source "fs/aufs/Kconfig"
make menuconfig
make -j4
make modules_install
make install
cd /lib/modules/2.6.23.1-42.fc8-aufs/kernel
find . -name "*.ko" -type f -exec strip -g {} \;
Whew!
cd /opt/xcat/share/xcat/netboot/fedora
rm -f aufs.ko
4. Boot NFS:
Create ifcfg-eth0:
cd /install/netboot/fedora8/x86_64/compute/rootimg/etc/sysconfig/network-scripts
Put in ifcfg-eth0:
ONBOOT=yes
BOOTPROTO=none
DEVICE=eth0
(This avoids an intermittent problem where the DHCP client takes the IP address down long enough to break
the NFS root mount, after which nothing works. It is also one less DHCP request, so the node boots faster.)
Note: for Fedora 9 only, there is a bug that appears to need the following work-around: in
/sbin/dhclient-script change "if [ x$keep_old_ip = xyes ]; then" to "if true; then". (This has been
submitted as a bug: https://bugzilla.redhat.com/show_bug.cgi?id=453982 .)
Append to fstab:
cd /install/netboot/fedora8/x86_64/compute/rootimg/etc
add this line:
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
To update the image, use yum/rpm/vi/chroot from the management node for x86_64, or from the
QS22 iSCSI image for ppc64, as you would for a cpio or squashfs system.
To propagate the changes to all service nodes (if applicable) after rebooting the service nodes:
To propagate the changes to all service nodes (if applicable) after changing any of the images:
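In both cases, a sketch using xCAT's prsync to resync the /install directory (assuming your service node group is named service):
prsync /install service:/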
12.0 Install Torque
12.1 Set Up Torque Server
cd /tmp
wget http://www.clusterresources.com/downloads/torque/torque-2.3.0.tar.gz
tar zxvf torque-2.3.0.tar.gz
cd torque-2.3.0
CFLAGS=-D__TRR ./configure \
--prefix=/opt/torque \
--exec-prefix=/opt/torque/x86_64 \
--enable-docs \
--disable-gui \
--with-server-home=/var/spool/pbs \
--enable-syslog \
--with-scp \
--disable-rpp \
--disable-spool
make
make install
Create /etc/profile.d/torque.sh:
export PBS_DEFAULT=mn20
export PATH=/opt/torque/x86_64/bin:$PATH
chmod 755 /etc/profile.d/torque.sh
source /etc/profile.d/torque.sh
cp -f /opt/xcat/share/xcat/netboot/add-on/torque/pbs_server /etc/init.d/
chkconfig --del pbs
chkconfig --del pbs_mom
chkconfig --del pbs_sched
chkconfig --level 345 pbs_server on
service pbs_server start
In the appropriate PAM configuration file, replace:
account sufficient pam_ldap.so
account required pam_unix.so
with:
account required pam_access.so
account sufficient pam_ldap.so
account required pam_unix.so
Create /etc/profile.d/moab.sh:
export PATH=/opt/moab/bin:$PATH
In moab.cfg, change the RMCFG line to:
RMCFG[mn20] TYPE=pbs
Append to moab.cfg :
NODEAVAILABILITYPOLICY DEDICATED:SWAP
JOBNODEMATCHPOLICY EXACTNODE
NODEACCESSPOLICY SINGLEJOB
NODEMAXLOAD .5
JOBMAXSTARTTIME 00:05:00
DEFERTIME 0
JOBMAXOVERRUN 0
LOGDIR /var/spool/moab/log
LOGFILEMAXSIZE 10000000
LOGFILEROLLDEPTH 10
STATDIR /var/spool/moab/stats
On each node, the scripts listed in the xcatdefaults row of the table will be run first, and then the scripts
for the group that the node belongs to. If the node is being installed, the postscripts will be run after
the packages are installed, but before the node is rebooted. If the node is being diskless booted, the
postscripts are run near the end of the boot process. Best practice is to write each script so that it can be
used in either environment.
When your postscript is executed on the node, several variables will be set in the environment, which
your script can use to control its actions:
● MASTER – the management node or service node that this node is booting from
● NODE – the hostname of this node
● OSVER, ARCH, PROFILE – this node's attributes from the nodetype table
● NODESETSTATE – the argument given to nodeset for this node
● NTYPE - “service” or “compute”
● all the site table attributes
Note that some compute node profiles exclude Perl to keep the image as small as possible. If this is
your case, your postscripts should be written in another shell language, e.g. bash.
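For illustration, here is a minimal sketch of a custom postscript written in plain sh (the script name and the
timezone it sets are made up). Custom postscripts go in /install/postscripts on the management node and are
added to the postscripts table as described in section 5.10:
#!/bin/sh
# settz: hypothetical example postscript.
# xCAT exports MASTER, NODE, OSVER, ARCH, PROFILE, NODESETSTATE, NTYPE and the
# site table attributes into the environment before running each postscript.
logger -t xcat "settz: node=$NODE master=$MASTER type=$NTYPE state=$NODESETSTATE"
# example action: set the node's timezone
cp /usr/share/zoneinfo/US/Eastern /etc/localtime
# do service-node-only work here if needed
if [ "$NTYPE" = "service" ]; then
    logger -t xcat "settz: running on a service node ($OSVER $ARCH $PROFILE)"
fi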