IBM Certification Study Guide: RS/6000 SP
Marcelo R. Barrios, Bruno Blanchard, Kyung C. Lee
Olivia P. Liu, Ipong Hadi Trisna
http://www.redbooks.ibm.com
SG24-5348-00
May 1999
Take Note!
Before using this information and the product it supports, be sure to read the general information in
Appendix B, “Special Notices” on page 467.
This edition applies to PSSP Version 3, Release 1 (5765-D51) and PSSP Version 2, Release 4
(5765-529) for use with the AIX Version 4, Release 3 Operating System.
When you send information to IBM, you grant IBM a non-exclusive right to use or distribute the
information in any way it believes appropriate without incurring any obligation to you.
Figures   xiii
Tables   xvii
Preface   xix
The Team That Wrote This Redbook   xx
Comments Welcome   xxi
Chapter 1. Introduction   1
1.1 Book Organization   1
1.2 The Test Scenario   2
6.11.2 Managing the Kerberos Secondary Server Database   189
6.12 SP Services That Utilize Kerberos   190
6.12.1 Hardware Control Subsystem   190
6.12.2 Remote Execution Commands   192
6.13 AFS as an SP Kerberos-Based Security System   199
6.13.1 Set Up to Use AFS Authentication Server   199
6.13.2 AFS Commands and Daemons   199
6.14 Sysctl Is an SP Kerberos-Based Security System   201
6.14.1 Sysctl Components   201
6.14.2 Sysctl Process   201
6.14.3 Terms and Files Related to the Sysctl Process   202
6.15 Related Documentation   202
6.16 Sample Questions   203
9.2.12 Set Up Nodes to Be Installed   259
9.2.13 spchvgobj   259
9.2.14 spbootins   260
9.2.15 Configure the CWS as Boot/Install Server   261
9.2.16 Set the Switch Topology   262
9.2.17 Verify the Switch Primary and Primary Backup Nodes   263
9.2.18 Set the Clock Source for All Switches   263
9.2.19 Network Boot the Boot/Install Server Nodes   263
9.2.20 s1term   263
9.2.21 nodecond   264
9.2.22 Check the System   267
9.2.23 Start the Switch   267
9.3 Key Files   267
9.3.1 /etc/bootptab.info   267
9.3.2 /tftpboot   268
9.3.3 /usr/sys/inst.images   272
9.3.4 /spdata/sys1/install/images   272
9.3.5 /spdata/sys1/install/<aix_level>/lppsource   273
9.3.6 /spdata/sys1/install/pssplpp/PSSP-x.x   273
9.3.7 /spdata/sys1/install/pssp   274
9.3.8 image.data   274
9.4 Related Documentation   275
9.5 Sample Questions   276
13.4.2 Configuration Files   351
13.5 Problem Management   353
13.5.1 Authorization   353
13.6 Event Perspectives   358
13.6.1 Defining Conditions   358
13.7 Related Documentation   364
13.8 Sample Questions   365
16.10.2 System Files with IP Addresses and Host Names   451
16.11 Related Documentation   453
16.12 Sample Questions   453
Index   481
127. Configuring Virtual Shared Disks   320
128. Virtual Shared Disk States and Associated Commands   321
129. RVSD Function   322
130. RVSD Subsystems and HAI   323
131. Sample Node List File   327
132. SMIT Panel for Configuring GPFS   328
133. Sample Output of /var/adm/ras/mmfs.log*   330
134. SMIT Panel for Creating Disk Descriptor File   332
135. SMIT Panel for Creating a GPFS FS   333
136. SMIT Panel for Mounting a File System   337
137. EM Design   349
138. EM Client and Peer Communication   350
139. EMCDB Version Stored in the Syspar Class   352
140. User-defined Resource Variables - Warning Window Example   357
141. Resource Variable Query (Partial View)   360
142. Create Condition Option from Event Perspectives   361
143. Create Condition Pane   362
144. Defining Name and Description of a Condition   363
145. Selecting Resource Variable and Defining Expression   363
146. Conditions Pane - New Condition   364
147. Mechanism of SP Node Backup in Boot/Install Server Environment   371
148. Environment after Adding a Second Switched Frame and Nodes   386
This redbook will not replace the practical experience you should have; however, when
combined with education activities and experience, it should prove to be a very useful
preparation guide for the exam. Due to the practical nature of the certification content,
this publication can also be used as a desk-side reference. So, whether you are planning
to take the RS/6000 SP and PSSP exam, or if you just want to validate your RS/6000 SP
skills, this book is for you.
Ipong Hadi Trisna has been working for IBM Indonesia since 1995 as a System Services
Representative in the RS/6000 Division, supporting the SP system and HACMP. He holds an
engineering degree from the Indonesia Institute of Technology, Jakarta, Indonesia, majoring
in Electronic and Computer Engineering. His area of expertise is hands-on field support for
the RS/6000, especially the SP system and HACMP.
We wish to thank the following people for their invaluable contributions to this
project:
Becky Gonzalez
IBM Austin
Comments Welcome
Your comments are important to us!
Chapter 1. Introduction
Knowledge of the RS/6000 SP alone is not sufficient to pass the exam. Basic AIX and
AIX administration skills are also required.
You are expected to be fluent in all the topics addressed in this redbook before
taking the exam. If you do not feel confident about your skills in one of these
topics, you should go to the referenced documentation listed in each chapter.
In order to prepare you for both sections, we have included in each chapter a section
that lists the key concepts that should be understood before taking the exam, as well
as a common scenario to which all the chapters in the redbook refer. This scenario is
described in 1.2, “The Test Scenario” on page 2.
We will start with the first frame (Frame 1) and 11 nodes, and then we will add
a second frame (Frame 2) later on, when we discuss reconfiguration in Part 3.
(Figure: The test scenario - Frame 1 and Frame 2, each with a switch, on the 192.168.3 SP
Ethernet. Node 1 (sp3n01, en0 192.168.3.11) and node 17 (sp3n17, en0 192.168.3.117) are
boot/install servers with additional en1 interfaces sp3n01en1 (192.168.31.11) and sp3n17en1
(192.168.32.117). The control workstation sp3en0 is 192.168.3.130.)
The environment is fairly complex in the sense that we have defined two
Ethernet segments and a boot/install server (BIS) to better support our future
expansion to a second frame where we will add a third Ethernet segment and
an additional boot/install server for the second frame.
The boot/install servers were selected following the default options offered by
PSSP. The first node in each frame is designated as the boot/install server for
the rest of the nodes in that frame.
Therefore, there is no need to skip frame numbers for future expansion frames.
This chapter discusses the hardware components of the RS/6000 SP, such as
node types, control workstation, frames, and switches. It also provides some
additional information on disk, memory, and software requirements.
2.2 Hardware
The basic components of the RS/6000 SP are:
• The frame with its integral power supply.
• Processor nodes.
• Optional dependent nodes that serve a specific function, such as
high-speed network connections.
• Optional SP Switch and Switch-8 to expand your system.
• Control workstation (a high-availability option is also available).
• Network connectivity adapters and peripheral devices, such as tape
and disk drives.
(Figure: Basic RS/6000 SP components - internal nodes, switch, and control workstation.)
2.3 Frames
The building block of the RS/6000 SP is the frame. There are two sizes: the tall
frame (75.8" high) and the short frame (49" high). RS/6000 SP internal nodes
are mounted in either a tall or short frame. A tall frame has eight drawers,
while a short frame has four drawers. Each drawer is further divided into two
slots. A thin node occupies one slot; a wide node occupies one drawer (two
slots), and a high node occupies two drawers (four slots). An internal power
supply is included with each frame. Frames get equipped with optional
processor nodes and switches.
Since the original RS/6000 SP product was made available in 1993, there
have been a number of model and frame configurations. The frame and the
first node in the frame were tied together forming a model. Each configuration
was based on the frame type and the kind of node installed in the first slot.
This led to an increasing number of possible prepackaged configurations as
more nodes became available.
The introduction of a new tall frame in 1998 is the first attempt to simplify the
way frames and the nodes inside are configured. This new frame replaces the
old frames. The most noticeable difference between the new and old frame is
the power supply size. Also, the new tall frame is shorter and deeper than the
old tall frame. With the new offering, IBM simplified the SP frame options by
decoupling the embedded node from the frame offering. Therefore, when you
order a frame, all you receive is a frame with the power supply units and a
power cord. All nodes, switches, and other auxiliary equipment are ordered
separately.
All new designs are completely compatible with all valid SP configurations
using older equipment. Also, all new nodes can be installed in any existing
SP frame provided that the required power supply upgrades have been
implemented in that frame.
Only the short model frame can be equipped with a switch board. The short
expansion frame cannot hold a switch board, but nodes in the expansion
frame can share unused switch ports in the model frame.
(Figure: Short frame front view - processor node slots with LED and control breaker panels,
and the switch assembly.)
The base level SP Switch frame (feature code #2031) contains four ISBs. An
SP Switch frame with four ISBs will support up to 128 nodes. The base level
SP Switch frame can also be configured into systems with fewer than 65
nodes. In this environment, the SP switch frame will greatly simplify future
system growth. Figure 4 shows an SP Switch frame with eight ISBs.
A tall frame has four power supplies. In a fully populated frame, the frame can
operate with only three power supplies (N+1). Short frames come with two
power supplies and a third, optional one, for N+1 support.
Figure 5 on page 12 illustrates tall frame components from front and rear
views.
(Figures: Tall frame front and rear views - processor node slots, LED and control breaker
panels, RF shunt assembly, switch assembly, main power unit, skirts, and the frame
supervisor card. The frame supervisor card carries green and orange LEDs, a DB25 connector
for the Y-cable, and the RS-232 cable to the control workstation.)
There is a cable that connects from the frame supervisor card (position A) to
the switch supervisor card (position B) on the SP Switch or the SP-Switch-8
boards and to the node supervisor card (position C) of every node in the
frame. Therefore, the control workstation can manage and monitor frames,
switches, and all in-frame nodes.
Standard nodes can be classified as those that are inside the RS/6000 SP
frame and those that are not.
Since 1993, when IBM announced the RS/6000 SP, there have been 14
internal node types, excluding some special on-request node types. The five
most current nodes are: the 160 MHz Thin P2SC node, 332 MHz SMP Thin
node, 332 MHz SMP Wide node, POWER3 SMP Thin node, and POWER3
SMP Wide node. Only the 160 MHz Thin P2SC node utilizes Micro Channel
Architecture (MCA) bus architecture while the others use PCI bus
architecture.
This node is the first PCI architecture bus node of the RS/6000 SP. Each
node has two or four PowerPC 604e processors running at a 332 MHz clock rate.
The 332 MHz SMP Wide node is a 332 MHz SMP Thin node combined with
additional disk bays and PCI expansion slots. This wide node has four
internal disk bays with a maximum of 36.4 GB (mirror) and ten PCI I/O
expansion slots (three 64-bit, seven 32-bit). Both 332 MHz SMP Thin and
Wide nodes are based on the same technology as the RS/6000 model H50
and have been known as the Silver nodes. Figure 7 shows a 332 MHz SMP
node component diagram.
This node is the first 64-bit internal processor node of the RS/6000 SP. Each
node has a one- or two-way (within two processor cards) configuration
utilizing a 64-bit 200 MHz POWER3 processor with a 4 MB Level 2 (L2) cache
per processor. The standard ECC SDRAM memory in each node is 256 MB
expandable up to 4 GB (within two card slots). This new node is shipped with
The POWER3 SMP Wide node is a POWER3 SMP Thin node combined with
additional disk bays and PCI expansion slots. This Wide node has four
internal disk bays for pairs of 4.5 GB, 9.1 GB, and 18.2 GB Ultra SCSI disk
capacity. Each node has ten PCI slots (two 32-bit, eight 64-bit). Both
POWER3 SMP Thin and Wide nodes are equivalent to the RS/6000 43P
model 260. A diagram of the POWER3 SMP node is shown in Figure 8.
Notice that it uses docking connectors (position A) instead of flex cables as in
the 332 MHz node.
The minimum software requirements for POWER3 SMP Thin and Wide nodes
are AIX Version 4.3.2 and PSSP Version 3.1.
Node Type                 160 MHz Thin   332 MHz        332 MHz        POWER3         POWER3
                                         SMP Thin       SMP Wide       SMP Thin       SMP Wide
L1 Cache (Instr./Data)
  per processor           32 KB/128 KB   32 KB/32 KB    32 KB/32 KB    32 KB/64 KB    32 KB/64 KB
Max. Memory               1 GB           3 GB           3 GB           4 GB           4 GB
Memory Slots              4              2              2              2              2
Disk Bays                 2              2              4              2              4
(Figure: 332 MHz SMP node system architecture - the system address bus, the 64-bit
memory-I/O controller, PCI bridges, the SP Switch MX adapter, and the PCI buses.)
The 332 MHz SMP node contains two- or four-way 332 MHz PowerPC 604e
processors each with its own 256 KB Level 2 cache. The X5 Level 2 cache
controller incorporates several technological advancements in design
providing greater performance over traditional cache designs. The cache
controller implements an eight-way, dual-directory, set-associative cache
using SDRAM. When instructions or data are stored in a cache, they are
grouped into sets of eight 64-byte lines. The X5 maintains an index to each of
the eight sets. It also keeps track of the tags used internally to identify each
cache line. Dual tag directories allow simultaneous processor requests and
system bus snoops, thus, reducing resource contention and speeding up
access.
System Bus
System Memory
I/O Subsystem
The POWER3 SMP node system structure is shown in Figure 10 on page 20.
(Figure 10: POWER3 SMP node system structure - the 6XX data and memory buses run 128 bits
wide at 100 MHz, and the PCI slots are attached through 32-bit and 64-bit buses at 33 MHz.)
POWER3 Microprocessor
System Bus
The system bus, referred to as the 6XX bus, connects up to two POWER3
processors to the memory-I/O controller chip set. It provides 40 bits of real
address and a separate 128-bit data bus. The address, data, and tag buses
System Memory
I/O Subsystem
Service Processor
The service processor function is integrated on the I/O planar board. This
service processor performs system initialization, system error recovery, and
diagnostic functions that give the POWER3 SMP node a high level of
availability. The service processor is designed to save the state of the system
to 128 KB of nonvolatile memory (NVRAM) to support subsequent diagnostic
and recovery actions taken by other system firmware and the AIX operating
system.
Each I/O rack accommodates up to two I/O drawers (maximum four drawers
per system) with additional space for storage and communication
subsystems. The base I/O drawer contains:
When all four I/O drawers are installed, the S70 contains twelve media bays
(Eight media bays for S7A), forty-eight hot-swapped disk drive bays, and
fifty-six PCI slots per system.
(Figure: SP-attached server connections - the switch cable, the 15 m frame-to-frame frame
supervisor control cable, the SAMI and S1TERM serial lines, and the SP Ethernet LAN.)
An SP Switch Router may have multiple logical dependent nodes, one for
each dependent node adapter it contains. If an SP Switch Router contains
more than one dependent node adapter, it can route data between SP
systems or system partitions. For an SP Switch Router, this card is called a
Switch Router Adapter (feature code #4021). Data transmission is
accomplished by linking the dependent node adapters in the switch router
with the logical dependent nodes located in different SP systems or system
partitions.
(Figure: An SP Switch Router attached to the SP system - the Switch Router Adapter connects
to the SP Switch, and media cards such as HiPPI, ATM OC-12c and OC-3c, 8-port 10/100
Ethernet, and 4-port FDDI connect to external networks and WANs.)
Although you can equip an SP node with a variety of network adapters and
use the node to make your network connections, the SP Switch Router with
the Switch Router Adapter and optional network media cards offers many
advantages when connecting the SP to external networks.
• Each media card contains its own IP routing engine with separate
memory containing a full route table of up to 150,000 routes. Direct
access provides much faster lookup times compared to software driven
lookups.
• Media cards route IP packets independently at rates of 60,000 to
130,000 IP packets per second. With independent routing available
from each media card, the SP Switch Router gives your SP system
excellent scalability characteristics.
• The SP Switch Router has a dynamic network configuration to bypass
failed network paths using standard IP protocols.
• Using multiple Switch Router Adapters in the same SP Switch Router,
you can provide high performance connections between system
partitions in a single SP system or between multiple SP systems.
Two versions of the RS/6000 SP Switch Router can be used with the SP
Switch. The Model 04S (GRF 400) offers four media card slots, and the
Model 16S (GRF 1600) offers sixteen media card slots. Except for the
additional traffic capacity of the Model 16S, both units offer similar
performance and network availability as shown in Figure 14.
For more detailed information, refer to IBM 9077 SP Switch Router: Get
Connected to the SP Switch, SG24-5157.
The control workstation also acts as a boot/install server for other servers in
the RS/6000 SP system. In addition, the control workstation can be set up as
an authentication server using Kerberos. It can be the Kerberos primary
server with the master database and administration service as well as the
ticket-granting service. As an alternative, the control workstation can be set
up as a Kerberos secondary server with a backup database to perform
ticket-granting service.
Note:
1. Requires a 7010 Model 150 X-Station and display. Other models and
manufacturers that meet or exceed this model can be used. An ASCII
terminal is required as the console.
Notes:
1. Supported by PSSP 2.2 and later
2. On systems introduced since PSSP 2.4, either the 8-port (feature code
#2493) or 128-port (feature code #2944) PCI bus asynchronous
adapter should be used for frame controller connections. IBM strongly
suggests you use the support processor option (feature code #1001). If
you use this option, the frames must be connected to a serial port on
an asynchronous adapter and not to the serial port on the control
workstation planar board.
(Figure: High Availability Control Workstation - primary and backup control workstations on
the SP Ethernet LAN sharing the external volume group that holds the SDR and system
management data.)
The primary and backup control workstations are also connected on a private
point-to-point network and a serial TTY link or target mode SCSI. The backup
control workstation assumes the IP address, IP aliases, and hardware
address of the primary control workstation. This lets client applications run
without changes. The client application, however, must initiate reconnects
when a network connection fails.
Generally, you can have a boot/install server for every eight nodes. Also, you
may want to consider having a boot/install server for each version of AIX and
PSSP (although this is not required).
(Figure: A frame of 16 nodes served by a boot/install server node and the control
workstation.)
All of these applications are able to take advantage of the sustained and
scalable performance provided by the SP Switch. The SP Switch provides the
message passing network that connects all of the processors together in a
way that allows them to send and receive messages simultaneously.
There are two networking topologies that can be used to connect parallel
machines: Direct and indirect.
Indirect networks, on the other hand, are constructed such that some
intermediate switch elements connect only to other switch elements.
Messages sent between processor nodes traverse one or more of these
intermediate switch elements to reach their destination. The advantages of
the SP Switch network are:
• Bisectional bandwidth scales linearly with the number of processor nodes
in the system.
Bisectional bandwidth is the most common measure of total bandwidth for
parallel machines. Consider all possible planes that divide a network into
two sets with an equal number of nodes in each. Consider the peak
bandwidth available for message traffic across each of these planes. The
bisectional bandwidth of the network is defined as the minimum of these
bandwidths.
(Figure: An SP Switch board contains eight SP Switch chips; the switch ports on one side
connect to SP nodes, and those on the other side connect to other SP Switch boards.)
The first two elements here are driven by the transmitting element of the link,
while the last element is driven by the receiving element of the link.
The relationship between the SP Switch Chip Link and the SP Switch Chip
Port is shown in Figure 18 on page 39.
(The figure shows an input port and an output port, each carrying 8 data bits, a data valid
bit, and a token bit, together with the system clock.)
Figure 18. Relationship Between Switch Chip Link and Switch Chip Port
Nodes based on RS/6000s that use the MCA bus obviously use the
MCA-based switch adapter (#4020). The same adapter is used in
uniprocessor thin, wide, and SMP high nodes.
New nodes based on PCI bus architecture (332 MHz SMP Thin and Wide
Nodes, the 200 MHz POWER3 SMP Thin and Wide Nodes) must use the
newer MX-based switch adapters (#4022 and #4023, respectively) since the
switch adapters are installed on the MX bus in the node. The so-called
mezzanine or MX bus allows the SP Switch adapter to be connected directly
onto the processor bus providing faster performance than adapters installed
on the I/O bus. The newer (POWER3) nodes use an improved adapter based
on a faster mezzanine (MX2) bus.
(Figure: An SP node connects to the SP Switch board through an SP Switch adapter on its
MCA, MX/MX2, or PCI bus; the adapter's switch port links over an SP Switch link to an input
and output port on a switch chip.)
The 16 unused SP Switch ports on the right side of the node switch board are
used for creating larger networks. There are two ways to do this:
• For an SP system containing up to 80 nodes, these SP Switch ports
connect directly to the SP Switch ports on the right side of other node
switch boards.
• For an SP system containing more than 80 nodes, these SP Switch ports
connect to additional stages of switch boards. These additional SP Switch
boards are known as intermediate switch boards (ISBs).
(Figures: Direct switch-to-switch cabling between frames - with three frames, each pair of
node switch boards is joined by 8 cables; with four frames, by 5 or 6 cables; with five
frames, by 4 cables, using the 16 ports available on each node switch board.)
The addition of a sixth frame to this configuration would reduce the number of
direct connections between each pair of frames to below four. In this
hypothetical case, each frame would have three connections to four other
frames and four connections to the fifth frame for a total of 16 connections per
frame. This configuration, however, would result in increased latency and
reduced switch network bandwidth. Therefore, when more than 80 nodes are
required for a configuration, an intermediate switch board (ISB) frame is used to provide 16 paths
between any pair of frames.
2.8.3.1 SP Switch
The operation of the SP Switch (feature code #4011) has been described in
the preceding discussion. When configured in an SP order, internal cables
are provided to support expansion to 16 nodes within a single frame. In
multi-switch configurations, switch-to-switch cables are provided to enable
the physical connectivity between separate SP switch boards. The required
SP switch adapter connects each SP node to the SP Switch board.
2.8.3.2 SP Switch-8
To meet some customer requirements, eight-port switches provide a low-cost
alternative to the full-size 16-port switches. The 8-port SP Switch-8 (SPS-8,
feature code #4008) provides switch functions for an 8-node SP system.
The SP Switch-8 has two active switch chip entry points. Therefore, the ability
to configure system partitions is restricted with this switch. With the maximum
eight nodes attached to the switch, there are two possible system
configurations:
• A single partition containing all eight nodes
• Two system partitions containing four nodes each
The 200 MHz POWER3 SMP Thin and Wide nodes use the SP Switch MX2 Adapter (feature code #4023).
The 332 MHz and 200 MHz SMP PCI-based nodes listed here have a unique
internal bus architecture that allows the SP Switch adapters installed in these
nodes to have increased performance compared with previous node types. A
conceptual diagram illustrating this internal bus architecture is shown in
Figure 28 on page 49.
(Figure 28: Internal bus architecture of the PCI SMP nodes - processors and memory on the
6xx system bus, with the 6xx-MX mezzanine bus carrying the PCI bridges and the SP Switch
MX/MX2 adapter.)
These nodes implement the PowerPC MP System Bus (6xx bus). In addition,
the memory-I/O controller chip set includes an independent separately
clocked mezzanine bus (6xx-MX) to which 3 PCI bridge chips and the SP
Switch MX or MX2 Adapter are attached. The major difference between these
node types is the clocking rates for the internal buses. The SP Switch
Adapters in these nodes plug directly into the MX bus; they do not use a
PCI slot. The PCI slots in these nodes are clocked at 33 MHz. In contrast, the
MX bus is clocked at 50 MHz in the 332 MHz SMP nodes and at 60 MHz in the
200 MHz POWER3 SMP nodes. Thus, substantial improvements in the
performance of applications using the Switch can be achieved.
When you attach a disk subsystem to one node, it is not automatically visible
to all the other nodes. The SP provides a number of techniques and products
to allow access to a disk subsystem from other nodes.
The version of PSSP that will run on each type of node is shown in Table 3 on
page 51. The application the customer is using may require specific versions
of AIX. Not all the versions of AIX run on all the nodes; so, this too must be
considered when nodes are being chosen.
Table 3. Minimum Level of PSSP and AIX That Is Allowed on Each Node
The PSSP product is ordered for the SP system (9076) but is entitled for use
across the entire SP complex. PSSP V3.1 has been enhanced to allow
attachment of an S70 or S70 Advanced server. Here, the feature is ordered
in a quantity equal to the number of nodes.
A system partition can be no smaller than a switch chip and the nodes
attached to it, and those nodes would occupy some number of slots in the
frame. The location of the nodes in the frame and their connection to the
chips is a major consideration if you are planning on implementing system
partitioning.
For a single-frame system with 16 slots, the possible system partitions, in
slots per partition, are:
One system partition: 16
Two system partitions: 12-4 or 8-8
Three system partitions: 4-4-8
Four system partitions: 4-4-4-4
This section uses the following current node, frame, switch, and switch
adapter types to configure SP systems.
Nodes
• 160 MHz Thin node (feature code #2022).
• 332 MHz SMP Thin node (feature code #2050).
• 332 MHz SMP Wide node (feature code #2051).
• POWER3 SMP Thin node (feature code #2052).
• POWER3 SMP Wide node (feature code #2053).
Frames
• Short model frame (model 500)
• Tall model frame (model 550)
• Short expansion frame (feature code #1500)
• Tall expansion frame (feature code #1550)
• SP Switch frame (feature code #2031)
• RS/6000 server frame (feature code #9123)
Switches
• SP Switch-8 (8-port switch, feature code #4008)
• SP Switch (16-port switch, feature code #4011)
Switch Adapter
• SP Switch adapter (feature code #4020)
• SP Switch MX adapter (feature code #4022)
• SP Switch MX2 adapter (feature code #4023)
• SP System attachment adapter (feature code #8396)
Configuration Rule 1
Tall frames and short frames cannot be mixed within an SP system.
Configuration Rule 2
If there is a single PCI Thin node in a drawer, it must be installed in the
odd slot position (left side of the drawer).
Based on configuration rule 1, the rest of this section is divided into
two major parts. The first part provides the configuration rule for using short
frames, and the second part provides the rules for using tall frames.
Configuration Rule 3
A short model frame must be completely full before a short expansion
frame can mount nodes. You are not allowed any embedded empty drawers.
The short model frame must be completely full before the short expansion
frame can mount nodes, as shown in Figure 31.
(Figure 31: A short model frame populated with thin, wide, and high nodes.)
Configuration Rule 4
A short frame supports only a single SP Switch-8 board.
(Figure: Short frame configurations, each with a single SP Switch-8 board.)
Configuration Rule 5
Tall frames support SP-Attached servers.
(Figure: Examples of tall frame configurations with SP Switch-8 boards and wide, high, and
PCI thin nodes.)
In configuration (a), four Wide nodes and eight Thin nodes are mounted in a
tall model frame equipped with an SP Switch. There are four available switch
ports that you can use to attach SP-Attached servers or SP Switch routers.
Expansion frames are not supported in this configuration because there are
Thin nodes on the right side of the model frame.
Configuration Rule 6
If a model frame or switched expansion frame has Thin nodes on the right
side, it cannot support nonswitched expansion frames.
In configuration (b), six Wide nodes and two PCI Thin nodes are mounted in a
tall model frame equipped with an SP Switch. There also is a High node, two
Wide nodes, and four PCI Thin nodes mounted in a nonswitched expansion
frame. Note that all PCI Thin nodes on the model frame must be placed on
the left side to comply with configuration rule 6. All Thin nodes on an
expansion frame are also placed on the left side to comply with the switch
port numbering rule. There is one available switch port that you can use to
attach SP-Attached servers or SP Switch routers.
In configuration (c), there are eight Wide nodes mounted in a tall model frame
equipped with an SP Switch and four High nodes mounted in a nonswitched
expansion frame (frame 2). The second nonswitched expansion frame (frame 3)
houses a High node, two Wide nodes, and one PCI Thin node. This
configuration occupies all 16 switch ports in the model frame. Note that the
Wide nodes and PCI Thin nodes in frame 3 have to be placed in High node
locations.
Now try to describe configuration (d) yourself. If you want to add two
POWER3 Thin nodes, where would they be placed?
Figure 34. Example of Single SP-Switch Configurations
Figure 37 on page 65 shows slot numbering for tall frames and short frames.
Figure 37. Slot Numbering for Short Frames and Tall Frames
Nodes are numbered according to the formula:

   node_number = (frame_number - 1) x 16 + slot_number

where slot_number is the lowest slot number occupied by the node. Each
type (size) of node occupies a consecutive sequence of slots. For each node,
there is an integer n such that a thin node occupies slot n, a wide node
occupies slots n and n+1, and a high node occupies slots n, n+1, n+2, and n+3.
For wide and high nodes, n must be odd.
For example, a node in slot 1 of frame 4 is node number (4 - 1) x 16 + 1 = 49.
Switch boards are numbered sequentially starting with 1 from the frame with
the lowest frame number to that with the highest frame number. Each full
switch board contains a range of 16 switch port numbers (also known as
switch node numbers) that can be assigned. These ranges are also in
sequential order with their switch board number. For example, switch board 1
contains switch port numbers 0 through 15.
Switch port numbers are used internally in PSSP software as a direct index
into the switch topology and to determine routes between switch nodes.
A node's switch port number is computed as:

   switch_port_number = (switch_number - 1) x 16 + switch_port_assigned

where switch_number is the number of the switch board to which the node is
connected, and switch_port_assigned is the number assigned to the port on
the switch board (0 to 15) to which the node is connected. For example, a node
cabled to port 3 of switch board 2 has switch port number (2 - 1) x 16 + 3 = 19.
Figure 39 on page 68 shows the frame and switch configurations that are
supported and the switch port number assignments in each node. Let us
describe more details on each configuration.
In configuration 1, the switched frame has an SP Switch that uses all 16 of its
switch ports. Since all switch ports are used, the frame does not support
nonswitched expansion frames.
If the switched frame has only wide nodes, it could use, at most, eight switch
ports and, therefore, has eight switch ports to share with nonswitched
expansion frames. These expansion frames are allowed to be configured as
in configuration 2 or configuration 3.
(Figure 39: Switch port number assignments for configurations 1 through 4 - a switched
frame using all 16 switch ports, and switched frames sharing ports with one, two, or three
nonswitched expansion frames.)
Figure 40 shows sample switch port numbers for a system with a short frame
and an SP Switch-8.
(Figure 40: Switch port numbers 0 through 7 assigned to the nodes of a short frame with an
SP Switch-8.)
SP Manuals
The book IBM RS/6000 SP: Planning Volume 1, Hardware and Physical
Environment, GA22-7280 is a helpful hardware reference. It is included here
to help you select nodes, frames, and the other components needed, and to ensure
that you have the correct physical configuration and environment.
332 MHz Thin and Wide Node Service, GA22-7330. This manual explains the
configuration of 332 MHz Thin and Wide nodes.
SP Redbooks
Inside the RS/6000 SP, SG24-5145 serves as an excellent reference for
understanding the various SP system configurations you could have.
You need to ensure that all of the addresses you assign are unique within
your site network and within any outside networks to which you are attached,
such as the Internet. Also, you need to plan how names and addresses will
be resolved on your systems (that is, using DNS name servers, NIS maps,
/etc/hosts files, or some other method).
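On AIX, the order in which these resolution methods are tried can be controlled with the
/etc/netsvc.conf file or the NSORDER environment variable; this is a general AIX mechanism
rather than anything SP-specific. A minimal sketch that checks the local /etc/hosts file
before DNS would be:

   hosts = local, bind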
To set the hostname on the control workstation, issue the SMIT fast path:

   smit hostname

                                                        [Entry Fields]
* HOSTNAME (symbolic name of your machine)              [sp3n0]
You select the network adapter you want to configure and fill in the IP
address and netmask assigned for this adapter. Please be sure that you have
the correct combination of IP address and netmask. The netmask can be
defined based on the IP address class. A sample of smit mktcpip is shown in
Figure 42 on page 75.
[Entry Fields]
* HOSTNAME [sp3n0]
* Internet ADDRESS (dotted decimal) [192.168.3.130]
Network MASK (dotted decimal) [255.255.255.0]
* Network INTERFACE en0
NAMESERVER
Internet ADDRESS (dotted decimal) []
DOMAIN Name []
Default GATEWAY Address [9.12.0.1]
(dotted decimal or symbolic name)
Your CABLE Type bnc +
START Now no +
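The same settings can also be applied from the command line with the mktcpip command. A
hedged equivalent of the panel above (the values are copied from that panel; verify the
gateway and cable type for your site) is:

   mktcpip -h sp3n0 -a 192.168.3.130 -m 255.255.255.0 -i en0 -g 9.12.0.1 -t bnc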
The en0 adapter (first Ethernet adapter) on the nodes needs to be configured
with an IP address and a name. This name is known as the Reliable
Hostname. The control workstation and several subsystems (such as
Kerberos) will use this Reliable Hostname and the en0 adapter for
communication.
Before configuring boot/install servers for other subnets, make sure the
control workstation has routes defined to reach each one of the additional
subnets.
To set up static routes, you may use smit or the command line. To add routes
using the command line, use the route command:
route add -net <ip_address_of_other_network> <ip_address_of_gateway>
where ip_address_of_other_network is the address of the destination network and
ip_address_of_gateway is the address of the gateway used to reach that network.
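For instance, to reach the 192.168.15.0 network through a gateway at 9.12.0.130 (the same
illustrative values used in the SMIT panel below), you could enter:

   route add -net 192.168.15.0 9.12.0.130

Add -netmask 255.255.255.0 if the default class mask is not what you want.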
The equivalent definition through the smit mkroute panel looks like this:
[Entry Fields]
Destination TYPE net +
* DESTINATION Address [192.168.15.0]
(dotted decimal or symbolic name)
* Default GATEWAY Address [9.12.0.130]
(dotted decimal or symbolic name)
* METRIC (number of hops to destination gateway) [1] #
Network MASK (hexadecimal or dotted decimal) [255.255.255.0]
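Name resolution settings for the control workstation are kept in /etc/resolv.conf. The
sample referred to in the next paragraph was a screen capture in the original; a sketch
consistent with the values described there is:

   nameserver 9.12.1.30
   search     msc.itso.ibm.com itso.ibm.com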
In this sample, there is only one name server defined with an address of
9.12.1.30. The system will query this domain name server for name
resolution. The default domain name to append to names that do not end with
a . (period) is msc.itso.ibm.com. The search entry when resolving a name is
msc.itso.ibm.com and itso.ibm.com.
3.2.5 NIS
NIS’ main purpose is to centralize administration of files, such as
/etc/passwd, within a network environment.
A NIS domain defines the boundary where file administration is carried out. In
a large network, it is possible to define several NIS domains to break the
machines up into smaller groups. This way, files meant to be shared among
five machines, for example, stay within a domain that includes the five
machines, not all the machines on the network.
A NIS server is a machine that provides the system files to be read by other
machines on the network. There are two types of servers: Master and Slave.
Both keep a copy of the files to be shared over the network. A master server
is the machine where a file may be updated. A slave server only maintains a
copy of the files to be served. A slave server has three purposes:
1. To balance the load if the master server is busy.
2. To back up the master server.
3. To enable NIS requests if there are different networks in the NIS domain.
NIS client requests are not handled through routers; such requests go to a
local slave server. It is the NIS updates between a master and a slave
server that go through a router.
A NIS client is a machine that has to access the files served by the NIS
servers.
There are four basic daemons that NIS uses: ypserv, ypbind, yppasswd, and
ypupdated. NIS was initially called yellow pages; hence, the prefix yp is used
for the daemons. They work in the following way:
• All machines within the NIS domain run the ypbind daemon. This daemon
directs the machine’s request for a file to the NIS servers. On clients and
slave servers, the ypbind daemon points the machines to the master
server. On the master server, its ypbind points back to itself.
• ypserv runs on both the master and the slave servers. It is this daemon
that responds to the request for file information by the clients.
• yppasswd and ypupdated run only on the master server. The yppasswd daemon makes
it possible for users to change their login passwords anywhere on the
network. When NIS is configured, the /bin/passwd command is linked to
the /usr/bin/yppasswd command on the nodes. The yppasswd command
sends any password changes over the network to the yppasswd daemon on
the master server. The master server changes the appropriate files and
propagates this change to the slave servers using the ypupdated daemon.
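A quick way to check which of these daemons are active on a particular machine is to query
the AIX System Resource Controller; assuming the daemons are registered under the standard
yp subsystem group, the command is:

   lssrc -g yp

The output lists each NIS subsystem on that machine together with its status.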
Tip
By serving the /etc/hosts file, NIS has an added capability for handling
name resolution in a network. Please refer to the "NIS and NFS"
publication by O’Reilly and Associates for detailed information.
To configure NIS, there are four steps all of which can be done through SMIT.
For all four steps, first run smit nfs and select Network Information Service
(NIS) to access the NIS panels, then:
• Choose Change NIS Domain Name of this Host to define the NIS
Domain. Figure 44 on page 80 shows what this SMIT panel looks like. In
this example, SPDomain has been chosen as the NIS domain name.
Figure 44. SMIT Panel for Setting a NIS Domain Name
• On the machine that is to be the NIS master (for example, the control
workstation), select Configure/Modify NIS and then Configure this Host
as a NIS Master Server. Figure 45 on page 81 shows the SMIT panel. Fill
in the fields as required. Be sure to start the yppasswd and ypupdated
daemons. When the SMIT panel is executed, all four daemons: ypbind,
ypserv, yppasswd, and ypupdated are started on the master server. This
SMIT panel also updates the NIS entries in the local /etc/rc.nfs file.
• On the machines set aside to be slave servers, go to the NIS SMIT panels
and select Configure this Host as a NIS Slave Server. Figure 46 on
page 82 shows the SMIT panel for configuring a slave server. This step
starts the ypserv and ypbind daemons on the slave servers and updates
the NIS entries in the local /etc/rc.nfs file(s).
Figure 46. SMIT Panel for Configuring a Slave Server
• On each node that is to be a NIS client, go into the NIS panels and select
Configure this Host as a NIS Client. This step starts the ypbind daemon
and updates the NIS entries in the local /etc/rc.nfs file(s).
Once configured, when there are changes to any of the files served by NIS,
their corresponding maps on the master are rebuilt and either pushed to the
slave servers or pulled by the slave servers from the master server. These
are done through the SMIT panel or the command make. To access the SMIT
panel, select Manage NIS Maps within the NIS panel. Figure 48 on page 84
shows this SMIT panel.
Figure 48. SMIT Panel for Managing NIS Maps
Select Build/Rebuild Maps for this Master Server and then either have the
system rebuild all the maps with the option all or specify the maps that you
want to rebuild. After that, return to the SMIT panel in Figure 48 on page
84 and either Transfer Maps to Slave Servers (from the master server) or
Retrieve Maps from Master Server to this Slave (from a slave server).
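The same rebuild can be done from the command line with make in the NIS map directory. A
minimal sketch, assuming the default /var/yp location and that only the passwd maps have
changed, is:

   cd /var/yp
   make passwd

Running make with no arguments rebuilds (and pushes) every map whose source file has changed.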
3.2.6 DNS
DNS (Domain Name System) is the way that host names are organized on the
Internet using TCP/IP. Host names are used to look up, or resolve, the name
we know a system as and convert it to a TCP/IP address. All of the movement
of data on a TCP/IP network is done using addresses, not host names; so,
DNS is used to make it easy for people to manage and work with the
computer network.
If your SP system is at a site with many systems, you can use DNS to
delegate the responsibility for naming systems to other people or sites. You can
also reduce your administration workload by having to update only one server
when you want to change the address of a system.
In the same way as "/" is the root directory for UNIX, DNS has "." as the root
of the name space. Unlike UNIX, if you leave out the full stop or period at the
end of the DNS name, DNS will try various full or partial domain names for
you. One other difference is that, reading left to right, DNS goes from the
lowest level to the highest; whereas, the UNIX directory tree goes from the
highest to the lowest.
For example, the domain ibm.com is a subdomain of the com domain. The
domain itso.ibm.com is a subdomain of the ibm.com domain and, therefore, of
the com domain.
You can set up your SP system without DNS. This uses a file called /etc/hosts
on each system to define the mapping from names to TCP/IP addresses.
Because each system has to have a copy of the /etc/hosts file, this becomes
difficult to maintain for even a small number of systems. Even though setting
up DNS is more difficult initially, the administrative workload for three or four
workstations may be easier than with /etc/hosts. Maintaining a network of 20
or 30 workstations becomes just as easy as for three or four workstations. It
is common for an SP system implementation to use DNS in lieu of /etc/hosts.
When you set up DNS, you do not have to match your physical network to
your DNS setup, but there are some good reasons why you should. Ideally,
the primary and secondary name servers should be the systems that have the
best connections to other domains and zones.
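Once DNS is in place, it is worth verifying both forward and reverse lookups from the
control workstation and from a node. Using the scenario addresses from Chapter 1 (and
assuming those names are registered in your zone), a simple check is:

   host sp3en0
   host 192.168.3.130

Both commands should return the matching name and address.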
3.3.1 SP Ethernet
The SP requires an Ethernet connection between the control workstation and
all nodes, which is used for network installation of the nodes and for system
management. This section describes the setup of that administrative
Ethernet, which is often called the SP LAN.
3.3.1.1 Frame and Node Cabling
SP frames include coaxial Ethernet cabling for the SP LAN, also known as
thin-wire Ethernet or 10BASE-2. All nodes in a frame can be connected to
that medium through the BNC connector of either their integrated 10 Mbps
Ethernet or a suitable 10 Mbps Ethernet adapter using T-connectors. Access
to the medium is shared among all connected stations and controlled by
Carrier Sense, Multiple Access/Collision Detect (CSMA/CD). 10BASE-2 only
supports half duplex (HDX). There is a hard limit of 30 stations on a single
10BASE-2 segment, and the total cable length must not exceed 185 meters.
However, it is not advisable to connect more than 16 to 24 nodes to a single
segment. Normally, there is one segment per frame, and one end of the
coaxial cable is terminated in the frame. Depending on the network topology,
the other end connects the frame to either the control workstation or to a
boot/install server in that segment and is terminated there. In the latter case,
the boot/install server and CWS are connected through an additional Ethernet
segment; so, the boot/install server needs two Ethernet adapters.
In order to use Twisted Pair in full duplex mode, there must be a native RJ-45
TP connector at the node (no transceiver), and an Ethernet switch, like the
IBM 8274, must be used. A repeater always works in half duplex mode and
will send all IP packets to all ports (such as in the 10BASE-2 LAN
environment). We, therefore, recommend always using an Ethernet switch
with native UTP connections.
(Figure: A single shared SP Ethernet segment - all nodes and the CWS share one 10 Mbps
half-duplex 10BASE-2 segment.)
• Consequently, the whole SP LAN is a single broadcast domain as well as
a single collision domain .
• The CWS acts as boot/install server for all nodes.
• Performance is limited to one 10 Mbps HDX connection at a time.
• Only six to eight network installs of SP nodes from the CWS NIM server
can be performed simultaneously.
(Figure: Two SP Ethernet segments, each 10 Mbps half duplex, with the CWS attached to both
and routing between them.)
(Figures: Further segmented SP Ethernet variants - segments joined through a router or
through BNC-to-TP media converters, and a configuration in which boot/install server nodes
with 100 Mbps full-duplex fast Ethernet uplinks to the CWS each serve a 10 Mbps half-duplex
install segment.)
Even when a router is added, the solution presented in the following section
is normally preferable to a segmented network with boot/install servers both
from a performance and from a management/complexity viewpoint.
Switched 10BASE-2 Network
An emerging technology to overcome performance limitations in shared or
segmented Ethernet networks is Ethernet Switching, which is sometimes
called micro-segmentation. An SP example is shown in Figure 53.
(Figure 53: Switched 10BASE-2 network - each frame's 10 Mbps half-duplex collision domain
connects through a BNC-to-TP media converter to an Ethernet switch forming one logical LAN,
with the CWS attached by 100 Mbps full-duplex fast Ethernet.)
Considering only the network topology, the control workstation should be able
to install six to eight nodes in each Ethernet segment (port on the Ethernet
switch) simultaneously since each Ethernet segment is a separate collision
domain. Rather than the network bandwidth, the limiting factor most likely is
the ability of the CWS itself to serve a very large number of NIM clients
simultaneously, for example, answering UDP bootp requests or acting as the
NFS server for the mksysb images. To quickly install a large SP system, it
may, therefore, still be useful to set up boot/install server nodes, but the
network topology itself does not require boot/install servers. For an
installation of all nodes of a large SP system, we advocate the following.
1. Using the spbootins command, set up approximately as many boot/install
server nodes as can be simultaneously installed from the CWS.
2. Install the BIS nodes from the control workstation.
3. Install the non-BIS nodes from their respective BIS nodes. This provides
the desired scalability for the installation of a whole, large SP system.
4. Using the spbootins command, change the non-BIS nodes’ configuration
so that the CWS becomes their boot/install server, as sketched in the example
after this list. Do not forget to run setup_server to make these changes effective.
5. Reinstall the original BIS nodes. This removes all previous NIM data from
them since no other node is configured to use them as boot/install server.
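As a hedged sketch of step 4 using the PSSP 3.1 command set (the node list is purely
illustrative, and on PSSP 2.4 the boot/install server is changed through spbootins options
instead of spchvgobj, so check the command reference for your level):

   spchvgobj -r rootvg -n 0 -l 2,3,4,5
   spbootins -s yes -l 2,3,4,5

The first command points the listed nodes back at the CWS (boot/install server number 0
denotes the control workstation); the second applies the change and reruns setup_server
for those nodes.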
Note: ARP cache tuning
Be aware that for SP systems with very large networks (and/or routes to
many external networks), the default AIX settings for the ARP cache size
might not be adequate. The Address Resolution Protocol (ARP) is used to
translate IP addresses to Media Access Control (MAC) addresses and vice
versa. Insufficient ARP cache settings can severely degrade your
network’s performance, in particular when many broadcast messages are
sent. Refer to /usr/lpp/ssp/README/ssp.css.README for more
information about ARP cache tuning.
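The cache size is controlled by the arptab_nb (number of ARP tables) and arptab_bsiz
(bucket size) network options. As a purely illustrative sketch (the values must be sized
for your network, and these options only take effect when set early in the boot sequence,
for example from /etc/rc.net):

   no -o arptab_nb=64
   no -o arptab_bsiz=10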
In order to avoid problems with broadcast traffic, no more than 128 nodes
should be connected to a single switched Ethernet subnet. Larger systems
should be set up with a suitable number of switched subnets. To be able to
network boot and install from the CWS, each of these switched LANs must
have a dedicated connection to the control workstation. This can be
accomplished either through multiple uplinks between one Ethernet switch
and the CWS or through multiple switches that each have a single uplink to
the control workstation.
(Figures: Larger configurations - multiple switched Ethernet subnets with 100 Mbps uplinks
to the CWS, and a mixed configuration in which an autosensing 10/100 Mbps Ethernet switch
connects the CWS at 100 Mbps full duplex, new nodes at 100 Mbps, and older 10BASE-2 frame
segments through BNC-to-TP media converters.)
In this configuration, again an Ethernet Switch, such as the IBM 8274, is used
to provide a single LAN and connects to the control workstation at 100 Mbps
FDX. One frame has new nodes with 100 Mbps Ethernet. These nodes are
individually cabled by 100BASE-TX Twisted Pair to ports of the Ethernet
Switch and operate in full duplex mode as in the previous example. Two
frames with older nodes and 10BASE-2 cabling are connected to ports of the
same Ethernet Switch using media converters as in the configuration shown
in Figure 53 on page 92. Ideally, a switching module with autosensing ports is
used, which automatically detects the communication speed.
Ethernet, Fiber Distributed Data Interface (FDDI), and token-ring are also
configured by the SP. Other network adapters must be configured manually.
These connections can provide increased network performance in user file
serving and other network related functions. You need to assign all the
addresses and names associated with these additional networks.
If you are not enabling ARP on the switch, specify the switch network subnet
mask and the starting node’s IP address. After the first address is selected,
subsequent node addresses are based on the switch port number assigned.
Unlike all other network interfaces, which can have sets of nodes divided into
several different subnets, the switch IP network must be one contiguous
subnet that includes all the nodes in the system partition.
If you want to assign your switch IP addresses as you do your other adapters,
you must enable ARP for the css0 adapter. If you enable ARP for the css0
adapter, you can use whatever IP addresses you wish, and those IP
addresses do not have to be in the same subnet for the whole system.
• 192.168.3.65-96 to frame 2
(Figure: Example of switch IP subnetting across four frames - subnet 1 with 5 nodes
connected, subnet 2 with 29, subnet 3 with 16, and subnet 4 with 16.)
For nodes that use Ethernet or Token-Ring as the routed network, CPU
utilization may not be a big problem. For nodes that use FDDI as the
customer routed network, a customer network running at or near maximum
bandwidth can consume a significant amount of the routing node's CPU.
For systems that use Ethernet or Token Ring routers, traffic can be routed
through the SP Ethernet. For FDDI networks, traffic should be routed across
the switch to the destination nodes. The amount of traffic coming in through
the FDDI network can be up to ten times the bandwidth that the SP Ethernet
can handle.
For bigger demands on routing and bandwidth, the SP Switch router can be a
real benefit. Refer to 2.5.1, “SP Switch Router” on page 26 for details.
SP Manuals
RS/6000 SP Planning Volume 2, Control Workstation and Software
Environment, GA22-7281. This book is essential to understand the planning
and requirements of SP system networking.
SP Redbooks
Inside the RS/6000 SP, SG24-5145. This book will help you to understand
how the RS/6000 SP is affected by the network.
Others
IBM Certification Study Guide: AIX V.4.3 System Support, SG24-5139. This
book helps to understand some part of the SP system that relates closely to
networking design.
TCP/IP, SNA, HACMP, and Multiple Systems, SG24-4653. This redbook
contains in-depth discussion on protocols and will help you to strengthen your
knowledge in this area.
This chapter provides an overview of internal and external I/O devices and
how they are supported in RS/6000 SP environments. It also discusses file
systems and their utilization in the RS/6000 SP.
After you choose a disk option, be sure to get enough disk drives to satisfy
the I/O requirements of your application, taking into account whether you are
using the Recoverable Virtual Shared Disk optional component of PSSP,
mirroring, or RAID 5, and whether I/O is random or sequential.
Disk   Description
2100   The Versatile Storage Server (VSS) offers the ability to share disks with
up to 64 hosts through Ultra SCSI connections. The hosts can be
RS/6000, NT, AS/400, and other UNIX platforms. The VSS has a
protected storage capacity of up to 2 TB. It can be connected through
multiple Ultra SCSI busses (up to 16) for increased throughput and has
up to 6 GB of read cache. Internally, SSA disks are configured in RAID 5
arrays with fast write cache availability. The 7133 is an integral part of
VSS. Your existing 7133 SSA disks can be placed under control of the
VSS. They can remain in their current racks, or they can be placed in the
VSS enclosures.
Disks are configured into 6+P+S or 7+P RAID 5 arrays with at least one
hot spare per loop and typically one 7133 drawer per SSA loop. These
RAID 5 arrays are then divided into LUNs (logical units) with valid LUN
sizes of 0.5, 1, 2, 4, 8, 12, 16, 20, 24, 28, and 32 GB. Each LUN is an
hdisk in the RS/6000.
7131 The tower has five hot-swappable slots for 4.5 or 9.1 GB disk drives, for
a maximum 45.5 GB capacity. Two towers can provide a low-cost
mirrored solution.
7133 If you require high performance, the 7133 Serial Storage Architecture
(SSA) Disk might be the subsystem for you. SSA provides better
interconnect performance than SCSI and offers hot pluggable drives,
cables, and redundant power supplies. RAID 5, including hot spares, is
supported on some adapters, and loop cabling provides redundant data
paths to the disk. Two loops of up to 48 disks are supported on each
adapter. However, for best performance of randomly accessed drives,
you should have only 16 drives (one 7133 drawer) in a loop.
7137 The 7137 subsystem supports both RAID 0 and RAID 5 modes. It can
hold from 4 to 33 GB of data (29 GB maximum in RAID 5 mode). The 7137
is the low-end model of RAID support. Connection is through SCSI
adapters. If performance is not critical, but reliability and low cost are
important, this is a good choice.
In summary, to determine what configuration best suits your needs, you must
be prepared with the following information:
• The amount of storage space you need for your data.
• A protection strategy (mirroring, RAID 5), if any.
• The I/O rate you require for storage performance.
• Any other requirements, such as multi-host connections, or whether you plan
to use the Recoverable Virtual Shared Disk component of PSSP, which
needs twin-tailed disks.
You can find up-to-date information about the available storage subsystems
on the Internet at: http://www.storage.ibm.com
(Figure: multiple SP nodes and the control workstation sharing external disk arrays and tape libraries.)
Note
All applications using the Node Object remain unchanged with the
exception of some SP installation code.
4.3.2.1 spmkvgobj
All information needed by NIM, such as lppsource, physical disk, server,
mksysb, and so forth, is now moved from the Boot/Install Server Information
panel to a new panel accessible through the SMIT fast path createvg_dialog,
as shown in Figure 58 on page 110.
(Figure 58: the createvg_dialog SMIT panel; target nodes are selected by Start Frame, Start Slot, and Node Count, or by a Node List.)
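In command form, an invocation of spmkvgobj might look like the following sketch (node, volume group, and disk names are illustrative; the -l, -r, and -h flags mirror the spchvgobj example shown later in this chapter, and the options for lppsource, mksysb, and server are not shown):
/usr/lpp/ssp/bin/spmkvgobj -l 1 -r rootvg2 -h hdisk1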
Important
The ssar identifier must have a length of 21 characters.
Installation on external SSA disks is supported in PSSP 3.1 or later.
(SMIT panel excerpt for changing Volume Group information, with the same node selection fields: Start Frame, Start Slot, and Node Count, or a Node List.)
Note
To verify the content of the Volume_Group class of node 1, you can issue
the following SDR command:
SDRGetObjects Volume_Group node_number==1 vg_name pv_list copies
4.3.2.3 sprmvgobj
To be able to manage the Volume_Group class, a third command to
remove a Volume_Group object that is not the current one has been
added: sprmvgobj
(Figure 60: the SMIT panel for sprmvgobj, with the same node selection fields.)
The following is an example built by the SMIT panel used in Figure 60:
/usr/lpp/ssp/bin/sprmvgobj -l ’1’ -r ’rootvg2’
Note
-u, -g, and -a flags were dropped because PSSP 3.1 no longer
supports /usr servers.
Figure 61 shows the new SMIT panel to issue spbootins (the fastpath is
server_dialog ).
(Figure 61: the server_dialog SMIT panel for spbootins, with the same node selection fields.)
4.3.2.5 spmirrorvg
This command enables mirroring on a set of nodes given by the option
-l node_list
You can force (or not force) the extension of the Volume Group by using
the -f option (available values are: yes or no).
This command takes the Volume Group information from the SDR updated
by the last spchvgobj and spbootins commands.
Note:
You can add a new physical volume to the node rootvg by using the
spmirrorvg command; the following steps give the details:
• Add a physical disk to the actual rootvg in the SDR by using
spchvgobj without changing the number of copies.
• Run spmirrorvg (see the sketch following this note).
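A minimal command sketch of this procedure for node 1 follows (the disk names and the disk-list separator are assumptions; -r, -h, and -l are the flags shown with spchvgobj later in this chapter, and -l and -f are the spmirrorvg options described above):
/usr/lpp/ssp/bin/spchvgobj -r rootvg -h hdisk0:hdisk1 -l 1
/usr/lpp/ssp/bin/spmirrorvg -l 1 -f yes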
Figure 62 on page 115 shows the new SMIT panel to issue spmirrorvg (the
fastpath is start_mirroring).
(Figure 62: the start_mirroring SMIT panel for spmirrorvg, with the same node selection fields.)
Note
This command uses the dsh command to run the AIX-related commands
on the nodes.
4.3.2.6 spunmirrorvg
This command disables mirroring on a set of nodes given by the option:
-l node_list
Figure 63 shows the new SMIT panel to issue spunmirrorvg (the fastpath is
stop_mirroring ).
(Figure 63: the stop_mirroring SMIT panel for spunmirrorvg; nodes are selected by Start Frame, Start Slot, and Node Count, or by a Node List.)
The following is the example built by the SMIT panel in Figure 63:
/usr/lpp/ssp/bin/spunmirrorvg -l ’1’
Note
This command uses the dsh command to run the AIX-related commands
on the nodes.
hdisk0
1 rootvg2 0 true 1 PSSP-3.1 aix432
ssar//0004AC50532100D:ssar//0004AC50616A00D
1 jmbvg 0 true 1 PSSP-3.1 aix432
ssar//0004AC5150BA00D
4.3.2.8 spbootlist
spbootlist sets the bootlist on a set of nodes by using the option:
-l node_list
This command takes the Volume Group information from the SDR updated
by the last spchvgobj and spbootins commands.
Section 4.3, “Multiple rootvg Support” on page 107 gives information on
how to use this new command.
The related SMIT panel or commands are given in Figure 58 on page 110 and
Figure 61 on page 114.
The following example gives the steps to follow to activate a new rootvg on
node 1 (hostname is node01). We assume two Volume Groups (rootvg1, and
rootvg2) have already been installed on the node. rootvg1 is the active
rootvg.
1. Change the node boot information:
spbootins -l 1 -c rootvg2 -s no
2. Note: it is not necessary to run setup_server.
3. Verify:
splstdata -b
4. Change the node bootlist:
spbootlist -l 1
5. Verify:
dsh -w node01 ’bootlist -m normal -o’
6. Reboot the node:
dsh -w node01 ’shutdown -Fr’
Important
The key switch must be in the normal position.
(SMIT panel excerpt: nodes are selected by Start Frame, Start Slot, and Node Count, or by a Node List.)
7133 IBM Serial Storage Architecture Disk Subsystems Models 010, 020,
500, and 600.
Figure 68 on page 123 shows the Change Volume Group Information window.
In this, the user is specifying an external SSA disk as the destination for
rootvg on node1. Note that you may specify several disks in the Physical
Volume List field (refer to 4.3.2.1, “spmkvgobj” on page 109 for more
information on how to enter the information).
Figure 68. SMIT Panel to Specify an External Disk for SP Node Installation
When you press the Enter key in the Change Volume Group Information
window, the external disk information is entered in the Node class in the SDR.
This can be verified by running the splstdata -b command as shown in
Figure 69 on page 124. This shows that the install disk for node 1 has been
changed to ssar//0004AC50532100D.
Under the covers, smitty changevg_dialog runs the spchvgobj command. This
is a new command in PSSP 3.1 that recognizes the new external disk
address formats. It may be run directly from the command line using this
syntax:
spchvgobj -r rootvg -h ssar//0004AC50532100D -l 1
The setup_server command will cause the network install manager (NIM)
wrappers to build a new bosinst.data resource for the node, which will be
used by AIX to install the node.
The format of bosinst.data has been changed in AIX 4.3.2 to include a new
member to the target_disk stanza specified as CONNECTION=. This is
shown in Figure 70 on page 125 for node 1’s bosinst.data file (node 1 was
used as an example node in Figure 68 on page 123 and Figure 70 on page
125). NIM puts in the new CONNECTION= member when it builds the file.
target_disk_data:
LOCATION =
SIZE_MB =
CONNECTION = ssar//0004AC50532100D
locale:
BOSINST_LANG = en_US
CULTURAL_CONVENTION = en_US
MESSAGES = en_US
KEYBOARD = en_US
One important motivation to use global file systems is to give users the
impression of a single system image by providing their home directories on all
the machines they can access. Another is to share common application
software that then needs to be installed and maintained in only one place.
Global file systems can also be used to provide a large scratch file system to
many machines, which normally utilizes available disk capacity better than
distributing the same disks to the client machines and using them for local
file systems.
In NFS, file systems residing on the NFS server are made available through
an export operation either automatically when the NFS start-up scripts
process the entries in the /etc/exports file or explicitly by invoking the exportfs
command. They can be mounted by the NFS clients in three different ways. A
predefined mount is specified by stanzas in the /etc/filesystems file, an
explicit mount can be performed by manually invoking the mount command,
and automatic mounts are controlled by the automount command, which
mounts and unmounts file systems based on their access frequency. This
relationship is sketched in Figure 71 on page 127.
(Figure 71 summarizes this relationship. On the server nfs_srv, /etc/exports entries such as
   /export/tina -ro,access=client
are processed by exportfs and rpc.mountd and recorded in /etc/xtab; a directory can also be exported explicitly, for example with
   exportfs -i /export/tmp
On the client, a predefined mount is a stanza in /etc/filesystems such as
   /home/joe:
      dev      = /export/joe
      nodename = nfs_srv
      mount    = true
      vfs      = nfs
an explicit mount is a command such as
   mount nfs_srv:/export/tmp /home/tmp
and automatic mounts are driven by automountd/automount using /etc/auto.master, which maps /home to the map file /etc/auto/maps/home.maps containing entries such as
   tina nfs_srv:/export/tina )
The PSSP software uses NFS for network installation of the SP nodes. The
control workstation and boot/install servers act as NFS servers to make
resources for network installation available to the nodes, which perform
explicit mounts during installation. The SP accounting system also uses
explicit NFS mounts to consolidate accounting information.
NFS is often used operationally to provide global file system services to users
and applications. Among the reasons for using NFS is the fact that it is part of
base AIX, it is well-known in the UNIX community, very flexible, and relatively
easy to configure and administer in small to medium-sized environments.
However, NFS also has a number of problems. They are summarized below
to provide a basis to compare NFS to other global file systems.
Performance: NFS Version 3 contains several improvements over NFS
Version 2. The most important change probably is that
NFS Version 3 no longer limits the buffer size to 8 kB,
improving its performance over high bandwidth networks.
Other optimizations include the handling of file attributes
and directory lookups and increased write throughput by
allowing writes to be performed asynchronously and committed later.
For reasons that will be discussed later, we recommend to use DFS rather
than AFS except when an SP is to be integrated into an existing AFS cell.
We, therefore, limit the following high-level description to DFS. Most of these
general features also apply for AFS, which has a very similar functionality.
After a general description of DFS, we point out some of the differences
between DFS and AFS that justify our preference of DFS.
The client component of DFS is the cache manager. It uses a local disk cache
or memory cache to provide fast access to frequently used file and directory
data. To locate the server that holds a particular fileset, DFS uses the fileset
location database (FLDB) server. The FLDB server transparently accesses the fileset location database, which records the file server on which each fileset resides.
The primary server component is the file exporter. The file exporter receives
data requests as DCE Remote Procedure Calls (RPCs) from the cache
manager and processes them by accessing the local file systems in which the
data is stored. DFS includes its own Local File System (LFS ) but can also
export other Unix file systems (although with reduced functionality). It
includes a token manager to synchronize concurrent access. If a DFS client
wants to perform an operation on a DFS file or directory, it has to acquire a
token from the server. The server revokes existing tokens from other clients
to avoid conflicting operations. By this, DFS is able to provide POSIX single
site semantics.
(Figure: DFS client and server components — on the client, the dfsd cache manager with its local cache and dfsbind; on the file server, the fxd file exporter with its token manager serving aggregates aggr1 through aggr3; and, alongside the DCE CDS and Security servers, the fileset location database machine running flserver with the FLDB.)
In summary, many of the problems related to NFS do not exist in DFS or have
a much weaker impact. DFS is, therefore, more suitable for use in a large
production environment. On the other hand, DCE administration is not easy
and requires a lot of training. The necessary DCE and DFS licenses also
cause extra cost.
It is obvious that DFS is well integrated with the other DCE core services;
whereas, AFS requires more configuration and administration work. DFS also
provides file system semantics that are superior to AFS. So, unless an
existing AFS cell is expanded, we recommend that DFS is used rather than
AFS to provide global file services.
SP Manuals
RS/6000 SP Planning Volume 2, Control Workstation and Software
Environment, GA22-7281. This manual gives you detailed explanations on
I/O devices.
SP Redbooks
Inside The RS/6000 SP, SG24-5145. NFS and AFS concepts are discussed in
this redbook.
PSSP 3.1 provides support for the RS/6000 Enterprise Server Models S70
and S7A, known as SP-attached servers. These high-end, PCI-based
RS/6000 servers are the first 64-bit SMP architecture nodes that attach
independently to the SP; they are simply too large to physically reside in an
SP frame.
The main section in this chapter is subdivided into the following five sections:
1. The system attachment of the SP-attached server to the SP is discussed
in “Hardware Attachment” on page 135.
2. Installation and configuration of an SP-attached server are discussed in
“Installation and Configuration” on page 146.
3. The PSSP support to SP-attached server is discussed in “PSSP Support”
on page 152.
4. User interface panels and commands are discussed in “User Interfaces”
on page 161.
5. Different attachment scenarios to the SP are discussed in “Attachment
Scenarios” on page 166.
Until now, all nodes in an SP environment resided within the slot location of
an SP frame. However, the SP-attached server is physically too large to
reside in an SP frame slot location as it is packaged in two side-by-side rack
units as shown in Figure 73 on page 137.
The first unit is a 22w x 41d x 62h-inch (56w x 104d x 157h-cm) Central
Electronics Complex (CEC). The CEC system rack contains:
• A minimum of one processor card and a maximum of three processor
cards with a 4-, 8-, or 12-way PowerPC processor configuration. The
system can contain up to a maximum of 12 processors sharing common
system memory.
• Each processor card has four 64-bit processors operating at 125 MHz or
262 MHz.
• A 4 MB ECC L2 cache memory per 125 MHz processor and an 8 MB per
262 MHz processor.
• System memory is controlled through a multiport controller that supports
up to 20 memory slots. All the system memory is contained in the system
rack up to a maximum of 16 GB.
• An operator panel that consists of the display unit, scroll up and down
push-button, an Enter button, and two indicator LEDs. The power on/off
button is also located on the operator panel. In addition, it contains a port
that can be used through an RS-232 cable to communicate to the S70.
The operator panel is used for selecting boot options and initiating system
dumps as well as for service functions and diagnostic support of the entire
system.
• Reliability from redundant fans, hot-swappable disk drives, power supplies
and fans, and a built-in service processor.
The second unit is a standard I/O rack similar in size to the CEC. Each I/O
rack accommodates up to two I/O drawers with a maximum of four drawers
per system. Up to three more I/O racks can be added to a system. The base
I/O drawer contains:
• Up to 14 PCI slots per drawer.
• Drawer zero reserves slots two and eight for support of system media.
• Service processor and hot-pluggable DASD.
Since the CEC and I/O racks are so large, the SP-attached server must be
attached to the SP system externally.
(Figure: the S70 attaches to the SP externally — a switch cable runs from the SP frame's switch to the S70, and 15 m control cable connections run from the CWS to the S70 for the s1term and SAMI interfaces.)
The following diagram outlines the two RS-232 connections to the S70
machine.
(Figure: one RS-232 cable from the CWS attaches to the S70's first serial port for S1 console access, and the other attaches through the SAMI/MI interface to the S70's operator panel and service processor.)
It is important to note that the size of the S70 prohibits it from being physically
mounted in the SP frame. Since the SP-attached server is mounted in its own
rack and is directly attached to the CWS using RS-232, the SP system must
view the SP-attached server as a frame. The SP-attached server is also
viewed as a node; because the PSSP code runs on the machine, it is
managed by the CWS, and you can run standard applications on the
SP-attached server. Therefore, the SP system views the SP-attached server
as an object with both frame and node characteristics.
Three CWS connections to the SP-attached server are required for hardware
control and software management:
• An Ethernet connection to the SP-LAN for system administration
purposes.
• A custom-built RS-232 cable connected from the S70 operator panel to a
serial port on the CWS. It is used to emulate operator input at the operator
panel and to monitor the S70 through the SAMI interface.
• A second RS-232 cable connected from another serial port on the CWS to
the first serial port of the S70. It provides s1term (console) connectivity.
CWS Considerations
In connecting the SP-attached server to the CWS, it is important to keep the
following CWS areas of concern in mind:
• When connecting the SP-attached frame to the system, you need to make
sure that the CWS has enough spare serial ports to support the additional
connections. However, it is important to note that there is one restriction
with the 16-port RS-232 connection. By design, it does not pass the
required ClearToSend signal to the SAMI port of the SP-attached server,
and, therefore, the 16-port RS-232 cannot be used for the RS-232
connectivity to the SP-attached server. The eight-port and the 128-port
varieties will support the required signal for connectivity to the
SP-attached server.
• There are two RS-232 attachments for each S70/S7A SP attachment. The
first serial port on the S70/S7A must be used for S1TERM connectivity.
• Floor placement planning to account for the effective usable length of
RS-232 cable.
The CWS-to-S70 connection cables are 15 meters in length, but only 11.5
meters is effective. So, the S70 must be placed at a distance where the
RS-232 cable to the CWS is usable.
• In a HACWS environment, there will be no S70 control from the backup
CWS. In the case where a failover occurs to the backup CWS, hardmon
and s1term support of the S70 is not available until fail back to the primary
CWS. The node will still be operational with switch communications and
SP Ethernet support.
Switch Considerations
In connecting the SP-attached server to the SP Switch, it is important to note
the following:
• The High Performance switch (HiPS) cannot be used with an SP-attached
server since this switch is not supported in PSSP 3.1.
• The S70/S7A servers will be the first, and currently the only, nodes
attached to the switch using an RS/6000 SP Attachment adapter.
(Figure: PCI slot usage in the first S70 I/O drawer — three slots are used by the system, leaving three 64-bit and five 32-bit slots available; drawing not to scale.)
Note
Each SP-attached S70 server must have its own PSSP 3.1 license, separately
chargeable against each S70’s serial number.
[Entry Fields]
* Start Frame [ ] #
* Frame Count [ ] #
* Starting Frame tty port [/dev/tty0]
* Starting Switch Port Number [ ] #
s1 tty port [ ]
* Frame Hardware Protocol [SAMI]
Re-initialize the System Data Repository no +
(The SDR Frame class contains these attributes: frame_number, tty, frame_type, MAC, b_MACN, slots, f_in_config, snn_index, switch_config, hardware_protocol, and s1_tty.)
• Node Class
The SDR Node class contains node-specific information used throughout
PSSP. Similarly, there will be an SDR Node object associated with the
SP-attached server.
SP frame nodes are assigned a node_number based on the algorithm
described in section 5.2.2, “SP-Attached Server Attachment” on page 137.
Likewise, the same algorithm is used to compute the node number of a
SP-attached server frame node where the SP-attached server occupies
the first and only slot of its frame. This means that for every SP-attached
server frame node, 16 node numbers will be reserved of which only the
first one will ever be used.
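As a worked example, node numbers follow the rule node_number = (frame_number - 1) x 16 + slot_number; so an SP-attached server defined as frame 5 occupies slot 1 and becomes node 65, while node numbers 66 through 80 of that frame remain reserved but are never used.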
The node number is the key value used to access a node object.
Some example entries of the Node class are outlined in Figure 81 on
page 154.
Figure 81. Entries of the Node Class for SP Nodes and SP-Attached Server
NodeControl Class
Type Capabilities Slots_used Platform_type Processor_type
65 Power,reset,tty,KeySwitch,LED,NetworkBoot 1 rs6k UP
33 Power,reset,tty,KeySwitch,LED,NetworkBoot 1 rs6k UP
10 Power,tty,LCD,NetworkBoot 1 chrp MP
Figure 83. Example of the NodeControl Class with the SP-Attached Server
The key link between the Node class and the NodeControl class is the
node type, which is a new attribute stored in the SDR Node object and serves
as the key into the NodeControl class (this relationship is shown in Figure 84
on page 156).
5.4.2 Hardmon
Hardmon is a daemon that is started by the System Resource Controller
(SRC) subsystem that runs on the CWS. It is used to control and monitor the
SP hardware (Frame, Switch, and Nodes) by opening a tty that
communicates using an internal protocol to the SP Frame Supervisor card
through a serial RS-232 connection between the CWS and SP Frame.
With PSSP 3.1, two new fields have been added to the SDR’s frame class:
hardware_protocol and s1_tty. They enable hardmon to determine the new
hardware that is externally attached to the SP and also what software
protocol must be used to communicate to this hardware.
Currently, the only two supported values for the hardware_protocol field are
SP and SAMI. However, these values are extensible for new hardware
protocol drivers that will emerge as more externally connected hardware is
supported.
Upon initialization, hardmon reads its entries in the SDR Frame class and
also examines the value of the hardware_protocol field to determine the type
of hardware and its capabilities. If the value read is SP, this indicates that SP
nodes are connected to hardmon through the SP’s Supervisor subsystem. A
value of SAMI is specific to the S70/S7A hardware since it is the SAMI
software protocol that allows the communication, both sending messages to and
receiving packet data from the S70/S7A’s Service Processor.
(Figure: hardmon on the CWS reads the SDR Frame class. For an SP frame (hardware_protocol = SP, tty = /dev/tty0, s1_tty = " "), it exchanges hardware control commands and packets of state data directly over the frame supervisor RS-232 tty. For an SP-attached server (hardware_protocol = SAMI, tty = /dev/tty2, s1_tty = /dev/tty1), hardmon communicates over sockets with an S70d daemon, which drives the SAMI operator panel port and the S1 console over the two RS-232 serial links to the S70's slot 1.)
It is important to note that only hardmon starts the S70 daemon and no other
invocation external to hardmon is possible. In addition, the parent hardmon
daemon starts a separate S70 daemon for each S70 frame configured in the
SDR Frame class.
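The exact invocation used by hardmon is not reproduced here; reconstructed from the parameter descriptions that follow (so the daemon path and argument order are assumptions), it takes a form similar to:
S70d -d 0 2 1 8 /dev/tty2 /dev/tty1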
where -d indicates the debug flag, 0 is the debug option, 2 is the frame
number, 1 is the slot number (which is always 1), 8 is the file descriptor of the
S70d’s side of the socket that is used to communicate with hardmon,
/dev/tty2 is the tty used to open the SAMI/MI operator panel port, and
/dev/tty1 is the serial tty used for the S1 console.
Dataflow
Hardmon requests are sent to the S70 daemon, where the command is
handled by one of two interface components of the S70 daemon: the frame
supervisor interface or the node supervisor interface.
The frame supervisor interface is responsible for keeping the state data in the
frame’s packet current and for formatting the frame packet for return to
hardmon. It accepts hardware control commands from hardmon that are
intended for the frame itself and passes on to the node supervisor interface
those commands intended to control the S70/S7A node.
The node supervisor interface polls state data from the S70/S7A hardware to
keep the state data in the node’s packet current. It translates the commands
received from the frame supervisor interface into the S70/S7A software
protocol and sends them through to the S70/S7A service processor.
If the hardmon command is intended for the frame, the frame supervisor
entity of the S70d handles it. If intended for the node, the node supervisor
entity converts it to SAMI protocol and sends it out the SAMI/MI interface file
descriptor as illustrated by Figure 86 on page 160.
(Figure 86: hardmon sends commands to the S70d daemon; frame-level commands are handled by its frame supervisor interface, and node-level commands are passed on to the node supervisor interface, which polls and controls the S70 through the SAMI/MI interface to the control panel and service processor.)
The S70 daemon uses SAMI protocol, which takes the form of 4-byte
command words, to talk to the S70’s Manufacturing Interface. This interface
communicates with the S70’s operator panel, which in turn communicates
with the S70’s Service Processor. It is the Service Processor that contains the
instruction that acts upon the request. Data returned to the S70 daemon
follows the reverse flow.
The hardware control type is determined from the SDR Node class as a
hardware_control_type attribute. This attribute is the key into the
NodeControl class. The NodeControl class will indicate the hardware
capabilities for monitoring. This relationship is illustrated in Figure 84 on page
156.
There are four new hardmon variables that will be integrated into the
Hardmon Resource Monitor for the SP-attached servers. They are
SRChasMessage, SPCNhasMessage, src, and spcn. Historical states such
as nodePower, serialLinkOpen, and type are also supported by the
SP-attached servers. The mechanics involved with the definition of these
variables are no different than with previous variables and can be viewed
through Perspectives and in conjunction with the Event Manager.
In order to recognize these new resource variables, the Event Manager must
be stopped and restarted on the CWS as well as on all the nodes in the
affected system partition.
5.5.1 Perspectives
As SP must now support nodes with different levels of hardware capabilities,
an interface was architected to allow applications, such as Perspectives, to
determine what capabilities exist for any given node and respond accordingly.
This interface will be included with a new SDR table, the NodeControl class.
The Perspectives interface needs to reflect the new node definitions: those
nodes that are physically not located on an SP frame and those nodes that do
not have full hardware control and monitoring capabilities.
There is a typical object representing the SDR Frame object for the
SP-attached server node in the Frame/Switch panel. This object has a unique
pixmap placement to differentiate it from a high and low frame, and this
pixmap is positioned according to its frame number in the Perspectives panel.
The monitored resource variables are handled the same as for standard SP
nodes. Operations, status, frame, and node information are handled the
same as for standard SP nodes.
These new resource variables will be integrated into the Hardmon Resource
Monitor for the SP-attached server:
• IBM.PSSP.SP_HW.Node.SRChasMessage
• IBM.PSSP.SP_HW.Node.SPCNhasMessage
• IBM.PSSP.SP_HW.Node.src
• IBM.PSSP.SP_HW.Node.spcn
In order to recognize these new resource variables, the Event Manager must
be stopped and restarted on the CWS and all the nodes in the affected
system partition.
• spmon: The following excerpt of hardware diagnostics output (as produced, for example, by spmon -d) queries the frames and checks the state of each node:
3. Querying frame(s)
2 frame(s)
Check ok
4. Checking frames
5. Checking nodes
--------------------------------- Frame 1 -----------------------
Frame Node Node Host/Switch Key Env Front Panel LCD/LED is
Slot Number Type Power Responds Switch Fail LCD/LED Flashing
-------------------------------------------------------------------------------
1 1 high on yes no normal no LCDs are blank no
5 5 high on yes no normal no LCDs are blank no
9 9 high on yes no normal no LCDs are blank no
13 13 high on yes no normal no LCDs are blank no
• splstdata
Figure 89 on page 165 is the output of splstdata -n; it shows two
frames. Figure 90 on page 165 shows the output from splstdata -f,
where the S70 is shown as a second frame, as well as the hardware
description of each node in the SP system.
(Figure 89: output of splstdata -n — List Node Configuration Information.)
(Figures: attachment scenarios — an SP frame with nodes 1 through 14 (slots 15 and 16 empty) and its switch, with the S70 attached by a switch cable as a separate frame occupying only slot 1; in the examples shown, servers defined as frame 2 and frame 5 become nodes 17 and 65, cabled to an available switch port of frame 1.)
Note that the switch cable from frame one connects to the S70; for example,
in this case, slot one frame five connects to switch port three of switch chip
five.
(Figure: a scenario with SP frames 1, 2, and 3 (first-slot nodes 1, 17, and 33) and SP-attached servers defined as frames 10 and 15, which become nodes 145 and 225.)
SP Manuals
Chapter 15 "SP-attached Servers" in IBM RS/6000 SP: Planning, Volume 2,
Control Workstation and Software Environment , GA22-7281 provides some
additional information regarding SP-attached servers.
SP Redbooks
Chapter 4 "SP-Attached Server Support" in PSSP 3.1 Announcement ,
SG24-5332 provides some additional information on this topic.
In a system environment the server first identifies and authenticates the client
and then checks its authorization for the function requested.
In an SP system, there are at least two levels of security: AIX and PSSP.
Kerberos, which comes bundled with PSSP, has been entrusted to perform
authentication in SP environments.
The main security concern with this authentication for the above commands
is the fact that passwords are sent in plain text over the network. They can be
easily captured by any root user on a machine that is on the network(s)
through which the connection is established.
rcp, rlogin, and rsh: The current user name (or a remote user name specified as a
command line flag) is used, and the user is prompted for a
password. Alternatively, a client can be authenticated by its IP
name/address if it matches a list of trusted IP names/addresses
that are stored in files on the server:
• /etc/hosts.equiv lists the hosts from which incoming (client) connections
are accepted. This works for all users except root (UID=0).
• $HOME/.rhosts lists additional hosts, optionally restricted to specific
userids, which are accepted for incoming connections. This is on a
per-user basis and also works for the root user.
With AIX v4.3.1, all these commands except rexec also support Kerberos
Version 5 authentication. The base AIX operating system does not include
Kerberos. It is recommended that DCE for AIX Version 2.2 is used to provide
Kerberos authentication. Note that previous versions of DCE did not make
the Kerberos services available externally. However, DCE for AIX
Version 2.2, which is based on OSF DCE Version 1.2.2, provides the
complete Kerberos functionality as specified in RFC 1510, The Kerberos
Network Authentication Service (V5) .
For backward compatibility with PSSP 3.1 (which still requires Kerberos
Version 4 for its own commands), the AIX r-commands rcp and rsh also
support Kerberos Version 4 authentication. See 6.5, “How Kerberos Works”
on page 179 for details on Kerberos.
Note
On the SP, the chauthent command should not be used directly. The
authentication methods for SP nodes and the control workstation are
controlled by the partition-based PSSP commands chauthpar and
lsauthpar. Configuration information is stored in the Syspar SDR class, in
the auth_install, auth_root_rcmd and auth_methods attributes.
Further descriptions of the AFS and Sysctl security systems are included in
the later parts of this chapter.
• Provides security on remote commands, such as rsh, rcp, dsh, and sysctl.
Descriptions of these commands are in the following table.
rsh: The remote shell command. On PSSP V3.1, this command is no
longer SP-provided; the /usr/lpp/ssp/rcmd/bin/rsh command is linked
to the Berkeley command /usr/bin/rsh (which uses the .rhosts file).
rcp: Remote copying of files between local and remote hosts. On
PSSP V3.1, this command is no longer SP-provided; the
/usr/lpp/ssp/rcmd/bin/rcp command is now linked to the
Berkeley command /usr/bin/rcp.
sysctl: Uses the SP authentication service. When the client issues the
sysctl command, a Kerberos ticket is sent to the server to
validate the identity of the client.
Authentication Server (primary and secondary): Host with the Kerberos database.
This host provides the tickets for the principals to use. When running
the setup_authent program, authentication services are initialized.
At this stage, a primary authentication server must be nominated
(this may be the CWS). A secondary authentication server may
then be created later to serve as a backup server.
Ticket Cache File: File that contains the Kerberos tickets for a
particular Kerberos principal and AIX ID.
kerberos: This daemon serves ticket requests from clients. There can be more
than one kerberos daemon running in the realm to provide faster
service, especially when there are many client requests.
kadmind: This daemon only runs on the primary authentication server
(usually the CWS). It is responsible for serving the Kerberos
administrative tools, such as changing passwords and adding
principals. It also manages the primary authentication database.
kpropd: This daemon only runs on secondary authentication database
servers. When the daemon receives a request, it synchronizes the
Kerberos secondary server database. The databases are
maintained by the kpropd daemon, which receives the database
content in encrypted form from a program, kprop, which runs on
the primary server.
/.k: The master key cache file. It contains the DES key
derived from the master password. The DES key is
saved in the /.k file using the
/usr/lpp/ssp/kerberos/etc/kstash command. The
kadmind daemon reads the master key from this file
instead of prompting for the master password.
After changing the master password, run the kstash
command to re-create the /.k file with the new master
key, and then stop and restart the kadmind daemon.
/var/adm/SPlogs/kerberos/kerberos.log: This file records the kerberos daemon’s
process IDs and messages from its activities.
The authentication daemons run under SRC control, so they can be stopped
and restarted with the standard SRC commands, for example:
startsrc -s kerberos
stopsrc -s kadmind
startsrc -s kadmind
6.9.2 Change the Attributes of the Kerberos Principal
To change a password for a principal in the authentication database, a PSSP
authentication database administrator can use either the kpasswd command or
the kadmin program’s change_password subcommand. You can issue these
commands from any system running SP authentication services and do not
require a prior k4init.
To change your own admin instance password, you can use either the kpasswd
command or the kadmin program’s change_admin_password subcommand.
In addition to changing the password, you may want to change either the
expiration date of the principal or its maximum ticket lifetime, though these
are not so likely to be necessary. To do so, the root user on the primary
authentication database system must use the kdb_edit command just as
when adding new principals locally. Instead of not finding the specified
principal, the command finds it already exists and prompts for changes to all
its attributes starting with the password followed by the expiration date and
maximum ticket lifetime.
Use the chkp command to change the maximum ticket lifetime and expiration
date for Kerberos principals in the authentication database. When logged into
a system that is a Kerberos authentication server, the root user can run the
chkp command directly. Additionally, any users who are Kerberos database
administrators listed in the /var/kerberos/database/admin_acl.mod file can
invoke this command remotely through a sysctl procedure of the same name.
The administrator does not need to be logged in on the server host to run chkp
through sysctl but must have a Kerberos ticket for that admin principal
(name.admin).
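For illustration only (the principal name is hypothetical, and the exact chkp flag syntax should be verified against the PSSP command reference):
kpasswd -n carlos.admin
chkp -l <max_ticket_lifetime> -e <expiration_date> carlos.admin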
The following are the procedures to delete a principal through the kdb_util
command and its subcommands.
1. The root user on the primary authentication server must edit a backup
copy of the database and then reload it with the changed database. For
example, in order to keep a copy of the primary authentication database to
a file named slavesave in the /var/kerberos/database directory, enter the
command: kdb_util dump /var/kerberos/database/slavesave
2. Edit the file by removing the lines for any unwanted principals.
3. Reload the database from the backup file by entering the command:
kdb_util load /var/kerberos/database/slavesave
The kpropd daemon, which always runs on the secondary server,
automatically performs updates on the secondary server database.
The kpropd daemon is activated when the secondary server boots up. If the
kpropd daemon becomes inactive, it may be automatically reactivated by the
AIX System Resource Controller (SRC). That is, it may be restarted by using
the startsrc command. The history of restarting the daemon is kept in the log
file called /var/adm/SPlogs/kerberos/kprod.log.
The following commands are the primary clients to the hardware control
subsystem:
• hmmon: Monitors the hardware state.
• hmcmds: Changes the hardware state.
• s1term: Provides access to the node’s console.
• nodecond: For network booting, uses hmmon, hmcmds, and s1term.
• spmon: Some parameters are used to monitor; some are used to change the
hardware state. The spmon -open command opens an s1term connection.
# k4list -srvtab
Server key file: /etc/krb-srvtab
Service Instance Realm Key Version
------------------------------------------------------
hardmon sp4cw0 MSC.ITSO.IBM.COM 1
rcmd sp4cw0 MSC.ITSO.IBM.COM 1
hardmon sp4en0 MSC.ITSO.IBM.COM 1
rcmd sp4en0 MSC.ITSO.IBM.COM 1
Authorization to use the hardware control subsystem is controlled through the
hardmon ACLs file on the control workstation. Each line in that file lists an
object, a Kerberos principal, and the associated
permissions. Objects can either be host names or frame numbers. By default,
PSSP creates entries for the control workstation and for each frame in the
system, and the only principals that are authorized are root.admin and the
instance of hardmon for the SP Ethernet adapter. There are four different sets
of permissions indicated by a single lowercase letter:
• m (Monitor) - monitor hardware status
• v (Virtual Front Operator Panel) - control/change hardware status
• s (S1) - access to node’s console through the serial port (s1term)
• a (Administrative) - use hardmon administrative commands
Note that for the control workstation, only administrative rights are granted.
For frames, the monitor, control, and S1 rights are granted. These default
entries should never be changed. However, other principals might be added.
For example, a site might want to grant operating personnel access to the
monitoring facilities without giving them the ability to change the state of the
hardware or access the nodes’ console.
(Figure: before PSSP 3.1, SP system management, boot/install, and dsh used the SP-provided Kerberos V4 rsh client /usr/lpp/ssp/rcmd/bin/rsh, which talked to the SP Kerberos V4 rsh daemon /usr/lpp/ssp/rcmd/etc/kshd, while user-issued commands used the standard AIX rsh /usr/bin/rsh and the AIX rsh daemon /usr/sbin/rshd.)
In PSSP v3.1, the authenticated r-commands in the base AIX 4.3.2 operating
system are used instead. They can be configured for multiple authentication
methods including the PSSP implementation of Kerberos Version 4. To allow
applications that use the full PSSP paths to work properly, the PSSP
commands rcp and remsh/rsh have not been simply removed but have been
replaced by links to the corresponding AIX commands. This new calling
structure is shown in Figure 97 on page 194.
(Figure 97: with PSSP 3.1, the SP parallel management commands, dsh, SP system management, and boot/install call /usr/lpp/ssp/rcmd/bin/rsh, which is now a symbolic link to the AIX rsh /usr/bin/rsh; depending on the authentication method, requests are served by the AIX Kerberos V5/V4 rsh daemon /usr/sbin/krshd or the standard AIX rsh daemon /usr/sbin/rshd.)
When Kerberos Version 4 is used, the client obtains a
Kerberos Version 4 rcmd service ticket but ignores the -f and -F
flags since Version 4 Ticket-Granting Tickets are not forwardable.
These requests are then processed by the rshd and krshd daemons.
Be aware that the daemon itself does not call the get_auth_method()
subroutine to check if STD is among the authentication methods. The
chauthent command simply removes the shell service from the /etc/inetd.conf
file when it is called without the -std option; so, inetd will refuse connections
on the shell port. But if the shell service is enabled again by editing
/etc/inetd.conf and refreshing inetd, the rshd daemon will honor requests
even though lsauthent still reports that Standard AIX authentication is
disabled.
After checking if the requested method is valid, the krshd daemon then
processes the request. This, of course, depends on the protocol version.
The daemon then calls the kvalid_user() subroutine, from libvaliduser.a, with
the local user name (remote user name from the client’s view) and the
principal’s name. The kvalid_user() subroutine checks if the principal is
authorized to access the local AIX user’s account. Access is granted if one of
the following conditions is true:
1. The $HOME/.k5login file exists and lists the principal (in Kerberos form).
2. The $HOME/.k5login file does not exist, and the principal name is the
same as the local AIX user’s name.
Case (1) is what is expected. But be aware that case (2) above is quite
counter-intuitive: It means that if the file does exist and is empty, access is
denied, but if it does not exist, access is granted. This is completely reverse
to the behavior of both the AIX $HOME/.rhosts file and the Kerberos
Version 4 $HOME/.klogin file. However, it is documented to behave this way
(and actually follows these rules) in both the kvalid_user() man page and the
AIX Version 4.3 System Management Guide: Communications and Networks,
SC23-4127.
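The following ksh fragment is only an illustrative sketch of these rules (the principal, realm, and user names are hypothetical; the real check is performed inside the kvalid_user() subroutine, not by a shell script):
principal="carlos@MSC.ITSO.IBM.COM"   # Kerberos V5 principal requesting access
localuser="carlos"                    # local AIX account being accessed
homedir=$(lsuser -a home $localuser | cut -d= -f2)
if [ -f "$homedir/.k5login" ]; then
    # The file exists: the principal must be listed in it
    grep "^$principal$" "$homedir/.k5login" > /dev/null && echo granted || echo denied
else
    # The file does not exist: access is granted only if the principal
    # name matches the local user name
    [ "${principal%%@*}" = "$localuser" ] && echo granted || echo denied
fi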
If the authorization check has passed, the krshd daemon checks if a Kerberos
Version 5 TGT has been forwarded. If this is the case, it calls the k5dcelogin
command that upgrades the Kerberos TGT to full DCE credentials and
executes the command in that context. If this k5dcelogin cannot be done
because no TGT was forwarded, the user’s login shell is used to execute the
command without full DCE credentials.
Note: DFS home directories
Note that this design may cause trouble if the user’s home directory is
located in DFS. Since the kvalid_user() subroutine is called by krshd before
establishing a full DCE context via k5dcelogin, kvalid_user() does not have
user credentials. It runs with the machine credentials of the local host, and
so can only access the user’s files if they are open to the other group of
users. The files do not need to be open for the any_other group (and this
would not help, either), since the daemon always runs as root and so has
the hosts/<ip_hostname>/self credentials of the machine.
The daemon then checks the Kerberos Version 4 $HOME/.klogin file and
grants access if the principal is listed in it. This is all done by code provided
by the PSSP software, which is called by the base AIX krshd daemon. For
this reason, Kerberos Version 4 authentication is only available on SP
systems not on normal RS/6000 machines.
Note: rcmdtgt
PSSP 3.1 still includes the /usr/lpp/ssp/rcmd/bin/rcmdtgt command, which
can be used by the root user to obtain a ticket-granting ticket by means of
the secret key of the rcmd.<localhost> principal stored in /etc/krb-srvtab.
To work around this problem, PSSP uses the authenticated rsh command to
temporarily add the boot/install server’s root user to the .rhosts file of the
control workstation and removes this entry after network installation.
PSSP: Installation and Migration Guide, GA22-7347 explains the steps which
are required to initially set up SP security using an AFS server, and PSSP:
Administration Guide, SA22-7348 describes the differences in the
management commands of PSSP Kerberos and AFS Kerberos.
AFS uses a different set of protocols, utilities, daemons, and interfaces for
principal database administration.
Table 12 contains some commands that may be used for managing AFS.
Table 12. Some Commands for Managing AFS
Commands Description
kas For adding, listing, deleting, and changing the AFS principal’s
attributes.
kas has corresponding subcommands, which are as follows:
examine (for displaying Principal’s information).
create (for adding Principals and setting passwords).
setfields (for adding an authentication administrator and for
changing Principal passwords and attributes).
delete (for deleting Principals).
klog.krb The user interface to get Kerberos tickets and AFS tokens.
• Authenticates the clients.
• Decodes the service ticket.
• Performs an authorization callback.
• Executes commands as root.
SP Redbooks
Inside the RS/6000 SP, SG24-5145. Section 4.6 of Chapter 4 is on SP
Security. It gives a good overview of Kerberos, AFS, and sysctl.
Study Guides
IBM Global Services, RS/6000 SP, System Administration: Course Code
AU96. (Unit 1 is on managing Kerberos authentication in the SP
environment.) This book covers what Kerberos is used for on the SP, how to
manage Kerberos principal authentication, how to keep Kerberos secure, and
considerations on authentication server backup and recovery. Unit 5 is on
working with the sysctl security system. Appendix C covers an overview of
AFS authentication.
B. The partition-sensitive daemons and file collection.
C. hardmon and NIM.
D. hardmon and sysctl.
3. The /etc/krb-srvtab file contains:
A. The ticket-granting ticket (TGT).
B. The list of principals authorized to invoke remote commands.
C. The master key encrypted with the root.admin password.
D. The private Kerberos keys for local services.
4. Which of the following is not a Kerberos client in a standard PSSP
implementation?
A. IBM SP Perspectives.
B. The hardmon daemon.
C. Remote shell (rsh).
D. The system control facility (sysctl).
This chapter covers user management that consists of adding, changing, and
deleting users on the SP system and how to control user login access.
Data and user management using the file collections facility is also
discussed. File collection provides the ability to have a single point of update
and control of file images that will then be replicated across nodes.
AMD and the AIX Automounter are also discussed. These allow users local
access to any files and directories no matter which node they are logged in
to.
The following are the steps performed when adding an SP user by entering smit spmkuser (a command-line sketch follows this list):
• Check /usr/lpp/ssp/bin/spmkuser.default file for defaults for primary
group, secondary groups, and initial programs.
• The user’s home directory default location is retrieved from the SDR
SP class (homedir_server and homedir_path attributes).
• spmkuser only supports the following user attributes: id, pgrp, home (as in
hostname:home_directory_path format), groups, gecos, shell, and
login.
• A random password is generated and is stored in the
/usr/lpp/ssp/config/admin/newpass.log file.
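For illustration, an equivalent command-line invocation might look like this (the user name and attribute values are hypothetical; the attribute=value form follows the AIX mkuser convention):
spmkuser id=218 pgrp=1 groups=staff home=sp3en0:/home/sp3en0/spuser1 spuser1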
Figure 100 shows the output screen for changing the characteristics of an SP
user. All value fields can be changed except the name field. Nodes will pull the
SPUM file collection from the CWS and update their configuration.
[Entry Fields]
* User NAME spuser1
User ID [218] #
LOGIN user? +
PRIMARY group [1] +
Secondary GROUPS [staff] +
HOME directory [sp3en0:/home/sp5en0/sp>
Initial PROGRAM [/bin/ksh] /
User INFORMATION []
Remove a User
[Entry Fields]
Remove AUTHENTICATION information? No +
Remove HOME directory? No +
The following example shows how to list SP users with the spluser command:
spluser spuser1
The output will appear as the following:
spuser1 id=201 pgrp=1 groups=staff home=/u/spuser1 on
sp3en0:/home/sp3en0/spuser1 shell=/bin/ksh gecos= login=true
From a user’s point of view, the password and user credentials are the same
throughout the network. This means that the user only needs to maintain one
password. When the user’s home directory is maintained on one machine and
made available through NFS, the user’s environment is also easier to
maintain.
By default, the NIS master server maintains the following files that should
contain the information needed to serve to the client systems.
/etc/ethers
/etc/group
/etc/hosts
/etc/netgroup
/etc/networks
/etc/passwd
/etc/protocols
/etc/publickey
/etc/rpc
Any changes to these files must be propagated to clients and slave servers
using SMIT:
Either specify a particular NIS map by entering the name representing the
filename or leave the default value of all, then press Enter. You can also do
this manually by changing to the directory /etc/yp and entering the command
make all or make <map-name>. This propagates the maps to all NIS clients and
transfers all maps to the slave servers.
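For example, after changing user entries on the master server, the passwd maps can be rebuilt and pushed as follows (the map name is shown for illustration):
cd /etc/yp
make passwd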
You may configure a slave server with the smit mkslave command.
Configuring a slave server starts the ypbind daemon, which searches for a
server in the network running ypserv. Shortly afterwards, the ypserv daemon
of the slave server itself will start.
In a lot of situations, the slave server must also be able to receive and serve
login requests. If this is the case, the slave server must also be configured as
an NIS client.
You may also configure a client using SMIT by entering smit mkclient
on every node, or edit the appropriate entries in the script.cust file. This
can be done at installation time or later by changing the file and then
doing a customized boot on the appropriate node.
File collections are sets of files or directories that simplify the management of
duplicate or common files on multiple systems, such as SP nodes. A file
collection can be any set of regular files or directories. PSSP is shipped with
a program called /var/sysman/supper, which is a Perl program that uses the
Software Update Protocol (SUP) to manage the SP file collections.
When configuring the SDR, you are asked if you want to use this facility.
When answered affirmatively, the control workstation configures a
mechanism for you that will periodically update the system configuration files
(you specify the interval). The files included in that configuration are:
• All files in the directory /share/power/system/3.2.
• Some of the supper files.
• The AMD files
• The user administration files (/etc/group, /etc/passwd, and
/etc/security/group).
• /etc/security/passwd.
A primary file collection can contain a secondary file collection. For example,
the power_system file collection is a primary file collection that consists of the
secondary file collection, node.root. This means that power_system can be
installed onto a boot/install server, and all of the files that have been defined
within that file collection will be installed on that boot/install node including
those in node.root. However, the files in node.root would not be available on
that node because they belong to a secondary file collection. They can,
however, be served to another node. This avoids having to install the files in
their real or final location.
For example, if you want to have one .profile on all nodes and another .profile
on the control workstation, consider using the power_system collection
delivered with the IBM Parallel System Support Programs for AIX. This is a
primary collection that contains node.root as a secondary collection.
• Copy .profile to the /share/power/system/3.2 directory on the control
workstation.
• If you issue supper install power_system on the boot/install server, the
power_system collection is installed in the /share/power/system/3.2
directory. Because the node.root files are in that directory, they cannot be
executed on that machine but are available to be served from there. In this
case, .profile is installed as /share/power/system/3.2/.profile.
• If you issue supper install node.root on a processor node, the files in
node.root collection are installed in the root directory and, therefore, can
be executed. Here, /share/power/system/3.2/.profile is installed from the
file collection as /.profile on the node.
Secondary file collection is useful when you need a second tier or level of
distributing file collections. This is particularly helpful when using boot/install
servers within your SP or when partitioning the system into groups.
This file collection is important because it contains the files that define the
other file collections. It also contains the file collection programs used to load
and manage the collections. Of particular interest in this collection are:
• /var/sysman/sup, which contains the directories and master files that
define all the file collections in the system.
• /var/sysman/supper, which is the Perl code for the supper tool.
• /var/sysman/file.collections, which contains entries for each file
collection.
The collection also includes the password index files that are used for login
performance.
The node.root file collection is available on the control workstation and the
boot/install servers under the power_system collection so that it can be
served to all the nodes. You do not install node.root on the control workstation
because the files in this collection might conflict with the control workstation's
own root files.
The control workstation is normally the master server for all of the default file
collections. That is, a master copy of all files in the file collections originates
from the control workstation. The /var/sysman/sup directory contains the
master files for the file collections.
An example of an individual file collection with its directory and master files is
illustrated in Figure 103 on page 219. It shows the structure of the
/var/sysman/sup/sup.admin file collection.
serve (issued on a boot/install server or the CWS): Lists all the collections that
can be served from your machine.
Run the supper update command, first on any secondary server, then on
clients. The supper update command may be included in the crontab file to run
regularly.
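As a sketch, a root crontab entry on a node that refreshes the user.admin collection hourly could look like this (the collection name and schedule are illustrative):
0 * * * * /var/sysman/supper update user.admin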
A system-created refuse file in the collection’s directory contains a list of all the
files excluded during the update process. If no files for this collection are listed
in the refuse file in the /var/sysman/sup directory, the refuse file in the
collection’s directory will have no entries.
There are seven steps in building a file collection, and you must be root to
perform all of them. A short command sketch follows the list.
1. Identify files you wish to collect. For example, it has been decided that
program files (which are graphic tools) in /usr/local directory are to be
included on all nodes.
2. Create the file collection directory. In this case, create the
/var/sysman/sup/tools directory.
3. Create the master files: list, prefix, lock, and supperlock. Pay attention
to the list file, which consists of rules for including and excluding files in that
directory. The lock and supperlock files must be empty.
4. Add a link to the lists file in the /var/sysman/sup directory. For example,
ln -s /var/sysman/sup/tools/list /var/sysman/sup/lists/tools.
5. Update the file.collections file. Add the name of the new file collection as
either a primary or secondary collection.
6. Update the .resident file by editing the .resident file on your control
workstation or your master server and add your new collection, if you want
to make it resident, or use the supper install command.
7. Build the scan file by running supper scan. The scan file only works with
resident collections. It consists of an inventory list of files in the collection
that can be used for verification and eliminates the need for supper to do a
directory search on each update.
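The following sketch shows steps 2 through 4 and step 7 for the tools collection as commands (the contents of the list and prefix files are not shown, and passing the collection name to supper scan is an assumption):
mkdir /var/sysman/sup/tools
touch /var/sysman/sup/tools/lock /var/sysman/sup/tools/supperlock
ln -s /var/sysman/sup/tools/list /var/sysman/sup/lists/tools
/var/sysman/supper scan tools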
There are four steps involved, and you must be root to perform installation of
a new file collection.
BSD AMD was used by PSSP 1.2, 2.1, and 2.2. The AIX Automounter has been
used from PSSP 2.3 onwards.
The AMD daemon runs on the CWS, the boot/install servers, and all nodes of the SP
system. It monitors specified directory mount points, and when a file I/O
operation is requested to that mount point, it performs the RPC call to
complete the NFS mount to the server specified in the automount map files.
Any mount point directories that do not already exist on the client will be
created. After a period of inactivity, two minutes by default, the automount
daemon will attempt to unmount any mounted directories under its control.
The mounted directories can come from SP boot/install servers or any
workstation or server on the network.
AMD is not an IBM product, and information about it can be found in the
compressed file named /usr/lpp/ssp/public/amd_up102.tar.Z.
There are two types of file maps. These are indirect maps and direct maps.
• Indirect maps are useful for commonly-used, higher-level directories,
such as /home.
• Direct maps are useful when a directory cannot be dedicated to the
automounter, such as /usr.
AMD can be enabled by entering the command smit enter_data and selecting
true for AMD configuration. The system then runs the amd_config Perl
script. This script, located in the /usr/lpp/ssp/install/bin directory, adds
the amd_start script to /etc/rc.sp.
The automounter reduces the number of mounts on a given system and lowers the
probability of problems caused by NFS file server outages.
On the SP, the Automounter may be used to manage the mounting of home
directories. It can be customized to manage other directories as well. It
makes system administration easier because it only requires modification of
the automount map files.
The SP will configure and run the native AIX automounter on the newer
nodes containing PSSP 2.3 and later releases and the BSD AMD daemon on
the older nodes containing PSSP 2.2.
SP Manuals
PSSP: Administration Guide, SA22-7348. Chapter 4 describes file collections
thoroughly, covering the concepts, how to create file collections, how they
work, and so forth. Chapter 5 gives a detailed description of managing user
accounts, covering how to set up SP users and how to change, delete, and
list them.
SP Redbooks
Inside the RS/6000 SP, SG24-5145. Chapter 4, Section 4.8 contains a
description of file collections covering their definition, building,
installation, organization, maintenance, and so forth. Section 4.9 covers
managing the AIX Automounter and how it differs from the BSD Automounter.
Study Guides
IBM Global Services, RS/6000 SP, System Administration: Course Code
AU96. Unit 2 describes managing user accounts, covering considerations
for setting up and administering users in a distributed system and setting
up login control. Unit 3 describes managing user directories, covering
automounting of NFS file systems and the usage and setup of the AMD
Automounter. Unit 4 covers data management: file collection concepts, how
to work with and manage file collections, how to build and install them,
the difference between using NIS and file collections, and so forth.
This chapter addresses various topics related to the initial configuration of the
CWS: preparation of the environment, copying the AIX and PSSP lpps from the
distribution media to the CWS disks, and initialization of the Kerberos services
and the SDR. These topics are not listed in the chronological order of the CWS
configuration process. Rather, they are gathered by category: PSSP
commands, configuration files, environment requirements, and lpp
considerations.
The initial configuration of the CWS is the part of the PSSP installation where
you prepare the environment before you start configuring the PSSP software.
It consists of several steps:
1. You need to update the AIX system environment: You have to modify the
PATH of the root user, change the maximum number of processes allowed
by AIX, customize a few system files, such as /etc/services, and check
that some system daemons are running.
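As a hedged illustration of this step (the values shown are assumptions, not
required settings), the environment update could look like:
# add the PSSP directories to root's PATH (for example, in /.profile)
export PATH=$PATH:/usr/lpp/ssp/bin:/usr/lpp/ssp/kerberos/bin
# raise the maximum number of processes allowed per user
chdev -l sys0 -a maxuproc=256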
The tasks described in steps one through four can be executed in any order.
Steps 5, 6, and 7 must be performed in this order after all other steps.
The following sections describe in more detail the commands, files, and
concepts related to these seven steps.
8.3.1 setup_authent
This command has no argument. It configures the Kerberos authentication
services for the SP system. The command, setup_authent, first searches the
AIX system for Kerberos services already installed, checks for the existence
of Kerberos configuration files, and then enters an interactive dialog where
you are asked to choose and customize the authentication method to use for
the management of the SP system. You can choose to use the SP-provided
Kerberos services, another already existing Kerberos V4 environment, or an
AFS authentication server.
8.3.2 install_cw
This command has no argument. It is used after the PSSP software has been
installed on the CWS and after the Kerberos authentication services have
been initialized. The command, install_cw, performs the initial customization
of PSSP onto the CWS (setup of PSSP SMIT panels, initialization of the SDR,
and so on), configures the default partition, and starts the SP daemons
necessary for the following steps of the SP installation.
8.4.2 /etc/inittab
This file is used to define several commands that are to be executed by the
init command during an RS/6000 boot process.
8.4.3 /etc/inetd.conf
On the CWS, the inetd daemon configuration must contain the uncommented
entries bootps and tftp. If they are commented prior to the PSSP installation,
you must uncomment them manually. The PSSP installation scripts will not
check or modify these entries.
8.4.4 /etc/rc.net
For improving networking performance, you can modify this file on the CWS
to set network tunables to the values that fit your SP system by adding the
following lines:
# additions for tuning of SP-PSSP system
no -o thewall=16384
no -o sb_max=163840
no -o ipforwarding=1
no -o tcp_sendspace=65536
no -o tcp_recvspace=65536
no -o udp_sendspace=32768
no -o udp_recvspace=65536
no -o tcp_mssdflt=1448
The rc.net file is also the recommended location for setting any static routing
information. In particular, the CWS needs to have IP connectivity to each of
the SP nodes’ en0 adapter during the installation process. In the case where
the CWS and all node en0 adapters are not on the same Ethernet segment
(for example, when there are several frames), the rc.net file of the CWS can
be modified to include a routing statement.
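As a sketch, using the addresses of our test environment, where node sp3n01
(en0 at 192.168.3.11) routes to the 192.168.31.0 segment, such a statement
could look like:
# reach the second Ethernet segment through node sp3n01 (illustrative addresses)
/usr/sbin/route add -net 192.168.31.0 -netmask 255.255.255.0 192.168.3.11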
8.5.1 Connectivity
During normal operation, the TCP/IP connectivity needed for user
applications between the CWS and the SP nodes can be provided through
any type of network (Token Ring, FDDI, ATM) supported by the RS/6000
hardware. However, for the installation and the management of the SP nodes
from the CWS, there must exist an Ethernet network connecting all SP nodes
to the CWS. This network may consist of several segments. In this case, the
routing between the segments is provided either by one (or several) SP
node(s) with multiple Ethernet adapters, Ethernet hubs, or Ethernet routers.
In the case that SP-Attached Servers (RS/6000 S70 or S7A) are included in
the SP system, two serial cables are needed to link the CWS to each of the
servers. An Ethernet connection between the CWS and the server, configured
on the server's en0 adapter, is also mandatory.
Also, note that the CWS cannot be connected to an SP Switch (no css0
adapter in the CWS).
Keep in mind that this rule provides only a very rough estimate. As a point of
comparison, the minimum system image (spimg) provided with PSSP is 91
MB versus an estimated 700 MB for the system images considered in this
rule of thumb.
/spdata/sys1/install/
/spdata/sys1/install/<source_name>
/spdata/sys1/install/<source_name>/lppsource
/spdata/sys1/install/aix431
/spdata/sys1/install/aix431/lppsource
/spdata/sys1/install/aix432
/spdata/sys1/install/aix432/lppsource
/spdata/sys1/install/images
/spdata/sys1/install/pssp
/spdata/sys1/install/pssplpp
/spdata/sys1/install/pssplpp/PSSP-2.2
/spdata/sys1/install/pssplpp/PSSP-2.3
/spdata/sys1/install/pssplpp/PSSP-2.4
/spdata/sys1/install/pssplpp/PSSP-3.1
The installable images (lpp) of the AIX systems must be stored in directories
named /spdata/sys1/install/<source_name>/lppsource. You can set
<source_name> to the name you prefer. However, it is recommended to use a
name identifying the version of the AIX lpps stored in this directory. The
names generally used are aix421, aix431, and so on.
Except for <source_name>, the name of all directories listed in Figure 104 must
be left unchanged.
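As a hedged illustration (the directory name and the installation device are
assumptions), an lppsource directory can be created and populated from the AIX
media with:
# create the lppsource directory for AIX 4.3.2 and copy the filesets from CD
mkdir -p /spdata/sys1/install/aix432/lppsource
bffcreate -qvX -t /spdata/sys1/install/aix432/lppsource -d /dev/cd0 all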
In Section 8.5.2.1, “/spdata Size and Disk Allocation” on page 240, we have
mentioned one possibility of allocation of /spdata based on a backup strategy.
We now present another possibility based on the contents of the
subdirectories of /spdata. Instead of dedicating a volume group to /spdata,
you can spread the information contained in the /spdata directory between
the rootvg and another volume group (for example, let us call it spstdvg). All
information that can be easily recreated is stored in spstdvg, while all
information that is manually created during the installation of the SP system
is stored in rootvg. The advantage of this solution is to enable the backup of
critical SP information along with the rest of the AIX system backup using the
mksysb command, while all information that is not critical can be backed up
independently and with a different backup frequency (maybe only once, at
installation time). These directories are then mounted over their mount points
in rootvg.
For installation on AIX releases earlier than or equal to 4.2, you also need to
install the bos.info.* filesets.
Perl: ssp.perlpkg
Sysctl: ssp.sysctl
Perspectives: ssp.gui
SP Manuals
You can refer to two sets of documents related to either Version 2.4 or
Version 3.1 of PSSP:
To help you understand the use of each command, they are presented in
association with the installation step in which they are performed and in the
order in which they are first used during the installation process. Some of these
commands may be used several times during the initial installation and the
upgrades of an SP system. In this case, we also provide information that is
not related to the installation step but that you may need at a later stage in
the life of your SP system.
The first task is to define in the SDR the site environment data used by the
installation and management scripts. This can be done using the command
line interface: spsitenv, or its equivalent SMIT window: Site Environment
Information window ( smitty site_env_dialog). This must be executed on the
CWS only.
The command spsitenv defines all site-wide configuration information (name
of the default installable image, NTP, and so on) and which of the optional
PSSP features will be used (SP User Management, SP File Collection
Management, SP Accounting).
[Entry Fields]
Default Network Install Image [bos.obj.ssp.432]
Remove Install Image after Installs false +
This task is performed using either the command line interface: spframe, or its
SMIT equivalent windows: SP Frame Information (smitty sp_frame_dialog)
and non-SP Frame Information (smitty nonsp_frame_dialog). This task must
be executed on the CWS only.
This command executes on the CWS or any SP node when using the
command line interface. It can only be used on the CWS when called from
one of the SMIT windows accessible from the List Database Information
window (smitty list_data).
This command displays configuration information stored in the SDR about the
frames and nodes.
This command has many options. During the SP installation, the most useful
ones are:
-n for node general configuration
-b for node boot/installation configuration
-a for node adapters configuration
-f for frame information
At this point in the installation, you can use splstdata -f and splstdata -n to
verify that the frames have been correctly configured in the SDR, and that the
execution of the spframe command has correctly discovered the nodes in
each frame.
Figure 106 shows which adapters are defined by each of these commands.
(Figure 106, not reproduced here: Frame 1 of the test environment. Nodes sp3n05
through sp3n15 have their en0 adapters on the 192.168.31.xx segment served by
the boot/install server sp3n01, whose en1 adapter is 192.168.31.11 and whose
en0 adapter is 192.168.3.11 on the 192.168.3.xx segment. The CWS sp3en0 is at
192.168.3.130, and the frame also contains the switch.)
You can provide this information, if you already know it, by creating a file
/etc/bootptab.info (for more details, see 9.3.1, “/etc/bootptab.info” on page
267) to speed up the sphrdwrad command. For each node in the argument list
of the command, sphrdwrad will look for its hardware address in
/etc/bootptab.info. If it cannot find it there, it will query the node through the
hardware connection to the frame (serial link). In the latter case, the node will
be powered down and powered up.
Note
Do not use the sphrdwrad command on a running node since it will be
powered off.
In our environment, we can use either command to discover the en0 adapter
hardware addresses:
sphrdwrad 1 1 rest
or
sphrdwrad 1 1 12
The spethernt command configures the en0 adapter, and the spadaptrs
command is its counterpart for the other adapters. Similar to the spethernt
command, spadaptrs applies either to a single node or to a range or list of nodes.
Only Ethernet, Token Ring, FDDI, and Switch (css0) adapters can be
configured using spadaptrs. Other types of adapters (ATM, ESCON) cannot
be configured this way. You must either configure them manually after the
nodes are installed or write configuration code for them in the shell
customization script firstboot.cust (See “firstboot.cust” on page 272).
For the switch adapters, the two options -a and -n allow you to allocate IP
addresses to the switch adapters sequentially, based on the switch node numbers.
The default is to assign the long symbolic name of the en0 adapter as the
host name of the node. If your site policy is different (for example, you may
want to give to the node, as host name, the name of the adapter connected to
your corporate network), you use sphostnam to change the initial host name.
Again, like the previous one, this command applies either to one node or to a
range or list of nodes.
In our environment, we only want to change the format of the name and use
the short names but keep the en0 adapter name as host name:
sphostnam -f short 1 1 12
On the CWS, you can use either spsetauth or the SMIT Select Authorization
Methods for Root access to Remote Commands window ( smitty spauth_rcmd).
You can now choose the authentication methods used for System
Management tasks. Valid methods are Kerberos 4, Standard AIX, and
Kerberos 5 (DCE).
You perform this task only on the CWS using either chauthpar or the SMIT
Select Authorization Methods for Root access to Remote Commands window
(smitty spauth_methods).
This command will start the daemons: hats, hags, haem, hr, pman, emon,
spconfigd, emcond, and spdmd (optional).
Execution of this command on the CWS only starts the daemons on the CWS
and not on any SP node. Since the daemons need to execute on all machines
of the SP system for the subsystems to run successfully, syspar_ctrl -A must
also be run on each node; this happens automatically when the nodes are
installed and customized.
In PSSP 2.4, the spbootins command performs this task completely, while in
PSSP 3.1 the task is split between the spchvgobj and spbootins commands. Sections
9.2.13, “spchvgobj” on page 259 and 9.2.14, “spbootins” on page 260
describe the functions performed by the commands in each case.
9.2.13 spchvgobj
The spchvgobj command executes on the CWS only.
This command is only available in PSSP 3.1. It was added as part of the new
PSSP feature that allows several bootable volume groups. The boot/install
configuration, which up to PSSP 2.4 was specific to a node, is now specific to
a volume group.
The PSSP installation scripts use a default configuration for the boot/install
servers, the AIX image (mksysb) that will be installed on each node, and the
disk where this image will be installed. This default is based on information
that you entered in the Site Environment Information panel. The default is to
define as the boot/install server(s):
• The CWS for a one frame system
• The CWS and the first node of each frame in a multi-frame system
The default is to use rootvg as the default bootable volume group, on hdisk0.
If you wish to use a different configuration, you can use the spchvgobj
command or its SMIT equivalent to specify, for a set of SP nodes and for a
bootable volume group: the names of the disk(s) where the AIX image is to be
installed, the number of mirrored disks, the name of the boot/install server
from which to fetch the AIX image, the name of the installable image, the name
of the AIX lpp source directory, and the level of PSSP to be installed on the nodes.
In our environment, since at the time of the installation of the first frame we
already plan to add a second frame, we want to force nodes 5 to 15 to point to
node 1 as their boot/install server. We can, therefore, use:
spchvgobj -n 1 1 5 11
9.2.14 spbootins
The spbootins command executes on CWS only.
If our environment was running under PSSP 2.4 instead of PSSP 3.1, then we
would have used
spbootins -n 1 1 5 11
to specify that, at their next reboot, all nodes in frame one are to load the AIX
image from their respective boot/install server and to ask not to run
setup_server.
At this point, only the CWS will be configured since the other nodes are still
not running.
On the CWS, this command could have been executed automatically if you
had specified the -s yes option when running spbootins.
Since we did not use this option previously in our environment, we have to
execute setup_server.
Sample topology files are provided with PSSP in the /etc/SP directory. These
samples correspond to most of the topologies used by customers. If none of
the samples match your real switch topology, you have to create one using
the partitioning tool provided with PSSP (System Partitioning Aid, available
from the Perspectives Launch Pad). Once this file is created, it must be
annotated and stored in the SDR (here, annotated means that the generic
topology contained in the sample file is customized to reflect information
about the real switch connections using the information stored in the SDR).
This task is performed using the Eannotator and Etopology commands on the
CWS or by using the equivalent SMIT Topology File Annotator (smitty
annotator) and Store a Topology File windows (smitty etopology_store).
Sample clock topology files are provided in the SDR. You can choose to use
one of them or let the system decide for you.
In our environment, and since there is only one switch, we let Eclock
automatically make the decision:
Eclock -d
In normal cases, the installation of a node requires that you open two shell
windows on your CWS display. One will be used to monitor the execution of
the installation using the s1term command, while the other one is used to
initiate the installation using the nodecond command. These two commands
execute in parallel. We present them sequentially in 9.2.20, “s1term” on page
263 and 9.2.21, “nodecond” on page 264.
9.2.20 s1term
The s1term command executes on CWS only.
The s1term command opens a connection to the SP node serial port. Since
the node console is, by default, associated to this port, s1term provides a
remote console access to the SP node from the CWS through the serial link.
During installation, the nodecond command needs write access to the node on
the serial link. The write access cannot be shared by several clients. You
must, therefore, only open a read-only connection to monitor the node
installation and see all messages displayed on the node console.
After the boot/install servers have successfully been installed, you can start
the installation of the other nodes. To monitor these installations, you can
open one s1term session to each of these nodes in parallel.
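A minimal sketch of both kinds of s1term sessions follows (the frame and slot
numbers are illustrative):
s1term 1 5        # read-only console session to frame 1, slot 5 (monitoring only)
s1term -w 1 5     # read/write console session (used, for example, for the manual
                  # network boot procedure; do not hold it while nodecond runs)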
9.2.21 nodecond
The nodecond command executes on CWS.
In parallel with the s1term window, you can now initiate the boot and system
installation on the target node. This phase is called node conditioning and it is
executed by the nodecond command. It is executed on the CWS for all nodes
even if their boot/install server is not the CWS.
Once started, this command does not require any user input. It can,
therefore, be started as a shell background process. If you have several
nodes to install from the control workstation, you can start one nodecond
command for each of them from the same shell. However, for performance
reasons, it is not recommended to simultaneously install more than eight
nodes from the same boot/install server.
In our environment, we first need to install node 1 and start the command in
the background:
nodecond 1 1 &
After all boot/install servers have successfully been installed, you can condition
the remaining nodes of your SP system.
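For example (the slot numbers are illustrative), two more nodes of frame 1 can
be conditioned in the background, keeping in mind the recommendation of no more
than eight simultaneous installations per boot/install server:
nodecond 1 5 &
nodecond 1 6 &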
For a Thin or Wide node (non-SMP), using either spmon -g (PSSP 2.4) or
Hardware Perspective (PSSP 3.1) and a s1term -w window the steps are:
1. Power off the node (if it is powered on).
2. Put the key in Secure mode.
3. Power on the node.
4. When the led gets to 200, put the key in Service mode.
5. Reset the node.
6. When the led reaches 260 or 262, the Main menu is displayed. Select
option 1 Select Boot Device.
7. In the Select Boot (startup) Device menu, select your network adapter
to boot from.
8. You will be prompted to enter IP addresses for the client (the node to
be installed), the server (the boot/install server for the node being
installed), and a gateway (which you may leave empty).
9. Return to the main menu.
10.Select option 3 to send a test transmission ping between the client and
the server.
11.Return to the main menu and select option 4 to start the system boot.
12.At this point, make sure the key is in Normal mode before the
installation finishes using the command:
spmon -key normal <node>
For a CHRP Node (SMP Thin and Wide), perform the following steps:
1. Power the node off by invoking spmon.
2. Open a tty connection using a read/write s1term.
3. Power on the node using spmon.
4. Wait for a panel to appear as shown in Figure 107.
9.3.1 /etc/bootptab.info
The bootptab.info file specifies the hardware (MAC) address of the en0
adapter of SP nodes. It is used to speed-up the execution of the sphrdwrad
command. Each line contains the information for one node and is made of
two parts: The node identifier and the MAC address.
For example, 1,14 10:0:5A:FA:03:33 is not a valid entry, even though the second
string is a usual format for MAC addresses.
9.3.2 /tftpboot
The /tftpboot directory exists on the CWS, the boot/install server, and on the
SP client nodes.
You can also manually add customization scripts to the /tftpboot directory:
• tuning.cust
• script.cust
• firstboot.cust
In our environment, the /tftpboot directory of the CWS contains the files listed
in Figure 109 on page 269.
We will now describe in more detail the role of each of these files.
9.3.2.1 <spot_name>.<archi>.<kernel_type>.<network>
Files with this format of name are bootable images. The naming convention
is:
• <spot_name> Name of the spot from which this bootable image has been
created. It is identical to the name of a spot subdirectory located under
/spdata/sys1/install/<aix_level>/spot. In our environment, the spot name is
spot_aix432.
• <archi> is the machine architecture that can load this bootable image. It is
one of rs6k, rspc, or chrp.
• <kernel_type> refers to the number of processors of the machine that can
run this image. It is either up for a uniprocessor or mp for a multiprocessor.
• <network> depends on the type of network adapter through which the
client machine will boot on this image. It can be ent, tok, fddi, or generic.
For each node, the tftpboot directory will contain a symbolic link to the
appropriate bootable image. You can see an example of this in Figure 109 on
page 269, where this file is called spot_aix432.rs6k.mp.ent.
9.3.2.2 <hostname>-new-srvtab
These files are created by the create_krb_files wrapper of setup_server.
<hostname> is the reliable host name of an SP node. For each client node of
a boot/install server, one such file is created in the server /tftpboot directory.
This file contains the passwords for the rcmd principals of the SP node. Each
SP node retrieves its <hostname>-new-srvtab file from the server and stores
it in its /etc directory as krb-srvtab.
9.3.2.3 <hostname>.install_info
These files are created by the mkinstall wrapper of setup_server.
<hostname> is the reliable host name of an SP node. For each client node of
a boot/install server, one such file is created in the server /tftpboot directory.
This file is a shell script containing mainly shell variables describing the node
en0 IP address, host name, boot/install server IP address, and hostname.
After the node AIX image has been installed through the network, the
pssp_script script downloads the <hostname>.install_info file into its own
/tftpboot directory, and it executes this shell script to define the environment
variables it needs to continue the node customization.
9.3.2.4 <hostname>.config_info
These files are created by the mkconfig wrapper of setup_server. <hostname>
is the reliable host name of an SP node. For each client node of a boot/install
server, one such file is created in the server /tftpboot directory.
9.3.2.5 tuning.cust
The tuning.cust file is a shell script that sets tuning options for IP
communications. A default sample file is provided with PSSP in
/usr/lpp/ssp/samples/tuning.cust. Three files are also provided that contain
recommended settings for scientific, commercial, or development
environments (in /usr/lpp/ssp/install/config).
Before starting the installation of the nodes, you can copy one of the three
pre-customized files into the /tftpboot directory of the CWS, or you can
provide your own tuning file. Otherwise, the default sample will be copied to
/tftpboot by the installation scripts.
During the installation of additional boot/install servers, the tuning.cust file will
be copied from the CWS /tftpboot directory to each server /tftpboot directory.
During the installation of each node, the file will be downloaded to the node. It
is called by the /etc/rc.net file; so, it will be executed each time a node
reboots.
You should note that tuning.cust sets ipforwarding=1; so, for nodes that are
not IP gateways, you may want to change this value directly in
/tftpboot/tuning.cust on the node (not on the boot/install servers), as in the
sketch below.
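For example (a minimal sketch), on a node that is not an IP gateway, the
corresponding line in /tftpboot/tuning.cust could be changed to:
no -o ipforwarding=0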
9.3.2.6 script.cust
The script.cust file is a shell script that will be executed at the end of the node
installation and customization process before the node is rebooted. The use
of this file is optional. It is a user provided customization file. You can use it to
perform additional customization that requires a node reboot to be taken into
account.
Typically, this script is used to set the time zone, modify paging space, and
so on. It can also be used to update global variables in the /etc/environment
file.
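A hedged sketch of such a script.cust fragment follows (the paging space size
and the variable added to /etc/environment are hypothetical):
# enlarge the default paging space by 8 logical partitions
chps -s 8 hd6
# add a hypothetical global variable for all users
echo "MYAPP_HOME=/usr/local/myapp" >> /etc/environment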
9.3.2.7 firstboot.cust
The firstboot.cust file is a shell script that will be executed at the end of the
node installation and customization process after the node is rebooted. The
use of this file is optional. It is a user provided customization file. This is the
recommended place to add most of your customization.
This file should be used for importing a volume group, defining the host name
resolution method used on a node, or installing additional software (a sketch
follows the note below).
Note
At the end of the execution of the firstboot.cust script, the host name
resolution method (/etc/hosts, NIS, DNS) MUST be defined and able to resolve
all IP addresses of the SP system: CWS, nodes, the Kerberos server, and the
NTP server. If it is not, the reboot process will not complete correctly.
If you do not define this method sooner, either by including a configured file
in the mksysb image or by performing the customization in the script.cust
file, you must perform this task in the firstboot.cust file.
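A hedged sketch of such a firstboot.cust fragment follows (the volume group,
name server address, and fileset names are hypothetical):
# import an existing data volume group
importvg -y datavg hdisk1
# define a DNS-based host name resolution method
echo "search msc.itso.ibm.com"  >  /etc/resolv.conf
echo "nameserver 192.168.3.130" >> /etc/resolv.conf
# install additional (hypothetical) software from an NFS-mounted directory
installp -aXd /mnt/lpp vendor.product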
9.3.3 /usr/sys/inst.images
This directory is the standard location for storing an installable lpp image on
an AIX system when you want to install the lpp from disk rather than from the
distribution media (tape, CD). You can, for example, use it if you want to
install products other than AIX and PSSP on the CWS.
9.3.4 /spdata/sys1/install/images
The /spdata/sys1/install/images directory is the repository for all AIX
installable images (mksysb) that will be restored on the SP nodes using the
PSSP installation scripts and the NIM boot/install servers configured during
the CWS installation.
If you want to use the default image provided with PSSP (spimg), you must
store it in the /spdata/sys1/install/images directory.
If all nodes have an identical software configuration (same level of AIX and
LPPs), they can share the same mksysb image independently from their
hardware configuration.
If your SP system has several boot/install servers, the installation script will
automatically create the /spdata/sys1/install/images directory on the
boot/install servers and load it with the mksysb images needed by the nodes
that will boot from each of these servers.
9.3.5 /spdata/sys1/install/<aix_level>/lppsource
For each level of AIX that will be running on a node in the SP system, there
must exist on the CWS an /spdata/sys1/install/<aix_level>/lppsource
directory. The recommended rule is to set the relative pathname <aix_level>
to a name that clearly indicates the level of AIX: aix414, aix421, aix432.
However, this is not required, and you may choose whatever name you wish.
This directory must contain the AIX lpp images corresponding to the AIX
level. In addition, this directory must contain the perfagent code
corresponding to the AIX level. Refer to the 8.6.1, “PSSP Prerequisites” on
page 242 for the minimal sets of AIX and perfagent lpp to install in this
directory. Starting with AIX release 4.3.2, perfagent.tools is part of AIX and
not PAIDE as it used to be for previous AIX releases.
If the SP system contains several boot/install servers, this directory will only
exist on the CWS. It will be known as a NIM resource by all servers but will be
defined as hosted by the CWS. When a node needs to use this directory, it
mounts it directly from the CWS, whatever NIM master it is pointing at.
9.3.6 /spdata/sys1/install/pssplpp/PSSP-x.x
For each level of PSSP that will be used by either the CWS or a node in the
SP system, there must exist on the CWS a
/spdata/sys1/install/pssplpp/PSSP-x.x directory where PSSP-x.x is one of
PSSP-2.2, PSSP-2.3, PSSP-2.4 or PSSP-3.1.
If the SP system contains more than one boot/install server, the installation
scripts will create the /spdata/sys1/install/pssplpp/PSSP-x.x directories on
each server and load them with the PSSP lpp filesets.
9.3.7 /spdata/sys1/install/pssp
You can create this directory manually on the CWS in the first steps of the
PSSP installation. The CWS installation script will then store in this directory
several files that will be used later during the nodes installation through the
network.
9.3.8 image.data
In a mksysb system image, the image.data file is used to describe the rootvg
volume group. In particular, it contains the size of the physical partition
(PPSIZE) of the disk from which the mksysb was created. You usually do not
need to modify this file. However, if the mksysb is to be restored on a node
whose target disk requires a different physical partition size, this file may
need to be modified accordingly.
SP Manuals
The reader can refer to two sets of documents related to either version 2.4 or
version 3.1 of PSSP.
PSSP: Command and Technical Reference, GC23-3900, for PSSP 2.4 and
PSSP: Command and Technical Reference, SA22-7351, for PSSP 3.1 contain
a complete description of each command listed in 9.2, “Installation Steps and
Associated Key Commands” on page 249.
SP Redbooks
RS/6000 SP: PSSP 2.2 Survival Guide, SG24-4928. Chapter 2 contains
practical tips and hints about specific aspects of the installation process.
Others
We recommend the use of either AIX 4.2 Network Installation Management
Guide and Reference, SC23-1926, or AIX 4.3 Network Installation
Management Guide and Reference, SC23-4113, for any detailed
information about NIM.
This chapter presents some of the commands and methods available to the
SP administrator to check that the SP system has been correctly configured,
initialized, and started.
Section 10.3, “Key Commands” on page 279 presents the commands that are
available for checking various aspects of an SP system. Section 10.4,
“Graphical User interface” on page 288 gives a few hints about the use of
spmon -g and Perspectives. Section 10.5, “Key Daemons” on page 290
focuses on the daemons that are important to monitor in an SP system.
Section 10.6, “SP Specific Logs” on page 292 lists the logs that are available
to the user to check the execution of commands and daemons.
You should also verify that you have installed all PSSP filesets corresponding
to your SP hardware configuration and to the options you wish to use (VSD,
RVSD, and so on).
For example, to check all PSSP related filesets, you can use:
lppdiff -Ga ssp* rsct*
The SYSMAN_test command is executed on the CWS, but it does not restrict its
checking to components of the CWS. If nodes are up and running, it will also
perform several tests on them. Subsets of the components checked by
SYSMAN_test are: ntp, automounter, file collection, user management, nfs
daemons, /.klogin file, and so on.
The CSS_test command can be used to check that the ssp.css lpp has been
correctly installed. In particular, CSS_test checks for inconsistencies between
the software levels of ssp.basic and ssp.css. This is why we present this
command in this section. However, it is also useful to run this command on a
completely installed and running system where the switch has been started
since it will also check that communication can be performed over the Switch
between the SP nodes.
You can also use lssrc on the SP nodes to get detailed information about a
particular subsystem. Figure 112 on page 284 shows a long listing of the
status of the Topology Services subsystem on one of the SP nodes.
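For example (the node name and the default subsystem name hats are
assumptions), the same listing can be obtained from the CWS with:
dsh -w sp3n05 lssrc -ls hats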
10.3.3.2 syspar_ctrl -E
The syspar_ctrl command is the PSSP command providing control of the
system partition-sensitive subsystems. In 9.2.11, “Start System
Partition-Sensitive Subsystems” on page 258, we have seen that the -A
option of this command adds and starts the subsystems.
You can then use the other options of syspar_ctrl to stop, refresh, start, or
delete subsystems that were reported as manageable by syspar_ctrl -E.
[sp3en0:/]# spmon -d -G
1. Checking server process
Process 16262 has accumulated 42 minutes and 14 seconds.
Check ok
3. Querying frame(s)
1 frame(s)
Check ok
4. Checking frames
5. Checking nodes
--------------------------------- Frame 1 -------------------------------------
Frame Node Node Host/Switch Key Env Front Panel LCD/LED is
Slot Number Type Power Responds Switch Fail LCD/LED Flashing
-------------------------------------------------------------------------------
1 1 high on yes yes normal no LCDs are blank no
5 5 thin on yes yes normal no LEDs are blank no
6 6 thin on yes yes normal no LEDs are blank no
7 7 thin on yes yes normal no LEDs are blank no
8 8 thin on yes yes normal no LEDs are blank no
9 9 thin on yes yes normal no LEDs are blank no
10 10 thin on yes yes normal no LEDs are blank no
11 11 thin on yes yes normal no LEDs are blank no
12 12 thin on yes yes normal no LEDs are blank no
13 13 thin on yes yes normal no LEDs are blank no
14 14 thin on yes yes normal no LEDs are blank no
15 15 wide on yes yes normal no LEDs are blank no
[sp3en0:/]#
This option is generally used when writing scripts. For interactive use, it is
easier to use the graphical tools provided by PSSP (see 10.4, “Graphical
User interface” on page 288).
10.3.6.1 SDRGetObjects
The SDRGetObjects command extracts information about all objects in a class.
For example, you can list the reliable hostname of all SP nodes:
[sp3en0:/]# SDRGetObjects Node reliable_hostname
reliable_hostname
sp3n01.msc.itso.ibm.com
sp3n05.msc.itso.ibm.com
sp3n06.msc.itso.ibm.com
sp3n07.msc.itso.ibm.com
sp3n08.msc.itso.ibm.com
sp3n09.msc.itso.ibm.com
sp3n10.msc.itso.ibm.com
sp3n11.msc.itso.ibm.com
sp3n12.msc.itso.ibm.com
sp3n13.msc.itso.ibm.com
sp3n14.msc.itso.ibm.com
sp3n15.msc.itso.ibm.com
[sp3en0:/]#
10.3.6.2 splstdata
The SDRGetObjects command is very powerful and is often used in SP
management script files. However, its syntax is not very suitable for everyday
interactive use by the SP administrator since it requires that you remember
the exact spelling of classes and attributes. PSSP provides a front end to
SDRGetObjects for the most often used queries: splstdata. This command
offers many options. We have already presented options -a, -b, -f, and -n in
Section 9.2.4, “Check the Previous Installation Steps” on page 253. You must
also know how to use:
splstdata -v to display volume group information (PSSP 3.1 only)
splstdata -s to access switch information
splstdata -h to extract hardware configuration information
splstdata -i to display node IP configuration
splstdata -e to display site environment information
Each option of the splstdata can be called from an entry in the SMIT List
Database Information window (smitty list_data) or one of its subwindows.
The Perspective initial panel, Launch Pad, is customizable. You can add
icons to this panel for the actions you use often. By default, the Launch Pad
contains shortcuts to some of the verification commands we have presented
in previous sections:
• Monitoring of hostsResponds, switchResponds, nodePowerLEDs
• SMIT SP_verify
• syspar_ctrl -E
SDR: sdrd
Job Switch Resource Table Services: Job Switch Resource Table Services
Sysctl: sysctld
10.5.1 Sdrd
The sdrd daemon runs on the CWS. It serves all requests from any client
application to manipulate SDR information. It is managed using the AIX SRC
commands. There is an entry for sdrd in /etc/inittab, and sdrd is started at
CWS boot time. This daemon must be running before any SP management
action can be performed.
You can use any of the following commands to check that the sdrd is running:
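For example (a sketch; the SRC group name sdr is the usual default, but verify
it on your system):
lssrc -g sdr          # SRC status of the SDR daemon(s)
ps -ef | grep sdrd    # check that the sdrd process exists
SDR_test              # PSSP verification test of SDR operation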
10.5.2 Hardmon
The hardmon daemon runs on the CWS. It manages the serial ports of the
CWS that are connected to the SP frames. It controls all frame and node
hardware through an SP specific protocol for communicating over the serial
links. It also manages the S70d, which performs the hardware monitoring of
non-SP frames over serial links. There is an entry for hardmon in the
/etc/inittab, and it is started at CWS boot time.
10.5.3 Worm
The worm runs on all SP nodes in an SP system equipped with a switch. The
worm is started by the rc.switch script, which is started at node boot time.
The worm must be running on the primary node before you can start the
switch with the Estart command. We recommend that you refer to Chapter 14
of PSSP: Administration Guide, GC23-3897, for PSSP 2.4 and PSSP:
Administration Guide, SA22-7348, for PSSP 3.1 for more details about the
switch daemons.
Most of the SP related logs can be found in /var/adm/SPlogs on the CWS and
on the SP nodes. A few other logs are stored in /var/adm/ras and
/var/tmp/SPlogs.
You generally only look at logs for problem determination. For the purpose of
this chapter (verifying the PSSP installation and operation), we will only
mention the /var/adm/SPlogs/sysman directory. On each SP node, this
directory contains the trace of the AIX and PSSP installation, their
configuration, and the execution of the customization scripts described in
Section 9.3.2, “/tftpboot” on page 268. We recommend that you look at this
log after the installation of a node to check that it has successfully completed.
The installation of a node involves the execution of several processes that
are not linked to a terminal (scripts defined in /etc/inittab, for example). You
may not notice that some of these scripts have failed if you do not search for
indication of their completion in the /var/adm/SPlogs/sysman directory.
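As a hedged illustration (the node name is an assumption), these traces can be
inspected from the CWS with dsh:
dsh -w sp3n05 "ls -l /var/adm/SPlogs/sysman"            # list the installation traces
dsh -w sp3n05 "grep -i error /var/adm/SPlogs/sysman/*"  # look for reported errors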
SP Manuals
The reader can refer to two sets of documents related to either Version 2.4 or
Version 3.1 of PSSP:
PSSP: Command and Technical Reference, GC23-3900, for PSSP 2.4 and
PSSP: Command and Technical Reference, SA22-7351, for PSSP 3.1 contain
a complete description of each command listed in 10.3, “Key Commands” on
page 279.
Chapter 19 of PSSP: Diagnosis and Messages , GC23-3899 for PSSP 2.4 and
Chapter 24 of IBM Parallel System Support Programs for AIX: Diagnosis
Guide, GA22-7350 for PSSP 3.1 describe, in detail, the verification of System
Management installation using the SYSMAN_test command.
SP Redbooks
RS/6000 SP Monitoring: Keeping It Alive, SG24-4873. Chapter 5 provides
you with a detailed description of the Perspectives graphical user interface.
Cluster. A group of machines that are able to run LoadLeveler jobs. Each
member of the cluster has the LoadLeveler software installed.
Job Step. A job command file specifies one or more executable programs to
be run. The executable and the conditions under which it is run are defined in
a single job step. The job step consists of several LoadLeveler command
statements.
(Figure, not reproduced here: the structure of a LoadLeveler job, whose job
steps each run a process and return an exit status.)
Figure 118 on page 300 shows how these machine types fit together and the
order in which they communicate.
(Figure 118, not reproduced here: 1. a job is submitted to the scheduling
machine; 2. job information is passed to the central manager; 3. machine
information is returned; 4. the job is sent to the executing machine.)
The system administrator can specify several different parameters that are
used to calculate SYSPRIO, for example, how many other jobs the submitting
user already has in the queue.
LoadLeveler also supports the concept of job classes. These are defined by
the system administrator and are used to classify particular types of jobs. For
example, we define two classes of jobs that run in the clusters called night
jobs and day jobs. We might specify that executing machine A, which is very
busy during the day because it supports a lot of interactive users, should only
run jobs in the night class. However, machine B, which has a low workload in
the day, could run both. LoadLeveler can also be configured to take job class
into account when it calculates SYSPRIO for a job.
(Figure 119. PTPE Monitoring Hierarchy, not reproduced here: a multi-tier
hierarchy in which the lower tiers of nodes provide, archive, and report
performance data and handle statistics requests through data managers, while
the central coordinator at the third tier coordinates and administers the
hierarchy.)
If the primary CWS fails, the backup CWS can assume all CWS functions with
the following exceptions:
• Updating passwords (if SP User Management is in use)
• Adding or changing SP users
• Changing Kerberos keys (the backup CWS is typically configured as a
secondary authentication server)
• Adding nodes to the system
• Changing site environment information
NetTAPE can:
• Consolidate Control of Distributed Tape Operations
NetTAPE provides a single system image of all of the network's tape
devices. Tape device allocation, mount queue management, and tape
device monitoring functions are performed using a graphical user
interface.
SP Manuals
Product manuals are very helpful for installing, configuring, and managing
these products. If you are interested in installing and configuring these
SP Redbooks
There are many redbooks that cover each one of these products in great
detail. However, since the idea here is only to gain a general understanding, we
recommend the book Inside the RS/6000 SP, SG24-5145. This book covers
most of the products in more detail than they appear here; so this redbook
may be useful if you want to explore the product in greater depth.
Once PSSP has been configured and installed, you may need to install and
configure additional products before you can start using your applications.
These products, although some of them are not part of PSSP, are usually
installed and configured in RS/6000 SP environments.
This chapter provides the basic concepts and setup procedures for
understanding, installing, and configuring additional RS/6000 SP products.
(Figure 121, not reproduced here: the VSD architecture on two nodes connected
by an IP network, the SP Switch. On each node, the application sits above a
cache and the VSD layer, which uses the LVM for local access and IP for remote
access to the logical volumes lv_X and lv_Y.)
In Figure 121, there are two logical devices (lv_X and lv_Y), owned by Node X
and Node Y, respectively. When applications on Node X need to access lv_X,
they go through the logical volume manager as usual for local access. However,
when they need to access lv_Y, which is remote, the VSD layer takes the request
and ships it through a TCP/IP network (in this case, the SP Switch) to the disk
server for lv_Y. For the application on Node X, both accesses look the same
(they access the special devices named /dev/lv_X and /dev/lv_Y, respectively).
The nodes that manage physical disks are called VSD Server, and those that
only access VSD disks are called VSD Clients. A VSD Server can be a VSD
client.
In order to use VSD, it is necessary to install the VSD filesets on all the nodes
that are going to be using or managing VSD disks. The VSD filesets have
changed from PSSP 2.4 to PSSP 3.1, as shown in Table 26.
Table 26. VSD Filesets
If you are working with a PSSP level older than PSSP 3.1, use the
corresponding filesets according to Table 26.
Note
The IBM Virtual Shared Disk Perspective component is in ssp.vsdgui. The
PostScript file for the VSD manual and the man pages for the related
commands are contained in ssp.docs. They are in the ssp install image,
which should be installed on the control workstation.
If you are going to use HSD, then on HSD server and client nodes:
• vsd.hsd
#acl#
# These are the users that can issue sysctl_vsdXXX command on this node
# Name must have a Kerberos name format which defines user@realm
# Please check your security administrator to fill in correct realm name
# you may find realm name from /etc/krb.conf
# _PRINCIPAL [email protected]
_PRINCIPAL [email protected]
# _PRINCIPAL [email protected]
# _PRINCIPAL [email protected]
This file should be copied to all the nodes where VSD has been installed.
Once copied, check that you have authorization to the VSD nodes.
To check your sysctl authorization, first run the klist command to look at your
ticket and then run the sysctl whoami command and compare both:
[sp3en0:/]# klist
Ticket file: /tmp/tkt0
Principal: [email protected]
To check that you can run VSD multinode commands, use the following
command:
[sp3en0:/]# vsdsklst -n 1,15
>> sp3n01.msc.itso.ibm.com
Node Number:1; Node Name:sp3n01.msc.itso.ibm.com
Volume group:rootvg; Partition Size:4; Total:537; Free:233
Physical Disk:hdisk0; Total:537; Free:233
Not allocated physical disks:
Physical disk:hdisk1; Total:2.2
<<
>> sp3n15.msc.itso.ibm.com
Node Number:15; Node Name:sp3n15.msc.itso.ibm.com
Volume group:rootvg; Partition Size:4; Total:958; Free:665
Physical Disk:hdisk0; Total:479; Free:311
Physical Disk:hdisk3; Total:479; Free:354
Not allocated physical disks:
Physical disk:hdisk1; Total:2.0
Physical disk:hdisk2; Total:2.0
<<
This command lists information about physical and logical volume manager
states as seen by the IBM Virtual Shared Disk software.
In this case, VSD has been installed and configured on node 1 and node 15.
12.2.3 Configuring
At this point in the installation, you are required to define and enter disk
parameters for the VSD nodes into the System Data Repository (SDR).
This can be done through the vsdnode command or the IBM Virtual Shared
Disk Perspective graphical interface (the spvsd command). For example, to
define and configure nodes 1 and 15, we can run the following command:
vsdnode 1 15 css0 256 256 256 48 4096 262144 2 61440
Once the nodes have been designated, we can start creating VSD disks on
them. To create a VSD disk, you first have to decide which volume group you
are going to use. It can be rootvg or a global volume group you have
previously created.
Volume groups used for virtual shared disks must be given a global name that
is unique across system partitions.
This task is always done, but you do not always have to perform it yourself.
The Create... actions and the comparable createvsd and createhsd commands do
it for you.
You can use the Run Command... action and run the vsdvg command to
define global volume groups.
If you are using VSD to create the logical volumes and define the global
volume groups for you, then it is a good idea to check old rollback files. Refer
to IBM Parallel System Support Programs for AIX: Managing Shared Disks,
SA22-7349, for details on how to check old rollback files.
You can create virtual shared disks with the graphical user interface action
or a line command (on both primary and secondary nodes if you have the IBM
Recoverable Virtual Shared Disk component running). You must first have
used the IBM Virtual Shared Disk Perspective or the vsdnode command to set
up information in the SDR about each node involved in this virtual shared disk
configuration.
To create virtual shared disk using the IBM Virtual Shared Disk Perspective,
launch the graphical interface using the spvsd command. Figure 124 on page
316 shows the initial start-up window.
In the main window select the View->Add Pane as shown Figure 125 on
page 317. Once the new pane has been added, you can create virtual shared
disks by selecting the Create... option from the Action menu.
When creating virtual shared disks, you have to enter the pertinent
information in the dialog box or as arguments to the createvsd command.
The window for creating virtual shared disks is shown in Figure 126 on page
318.
If you prefer the command line interface instead, you can use the createvsd
command as follows:
createvsd -n 1,15 -s 4 -g ITSOVG -v ITSOVSD
This can be seen on the two nodes we have just configured and on which we
created the virtual shared disks.
No secondary nodes are defined. The space allocated to a virtual shared disk
is spread across all the physical disks (hdisks) within its local volume group
on each node (1 and 15).
To assign each disk in the previous example a secondary node (with the IBM
Recoverable Virtual Shared Disk component running), type:
createvsd -n 1/5/,15/6/ -s 4 -g ITSOVG -v ITSOVSD
After you have created your virtual shared disks, you must configure them on
all nodes that need to read from and write to them.
If you want recoverability, you should also have installed the IBM
Recoverable Virtual Shared Disk software on each virtual shared disk node.
In this case, you can use the Control IBM RVSD subsystem... action from the
Nodes pane; after you set the state to Initial Reset, it automatically
configures and activates all the virtual shared disks as soon as quorum is met
and activates recoverability on all the virtual shared disk nodes. If you
prefer to use the command ha_vsd reset, you must run it on each virtual
shared disk node.
To configure all the virtual shared disks, you can use the IBM Virtual Shared
Disk Perspective (spvsd) graphical interface, or you can use the command
cfgvsd. From the graphical interface, select the nodes you want to configure
and then select Configure IBM VSDs... from the Actions menu. Figure 127
on page 320 shows the graphical window for configuring the virtual shared
disks we previously defined.
To check the status of your virtual shared disks, you may use the lsvsd
command as follows:
# lsvsd -l
minor state server lv_major lv_minor vsd-name option size(MB)
1 ACT 1 34 1 ITSOVSD1n1 nocache 4
2 ACT 15 0 0 ITSOVSD2n15 nocache 4
The state column represents the state of the virtual shared disk.
Before you start your virtual shared disks, you have to put the virtual shared
disks you just configured into a suspended state. To do this, you use the
preparevsd command. Once the virtual shared disks are in a suspended state,
you can use the resumevsd command to make them active.
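A minimal sketch, using one of the virtual shared disk names created above:
preparevsd ITSOVSD1n1    # place the virtual shared disk in the suspended state
resumevsd ITSOVSD1n1     # make it active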
Figure 128 on page 321 shows all the possible states of a virtual shared disk
and the transitions between states.
With reference to Figure 129, Nodes X, Y, and Z form a group of nodes using
VSD. RVSD is installed on Nodes X and Y to protect VSDs rvsd_X and
rvsd_Y. Nodes X and Y physically connect to each other’s disk subsystem
where the VSDs reside. Node X is the primary server for rvsd_X and the
secondary server for rvsd_Y and vice versa for Node Y. Should Node X fail,
RVSD will automatically fail over rvsd_X to Node Y. Node Y will take
ownership of the disks, vary-on the volume group containing rvsd_X, and
make the VSD available. Node Y serves both rvsd_X and rvsd_Y. Any I/O
operation that was in progress and new I/O operations against rvsd_X are
suspended until failover is complete. When Node X is repaired and rebooted,
RVSD switches the rvsd_X back to its primary Node X.
RVSD subsystems are shown in Figure 130 on page 323. The rvsd daemon
controls recovery. It invokes the recovery scripts whenever there is a change
in the group membership. When a failure occurs, the rvsd daemon notifies all
surviving providers in the RVSD node group so they can begin recovery.
Communication adapter failures are treated the same as node failures.
A user sees a GPFS file system as a normal file system. Although it has its
own support commands, usual file system commands, such as mount and df,
work as expected on GPFS. GPFS file systems can be flagged to mount
automatically at boot time. GPFS supports relevant X/OPEN standards with a
few minor exceptions. Large NFS servers, constrained by I/O performance,
are likely candidates for GPFS implementations.
The GPFS daemon runs on every node participating in the GPFS domain and
may take on different personalities. Since GPFS is not a client-server file
system in the way NFS or AFS are, it uses the concept of VSD
servers, which are nodes physically connected to disks. Each node running
GPFS (including VSD servers) will use the virtual shared disk extensions to
access the data disks.
GPFS works within a system partition, and the nodes in this partition running
GPFS will be able to access any defined GPFS file system. In order to access
the file systems created in GPFS, nodes need to mount them like any other
file system. To mount the file systems, nodes have two options:
• Nodes running GPFS
For these nodes, mounting a GPFS file system is the same as mounting
any local (JFS) file system; the mount syntax is no different from a local
JFS mount. At creation time, GPFS file systems can be set to be mounted
automatically when the nodes start up.
• Nodes not running GPFS
For these nodes, GPFS file systems can be made available through NFS.
Nodes running GPFS can NFS-export the file systems after mounting them.
The same applies to any NFS-capable machine.
12.4.1 Requirements
The GPFS environment is specific to AIX on the RS/6000 SP. Various software
requirements must be installed and configured correctly before you can
create a GPFS file system.
GPFS also requires the IBM Virtual Shared Disk and the IBM Recoverable
Virtual Shared Disk products, whose levels are determined by the level of
PSSP installed. So, if PSSP 2.4 is installed, VSD and RVSD Version 2.1.1 are
required. If PSSP 3.1 is used, then VSD and RVSD 3.1 are required.
GPFS requires RVSD even if your installation does not have twin-tailed
disks or SSA loops for multi-host disk connection.
GPFS tasks cannot be done on the CWS; they must be performed on one of
the GPFS nodes.
There are three areas of consideration when GPFS is being set up: The nodes
using GPFS, the VSDs to be used, and the FS to be created. Each area is
now examined. A sample FS setup consisting of four nodes is provided.
Nodes 12, 13, and 14 are GPFS nodes, while node 15 is the VSD server
node.
Warning
Do not attempt to start the mmfsd daemon prior to configuring GPFS.
Starting the mmfsd daemon without configuring GPFS causes dummy
kernel extensions to be loaded, and you will be unable to create a FS. If
this occurs, configure GPFS and then reboot the node(s).
Carry out the following procedures to configure GPFS, then start the
mmfsd daemon to continue creating the FS.
Nodes
The first step in setting up GPFS is to define which nodes are GPFS nodes.
The second step is to specify the parameters for each node.
The Node Count is an estimate of the maximum number of nodes that will
mount the FS and is entered into the system only when the GPFS FS is
created. It is recommended to overestimate this number. This number is used
in the creation of GPFS data structures that are essential for achieving the
maximum degree of parallelism in file system operations. Although a larger
estimate consumes a bit more memory, insufficient allocation of GPFS data
structures can limit a node’s ability to process certain parallel requests
efficiently, such as the allotment of disk space to a file. If it is not possible to
estimate the number of nodes, apply the default value of 32. A larger number
may be specified if more nodes are expected to be added. However, it is
important to avoid wildly overestimating since this can affect buffer
operations. This value cannot be changed later. The FS must be destroyed
and recreated.
A node list is a file that specifies to GPFS the actual nodes to be included in
the GPFS domain. This file may have any file name. However, when GPFS
configures the nodes, it copies the file to each GPFS node as
/etc/cluster.nodes. The GPFS nodes are listed one per line in this file, and the
switch interface is to be specified because this is the interface over which
GPFS runs.
Figure 131 on page 327 is an example of a node list file. The file name in this
example is /var/mmfs/etc/nodes.list.
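As an illustration (the switch host names below are hypothetical, following the naming used in our environment), such a file simply lists one switch interface per line:
sp3n12sw.msc.itso.ibm.com
sp3n13sw.msc.itso.ibm.com
sp3n14sw.msc.itso.ibm.com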
Having chosen the nodes that form the GPFS domain, there is the option to
choose which of these nodes are to be considered for the personality of stripe
group manager. There are only three nodes in the GPFS domain in this
example; so, this step is unnecessary. However, if there are a large number
of nodes in the GPFS domain, it may be desirable to restrict the role of stripe
group manager to a small number of nodes. This way, if something happens
and a new stripe group manager has to be chosen, GPFS can do so from a
smaller set of nodes (the default is every GPFS node). To carry this out, follow the same format as the node list to create the file /var/mmfs/etc/cluster.preferences (this exact file name must be used).
To configure GPFS, you can use SMIT panels or the mmconfig command. The mmconfig command is further described in General Parallel File System for AIX: Installation and Administration Guide, SA22-7278. The SMIT panel may be accessed by typing smit gpfs and then selecting the Create Startup Configuration option. Figure 132 on page 328 shows the SMIT panel used to configure GPFS (this is being run on node 12 in our example). This step needs to be run on only one node in a GPFS domain.
The pagepool and mallocsize options specify the size of the cache on each
node dedicated for GPFS operations. mallocsize sets an area dedicated for
holding GPFS control structures data, while pagepool is the actual size of the
cache on each node. In this instance, pagepool is set to the default size of 4M, while mallocsize is set to the default of 2M, where M stands for megabytes and must be included in the field. The maximum values per
node are 512 MB for pagepool and 128 MB for mallocsize.
The priority field refers to the scheduling priority for the mmfsd daemon. The
concept of priority is beyond the scope of this redbook. Please refer to the AIX documentation for more information.
Further information, including details regarding the values to set for pagepool
and mallocsize, is available in the manual General Parallel File System for
AIX: Installation and Administration Guide , SA22-7278.
Once GPFS has been configured, mmfsd has to be started on the GPFS
nodes before a FS can be created. Here are the steps to do so:
1. Set the WCOLL environment variable to target all GPFS nodes for the dsh
command. PSSP: Administration Guide, SA22-7348, PSSP: Command
and Technical Reference, SA22-7351, and IBM RS/6000 SP Management,
Easy, Lean, and Mean, GG24-2563 all contain information on the WCOLL
environment variable.
2. Designate each of the nodes in the GPFS domain as an IBM VSD node.
3. Ensure that the rvsd and hc daemons are active on the GPFS nodes.
Note
It is necessary to have set up at least one VSD. The rvsd and hc do not
start unless they detect the presence of one VSD defined for the GPFS
nodes. This VSD may or may not be used in the GPFS FS; the choice is up
to you.
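A minimal sketch of steps 1 and 3, assuming the GPFS node list file is reused as the dsh working collective and that the daemons are registered under the rvsd SRC group (both of these are our assumptions):
# export WCOLL=/var/mmfs/etc/nodes.list
# dsh date                  (verify that dsh reaches every GPFS node)
# dsh "lssrc -g rvsd"       (the rvsd and hc subsystems should show Active)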
VSDs
Before the FS can be created, the underlying VSDs must be set up. The
nodes with the VSDs configured may be strictly VSD server nodes, or they
can also be GPFS nodes. The application needs to be studied, and a decision
needs to be made as to whether the VSD server nodes are included in the
GPFS domain.
Recall that there are two types of data that GPFS handles: Metadata and the
data itself. GPFS can decide what is stored into each VSD: Metadata only,
data only, or data and metadata. It is possible to separate metadata and data
to ensure that data corruption does not affect the metadata and vice versa.
Separating them can also improve performance. This is best seen if RAID is involved.
RAID devices are not suited for handling metadata because metadata is small
in size and can be handled using small I/O block sizes. RAID is most effective
at handling large I/O block sizes. Metadata can, therefore, be stored in a
non-RAID environment, such as mirrored disks, while the data can be stored
in a RAID disk. This protects both data and metadata from each other and
maximizes the performance given that RAID is chosen.
Once the redundancy strategy has been adopted, there are two choices to
creating VSDs: Have GPFS do it for you or manually create them. Either way,
this is done through the use of a Disk Descriptor file. This file can be manually
set up or done through the use of SMIT panels. If using SMIT, run smit gpfs
and then select the Prepare Disk Descriptor File option. Figure 134 on page
332 shows the SMIT panel for our example.
In this case, the VSD vsd1n15 has already been created on node 15 (sp3n15). Do not specify a name for the server node because the system has all of the information it needs from the configuration files in the SDR. In addition, the VSD(s) must be in the Active state on the VSD server node and all the GPFS nodes prior to the GPFS FS creation.
If the VSDs have not been created, specify the name of the disk (such as
hdisk3) in the disk name field instead of vsd1n15 and specify the server
where this hdisk is connected. GPFS then creates the necessary VSDs to
create the FS.
The failure group number may be system generated or user specified. In this
case, a number of 1 is specified. If no number is specified, the system
provides a default number that is equal to the VSD server node number +
4000.
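For reference, a one-line disk descriptor for the VSD above might look like the following sketch. The colon-separated field layout (disk or VSD name, server node, backup server, disk usage, failure group) is our reading of the format, so verify it against General Parallel File System for AIX: Installation and Administration Guide, SA22-7278 before using it:
vsd1n15:::dataAndMetadata:1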
File System
There are two ways to create a GPFS FS: Using SMIT panels or the mmcrfs
command. Figure 135 on page 333 shows the SMIT panel to be used. This is
accessed by running smit gpfs and then selecting the Create File System
option. Details on mmcrfs can be found in General Parallel File System for
AIX: Installation and Administration Guide , SA22-7278.
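As a rough sketch (options and file names here are illustrative, not taken from the figure), creating a file system fs1 mounted over /gpfs/fs1 from a prepared descriptor file might look like:
mmcrfs /gpfs/fs1 fs1 -F /var/mmfs/etc/disk.desc -A yes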
Configuration values set with mmconfig, such as pagepool, can be changed later with the mmchconfig command. For example, to increase pagepool to 60 MB and have the change take effect immediately:
mmchconfig pagepool=60M -i
It is also possible to add and delete nodes from a GPFS configuration. The
commands to do so are mmaddnode and mmdelnode. Be careful when adding or
subtracting nodes from a GPFS configuration. GPFS uses quorum to
determine if a GPFS FS stays mounted or not. It is easy to break the quorum
requirement when adding or deleting nodes. Adding or deleting nodes
automatically configures them for GPFS usage! Newly added nodes are
considered GPFS nodes in a down state and are not recognized until a restart
of GPFS. By maintaining quorum, you ensure that you can schedule a good
time to refresh GPFS on the nodes.
Deleting a FS
For example, if we want to delete fs1, which we have created in Figure 135
on page 333, we can run:
mmdelfs fs1
Checking a FS
File system consistency can be checked and repaired with the mmfsck command. This command checks for and repairs the following file inconsistencies:
• Blocks marked allocated that do not belong to any file. The blocks are
marked free.
• Files for which an i-node is allocated, but no directory entry exists.
mmfsck either creates a directory entry for the file in the /lost+found
directory, or it destroys the file.
• Directory entries pointing to an i-node that is not allocated. mmfsck
removes the entries.
• Ill-formed directory entries. They are removed.
• Incorrect link counts on files and directories. They are updated with the
accurate counts.
• Cycles in the directory structure. Any detected cycles are broken. If the
cycle is a disconnected one, the new top level directory is moved to the
/lost+found directory.
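For example, to check fs1 (as a sketch; GPFS normally expects the file system to be unmounted while the offline check runs), you would run:
mmfsck fs1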
FS Attributes
FS attributes can be listed with the mmlsfs command. If no flags are specified,
all attributes are listed. For example, to list all the attributes of fs1, run:
mmlsfs fs1
To change FS attributes, use the mmchfs command. There are eight attributes
that can be changed:
1. Automatic mount of FS at GPFS startup
2. Maximum number of files
3. Default Metadata Replication
4. Quota Enforcement
5. Default Data Replication
6. Stripe Method
7. Mount point
8. Migrate FS
For example, to change the default data replication factor of fs1 to 2, run:
mmchfs fs1 -r 2
The command mmlsattr shows the replication factors for one or more files. If it
is necessary to change this, use the mmchattr command.
For example, to list the replication factors for a file /gpfs/fs1/test.file, run:
mmlsattr /gpfs/fs1/test.file
If the value turns out to be 1 for data replication, and you want to change this
to 2, run:
mmchattr -r 2 /gpfs/fs1/test.file
Restriping a GPFS FS
If disks have been added to a GPFS FS, you may want to re-stripe the FS data across all the disks for system performance. This is particularly useful if the FS is seldom updated, since the existing data will not otherwise have a chance to propagate out to the new disk(s). To do this, run mmrestripefs.
There are three options with this command, and any one of the three must be
chosen. The -b flag stands for rebalancing. This is used when you simply
want to re-stripe the files across the disks in the FS. The -m flag stands for
migration. This option moves all critical data from any suspended disk in the
FS. Critical data is all data that would be lost if the currently suspended
disk(s) are removed. The -r flag stands for replication. This migrates all data
from a suspended disk and restores all replicated files in the FS according to
their replication factor.
For example, if a disk has been added to fs1, and you are ready to re-stripe the data onto this new disk, run:
mmrestripefs fs1 -b
Query FS Space
The AIX command df shows the amount of free space left in a FS. This can
also be run on a GPFS FS. However, if you need information about how balanced the GPFS FS is, the command to use is mmdf. This command is run against a
specific GPFS FS and shows the VSDs that make up this FS and the amount
of free space within each VSD.
mmdf fs1
It is, however, possible to run multiple levels of GPFS code provided that each level is in its own group within one system partition.
There are two possible scenarios to migrate to GPFS v1.2 from previous
versions: Full and staged. As its name implies, a full migration means that all
the GPFS nodes within a system are installed with GPFS v1.2. A staged
migration means that certain nodes are selected to form a GPFS group with
GPFS v1.2 installed. Once you are convinced that this test group is safe, you
may migrate the rest of your system.
SP Manuals
For the IBM Virtual Shared Disk (VSD) and the IBM Recoverable Virtual
Shared Disk (RVSD), the manual IBM Parallel System Support Programs for
AIX: Managing Shared Disks, SA22-7349 is an excellent guide on installing and configuring the virtual disk technology, especially Chapters 1 to 6.
For the General Parallel Filesystem for AIX (GPFS), this manual will help you:
General Parallel File System for AIX: Installation and Administration Guide ,
SA22-7278.
SP Redbooks
Redbooks are always good references. There are a couple of redbooks that
you may want to take a look at. Inside the RS/6000 SP, SG24-5145 gives you a good overview of the SP architecture, including its shared-disk subsystems.
This chapter covers problem management by first explaining the technology used by all the problem management tools available on the RS/6000 SP. It then describes two ways of using these tools and setting up monitors for critical components, such as memory, file system space, and daemons. The first method uses the command line interface through the Problem Management subsystem (PMAN), and the second uses the graphical user interface (the SP Event Perspective).
AIX provides facilities and tools for error logging, system tracing, and system
dumping (creation and analysis). Most of these facilities are included in the
bos.rte fileset within AIX and, therefore, installed on every node and control
workstation automatically. However, some additional facilities, especially
tools, are included in an optionally installable package called bos.sysmgt.serv_aid that should be installed on your nodes and control workstation.
By analyzing this log, you can get an idea of what went wrong, when, and possibly why. However, the way information is presented by the errpt command makes it difficult to correlate errors within a single machine. This is much worse in the SP, where errors could be caused by components on different machines. We will get back to this point later in this chapter.
The errdemon daemon keeps the log file updated based on information and errors logged by subsystems through the errlog facility or through the errsave facility if they are running at kernel level. In either case, the errdemon daemon adds the entries to the error log on a first-come, first-served basis.
This error log facility also provides a mechanism through which you could
create a notification object for specific log entries. For example, you could instruct the errdemon daemon to send you an e-mail every time there is a hardware error.
The IBM Parallel System Support Programs for AIX: Diagnostic Guide ,
GA22-7350, Section "Using the AIX Error Log Notification Facility" on page
72, provides excellent examples on setting up notification methods.
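To illustrate the idea (the object name and mail text below are ours, not from the manual), a notification object is an ODM stanza added to the errnotify class; the error log sequence number is passed to the method as $1. A stanza such as the following, placed in a file and loaded with odmadd, mails the formatted entry to root whenever a hardware-class error is logged:
errnotify:
        en_name = "mail_hw_err"
        en_persistenceflg = 1
        en_class = "H"
        en_method = "errpt -a -l $1 | mail -s 'Hardware error logged' root"

# odmadd /tmp/mail_hw_err.add
# odmdelete -q "en_name=mail_hw_err" -o errnotify     (to remove it later)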
Log analysis is not bad. However, log monitoring is much better. You do not really want to go and check the error log on every node within your 128-node installation. What you probably do instead is create some notification objects on your nodes to instruct the errdemon daemon on those nodes to notify you whenever a critical error gets logged into the error log.
PSSP provides facilities for log monitoring and error notification. This differs
from AIX notification in the sense that although it uses the AIX notification
methods, it provides a global view of your system; so, you could, for example,
create a monitor for your AIX error log on all your nodes at once with a single
command or a few clicks.
Tracing basically works in a two-step mode. You turn on the trace on selected
subsystems and/or calls, and then you analyze the trace file through the
report tools.
The events that can be included or excluded from the tracing facility are listed
in the /usr/include/sys/trchkid.h header file. They are called hooks and
sub-hooks. With these hooks, you can tell the tracing facility which specific
event you want to trace. For example, you could generate a trace for all the
CREAT calls that include file creations.
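A minimal sketch of that two-step mode follows (the file names are ours; the hook ID to pass with -j, if you want to limit the trace to specific events, comes from /usr/include/sys/trchkid.h):
# trace -a -o /tmp/trace.out        (start tracing asynchronously)
  ... run the commands you want to trace ...
# trcstop                           (stop tracing)
# trcrpt /tmp/trace.out > /tmp/trace.rpt     (format the trace file into a report)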
To learn more about tracing, refer to Chapter 11 "Trace Facility" of the AIX
Problem Solving Guide and Reference , SC23-4123.
In AIX v4, the default location for the system dump is the paging space (hd6). This means that when the system is started up again, the dump needs to be
moved to a different location. By default, the final location of a system dump
is the /var/adm/ras directory, which implies that the /var file system should
have enough free space to hold this dump. The size of the dump depends on
your system memory and load. It can be obtained (without causing a system
dump) by using the sysdumpdev -e command.
If there is not enough space in /var/adm/ras for copying the dump, the system will ask you what to do with this dump (throw it away, copy it to tape, and so on). This is changed for SP nodes since they usually do not have people staring at the console because there is no console (at least not a physical one). On SP nodes, as on machines running AIX v3, the primary dump device is not hd6 but hd7 (a dedicated dump device); so, when the machine boots up, there is no need to move the dump since the device is not being used for anything else. Although your nodes are running AIX v4, where the primary dump device would normally be hd6 (paging space), the /etc/rc.sp script changes it back to /dev/hd7 on every boot.
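You can verify this on a node with the sysdumpdev -l command; on an SP node the output should resemble the following (the values shown are illustrative):
# sysdumpdev -l
primary              /dev/hd7
secondary            /dev/sysdumpnull
copy directory       /var/adm/ras
forced copy flag     TRUE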
A system dump certainly can help a lot in determining who took the machine out of order. A good system dump in the right hands can point to the guilty component. Keep in mind that a system dump is a copy of selected areas of kernel memory.
Some components run only on the control workstation (such as the SDR daemon, the host respond daemon, the switch admin daemon, and so on), while others run only on the nodes (such as the switch daemon). This needs to be taken into consideration in the search for logs. The IBM Parallel System Support Programs for AIX: Diagnosis Guide, GA22-7350 contains a complete list of PSSP log files and their locations.
Unfortunately, there is no common rule for analyzing log files. They are very specific to each component and, in most cases, they are created as internal debugging mechanisms and not for public consumption.
In this redbook, we cover some of these log files and explain how to read
them. However, this information may be obsolete for the next release of
PSSP. The only official logging information is the AIX error log. However,
nothing is stopping you from reading these log files. As a matter of fact, these
SP log files sometimes are essential for problem determination.
When the SDR data is compiled, the EMCDB is placed in a staging file. When
the Event Manager daemon on a node or the control workstation initializes, it
automatically copies the EMCDB from the staging file to a run-time file on the
node or the control workstation. The run-time file is called
Each time you execute the haemcfg command, or recreate the Event
Management subsystem through the syspar_ctrl command, a new EMCDB
file is created with a new version number. The new version number is stored
in the Syspar SDR class as shown in Figure 139.
To check the version number of the run-time version, you can use the
following command:
or
The way in which Event Manager daemons determine the EMCDB version
has important implications for the configuration of the system. To place a new
version of the EMCDB into production (that is, to make it the run-time
version), you must stop each Event Manager daemon in the domain after the
haemcfg command is run. Stopping the daemons dissolves the existing peer
group. Once the existing peer group is dissolved, the daemon can be
restarted. To check if the peer group has been dissolved, use the following
command:
Once the peer group is dissolved, the daemons can be restarted. As they
restart, the daemons form a new peer group.
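A rough sketch of that cycle, assuming the Event Manager daemons are controlled through the SRC under the usual subsystem names (haem on the nodes, haem.<syspar_name> on the control workstation; both names are our assumption):
# haemcfg                           (compile the new EMCDB into the staging file)
# dsh stopsrc -s haem               (stop the daemons on the nodes...)
# stopsrc -s haem.sp3en0            (...and on the control workstation)
# startsrc -s haem.sp3en0           (restart; the daemons form a new peer group)
# dsh startsrc -s haem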
13.5.1 Authorization
In order to use the Problem Management subsystem, users need to obtain a
Kerberos principal, and this principal needs to be listed in the access control
list (ACL) file for the PMAN subsystem. This ACL file is managed by the
sysctl subsystem, and it is located at /etc/sysctl.pman.acl. The content of this
file is as follows:
#acl#
# These are the kerberos principals for the users that can configure
# Problem Management on this node. They must be of the form as indicated
# in the commented out records below. The pound sign (#) is the comment
# character, and the underscore (_) is part of the "_PRINCIPAL" keyword,
# so do not delete the underscore.
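An actual entry, using the Kerberos realm from our environment as an example, would look like this:
_PRINCIPAL root.admin@MSC.ITSO.IBM.COM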
Each time you make a change to this file, the sysctl subsystem must be
refreshed. To refresh the sysctl subsystem use the following command:
refresh -s sysctld
The pmandef command has a very particular syntax; so, if you want to give it a try, take a look at the PSSP: Command and Technical Reference, SA22-7351, on page 350 for a complete definition of this command. Chapter
25 "Using the Problem Management Subsystem" in the PSSP :Administration
Guide, SA22-7348 contains several examples and a complete explanation
about how to use this facility.
Let us assume that you want to get a notification on the console’s screen
each time there is an authentication failure for remote execution. We know
that the remote shell daemon (rshd) logs these errors to the
/var/adm/SPlogs/SPdaemon.log; so, we can create a monitor for this specific
error.
First, we need to identify the error that gets logged into this file every time
somebody tries to execute a remote shell command without the
corresponding credentials. Let us try and watch the error log file:
Feb 27 14:30:16 sp3n01 rshd[17144]: Failed krb5_compat_recvauth
Feb 27 14:30:16 sp3n01 rshd[17144]: Authentication failed from
sp3en0.msc.itso.ibm.com: A connection is ended by software.
Now, there is a small problem to solve. If we are going to check this log file
every few minutes, how do we know if the log entry is new, or if it was already
reported? Fortunately, the way user-defined resource variables work is based
on strings. The standard output of the script you associate with a
user-defined resource variable is stored as the value of that variable. This
means that if we print out the last Authentication failed entry every time, the
variable value will change only when there is a new entry in the log file.
Let’s create the definition for a user-defined variable. To do this, PMAN needs
a configuration file that has to be loaded to the SDR by using the
pmanrmloadSDR command.
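The stanza for our variable might look like the following sketch. The field names (TargetType, Target, Rvar, SampInt, Command) are our recollection of the pmanrmd configuration format, so check them against the PSSP: Administration Guide before use:
TargetType=NODE_RANGE
Target=0
Rvar=IBM.PSSP.pm.User_state1
SampInt=60
Command=/usr/local/bin/Guard.pl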
In this file, you can define all sixteen user-defined variables (there must be
one stanza per variable). In this case, we have defined the
IBM.PSSP.pm.User_state1 resource variable. The resource monitor
(pmanrmd) will update this variable every 60 seconds as specified in the
sample interval (SampInt). The value of the variable will correspond to the
standard output of the /usr/local/bin/Guard.pl script. Let us see what the
script does:
#!/usr/lpp/ssp/perl5/bin/perl
# Print the latest "Authentication failed" entry from the SP daemon log.
my $logfile="/var/adm/SPlogs/SPdaemon.log";
my $lastentry;
open(LOG, "<$logfile") or exit 0;
while (<LOG>) {
    if(/Authentication failed/) {
        $lastentry = $_;
    }
}
close(LOG);
print "$lastentry";
The script printed out the Authentication failed entry from the log file. If there
is no new entry, the old value will be the same as the new value; so, all we
have to do is to create a monitor for this variable that gets notified every time
the value of this variable changes. Let us take a look at the monitor’s
definition:
You can get a complete definition of this resource variable (and others) by
executing the following command:
[sp5en0:/]# haemqvar "" IBM.PSSP.pm.User_state1 "*"|more
This command gives you a very good explanation along with examples on
how to use it.
Now that we have subscribed our monitor, let us see what the
/usr/local/bin/SaySomething.pl script does:
#!/usr/lpp/ssp/perl5/bin/perl
# Pop up a warning window on the CWS display showing the offending log entry.
$cwsdisplay = "sp5en0:0";
$term = "/usr/dt/bin/aixterm";
$cmd = "/usr/local/bin/SayItLoud.pl";
$title = qq/\"Warning on node $ENV{'PMAN_LOCATION'}\"/;
$msg = $ENV{'PMAN_RVFIELD0'};
$bg = "red";
$fg = "white";
$geo = "60x5+200+100";
$execute = "$term -display $cwsdisplay -T $title -bg $bg -fg $fg -geometry $geo -e $cmd $msg";
system($execute);
This script will open a warning window with a red background notifying the
operator (it is run on node 0, the control workstation) about the intruder.
The script /usr/local/bin/SayItLoud.pl will display the error log entry (the
resource variable value) inside the warning window. Let’s take a look at this
script:
#!/usr/lpp/ssp/perl5/bin/perl
print "@ARGV\n";
print "------ Press Enter ------\n";
<STDIN>;
Now that the monitor is active, let us try to access one of the nodes. We
destroy our credentials (the kdestroy command), and then we try to execute a
command on one of the nodes:
[sp5en0:/]# kdestroy
[sp5en0:/]# dsh -w sp5n01 date
sp5n01: spk4rsh: 0041-003 No tickets file found. You need to run "k4init".
sp5n01: rshd: 0826-813 Permission is denied.
dsh: 5025-509 sp5n01 rsh had exit code 1
After a few seconds (a minute at most), we receive the warning window at the control workstation.
The example shown here is very simple. It is not intended to be complete, but
to show how to use these user-defined resource variables.
The log_event script uses the AIX alog command to write to a wraparound file.
The size of the wraparound file is limited to 64 K. The alog command must be
used to read the file. Refer to the AIX alog man page for more information on
this command.
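For example, assuming the wraparound file is /var/adm/SPlogs/pman/log_event.out (the path is ours, not a PSSP default), you would read it with:
# alog -f /var/adm/SPlogs/pman/log_event.out -o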
Through this interface, you can create monitors for triggering events based
on defined conditions and generate actions by using the Problem
Management subsystem when any of these events is triggered.
To better illustrate this point, let’s define a condition for a file system full. This
condition will later be used in a monitor. The following steps are required for
creating a condition:
Step 1 Decide what you want to monitor. In this step, you need to narrow down the condition you want to monitor. For example: We want to monitor free space in the /tmp file system. Then, we have to decide on the particular resource we want to monitor and the condition; that is, we need to find the right resource variable.
PSSP provides some facilities to find out the right variable. In releases prior to PSSP 3.1, the only way to get information on resource variables was through the help facility of SP Perspectives. However, PSSP 3.1 provides a new command that will help you find the right variable and provide you with information about how to use it. Let's use this new command, called haemqvar.
We can use this command to list all the variables related to file
systems as follows:
[sp3en0:/]# haemqvar -d IBM.PSSP.aixos.FS "" "*"
IBM.PSSP.aixos.VG.free Free space in volume group, MB.
IBM.PSSP.aixos.FS.%totused Used space in percent.
IBM.PSSP.aixos.FS.%nodesused Percent of file nodes that are used.
# lsvg | lsvg -i -l
spdata:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
spdatalv jfs 450 450 1 open/syncd /spdata
loglv00 jfslog 1 1 1 open/syncd N/A
rootvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
hd6 paging 64 64 1 open/syncd N/A
hd5 boot 1 1 1 closed/syncd N/A
hd8 jfslog 1 1 1 open/syncd N/A
hd4 jfs 18 18 1 open/syncd /
hd2 jfs 148 148 1 open/syncd /usr
hd9var jfs 13 13 1 open/syncd /var
hd3 jfs 32 32 1 open/syncd /tmp
hd1 jfs 1 1 1 open/syncd /home
Example expression: X>90 (that is, trigger the event when the file system is more than 90 percent full).
As you can see in the Create Condition pane, there are two initial
input boxes for the name (Name) of the condition and the description
(Description). For our example, let’s name the condition
File_System_Getting_Full and give a brief description, such as The
file system you are monitoring is getting full. Better do
something!. This is shown in Figure 144 on page 363.
If you click on Show Details..., it will present you with the same output we got through the haemqvar command. We will leave the last input box, which represents the resource IDs that you want to fix, empty. For example, this resource variable (IBM.PSSP.aixos.FS.%totused) has two resource IDs: One is the volume group name (VG), and the other is the logical volume name (LV). By using the last input box, we could have fixed one or both resource IDs to a specific file system so that this condition could be applied to that particular file system only. However, leaving this input blank enables us to use this condition in any monitor.
SP Manuals
The only SP manual that can help you with this is the PSSP: Administration
Guide, SA22-7348 for PSSP 3.1 and the PSSP: Administration Guide,
GC23-3897 for PSSP 2.4. In both books, there is a section dedicated to
availability and problem management as well as SP Perspectives. We
recommend you to read at least Chapters 24 and 25 of the PSSP 3.1 guide
and Chapters 23 and 24 of the PSSP 2.4 guide.
SP Redbooks
There are several books that cover the topics in this chapter. However, we recommend three of them. Chapters 2 and 3 of RS/6000 SP Monitoring:
Keeping it Alive, SG24-4873 will give you a good understanding about the
concepts involved. The other redbook is Inside the RS/6000 SP, SG24-5145.
This redbook contains an excellent description of the Event Management and
Problem Management subsystems. Finally, the redbook RS/6000 SP PSSP
2.2 Technical Presentation, SG24-4868, contains detailed information on
these topics.
This chapter discusses how to maintain backup images for the CWS and SP nodes as well as how to recover the images you created. In addition, we discuss how to apply the latest PTFs for AIX and PSSP. We provide the technical steps based on the environment we set up at the beginning of this book. Finally, we give an overview of software migration and coexistence.
Remember that the mksysb command backs up only rootvg data. Thus, data
other than rootvg should be backed up with the command savevg or another
backup utility, such as sysback.
Also, remember that the node image you create is a file and is not bootable, so you should follow the network boot process, as discussed in Chapter 9, “Frames and Nodes Installation” on page 249, to restore it.
There are many ways you can set up an SP node backup depending upon
your environment. Here, we introduce the way we set it up in our
environment.
(Diagram: In our environment, the boot/install server node sp3n01 (en1 192.168.31.11, en0 192.168.3.11) serves the frame 1 nodes sp3n05 through sp3n15 over their en0 interfaces. The boot/install server's /spdata/sys1/install/images directory is NFS mounted on all nodes and receives the node images bos.obj.sp3n05.image through bos.obj.sp3n15.image. The control workstation sp3en0 (192.168.3.130) holds the boot/install server's own image, bos.obj.sp3n01.image, in its /spdata/sys1/install/images directory, which is NFS mounted on the boot/install node.)
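A minimal sketch of creating one node's image this way (run on the node, with the server's images directory NFS mounted; the mount point is ours):
# mount sp3n01:/spdata/sys1/install/images /mnt
# mksysb -i /mnt/bos.obj.sp3n05.image
# umount /mnt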
Of course, you can write scripts to automate this process. Due to the nature
of this book, we only introduce the mechanism of the node backup strategy.
The first step is to put the image that you want to restore in the
/spdata/sys1/install/images directory. Then you have to change the network
environment for that node. To do this in PSSP 2.4 or earlier, you do the
following:
PSSP 3.1 has some modifications to the spbootins command; you do not
have the same flags you had in PSSP 2.4 or earlier. If you try to change the
environment using SMIT in PSSP 3.1 with the procedure just described, you
will get a response similar to the following:
spbootins: 0016-601 An option was used that is no longer supported by
this command.
Use the "spchvgobj" command.
spbootins: Syntax:
spbootins [ -c selected_vg ]
[ -r {install | customize | disk | maintenance | diag | migrate }][ -s
yes | no ]{start_frame start_slot node_count | -l <node_list>}
spbootins: Syntax:
spchvgobj -r <volume_group>
[ -h pv_list ]
[ -i install_image ]
[ -p code_version ]
[ -v lppsource_name ]
[ -n boot_server ]
[ -c 1 | 2 | 3 ]
[ -q true | false ]
{start_frame start_slot node_count | -l <node_list>}
To change the environment in PSSP 3.1, you run the following command:
# spchvgobj -r rootvg -i <image name> -l <node_number>
# spbootins -r install -l <node_number>
Now network boot the node to restore the correct image. You can do this in
another node, different from the original, without worrying about the node
number and specific configuration of it. After the node is installed, pssp_script
customizes it with the correct information.
If the status of the install is OK, then you are done with the update of the AIX
PTFs on the CWS. If the status of the install is that it has failed, then review
the output for the cause of the failure and resolve the problem.
Note
In many cases, the latest PSSP PTFs include the microcode for the
supervisor card. We strongly recommend that you check the state of the
supervisor card after applying the PSSP PTFs.
Note that before you apply the latest PTFs to the nodes, make sure you apply the same level of PTFs on the CWS and boot/install server nodes.
For any of the options you choose, it is better to install the PTFs on one node
and do the testing before applying them to all the nodes. In our scenario, we
selected sp3n01 as the test node for installing the PTFs.
1. Log in as root and mount the lppsource directory of the CWS in sp3n01 by
issuing the command:
# mount sp3en0:/spdata/sys1/install/aix432/lppsource /mnt
2. Apply the PTFs using the command:
# smitty update_all
INPUT device for directory / software: /mnt
First run this with the PREVIEW only option set to yes and check that all
prerequisites are met. If it is OK, then go ahead and install the PTFs
with the PREVIEW only option changed back to no.
3. Unmount the directory you had mounted in step 1 using the command:
# umount /mnt
4. If everything runs OK on the test node, then prepare the script from the
/smit.script file for the rest of the nodes. As an example, you may create
the following script:
#!/usr/bin/ksh
# Name of the script: ptfinst.ksh
#
mount sp3en0:/spdata/sys1/install/aix432/lppsource /mnt
/usr/lib/instl/sm_inst installp_cmd -a -d '/mnt' -f '_update_all' '-c' '-N' '-g' '-X'
umount /mnt
While installing the PTFs, if you get any output saying that a reboot is
required for the PTFs to take effect, you should reboot the node. Before
rebooting a node, if you have a switch, you may need to fence it using the
command:
# Efence -autojoin sp3n01
For installing PSSP PTFs, follow the same procedure except for step 1; you
need to mount the PSSP PTFs directory instead of the lppsource directory.
The command is:
# mount sp3en0:/spdata/sys1/install/pssplpp/PSSP-3.1 /mnt
When updating the ssp.css fileset of PSSP, you must reboot the nodes for the
Kernel extensions to take effect.
It is recommended to make another backup image after you have applied the
PTFs.
Because migration of your CWS, your nodes, or both, is a complex task, you
must do careful planning before you attempt to migrate. Thus, a full migration
plan involves breaking your migration tasks down into distinct, verifiable (and
recoverable) steps and planning of the requirements for each step. A
well-planned migration has the added benefit of minimizing system downtime.
(Table: supported migration paths, showing the starting PSSP and AIX levels and the target PSSP and AIX levels.)
You can migrate the AIX level and update the PSSP level at the same time. However, we recommend migrating the AIX level first without changing the PSSP level, verifying system stability and functionality, and then updating PSSP.
However, even if you have found your migration path, some products or
components of PSSP have limitations that might restrict your ability to
migrate:
• Switch Management
• RS/6000 Cluster Technology
• Performance Toolbox Parallel Extensions
• High Availability Cluster Multi-Processing
• IBM Virtual Shared Disk
• IBM Recoverable Virtual Shared Disk
For more information about these limitations, refer to the document IBM
RS/6000 SP Planning Volume 2, Control Workstation and Software
Environment, GA22-7281.
Before migrating, you may want to create one or more system partitions. As
an option, you can create a production system partition with your current AIX
and PSSP level software and a test system partition with your target level of
AIX and PSSP 3.1 level software.
Before you migrate any of your nodes, you must migrate your CWS and
boot/install server node to the latest level of AIX and PSSP of any node you
wish to serve. After these general considerations, we now give some details
of the migration process at the CWS level and then at the node level.
To identify the appropriate method, you must use the information in Table 8, on page 128, of the PSSP: Installation and Migration Guide, Version 3 Release 1, GA22-7347.
Although the way to migrate a node has not changed with PSSP 3.1, we point
out here how the PSSP 3.1 enhancements can be used when you want to
migrate.
1. Migration Install of Nodes to PSSP 3.1
Set the bootp_response parameter to migrate for the nodes you want to migrate, using the new PSSP 3.1 commands (spchvgobj and spbootins).
If we migrate nodes 5 and 6 from AIX 4.2.1 and PSSP 2.4 to AIX 4.3.2 and PSSP 3.1, we issue the following commands, assuming the lppsource directory name is /spdata/sys1/install/aix432/lppsource:
# spchvgobj -r rootvg -p PSSP-3.1 -v aix432 -l 5,6
# spbootins -r migrate -l 5,6
The SDR is now updated and setup_server will be executed. Verify this
with the command: splstdata -G -b -l <node_list>
Finally, a shutdown followed by a network boot will migrate the node. The
AIX part will be done by NIM; whereas, the script pssp_script does the
PSSP part.
2. mksysb Install of Nodes
This is the node installation that we discussed in Chapter 9, “Frames and
Nodes Installation” on page 249.
3. Update to a new level of PSSP and update to a new modification level of
AIX.
If you are on AIX 4.3.1 and PSSP 2.4 and you want to go to AIX 4.3.2 and
PSSP 3.1, you must first update the AIX level of the node by mounting the
aix432 lppsource directory from the CWS on your node and running the
installp command.
Then, after you have the right AIX level installed on your node, you must
set the bootp_response parameter to customize with the new PSSP 3.1
commands (spchvgobj, spbootins) for nodes 5 and 6.
# spchvgobj -r rootvg -p PSSP-3.1 -v aix432 -l 5,6
# spbootins -r customize -l 5,6
14.5.6 Coexistence
PSSP 3.1 can coexist with PSSP 2.2 and later. Coexistence is the ability to
have multiple levels of AIX and PSSP in the same partition.
Table 28 shows what AIX levels and PSSP levels are supported by PSSP 3.1
in the same partition. Any combination of PSSP levels listed in this table can
coexist in a system partition. So, you can migrate to a new level of PSSP or
AIX one node at a time.
Table 28. Possible AIX or PSSP Combinations in a Partition
Some PSSP components and related LPPs still have some limitations. Also,
many software products have PSSP and AIX dependencies.
SP Manuals
Refer to Chapter 6 of PSSP: Installation and Migration Guide (Version 2
Release 4), GC23-3898, and Installation and Migration Guide (Version 3
Release 1), GA22-7347. For details on how to boot from the mksysb tape,
read the AIX Version 4.3 Installation Guide , SC23-4111.
SP Redbooks
RS/6000 SP Software Maintenance, SG24-5160. This redbook provides everything you need for software maintenance. It is strongly recommended to read it for real production work, particularly for the topics of backup and PTFs.
15.2 Environment
This section describes the environment for our RS/6000 SP system. From the
initial RS/6000 SP system, we added a second switched frame and added
one high node, four thin nodes, two Silver nodes, and three wide nodes as
shown in Figure 148 on page 386.
In Figure 148, sp3n17 is set up as the boot/install server. The Ethernet adapter (en0) of sp3n17 is cabled to the same segment (subnet 3) as the en0 of sp3n01 and the CWS. The en0 adapters of the rest of the nodes in frame 2 are cabled to the en1 of sp3n17 so that they are in the same segment (subnet 32). Thus, we install sp3n17, the boot/install server, first from the CWS. Then we install the rest of the nodes from sp3n17. In the following sections, we describe these steps.
(Diagram: Frame 1 holds sp3n01 through sp3n15; their en0 adapters are on subnet 31 (192.168.31.x), served by the boot/install server sp3n01 (en1 192.168.31.11, en0 192.168.3.11). Frame 2 holds sp3n17 through sp3n31; their en0 adapters are on subnet 32 (192.168.32.x), served by the boot/install server sp3n17 (en1 192.168.32.117, en0 192.168.3.117). The control workstation sp3en0 (192.168.3.130) and the en0 adapters of sp3n01 and sp3n17 share subnet 3 (192.168.3.x). Each frame has its own switch.)
Figure 148. Environment after Adding a Second Switched Frame and Nodes
1. Archive the SDR on the CWS. Every time you reconfigure your system, it is strongly recommended to back up the SDR with the command:
[sp3en0:/]# SDRArchive
SDRArchive: SDR archive file name is
/spdata/sys1/sdr/archives/backup.98350.1559
In case something goes wrong, you can simply restore with the command:
SDRRestore <archive_file>
2. Unpartition your system (Optional) from the CWS.
If your existing system has multiple partitions defined and you want to add
a frame that has a switch, you need to bring the system down to one
partition by using the Eunpartition command before you can add the
additional frame.
3. Connect the frame with RS-232 and recable the Ethernet adapters (en0),
as described in 15.2, “Environment” on page 385, to your CWS.
4. Configure the RS-232 control line.
Each frame in your system requires a serial port on the CWS configured to
accommodate the RS-232 line. Note that SP-attached servers require two
serial lines. Define tty1 for the second frame:
[sp3en0:/]# mkdev -c tty -t 'tty' -s 'rs232' -p 'sa1' -w 's2'
5. Enter frame information and reinitialize the SDR.
For SP frames, this step creates frame objects in the SDR for each frame
in your system. At the end of this step, the SDR is reinitialized resulting in
the creation of node objects for each node attached to your frames.
Note
You must perform this step once for SP frames and once for non-SP
frames (SP-attached servers). You do not need to reinitialize the SDR until
you are entering the last set of frames (SP or non-SP).
Note
If frames are not contiguously numbered, repeat this step for each series
of contiguous frames.
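As a sketch only (check the exact syntax in the Installation and Migration Guide), adding one SP frame as frame number 2 on /dev/tty1 and reinitializing the SDR might look like:
[sp3en0:/]# spframe -r yes 2 1 /dev/tty1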
The S70 and S70 Advanced Server require two tty port values to define the
tty ports on the CWS to which the serial cables connected to the server are
attached. The spframe tty port value defines the serial connection to the
operator panel on the S70 and S70 Advanced Server for hardware controls.
The s1 tty port value defines the connection to the serial port on the S70 and
S70 Advanced Server for serial terminal (s1term) support. A switch port value
is required for each S70 or S70 Advanced Server attached to your SP.
Specify the spframe command with the -n option for each series of contiguous
non-SP frames. Specify the -r yes option when running the command for the
final series of frames.
If you have 2 S70 servers (frames 3 and 4), then the first server has the
following characteristics:
Frame Number: 3
Note
The SP-attached server in your system will be represented with the node
number corresponding to the frame defined in this step. Continue with the
remaining installation steps to install the SP-attached server as an SP
node.
[sp3en0:/]# spmon -d
1. Checking server process
Process 16264 has accumulated 0 minutes and 0 seconds.
Check ok
2. Opening connection to server
Connection opened
Check ok
3. Querying frame(s)
2 frame(s)
Check ok
4. Checking frames
This step was skipped because the -G flag was omitted.
5. Checking nodes
--------------------------------- Frame 1 -------------------------------------
Frame Node Node Host/Switch Key Env Front Panel LCD/LED is
Slot Number Type Power Responds Switch Fail LCD/LED Flashing
-------------------------------------------------------------------------------
1 1 high on yes yes normal no LCDs are blank no
5 5 thin on yes yes normal no LEDs are blank no
6 6 thin on yes yes normal no LEDs are blank no
7 7 thin on yes yes normal no LEDs are blank no
8 8 thin on yes yes normal no LEDs are blank no
9 9 thin on yes yes normal no LEDs are blank no
10 10 thin on yes yes normal no LEDs are blank no
11 11 thin on yes yes normal no LEDs are blank no
If you are adding an extension node to your system, you may want to enter
the required node information now. For more information, refer to Chapter
9 of Installation and Migration Guide (Version 3 Release 1), GA22-7347.
6. Acquire the hardware Ethernet addresses with the sphrdwrad command.
This step gets hardware Ethernet addresses for the en0 adapters for your
nodes from the nodes themselves and puts them into the Node Objects in
the SDR. This information is used to set up the /etc/bootptab files for your
boot/install servers.
To get all hardware Ethernet addresses for the nodes specified in the node
list (the -l flag), enter:
[sp3en0:/]# sphrdwrad -l 17,21,22,23,24,25,26,27,29,31
A sample output looks like:
Note
• Do not do this step on a production running system because it shuts
down the nodes.
• Select only the new nodes you are adding. All the nodes you select are
powered off and back on.
• The nodes for which you are obtaining Ethernet addresses must be
physically powered on when you perform this step. No ttys can be
opened in write mode.
Note
The spadaptrs command supports only two adapters each for the Ethernet (en), FDDI (fi), and Token Ring (tr) in PSSP V2.4 or earlier. However, with PTFs (ssp.basic.2.4.0.4) on PSSP 2.4, or with PSSP 3.1, it supports as many adapters as you can have in the system.
9. Configure initial host names for nodes to change the default host name information in the SDR node objects with the sphostnam command. The default is the long form of the en0 host name, which is how the spethernt command processes defaulted host names. However, we set the host names to the short form:
[sp3en0:/]# sphostnam -a en0 -f short -l 17,21,22,23,24,25,26,27,29,31
10.Set Up Nodes to Be Installed.
Note
You cannot export /usr or any directories below /usr because an NFS
export problem will occur. If you have exported the
/spdata/sys1/install/image directory or any parent directory, you must
unexport it using the exportfs -u command before running setup_server.
From the output of step 7, we need to change the image name and AIX version. In addition, we have checked that the sp3n17 node points to the CWS as boot/install server and that all the rest of the nodes point to sp3n17 as boot/install server, which is the default in a multi-frame environment. However, if you need to select a different node to be the boot/install server, you can use the -n option of the spchvgobj command.
To change this information in the SDR, enter:
[sp3en0:/]# spchvgobj -r rootvg -i bos.obj.ssp.432 -l
17,21,22,23,24,25,26,27,29,31
A sample output looks like:
spchvgobj: Successfully changed the Node and Volume_Group objects for
node number 17, volume group rootvg.
spchvgobj: Successfully changed the Node and Volume_Group objects for
node number 21, volume group rootvg.
spchvgobj: Successfully changed the Node and Volume_Group objects for
node number 22, volume group rootvg.
To select the sample tuning file, issue the cptuning command to copy it to /tftpboot/tuning.cust on the CWS; from there it is propagated to each node in the system when the node is installed, migrated, or customized.
Note that each node inherits its tuning file from its boot/install server.
Nodes that have as their boot/install server another node (other than the
CWS) obtain their tuning.cust file from that server node; so, it is necessary
to propagate the file to the server node before attempting to propagate it
to the client node. The settings in the /tftpboot/tuning.cust file are
maintained across a boot of the node.
14.Perform additional node customization, such as adding installp images,
configuring host names, setting up NFS, AFS, or NIS, and configuring
adapters that are not configured automatically (optional).
The script.cust script is run from the PSSP NIM customization script (pssp_script) after the node's AIX and PSSP software have been installed but before the node has been rebooted. This script is run in a limited environment where not all services are fully configured. Because of this limited environment, you should restrict your use of script.cust to functions that must be performed prior to the post-installation reboot of the node.
The firstboot.cust script is run during the first boot of the node immediately after it has been installed. This script runs in a more normal environment where almost all services have been fully configured.
15.Additional switch configuration (optional)
If you have added a frame with a switch, perform:
1. Select a topology file from the /etc/SP directory on the CWS.
Note
SP-attached servers never contain a node switch board; therefore, never include non-SP frames when determining your topology files.
15.7.4.3 Phase III: Rebuild SDR and Install New 332 MHz SMP Nodes
1. Rebuild SDR with all required node information on the CWS.
2. Replace old nodes with new 332 MHz SMP nodes. Be careful to cable
networks, DASD, and tapes in the proper order (for example, ent1 on
the old SP node should be connected to what will be ent1 on the new
332 MHz SMP Node).
3. Netboot all nodes being sure to select the correct AIX & PSSP levels.
4. Verify AIX and PSSP base code levels on nodes.
5. Verify AIX and PSSP fix levels on nodes and upgrade if necessary.
SP Manuals
To reconfigure your SP system, you should have hands-on experience with
initial planning and implementation. The manuals RS/6000 SP: Planning,
Volume 1, Hardware and Physical Environment, GA22-7280 and RS/6000
SP: Planning, Volume 2, Control Workstation and Software, GA22-7281 give
you a good description of what you need. For details about reconfiguration of your SP system, you can refer to Chapter 5 of the following two manuals: PSSP: Installation and Migration Guide (Version 2 Release 4), GC23-3898, and Installation and Migration Guide (Version 3 Release 1), GA22-7347.
Other Sources
Migrating to the RS/6000 SP 332 MHz SMP Node, IBM Intranet:
http://dscrs6k.aix.dfw.ibm.com/
Note that the setup_server script should run on the boot/install servers. If you have a boot/install server other than the CWS, run setup_server through the spbootins command with -s yes (which is the default) on the CWS; setup_server will then run on each boot/install server using dsh and return the progress messages on the CWS.
A NIM client may be in a state that conflicts with your intentions for the node.
You may intend to install a node, but setup_server returns a message that the
nim -o bos_inst command failed for this client. When setup_server runs on the
NIM master to configure this node, it detects that the node is busy installing
and does not reconfigure it. This can happen for several reasons:
• During a node NIM mksysb installation, the client node being installed was
interrupted before the successful completion of the node installation.
• A node was booted in diagnostics or maintenance mode, and now you
would like to reinstall it.
• The node was switched from one boot response to another.
Each of these occurrences causes the client to be in a state that appears that
the node is still installing.
To correct this problem, check with the command lsnim -l <client_name> and
issue the following command for the NIM client:
# nim -Fo reset <client_name>
It is recommended that you always set the boot response back to disk when you switch it from one state to another.
Login to the system specified in the lock file and determine if setup_server is
currently running. If it is not running, remove the lock file and run setup_server
again on the system that failed to create the lppsource resource.
In another case of NIM allocation failures, you may get the following error
messages:
0042-001 nim: processing error encountered on "master":
rshd: 0826-813 Permission is denied. rc=6.
0042-006 m_allocate: (From_Master) rcmd Error 0
allnimres: 0016-254: Failure to allocate lpp_source resource
lppsource_default
from server (node_number) (node_name) to client (node_number)
(node_name)
(nim -o allocate ; rc=1)
This failure is caused by incorrect or missing rcmd support on the CWS, in the /.rhosts file, for the boot/install server nodes. The /.rhosts file needs to have an entry for the boot/install server hostname when trying to execute the allnimres command. Running the setup_server command on the boot/install server node should correct this problem.
If you have missing installp images in the lppsource directory, download them from the AIX 4.3 installation media to /spdata/sys1/install/aix432/lppsource. Then, remove the lppsource with nim -o remove aix432 and run setup_server.
Then check if the /var file system is full. If this is the case, either define more space for /var or remove unnecessary files.
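For example (the size increase shown is arbitrary), you can check and extend /var with:
# df -k /var
# chfs -a size=+8192 /var      (adds 8192 512-byte blocks, that is, 4 MB)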
Check the Login Control facility to see whether the user's access to the node
has been blocked. The system administrator should verify that the user is
allowed access. The system administrator may have blocked interactive
access so that parallel jobs could run on a node.
If automount is not running, check with the mount command to see if any
automount points are still in use. If you see an entry similar to the following
one, there is still an active automount mount point. For AIX 4.3.0 or earlier
systems:
# mount
sp3n05.msc.itso.ibm.com (pid23450@/u) /u afs Dec 07 15:41
ro,noacl,ignore
If the mount command does not show any active mounts for automount, issue the following command to start the automounter:
# /etc/auto/startauto
Note that the automount daemon should be started automatically during boot. Check to see if your SP system is configured for automounter support by issuing:
# splstdata -e | grep amd_config
If the result is true, you have automounter support configured for the SP in
your Site Environment options.
If the startauto command was successful, but the automount daemon is still
not running, check to see if the SP automounter function has been replaced
by issuing:
# ls -l /etc/auto/*.cust
If the result of this command contains an entry similar to:
the SP function to start the automounter has been replaced. View this file to
determine which automounter was started and follow local procedures for
diagnosing problems for that automounter.
If the result of the ls command does not show any executable user
customization script, check both the automounter log file
/var/adm/SPlogs/auto/auto.log and the daemon log file
/var/adm/SPlogs/SPdaemon.log for error messages.
If the startauto command fails, find the reported error messages in PSSP:
Messages Reference and follow the recommended actions. Check the
automounter log file /var/adm/SPlogs/auto/auto.log for additional messages.
Also, check the daemon log file /var/adm/SPlogs/SPdaemon.log for
messages that may have been written by the automounter daemon itself.
If automounter is running, but the user cannot access user files, the problem
may be that automount is waiting for a response from an NFS server that is
not responding or that there is a problem with a map file. Check the
/var/adm/SPlogs/SPdaemon.log for information relating to NFS servers not
responding.
If the problem does not appear to be related to an NFS failure, you will need
to check your automount maps. Look at the /etc/auto/maps/auto.u map file to
see if an entry for the user exists in this file.
Another possible problem is that the server is exporting the file system to an
interface that is not the interface from which the client is requesting the
mount. This problem can be found by attempting to mount the file system
manually on the system where the failure is occurring.
Note
It is important that you DO NOT stop the daemon with kill -KILL (kill -9). Doing so prevents the automount daemon from cleaning up its mounts and releasing its hold on the file systems. It may cause file system hangs and force you to reboot your system to recover those file systems.
The probable causes are a bad Kerberos name format, a Kerberos principal that does not exist, an incorrect Kerberos password, or a corrupted Kerberos database. The recovery action is to repeat the command with the correct syntax. An example is:
# k4init root.admin
Check whether the /.klogin file has an entry for the user principal. If all the information is correct, but the Kerberos command fails, suspect a database corruption.
The probable causes for this problem are that the krb-srvtab file does not exist on the node or on the control workstation, that the krb-srvtab file has the wrong key version, or that the file is corrupted. Analyze the error messages to confirm the service principal identity problem. Make sure the /.klogin file,
The probable causes are that the ticket has expired, a valid ticket does not exist, host name resolution is not correct, or ACL files do not have correct entries. Destroy the ticket using k4destroy and issue a new ticket by issuing k4init root.admin if the user is root. Then check the hostname resolution, ACL files, and the Kerberos database.
Then, on the control workstation, change the directory to /tftpboot and verify
the <node_name>-new-srvtab file. FTP this file to the node’s /etc, and
rename the file to krb-srvtab. Then set the node back to disk as follows:
# spbootins -r disk -l <node_list>
Using either method, you can login to the node and check the hostname,
network interfaces, network routes, and hostname resolution to determine
why the node is not responding.
4. Check the hats log file for the Group Leader node. Group Leader nodes
are those that host the adapter whose address is listed below the line
Group ID in the output of the lssrc -ls hats command.
5. Delete and add the hats subsystem with the following command on the
CWS:
# syspar_ctrl -c hats.sp3en0
Then:
# syspar_ctrl -A hats.sp3en0
or, on the nodes:
# syspar_ctrl -c hats
Then:
# syspar_ctrl -A hats
The BUMP controls the system when the power is off or the AIX operating
system is stopped. The BUMP releases control of the system to AIX after it is
loaded. If AIX stops or is shut down, the BUMP again controls the system.
Run CSS_test from the command line. You can optionally select the following
options:
-q To suppress messages.
-l To designate an alternate log file.
For a detailed list of the possible device statuses for the SP Switch, refer to pages 119-120 of the IBM PSSP for AIX: Diagnosis Guide (Version 3 Release 1), GA22-7350.
To isolate an adapter or switch error for the SP Switch, first view the AIX error
log. For switch related errors, log in to the primary node; for adapter
The Resource Name (Res Name) in the error log should give you an indication of how the failure was detected. For details, refer to Table 17 and Table 18 on pages 121-132 of IBM PSSP for AIX: Diagnosis Guide (Version 3 Release 1), GA22-7350.
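For example, on the primary node (or on the node with the suspect adapter), you can view the error log as follows:
# errpt | more        (one-line summary, most recent entries first)
# errpt -a | more     (detailed entries, including the Resource Name field)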
The Eunfence command fails to distribute the topology file if the Kerberos authentication is not correct.
The Eunfence command will time out if the Worm daemon is not running on the node. So, before running the Eunfence command, make sure the Worm daemon is up and running on the node. To start the Worm daemon on the node, run the /usr/lpp/ssp/css/rc.switch script.
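For example, using node 5 for illustration (run the first two commands on the node and Eunfence from the control workstation):
# ps -ef | grep Worm             (verify that the Worm daemon is running)
# /usr/lpp/ssp/css/rc.switch     (start it if it is not)
# Eunfence 5                     (retry from the control workstation)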
If the problem persists after you have verified that the Kerberos authentication is correct and the Worm daemon is running, the next step is to reboot the node and then try the Eunfence command again.
If neither of the previous steps resolves the problem, you can run diagnostics to isolate a hardware problem on the node.
The last resort, if all else fails, is to issue an Eclock command. This is completely disruptive to the entire switch environment, so it should be issued only when no one is using the switch. An Estart must be run after Eclock completes.
[sp3en0:/]# Estart
Switch initialization started on sp3n01
Initialized 14 node(s).
Switch initialization completed.
• If the switch is operational, and Estart is failing because the oncoming
primary's switch port is fenced, you must first change the oncoming
primary to another node on the switch and Estart. Once the switch is
operational, you can then Eunfence the old oncoming primary node. If you
also want to make it the active primary, then issue an Eprimary command
to make it the oncoming primary node and Estart the switch once again.
[sp3en0:/]# Eprimary 5
Eprimary: Defaulting oncoming primary backup node to
sp3n15.msc.itso.ibm.com
[sp3en0:/]# Estart
Estart: Oncoming primary != primary, Estart directed to oncoming primary
Estart: 0028-061 Estart is being issued to the primary node:
sp3n05.msc.itso.ibm.com.
Switch initialization started on sp3n05.msc.itso.ibm.com.
Initialized 12 node(s).
Switch initialization completed.
[sp3en0:/]# Eunfence 1
All node(s) successfully unfenced.
[sp3en0:/]# Eprimary 1
Eprimary: Defaulting oncoming primary backup node to
sp3n15.msc.itso.ibm.com
[sp3en0:/]# Estart
Estart: Oncoming primary != primary, Estart directed to oncoming primary
Estart: 0028-061 Estart is being issued to the primary node:
sp3n01.msc.itso.ibm.com.
Switch initialization started on sp3n01.msc.itso.ibm.com.
Initialized 13 node(s).
Switch initialization completed.
• If the oncoming primary's switch port is fenced, and the switch has not been started, you cannot check whether or not the node is fenced with the
This section describes the SDR classes and system files that are affected when you change either the primary Ethernet IP address and host name of the SP nodes or those of the CWS. We suggest that you avoid making any host name or IP address changes if possible. The tasks are tedious and, in some cases, require rerunning the SP installation steps. For detailed procedures, refer to Appendix H in IBM RS/6000 SP: PSSP Administration Guide, SA22-7348. These IP address and host name procedures support SP nodes at PSSP 3.1 (AIX 4.3), PSSP 2.4 (AIX 4.2 and 4.3), PSSP 2.2 (AIX 4.1 or 4.2), and PSSP 2.3 (AIX 4.2 or 4.3). The PSSP 3.1 release supports both SP node coexistence and system partitioning.
Consider the following PSSP components when changing the IP address and
hostnames:
• Network Installation Manager (NIM)
• System partitioning
• IBM Virtual Shared Disk
• High Availability Control Workstation (HACWS)
SP Manuals
This chapter provides a summary of general problem diagnosis to prepare you for the exam. Therefore, you should read Part 2 of IBM PSSP for AIX: Diagnosis Guide (Version 3 Release 1), GA22-7350, for a full description. In addition, you may read Chapters 4, 5, 8, 12, and 14 of IBM PSSP for AIX: Administration Guide, SA22-7348, to get the basic concepts of each topic discussed here.
SP Redbooks
There is no problem determination redbook available for PSSP 2.4. You can use RS/6000 SP: PSSP 2.2 Survival Guide, SG24-4928, which was written for PSSP 2.2. This redbook covers node installation and SP Switch problems in great detail.
This appendix contains the answers, with brief explanations, to the sample questions included in every chapter.
Question 2 - The answer is B. The two switch technologies (SP Switch and HiPS) are not compatible. PSSP 2.4 is the last PSSP level that supports the HiPS switch. PSSP 3.1 or later does not support the older switch.
Question 3 - The answer is A. PSSP 3.1 requires AIX 4.3.2 or later. The
Performance Toolbox manager extension (perfagent.server fileset) is no
longer a prerequisite in PSSP 3.1. Refer to 2.12, “Software Requirements” on
page 51 for details.
Question 4 - The answer is A. The new PCI thin nodes (both PowerPC and
POWER3 versions) have two PCI slots available for additional adapters. The
Ethernet and SCSI adapters are integrated. The switch adapter uses a
special MX (mezzanine bus) adapter (MX2 for the POWER3 based nodes).
For more information, refer to 2.4.1, “Internal Nodes” on page 14.
A.5 SP Security
Answers to questions in 6.16, “Sample Questions” on page 203, are as
follows:
Question 2 - The answer is D. One of the reasons why PSSP 3.1 still requires Kerberos v4, although it supports Kerberos v5 through AIX, is the fact that the hardmon daemon and the sysctl facility still require Kerberos v4 for authentication. Refer to 6.12, “SP Services That Utilize Kerberos” on page 190 for details.
Question 4 - The answer is B. All the user-related configuration files are managed by the user.admin file collection. This collection is defined by default and is activated when you select the SPUM as your user management facility. Refer to 7.5.3.2, “user.admin Collection” on page 216 for details.
Question 2 - The answer is B. In releases prior to PSSP 3.1, the System Performance Measurement Interface (SPMI) library was required by some PSSP components. This library was packaged as part of the Performance Toolbox Aide (PAIDE) package, a companion of the Performance Toolbox for AIX. In AIX 4.3.2, which is a prerequisite for PSSP 3.1, the SPMI library is shipped in the perfagent.tools fileset and not in the perfagent.server component as in previous releases. Although most of the PSSP components do not use the SPMI library, the aixos resource monitor needs it in order to provide resource variables to Event Management. In summary, the perfagent.tools component is a prerequisite for PSSP 3.1 running on AIX 4.3.2.
Question 1 - The answers are A and C. The initial hostname is the real host
name of a node, while the reliable hostname is the hostname associated with
the en0 interface on that node. Most of the PSSP components will use the
reliable hostname for accessing PSSP resources on that node. The initial
hostname can be set to a faster network interface (such as the SP Switch) if
applications use the node’s hostname for accessing resources.
Question 1 - The answer is D. The SDR_test script checks the SDR and
reports any errors found. It will contact the SDR daemon and will try to create
and remove classes and attributes. If this test is successful, then the SDR
directory structure and the daemons are set up correctly. Refer to 10.3.1.2,
“Checking the SDR Initialization: SDR_test” on page 280 for details.
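A quick check might look like the following; the log file location shown is an assumption, so check the SP log directory on your system:
# SDR_test
# echo $?     (0 indicates that the SDR verification succeeded)
# cat /var/adm/SPlogs/SDR_test.log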
Question 2 - The answer is D. The spmon -d command will contact the frame
supervisor card only if the -G flag is used. If this flag is not used, the spmon -d
Question 1 - The answer is B. The lsvsd command, when used with the -l
flag, will list all the configured virtual shared disks on a node. To display all
the virtual shared disks configured in all nodes, you may use the dsh
command to run the lsvsd command on all nodes.
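For example, assuming dsh is set up for all nodes and the usual VSD command path (verify the path on your system):
# dsh -a /usr/lpp/csd/bin/lsvsd -l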
Question 2 - The answer is D. In order to get the virtual shared disks working properly, you have to install the VSD software on all the nodes where you want VSD access (client and server). Then you need to grant authorization to the Kerberos principal that you will use to configure the virtual shared disks on the nodes. After you grant authorization, you may designate which nodes will be configured to access the virtual shared disks you define. After doing this, you can start creating the virtual shared disks. Remember that when you create
Question 1 - The answer is D. The log_event script uses the AIX alog
command to write to a wraparound file. The size of the wraparound file is
limited to 64 K. The alog command must be used to read the file. Refer to the
AIX alog man page for more information on this command.
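For example, if the wraparound file were /tmp/log_event.out (a hypothetical path), it could be read with:
# alog -f /tmp/log_event.out -o | more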
Question 2 - The answer is A. If for some reason the /etc/passwd file gets erased or emptied, as happened here, you will not be able to log on to this node until the file is restored. To do that, you have to start the node in maintenance mode.
Question 4 - The answer is A. The supfilesrv daemon runs on all the file collection servers. If the daemon is not running, clients receive this error message when trying to contact the server.
Question 5 - The answers are B and C. In most cases, when the error message refers to authenticator decoding problems, the cause is a time difference between the client and the server machines. Because a time stamp is used to encode and decode messages in Kerberos, Kerberos fails with this error if the time difference between the client and the server is more than five minutes. The other common cause is a corrupted or out-of-date /etc/krb-srvtab file, which will also cause Kerberos to fail.
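A quick way to compare the clocks, assuming dsh is set up for all nodes, and a hypothetical resynchronization of a node from the control workstation sp3en0 (which requires the time service to be enabled there):
# dsh -a date         (run on the control workstation; compare the reported times)
# setclock sp3en0     (run on a node to synchronize its clock with the CWS)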
IBM may have patents or pending patent applications covering subject matter
in this document. The furnishing of this document does not give you any
license to these patents. You can send license inquiries, in writing, to the IBM
Director of Licensing, IBM Corporation, 500 Columbus Avenue, Thornwood,
NY 10594 USA.
Licensees of this program who wish to have information about it for the
purpose of enabling: (i) the exchange of information between independently
created programs and other programs (including this one) and (ii) the mutual
use of the information which has been exchanged, should contact IBM
Corporation, Dept. 600A, Mail Drop 1329, Somers, NY 10589 USA.
The information contained in this document has not been submitted to any
formal IBM test and is distributed AS IS. The information about non-IBM
("vendor") products in this manual has been supplied by the vendor and IBM
assumes no responsibility for its accuracy or completeness. The use of this
information or the implementation of any of these techniques is a customer responsibility.
Any pointers in this publication to external Web sites are provided for
convenience only and do not in any manner serve as an endorsement of
these Web sites.
Reference to PTF numbers that have not been released through the normal
distribution process does not imply general availability. The purpose of
including these reference numbers is to alert IBM customers to specific
information relative to the implementation of the PTF when it becomes
available to each customer according to the normal IBM PTF distribution
process.
Microsoft, Windows, Windows NT, and the Windows 95 logo are trademarks
or registered trademarks of Microsoft Corporation.
The publications listed in this section are considered particularly suitable for a
more detailed discussion of the topics covered in this redbook.
This section explains how both customers and IBM employees can find out about ITSO redbooks,
redpieces, and CD-ROMs. A form for ordering books and CD-ROMs by fax or e-mail is also provided.
• Redbooks Web Site http://www.redbooks.ibm.com/
Search for, view, download or order hardcopy/CD-ROM redbooks from the redbooks web site. Also
read redpieces and download additional materials (code samples or diskette/CD-ROM images) from
this redbooks site.
Redpieces are redbooks in progress; not all redbooks become redpieces and sometimes just a few
chapters will be published this way. The intent is to get the information out much quicker than the
formal publishing process allows.
• E-mail Orders
Send orders via e-mail including information from the redbooks fax order form to:
e-mail address
In United States [email protected]
Outside North America Contact information is in the “How to Order” section at this site:
http://www.elink.ibmlink.ibm.com/pbl/pbl/
• Telephone Orders
United States (toll free) 1-800-879-2755
Canada (toll free) 1-800-IBM-4YOU
Outside North America Country coordinator phone number is in the “How to Order”
section at this site:
http://www.elink.ibmlink.ibm.com/pbl/pbl/
• Fax Orders
United States (toll free) 1-800-445-9269
Canada 1-403-267-4455
Outside North America Fax phone number is in the “How to Order” section at this site:
http://www.elink.ibmlink.ibm.com/pbl/pbl/
This information was current at the time of publication, but is continually subject to change. The latest information for customers may be found at http://www.redbooks.ibm.com/ and, for IBM employees, at http://w3.itso.ibm.com/.
Index
Images installation 272
lpp installation 273
Symbols SRC 190
/etc/rc.sp 347
AIX error log 346
/unix 348
Amd
/usr/include/sys/trchkid.h 347
See Berkeley automounter
/var/adm/ras 347
apply the PTFs 369
/var/adm/SPlogs 349
ARP cache 94
/var/adm/SPlogs/SPdaemon.log 354
auth_install 176
auth_methods 176
Numerics auth_root_rcmd 176
100BASE-TX 86, 94, 96 Authentication methods 176
10BASE-2 86 Authorization 191
10BASE-T 86 AutoFS 430
332 MHz SMP node 385 Automounter
8274 95 /etc/amd/amd-maps/amd.u 229
AIX Automounter 205
migration 228
A
abbreviations 477 mkautomap 228
Access control 206 autosensing 96
Access Control Lists 190
ACL files 185 B
acronyms 477 backup 241
adapters backup images 369
Ethernet 253, 256 Berkeley automounter 205
FDDI 256 BNC 86
switch 256 boot/install server 29, 89, 90
Token Ring 256 configuring 261
Adding a frame 385, 386 selecting 261
Adding a Switch 405 bootlist 117
AFS 177 bootp 267
adduser 200 bootp_response 381
chown 200 bos.rte 345
creategroup 200 bos.sysmgt.serv_aid 345
delete 200 bos.sysmgt.trace 346
examine 200 bosinst.data 124
kas 200 broadcast storm 93
kinit 200 BUMP 441
klog.krb 200
listowned 200
membership 200 C
Central Electronics Complex (CEC) 136
pts 200
Central Manager
removeusers 200
see LoadLeveler
setfields 200
Coexistence 407
token.krb 200
Commands
AIX
/var/sysman/super update 220
filesets 242
DOMAIN 77 $HOME/.k5login 197
dynamic port allocation 133 $HOME/.netrc 175
$HOME/.rhosts 175, 196
.config_info 270
E .install_info 270
endpoint map 133
.profile 237
Enter 251
/.k 181
Enterprise Server 135
/etc/amd/amd-maps/amd.u 229
environment 250
/etc/environment 237
Error Conditions 441
/etc/ethers 211
Ethernet 238, 239
/etc/group 211
Ethernet switch 86, 92
/etc/hosts 211
Event Management 291
/etc/hosts.equiv 175, 196
client 350
/etc/inetd.conf 196, 238
haemd 350
/etc/inittab 237, 238
Resource Monitor Application Programming In-
/etc/krb.conf 181
terface 350
/etc/krb.realms 182
Event Manager 163
/etc/krb-srvtab 181, 191, 198
/etc/netgroup 211
F /etc/networks 211
Fast Ethernet 94, 95 /etc/passwd 211
File Collections /etc/profile 237
/share/power/system/3.2/.profile 215 /etc/protocols 211
/var/sysman/file.collections 214 /etc/publickey 211
/var/sysman/sup 217 /etc/rc.net 238
/var/sysman/sup/lists 214 /etc/rpc 211
/var/sysman/super update 220 /etc/security/group 212
Available 213 /etc/security/passwd 212
diskinfo 222 /etc/services 212, 239
hierarchical 217 /etc/sysctl.acl 202
Master Files 214 /etc/sysctl.conf 202
node.root 216, 217 /spdata/sys1/install//lppsource 273
power_system 216, 217 /spdata/sys1/install/images 272
predefined file collections 216 /spdata/sys1/install/pssp 274
Primary file collections 215 /spdata/sys1/install/pssplpp/PSSP-x.x 273
Resident 213 /spdata/sys1/spmon/hmacls 191
rlog 222 /tmp/tkt 181
scan 221, 222 /tmp/tkt_hmrmd 192
Secondary file collection 215 /tmp/tkt_splogd 192
secondary file collection 217 /usr/lpp/ssp/bin/spmkuser.default 208
Software Update Protocol 213 /var/adm/SPlogs/kerberos/kerboros.log 182
status 222 /var/kerberos/database/slavesave 188
SUP 213 <hostname>-new-srvtab 270
sup.admin 216 bosinst_data 274
supper 214 CSS_test.log 402
user.admin 216 firstboot.cust 272
when 222 image.data 274
where 222 pmandefaults 354
Files pssp_script 274
Ticket 179 Mirroring 406
Ticket Cache File 179 mksysb 369
Ticket-Granting Ticket 179 Modification 377
kshell port 195, 196
kvalid_user 197
N
naming conventions 241
L Network Boot Process 419
LED Network Information System
LED 231 419 client 78
LED 260 419 maps 79
LED 299 419 Master Server 78
LED 600 419 Slave Server 78
LED 606 419 Network installation 93
LED 607 419 NFS 323
LED 608 419 nim_res_op 374
LED 609 419 NIS
LED 610 419 /etc/ethers 211
LED 611 419 /etc/group 211
LED 613 419 /etc/netgroup 211
LED 622 419 /etc/networks 211
LED 625 419 /etc/passwd 211
LED C06 419 /etc/protocols 211
LED C10 419 /etc/publickey 211
LED C40 420 /etc/rpc 211
LED C42 420 /etc/security/group 212
LED C44 420 /etc/security/passwd 212
LED C45 420 /etc/services 212
LED C46 420 clients 212
LED C48 420 master server 211, 212
LED C52 420 NIS client 212
LED C54 420 passwd 213
LED C56 420 script.cust 212
libc.a 195 slave 212
libspk4rcmd.a 195 slave server 212
libvaliduser.a 197 yppasswd 213
LoadLeveler node
central manager 299 boot 263
cluster 297 dependent node 25
job step 298 external node 22
scheduler 299 High node 14
SYSPRIO 300 installation 263
logs 292 Internal Nodes 14
lsmksysb 373, 374 standard node 14
Thin node 14
Wide node 14
M Node conditioning 264
MAC address 256
Node Object 108
manual node conditioning 422
Nways LAN RouteSwitch 95
Migration 377
rexec 175 spdata 240
rlogin 175 spk4rsh 195, 196
rsh 175, 177, 194 splstdata 117, 164
telnet 175, 176 spmirrorvg 115
serial link 238, 239 spmkvgobj 109
Service and Manufacturing Interface (SAMI) 142 spmon 163
set_auth_method 176 spot_aix432 374
shared-nothing 49 SPUM
shell port 196 smit site_env_dialog 206
Simple Network Management Protocol 353 spunmirrorvg 116
SMIT src 163
Additional Adapter Database Information 256 SRChasMessage 163
Boot/Install Server Information 260 SSA disks 119
Change Volume Group Information 259 subnet 88
Get Hardware Ethernet Address 256 supervisor card 12
Hostname Information 257 supervisor microcode 252, 390
List Database Information 288 Switch
non-SP Frame Information 251 Operations
RS/6000 SP Installation/Configuration Verifica- clock setting 263
tion 288 primary node setting 263
RS/6000 SP Supervisor Manager 252 Start 267
Run setup_server Command 261 Topology setting 262
Select Authorization Methods for Root access to sysctl
Remote Commands 258 /etc/sysctl.acl 202
Set Primary/Primary Backup Node 263 /etc/sysctl.conf 202
SIte Environment Information 250 Kerberos 201
SP Ethernet Information 253 Tcl 202
SP Frame Information 251 SYSPRIO
Start Switch 267 see LoadLeveler
Store a Topology File 262 System Dump 347
Topology File Annotator 262 System Management 211
smit hostname 74 File Collection 213
smit mktcpip 74 NIS 210, 211
SNMP See Simple Network Management Protocol SystemGuard 441
Software Maintenance 369
SP LAN 85
SP Log Files 348
T
TB3MX 143
SP security
TCP/IP 239
Kerberos 177
Thin-wire Ethernet 86
SP Switch frame 10
ticket cache 192
SP Switch Router 26
ticket forwarding 195
sp_configd 353
Topology Services 291
spacs_cntrl 210
Reliable Messaging 350
SP-attached servers 22, 135, 388
TP
spbootins 113
See Twisted Pair
spbootlist 117
trace facility 346
spchvgobj 112
tunables 238
spcn 163
Twisted Pair 86
SPCNhasMessage 163
V
Version 377
Virtual Front Operator Panel 191
volume group 259
Volume_Group 108
ITSO Redbook Evaluation
IBM Certification Study Guide RS/6000 SP
SG24-5348-00
Your feedback is very important to help us maintain the quality of ITSO redbooks. Please complete
this questionnaire and return it using one of the following methods:
• Use the online evaluation form found at http://www.redbooks.ibm.com
• Fax this form to: USA International Access Code + 1 914 432 8264
• Send your comments in an Internet note to [email protected]
Please rate your overall satisfaction with this book using the scale:
(1 = very good, 2 = good, 3 = average, 4 = poor, 5 = very poor)
Was this redbook published in time for your needs? Yes___ No___