


System p
Overview
SA76-0087-01


Note
Before using this information and the product it supports, read the information in “Notices” on
page 45 and the IBM Systems Safety Information manual, G229-9054.

Second Edition (September 2007)


This edition applies to IBM System p servers that contain the POWER6 processor and to all associated models.
This edition replaces SA76-0087-00.
© Copyright International Business Machines Corporation 2007. All rights reserved.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract
with IBM Corp.
Contents
About this publication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Related publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
How to send your comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v

Chapter 1. Introduction to the IBM System p POWER6 processor-based systems . . . . 1


Model numbers and names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Improvements to documentation. . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Access the documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Highlights of the documentation . . . . . . . . . . . . . . . . . . . . . . . . . . 2
IBM System p POWER6 product specifications . . . . . . . . . . . . . . . . . . . . . . . 3
System specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Physical package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Features of the 9117-MMA system . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Processor card features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Memory features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Disk and media features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
External expansion unit features . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
7311-D11 external expansion unit . . . . . . . . . . . . . . . . . . . . . . . . . 6
7311-D11 expansion unit physical package . . . . . . . . . . . . . . . . . . . . . 7
7311-D20 external expansion unit . . . . . . . . . . . . . . . . . . . . . . . . . 7
7311-D20 expansion unit physical package . . . . . . . . . . . . . . . . . . . . . 8
7314-G30 external expansion unit . . . . . . . . . . . . . . . . . . . . . . . . . 8
7314-G30 expansion unit physical package . . . . . . . . . . . . . . . . . . . . . 8
7031-D24 and 7031-T24 external expansion units . . . . . . . . . . . . . . . . . . . . 8
7031-D24 and 7031-T24 expansion unit physical packages . . . . . . . . . . . . . . . . 9
PCI adapter slots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Hardware Management Console models . . . . . . . . . . . . . . . . . . . . . . . . 9
Operating system environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Chapter 2. IBM SAS controllers . . . . . . . . . . . . . . . . . . . . . . . . . 13


Benefits of SAS controllers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Features of SAS controllers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Advanced features of SAS controllers . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Disk arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
RAID level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Stripe-unit size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
hdisk and pdisk names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
States for disk arrays (hdisks) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
States for physical disks (pdisks) . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
pdisk descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

Chapter 3. Advanced POWER Virtualization . . . . . . . . . . . . . . . . . . . . 19


Virtual I/O Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Micro-Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

Chapter 4. RAS and manageability . . . . . . . . . . . . . . . . . . . . . . . . 23


Reliability, Availability, and Serviceability . . . . . . . . . . . . . . . . . . . . . . . . . 23
Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Designed for reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Placement of components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Redundant components and concurrent repair . . . . . . . . . . . . . . . . . . . . . . 23
Continuous monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24



Detecting and deallocating failing components . . . . . . . . . . . . . . . . . . . . . 24
Handling uncorrectable errors . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Cache protection mechanisms . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
PCI error recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Serviceability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Detecting errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Diagnosing problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Reporting problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Notifying the appropriate contacts . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Locating and repairing the problem . . . . . . . . . . . . . . . . . . . . . . . . . 30
Manageability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Service processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
System diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Electronic Service Agent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Accessing the Electronic Services Web site . . . . . . . . . . . . . . . . . . . . . . . 34
Manage serviceable events with the HMC . . . . . . . . . . . . . . . . . . . . . . . . 34
Hardware user interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Advanced System Management interface . . . . . . . . . . . . . . . . . . . . . . . 34
Accessing the ASMI using an HMC . . . . . . . . . . . . . . . . . . . . . . . . 34
Accessing the ASMI using a Web browser . . . . . . . . . . . . . . . . . . . . . . 35
Accessing the ASMI using an ASCII terminal . . . . . . . . . . . . . . . . . . . . . 35
Graphics terminal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

Appendix A. Supported hardware features . . . . . . . . . . . . . . . . . . . . 37

Appendix B. Accessibility features . . . . . . . . . . . . . . . . . . . . . . . . 43

Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Terms and conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

About this publication
This publication describes the design, components, functions, features, and capabilities of the IBM®
System p™ POWER6™ processor-based systems. It is intended for executives, data processing managers,
data processing technical staff, consultants, and vendors who want to learn the advantages of the IBM
System p POWER6 processor-based systems.

For information about the accessibility features of this product, for users who have a physical disability,
see Appendix B, “Accessibility features,” on page 43.

Related publications
You can access these books from the Support for IBM System p Web site at
http://www.ibm.com/systems/support/p. The books include:
• Plan:
  – Introducing Improved Information Delivery for IBM System p Hardware, SA76-0105
  – Site and Hardware Planning Guide, SA76-0091
  – Site Preparation and Physical Planning Guide, SA76-0103
• Install:
  – Installation and Configuration Guide for the Hardware Management Console, SA76-0084
• Use:
  – Advanced POWER Virtualization Operations Guide, SA76-0100
  – Electronic Service Agent, ESA-0001
  – Functional Matrix, SA76-0088
  – Operations Guide for the Hardware Management Console and Managed Systems, SA76-0085
  – IBM SAS RAID Controller Reference Guide for AIX, SA76-0112
• Troubleshoot:
  – AIX Diagnostics and Service Aids, SA76-0106

How to send your comments


Your feedback is important in helping to provide the most accurate and highest quality information. If
you have any comments about this publication, send your comments using Resource Link™ at
http://www.ibm.com/servers/resourcelink. Click Feedback on the navigation pane. Be sure to include
the name of the book, the form number of the book, and the specific location of the text you are
commenting on (for example, a page number or table number).

Chapter 1. Introduction to the IBM System p POWER6
processor-based systems
The IBM System p POWER6 processor-based systems provide a combination of scalability, availability,
reliability, and virtualization.
POWER6 technology
The IBM System p 570 (9117-MMA) system contains a system unit based on POWER6 technology
and a new PCI Express-based internal I/O subsystem. The system unit consists of up to two
2-core processor cards, a system unit backplane, and up to three dc-dc regulators. The 2-core
processor cards contain the POWER6 dual core processor with an integrated memory controller
and a 32 MB level 3 (L3) cache chip along with 12 DDR2 dual inline memory module (DIMM)
slots.
POWER6 processors can run 64-bit applications while concurrently supporting 32-bit applications
to enhance flexibility. These processors feature simultaneous multithreading, allowing two
application threads to be run at the same time, which can significantly reduce the time to
complete tasks. The PCI Express internal I/O subsystem contains four (8x) PCI Express slots and
two PCI-X DDR slots.
SAS technology
POWER6 processor-based systems use new Serial Attached SCSI (SAS) technology, which includes
the IBM SAS DASD controller chip. The chip provides a SAS interface to the DASD subsystem
and a Serial ATA (SATA) interface to the media devices. A conversion from SATA to IDE occurs
on the media backplane, which allows standard IDE optical media devices to be installed.
GX+ bus slots
Two GX+ bus slots provide the interface to either a RIO-2 adapter or the new GX Dual-Port 12X
Channel Attach adapter. The GX Dual-Port 12X Channel Attach adapter provides support for the
new 7314-G30 external expansion unit.
Advanced POWER Virtualization
Advanced POWER™ Virtualization allows you to enhance the virtualization capabilities of your
system. The Advanced POWER Virtualization feature includes the Virtual I/O Server and
enablement for Micro-Partitioning™.
Host Ethernet adapter
The 9117-MMA supports the choice of 1 Gb or 10 Gb integrated host Ethernet adapters (HEA).
These IBM-supplied ports can be selected at the time of initial order. The 9117-MMA supports
virtualization of these integrated Ethernet adapters.
Improved RAS
The Reliability, Availability, and Serviceability (RAS) features help to ensure that the system
operates when required, performs reliably, and handles any failures that might occur in an
efficient manner. The POWER6 processor-based system offers many features that are designed to
increase Reliability, Availability, and Serviceability.

Model numbers and names


The IBM systems hardware information, available through the Support for IBM System p Web site,
contains information about IBM System p POWER6 processor-based servers. Each model is referenced
throughout the information in various ways, depending on the context of the article.

The following table shows the MTM name, the full name, the short name, and the IBM brand or family
for each model.



Table 1. List of models
MTM        Full name of model             Short name of model   Brand or family
9117-MMA   IBM System p 570 (9117-MMA)    p 570                 System p

For instructions to access the Support for IBM System p Web site, refer to “Improvements to
documentation.”

Enhancements
Enhanced features with the IBM System p POWER6 processor-based systems include an updated user
interface on the Hardware Management Console (HMC), advanced virtualization functions, and enhanced
RAS features.
Documentation
The documentation is available on the Support for IBM System p Web site, which is a customized
Web-based solution that provides information to help you plan for, install, and maintain IBM
System p POWER6 processor-based systems.
Hardware Management Console
An updated user interface that requires fewer clicks to access key tasks is now available. Added
accessibility features support the use of assistive technology.
Advanced POWER Virtualization
Advanced POWER Virtualization functions facilitate highly efficient system utilization.
Enhanced RAS
Enhanced Reliability, Availability, and Serviceability (RAS) features are designed to improve
application availability.
Electronic problem reporting
An updated Electronic Service Agent™ allows server information and problems to be
electronically reported to the service and support organization.

Improvements to documentation
The documentation is available through the Support for IBM System p Web site, which provides the
information you need to plan for, install, and maintain IBM System p POWER6 processor-based systems.

Access the documentation


1. Go to the Support for IBM System p Web site at http://www.ibm.com/systems/support/p.
2. Select System p from the Select your product Hardware field and click Go.
3. Select Product documentation libraries under Popular links.
4. Select POWER6 specific documentation. The System p POWER6 hardware publications are
displayed.

Highlights of the documentation


Highlights of the new information delivery for IBM System p POWER6 processor-based systems include:
Synchronization of delivery
Information is published and maintained on the same schedule as the product.
Improved online help information on the HMC
• The online help now includes a more powerful search engine, a new interface, and a print
  feature.
• You can operate all features using the keyboard in addition to the mouse.

Adding and exchanging field replaceable units (FRUs) using the HMC
The Service Management menus on the HMC provide an interactive step-by-step procedure using
illustrations to help customers and service representatives add and exchange parts if needed.

IBM System p POWER6 product specifications


The IBM System p POWER6 processor-based systems include the IBM System p 570 (9117-MMA), five
external expansion units, and two new models of the Hardware Management Console (HMC).
IBM System p 570 (9117-MMA)
The 9117-MMA is a 2-16 core midrange server mounted in an industry-standard 19-inch rack.
Multiple systems, each of which is 4-EIA units high, can be joined to build larger n-core systems.
Each 9117-MMA building block can have up to two 2-core processor cards.
Up to four building blocks can be joined together, so a 16-core system is the maximum
configuration.
All configurations use modular symmetric multiprocessor (SMP) architecture. This design allows
customers to start with what they need and add additional systems as necessary, without
disrupting the base system. Capacity on Demand (CoD) features enable you to activate dormant
processors for times as short as one minute.
External expansion units
The external expansion units that the 9117-MMA supports include:
• The 7311-D11, which contains six PCI-X slots.
• The 7311-D20, which contains seven PCI-X slots and 12 hot-swappable SCSI disk drive bays
  arranged in two 6-packs.
• The 7314-G30, which contains six PCI-X DDR slots.
• The 7031-D24 and 7031-T24, which both include Ultra 320 SCSI interface connections for up to
  24 LVD Ultra 320 SCSI disk drives. The models are available in the following configurations:
  – The 7031-D24 - rack-mountable configuration
  – The 7031-T24 - stand-alone deskside configuration

Note: For additional details on the external expansion units, refer to “External expansion unit
features” on page 6.
Hardware Management Console (HMC) 7042-C06
The 7042-C06 is a desktop model that includes one IBM-supplied 10/100/1000 Ethernet port. Up to
two additional dual-port 10/100/1000 Ethernet adapters can be added.
Hardware Management Console (HMC) 7042-CR4
The 7042-CR4 is a 1-EIA unit high, 19-inch rack-mountable model that has two IBM-supplied
Ethernet ports. One additional dual-port 10/100/1000 Ethernet adapter can be added.

System specifications
Learn about the system specifications for a single 9117-MMA system.
Table 2. 9117-MMA specifications
Description             Range (operating)
Operating temperature   5 to 35 degrees C (41 to 95 F)
Relative humidity       8% to 80%
Wet bulb                23 degrees C (73 F) (maximum configuration)
Noise level             6.2 to 7.1 bels (operating 4-core configurations)
Operating voltage       200 to 240 V AC 50/60 Hz
Power consumption       1,400 watts (maximum)
Power source loading    1.428 kVA (maximum configuration)
Thermal output          4,778 British thermal units (Btu)/hour (maximum configuration)
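
The power and thermal rows of Table 2 are related by a standard unit conversion, which the following minimal Python sketch illustrates. The conversion factor (1 watt is approximately 3.412 Btu/hour) is a general physical constant and is not stated in this publication; the function names are hypothetical. The power factor is only what the two table values imply, not a published specification.

```python
# Sketch: relate the maximum power, thermal output, and kVA values in Table 2.
# Assumption: 1 watt = 3.412142 Btu/hour (standard conversion, not from this document).
BTU_PER_HOUR_PER_WATT = 3.412142


def thermal_output_btu_per_hour(power_watts):
    """Convert electrical power drawn (watts) to heat output (Btu/hour)."""
    return power_watts * BTU_PER_HOUR_PER_WATT


def implied_power_factor(power_watts, apparent_power_kva):
    """Ratio of real power (W) to apparent power (VA) implied by two ratings."""
    return power_watts / (apparent_power_kva * 1000.0)


max_power_watts = 1400   # Table 2: power consumption (maximum)
max_load_kva = 1.428     # Table 2: power source loading (maximum configuration)

# The result lands within rounding of the 4,778 Btu/hour listed in Table 2.
print(round(thermal_output_btu_per_hour(max_power_watts)))
print(round(implied_power_factor(max_power_watts, max_load_kva), 2))
```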

Physical package
The IBM System p 570 (9117-MMA) system is available only in the rack-mountable configuration.

The following table lists the dimensions of the system.


Table 3. Physical packaging of the 9117-MMA
Dimension   One 9117-MMA system
Height      174 mm (6.85 inches)
Width       483 mm (19.0 inches)
Depth       793 mm (31.2 inches) from the front rack rail mounting surface to the rear of the power supply
Weight      63.6 kg (140 pounds)

A 9117-MMA system can have one to four system units. If you choose to install the 9117-MMA system in
a rack manufactured by a company other than IBM, review the supplier’s installation and planning
information for any product-specific installation requirements before installing the system or systems.

Features of the 9117-MMA system


A fully configured 9117-MMA has four building blocks and includes the following capacities:

• Up to eight 2-core processor cards using the POWER6 chip, for a total of 16 processor cores
• Up to 768 GB of DDR2 memory
• 24 SAS disk drives for an internal storage capacity of 7.2 TB using 300 GB drives
• 24 PCI slots: 8 PCI-X DDR and 16 (8x) PCI Express
• Four SlimLine media bays for optional optical storage devices
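
The capacities above follow from the per-building-block figures given elsewhere in this publication (two 2-core processor cards, 12 DIMM slots per card, six disk bays, four PCI Express and two PCI-X DDR slots, one media bay). The Python sketch below shows the arithmetic under those figures. The class and function names are hypothetical, and the 8 GB DIMM size is taken from the largest memory feature listed in Table 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildingBlock:
    """Per-building-block maximums as described in this publication."""
    processor_cards: int = 2
    cores_per_card: int = 2
    dimm_slots_per_card: int = 12
    max_dimm_gb: int = 8        # largest DIMM in Table 5 (feature code 5696, 4 x 8 GB)
    disk_bays: int = 6
    max_disk_gb: int = 300      # largest drive in Table 6 (feature code 3648)
    pcie_slots: int = 4
    pcix_ddr_slots: int = 2
    media_bays: int = 1


def system_totals(blocks, block=BuildingBlock()):
    """Scale one building block to a system of 1 to 4 building blocks."""
    if not 1 <= blocks <= 4:
        raise ValueError("a 9117-MMA system has one to four building blocks")
    cards = blocks * block.processor_cards
    return {
        "processor_cards": cards,
        "cores": cards * block.cores_per_card,
        "memory_gb": cards * block.dimm_slots_per_card * block.max_dimm_gb,
        "disk_drives": blocks * block.disk_bays,
        "storage_gb": blocks * block.disk_bays * block.max_disk_gb,
        "pci_slots": blocks * (block.pcie_slots + block.pcix_ddr_slots),
        "media_bays": blocks * block.media_bays,
    }


print(system_totals(4))
```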

For a multiple-drawer server configuration, a processor fabric cable or cables, and a service interface
cable are required.

The service interface card in the 9117-MMA system has the following ports supplied by IBM:
• Two 10/100 Ethernet ports
• Two system ports
• Two HMC ports
• Two SPCN ports

Note: For a multiple-drawer server configuration with more than one service interface card, the service
interface card in system unit one and system unit two must both be connected to the HMC.

In addition, the 9117-MMA building block has one internal SAS controller, redundant hot-swap cooling
fans, redundant power supplies, and redundant processor voltage regulators.

Processor card features
Each 9117-MMA system unit can contain up to two 2-core processor cards with 64-bit, copper-based
POWER6 microprocessors running at 3.5, 4.2, or 4.7 GHz.

All processor card features are available only as Capacity on Demand (CoD) features. The initial order of
the 9117-MMA system must contain the feature code of the processor card, as well as the processor
activation feature code.

The following table contains the processor card feature codes and processor activation feature codes.
Table 4. Processor card feature codes and processor activation feature codes
Processor card
feature code     Description
5620             3.5 GHz POWER6 2-core processor card, 0-core active, 12 DDR2 memory slots.
                 CoD options include:
                 • 5670: 1-way processor activation (permanent)
                 • 5640: Utility CoD (100 processor minutes)
                 • 5642: Utility CoD (one year prepaid)
                 • 5650: CoD (one day billing)
5622             4.2 GHz POWER6 2-core processor card, 0-core active, 12 DDR2 memory slots.
                 CoD options include:
                 • 5672: 1-way processor activation (permanent)
                 • 5641: Utility CoD (100 processor minutes)
                 • 5643: Utility CoD (one year prepaid)
                 • 5653: CoD (one day billing)
7380             4.7 GHz POWER6 2-core processor card, 0-core active, 12 DDR2 memory slots.
                 CoD options include:
                 • 5403: 1-way processor activation (permanent)
                 • 5404: Utility CoD (100 processor minutes)
                 • 5408: Utility CoD (one year prepaid)
                 • 5656: CoD (one day billing)

Each processor card features one POWER6 chip with two processor cores and 8 MB of L2 cache (each core
has a private 4 MB L2 cache), 32 MB of L3 cache, and 12 DDR2 memory DIMM slots.

Note: Utility CoD billing for feature codes 5404, 5640, and 5641 covers payment for 100 minutes of
temporary use of the processor card features. The feature is purchased after the customer has
accumulated 100 minutes of use on processors in the shared processor pool that are not permanently
active.
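
As an illustration of the note above, the following Python sketch counts how many 100-minute Utility CoD features would be due for a given amount of accumulated usage. It is a simplified model, not the actual billing procedure: the function name is hypothetical, and carrying leftover minutes forward to the next purchase is an assumption that this publication does not confirm.

```python
# Sketch of Utility CoD accounting in 100-processor-minute units.
# Assumption: minutes short of a full 100 are carried forward, not billed.
MINUTES_PER_FEATURE = 100


def utility_cod_features_due(unbilled_minutes_used):
    """Return (features to purchase, minutes carried forward)."""
    if unbilled_minutes_used < 0:
        raise ValueError("usage cannot be negative")
    return divmod(unbilled_minutes_used, MINUTES_PER_FEATURE)


# 250 minutes of use on processors that are not permanently active:
features, carried = utility_cod_features_due(250)
print(features, carried)
```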

Memory features
The processor cards that are used in the 9117-MMA have 12 slots for memory DIMMs.

The following table lists the memory feature codes that are available. The 9117-MMA system supports
CoD options for memory.
Table 5. Memory feature codes
Feature code   Description
5692           0/2GB (4X0.5GB) DIMMS, 667 MHz, DDR2, POWER6™ CoD memory
5693           0/4GB (4X1GB) DIMMS, 667 MHz, DDR2, POWER6 CoD memory
5694           0/8GB (4X2GB) DIMMS, 667 MHz, DDR2, POWER6 CoD memory
5695           0/16GB (4X4GB) DIMMS, 533 MHz, DDR2, POWER6 CoD memory
5696           0/32GB (4X8GB) DIMMS, 400 MHz, DDR2, CoD memory
7954           Activation of on/off memory
5680           Activation of 1GB DDR2 memory for feature codes 5692 through 5696
5681           Activation of 256 GB DDR2 memory
5691           Activation of 1 GB-day on/off memory

Each processor card should have an equal amount of memory to provide balanced memory across the
processor cards. This enables memory access to be distributed evenly over system components to provide
optimal performance.
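
The balancing guideline can be checked with a small calculation. The Python sketch below assumes that all memory features in the order are the same size and that each feature occupies four DIMM slots, as the "4X" entries in Table 5 suggest. The function name is hypothetical, and the sketch does not replace the placement rules in the planning documentation.

```python
# Sketch: split identical memory features evenly across processor cards.
DIMM_SLOTS_PER_CARD = 12   # from this publication
DIMMS_PER_FEATURE = 4      # each memory feature in Table 5 is a set of four DIMMs


def balanced_memory_gb_per_card(feature_gb, feature_count, cards):
    """Return the memory per card in GB, or raise if the split is uneven."""
    if cards < 1 or feature_count < 0:
        raise ValueError("need at least one card and a non-negative feature count")
    per_card, leftover = divmod(feature_count, cards)
    if leftover:
        raise ValueError("features do not divide evenly across the processor cards")
    if per_card * DIMMS_PER_FEATURE > DIMM_SLOTS_PER_CARD:
        raise ValueError("not enough DIMM slots on each processor card")
    return per_card * feature_gb


# Four 8 GB features (feature code 5694) on two processor cards:
print(balanced_memory_gb_per_card(8, 4, 2))
```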

Disk and media features


Each 9117-MMA system unit has six disk drive bays and one SlimLine media device bay. A fully configured
9117-MMA with four system units has 24 disk drive bays, which provide a maximum internal
storage capacity of 7.2 TB. (The minimum configuration includes one 73 GB disk drive.)

The following table lists the disk drive features that are available.
Table 6. Disk drive feature codes
Feature code Description
3646 73 GB 15K RPM SAS disk drive
3647 146 GB 15K RPM SAS disk drive
3648 300 GB 15K RPM SAS disk drive

Up to four SlimLine media device bays are available in a fully configured system. Feature code 5629, the
optional media enclosure and backplane, is required to support one SlimLine device in each system.

Any combination of the following DVD-ROM and DVD-RAM drives can be installed:
• IDE SlimLine DVD-ROM drive, feature code 5756
• 4.7 GB IDE SlimLine DVD-RAM drive, feature code 5757

Feature code 5629 (the optional media enclosure and backplane), and a DVD-ROM or DVD-RAM device,
are required in a system running the Linux® operating system.

External expansion unit features


A single 9117-MMA system has four (8x) PCI Express slots and two PCI-X DDR slots. If more PCI-X slots
are needed, such as to extend the number of logical partitions, you can attach external expansion units.

Up to 20 7311-D11 or 7311-D20, and up to 32 7314-G30 expansion units can be attached to a fully
configured 9117-MMA system with four system units.

7311-D11 external expansion unit


Two 7311-D11 expansion units fit side-by-side in the 4-EIA units high enclosure (feature code 7311)
mounted in a 19-inch rack, such as the IBM 7014-T00 or 7014-T42.

The 7311-D11 expansion unit features six PCI-X slots. Only the blind-swap cassettes are supported.

The 7311-D11 expansion unit offers a modular growth path for 9117-MMA systems with increasing
I/O requirements. A fully configured 9117-MMA supports the attachment of up to 20 7311-D11 expansion
units, and the combined system supports up to 120 PCI-X, eight PCI-X DDR, and 15 PCI Express
adapters.

Note: Attaching the 20 7311-D11 expansion units to the four 9117-MMA systems requires five RIO-2
remote I/O loop adapters (feature code 1800), which block one of the PCI Express slots.

The 7311-D11 expansion unit has the following attributes:


• 4-EIA units high rack-mountable enclosure (feature code 7311) that can hold one or two 7311-D11
  expansion units
• Six PCI-X slots: 3.3 V, keyed, 133 MHz blind-swap hot-plug
• Standard redundant hot-plug power and cooling devices
• Two RIO-2 and two SPCN ports

7311-D11 expansion unit physical package: The following are the physical characteristics of one
7311-D11 expansion unit. The weight of two expansion units placed side-by-side in the mounting
enclosure is also listed:
• Width: 221 mm (8.7 inches)
• Depth: 711 mm (28.0 inches)
• Height: 168 mm (6.6 inches)
• Weight:
  – One expansion unit: 16.8 kg (37 pounds)
  – Two expansion units plus the mounting enclosure: 39.1 kg (86 pounds)

7311-D20 external expansion unit


The 7311-D20 expansion unit is a 4-EIA units high full-sized expansion unit that must be mounted in a
rack.

The 7311-D20 expansion unit offers a modular growth path for the 9117-MMA systems with increasing
I/O requirements.

A fully configured 9117-MMA can have 20 7311-D20 expansion units attached. The combined system
supports up to 140 PCI-X, eight PCI-X DDR, and 15 PCI Express adapters.

Note: Attaching the 20 7311-D20 expansion units to the four 9117-MMA systems requires five RIO-2
remote I/O loop adapters (feature code 1800), which block one of the PCI Express slots.

PCI-X cards are inserted into the slot from the top of the expansion unit. The adapters are protected by
plastic separators, which are designed to prevent grounding and damage when adding or removing
adapters.

The 7311-D20 expansion unit has the following attributes:


• 4-EIA units high rack-mountable enclosure assembly
• Seven PCI-X slots: 3.3 V, keyed, 133 MHz hot-plug
• 12 hot-swappable SCSI disk drive bays arranged in two 6-packs
• Optional redundant hot-plug power and cooling (feature code 6268)
• Two RIO-2 and two SPCN ports

Note: The 7311-D20 expansion unit initial order, or an existing 7311-D20 expansion unit that is
migrated from another System p model, must have the RIO-2 ports available (feature code 6417).



7311-D20 expansion unit physical package: The expansion unit has the following physical
characteristics:
• Width: 445 mm (17.5 inches)
• Depth: 610 mm (24.0 inches)
• Height: 178 mm (7.0 inches)
• Weight: 45.9 kg (101 pounds)

7314-G30 external expansion unit


Two 7314-G30 expansion units fit side-by-side in the 4-EIA units high enclosure (feature code 7314)
mounted in a 19-inch rack, such as the IBM 7014-T00 or 7014-T42. The 7314-G30 expansion unit is
designed to be attached to the system unit using the InfiniBand™ bus and InfiniBand cables.

The 7314-G30 expansion unit features six PCI-X DDR slots. Only the blind-swap cassettes are supported.

The 7314-G30 expansion unit offers a modular growth path for the 9117-MMA systems with increasing
I/O requirements. Up to four 7314-G30 expansion units can be attached in a loop using the GX Dual-Port
12X Channel Attach adapter (feature code 1802). Two loops for each 9117-MMA are supported, allowing
up to 32 7314-G30 expansion units for a fully configured 9117-MMA. The combined system supports up
to 200 PCI-X DDR and 12 PCI Express adapters.

Note: Attaching the 32 7314-G30 expansion units to the four 9117-MMA systems requires eight GX
Dual-Port 12X Channel Attach adapters, which block four of the PCI Express slots.
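
The loop limits and slot totals for the 7314-G30 can be reproduced with the short Python sketch below. It assumes one GX Dual-Port 12X Channel Attach adapter for each loop, which is consistent with eight adapters for 32 expansion units but is not stated explicitly here. The number of blocked PCI Express slots is taken from the note as an input rather than derived, and the function name is hypothetical.

```python
import math

# Figures from this publication.
UNITS_PER_LOOP = 4            # 7314-G30 expansion units per loop
LOOPS_PER_SYSTEM_UNIT = 2     # loops supported for each 9117-MMA
PCIX_DDR_SLOTS_PER_G30 = 6
INTERNAL_PCIX_DDR_SLOTS = 2   # per 9117-MMA system unit
INTERNAL_PCIE_SLOTS = 4       # per 9117-MMA system unit


def g30_plan(expansion_units, system_units, blocked_pcie_slots):
    """Return adapters needed and adapter slot totals for a 7314-G30 setup.

    Assumption: one GX Dual-Port 12X Channel Attach adapter per loop.
    blocked_pcie_slots is supplied by the caller (for example, 4 in the note).
    """
    max_units = system_units * LOOPS_PER_SYSTEM_UNIT * UNITS_PER_LOOP
    if not 0 <= expansion_units <= max_units:
        raise ValueError(f"at most {max_units} expansion units are supported")
    return {
        "gx_adapters": math.ceil(expansion_units / UNITS_PER_LOOP),
        "pcix_ddr_slots": expansion_units * PCIX_DDR_SLOTS_PER_G30
        + system_units * INTERNAL_PCIX_DDR_SLOTS,
        "pcie_slots": system_units * INTERNAL_PCIE_SLOTS - blocked_pcie_slots,
    }


# Fully configured 9117-MMA: four system units, 32 expansion units.
print(g30_plan(32, 4, blocked_pcie_slots=4))
```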

The 7314-G30 expansion unit has the following attributes:


• 4-EIA units high rack-mountable enclosure (feature code 7314) that can hold one or two 7314-G30
  expansion units
• Six PCI-X DDR 266 MHz adapter slots
• Cassettes can be installed and removed without removing the expansion unit from the rack
• Because the PCI slots support hot-pluggable adapters, adapters can be installed or replaced without
  turning off the power or removing the covers
• Standard redundant hot-plug power and cooling devices

7314-G30 expansion unit physical package: The expansion unit has the following physical
characteristics:
• Width: 224 mm (8.8 inches)
• Depth: 800 mm (31.5 inches)
• Height: 172 mm (6.8 inches)
• Weight:
  – One expansion unit: 20 kg (44 pounds)
  – Two expansion units plus the mounting enclosure: 45.9 kg (101 pounds)

7031-D24 and 7031-T24 external expansion units


The 7031-D24 and 7031-T24 expansion units provide power, cooling, and Ultra 320 SCSI interface
connections for up to 24 LVD Ultra 320 SCSI disk drives.

The 7031-D24 and 7031-T24 expansion units are available in the following configurations:
• Model 7031-D24 - rack-mountable configuration
• Model 7031-T24 - stand-alone deskside configuration

The 24 disk drive bays are organized into four independent SCSI groups of six drive bays each. With the
use of up to four SCSI repeater cards, you can use either of the following host SCSI bus connection
options:

• A single initiator to each SCSI group
• A high-availability dual initiator feature that supports the connection of two adapters to a SCSI group

The high-availability SCSI connection feature can be used on any or all of the drive groups in the
enclosure, alongside other drive groups in the enclosure that use the standard connection option.
Either model can be set up to use 100-127 V ac or 200-240 V ac.

7031-D24 and 7031-T24 expansion unit physical packages: The following are the physical characteristics
of the 7031-D24 and 7031-T24 expansion units:
• The 7031-D24 rack-mountable expansion unit has the following physical characteristics:
  – Width: 447 mm (17.5 inches)
  – Depth: 660 mm (26 inches)
  – Height: 171 mm (6.75 inches)
  – Weight: 54 kg (120 lb.)
• The 7031-T24 deskside model has the following physical characteristics:
  – Width: 305 mm (12.0 inches)
  – Depth: 665 mm (26 inches)
  – Height: 508 mm (20.0 inches)
  – Weight: 66 kg (145 lb.)

PCI adapter slots


Various configurations of I/O expansion units add support for PCI adapter slots for your 9117-MMA
system.

The following table summarizes the maximum number of I/O expansion units that is supported for a
9117-MMA system, and the number of PCI adapter slots that are available, when all of the I/O expansion
units are the same model.
Table 7. Maximum number of I/O expansion units supported and total number of PCI adapter slots
9117-MMA building   Maximum number of     Total number of PCI adapter slots
block/processor     external expansion    7311-D11         7311-D20         7314-G30
                    units
One building        4                     3 PCI Express    3 PCI Express    3 PCI Express
block/2-core                              2 PCI-X DDR      2 PCI-X DDR      26 PCI-X DDR
                                          24 PCI-X         28 PCI-X
One building        8                     3 PCI Express    3 PCI Express    3 PCI Express
block/4-core                              2 PCI-X DDR      2 PCI-X DDR      50 PCI-X DDR
                                          48 PCI-X         56 PCI-X

Hardware Management Console models


The Hardware Management Console (HMC) is a dedicated workstation that provides a graphical user
interface for configuring, operating, and performing basic system tasks for your POWER6 processor-based
servers.

The following table lists the desktop and rack-mountable HMC models available for POWER6
processor-based systems.

Table 8. HMC models available for POWER6 processor-based systems
Type-model Description
7042-C06 IBM 7042-C06 Desktop Hardware Management Console
7042-CR4 IBM 7042-CR4 Rack-Mounted Hardware Management Console

Note: You also can upgrade the 7310 HMC models (for example, 7310-C04) to an HMC that can manage
a POWER6 processor-based system. To do this, order miscellaneous equipment specification (MES) 0962
and upgrade the machine code to Version 7 Release 3.1.0. To upgrade to Version 7 Release 3.1.0, you must
start at Version 6 Release 1.2.

The 7042-C06 is a desktop model that includes one 10/100/1000 Ethernet port supplied by IBM; two
additional dual-port 10/100/1000 Ethernet adapters can be installed.

The 7042-CR4 HMC is a 1-EIA unit high, 19-inch rack-mountable model that has two Ethernet ports
supplied by IBM and can be extended with one additional two-port 10/100/1000 Ethernet adapter.

One HMC can manage multiple POWER6 processor-based systems. An Ethernet connection is required
between the HMC and one of the Ethernet ports on the service processor. Ensure that sufficient Ethernet
adapters are available on the HMC to create public and private networks, if you need both.

Two HMCs are recommended in configurations that have high availability requirements. The service
processor in the 9117-MMA system supports the connection of two HMCs, so no additional
features are needed for a 9117-MMA to support a dual HMC environment. The HMCs provide a locking
mechanism so that only one HMC at a time has write access to the service processor. In a configuration
with two systems, for example, the customer must provide a switch or hub so that one HMC can
connect to the service processors of both systems.

Note: When two HMCs are being used for high availability, a customer-provided Ethernet hub is
required.

When an HMC is connected to the 9117-MMA, the integrated system ports are disabled. If you need
serial connections (for example, for HACMP™ heartbeat signals), you must order an additional
asynchronous adapter (feature code 5723).

It is a good practice to connect the HMC to the first HMC port on the system, labeled as HMC Port 1,
although other network configurations are possible. A second HMC can be attached to HMC Port 2 of the
server for redundancy (or vice versa).

The default mechanism for allocation of the IP addresses for the service processor HMC ports is dynamic.
The HMC can be configured as a Dynamic Host Configuration Protocol (DHCP) server, providing the IP
address at the time the managed server is powered on. If the service processor of the managed server
does not receive a DHCP reply before timeout, predefined IP addresses will be set up on both ports.
Static IP address allocation is also an option. You can configure the IP address of the service processor
ports with a static IP address by using the Advanced System Management Interface (ASMI) menus. See
“Service processor” on page 32 for predefined IP addresses and additional information.

Note: To access the ASMI (for example, to set up an IP address of a new POWER6 processor-based server
when the HMC is not available or not providing DHCP services), you can connect any PC client to one of
the service processor HMC ports with any kind of Ethernet cable, and use a Web browser to access the
predefined IP address, such as https://169.254.2.147.

Functions that can be performed using an HMC include:


v Creating and maintaining a multiple logical partition environment

v Displaying a virtual operating system session terminal for each partition
v Displaying the contents of a virtual operator panel for each partition
v Detecting, reporting, and storing changes in hardware conditions
v Powering managed systems on and off
v Acting as a service focal point

The HMC provides both a graphical interface and a command-line interface for all management tasks. The
command-line interface is also available through a Secure Shell (SSH) connection to the HMC.

Operating system environment


Several operating system environments are available for the IBM System p POWER6 processor-based
systems.

Table 9 displays a summary of the minimum supported operating system levels for the System p
POWER6 processor-based systems.
Table 9. Supported operating systems for System p POWER6 processor-based systems
Operating system      Version
AIX®                  AIX 5L™ Version 5.3 with the 5300-06 Technology Level
Linux (SUSE)          SLES 10 SP1
Virtual I/O Server    1.4

Chapter 2. IBM SAS controllers
In POWER6 processor-based systems, the six drives in the internal hard disk drive enclosure in the
system unit use Serial Attached SCSI (SAS) technology.

SAS architecture defines a serial device interconnection and transportation protocol that specifies the rules
for information exchange between devices. SAS is an evolution of the parallel SCSI device interface into a
serial point-to-point interface.

Benefits of SAS controllers


SAS controllers have a robust, expandable SAS architecture that incorporates Fibre Channel-like
functionality (that is, dual path).

Additionally, SAS controllers can offer the following benefits:


v An improved signal quality because of a point-to-point connection between device and adapter, or
expander
v Improved availability and redundancy, with dual paths to each drive
v Fewer potential customer problems because of point-to-point connections:
– There is no contention when accessing a drive
– IBM SAS controllers minimize command time-outs
– IBM SAS controllers prevent situations where one drive on a bus stops the entire bus
v Performance growth capability
v An improved disk to adapter ratio, providing more addressability: parallel SCSI up to 36, and SAS up
to 60
v IBM SAS controllers use SCSI commands, providing:
– Minimal impacts to operating systems
– Compatibility for High Speed Software (applications)
v IBM SAS controllers make it easier to identify failing devices

Features of SAS controllers


The IBM SAS controllers are optimized for SAS disk configurations that use dual paths through dual
expanders for redundancy and reliability.

Additionally, SAS controllers offer the following features:


v A PCI-X266 system interface or PCI Express system interface
v A physical link speed of 3 Gb per second supporting transfer rates of 300 MB per second
v Support of SAS devices and non-disk Serial Advanced Technology Attachment (SATA) devices
v Management of path redundancy and path switching for multiported SAS devices
v Support for RAID (Redundant Array of Independent Disks)
v Support attachment of other devices, such as non-RAID disks, tape, and optical devices
v RAID disk arrays and non-RAID devices supported as bootable devices
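
The relationship between the 3 Gb per second link speed and the 300 MB per second transfer rate in the list above can be checked with a short calculation. The following sketch assumes 8b/10b line encoding (10 bits on the wire for each data byte), which first-generation SAS uses but which this document does not state. The function name is illustrative only.

```python
def payload_rate_mb_per_s(link_gbps: float, line_bits_per_byte: int = 10) -> float:
    """Convert a raw link speed in Gb/s to a payload rate in MB/s.

    Assumption: 8b/10b encoding, so each data byte travels as 10 line bits.
    """
    bits_per_second = link_gbps * 1_000_000_000
    bytes_per_second = bits_per_second / line_bits_per_byte
    return bytes_per_second / 1_000_000


print(payload_rate_mb_per_s(3))  # 300.0
```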

Advanced features of SAS controllers
Advanced features in the IBM SAS controllers include:
v Background parity checking
v Background data scrubbing
v Disks formatted to 528 bytes per sector, providing cyclical redundancy checking (CRC) and logically
bad block checking
v Optimized skip read/write disk support for transaction workloads
v Support for a maximum of 64 advanced function disks, with a total device support maximum of 255 (that
is, the number of all physical SAS and SATA devices plus the number of logical RAID disk arrays
must be fewer than 255 per controller)
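
The two device limits in the last item above can be expressed as a simple planning check. This is a minimal sketch only: the function and constant names are hypothetical and are not part of any IBM tool. The numeric limits are the ones quoted in the text.

```python
MAX_ADVANCED_FUNCTION_DISKS = 64   # limit quoted in the text
DEVICE_TOTAL_LIMIT = 255           # physical devices + arrays must stay below this


def within_controller_limits(advanced_function_disks: int,
                             physical_devices: int,
                             raid_arrays: int) -> bool:
    """Return True if a planned configuration respects both limits."""
    if advanced_function_disks > MAX_ADVANCED_FUNCTION_DISKS:
        return False
    # "Fewer than 255": physical SAS and SATA devices plus logical RAID arrays.
    return physical_devices + raid_arrays < DEVICE_TOTAL_LIMIT
```

For example, 200 physical devices with 54 arrays passes the check, while 250 physical devices with 5 arrays does not.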

AIX supports all of the functions listed in the following table. If you are using another operating system,
consult the documentation for that operating system regarding support.
Table 10. IBM SAS RAID controller cards
Features                                        Card 1
Custom Card Identification Number (CCIN)        572C
Description                                     PCI-X266 planar 3 Gb SAS
Form factor                                     Planar integrated
Adapter LED/feature code                        2502
Physical links                                  8
RAID levels supported                           0
Write cache size                                0
Removable cache card                            No
Multi initiator and high availability support   No
Auxiliary write cache (AWC) support             No

SAS physical links are a set of four wires used as two differential signal pairs. One differential signal
transmits in one direction while the other differential signal transmits in the opposite direction. Data can
be transmitted in both directions simultaneously.

Physical links are contained in ports, with each port containing one or more physical links. A port is a
wide port if there is more than one physical link in the port, or a narrow port if there is only one
physical link in the port. A port is identified by a unique SAS worldwide name (also called an SAS
address).
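
The distinction between narrow and wide ports can be modeled with a small data structure. The following sketch is illustrative only: the class name, the field names, and the sample SAS address are assumptions and do not come from the controller firmware or from AIX.

```python
from dataclasses import dataclass


@dataclass
class SasPort:
    sas_address: str      # SAS worldwide name; the sample value below is made up
    physical_links: int   # number of physical links grouped into this port

    def kind(self) -> str:
        """Classify the port as narrow (one link) or wide (more than one)."""
        if self.physical_links < 1:
            raise ValueError("a port contains at least one physical link")
        return "wide" if self.physical_links > 1 else "narrow"


port = SasPort(sas_address="5005076c07000001", physical_links=4)
print(port.kind())  # wide
```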

An SAS controller contains one or more SAS ports. A path is a logical point-to-point link between an SAS
initiator port in the controller and an SAS target port in the I/O device (for example, a disk). A connection is a
temporary association between a controller and an I/O device through a path, which enables communication
with the device. The controller can communicate with the I/O device over this connection using either the SCSI
command set or the ATA/ATAPI command set, depending on the device type.

An expander facilitates connections between a controller port and multiple I/O device ports. An
expander routes connections between the expander ports. If an I/O device supports multiple ports,
more than one path to the device can exist when there are expander devices on the path. An SAS
fabric is the sum of all paths between all controller ports and all I/O device ports in the SAS
subsystem.

Note: The external hard disk drives on the POWER6 processor-based systems use the existing SCSI
technology.

Disk arrays
RAID technology is used to store data across a group of disks known as a disk array. The disk arrays are
groups of disks that work together with a specialized array controller to potentially achieve higher data
transfer and input and output (I/O) rates than those provided by single large disks. The array controller
keeps track of how the data is distributed across the disks.

Depending on the RAID level selected, this storage technique provides the data redundancy required to
keep data secure and the system operational. If a disk failure occurs, the disk can usually be replaced
without interrupting usual system operation.

Each disk array can be used by AIX in the same way that a single non-RAID disk would be used. For
example, after creating a disk array, you can create a file system on the disk array or use AIX commands
to make the disk array available to the system by adding the disk array to a volume group.

The IBM SAS RAID controller is managed by the IBM SAS Disk Array Manager. The IBM SAS Disk
Array Manager serves as the interface to the controller and I/O devices. It also handles the monitoring
and recovery features of the controller.

If a disk array will be used as the boot device, you might have to prepare the disks by booting from the
diagnostic CD and creating the disk array before installing AIX. You might want to perform this
procedure when the original boot drive is part of a disk array. Non-RAID disks formatted to 528 bytes
per sector are automatically put into a single-drive RAID 0 array on system boot. If a single-drive
RAID 0 array is your boot device, AIX can be installed onto it without
using the diagnostic CD. Otherwise, use the CD to create the boot drive configuration.

RAID level
The RAID level of a disk array determines how data is stored on the disk array and the level of
protection that is provided.

Each RAID level has its own attributes and uses a different method of writing data. Currently, the
IBM SAS RAID controller supports RAID level 0, which stripes data across the disks in the array for optimal
performance.

RAID level 0 offers a high potential I/O rate, but it is a nonredundant configuration. As a result, there is
no redundant data available for the purpose of reconstructing data in the event of a disk failure. There is
no error recovery beyond what is usually provided on a single disk. If a physical disk fails in a RAID
level 0 disk array, the disk array is marked as failed. All data in the array must be backed up regularly to
protect against data loss.

Stripe-unit size
With RAID technology, data is striped across an array of physical disks. This data distribution scheme
complements the way the operating system requests data.

The granularity at which data is stored on one disk of the array before subsequent data is stored on the
next disk of the array is called the stripe-unit size. The collection of stripe units from the first disk of the
array to the last disk of the array is called a stripe.
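
The way a stripe-unit size distributes data across disks can be illustrated with a short address calculation. This is a minimal sketch of generic RAID 0 striping, not the controller's actual implementation. The function name and the sample values (a 64 KB stripe unit on three disks) are assumptions chosen for illustration.

```python
def locate(offset_kb: int, stripe_unit_kb: int, disks: int) -> tuple:
    """Map a logical offset (in KB) to (stripe number, disk index, offset within unit)."""
    unit_index = offset_kb // stripe_unit_kb   # which stripe unit, counted across the array
    stripe = unit_index // disks               # which stripe (one unit from each disk)
    disk = unit_index % disks                  # which disk holds that stripe unit
    return stripe, disk, offset_kb % stripe_unit_kb


# With a 64 KB stripe unit on three disks:
print(locate(0, 64, 3))    # (0, 0, 0)  first unit on the first disk
print(locate(64, 64, 3))   # (0, 1, 0)  next unit moves to the second disk
print(locate(200, 64, 3))  # (1, 0, 8)  fourth unit starts the second stripe
```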

You can set the stripe-unit size of an IBM SAS disk array to 16 KB, 64 KB, or 256 KB. You might be able
to maximize the performance of your IBM SAS disk array by setting the stripe-unit size to a value that is
slightly larger than the size of the average system I/O request. For large system I/O requests, use a
stripe-unit size of 256 KB. The optimum stripe-unit size is displayed on the screen when you create the
disk array.
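
The sizing guideline above can be sketched as a small helper. It is only one possible reading of "slightly larger than the average system I/O request": it picks the smallest supported size that exceeds the average request. The function name is hypothetical, and the optimum value displayed when you create the disk array takes precedence over this sketch.

```python
SUPPORTED_STRIPE_UNITS_KB = (16, 64, 256)   # sizes listed in the text


def suggest_stripe_unit_kb(average_io_kb: float) -> int:
    """Pick the smallest supported size that is larger than the average request."""
    for size in SUPPORTED_STRIPE_UNITS_KB:
        if size > average_io_kb:
            return size
    # Large requests: fall back to the largest supported size (256 KB).
    return SUPPORTED_STRIPE_UNITS_KB[-1]


print(suggest_stripe_unit_kb(48))   # 64
print(suggest_stripe_unit_kb(512))  # 256
```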

hdisk and pdisk names


IBM SAS disk arrays are assigned names using the hdisk form, in the same way as other disk storage
units in AIX.

These names are automatically assigned whenever you create a disk array. The names are deleted when
you delete the disk array. The individual physical disks that comprise disk arrays or serve as candidates
to be used in disk arrays are represented by pdisk names. A pdisk is a disk that is formatted to 528 bytes
per sector. Disks that are formatted to 512 bytes per sector are assigned names using the hdisk form.
These disks must be formatted to 528 bytes per sector before they can be used in disk arrays.

The List SAS Disk Array Configuration option in the IBM SAS Disk Array Manager can be used to
display these pdisk and hdisk names, along with their associated location codes.

States for disk arrays (hdisks)


The valid states for IBM SAS disk arrays are Degraded, Failed, Missing, Optimal, Rebuilding, and Unknown.
Degraded
The array’s protection against disk failures is degraded or its performance is degraded. When one
or more Array Member pdisks are in the Failed state, the array is still functional but might no
longer be fully protected against disk failures.
Failed
The array is no longer accessible because of disk failures or configuration problems.
Missing
A previously configured disk array no longer exists.
Optimal
The array is functional, and all Array Member pdisks are in the Active state.
Rebuilding
Redundancy data for the array is being reconstructed. After the rebuilding process is complete,
the array will return to the Optimal state. Until then, the array is not fully protected against disk
failures.
Unknown
The state of the disk array could not be determined.

States for physical disks (pdisks)


The valid states for pdisks are Active, Failed, Missing, RWProtected, and Unknown.
Active
The disk is functioning correctly.
Failed
The controller cannot communicate with the disk, or the pdisk is the cause of the disk array being
in a Degraded state.
Missing
The disk was previously connected to the controller but is no longer detected.
RWProtected
The disk is unavailable because of a hardware or a configuration problem.
Unknown
The state of the disk could not be determined.

pdisk descriptions
The fourth column in the output of the List SAS Disk Array Configuration option is a description of the
device. For an array, the description indicates the RAID level of the array. The description of a pdisk
indicates whether the disk is configured as an Array Member, Hot Spare, or an Array Candidate.
Array Member
A 528 bytes per sector pdisk that is configured as a member of an array.
Array Candidate
A 528 bytes per sector pdisk that is a candidate for becoming an Array Member or a Hot Spare.

Chapter 3. Advanced POWER Virtualization
Learn about the components of the Advanced POWER Virtualization hardware feature, including
Micro-Partitioning and the Virtual I/O Server.

Advanced POWER Virtualization is a hardware feature that you can purchase to enhance the virtualization
capabilities of your system. In general, the Advanced POWER Virtualization feature includes the Virtual
I/O Server and the enablement for Micro-Partitioning.

The following table describes each component of the Advanced POWER Virtualization feature and the
hardware platforms on which each component is available.
Table 11. Advanced POWER Virtualization components
Component            Description                                  Hardware platforms
Virtual I/O Server   The Advanced POWER Virtualization            The Virtual I/O Server is available
                     feature includes the installation image      for all POWER6 processor-based
                     for the Virtual I/O Server software.         systems.
                     The Virtual I/O Server facilitates the
                     sharing of physical I/O resources
                     between AIX and Linux client logical
                     partitions within the server.
Micro-Partitioning   The Advanced POWER Virtualization            Micro-Partitioning is available on all
                     feature includes firmware enablement         POWER6 processor-based systems.
                     for Micro-Partitioning.
                     Micro-Partitioning is the ability to
                     allocate processors to logical
                     partitions in increments of 0.1.

When you specify the Advanced POWER Virtualization hardware feature with the initial system order,
the firmware is activated to support Micro-Partitioning and the Virtual I/O Server. For upgrade orders, a
key similar to the Capacity on Demand key is included to enable the firmware. The Virtual I/O Server is
a licensed software component of the Advanced POWER Virtualization feature. It contains one charge
unit per activated processor, including software maintenance.

Virtual I/O Server


Learn the concepts of the Virtual I/O Server and its primary components.

The Virtual I/O Server is software that is located in a logical partition. This software facilitates the
sharing of physical I/O resources between AIX and Linux client logical partitions within the server. The
Virtual I/O Server provides virtual SCSI target and Shared Ethernet Adapter capability to client logical
partitions within the system, allowing the client logical partitions to share SCSI devices and Ethernet
adapters. The Virtual I/O Server software requires that the logical partition be dedicated solely for its
use.

The Virtual I/O Server is available as part of the Advanced POWER Virtualization hardware feature.

Using the Virtual I/O Server facilitates the following functions:


v Sharing of physical resources between logical partitions on the system
v Creating logical partitions without requiring additional physical I/O resources

v Creating more logical partitions than there are I/O slots or physical devices available with the ability
for partitions to have dedicated I/O, virtual I/O, or both
v Maximizing use of physical resources on the system
v Helping to reduce the Storage Area Network (SAN) infrastructure

The Virtual I/O Server supports client logical partitions running the following operating systems:
v AIX 5.3 or later
v SUSE Linux Enterprise Server 9 for POWER (or later)

For the most recent information about devices that are supported on the Virtual I/O Server, to download
Virtual I/O Server fixes and updates, and to find additional information about the Virtual I/O Server, see
the Virtual I/O Server Web site.

The Virtual I/O Server comprises the following primary components:


v Virtual SCSI
v Virtual Networking

The following sections provide a brief overview of each of these components.

Virtual SCSI

Physical adapters with attached disks or optical devices on the Virtual I/O Server logical partition can be
shared by one or more client logical partitions. The Virtual I/O Server offers a local storage subsystem
that provides standard SCSI-compliant logical unit numbers (LUNs). The Virtual I/O Server can export a
pool of heterogeneous physical storage as a homogeneous pool of block storage in the form of SCSI
disks.

Unlike typical storage subsystems that are physically located in the SAN, the SCSI devices exported by
the Virtual I/O Server are limited to the domain within the server. Although the SCSI LUNs are SCSI
compliant, they might not meet the needs of all applications, particularly those that exist in a distributed
environment.

The following SCSI peripheral-device types are supported:


v Disks backed by a logical volume
v Disks backed by a physical volume
v Optical devices (DVD-RAM and DVD-ROM)

Virtual networking

A Shared Ethernet Adapter allows logical partitions on the virtual local area network (VLAN) to share
access to a physical Ethernet adapter and to communicate with systems and partitions outside the server.
This function enables logical partitions on the internal VLAN to share the VLAN with stand-alone
servers.

Micro-Partitioning
Micro-Partitioning allows multiple logical partitions to share the system’s processing power. Use this
topic to learn more about Micro-Partitioning and how it functions in a virtual computing environment.

Micro-Partitioning enables you to allocate processors to logical partitions in increments of 0.1. For example,
one partition might have 0.6 of a processor, while another partition might have 1.4 processors. Such
partitions are referred to as shared processor partitions. You can choose between dedicated processor
partitions and shared processor partitions that use Micro-Partitioning.
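
The 0.1 increment rule can be checked programmatically when planning partition sizes. The following sketch is illustrative only: the function names and partition names are hypothetical, and values are passed as strings so that decimal arithmetic stays exact.

```python
from decimal import Decimal

INCREMENT = Decimal("0.1")   # allocation granularity quoted in the text


def is_valid_entitlement(processing_units: str) -> bool:
    """True if the value is a positive multiple of 0.1 processors."""
    value = Decimal(processing_units)
    return value > 0 and value % INCREMENT == 0


def total_entitlement(partitions: dict) -> Decimal:
    """Sum the processing units assigned to a set of partitions."""
    return sum((Decimal(units) for units in partitions.values()), Decimal("0"))


plan = {"lpar1": "0.6", "lpar2": "1.4"}   # the example from the text
print(all(is_valid_entitlement(u) for u in plan.values()))  # True
print(total_entitlement(plan))                              # 2.0
```

In this plan, the two shared processor partitions together consume exactly two processors.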

Micro-Partitioning allows for increased overall use of system resources by automatically applying only
the required amount of processor resource needed by each partition. You can configure the POWER
hypervisor to continually adjust the amount of processor capacity that is allocated to each shared
processor partition based on workload. Tuning parameters provide the system administrator with
extensive control over the amount of processor resources that each partition can use.

Micro-Partitioning is supported by AIX 5.3 + APAR IY58321 or later, i5/OS® V5R4 or later, and Linux. To
use Micro-Partitioning on IBM System p5™ 570, the Advanced POWER Virtualization feature is required.

Chapter 4. RAS and manageability
Many design features help lower the total cost of ownership (TCO) of the IBM POWER6 processor-based
systems. The IBM Reliability, Availability, and Serviceability (RAS) technology can
improve your TCO by reducing unplanned downtime.

Reliability, Availability, and Serviceability


Reliability, Availability, and Serviceability (RAS) features can help to ensure that the system operates
when required, performs reliably, and handles failures in an efficient manner.

The POWER6 processor-based systems include mainframe-inspired RAS features.

Reliability
Highly reliable systems are built with highly reliable components. On IBM POWER6 processor-based
systems, this basic premise is expanded upon with a clear design-for-reliability architecture and
methodology.

A concentrated, systematic, architecture-based approach is designed to improve overall system reliability
with each successive generation of system offerings.

Designed for reliability


Systems designed with fewer components and interconnects have fewer opportunities to fail. Simple
design choices such as integrating two processor cores on a single POWER chip can dramatically reduce
the opportunity for system failures. In this case, a 16-core server will include half as many processor
chips (and chip socket interfaces) as with a single-CPU-per-processor design. Not only does this reduce
the total number of system components, it reduces the total amount of heat generated in the design,
resulting in an additional reduction in required power and cooling components.

Placement of components
Packaging is designed to deliver both high performance and high reliability. For example, the reliability
of electronic components is directly related to their thermal environment; that is, large decreases in
component reliability are directly correlated with relatively small increases in temperature. POWER6
processor-based systems are therefore carefully packaged to ensure adequate cooling. Critical system components
such as the POWER6 processor chips are positioned on printed circuit cards so they receive fresh air
during operation. In addition, POWER6 processor-based systems are built with redundant, variable-speed
fans that can automatically increase output to compensate for increased heat in the central electronic
complex.

Redundant components and concurrent repair


High-opportunity components, or those that most affect system availability, are protected with
redundancy and the ability to be repaired concurrently.

Continuous monitoring
Aided by the IBM First Failure Data Capture (FFDC) methodology and the associated error reporting
strategy, commodity managers build an accurate profile of the types of failures that might occur, and
initiate programs to enable corrective actions.

The IBM support team also continually analyzes critical system faults, testing to determine if system
firmware and maintenance procedures and tools are effectively handling and recording faults as
designed.

Availability
The POWER6 processor-based systems include many features that can enhance availability.

Detecting and deallocating failing components


Runtime correctable or recoverable errors are monitored to determine if there is a pattern of errors. If
a component reaches a predefined error limit, the service processor initiates an action to deconfigure
the "faulty" hardware, helping to avoid a potential system outage and to enhance system availability.
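
The threshold behavior described above can be sketched as a simple counter. This is an illustration of the general technique only, not the service processor's implementation: the class name, the component label, and the limit of 3 are assumptions, because the document does not state the predefined error limits.

```python
from collections import Counter


class ErrorMonitor:
    """Counts recoverable errors per component and flags those at the limit."""

    def __init__(self, error_limit: int):
        self.error_limit = error_limit   # hypothetical threshold value
        self.counts = Counter()
        self.deconfigured = set()

    def record_recoverable_error(self, component: str) -> bool:
        """Record one error; return True if the component is now deconfigured."""
        if component in self.deconfigured:
            return True
        self.counts[component] += 1
        if self.counts[component] >= self.error_limit:
            self.deconfigured.add(component)
        return component in self.deconfigured


monitor = ErrorMonitor(error_limit=3)
for _ in range(3):
    flagged = monitor.record_recoverable_error("core0")
print(flagged)  # True: the third error reaches the limit
```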

To detect and deallocate failing components, the following features are used:
Persistent deallocation
To enhance system availability, a component that is identified for deallocation or deconfiguration
on a POWER6 processor-based system is flagged for persistent deallocation. Component removal
can occur either dynamically (while the system is running) or at boot-time (IPL), depending both
on the type of fault and when the fault is detected.
In addition, runtime unrecoverable hardware faults can be deconfigured from the system after the
first occurrence. The system can be rebooted immediately after failure and resume operation on
the remaining stable hardware. This prevents the same faulty hardware from affecting system
operation again, while the repair action is deferred to a more convenient, less critical time.
Dynamic processor deallocation
Dynamic processor deallocation enables automatic deconfiguration of processor cores when
patterns of recoverable errors, for example correctable errors on processor caches, are detected.
Dynamic processor deallocation can prevent a recoverable error from escalating to an
unrecoverable system error, which might otherwise result in an unscheduled server outage.
Dynamic processor deallocation relies upon the service processor’s ability to use First Failure
Data Capture (FFDC)-generated recoverable error information to notify the POWER hypervisor
when a processor core reaches its predefined error limit.
v In a shared processor logical partitioning environment, the POWER hypervisor in conjunction
with the operating system will drain the run-queue for the failing core, redistribute the work to
the remaining CPUs, deallocate the offending CPU, and continue normal operation, although
potentially at a lower level of system performance.
v In a dedicated processor logical partitioning environment, the platform can request deallocation
of the processor from the operating system.
The logical partitioning strategy also enables additional system availability improvements,
allowing any processor to be shared with any logical partition on the system.
POWER6 Processor Instruction Retry
POWER6 processor-based systems include a suite of mainframe-inspired processor instruction
retry features that can significantly reduce situations that could result in checkstop. The POWER6
processor recovery occurs in the following order:
1. Automatically retry a failed instruction and continue with the task.
2. Interrupt a repeatedly failing instruction and move it to a new processor and continue with
the task.
3. In the event that spare capacity is not found, use the predefined logical partition availability
priority list that was created, so that capacity is obtained from lower demand logical
partitions. For example, capacity could be first obtained from a test environment instead of a
financial accounting system.
4. When other recovery methods fail, try to contain the termination to the logical partition that is
using the faulty core at that instruction.
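
The recovery order listed above is an ordered fallback chain: each step is attempted only if the previous one did not succeed. The following sketch shows that control flow only. The step names and the callables that stand in for each recovery action are hypothetical placeholders, not firmware interfaces.

```python
from typing import Callable, List, Tuple


def recover(steps: List[Tuple[str, Callable[[], bool]]]) -> str:
    """Try each recovery step in order; return the name of the first that succeeds."""
    for name, attempt in steps:
        if attempt():
            return name
    # Last resort (step 4): limit the impact to the affected logical partition.
    return "contain termination to the affected logical partition"


# Placeholder outcomes: the retry fails, no spare processor is found,
# and capacity is obtained from a lower-priority logical partition.
steps = [
    ("retry the instruction", lambda: False),
    ("move the instruction to a new processor", lambda: False),
    ("obtain capacity from a lower-priority logical partition", lambda: True),
]
print(recover(steps))
```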
Memory protection
Memory and cache arrays comprise data "bit lines" that feed into a memory word. A memory
word is addressed by the system as a single element. Depending on the size and addressability of
the memory element, each data bit line might include thousands of individual bits (memory
cells). For example:
v A single memory module on a dual inline memory module (DIMM) can have a
capacity of 1 Gb, and supply 8 bit lines of data for an ECC word. In this case, each bit line in
the ECC word holds 128 Mb behind it, corresponding to more than 128 million memory cell
addresses.
v A 32 KB L1 cache with a 16-byte memory word, on the other hand, would have only 2 Kb
behind each memory bit line.
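
The two figures in the examples above follow from dividing the capacity of the memory element by the number of bit lines in a memory word. The following worked calculation assumes binary prefixes (1 Gb = 2^30 bits, 1 KB = 1024 bytes). The function name is illustrative.

```python
def bits_behind_each_bit_line(capacity_bits: int, bit_lines: int) -> int:
    """Number of memory cells that feed one bit line of a memory word."""
    return capacity_bits // bit_lines


GBIT = 2 ** 30
MBIT = 2 ** 20
KBIT = 2 ** 10

# 1 Gb memory module that supplies 8 bit lines of an ECC word
dram = bits_behind_each_bit_line(1 * GBIT, 8)

# 32 KB L1 cache with a 16-byte (128-bit) memory word
l1 = bits_behind_each_bit_line(32 * 1024 * 8, 16 * 8)

print(dram // MBIT, "Mb per bit line")  # 128 Mb per bit line
print(l1 // KBIT, "Kb per bit line")    # 2 Kb per bit line
```

The DRAM result is 134,217,728 cells per bit line, which matches the "more than 128 million" figure in the text.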
A memory protection architecture that provides good error resilience for a relatively small L1
cache might be inadequate for protecting the much larger system main store. Therefore, a
variety of different protection methods are used in POWER6 processor-based systems to avoid
uncorrectable errors in memory. Memory protection plans must take into account many factors,
including size, desired performance, and memory array manufacturing characteristics. POWER6
processor-based systems have a number of protection schemes designed to prevent, protect, or
limit the effect of errors in main memory. These capabilities include:
Hardware scrubbing
Hardware scrubbing is a method used to deal with transient or soft errors. IBM POWER6
processor-based systems periodically address all memory locations and any memory
locations with an ECC error are rewritten with the correct data.
Error correcting code
Error correcting code (ECC) allows a system to detect up to two errors in a memory word
and correct one of them. However, without additional correction techniques, a system
fails if more than one bit is corrupted. For example, a burst error (sequential bad bits) or
DRAM failure is not tolerated by a system that exclusively uses ECC. For this reason,
Chipkill™ memory is used.
Chipkill
Chipkill is an enhancement to ECC that enables a system to sustain the failure of an
entire DRAM. Chipkill spreads the bit lines from a DRAM over multiple ECC words, so
that a catastrophic DRAM failure would affect at most one bit in each word. Barring a
future single-bit error, the system can continue indefinitely in this state with no
performance degradation until the failed DIMM can be replaced. To guard against such a
subsequent error, IBM POWER6 processor-based systems use a technology called redundant bit steering.
Redundant bit steering
IBM systems use redundant bit steering to avoid situations where multiple single-bit
errors align to create a multi-bit error. In the event that an IBM POWER6 processor-based
system detects an abnormal number of errors on a bit line, it can dynamically "steer" the
data stored at this bit line into one of a number of spare lines. This both reduces exposure
to multi-bit errors, and helps to defer maintenance until all redundant bits have been
used.
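The interaction between ECC and Chipkill can be illustrated with a toy model. The sketch below is not the code used in POWER6 memory: it assumes a small 8-bit extended Hamming code (4 data bits) that corrects one error and detects two, and it assumes eight memory chips that each hold one bit of every ECC word. All names are hypothetical.

```python
from functools import reduce
from operator import xor

DATA_POS = (3, 5, 6, 7)   # positions of the 4 data bits within the 8-bit word


def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming (SEC-DED) word."""
    w = [0] * 8                      # w[0] is the overall parity bit
    for i, pos in enumerate(DATA_POS):
        w[pos] = (nibble >> i) & 1
    w[1] = w[3] ^ w[5] ^ w[7]        # parity over positions with bit 0 set
    w[2] = w[3] ^ w[6] ^ w[7]        # parity over positions with bit 1 set
    w[4] = w[5] ^ w[6] ^ w[7]        # parity over positions with bit 2 set
    w[0] = reduce(xor, w[1:])        # overall parity allows double-error detection
    return w


def decode(word):
    """Return (status, nibble); nibble is None when the word is uncorrectable."""
    w = list(word)
    syndrome = reduce(xor, (pos for pos in range(1, 8) if w[pos]), 0)
    overall = reduce(xor, w)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        w[syndrome] ^= 1             # single-bit error; syndrome 0 means w[0] itself
        status = "corrected"
    else:
        return "uncorrectable", None  # two bits are wrong
    nibble = sum(w[pos] << i for i, pos in enumerate(DATA_POS))
    return status, nibble


def chipkill_store(nibbles):
    """Spread the ECC words over 8 chips: chip c holds bit c of every word."""
    words = [encode(n) for n in nibbles]
    return [[word[c] for word in words] for c in range(8)]


def chipkill_load(chips):
    """Reassemble each word from the 8 chips and decode it."""
    n_words = len(chips[0])
    return [decode([chips[c][i] for c in range(8)]) for i in range(n_words)]


data = [0x0, 0x3, 0x5, 0x9, 0xA, 0xC, 0xE, 0xF]
chips = chipkill_store(data)
chips[5] = [bit ^ 1 for bit in chips[5]]   # model the failure of an entire chip
results = chipkill_load(chips)
print(all(status == "corrected" for status, _ in results))  # True
print([nibble for _, nibble in results] == data)            # True
```

Because the failed chip contributes only one bit to each word, every word stays within the single-error correction capability of the code. A second bad bit in any word would be reported as uncorrectable, which is the exposure that redundant bit steering is intended to reduce.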

Handling uncorrectable errors


Occasionally an uncorrectable data error can occur in memory or cache. When this happens, the IBM
POWER6 processor-based system attempts to limit the impact to the least possible disruption, using a
strategy that first considers the data source.

Sometimes an uncorrectable error is transient in nature and occurs in data that can be recovered from
another repository. In cases where the data cannot be recovered from another source, a technique called
special uncorrectable error handling is used to determine whether the corruption is a threat to the system.
Often the data is not needed and can be overwritten; in that case the error condition is voided and the
system continues to operate normally.

Chapter 4. RAS and manageability 25


When an uncorrectable error is detected, the system modifies the associated ECC word, thereby signaling
to the rest of the system that the "standard" ECC is no longer valid. The service processor is then notified
and takes appropriate actions. If you are using AIX 5L Version 5.2 or later or Linux, and a process
attempts to use the data, the operating system is notified of the error and the operating system will
terminate only the specific user program. It is only in the case where the corrupted data is used by the
POWER hypervisor that the entire system must be rebooted, thereby preserving overall system integrity.
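
The decision sequence described in this section can be summarized as a small function. This is a simplified sketch of the documented behavior, not firmware logic; the enumeration and the returned strings are hypothetical labels chosen for the example.

```python
from enum import Enum


class Consumer(Enum):
    """Who, if anyone, later tries to use the corrupted data (illustrative)."""
    NONE = "data is never used and is overwritten"
    USER_PROCESS = "user program"
    HYPERVISOR = "POWER hypervisor"


def handle_uncorrectable_error(recoverable_elsewhere, consumer):
    """Return the least disruptive action for an uncorrectable data error."""
    if recoverable_elsewhere:
        # Transient error in data that exists in another repository
        return "recover the data from the other source"
    if consumer is Consumer.NONE:
        # The marked ECC word is simply written over
        return "no action; error condition is voided"
    if consumer is Consumer.USER_PROCESS:
        # Operating system is notified and ends only that program
        return "terminate the affected user program"
    # Corrupted data used by the hypervisor threatens system integrity
    return "reboot the entire system"


print(handle_uncorrectable_error(False, Consumer.USER_PROCESS))
```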

Cache protection mechanisms


IBM POWER6 processor-based systems are designed with cache protection mechanisms, including cache
line delete in both L2 and L3 arrays, processor instruction retry and alternate processor recovery
protection on L1-I and L1-D, and redundant repair bits in L1-I, L1-D, and L2 caches, as well as L2 and L3
directories.

PCI error recovery


PCI adapters are generally complex designs involving extensive on-board instruction processing, often on
embedded microcontrollers.

PCI adapters can account for a large portion of the hardware-based error opportunity on a large server.
While servers that rely only on boot time diagnostics can identify failing components to be replaced by
hot-swap and reconfiguration, runtime errors pose a more significant problem.

The traditional means of handling problems is through adapter internal error reporting and recovery
techniques in combination with operating system device driver management and diagnostics.

A method that uses a combination of system firmware enablement and Extended Error Handling (EEH)
device drivers allows recovery from intermittent PCI bus errors. This approach works by recovering and
resetting the adapter, thereby initiating system recovery for a permanent PCI bus error. Rather than
failing immediately, the faulty device is frozen and restarted, preventing a machine check. For the
POWER6 processor-based systems, this capability has been extended to PCI Express bus errors, and also
includes expanded Linux support for EEH.
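
The freeze-and-restart behavior can be pictured as a small state machine. The sketch below is only a model of the idea: the Adapter class, the state names, and the limit of three resets are assumptions made for the example and are not taken from the EEH implementation.

```python
class Adapter:
    """Hypothetical stand-in for a PCI adapter in a recoverable slot."""

    def __init__(self, resets_needed):
        self.resets_needed = resets_needed   # resets required before it works again
        self.state = "running"

    def reset(self):
        """Reset the adapter; return True when it comes back healthy."""
        if self.resets_needed > 1:
            self.resets_needed -= 1
            return False
        return True


def recover(adapter, max_resets=3):
    """Freeze the faulty adapter, then reset it up to max_resets times."""
    adapter.state = "frozen"        # isolate the device instead of taking a machine check
    for _ in range(max_resets):
        if adapter.reset():
            adapter.state = "running"   # intermittent error: adapter resumes work
            return True
    adapter.state = "failed"            # treated as a permanent error
    return False


intermittent = Adapter(resets_needed=1)
permanent = Adapter(resets_needed=10)
print(recover(intermittent), intermittent.state)   # True running
print(recover(permanent), permanent.state)         # False failed
```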

Serviceability
The IBM POWER6 serviceability strategy evolves from the service architecture deployed on POWER5
processor-based systems.

The IBM serviceability system package incorporates:


v Easy access to service components
v On-demand service education
v An automated guided repair strategy that uses common service interfaces for a converged service
approach across multiple IBM server platforms

Customer control of the service environment extends to firmware maintenance on all of the POWER6
processor-based systems. This strategy contributes to higher systems availability with reduced
maintenance costs.

Detecting errors
The first and most crucial component of a solid serviceability strategy is the ability to accurately and
effectively detect errors when they occur.

While not all errors are a threat to system availability, those that go undetected can be problematic
because the system does not have the opportunity to evaluate and act if necessary.

The following features of IBM POWER6 processor-based systems aid in detecting errors:

26 System p: Overview
Error checkers
IBM POWER6 processor-based systems contain specialized hardware detection circuitry, used to
detect erroneous hardware operations. Error checking hardware ranges from parity error
detection coupled with processor instruction retry and bus retry, to ECC correction on caches and
system buses. All IBM hardware error checkers have distinct attributes:
v They continually monitor system operations to detect potential calculation errors.
v They attempt to isolate physical faults based on runtime detection of each unique failure.
v They can initiate a wide variety of recovery mechanisms designed to correct the problem. IBM
POWER6 processor-based systems include extensive hardware and firmware recovery logic.
Fault isolation
Error checker signals are captured and stored in hardware fault isolation registers (FIRs).
Associated circuitry is used to limit the domain of an error to the first checker that encounters the
error. In this way, runtime error diagnostics can be deterministic such that for every check station,
the unique error domain for that checker is defined and documented. Ultimately, the error
domain becomes the field replaceable unit (FRU) callout, and manual interpretation of the data is
not usually required.
First Failure Data Capture
First failure data capture (FFDC) is an isolation technique that ensures that when a fault is
detected in a system through error checkers or other types of detection methods, the root cause of
the fault will be captured without the need to re-create the problem or run extended tracing or a
diagnostics program.
For the vast majority of faults, a good FFDC design means that the root cause will be detected
automatically without the intervention of a service representative. Pertinent error data related to
the fault is captured and saved for analysis. In hardware, FFDC data is collected from the fault
isolation registers. In firmware, this data consists of return codes, function calls, and so on.
FFDC "check stations" are carefully positioned within the server logic and data paths to ensure
that potential errors can be quickly identified and accurately tracked to a field replaceable unit
(FRU).
Fault isolation
The service processor interprets error data captured by the FFDC checkers in order to determine
the root cause of an error event.
Root cause analysis might indicate that the event is recoverable, meaning that a service action
point or need for repair has not been reached. Alternatively, it might indicate that a service action
point has been reached, where the event exceeded a predetermined threshold or was
unrecoverable. Based upon the isolation analysis, recoverable error threshold counts might be
incremented. If the event is recoverable, then a service action might not be necessary.
If the event is deemed to require a service action, additional information will be collected to
service the fault. For unrecoverable errors or for recoverable events that meet or exceed their
service threshold, a request for service will be initiated through an error logging component.
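
The threshold rule described above can be sketched as follows. The class, the threshold value, and the FRU names are hypothetical; the sketch shows only that unrecoverable errors always require service, while recoverable events require service once their count meets or exceeds the threshold.

```python
from collections import Counter


class ThresholdTracker:
    """Count recoverable events per FRU and flag when service is needed."""

    def __init__(self, threshold):
        self.threshold = threshold     # assumed value; real thresholds are predetermined
        self.counts = Counter()

    def record(self, fru, recoverable=True):
        """Record one error event; return True if a service action is required."""
        if not recoverable:
            return True                # unrecoverable errors always need service
        self.counts[fru] += 1          # increment the recoverable error count
        return self.counts[fru] >= self.threshold


tracker = ThresholdTracker(threshold=3)
print(tracker.record("dimm-4"))                     # False
print(tracker.record("dimm-4"))                     # False
print(tracker.record("dimm-4"))                     # True, threshold reached
print(tracker.record("fan-1", recoverable=False))   # True
```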

Diagnosing problems
Using the extensive network of advanced and complementary error detection logic built into the
hardware, firmware, and operating systems, IBM POWER6 processor-based systems can perform
considerable self-diagnosis.
Boot-time
When you start a POWER6 processor-based system, the service processor initializes system
hardware. Boot-time diagnostic testing uses a multi-tier approach for system validation, starting
with managed low-level diagnostics supplemented with system firmware initialization and
configuration of I/O hardware, followed by operating system-initiated software test routines.
Boot-time diagnostic routines include:



Built-in self tests (BISTs)
Built-in self tests (BISTs) for both logic components and arrays ensure the internal
integrity of components. Because the service processor assists in performing these tests,
the system can perform fault determination and isolation whether or not the system
processor is operational. The boot-time BISTs might also find faults that are not detectable
by processor-based power-on self test (POST) or diagnostics.
Wire tests
Wire tests discover and precisely identify connection faults between components such as
processors, memory, or I/O hub chips.
Initialization of components
During the initialization of components, processes can take place that aid in the isolation
of future errors. For example, when ECC memory starts, it writes patterns of data and
allows the server to store valid ECC data for each location, which can help isolate errors.
To minimize boot time, the system determines which of the diagnostics are required to be started
in order to ensure correct operation based on the way the system was powered off, or based on
the boot-time selection menu.
Run time
All POWER6 processor-based systems can monitor critical system components during run time,
and take corrective actions when recoverable faults occur. The IBM hardware error check
architecture provides the ability to report non-critical errors through the service processor to the
HMC.
Device drivers
In certain cases, diagnostics are best performed by operating system-specific drivers, most notably
I/O devices that are owned directly by a logical partition. In these cases, the operating system
device driver will often work in conjunction with I/O device microcode to isolate or recover
from problems. Potential problems are reported to an operating system device driver, which logs
the error. I/O devices also might include specific exercisers that can be invoked by the diagnostic
facilities for problem re-creation if required by service procedures.

Reporting problems
After diagnosing an error, IBM POWER6 processor-based systems report the error through a number of
mechanisms. This ensures that appropriate entities are aware that the system might be operating
in an error state.

However, a crucial piece of a solid reporting strategy is ensuring that a single error communicated through
multiple error paths is correctly aggregated, so that subsequent notifications are not inadvertently
duplicated.
Error logging and analysis
After the root cause of an error has been identified by a fault isolation component, an error log
entry is created with some basic data, such as:
v An error code uniquely describing the error event
v The location of the failing component
v The part number of the component to be replaced, including pertinent data such as
engineering and manufacturing levels
v Return codes
v Resource identifiers
v First Failure Data Capture (FFDC) data
Data that contains information on the effect that the repair will have on the system is also
included. Error log routines in the operating system can then use this information and decide
whether to contact service and support, send a notification only, or continue without alert.

Remote support
The Resource Monitoring and Control (RMC) application is delivered as part of the base
operating system, including the operating system on the Hardware Management Console (HMC).
RMC provides a secure transport mechanism across the LAN interface between the operating
system and the HMC and is used by the operating system diagnostic application for transmitting
error information. The RMC application performs a number of other functions as well, but these
are not used for the service infrastructure.
Manage serviceable events using the HMC
A critical requirement in a logically partitioned environment is to ensure that errors are not lost
before being reported for service, and that errors are only reported once, regardless of how many
logical partitions experience the potential effect of the error. The Manage Serviceable Events task on
the HMC is responsible for aggregating duplicate error reports, and ensures that all errors are
recorded for review and management.
When a locally or globally reported service request is made to the operating system, the operating
system diagnostic subsystem uses the RMC subsystem to relay error information to the HMC. For
global events (platform unrecoverable errors, for example) the service processor will also forward
error notification of these events to the HMC, providing a redundant error-reporting path in case
of errors in the RMC network.
The first occurrence of each failure type will be recorded in the Manage Serviceable Events task on
the HMC. The Manage Serviceable Events task will filter and maintain a history of duplicate reports
from other logical partitions or the service processor. It then looks across all active service event
requests, analyzes the failure to ascertain the root cause and, if enabled, contacts the IBM support
organization for service. This method ensures that all platform errors will be reported through at
least one functional path, ultimately resulting in a single notification for a single problem.
Extended error data
Extended error data (EED) is additional data that is collected either automatically at the time of a
failure or manually at a later time. The data collected is dependent on the invocation method, but
includes information such as firmware levels, operating system levels, additional fault isolation
register values, recoverable error threshold register values, system status, and any other pertinent
data.
The data is formatted and prepared for transmission back to IBM to assist the service and
support organization with preparing a service action plan for the service provider or for
additional analysis.
Handling system dumps
In some circumstances, an error might require a dump to be automatically or manually created.
In this event, it will be saved to the HMC upon reboot. Specific HMC information is included as
part of the information that can optionally be sent to IBM support for analysis. If additional
information relating to the dump is required, or if it becomes necessary to view the dump
remotely, the dump record will contain information that will allow the IBM support center to
identify which HMC the dump is located on.

Notifying the appropriate contacts


After an IBM POWER6 processor-based system has detected, diagnosed, and reported an error to an
appropriate aggregation point, the system then takes steps to notify the customer, and if necessary, the
IBM support organization.

Depending upon the assessed severity of the error and the support agreement, this notification could
range from a simple notification to the automatic dispatch of a service representative to the customer
site with the correct replacement part.
Customer notify
When an event is important enough to report, but does not indicate a need for a repair action or
the need to contact IBM service and support, it is classified as customer notify. Customers are
notified because these events might be of interest to an administrator. The event might be a
symptom of an expected systemic change, such as a network reconfiguration or failover testing of
redundant power or cooling systems. Examples include:
v Network events, for example, a loss of contact over a local area network (LAN)
v Environmental events, for example, a temperature warning
v Events that need further examination by the customer, but do not necessarily require a part
replacement or repair action
Customer notify events are serviceable events by definition because they indicate that something
has happened that requires customer awareness, in the event they want to take further action.
These events can always be reported to IBM at the customer’s discretion.
Contacting the service and support organization
A correctly configured system can contact the service and support organization to initiate an
automatic or manual call from a customer location. It can include error data, server status, or
other service-related information.
This action invokes the service organization for the appropriate service action to begin,
automatically opening a problem report, and in some cases also dispatching field support.
Automated reporting provides faster and potentially more accurate transmittal of error
information. While configuring call home is optional, customers are strongly encouraged to
configure this feature in order to obtain the full value of IBM service enhancements.
Vital Product Data and inventory management
IBM POWER6 processor-based systems store vital product data (VPD) internally, which keeps a
record of how much memory is installed, how many processors are installed, manufacturing level
of the parts, and so on. These records provide valuable information that can be used by remote
support and service representatives, enabling them to provide assistance in keeping the firmware
and software on the server up-to-date.
IBM problem management database
At the IBM support center, historical problem data is entered into the IBM service and support
problem management database. All of the information related to the error, along with any service
actions taken by the service provider, is recorded for problem management by the support and
development organizations. The problem is then tracked and monitored until the system fault is
repaired.

Locating and repairing the problem


The final component of a comprehensive design for serviceability is the ability to effectively locate and
replace parts that require service. IBM POWER6 processor-based systems use a combination of visual
cues and guided maintenance procedures to ensure that the identified part is replaced correctly.
Guiding light
Guiding light uses a series of flashing LEDs, allowing a service provider to quickly and easily
identify the location of system components. Because some customer configurations are very
complex, guiding light can handle multiple error conditions simultaneously.
In the guiding light LED implementation, when a fault condition is detected on a POWER6
processor-based system, an amber system attention LED will be illuminated. The service provider
can engage the identify mode by selecting a specific problem. Guiding light identifies the part that
needs to be replaced by flashing the amber identify LED.
Operator panel
The operator panel on an IBM POWER6 processor-based system is a 4 x 16 element LCD display
used to present boot progress codes, indicating advancement through the system power-on and
initialization processes. The operator panel is also used to display error and location codes when
an error occurs that prevents the system from booting. The service representative or customer can
also change various boot-time options, and perform a subset of the service functions that are
available on the Advanced System Management (ASM) interface.

Concurrent maintenance
IBM POWER6 processor-based systems are designed with the understanding that certain
components have higher intrinsic failure rates than others. The movement of fans, power
supplies, and physical storage devices naturally makes them more susceptible to wear or stress,
while other devices such as I/O adapters might begin to wear from repeated plugging or
unplugging. For this reason, when correctly configured, these devices are specifically designed to
be concurrently maintainable.
In other cases, a customer might be in the process of moving or redesigning a datacenter, or
planning a major upgrade. At times like these, flexibility is crucial. IBM POWER6 processor-based
systems are designed for redundant or concurrently maintainable power, fans, physical storage,
and I/O towers.
Blind-swap PCI adapters
Blind-swap PCI adapters represent significant service and ease-of-use enhancements in I/O
subsystem design while maintaining high PCI adapter density.
Standard PCI designs supporting hot-add and hot-replace require top access so that adapters can be
slid into the PCI I/O slots vertically. Blind-swap allows PCI adapters to be concurrently replaced
without having to put the I/O expansion unit into a service position.
Firmware updates
Firmware on the POWER6 processor-based servers is released in a cumulative sequential fix
format, packaged as an RPM file for concurrent application and activation. Administrators can
install and activate many firmware patches without cycling power or rebooting the server.
The new firmware image is loaded on the HMC using any of the following methods:
v Media distributed by IBM, such as a CD-ROM
v A problem fix distribution from the IBM service and support repository
v Download from the IBM Web site
v FTP from another server

IBM supports multiple firmware releases in the field, so under expected circumstances, a server
can operate on an existing firmware release, using concurrent firmware fixes to stay up-to-date
with the current patch level. Because changes to some server functions (for example, changing
initialization values for chip controls) cannot occur during system operation, a patch in this area
will require a system reboot for activation.

Activation of new firmware functions, as opposed to patches, will require the installation of a
new firmware release level. This process is disruptive to server operations in that it requires a
scheduled outage and full server reboot.

In addition to concurrent and disruptive firmware updates, concurrent patches include functions
that are not activated until a subsequent server reboot. A server with these patches will operate
normally. Additional fixes will be installed and activated when the system reboots after the next
scheduled outage.

POWER6 firmware allows you to view the status of a system power control network background
firmware update. This subsystem will update as necessary as migrated nodes or I/O expansion
units are added to the configuration. You can view the progress of the update, and start and stop
the background update if a more convenient time becomes available.
Repair and verify
Repair and verify is a system used to guide a service provider through the procedure of repairing
a system and verifying that the problem has been repaired. The steps are customized in the
appropriate sequence for the particular repair for the specific system being repaired. Repair
scenarios covered by repair and verify include:
v Replacing a defective field replaceable unit (FRU)



v Reattaching a loose or disconnected component
v Correcting a configuration error
v Removing or replacing an incompatible FRU
v Updating firmware, device drivers, operating systems, middleware components, and IBM
applications after replacing a part
v Installing a new part
Repair and verify procedures are designed to be used both by service providers who are familiar
with the task at hand and by those who are not. On-demand education content is placed in the
procedure at the appropriate location. Throughout the repair and verify procedure, repair history
is collected and provided to the service and support problem-management database for storage
with the serviceable event, to ensure that the guided maintenance procedures are operating
correctly.
Service documentation on the Support for IBM System p Web site
The Support for IBM System p Web site is an electronic information repository for POWER6
processor-based systems. The Support for System p Web site provides online training and
educational material, as well as service documentation. In addition, the Web site provides service
procedures that are not handled by the automated repair and verify guided component.
The Support for IBM System p Web site is located at http://www.ibm.com/systems/support/p.
For details on accessing the documentation, see “Improvements to documentation” on page 2.

Manageability
Several functions and tools aid manageability and allow you to manage your system efficiently and
effectively.

Service processor
The service processor is an embedded controller running the service processor’s internal operating
system.

The service processor operating system has specific programs and device drivers for the service processor
hardware. The host interface is a processor support interface connected to the POWER6 processor. The
service processor is always working, regardless of the main system unit’s state. The system unit can be in the
following states:
v Standby (power off)
v Operating, ready to start partitions
v Operating with logical partitions running

The service processor is used to monitor and manage the system hardware resources and devices. The
service processor checks the system for errors, ensures the connection to the HMC for manageability
purposes, and accepts Advanced System Management Interface (ASMI) Secure Sockets Layer (SSL)
network connections. The service processor provides the ability to view and manage the machine-wide
settings using the ASMI, and allows complete system and partition management from the HMC.

Note: The service processor enables a system that will not boot to be analyzed. The error log analysis can
be performed from either the ASMI or the HMC.

The service processor uses two Ethernet 10/100 Mbps ports:


v Both Ethernet ports are only visible to the service processor and can be used to attach the server to an
HMC or to access the ASMI. The ASMI options can be accessed through an HTTP server that is
integrated into the service processor operating environment.

v Both Ethernet ports have a default IP address:
– Service processor Eth0 or HMC1 port is configured as 169.254.2.147 (This applies to the service
processor in drawer 1 or the top drawer.)
– Service processor Eth1 or HMC2 port is configured as 169.254.3.147 (This applies to the service
processor in drawer 1 or the top drawer.)
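
When connecting a workstation directly to one of these ports, it can be useful to confirm that the workstation address is on the same subnet as the port. The following sketch uses the Python standard ipaddress module. The /24 prefix (netmask 255.255.255.0) and the workstation address are assumptions for the example and are not stated in this document.

```python
import ipaddress

# Default service processor addresses listed above (drawer 1 or top drawer)
DEFAULTS = {
    "HMC1 (Eth0)": "169.254.2.147",
    "HMC2 (Eth1)": "169.254.3.147",
}
ASSUMED_PREFIX = 24   # assumption: netmask 255.255.255.0


def same_subnet(workstation_ip, port_ip, prefix=ASSUMED_PREFIX):
    """Return True if the workstation address is in the port's subnet."""
    network = ipaddress.ip_network(f"{port_ip}/{prefix}", strict=False)
    return ipaddress.ip_address(workstation_ip) in network


for port, address in DEFAULTS.items():
    # Both defaults fall in the 169.254.0.0/16 link-local range
    print(port, ipaddress.ip_address(address).is_link_local)

workstation = "169.254.2.140"   # hypothetical workstation address
print(same_subnet(workstation, DEFAULTS["HMC1 (Eth0)"]))   # True
print(same_subnet(workstation, DEFAULTS["HMC2 (Eth1)"]))   # False
```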

System diagnostics
The system diagnostics consist of stand-alone diagnostics, which are loaded from the DVD-ROM drive,
and online diagnostics (available in AIX).
v Online diagnostics, when installed, are a part of the AIX operating system on the disk or server. They
can be booted in single-user mode (service mode), run in maintenance mode, or run concurrently
(concurrent mode) with other applications. They have access to the AIX error log and the AIX
configuration data.
– Service mode, which requires a service mode boot of the system, enables the checking of system
devices and features. Service mode provides the most complete checkout of the system resources.
All system resources, except the SCSI adapter and the disk drives used for paging, can be tested.
– Concurrent mode enables the normal system functions to continue while selected resources are
being checked. Because the system is running in normal operation, some devices might require
additional actions by the user or diagnostic application before testing can be done.
– Maintenance mode enables the checking of most system resources. Maintenance mode provides the
same test coverage as service mode. The difference between the two modes is the way they are
invoked. Maintenance mode requires that all activity on the operating system be stopped. The
shutdown -m command is used to stop all activity on the operating system and put the operating
system into maintenance mode.
v The System Management Services (SMS) error log is accessible on the SMS menus. This error log
contains errors that are found by partition firmware when the system or partition is booting.
v The service processor’s error log can be accessed on the ASMI menus.
v You can also access the system diagnostics from a Network Installation Management (NIM) server.

Electronic Service Agent


Electronic Service Agent and the Electronic Services Web site make up IBM Electronic Services.

Electronic Service Agent automatically monitors and collects hardware problem information and sends
this information to IBM. It also can collect information about hardware, software, system configuration,
and performance management, which might help the IBM service and support organization assist in
diagnosing problems.

Electronic Service Agent is a no-charge software tool that is located on your system to continuously
monitor events and periodically send service information to IBM service and support on a user-definable
timetable. This tool tracks and captures service information, hardware error logs, and performance
information. It automatically reports hardware error information to IBM service and support as long as
the system is under an IBM maintenance agreement or within the IBM warranty period. Service
information and performance information reporting do not require an IBM maintenance agreement.

To access Electronic Service Agent user guides, perform the following steps:

1. Go to the IBM Electronic Services news Web site at http://www.ibm.com/support/electronic.


2. In the navigation pane, select Electronic Service Agent.
3. In the contents pane, select Reference Guides → System p → AIX.

To receive maximum coverage, activate Electronic Service Agent on every platform, partition, and
Hardware Management Console (HMC) in your network. If your IBM System p server is managed by an
HMC, the HMC will report all hardware problems, and the AIX operating system will report only
software problems and system information. You must configure the Electronic Service Agent on the HMC.
The AIX operating system will not report hardware problems for a system managed by an HMC.

Accessing the Electronic Services Web site


The Electronic Services Web site provides the ability to view service information reported by Electronic
Service Agent, use the Premium Search function, open and manage service requests, receive support
messages by platform or individual, and customize the site to your preferences.

Ensure that you have the appropriate operating system level before installing Electronic Service Agent.
You will need AIX 5L Version 5.3 with the 5300-06 Technology Level or later.

Manage serviceable events with the HMC


Service strategies become more complicated in a partitioned environment. The Manage Serviceable Events
task in the HMC can help streamline this process.

Each logical partition reports errors it detects, without determining whether other logical partitions also
detect and report the errors. For example, if one logical partition reports an error for a shared resource,
such as a managed system power supply, other active logical partitions might report the same error.

By using the Manage Serviceable Events task in the HMC, you can avoid long lists of repetitive call-home
information by recognizing that these are repeated errors and consolidating them into one error.
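
The consolidation of repeated reports can be sketched in a few lines. This is an illustration only: the report fields, the error codes, and the rule that two reports are duplicates when they share an error code and location are assumptions made for the example, not the HMC's actual matching logic.

```python
def consolidate(reports):
    """Merge duplicate error reports into one serviceable event per failure.

    Reports are treated as duplicates when they share the same error code
    and location (an assumption made for this example).
    """
    events = {}
    for report in reports:
        key = (report["error_code"], report["location"])
        event = events.setdefault(
            key,
            {"error_code": key[0], "location": key[1], "reported_by": []},
        )
        event["reported_by"].append(report["source"])   # keep a history of reporters
    return list(events.values())


# Three partitions share one failing power supply; one also sees a fan problem
reports = [
    {"source": "partition-1", "error_code": "PSU-FAIL", "location": "power-supply-1"},
    {"source": "partition-2", "error_code": "PSU-FAIL", "location": "power-supply-1"},
    {"source": "partition-3", "error_code": "PSU-FAIL", "location": "power-supply-1"},
    {"source": "partition-2", "error_code": "FAN-SLOW", "location": "fan-3"},
]
events = consolidate(reports)
print(len(events))   # 2 serviceable events instead of 4 reports
for event in events:
    print(event["error_code"], event["reported_by"])
```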

In addition, you can use the Manage Serviceable Events task to initiate service functions on systems and
logical partitions including the exchanging of parts, configuring connectivity, and managing dumps.

Hardware user interfaces


In addition to the HMC, other hardware management user interfaces can be used to manage your IBM
System p servers.

Advanced System Management interface


The Advanced System Management interface (ASMI) is the interface to the service processor that enables
you to manage the operation of the server, such as auto power restart, and to view information about the
server, such as the error log and vital product data. Some repair procedures require connection to the
ASMI.

The ASMI is accessible through the HMC. For details, see “Accessing the ASMI using an HMC.” The
ASMI is also accessible using a Web browser on a system that is connected to the service processor
either directly (with a standard or crossover Ethernet cable) or through an Ethernet network. Use the
ASMI to change the service processor IP addresses or to apply security policies that block access
from undesired IP addresses or address ranges.
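
The kind of address-range policy mentioned above can be illustrated with a short sketch. It does not show how the ASMI stores or enforces its settings; the function name `is_allowed` and the example ranges are assumptions, and the check uses only the Python standard library `ipaddress` module.

```python
import ipaddress


def is_allowed(client_ip, allowed_ranges):
    """Return True if client_ip falls inside any of the allowed networks.

    allowed_ranges is a list of strings such as "192.168.2.0/24" or a
    single address such as "10.0.0.5" (treated as a one-address network).
    """
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(r) for r in allowed_ranges)
```

With `allowed_ranges = ["192.168.2.0/24"]`, a browser at 192.168.2.17 passes the check and one at 192.168.3.17 does not.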

You might be able to use the service processor’s default settings. In that case, accessing the ASMI is not
necessary.

Accessing the ASMI using an HMC: If configured to do so, the HMC connects directly to the ASMI for
a selected system from this task.

To connect to the Advanced System Management interface from an HMC:


1. Open Systems Management from the navigation pane.
2. From the work pane, select one or more managed systems to work with.
3. From the System Management tasks list, select Operations.
4. From the Operations task list, select Advanced System Management (ASM).

Accessing the ASMI using a Web browser: The Web interface to the ASMI is accessible through
Microsoft® Internet Explorer® 6.0, Netscape 7.1, Mozilla Firefox, or Opera 7.23 running on a PC or mobile
computer connected to the service processor. The Web interface is available during all phases of system
operation, including the initial program load (IPL) and run time. However, some of the menu options in
the Web interface are unavailable during IPL or run time to prevent usage or ownership conflicts if the
system resources are in use during that phase.

Accessing the ASMI using an ASCII terminal: The ASMI on an ASCII terminal supports a subset of the
functions provided by the Web interface and is available only when the system is in the platform standby
state. It is not available during other phases of system operation, such as the IPL and run time.
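
The availability rules for the Web browser and ASCII terminal access methods can be summarized in a small sketch. The `Phase` names and the function `available_asmi_interfaces` are hypothetical; the sketch encodes only what the text states and does not model which individual menu options are disabled during IPL or run time.

```python
from enum import Enum


class Phase(Enum):
    STANDBY = "platform standby"
    IPL = "initial program load"
    RUN_TIME = "run time"


def available_asmi_interfaces(phase):
    """Return the ASMI access methods the text describes as available."""
    interfaces = ["web"]            # Web interface: all phases of operation
    if phase is Phase.STANDBY:
        interfaces.append("ascii")  # ASCII terminal: platform standby only
    return interfaces
```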

Graphics terminal
The graphics terminal is available to users who want a graphical user interface (GUI) to their AIX or
Linux systems. To use the graphics terminal, plug the graphics adapter into a PCI slot in the back of the
server. You can connect a standard monitor, keyboard, and mouse to the adapter to use the terminal. This
connection allows you to access the SMS menus, as well as an operating system console.

Appendix A. Supported hardware features
Several hardware features are supported on IBM System p servers that contain the POWER6 processor.

Use this information, together with the latest sales and marketing resources, to enhance your
knowledge of IBM server solutions.

Note: This information does not replace the latest sales and marketing publications and tools that
document features. For the latest information, see the IBM Offering Information web site at
http://www.ibm.com/common/ssi.

In the following tables, an X in the 9117-MMA column indicates that the feature is supported. The matrix
is divided into the following sections:
• “Adapters”
• “Cables”
• “Disks”
• “Expansion units”
• “Media devices”
• “Memory”
• “Miscellaneous features”
• “Pointing devices”
• “Processor”
• “Rack related”
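
If you keep these tables in plain-text form, the support column can be read mechanically. The following sketch assumes each feature row has the shape "four-digit feature code, optional X, description", as in the tables in this appendix; the function name `parse_feature_row` and the returned dictionary keys are hypothetical.

```python
import re

# Four-digit feature code, an optional "X" support mark, then the description.
ROW = re.compile(r"^(\d{4})\s+(X\s+)?(.+)$")


def parse_feature_row(line):
    """Parse one table row; return None for headings and continuation lines."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    code, mark, description = match.groups()
    return {
        "feature": code,
        "supported": mark is not None,  # an X means supported on the 9117-MMA
        "description": description,
    }
```

Category headings such as "Asynchronous" and wrapped description lines do not start with a feature code, so the function returns `None` for them.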

Note: If you are installing a new feature, ensure that you have the software required to support the new
feature and determine whether there are any existing PTF prerequisites to install. To do this, use the IBM
Prerequisite Web site at http://www-912.ibm.com/e_dir/eServerPrereq.nsf.

Adapters
Feature 9117-MMA Description
Asynchronous
2943 X 8-Port Asynchronous Adapter EIA-232/RS-422, PCI bus
5723 X 2-Port Asynchronous EIA-232 PCI Adapter
Cassettes
5646 X Blind Swap Cassette Kit - Short PCI slot- Type III
5647 X Blind Swap Cassette Kit - Standard PCI slot- Type III
Fibre Channel
5758 X 4 Gb Single-Port Fibre Channel PCI-X 2.0 DDR Adapter
5759 X 4 Gb Dual-Port Fibre Channel PCI-X 2.0 DDR Adapter
5773 X 4 Gigabit PCI Express Single Port Fibre Channel Adapter
5774 X 4 Gigabit PCI Express Dual Port Fibre Channel Adapter
Graphics
2849 X POWER GXT135P Graphics Accelerator with Digital Support
5748 X POWER GXT145 PCI Express Graphics Accelerator
LAN
5700 X Gigabit Ethernet-SX PCI-X Adapter
5701 X 10/100/1000 Base-TX Ethernet PCI-X Adapter
5706 X 2-Port 10/100/1000 Base-TX Ethernet PCI-X Adapter
5707 X 2-Port Gigabit Ethernet-SX PCI-X Adapter
5721 X 10 Gb Ethernet-SR PCI-X 2.0 DDR Adapter
5722 X 10 Gb Ethernet-LR PCI-X 2.0 DDR Adapter
5740 X 4-Port 10/100/1000 Base-TX PCI-X Adapter
5767 X 2-Port 10/100/1000 Base-TX Ethernet PCI Express Adapter
5768 X 2-Port Gigabit Ethernet-SX PCI Express Adapter
Miscellaneous internal system
1800 X RIO-2 Remote I/O Loop Adapter
1802 X GX Dual Port- 12X HCA
5636 X 2X- 1GB Virtual Ethernet- Integrated I/O ports
5637 X 2X- 10GB (SR) Virtual Ethernet- Integrated I/O ports
5639 X 4X- 1GB Virtual Ethernet- Integrated I/O ports
5648 X Service Interface Card.
This feature is used to connect each CEC enclosure to the active Service Processor through
the external service interface cable so the drawer content can be monitored for RAS
purposes.
6446 X Dual-port 12X Channel Attach- Short Run
6457 X Dual-port 12X Channel Attach- Long Run
7878 X System Port Riser Card
Miscellaneous
2737 X Keyboard/Mouse Attachment Card - PCI
2738 X 2-Port USB PCI Adapter
4764 X PCI-X Cryptographic Coprocessor (FIPS-4)
SCSI
5736 X PCI-X DDR Dual Channel Ultra320 SCSI Adapter

Cables
Feature 9117-MMA Description
Asynchronous
2934 X Asynchronous Terminal/Printer Cable EIA-232
2936 X Asynchronous Cable EIA-232/V.24
3925 X Serial Port Converter Cable, 9-Pin to 25-Pin
3926 X Asynchronous Printer/Terminal Cable, 9-pin to 25-pin, 4 m
8113 X RJ-45 to DB-25 Converter Cable
Fiber
2456 X LC-SC 50 Micron Fiber Converter Cable
2459 X LC-SC 62.5 Micron Fiber Converter Cable
Graphics
4242 X 6-Foot Extender Cable for Displays (15-pin D-shell to 15-pin D-shell)
4276 X VGA to DVI Connection Converter
InfiniBand cable
1828 X 12X to 4X Channel Conversion Cable- 1.5 meter
1830 X 1.5 Meter 12X cable
1840 X 3.0 Meter 12X Cable
1841 X 12X to 4X Channel Conversion Cable- 3 meter
1842 X 12X to 4X Channel Conversion Cable- 10 meter
Keyboard
4256 X Extender Cable - USB Keyboards, 2 m
LAN
7801 X Ethernet Cable, 6 m, Hardware Management Console to System Unit
7802 X Ethernet Cable, 15 m, Hardware Management Console to System Unit
Miscellaneous
2877 X ARTIC960RxD Quad DTA, H.100, 4-Drop Cable
3124 X Serial-to-Serial Port Cable for Drawer to Drawer
3125 X Serial-to-Serial Port Cable for Rack to Rack
3927 X Serial Port Null Modem Cable, 9-pin to 9-pin, 3.7 m
3928 X Serial Port Null Modem Cable, 9-pin to 9-pin, 10 m
Rack Related
5657 X Service Interface Cable- 2 Drawer
Connects the components in each CEC enclosure to the active Service Processor for
monitoring system functions.
5658 X Service Interface Cable- 3 Drawer
Connects the components in each CEC enclosure to the active Service Processor for
monitoring system functions.
5660 X Service Interface Cable- 4 Drawer
Connects the components in each CEC enclosure to the active Service Processor for
monitoring system functions.
SCSI External
2124 X Ultra 320 SCSI Cable 1 meter
2125 X Ultra 320 SCSI Cable 3 meter
2126 X Ultra 320 SCSI Cable 5 meter
2127 X Ultra 320 SCSI Cable 10 meter
2128 X Ultra 320 SCSI Cable 20 meter
2138 X Ultra 320 SCSI Cable 0.55 meter for I/O Drawer attachment.



Disks
Feature 9117-MMA Description
SAS
3646 X 73 GB 15K RPM SAS Disk Drive
3647 X 146 GB 15K RPM SAS Disk Drive
3648 X 300 GB 15K RPM SAS Disk Drive

Displays
Feature 9117-MMA Description
3643 X T120 Flat Panel Monitor
3645 X T117 Flat Panel Monitor

Expansion units

The expansion units shown here are identified using machine type model (MTM) numbers rather than
feature codes.

MTM 9117-MMA Description
7314-G30 X 7314 Model G30 I/O Drawer
7311-D11 X I/O Drawer Rack-mountable Expansion Cabinet Model D11
7311-D20 X Rack-Mounted High-Density Expansion Drawer Model D20

Media devices
Feature 9117-MMA Description
IDE CD/DVD
5756 X IDE Slimline DVD-ROM Drive
5757 X 4.7 GB IDE Slimline DVD-RAM Drive
5629 X Media Enclosure and Backplane

Memory
Feature 9117-MMA Description
DIMMs
4495 X 4/8GB (4X2GB) DIMMS, 276-pin 533 MHz, DDR2 SDRAM
7893 4GB (4x1GB) DIMMs, 276-pin, 533 MHz, DDR2 SDRAM
7894 8GB (4x2GB) DIMMs, 276-pin, 533 MHz, DDR2 SDRAM
Memory Capacity Upgrade on Demand (CUoD)
5692 X 0/2GB (4X0.5GB) DIMMS, 667 MHz, DDR2, POWER6 CoD Memory
5693 X 0/4GB (4X1GB) DIMMS, 667 MHz, DDR2, POWER6 CoD Memory
5694 X 0/8GB (4X2GB) DIMMS, 667 MHz, DDR2, POWER6 CoD Memory
5695 X 0/16GB (4X4GB) DIMMS, 533 MHz, DDR2, POWER6 CoD Memory
5696 X 0/32GB (4X8GB) DIMMS, 400 MHz, DDR2, POWER6 CoD Memory
Memory Capacity Upgrade on Demand (CUoD) activation
5680 X Activation of 1GB DDR2 POWER6 Memory
7663 X 1GB DDR2 Memory Activation
Memory Capacity Upgrade on Demand (CUoD) usage billing
5691 X ON/OFF Memory Billing for 1 GB-Day-P6 Memory
7954 X On/Off Memory Enablement

Miscellaneous features
Feature 9117-MMA Description
CEC planars
5663 X Processor Enclosure and Backplane
Disk bays
5668 X SAS DASD Backplane -6 slot
I/O planars
5666 X I/O Backplane
LPAR
7942 X Advanced POWER Virtualization
Other
5667 X System Midplane
1845 X Operator Panel

Pointing devices
Feature 9117-MMA Description
8841 X Mouse - USB, Business Black with Keyboard Attachment Cable

Power
Feature 9117-MMA Description
CEC related
5625 X Processor Power Regulator- POWER6 technology
5686 X Virtual Processor Power Regulator
Drawer related
5628 X AC Power Supply, 1600 W
7870 X Power Distribution Backplane



Processor
Feature 9117-MMA Description
Processor fabric
3660 X Processor Fabric Cable, 2 enclosure
3664 X Processor Fabric Cable, 3 enclosure
3665 X Processor Fabric Cable, 4 enclosure
Processor card
5620 X 3.5 GHz POWER6 -2 Core Processor Card, 0-core active, 12 DDR2 Memory Slots
5621 X 4.2 GHz POWER6 -2 Core Processor Card, 0-core active, 8 DDR2 Memory Slots
5622 X 4.2 GHz POWER6 -2 Core Processor Card, 0-core active, 12 DDR2 Memory Slots
7380 X 4.7 GHz POWER6 -2 Core Processor Card, 0-core active, 12 DDR2 Memory Slots

Rack related
Feature 9117-MMA Description
5626 X System CEC Enclosure with Bezel
5627 X System CEC Enclosure with Bezel
9570 X Reserved Rack Space Indicator - 4U
7164 X Rack-mount Drawer Rail Kit

Appendix B. Accessibility features
Accessibility features help users who have a physical disability, such as restricted mobility or limited
vision, to use information technology products successfully.

The following list includes the major accessibility features:


• Keyboard-only operation
• Interfaces that are commonly used by screen readers
• Keys that are tactilely discernible and do not activate just by touching them
• Industry-standard devices for ports and connectors
• The attachment of alternative input and output devices

IBM and accessibility

See the IBM Accessibility Center at http://www.ibm.com/able/ for more information about the
commitment that IBM has to accessibility.

If you need an accessible version of this publication, send a request to [email protected]. In the request,
make sure that you include the publication number and the title.

Notices
This information was developed for products and services offered in the U.S.A.

The manufacturer may not offer the products, services, or features discussed in this document in other
countries. Consult the manufacturer’s representative for information on the products and services
currently available in your area. Any reference to the manufacturer’s product, program, or service is not
intended to state or imply that only that product, program, or service may be used. Any functionally
equivalent product, program, or service that does not infringe any intellectual property right of the
manufacturer may be used instead. However, it is the user’s responsibility to evaluate and verify the
operation of any product, program, or service.

The manufacturer may have patents or pending patent applications covering subject matter described in
this document. The furnishing of this document does not give you any license to these patents. You can
send license inquiries, in writing, to the manufacturer.

The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: THIS INFORMATION IS PROVIDED “AS IS” WITHOUT
WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain
transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically
made to the information herein; these changes will be incorporated in new editions of the publication.
The manufacturer may make improvements and/or changes in the product(s) and/or the program(s)
described in this publication at any time without notice.

Any references in this information to Web sites not owned by the manufacturer are provided for
convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at
those Web sites are not part of the materials for this product and use of those Web sites is at your own
risk.

The manufacturer may use or distribute any of the information you supply in any way it believes
appropriate without incurring any obligation to you.

Any performance data contained herein was determined in a controlled environment. Therefore, the
results obtained in other operating environments may vary significantly. Some measurements may have
been made on development-level systems and there is no guarantee that these measurements will be the
same on generally available systems. Furthermore, some measurements may have been estimated through
extrapolation. Actual results may vary. Users of this document should verify the applicable data for their
specific environment.

Information concerning products not produced by this manufacturer was obtained from the suppliers of
those products, their published announcements or other publicly available sources. This manufacturer has
not tested those products and cannot confirm the accuracy of performance, compatibility or any other
claims related to products not produced by this manufacturer. Questions on the capabilities of products
not produced by this manufacturer should be addressed to the suppliers of those products.

All statements regarding the manufacturer’s future direction or intent are subject to change or withdrawal
without notice, and represent goals and objectives only.



The manufacturer’s prices shown are the manufacturer’s suggested retail prices, are current and are
subject to change without notice. Dealer prices may vary.

This information is for planning purposes only. The information herein is subject to change before the
products described become available.

This information contains examples of data and reports used in daily business operations. To illustrate
them as completely as possible, the examples include the names of individuals, companies, brands, and
products. All of these names are fictitious and any similarity to the names and addresses used by an
actual business enterprise is entirely coincidental.

If you are viewing this information in softcopy, the photographs and color illustrations may not appear.

The drawings and specifications contained herein shall not be reproduced in whole or in part without the
written permission of the manufacturer.

The manufacturer has prepared this information for use with the specific machines indicated. The
manufacturer makes no representations that it is suitable for any other purpose.

The manufacturer’s computer systems contain mechanisms designed to reduce the possibility of
undetected data corruption or loss. This risk, however, cannot be eliminated. Users who experience
unplanned outages, system failures, power fluctuations or outages, or component failures must verify the
accuracy of operations performed and data saved or transmitted by the system at or near the time of the
outage or failure. In addition, users must establish procedures to ensure that there is independent data
verification before relying on such data in sensitive or critical operations. Users should periodically check
the manufacturer’s support websites for updated information and fixes applicable to the system and
related software.

Trademarks
The following terms are trademarks of International Business Machines Corporation in the United States,
other countries, or both:

AIX
Chipkill
Electronic Service Agent
HACMP
IBM
Micro-Partitioning
POWER5
POWER6
Resource Link
System p
ViaVoice

Microsoft®, Windows®, Windows NT®, and the Windows logo are trademarks of Microsoft Corporation in
the United States, other countries, or both.

InfiniBand and the InfiniBand design marks are trademarks and/or service marks of the InfiniBand Trade
Association.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product or service names may be trademarks or service marks of others.

Terms and conditions
Permission for the use of these publications is granted subject to the following terms and conditions.

Personal Use: You may reproduce these publications for your personal, noncommercial use provided that
all proprietary notices are preserved. You may not distribute, display or make derivative works of these
publications, or any portion thereof, without the express consent of the manufacturer.

Commercial Use: You may reproduce, distribute and display these publications solely within your
enterprise provided that all proprietary notices are preserved. You may not make derivative works of
these publications, or reproduce, distribute or display these publications or any portion thereof outside
your enterprise, without the express consent of the manufacturer.

Except as expressly granted in this permission, no other permissions, licenses or rights are granted, either
express or implied, to the publications or any data, software or other intellectual property contained
therein.

The manufacturer reserves the right to withdraw the permissions granted herein whenever, in its
discretion, the use of the publications is detrimental to its interest or, as determined by the manufacturer,
the above instructions are not being properly followed.

You may not download, export or re-export this information except in full compliance with all applicable
laws and regulations, including all United States export laws and regulations.

THE MANUFACTURER MAKES NO GUARANTEE ABOUT THE CONTENT OF THESE PUBLICATIONS. THESE
PUBLICATIONS ARE PROVIDED “AS-IS” AND WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR
IMPLIED, INCLUDING BUT NOT LIMITED TO IMPLIED WARRANTIES OF MERCHANTABILITY,
NON-INFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE.



Printed in USA

SA76-0087-01
