Performance Tools Guide and Reference
IBM
Note
Before using this information and the product it supports, read the information in “Notices” on page
331.
This edition applies to AIX Version 7.3 and to all subsequent releases and modifications until otherwise indicated in new
editions.
© Copyright International Business Machines Corporation 2021.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with
IBM Corp.
Contents
The timing commands .......................................................................................................................226
The prof command ............................................................................................................................ 227
The gprof command........................................................................................................................... 228
The tprof command............................................................................................................................230
The svmon command.............................................................................................................................. 237
Security...............................................................................................................................................237
The svmon configuration file..............................................................................................................238
Summary report metrics.................................................................................................................... 238
Report formatting options..................................................................................................................239
Segment details and -O options........................................................................................................ 240
Additional -O options......................................................................................................................... 244
Reports details................................................................................................................................... 248
Remote Statistics Interface API Overview............................................................................................. 269
Remote Statistics Interface list of subroutines.................................................................................270
RSI Interface Concepts and Terms....................................................................................................271
A Simple Data-Consumer Program....................................................................................................276
Expanding the data-consumer program............................................................................................279
Inviting data suppliers....................................................................................................................... 281
A Full-Screen, character-based monitor........................................................................................... 282
List of RSI Error Codes....................................................................................................................... 282
The nmon recording tool......................................................................................................................... 286
Asynchronous I/O statistics...............................................................................................................286
nmon recording tool commands........................................................................................................286
I/O statistics....................................................................................................................................... 290
Kernel statistics..................................................................................................................................297
Memory statistics............................................................................................................................... 298
Network statistics.............................................................................................................................. 302
Process Statistics............................................................................................................................... 313
Processor statistics............................................................................................................................ 316
Recording configuration details......................................................................................................... 327
WLM statistics.................................................................................................................................... 329
Workload partition statistics..............................................................................................................329
Notices..............................................................................................................331
Privacy policy considerations.................................................................................................................. 332
Trademarks.............................................................................................................................................. 333
Index................................................................................................................ 335
About this document
The Performance Tools Guide and Reference provides experienced system administrators, application
programmers, service representatives, system engineers, end users, and system programmers with
complete, detailed information about the various performance tools that are available for monitoring
and tuning AIX® systems and applications running on those systems.
The information contained in this document pertains to systems running AIX 7.1 or later. Any content that is applicable to earlier releases will be noted as such.
Highlighting
The following highlighting conventions are used in this document:
Bold  Identifies commands, subroutines, keywords, files, structures, directories, and other items whose names are predefined by the system. Also identifies graphical objects such as buttons, labels, and icons that the user selects.
Italics  Identifies parameters whose actual names or values are to be supplied by the user.
Monospace  Identifies examples of specific data values, examples of text similar to what you might see displayed, examples of portions of program code similar to what you might write as a programmer, messages from the system, or information you should actually type.
Case-sensitivity in AIX
Everything in the AIX operating system is case-sensitive, which means that it distinguishes between
uppercase and lowercase letters. For example, you can use the ls command to list files. If you type LS,
the system responds that the command is not found. Likewise, FILEA, FiLea, and filea are three
distinct file names, even if they reside in the same directory. To avoid causing undesirable actions to be
performed, always ensure that you use the correct case.
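For example, an interactive session might look like the following (illustrative output; the exact error message varies by shell):

# ls
FILEA  FiLea  filea
# LS
ksh: LS: not found.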
ISO 9000
ISO 9000 registered quality systems were used in the development and manufacturing of this product.
Item Descriptor
Throughput expectations  A measure of the amount of work performed over a period of time.
Response time expectations  The elapsed time between when a request is submitted and when the response from that request is returned.
November 2023
The following information is a summary of the updates made to this topic collection:
• Removed Multiplexing of PMU context as the feature is no longer supported.
November 2022
The following information is a summary of the updates made to this topic collection:
• Added lcpuid_to_bindid subroutine under Global interfaces in Perfstat API Programming. It
returns the bind CPU ID.
• Added bindid_to_lcpuid subroutine under Global interfaces in Perfstat API Programming. It
returns the logical CPU ID.
December 2021
The following information is a summary of the updates made to this topic collection:
• You can now save and restore all AIX tunable parameter values after the next boot and next live update
operations. Updated SMIT instructions in the “Global manipulation of tuning parameters” on page 217
topic.
Item Descriptor
-i inputfile Specifies the input AIX trace file to be analyzed.
-o outputfile Specifies an output file (default is stdout).
-n gensymsfile Specifies a names file produced by gensyms.
Parameters
Item Descriptor
gensymsfile  The names file as produced by the gensyms command.
inputfile  The AIX trace file to be processed by the curt command.
outputfile  The name of the output file created by the curt command.
pidnamefile  If the trace process name table is not accurate, or if more descriptive names are desired, use the -a flag to specify a PID-to-process-name mapping file. This is a file with lines consisting of a process ID (in decimal) followed by a space, then an ASCII string to use as the name for that process.
timestamp  The time in seconds at which to start and stop the trace file processing.
trcnmfile  The names file as produced by the trcnm command.
PURR  The name of the register that is used to calculate CPU times.
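For example, a pidnamefile supplied with the -a flag might contain entries such as the following (the process IDs and names here are only illustrative):

143984 oracle
1007854 nfsd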
Trace hooks 119 and 135 are used to report on the time spent in the exit system call. Trace hooks 134,
139, 210, and 465 are used to keep track of TIDs, PIDs and process names.
Trace hook 492 is used to report on the time spent in the hypervisor.
Trace hooks 605 and 609 are used to report on the time spent in the pthreads library.
To get the PTHREAD hooks in the trace, you must execute your pthread application using the
instrumented libpthreads.a library.
# HOOKS="100,101,102,103,104,106,10C,119,134,135,139,200,210,215,38F,419,465,47F,488,489,48A,
48D,492,605,609"
# SIZE="1000000"
# export HOOKS SIZE
# trace -n -C all -d -j $HOOKS -L $SIZE -T $SIZE -afo trace.raw
# export LIBPATH=/usr/ccs/lib/perf:$LIBPATH
# trcon ; pthread.app ; trcstop
# unset HOOKS SIZE
# ls trace.raw*
trace.raw trace.raw-0 trace.raw-1 trace.raw-2 trace.raw-3
# trcrpt -C all -r trace.raw > trace.r
# rm trace.raw*
# ls trace*
trace.r
# gensyms > gensyms.out
# trcnm > trace.nm
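With the merged trace file and the names file in place, the curt command can then be run against them. For example, using only the flags documented above and the files produced in the preceding steps:

# curt -i trace.r -n gensyms.out -o curt.out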
General information
The general information section displays the time and date when the report was generated, followed by the syntax of the curt command line that was used to produce the report.
This section also contains some information about the AIX trace file that was processed by the curt
command. This information consists of the trace file's name, size, and its creation date. The command
used to invoke the AIX trace facility and gather the trace file is displayed at the end of the report.
The following is a sample of the general information section:
Run on Wed Apr 26 10:51:33 2XXX
Command line was:
curt -i trace.raw -n gensyms.out -o curt.out
----
AIX trace file name = trace.raw
AIX trace file size = 787848
Wed Apr 26 10:50:11 2XXX
System: AIX 5.3 Node: bu Machine: 00CFEDAD4C00
AIX trace file created = Wed Apr 26 10:50:11 2XXX
System summary
The system summary information produced by the curt command describes the time spent by the whole
system (all CPUs) in various execution modes.
The following is a sample of the System summary:
Item Descriptor
APPLICATION The sum of times spent by all processors in User (that is, non-privileged) mode.
SYSCALL The sum of times spent by all processors doing System Calls. This is the portion
of time that a processor spends executing in the kernel code providing services
directly requested by a user process.
HCALL The sum of times spent by all processors doing Hypervisor Calls. This is the
portion of time that a processor spends executing in the hypervisor code
providing services directly requested by the kernel.
KPROC The sum of times spent by all processors executing kernel processes other than
IDLE and NFS processes. This is the portion of time that a processor spends
executing specially created dispatchable processes that only execute kernel
code.
NFS The sum of times spent by all processors executing NFS operations. This is the
portion of time that a processor spends executing in the kernel code providing
NFS services directly requested by a kernel process.
FLIH The sum of times spent by all processors executing FLIHs.
The System Summary example indicates that the CPU is spending most of its time in application mode.
There is still 4234.76 ms of IDLE time so there is enough CPU to run applications. If there is insufficient
CPU power, do not expect to see any IDLE time. The Avg. Thread Affinity value is 0.99 showing good
processor affinity; that is, threads returning to the same processor when they are ready to be run again.
Item Descriptor
processing total time Total time in milliseconds for the corresponding processing category.
percent total time Time from the first column as a percentage of the sum of total trace elapsed
time for all processors. This includes whatever amount of time each processor
spent running the IDLE process.
percent application  Time from the first column as a percentage of the sum of total application time for all processors.
Avg. Pthread Affinity Probability that a pthread was dispatched on the same kernel thread on which
it last executed.
PROC 0 : 15
PROC 24 : 15
...(lines omitted)...
Item Descriptor
combined The total amount of CPU time, expressed in milliseconds, that the thread was
running in either application mode or system call mode.
application The amount of CPU time, expressed in milliseconds, that the thread spent in
application mode.
syscall The amount of CPU time, expressed in milliseconds, that the thread spent in system
call mode.
Item Descriptor
combined The amount of CPU time that the thread was running, expressed as percentage of
the total processing time.
application The amount of CPU time that the thread spent in application mode,
expressed as percentage of the total processing time.
syscall The amount of CPU time that the thread spent in system call mode, expressed as
percentage of the total processing time.
In the example above, we can investigate why the system is spending so much time in application mode by looking at the Application Summary (by Tid), where we can see that the top three entries of the report are named cpu, a test program that uses a great deal of CPU time. The report shows again that the CPU spent most of its time in application mode running the cpu process. Therefore, the cpu process is a candidate to be optimized to improve system performance.
...(lines omitted)...
...(lines omitted)...
Kproc Types
-----------
Type Function Operation
==== ============================ ==========================
Item Descriptor
combined The amount of CPU time that the thread was running, expressed as percentage
of the total processing time.
kernel The amount of CPU time that the thread spent in unidentified kernel mode,
expressed as percentage of the total processing time.
operation The amount of CPU time that the thread spent in traced operations, expressed
as percentage of the total processing time.
Kproc Types
Item Descriptor
Type A single letter to be used as an index into this listing.
Function A description of the nominal function of this type of kernel process.
Operation A description of the traced operations for this type of kernel process.
...(lines omitted)...
Item Descriptor
name (Pid) (Pthread Count)  The name of the process associated with the process ID, and the number of pthreads of this process.
Item Descriptor
application  The total amount of CPU time, expressed in milliseconds, that the process was running in user mode.
pthread  The amount of CPU time, expressed in milliseconds, that the process spent in traced calls to the pthreads library.
other  The amount of CPU time, expressed in milliseconds, that the process spent in non-traced user mode.
Item Descriptor
application The amount of CPU time that the process was running in user mode,
expressed as percentage of the total application time.
pthread The amount of CPU time that the process spent in calls to the pthreads
library, expressed as percentage of the total application time.
other The amount of CPU time that the process spent in non-traced user mode,
expressed as percentage of the total application time.
...(lines omitted)...
...(lines omitted)...
Item Descriptor
Accumulated Time (msec)  The accumulated CPU time that the system spent processing the pending system call, expressed in milliseconds.
SVC (Address) The name of the system call and its kernel address.
Procname (Pid Tid) The name of the process associated with the thread that made the system call, its
process ID, and the thread ID.
Item Descriptor
Accumulated Time (msec)  The accumulated CPU time that the system spent processing the pending hypervisor call, expressed in milliseconds.
HCALL (address) The name of the hypervisor call and the kernel address of its caller.
Procname (Pid Tid) The name of the process associated with the thread that made the
hypervisor call, its process ID, and the thread ID.
The Pending System NFS Calls Summary has the following fields:
Item Descriptor
Accumulated Time (msec) The accumulated CPU time that the system spent processing the
pending system NFS call, expressed in milliseconds.
Sequence Number The sequence number represents the transaction identifier (XID) of an
NFS operation. It is used to uniquely identify an operation and is used
in the RPC call/reply messages. This number is provided instead of the
operation name because the name of the operation is unknown until it
completes.
Opcode The name of the pending NFS V4 operation.
Procname (Pid Tid) The name of the process associated with the thread that made the
system NFS call, its process ID, and the thread ID.
Item Descriptor
Count The number of times that a pthread call of a certain type has been called during the
monitoring period.
Total Time (msec) The total CPU time that the system spent processing all pthread calls of this type,
expressed in milliseconds.
% sys time The total CPU time that the system spent processing all calls of this type,
expressed as a percentage of the total processing time.
Avg Time (msec) The average CPU time that the system spent processing one pthread call of this
type, expressed in milliseconds.
Min Time (msec) The minimum CPU time the system used to process one pthread call of this type,
expressed in milliseconds.
Pthread routine The name of the routine in the pthread library.
The Pending Pthread System Calls Summary has the following fields:
Item Descriptor
Accumulated Time (msec)  The accumulated CPU time that the system spent processing the pending pthread call, expressed in milliseconds.
Pthread Routine  The name of the pthread routine of the libpthreads library.
Procname (Pid Tid Ptid)  The name of the process associated with the thread and the pthread which made the pthread call, its process ID, the thread ID and the pthread ID.
FLIH summary
The FLIH (First Level Interrupt Handler) summary lists all first level interrupt handlers that were called
during the monitoring period.
The Global FLIH Summary lists the total of first level interrupts on the system, while the Per CPU FLIH
Summary lists the first level interrupts per CPU.
CPU Number 1:
Count Total Time Avg Time Min Time Max Time Flih Type
(msec) (msec) (msec) (msec)
====== =========== =========== =========== =========== =========
4 0.2405 0.0601 0.0517 0.0735 3(DATA_ACC_PG_FLT)
258 49.2098 0.1907 0.0060 0.5076 5(IO_INTR)
515 55.3714 0.1075 0.0080 0.3696 31(DECR_INTR)
...(lines omitted)...
Item Descriptor
Count The number of times that a first level interrupt of a certain type (see Flih Type)
occurred during the monitoring period.
Total Time (msec) The total CPU time that the system spent processing these first level interrupts,
expressed in milliseconds.
Avg Time (msec) The average CPU time that the system spent processing one first level interrupt of
this type, expressed in milliseconds.
Item Descriptor
DATA_ACC_PG_FLT Data access page fault
QUEUED_INTR Queued interrupt
DECR_INTR Decrementer interrupt
IO_INTR I/O interrupt
SLIH summary
The Second level interrupt handler (SLIH) Summary lists all second level interrupt handlers that were
called during the monitoring period.
The Global Slih Summary lists the total of second level interrupts on the system, while the Per CPU Slih
Summary lists the second level interrupts per CPU.
...(lines omitted)...
Item Descriptor
Count The number of times that each second level interrupt handler was called during the
monitoring period.
Total Time (msec) The total CPU time that the system spent processing these second level interrupts,
expressed in milliseconds.
Avg Time (msec) The average CPU time that the system spent processing one second level interrupt
of this type, expressed in milliseconds.
...(lines omitted)...
...(lines omitted)...
...(lines omitted)...
The system call, hypervisor call, NFS call, and pthread call reports in the preceding example have the
following fields in addition to the default System Calls Summary, Hypervisor Calls Summary, System NFS
Calls Summary, and Pthread Calls Summary:
Item Descriptor
Tot ETime (msec) The total amount of time from when each instance of the call was started until
it completed. This time will include any time spent servicing interrupts, running
other processes, and so forth.
Avg ETime (msec) The average amount of time from when the call was started until it completed.
This time will include any time spent servicing interrupts, running other
processes, and so forth.
Min ETime (msec) The minimum amount of time from when the call was started until it completed.
This time will include any time spent servicing interrupts, running other
processes, and so forth.
The preceding example report shows that the maximum elapsed time for the kwrite system call was
422.2323 msec, but the maximum CPU time was 4.5626 msec. If this amount of overhead time is
unusual for the device being written to, further analysis is needed.
...(lines omitted)...
...(lines omitted)...
If a large number of errors of a specific type or on a specific system call point to a system or application
problem, other debug measures can be used to determine and fix the problem.
...(lines omitted)...
--------------------------------------------------------------------------------
Report for Thread Id: 48841 (hex bec9) Pid: 143984 (hex 23270)
Process Name: oracle
---------------------
Total Application Time (ms): 70.324465
Total System Call Time (ms): 53.014910
Total Hypervisor Call Time (ms): 0.077000
Count Total Time Avg Time Min Time Max Time SVC (Address)
(msec) (msec) (msec) (msec)
======== =========== =========== =========== =========== ================
69 34.0819 0.4939 0.1666 1.2762 kwrite(169ff8)
77 12.0026 0.1559 0.0474 0.2889 kread(16a01c)
510 4.9743 0.0098 0.0029 0.0467 times(f1e14)
73 1.2045 0.0165 0.0105 0.0306 select(1d1704)
68 0.6000 0.0088 0.0023 0.0445 lseek(16a094)
12 0.1516 0.0126 0.0071 0.0241 getrusage(f1be0)
...(lines omitted)...
If the thread belongs to an NFS kernel process, the report will include information on NFS operations
instead of System calls:
Report for Thread Id: 1966273 (hex 1e00c1) Pid: 1007854 (hex f60ee)
Process Name: nfsd
---------------------
Total Kernel Time (ms): 3.198998
Total Operation Time (ms): 28.839927
Total Hypervisor Call Time (ms): 0.000000
Item Descriptor
Thread ID The Thread ID of the thread.
Process ID The Process ID that the thread belongs to.
Process Name The process name, if known, that the thread belongs to.
Total Application Time (ms)  The amount of time, expressed in milliseconds, that the thread spent in application mode.
Total System Call Time (ms)  The amount of time, expressed in milliseconds, that the thread spent in system call mode.
Thread System Call Summary  A system call summary for the thread; this has the same fields as the global System Calls Summary. It also includes elapsed time if the -e flag is specified and error information if the -s flag is specified.
Pending System Calls Summary  If the thread was executing a system call at the end of the trace, a pending system call summary will be printed. This has the Accumulated Time and Supervisor Call (SVC Address) fields. It also includes elapsed time if the -e flag is specified.
Thread Hypervisor Calls Summary  The hypervisor call summary for the thread; this has the same fields as the global Hypervisor Calls Summary. It also includes elapsed time if the -e flag is specified.
Pending Hypervisor Calls Summary  If the thread was executing a hypervisor call at the end of the trace, a pending hypervisor call summary will be printed. This has the Accumulated Time and Hypervisor Call fields. It also includes elapsed time if the -e flag is specified.
Thread NFS Calls Summary  An NFS call summary for the thread. This has the same fields as the global System NFS Call Summary. It also includes elapsed time if the -e flag is specified.
7 Tids for this Pid: 245889 245631 244599 82843 78701 75347 28941
9 Ptids for this Pid: 2057 1800 1543 1286 1029 772 515 258 1
Count Total Time % sys Avg Time Min Time Max Time SVC (Address)
(msec) time (msec) (msec) (msec)
======== =========== ====== ======== ======== ======== ================
93 3.6829 0.05% 0.0396 0.0060 0.3077 kread(19731c)
23 2.2395 0.03% 0.0974 0.0090 0.4537 kwrite(1972f8)
30 0.8885 0.01% 0.0296 0.0073 0.0460 select(208c5c)
...(omitted lines)...
Count Total Time % sys Avg Time Min Time Max Time Pthread Routine
(msec) time (msec) (msec) (msec)
======== =========== ====== ======== ======== ======== ================
19 0.0477 0.00% 0.0025 0.0017 0.0104 pthread_join
1 0.0065 0.00% 0.0065 0.0065 0.0065 pthread_detach
1 0.6208 0.00% 0.6208 0.6208 0.6208 pthread_kill
6 0.1261 0.00% 0.0210 0.0077 0.0779 pthread_cancel
21 0.7080 0.01% 0.0337 0.0226 0.1222 pthread_create
If the process is an NFS kernel process, the report will include information on NFS operations instead of
System and Pthread calls:
Item Descriptor
Total Application Time (ms)  The amount of time, expressed in milliseconds, that the process spent in application mode.
Total System Call Time (ms)  The amount of time, expressed in milliseconds, that the process spent in system call mode.
Item Descriptor
Total Pthread Call Time  The amount of time, expressed in milliseconds, that the process spent in traced pthread library calls.
Total Pthread Dispatch Time  The amount of time, expressed in milliseconds, that the process spent in libpthreads dispatch code.
Total Pthread Idle Dispatch Time  The amount of time, expressed in milliseconds, that the process spent in libpthreads vp_sleep code.
Total Other Time  The amount of time, expressed in milliseconds, that the process spent in non-traced user mode code.
Total number of pthread dispatches  The total number of times a pthread belonging to the process was dispatched by the libpthreads dispatcher.
Total number of pthread idle dispatches  The total number of times a thread belonging to the process was in the libpthreads vp_sleep code.
Item Descriptor
Process System Calls Summary  A system call summary for the process; this has the same fields as the global System Call Summary. It also includes elapsed time information if the -e flag is specified and error information if the -s flag is specified.
Pending System Calls Summary  If the process was executing a system call at the end of the trace, a pending system call summary will be printed. This has the Accumulated Time and Supervisor Call (SVC Address) fields. It also includes elapsed time information if the -e flag is specified.
Process Hypervisor Calls Summary  A summary of the hypervisor calls for the process; this has the same fields as the global Hypervisor Calls Summary. It also includes elapsed time information if the -e flag is specified.
Pending Hypervisor Calls Summary  If the process was executing a hypervisor call at the end of the trace, a pending hypervisor call summary will be printed. This has the Accumulated Time and Hypervisor Call fields. It also includes elapsed time information if the -e flag is specified.
Process NFS Calls Summary  An NFS call summary for the process. This has the same fields as the global System NFS Call Summary. It also includes elapsed time information if the -e flag is specified.
Pending NFS Calls Summary  If the process was executing an NFS call at the end of the trace, a pending NFS call summary will be printed. This has the Accumulated Time and Sequence Number or, in the case of NFS V4, Opcode, fields. It also includes elapsed time information if the -e flag is specified.
Pthread Calls Summary  A summary of the pthread calls for the process. This has the same fields as the global Pthread Calls Summary. It also includes elapsed time information if the -e flag is specified.
Pending Pthread Calls Summary  If the process was executing pthread library calls at the end of the trace, a pending pthread call summary will be printed. This has the Accumulated Time and Pthread Routine fields. It also includes elapsed time information if the -e flag is specified.
Count Total Time Avg Time Min Time Max Time SVC (Address)
(msec) (msec) (msec) (msec)
======== =========== ======== ======== ======== ================
1 3.3898 3.3898 3.3898 3.3898 _exit(409e50)
61 0.8138 0.0133 0.0089 0.0254 kread(5ffd78)
11 0.4616 0.0420 0.0262 0.0835 thread_create(407360)
22 0.2570 0.0117 0.0062 0.0373 mprotect(6d5bd8)
12 0.2126 0.0177 0.0100 0.0324 thread_setstate(40a660)
115 0.1875 0.0016 0.0012 0.0037 klseek(5ffe38)
12 0.1061 0.0088 0.0032 0.0134 sbrk(6d4f90)
23 0.0803 0.0035 0.0018 0.0072 trcgent(4078d8)
...(lines omitted)...
Count Total Time % sys Avg Time Min Time Max Time Pthread Routine
(msec) time (msec) (msec) (msec)
======== =========== ====== ======== ======== ======== ================
11 0.9545 0.01% 0.0868 0.0457 0.1833 pthread_create
8 0.0725 0.00% 0.0091 0.0064 0.0205 pthread_join
1 0.0553 0.00% 0.0553 0.0553 0.0553 pthread_detach
1 0.0341 0.00% 0.0341 0.0341 0.0341 pthread_cancel
1 0.0229 0.00% 0.0229 0.0229 0.0229 pthread_kill
The information in the application time details report includes the following:
Item Descriptor
Total Pthread Call Time  The amount of time, expressed in milliseconds, that the pthread spent in traced pthread library calls.
Total Pthread Dispatch Time  The amount of time, expressed in milliseconds, that the pthread spent in libpthreads dispatch code.
Total Pthread Idle Dispatch Time  The amount of time, expressed in milliseconds, that the pthread spent in libpthreads vp_sleep code.
Total Other Time  The amount of time, expressed in milliseconds, that the pthread spent in non-traced user mode code.
Total number of pthread dispatches  The total number of times a pthread belonging to the process was dispatched by the libpthreads dispatcher.
Total number of pthread idle dispatches  The total number of times a thread belonging to the process was in the libpthreads vp_sleep code.
Item Descriptor
Pthread System Calls Summary  A system call summary for the pthread; this has the same fields as the global System Call Summary. It also includes elapsed time information if the -e flag is specified and error information if the -s flag is specified.
Pending System Calls Summary  If the pthread was executing a system call at the end of the trace, a pending system call summary will be printed. This has the Accumulated Time and Supervisor Call (SVC Address) fields. It also includes elapsed time information if the -e flag is specified.
Pthread Hypervisor Calls Summary  A summary of the hypervisor calls for the pthread. This has the same fields as the global hypervisor calls summary. It also includes elapsed time information if the -e flag is specified.
Item Descriptor
processor affinity  The probability that for any dispatch of the pthread, the pthread was dispatched to the same processor on which it last executed.
Processor Dispatch Histogram for pthread  The number of times that the pthread was dispatched to each CPU in the system.
avg. dispatch wait time  The average elapsed time for the pthread between being undispatched and its next dispatch.
Thread affinity  The probability that for any dispatch of the pthread, the pthread was dispatched to the same kernel thread on which it last executed.
Thread Dispatch Histogram for pthread  The number of times that the pthread was dispatched to each kernel thread in the process.
total number of pthread dispatches  The total number of times the pthread was dispatched by the libpthreads dispatcher.
Data on Interrupts that occurred while Pthread was Running  The number of times each type of FLIH occurred while the pthread was executing.
Parameters
CPUs  The number of processors on the MP system from which the trace was drawn. The default is 1. This value is overridden if more processors are observed in the trace.
count The number of locks to report in the Lock Summary and Lock Detail reports, as well
as the number of functions to report in the Function Detail and threads to report in
the Thread detail. (The -s option specifies how the most significant locks, threads, and
functions are selected).
starttime The number of seconds after the first event recorded in the trace that the reporting
starts.
stoptime The number of seconds after the first event recorded in the trace that the reporting
stops.
topic  Help topics, which are: all, overview, input, names, reports, sorting.
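For example, the following invocation (the same command line shown in the execution summary sample later in this section) sorts by acquisitions, produces all detail reports, and limits each summary to the top 100 entries:

# splat -p -sa -da -S100 -i trace.cooked -n gensyms -o splat.out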
Trace discontinuities
The splat command uses the events in the trace to reconstruct the activities of threads and locks in the
original system.
If part of the trace is missing, it is because one of the following situations exists:
• Tracing was stopped at one point and restarted at a later point.
TRACE OFF record read at 0.567201 seconds. One or more of the CPUs has
stopped tracing. You might want to generate a longer trace using larger
buffers and re-run splat.
Some versions of the AIX kernel or PThread library might be incompletely instrumented, so the traces will
be missing events. The splat command might not provide correct results in this case.
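In that case, the trace can be collected again with larger buffers before the splat command is rerun. For example, based on the trace invocation shown earlier in this document (the hook list and buffer sizes are illustrative and depend on what is being traced):

# SIZE="10000000"
# trace -n -C all -d -j $HOOKS -L $SIZE -T $SIZE -afo trace.raw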
Execution summary
The execution summary report is generated by default when you use the splat command.
The following example shows a sample of the execution summary.
*****************************************************************************************
splat Cmd: splat -p -sa -da -S100 -i trace.cooked -n gensyms -o splat.out
start stop
-------------------- --------------------
trace interval (absolute tics) 967436752 969072535
(relative tics) 0 1635783
(absolute secs) 58.057947 58.156114
(relative secs) 0.000000 0.098167
analysis interval (absolute tics) 967436752 969072535
(trace-relative tics) 0 1635783
(self-relative tics) 0 1635783
(absolute secs) 58.057947 58.156114
(trace-relative secs) 0.000000 0.098167
(self-relative secs) 0.000000 0.098167
**************************************************************************************
From the example above, you can see that the execution summary consists of the following elements:
• The splat version and build information, disclaimer, and copyright notice.
• The command used to run splat.
• The trace command used to collect the trace.
• The host on which the trace was taken.
• The date that the trace was taken.
• A sentence specifying whether the PURR register was used to calculate CPU times.
• The real-time duration of the trace, expressed in seconds.
• The maximum number of processors that were observed in the trace (the number specified in the trace
conditions information, and the number specified on the splat command line).
• The cumulative processor time, equal to the duration of the trace in seconds times the number of processors, which represents the total number of seconds of processor time consumed.
• A table containing the start and stop times of the trace interval, measured in tics and seconds, as absolute timestamps from the trace records, as well as relative to the first event in the trace.
• The start and stop times of the analysis interval, measured in tics and seconds, as absolute timestamps,
as well as relative to the beginning of the trace interval and the beginning of the analysis interval.
***************************************************************************************
Unique Acquisitions Acq. or Passes Total System
Total Addresses (or Passes) per Second Spin Time
--------- --------- ------------ -------------- ------------
AIX (all) Locks: 523 523 1323045 72175.7768 0.003986
RunQ: 2 2 487178 26576.9121 0.000000
Simple: 480 480 824898 45000.4754 0.003986
Transformed: 22 18 234 352.3452
Krlock: 50 21 76876 32.6548 0.000458
Complex: 41 41 10969 598.3894 0.000000
PThread CondVar: 7 6 160623 8762.4305 0.000000
Mutex: 128 116 1927771 105165.2585 10.280745 *
RWLock: 0 0 0 0.0000 0.000000
The gross lock summary report table consists of the following columns:
Per-lock summary
The per-lock summary report is generated by default when you use the splat command.
The following example shows a sample of the per-lock summary report.
************************************************************************************************
100 max entries, Summary sorted by Acquisitions:

                             T  Acqui-             Wait or                     Locks or     Percent Holdtime
Lock Names,                  y  sitions            Trans-                      Passes       Real     Real      Comb
Class, or Address            p  or Passes  Spins   form    %Miss    %Total     / CSec       CPU      Elapse    Spin
**********************       *  ********   *****   *****   ******   *******    **********   ******   *******   ******
PROC_INT_CLASS.0003          Q  486490     0       0       0.0000   36.7705    26539.380    5.3532   100.000   0.0000
THREAD_LOCK_CLASS.0012       S  323277     0       9468    0.0000   24.4343    17635.658    6.8216   6.8216    0.0000
THREAD_LOCK_CLASS.0118       D  323094     0       4568    0.0000   24.4205    17625.674    6.7887   6.7887    0.0000
ELIST_CLASS.003C             S  80453      0       201     0.0000   6.0809     4388.934     1.0564   1.0564    0.0000
ELIST_CLASS.0044             S  80419      0       110     0.0000   6.0783     4387.080     1.1299   1.1299    0.0000
tod_lock                     C  10229      0       0       0.0000   0.7731     558.020      0.2212   0.2212    0.0000
LDATA_CONTROL_LOCK.0000      D  1833       0       10      0.0000   0.1385     99.995       0.0204   0.0204
The first line indicates the maximum number of locks to report (100 in this case, but we show only 14
of the entries here) as specified by the -S 100 flag. The report also indicates that the entries are sorted
by the total number of acquisitions or passes, as specified by the -sa flag. The various Kernel locks and
PThread synchronizers are treated as two separate lists in this report, so the report would produce the
top 100 Kernel locks sorted by acquisitions, followed by the top 100 PThread synchronizers sorted by
acquisitions or passes.
The per-lock summary table consists of the following columns:
Item Descriptor
Lock Names, Class, or Address  The name, class, or address of the lock, depending on whether the splat command could map the address from a name file.
Type The type of the lock, identified by one of the following letters:
Q
A RunQ lock
S
An enabled simple kernel lock
D
A disabled simple kernel lock
C
A complex kernel lock
M
A PThread mutex
V
A PThread condition-variable
L
A PThread read/write lock
Acquisitions or Passes The number of times that the lock was acquired or the condition passed, during
the analysis interval.
Spins The number of times that the lock (or condition-variable) was spun on during
the analysis interval.
Wait or Transform The number of times that a thread was driven into a wait state for that lock
or condition-variable during the analysis interval. When Krlocks are enabled, a
simple lock never enters the wait state and this value represents the number
of Krlocks that the simple lock has allocated, which is the transform count of
simple locks.
Acqui- Miss Spin Transf. Busy Percent Held of Total Time Process
ThreadID sitions Rate Count Count Count CPU Elapse Spin Transf. ProcessID Name
~~~~~~~~ ~~~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~~~~ ~~~~~~~~~~~~~
775 11548 0.34 39 0 0 0.06 0.10 0.00 0.00 774 wait
35619 3 25.00 1 0 0 0.00 0.00 0.00 0.00 18392 sleep
31339 21 4.55 1 0 0 0.00 0.00 0.00 0.00 7364 java
35621 2 0.00 0 0 0 0.00 0.00 0.00 0.00 18394 locktrace
Total Acquisitions The number of times that the lock was acquired in the analysis interval. This
includes successful simple_lock_try calls.
Acq. holding krlock The number of acquisitions made by threads holding a Krlock.
Transform count The number of Krlocks that have been used (allocated and freed) by the simple
lock.
SpinQ The minimum, maximum, and average number of threads spinning on the lock,
whether executing or suspended, across the analysis interval.
Krlocks SpinQ The minimum, maximum, and average number of threads spinning on a Krlock
allocated by the simple lock, across the analysis interval.
PROD The associated Krlocks prod calls count.
CONFER SELF The confer to self calls count for the simple lock and the associated Krlocks.
CONFER TARGET The confer to target calls count for the simple lock and the associated Krlocks.
CONFER ALL The confer to all calls count for the simple lock and the associated Krlocks.
HANDOFF The associated Krlocks handoff calls count.
The Lock Activity with Interrupts Enabled (milliseconds) and Lock Activity with Interrupts Disabled
(milliseconds) sections contain information on the time that each lock state is used by the locks.
The states that a thread can be in (with respect to a given simple or complex lock) are as follows:
Item Descriptor
(no lock reference) The thread is running, does not hold this lock, and is not attempting to acquire this
lock.
LOCK The thread has successfully acquired the lock and is currently executing.
LOCK with KRLOCK  The thread has successfully acquired the lock, while holding the associated Krlock, and is currently executing.
SPIN The thread is executing and unsuccessfully attempting to acquire the lock.
KRLOCK LOCK The thread has successfully acquired the associated Krlock and is currently
executing.
KRLOCK SPIN The thread is executing and unsuccessfully attempting to acquire the associated
Krlock.
The Lock Activity sections of the report measure the intervals of time (in milliseconds) that each thread
spends in each of the states for this lock. The columns report the number of times that a thread entered
the given state, followed by the maximum, minimum, and average time that a thread spent in the state
once entered, followed by the total time that all threads spent in that state. These sections distinguish
whether interrupts were enabled or disabled at the time that the thread was in the given state.
A thread can acquire a lock prior to the beginning of the analysis interval and release the lock during
the analysis interval. When the splat command observes the lock being released, it recognizes that the
lock had been held during the analysis interval up to that point and counts the time as part of the
state-machine statistics. For this reason, the state-machine statistics might report that the number of times that the lock state was entered is actually larger than the number of acquisitions of the lock that were observed in the analysis interval.
RunQ locks are used to protect resources in the thread management logic. These locks are acquired a
large number of times and are only held briefly each time. A thread need not be executing to acquire or
release a RunQ lock. Further, a thread might spin on a RunQ lock, but it will not go into an UNDISP or
WAIT state on the lock. You will see a dramatic difference between the statistics for RunQ versus other
simple locks.
Acqui- Miss Spin Wait Busy Percent Held of Total Time Process
ThreadID sitions Rate Count Count Count CPU Elapse Spin Wait ProcessID Name
~~~~~~~~ ~~~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~~~~ ~~~~~~~~~~~~~
775 11548 0.34 39 0 0 0.06 0.10 0.00 0.00 774 wait
35619 3 25.00 1 0 0 0.00 0.00 0.00 0.00 18392 sleep
31339 21 4.55 1 0 0 0.00 0.00 0.00 0.00 7364 java
35621 2 0.00 0 0 0 0.00 0.00 0.00 0.00 18394 locktrace
SpinQ The minimum, maximum, and average number of threads spinning on the lock,
whether executing or suspended, across the analysis interval.
WaitQ The minimum, maximum, and average number of threads waiting on the lock,
across the analysis interval.
The Lock Activity with Interrupts Enabled (milliseconds) and Lock Activity with Interrupts Disabled
(milliseconds) sections contain information on the time that each lock state is used by the locks.
The states that a thread can be in (with respect to a given simple or complex lock) are as follows:
Item Descriptor
(no lock reference) The thread is running, does not hold this lock, and is not attempting to acquire this
lock.
LOCK The thread has successfully acquired the lock and is currently executing.
SPIN The thread is executing and unsuccessfully attempting to acquire the lock.
UNDISP The thread has become undispatched while unsuccessfully attempting to acquire
the lock.
WAIT The thread has been suspended until the lock comes available. It does not
necessarily acquire the lock at that time, but instead returns to a SPIN state.
PREEMPT The thread is holding this lock and has become undispatched.
A thread can acquire a lock prior to the beginning of the analysis interval and release the lock during
the analysis interval. When the splat command observes the lock being released, it recognizes that the
lock had been held during the analysis interval up to that point and counts the time as part of the
state-machine statistics. For this reason, the state-machine statistics can report that the number of times that the lock state was entered is actually larger than the number of acquisitions of the lock that were observed in the analysis interval.
RunQ locks are used to protect resources in the thread management logic. These locks are acquired a
large number of times and are only held briefly each time. A thread need not be executing to acquire or
release a RunQ lock. Further, a thread might spin on a RunQ lock, but it will not go into an UNDISP or
WAIT state on the lock. You will see a dramatic difference between the statistics for RunQ versus other
simple locks.
The functions are ordered by the same sorting criterion as the locks, controlled by the -s option of splat.
Further, the number of functions listed is controlled by the -S parameter. The default is the top ten
functions.
Thread Detail
The Thread Detail report is obtained by using the -dt or -da options of splat.
At any point in time, a single thread is either running or it is not. When a single thread runs, it only runs
on one processor. Some of the composite statistics are measured relative to the cumulative processor
time when they measure activities that can happen simultaneously on more than one processor, and
the magnitude of the measurements can be proportional to the number of processors in the system.
In contrast, the thread statistics are generally measured relative to the elapsed real time, which is the
amount of time that a single processor spends processing and the amount of time that a single thread
spends in an executing or suspended state.
Item Descriptor
ThreadID The thread identifier.
Acquisitions The number of times that this thread acquired the lock.
Miss Rate The percentage of acquisition attempts by the thread that failed to secure the
lock.
Spin Count The number of unsuccessful attempts by this thread to secure the lock.
Transf. Count The number of times that a simple lock has allocated a Krlock, while a thread
was trying to acquire the simple lock.
Wait Count The number of times that this thread was forced to wait until the lock came
available.
Busy Count The number of simple_lock_try() calls that returned busy.
Percent Held of Total Time  Consists of the following sub-fields:
CPU
The percentage of the elapsed real time that this thread executed while holding the lock.
Elapse(d)
The percentage of the elapsed real time that this thread held the lock while running or suspended.
Spin
The percentage of elapsed real time that this thread executed while spinning on the lock.
Wait
The percentage of elapsed real time that this thread spent waiting on the lock.
Process ID The Process identifier (only for simple and complex lock report).
Process Name Name of the process using the lock (only for simple and complex lock report).
Complex-Lock report
The AIX complex lock supports recursive locking, where a thread can acquire the lock more than once before releasing it, and it differentiates write-locking, which is exclusive, from read-locking, which is not exclusive.
This report begins with [AIX COMPLEX Lock]. Most of the entries are identical to the simple lock report,
while some of them are differentiated by read/write/upgrade. For example, the SpinQ and WaitQ statistics
include the minimum, maximum, and average number of threads spinning or waiting on the lock. They
also include the minimum, maximum, and average number of threads attempting to acquire the lock for
reading versus writing. Because an arbitrary number of threads can hold the lock for reading, the report
includes the minimum, maximum, and average number of readers in the LockQ that holds the lock.
A thread might hold a lock for writing; this is exclusive and prevents any other thread from securing the
lock for reading or for writing. The thread downgrades the lock by simultaneously releasing it for writing
and acquiring it for reading; this permits other threads to also acquire the lock for reading. The reverse of
this operation is an upgrade; if the thread holds the lock for reading and no other thread holds it as well,
the thread simultaneously releases the lock for reading and acquires it for writing. The upgrade operation
might require that the thread wait until other threads release their read-locks. The downgrade operation
does not.
A thread might acquire the lock to some recursive depth; it must release the lock the same number of times to free it. This is useful in library code where a lock must be secured at each entry-point to the library; a thread will secure the lock once as it enters the library, and internal calls to the library entry-points simply increase the recursion depth, so the lock is not freed until the thread releases it at the outermost level.
Mutex reports
The PThread mutex is similar to an AIX simple lock in that only one thread can acquire the lock, and is
like an AIX complex lock in that it can be held recursively.
In addition to the common header information and the [PThread MUTEX] identifier, this report lists the
following lock details:
Item Descriptor
Parent Thread Pthread id of the parent pthread.
creation time Elapsed time in seconds after the first event recorded in trace (if available).
deletion time Elapsed time in seconds after the first event recorded in trace (if available).
PID Process identifier.
Process Name Name of the process using the lock.
Call-chain Stack of called methods (if available).
Acquisitions The number of times that the lock was acquired in the analysis interval.
Miss Rate The percentage of attempts that failed to acquire the lock.
Spin Count The number of unsuccessful attempts to acquire the lock.
Wait Count The number of times that a thread was forced into a suspended wait state
waiting for the lock to come available.
Busy Count The number of trylock calls that returned busy.
Seconds Held This field contains the following sub-fields:
CPU
The total number of processor seconds that the lock was held by an
executing thread.
Elapse(d)
The total number of elapsed seconds that the lock was held, whether the
thread was running or suspended.
Acquisitions Miss Spin Count Wait Count Busy Percent Held of Total Time
PthreadID Write Read Rate Write Read Write Read Count CPU Elapse Spin Wait
~~~~~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~
772 0 207 78.70 0 765 0 796 0 11.58 15.13 29.69 23.21
515 765 0 1.80 14 0 14 0 0 80.10 80.19 49.76 23.08
258 0 178 3.26 0 6 0 5 0 12.56 17.10 10.00 20.02
                      Acquisitions    Miss    Spin Count     Wait Count     Busy    Percent Held of Total Time
Function Name         Write   Read    Rate    Write   Read   Write   Read   Count   CPU     Elapse  Spin    Wait    Return Address   Start Address   Offset
^^^^^^^^^^^^^^^^^^^^  ^^^^^^  ^^^^^^  ^^^^^^  ^^^^^^  ^^^^^^ ^^^^^^  ^^^^^^ ^^^^^^  ^^^^^^  ^^^^^^  ^^^^^^  ^^^^^^  ^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^ ^^^^^^
In addition to the common header information and the [PThread RWLock] identifier, this report lists the
following lock details:
Item Descriptor
Parent Thread Pthread id of the parent pthread.
creation time Elapsed time in seconds after the first event recorded in trace (if available).
deletion time Elapsed time in seconds after the first event recorded in trace (if available).
PID Process identifier.
Process Name Name of the process using the lock.
Call-chain Stack of called methods (if available).
Acquisitions The number of times that the lock was acquired in the analysis interval.
Miss Rate The percentage of attempts that failed to acquire the lock.
Spin Count The number of unsuccessful attempts to acquire the lock.
Wait Count The current PThread implementation does not force pthreads to wait for read/
write locks. This reports the number of times a thread, spinning on this lock, is
undispatched.
Seconds Held This field contains the following sub-fields:
CPU
The total number of processor seconds that the lock was held by an executing
pthread. If the lock is held multiple times by the same pthread, only one hold
interval is counted.
Elapse(d)
The total number of elapsed seconds that the lock was held by any pthread,
whether the pthread was running or suspended.
Note: The pthread and function details for read/write locks are similar to the mutex detail reports, except
that they break down the acquisition, spin, and wait counts by whether the lock is to be acquired for
reading or writing.
Condition-Variable report
The PThread condition-variable is a synchronizer, but not a lock. A PThread is suspended until a signal
indicates that the condition now holds.
In addition to the common header information and the [PThread CondVar] identifier, this report lists the
following details:
Item Descriptor
Passes The number of times that the condition was signaled to hold during the analysis
interval.
Fail Rate The percentage of times that the condition was tested and was not found to be
true.
Item Descriptor
PThreadID The PThread identifier.
Passes The number of times that this pthread was notified that the condition passed.
Fail Rate The percentage of times that the pthread checked the condition and did not find
it to be true.
Spin Count The number of times that the pthread checked the condition and did not find it to
be true.
Wait Count The number of times that this pthread was forced to wait until the condition
became true.
Percent Total Time This field contains the following sub-fields:
Spin
The percentage of elapsed real time that this pthread spun while testing the
condition.
Wait
The percentage of elapsed real time that this pthread spent waiting for the
condition to hold.
Item Descriptor
Function Name The name of the function that passed or attempted to pass this condition.
Passes The number of times that this function was notified that the condition passed.
Thread context
Optional Performance Monitor contexts can also be associated with each thread. The AIX operating
system and the Performance Monitor kernel extension automatically maintain sets of 64-bit counters for
each of these contexts.
POWERCOMPAT events
The POWERCOMPAT events provide a list of hardware events that are available for processor compatibility
modes and are used as a subset of the actual processor events.
You can use the processor compatibility modes to move logical partitions between systems that have
different processor types without upgrading the operating system environments in the logical partition.
The processor compatibility mode allows the destination system to provide the logical partition with a
subset of processor capabilities that are supported by the operating systems environment in the logical
partition.
The following hardware events are supported in the POWERCOMPAT compatibility mode for different
versions of the AIX operating system.
A system administrator can disable or provide privileged access to the PMU by running the chdev command on the sys0 device.
The following example shows how to use the chdev command to modify the PMU access.
1. To get the current PMU access:
lsattr -E -l sys0 -a pmuaccess
pmuaccess priv controls the PMU mode of operations True
2. To modify the PMU access:
chdev -l sys0 -a pmuaccess=[none|priv|all]
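For example, to restrict PMU use to privileged users (this requires root authority; the confirmation line is the standard chdev response):

# chdev -l sys0 -a pmuaccess=priv
sys0 changed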
RBAC Privileges
The libpmapi APIs can perform privileged operations. Only privileged users can run privileged operations when the sys0 pmuaccess attribute is set to priv.
The following provides information on the privileges required to use libpmapi calls.
PV_PMU_THREAD pm_start_mythread_*
pm_start_thread_*
pm_start_group_*
pm_tstart_mythread_*
pm_tstart_group_*
pm_tstart_thread_*
pm_get_Tdata_*
pm_get_data_*
pm_get_tdata_*
pm_get_program_*
pm_get_data_generic
pm_set_counter_frequency_*
PV_PMU_USERMODE pm_set_ebb_handler
pm_clear_ebb_handler
Security considerations
The system-level API calls are available only to the root user, except when the process tree option is
used. In that case, a locking mechanism prevents calls from being made by more than one process. This
mechanism ensures ownership of the API and exclusive access by one process from the time that the
system-level contexts are created until they are deleted.
Enabling the process tree option results in counting for only the calling process and its descendants; the
default is to count all activities on each processor.
Because the system-level APIs would report bogus data if thread contexts were in use, system-level API
calls are not enabled at the same time as thread-level API calls. The allocation of the first thread context
will take the system-level API lock, which will not be released until the last context has been deallocated.
When using first party calls, a thread is only permitted to modify its own Performance Monitor context.
The only exception to this rule is when making group level calls, which obviously affect the group context,
but can also affect other threads' context. Deleting a group deletes all the contexts associated with the
group, that is, the caller context, the group context, and all the contexts belonging to all the threads in the
group.
Access to a Performance Monitor context not belonging to the calling thread or its group is available
only from the target process's debugger program. The third party API calls are permitted only while the
target process is stopped and under the control of the calling debugger.
Item Descriptor
y A thresholdable event
g An event that can only be used in a group
G A thresholdable event that can only be used in a group
n A non-thresholdable event that is usable individually
On some platforms, use of event groups is required because all the events are marked g or G. Each of
the event groups that are returned includes a short name, a long name, and a description similar to those
associated with events, as well as a group identifier to be used in subsequent API calls and the events
contained in the group (in the form of an array of event identifiers).
The testing status of a group is defined as the lowest common denominator among the testing status of
the events that it includes. If at least one event has a testing status of caveat, the group testing status is
at best caveat, and if at least one event has a status of unverified, then the group status is unverified. This
is not returned as a group characteristic, but it is taken into account by the filter. Like events, only groups
with status matching the filter are returned.
Counter multi-mode
Counter multi-mode is similar to multiplexing mode. The counting mode in multiplexing mode is common
to all the event sets.
The multi-mode allows you to associate a counting mode with each event set, but because the counting
mode can differ from one event set to another, the counting results cannot be normalized over the
complete measurement interval.
Several basic pmapi calls have the following multi-mode variations indicated by the _mm suffix:
pm_set_program_mm
Sets the counting configuration. It differs from the pm_set_program_mx function in that it accepts a
set of groups, each with an associated counting mode, to be counted.
pm_get_program_mm
Retrieves the current Performance Monitor settings. It differs from the pm_get_program_mx function
in that it returns a set of groups and the counting mode associated with each group.
WPAR counting
It is possible to monitor the system-wide activity of a specific WPAR from the Global WPAR. In this case,
only the activity of the processes running in this WPAR will be monitored.
Several basic pmapi calls have the following per-WPAR variations indicated by the _wp suffix:
pm_set_program_wp, pm_set_program_wp_mm
Same as the pm_set_program subroutine or the pm_set_program_mm subroutine, except that the
programming is set for the specified WPAR only (identified by its WPAR Configured ID). Notice that
there is no pm_set_program_wp_mx subroutine.
# include <pmapi.h>
main()
{
pm_info_t pminfo;
pm_prog_t prog;
pm_data_t data;
int filter = PM_VERIFIED; /* use only verified events */
pm_init(filter, &pminfo);
/* ... set up the counting mode and the events to count in prog ... */
pm_set_program_mythread(&prog);
pm_start_mythread();
/* ... useful work ... */
pm_stop_mythread();
pm_get_data_mythread(&data);
/* ... print results ... */
}
# include <pmapi.h>
main()
{
pm_info2_t pminfo;
pm_prog_t prog;
pm_groups_info_t pmginfo;
int filter = PM_VERIFIED; /* use only verified events */
/* pm_initialize supersedes pm_init and also returns event-group information */
pm_initialize(filter, &pminfo, &pmginfo, PM_CURRENT);
/* ... program and count as in the previous example ... */
}
Get the information about all the event-groups for a specific processor example
The following example displays how to obtain all the event-groups that are supported for a specific
processor.
#include <stdio.h>
#include <stdlib.h>
#include <pmapi.h>
int main()
{
int rc = 0;
pm_info2_t events;
pm_groups_info_t groups;
int filter = 0;
/*
* Get the events and groups supported for POWER4.
* To get the events and groups supported for the current processor,
* use PM_CURRENT.
*/
int processor_type = PM_POWER4;
int group_idx = 0;
int counter_idx = 0;
int ev_count = 0;
int event_found = 0;
/*
* PM_VERIFIED - To get list of verified events
* PM_UNVERIFIED - To get list of unverified events
* PM_CAVEAT - To get list of events that are usable but with caveats
*/
filter |= PM_VERIFIED | PM_UNVERIFIED | PM_CAVEAT;
rc = pm_initialize(filter, &events, &groups, processor_type);
(2) pm_get_program_pthread(pid, tid, ptid, &prog);
... display PM programming ...
continue program
The preceding scenario would also work if the program being executed under the debugger did not have
any embedded Performance Monitor API calls. The only difference would be that the calls at (2) and (3)
would fail, and that when the program continues, it will be counting only event number 2 in counter 1, and
nothing in other counters.
main ()
{
pm_prog_t prog;
pm_wpar_ctx_info_t wp_list;
int nwpars = 1;
cid_t cid;
/* ... set the programming for the WPAR with pm_set_program_wp,
using the WPAR Configured ID (cid) ... */
pm_start_wp(cid);
... workload ...
pm_stop_wp(cid);
/* ... retrieve the WPAR context and data, then delete the programming ... */
}
Count all active WPARs from the Global WPAR and retrieve per-WPAR data
The following program is an example of a count of all active WPARS from the global WPAR and also
retrieves per-WPAR data.
main ()
{
pm_prog_t prog;
pm_wpar_ctx_info_t *wp_list;
int nwpars;
/* set programming */
...
prog.mode.b.wpar_all = 1; /* collect per-WPAR data */
pm_set_program(&prog);
pm_start();
... workload ...
pm_stop();
/* retrieve the number of WPARs that were active during the counting */
nwpars = 0;
pm_get_wplist(NULL, NULL, &nwpars);
/* allocate an array large enough to retrieve WPARs contexts */
wp_list = malloc(nwpars * sizeof (pm_wpar_ctx_info_t));
/* retrieve WPARs contexts */
pm_get_wplist(NULL, wp_list, &nwpars);
free(wp_list);
pm_delete_program();
}
# include <pmapi.h>
pm_data_t data2;
void *
doit(void *)
{
(1) pm_start_mythread();
pm_stop_mythread();
pm_get_data_mythread(&data2);
}
main()
{
pthread_t threadid;
pthread_attr_t attr;
pthread_addr_t status;
pm_set_program_mythread(&prog);
/* ... create thread attributes set ... */
pthread_create(&threadid, &attr, doit, NULL);
(2) pm_start_mythread();
pm_stop_mythread();
pm_get_data_mythread(&data);
pthread_join(threadid, &status);
In the preceding example, counting starts at (1) and (2) for the main and auxiliary threads respectively
because the initial counting state was off and it was inherited by the auxiliary thread from its creator.
main()
{
... same initialization as in previous example ...
pm_set_program_mygroup(&prog); /* create counting group */
(1) pm_start_mygroup();
pthread_create(&threadid, &attr, doit, NULL);
(2) pm_start_mythread();
pm_stop_mythread();
pm_get_data_mythread(&data);
pthread_join(threadid, &status);
pm_get_data_mygroup(&data);
}
In the preceding example, the call in (2) is necessary because the call in (1) only turns on counting for the
group, not the individual threads in it. At the end, the group results are the sum of both threads results.
main()
{
pm_info2_t pminfo;
pm_groups_info_t pmginfo;
pm_prog_mx_t prog;
pm_events_prog_t event_set[2];
pm_data_mx_t data;
int filter = PM_VERIFIED; /* get list of verified events */
pm_initialize(filter, &pminfo, &pmginfo, PM_CURRENT);
prog.mode.w = 0; /* start with clean mode */
prog.mode.b.user = 1; /* count only user mode */
prog.mode.b.is_group = 1; /* specify event group */
prog.events_set = event_set;
prog.nb_events_prog = 2; /* two event groups counted */
prog.slice_duration = 200; /* slice duration for each event group is 200ms */
for (i = 0; i < pminfo.maxpmcs; i++) {
event_set[0][i] = COUNT_NOTHING;
event_set[1][i] = COUNT_NOTHING;
}
main()
{
pm_info2_t pminfo;
pm_groups_info_t pmginfo;
pm_prog_mm_t prog;
pm_data_mx_t data;
pm_prog_t prog_set[2];
int filter = PM_VERIFIED; /* get list of verified events */
pm_initialize(filter, &pminfo, &pmginfo, PM_CURRENT );
prog.prog_set = prog_set;
prog.nb_set_prog = 2; /* two groups counted */
prog.slice_duration = 200; /* slice duration for each event group is 200ms */
prog_set[0].mode.w = 0; /* start with clean mode */
prog_set[0].mode.b.user = 1; /* grp 0: count only user mode */
prog_set[0].mode.b.is_group = 1; /* specify event group */
prog_set[0].mode.b.proctree = 1; /* turns process tree counting on:
this option is common to all counted groups */
prog_set[1].mode.w = 0; /* start with clean mode */
prog_set[1].mode.b.kernel = 1; /* grp 1: count only kernel mode */
prog_set[1].mode.b.is_group = 1; /* specify event group */
for (i = 0; i < pminfo.maxpmcs; i++) {
prog_set[0].events[i] = COUNT_NOTHING;
prog_set[1].events[i] = COUNT_NOTHING;
}
prog_set[0].events[0] = 1; /* count events in group 1 in the first set */
prog_set[1].events[0] = 3; /* count events in group 3 in the second set */
pm_set_program_mygroup_mm(&prog); /* create counting group */
pm_start_mygroup();
pthread_create(&threadid, &attr, doit, NULL);
pm_start_mythread();
... useful work ...
pm_stop_mythread();
pm_get_data_mythread_mx(&data);
printf ("Main thread results:\n");
for (i = 0; i < 2 ; i++) {
group_number = prog_set[i].events[0];
printf ("Group #%d: %s\n", group_number,
main()
{
... same initialization as in previous example ...
pm_stop_mythread();
pm_get_data_mythread(&data1);
pm_reset_data_mythread();
pthread_join(threadid, &status);
pm_get_data_mygroup(&data);
}
In the preceding example, the main thread and the group counting state are both on before the auxiliary
thread is created, so the auxiliary thread will inherit that state and start counting immediately.
At the end, data1 is equal to data because the pm_reset_data_mythread automatically subtracted the
main thread data from the group data to keep it consistent. In fact, the group data remains equal to the
sum of the auxiliary and the main thread data, but in this case, the main thread data is null.
MMCR0[PMCC] is set to 00.
PMCs 1-6, MMCR0, MMCRA, and MMCR2 registers are read only.
Access using pmc_read_1to4, pmc_read_5to6, and mmcr_read returns 0.
Access using pmc_write and mmcr_write returns -1.
• During LPM
Prior to the Mobility operation, any running PMU counting is stopped and MMCR0[PMCC] is set
to 00.
Post Mobility operation, PMCs 1-6, MMCR0, MMCRA, and MMCR2 registers are read only.
Access using pmc_read_1to4, pmc_read_5to6, and mmcr_read returns 0.
Access using pmc_write and mmcr_write returns -1.
Instead of using the libpmapi subroutines, if you use the mtspr and mfspr instructions to access the PMU
registers, a SIGILL signal is generated for any write operation.
Sample programs are located in the /usr/samples/pmapi directory.
Related information
mmcr_read subroutine
mmcr_write subroutine
pmc_read_1to4 subroutine
pmc_read_5to6 subroutine
pmc_write subroutine
You can deactivate the procedure that attempts to remove measurement errors by setting the
HPM_WITH_MEASUREMENT_ERROR environment variable to TRUE (1).
Threaded applications
The T/tstart and T/tstop functions respectively start and stop the counters independently on each thread.
If two distinct threads use the same instID parameter, the output indicates multiple calls. However, the
counts are accumulated.
The instID parameter must be a constant or an integer; it cannot be an expression. This is because the
declarations in the libhpm.h, f_hpm.h, and f_hpm_i8.h header files contain #define statements that are
evaluated during the compiler preprocessing phase, which permits the collection of line numbers and
source file names.
HPM_EVENT_GROUP=pm_utilization:uk,pm_completion:u
To use the time slice functionality, specify a comma-separated list of sets instead of a single set
number. By default, the time slice duration for each set is 100 ms, but this can be modified with the
HPM_MX_DURATION environment variable. This value must be expressed in ms, and in the range 10 ms
to 30000 ms.
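For example, the following settings (the values shown are illustrative) count the two sets from the
preceding HPM_EVENT_GROUP example in alternating time slices of 200 ms each:
export HPM_EVENT_GROUP=pm_utilization:uk,pm_completion:u
export HPM_MX_DURATION=200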
0 PM_LD_MISS_L2HIT
1 PM_TAG_BURSTRD_L2MISS
2 PM_TAG_ST_MISS_L2
3 PM_FPU0_DENORM
4 PM_LSU_IDLE
5 PM_LQ_FULL
6 PM_FPU_FMA
7 PM_FPU_IDLE
For POWER4 and later systems, the file contains the event group name, as in the following example:
pm_hpmcount1
Example
HPM_L2_LATENCY 12
HPM_EVENT_SET 5
MYDATE=$(date +"%m%d:%H%M%S")
export HPM_OUTPUT_NAME=myprogram_$MYDATE
# pmlist -s
# pmlist -D -1 -p POWER5
Derived metrics supported:
PMD_UTI_RATE Utilization rate
PMD_MIPS MIPS
PMD_INST_PER_CYC Instructions per cycle
PMD_HW_FP_PER_CYC HW floating point instructions per Cycle
PMD_HW_FP_PER_UTIME HW floating point instructions / user time
PMD_HW_FP_RATE HW floating point rate
PMD_FX Total Fixed point operations
PMD_FX_PER_CYC Fixed point operations per Cycle
PMD_FP_LD_ST Floating point load and store operations
PMD_INST_PER_FP_LD_ST Instructions per floating point load/store
PMD_PRC_INST_DISP_CMPL % Instructions dispatched that completed
PMD_DATA_L2 Total L2 data cache accesses
PMD_PRC_L2_ACCESS % accesses from L2 per cycle
PMD_L2_TRAF L2 traffic
PMD_L2_BDW L2 bandwidth per processor
PMD_L2_LD_EST_LAT_AVG Estimated latency from loads from L2 (Average)
PMD_UTI_RATE_RC Utilization rate (versus run cycles)
PMD_INST_PER_CYC_RC Instructions per run cycle
PMD_LD_ST Total load and store operations
PMD_INST_PER_LD_ST Instructions per load/store
PMD_LD_PER_LD_MISS Number of loads per load miss
PMD_LD_PER_DTLB Number of loads per DTLB miss
PMD_ST_PER_ST_MISS Number of stores per store miss
PMD_LD_PER_TLB Number of loads per TLB miss
PMD_LD_ST_PER_TLB Number of load/store per TLB miss
PMD_TLB_EST_LAT Estimated latency from TLB miss
PMD_MEM_LD_TRAF Memory load traffic
PMD_MEM_BDW Memory bandwidth per processor
PMD_MEM_LD_EST_LAT Estimated latency from loads from memory
PMD_LD_LMEM_PER_LD_RMEM Number of loads from local memory per loads from remote
memory
PMD_PRC_MEM_LD_RC % loads from memory per run cycle
# hpmcount -m cpi_breakdown ls
bar foo
Workload context: ls (pid:42234)
Execution time (wall clock time): 0.004222 seconds
######## Resource Usage Statistics ########
Total amount of time in user mode : 0.001783 seconds
Total amount of time in system mode : 0.000378 seconds
Maximum resident set size : 220 Kbytes
Average shared memory use in text segment : 0 Kbytes*sec
Average unshared memory use in data segment : 0 Kbytes*sec
Number of page faults without I/O activity : 63
Number of page faults with I/O activity : 0
Number of times process was swapped out : 0
Number of times file system performed INPUT : 0
Number of times file system performed OUTPUT : 0
Number of IPC messages sent : 0
# hpmstat -s 7
Execution time (wall clock time): 1.003946 seconds
Counting mode: user
PM_TLB_MISS (TLB misses) : 260847
PM_CYC (Processor cycles) : 3013964331
PM_ST_REF_L1 (L1 D cache store references) : 161377371
PM_LD_REF_L1 (L1 D cache load references) : 255317480
PM_INST_CMPL (Instructions completed) : 1027391919
PM_RUN_CYC (Run cycles) : 1495147343
Derived metric group: default
Utilization rate : 181.243 %
Total load and store operations : 416.695 M
Instructions per load/store : 2.466
MIPS : 1023.354
Instructions per cycle : 0.341
# hpmstat -s 1,2 -d
Execution time (wall clock time): 2.129755 seconds
Set: 1
Counting duration: 1.065 seconds
PM_INST_CMPL (Instructions completed) : 244687
PM_FPU1_CMPL (FPU1 produced a result) : 0
PM_ST_CMPL (Store instruction completed) : 31295
PM_LD_CMPL (Loads completed) : 67414
PM_FPU0_CMPL (Floating-point unit produced a result) : 19
PM_CYC (Processor cycles) : 295427
PM_FPU_FMA (FPU executed multiply-add instruction) : 0
PM_TLB_MISS (TLB misses) : 788
Set: 2
Counting duration: 1.064 seconds
PM_TLB_MISS (TLB misses) : 379472
PM_ST_MISS_L1 (L1 D cache store misses) : 79943
PM_LD_MISS_L1 (L1 D cache load misses) : 307338
PM_INST_CMPL (Instructions completed) : 848578245
PM_LSU_IDLE (Cycles LSU is idle) : 229922845
PM_CYC (Processor cycles) : 757442686
PM_ST_DISP (Store instructions dispatched) : 125440562
PM_LD_DISP (Load instr dispatched) : 258031257
Counting mode: user
PM_TLB_MISS (TLB misses) : 380260
PM_ST_MISS_L1 (L1 D cache store misses) : 160017
PM_LD_MISS_L1 (L1 D cache load misses) : 615182
PM_INST_CMPL (Instructions completed) : 848822932
PM_LSU_IDLE (Cycles LSU is idle) : 460224933
PM_CYC (Processor cycles) : 757738113
PM_ST_DISP (Store instructions dispatched) : 251088030
PM_LD_DISP (Load instr dispatched) : 516488120
PM_FPU1_CMPL (FPU1 produced a result) : 0
PM_ST_CMPL (Store instruction completed) : 62582
PM_LD_CMPL (Loads completed) : 134812
PM_FPU0_CMPL (Floating-point unit produced a result) : 38
PM_FPU_FMA (FPU executed multiply-add instruction) : 0
Derived metric group: default
Utilization rate : 189.830 %
% TLB misses per cycle : 0.050 %
number of loads per TLB miss : 0.355
Total l2 data cache accesses : 0.775 M
% accesses from L2 per cycle : 0.102 %
L2 traffic : 47.276 MBytes
L2 bandwidth per processor : 44.431 MBytes/sec
Total load and store operations : 0.197 M
Instructions per load/store : 4300.145
number of loads per load miss : 839.569
number of stores per store miss : 1569.133
number of load/stores per D1 miss : 990.164
L1 cache hit rate : 0.999 %
% Cycles LSU is idle : 30.355 %
MIPS : 199.113
Instructions per cycle : 1.120
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <libhpm.h>
void
do_work()
{
pid_t p, wpid;
int i, status;
float f1 = 9.7641, f2 = 2.441, f3 = 0.0;
p = fork();
if (p == -1) {
perror("Mike fork error");
exit(1);
}
if (p == 0) {
i = execl("/usr/bin/sh", "sh", "-c", "ls -R / 2>&1 >/dev/null", (char *)0);
perror("Mike execl error");
exit(2);
}
else
wpid = waitpid(p, &status, WUNTRACED | WCONTINUED);
if (wpid == -1) {
perror("Mike waitpid error");
exit(3);
}
return;
}
int main()
{
int taskID = 1; /* arbitrary task identifier; not shown in the original excerpt */
hpmInit(taskID, "my_program");
hpmStart(1, "outer call");
do_work();
hpmStart(2, "inner call");
do_work();
hpmStop(2);
hpmStop(1);
hpmTerminate(taskID);
}
#include "f_hpm.h"
Fortran programs call functions that include the f_ prefix, as you can see in the following example:
!$OMP PARALLEL
!$OMP&PRIVATE (instID)
instID = 30+omp_get_thread_num()
call f_hpmtstart( instID, "computing meaning of life" )
!$OMP DO
do ...
do_work()
end do
call f_hpmtstop( instID )
!$OMP END PARALLEL
API characteristics
Five types of APIs are available. Global types return global metrics related to a set of components, while
individual types return metrics related to individual components. Both types of interfaces have similar
signatures, but slightly different behavior.
AIX supports different types of APIs such as WPAR and RSET. WPAR types return usage metrics related to
a set of components or individual components specific to a workload partition (WPAR). RSET types return
usage metrics of processors that belong to an RSET. Starting with AIX Version 6.1 Technology Level (TL) 6,
a new type of API, called NODE, is available. The NODE types return usage metrics that are related to a set
of components or individual components specific to a remote node in a cluster. The perfstat_config
(PERFSTAT_ENABLE | PERFSTAT_CLUSTER_STATS, NULL) subroutine must be called to enable the remote
node statistics collection (which is available only in a cluster environment).
All the interfaces return raw data; that is, values of running counters. Multiple calls must be made at
regular intervals to calculate rates.
Several interfaces return data retrieved from the ODM (object data manager) database. This information is
automatically cached into a dictionary that is assumed to be "frozen" after it is loaded. The perfstat_reset
subroutine must be called to clear the dictionary whenever the system configuration has changed. In
order to do a more selective reset, you can use the perfstat_partial_reset function. For more details, see
the “Cached metrics interfaces” on page 190 section.
Most types returned are unsigned long long; that is, unsigned 64-bit data.
Excessive and redundant calls to Perfstat APIs in a short time span can have a performance impact
because time-consuming statistics collected by them are not cached.
Global interfaces
Global interfaces report metrics related to a set of components on a system (such as processors, disks, or
memory).
The following are the global interfaces:
Item Descriptor
perfstat_cpu_total Retrieves global processor usage metrics
perfstat_memory_total Retrieves global memory usage metrics
perfstat_disk_total Retrieves global disk usage metrics
Note: This API does not return any data when it is called from an application running inside a WPAR.
Item Descriptor
perfstat_id_t *name Reserved for future use, must be NULL
perfstat_subsystem_total_t *userbuff A pointer to a memory area with enough space for the
returned structure
int sizeof_struct Should be set to sizeof(perfstat_subsystem_total_t)
int desired_number Reserved for future use, must be set to 0 or 1
The return value is -1 in case of errors. Otherwise, the number of structures copied is returned. This is
always 1.
The following sections provide examples of the type of data returned and code using each of the
interfaces.
#include <stdio.h>
#include <libperfstat.h>
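The body of this sample is summarized here as a minimal sketch; the ninfo variable name is illustrative,
and only the packet, byte, and error fields of the perfstat_netinterface_total_t structure are used:
int main()
{
perfstat_netinterface_total_t ninfo;
/* one call retrieves the global network interface statistics */
if (perfstat_netinterface_total(NULL, &ninfo,
sizeof(perfstat_netinterface_total_t), 1) < 1) {
perror("perfstat_netinterface_total");
return 1;
}
printf("input statistics:\n");
printf("number of packets : %llu\n", ninfo.ipackets);
printf("number of errors : %llu\n", ninfo.ierrors);
printf("number of bytes : %llu\n", ninfo.ibytes);
printf("output statistics:\n");
printf("number of packets : %llu\n", ninfo.opackets);
printf("number of bytes : %llu\n", ninfo.obytes);
printf("number of errors : %llu\n", ninfo.oerrors);
return 0;
}
The program displays an output that is similar to the following example output: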
input statistics:
number of packets : 306688
number of errors : 0
number of bytes : 24852688
output statistics:
number of packets : 63005
number of bytes : 11518591
number of errors : 0
The preceding program emulates ifstat's behavior and also shows how perfstat_netinterface_total is
used.
perfstat_cpu_total Interface
The perfstat_cpu_total interface returns a perfstat_cpu_total_t structure, which is defined in the
libperfstat.h file.
Selected fields from the perfstat_cpu_total_t structure include:
Item Descriptor
purr_coalescing PURR cycles consumed to coalesce data, if the calling partition is authorized
to see pool-wide statistics; otherwise set to zero.
spurr_coalescing SPURR cycles consumed to coalesce data, if the calling partition is authorized
to see pool-wide statistics; otherwise set to zero.
processorHz Processor speed in Hertz (from ODM)
description Processor type (from ODM)
ncpus Current number of active processors
ncpus_cfg Number of configured processors; that is, the maximum number of processors
that this copy of AIX can handle simultaneously
ncpus_high Maximum number of active processors; that is, the maximum number of active
processors since the last reboot
Note: Page coalescing is a transparent operation wherein the hypervisor detects duplicate pages, directs
all user reads to a single copy, and reclaims the other duplicate physical memory pages.
Several other processor-related counters (such as the number of system calls, reads, writes, forks,
execs, and load average) are also returned. For a complete list, see the perfstat_cpu_total_t section of
the libperfstat.h header file.
The following program emulates lparstat's behavior and also shows an example of how the
perfstat_cpu_total interface is used:
#include <stdio.h>
#include <sys/time.h>
#include <sys/errno.h>
#include <sys/proc.h>
#include <wpars/wparcfg.h>
#include <libperfstat.h>
#include <stdlib.h>
#define INTERVAL_DEFAULT 1
#define COUNT_DEFAULT 1
#define ACTIVE 0
#define NOTACTIVE 1
perfstat_id_wpar_t wparid;
perfstat_wpar_total_t wparinfo;
perfstat_wpar_total_t *wparlist;
cid_t cid;
/*
*Name: do_cleanup
* free all allocated data structures
*/
void do_cleanup(void)
{
if (wparlist)
free(wparlist);
}
/*
*Name: display_global_sysinfo_stat
* Function used when called from global.
* Gets all the system metrics using perfstat APIs and displays them
*
*/
void display_global_sysinfo_stat(void)
{
perfstat_cpu_total_t *cpustat,*cpustat_last;
perfstat_id_t first;
/* allocate memory for data structures and check for any error */
printf ("%10s %10s %10s %10s %10s %10s %10s %10s %10s %10s %10s %10s\n", "cswch", "scalls", "sread", "swrite", "fork", "exec",
"rchar", "wchar", "deviceint", "bwrite", "bread", "phread");
printf ("%10s %10s %10s %10s %10s %10s %10s %10s %10s %10s %10s %10s\n", "=====", "======", "=====", "======", "====", "====",
"=====", "=====", "=========", "======", "=====", "======");
while (count > 0){
sleep(interval);
if (perfstat_cpu_total(NULL ,cpustat, sizeof(perfstat_cpu_total_t), 1) <= 0){
perror("perfstat_cpu_total ");
exit(1);
}
/* print the difference between the old structure and new structure */
printf("%10llu %10llu %10llu %10llu %10llu %10llu %10llu %10llu %10llu %10llu %10llu %10llu\n",(cpustat->pswitch - cpustat_last-
>pswitch),
(cpustat->syscall - cpustat_last->syscall), (cpustat->sysread - cpustat_last-
>sysread ),
(cpustat->syswrite - cpustat_last->syswrite),(cpustat->sysfork - cpustat_last-
>sysfork),
(cpustat->sysexec - cpustat_last->sysexec ), (cpustat->readch - cpustat_last->readch),
(cpustat->writech - cpustat_last->writech ),(cpustat->devintrs - cpustat_last->devintrs),
(cpustat->bwrite - cpustat_last->bwrite), (cpustat->bread - cpustat_last->bread ),
(cpustat->phread - cpustat_last->phread ));
count--;
/*
*Name: display_wpar_sysinfo_stat
* Displays both wpar and global metrics
*
*/
void display_wpar_sysinfo_stat(void)
{
perfstat_wpar_total_t wparinfo;
perfstat_cpu_total_wpar_t cinfo_wpar, cinfo_wpar_last;
perfstat_cpu_total_t sysinfo, sysinfo_last;
/* display the difference between the current and old structure for the current wpar and system wide values*/
printf("%10s %10llu %10llu %10llu %10llu %10llu %10llu %10llu\n",wparinfo.name, (cinfo_wpar.pswitch - cinfo_wpar_last.pswitch),
(cinfo_wpar.syscall - cinfo_wpar_last.syscall), (cinfo_wpar.sysfork - cinfo_wpar_last.sysfork),
(cinfo_wpar.runque - cinfo_wpar_last.runque), (cinfo_wpar.swpque - cinfo_wpar_last.swpque),
(cinfo_wpar.runocc - cinfo_wpar_last.runocc), (cinfo_wpar.swpocc - cinfo_wpar_last.swpocc));
printf("%10s %10llu %10llu %10llu %10llu %10llu %10llu %10llu\n\n", "Global", (sysinfo.pswitch - sysinfo_last.pswitch),
(sysinfo.syscall - sysinfo_last.syscall), (sysinfo.sysfork - sysinfo_last.sysfork),
(sysinfo.runque - sysinfo_last.runque), (sysinfo.swpque - sysinfo_last.swpque),
(sysinfo.runocc - sysinfo_last.runocc), (sysinfo.swpocc - sysinfo_last.swpocc));
count--;
int display_wpar_total_sysinfo_stat(void)
{
int i, *status;
perfstat_wpar_total_t *wparinfo;
perfstat_cpu_total_wpar_t *cinfo_wpar, *cinfo_wpar_last;
/* allocate memory for the datastructures and check for any error */
status = (int *) calloc(totalwpar ,sizeof(int));
CHECK_FOR_MALLOC_NULL(status);
/*
*Name: showusage
* displays the usage message
*
*/
void showusage()
{
if (!cid)
printf("Usage:simplesysinfo [-@ { ALL | WPARNAME }] [interval] [count]\n ");
else
printf("Usage:simplesysinfo [interval] [count]\n");
exit(1);
}
/* NAME: main
* This function determines the interval and iteration count,
* then calls the corresponding functions to display
* the corresponding metrics.
*/
if (argc > 2)
showusage();
if (argc){
if ((interval = atoi(argv[0])) <= 0)
showusage();
argc--;
}
if (argc){
if ((count = atoi(argv[1])) <= 0)
showusage();
}
}
do_cleanup();
return(0);
}
The program displays an output that is similar to the following example output:
cswch scalls sread swrite fork exec rchar wchar deviceint bwrite bread phread
===== ====== ===== ====== ==== ==== ===== ===== ========= ====== ===== ======
83 525 133 2 0 1 1009462 264 27 0 0 0
perfstat_memory_total Interface
The perfstat_memory_total interface returns a perfstat_memory_total_t structure, which is defined in
the libperfstat.h file.
Selected fields from the perfstat_memory_total_t structure include:
Item Descriptor
bytes_coalesced Number of bytes of the calling partition's logical real memory coalesced
bytes_coalesced_mempool Number of bytes of logical real memory coalesced in the calling
partition's memory pool, if the calling partition is authorized to see pool-wide statistics;
otherwise set to zero.
Note: Page coalescing is a transparent operation wherein the hypervisor detects duplicate pages, directs
all user reads to a single copy, and can reclaim other duplicate physical memory pages.
Several other memory-related metrics (such as amount of paging space paged in and out, and amount of
system memory) are also returned. For a complete list, see the perfstat_memory_total_t section of the
libperfstat.h header file in Files Reference.
The following program emulates vmstat's behavior and also shows an example of how the
perfstat_memory_total interface is used:
#include <stdio.h>
#include <libperfstat.h>
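The body of this sample is summarized here as a minimal sketch; the minfo variable name is illustrative,
and the sizes are reported by the structure in units of 4 KB pages:
int main()
{
perfstat_memory_total_t minfo;
if (perfstat_memory_total(NULL, &minfo,
sizeof(perfstat_memory_total_t), 1) < 1) {
perror("perfstat_memory_total");
return 1;
}
printf("Memory statistics\n");
printf("-----------------\n");
printf("real memory size : %llu MB\n", minfo.real_total * 4 / 1024);
printf("reserved paging space : %llu MB\n", minfo.pgsp_rsvd * 4 / 1024);
printf("virtual memory size : %llu MB\n", minfo.virt_total * 4 / 1024);
printf("number of free pages : %llu\n", minfo.real_free);
printf("number of pinned pages : %llu\n", minfo.real_pinned);
printf("number of pages in file cache : %llu\n", minfo.numperm);
printf("total paging space pages : %llu\n", minfo.pgsp_total);
printf("free paging space pages : %llu\n", minfo.pgsp_free);
printf("used paging space : %3.2f%%\n",
(float)(minfo.pgsp_total - minfo.pgsp_free) * 100.0 /
(float)minfo.pgsp_total);
printf("number of paging space page ins : %llu\n", minfo.pgspins);
printf("number of paging space page outs : %llu\n", minfo.pgspouts);
return 0;
}
The program displays an output that is similar to the following example output: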
Memory statistics
-----------------
real memory size : 256 MB
reserved paging space : 512 MB
virtual memory size : 768 MB
number of free pages : 32304
number of pinned pages : 6546
number of pages in file cache : 12881
total paging space pages : 131072
free paging space pages : 129932
used paging space : 0.87%
number of paging space page ins : 0
number of paging space page outs : 0
The preceding program emulates vmstat's behavior and also shows how perfstat_memory_total is used.
perfstat_disk_total Interface
The perfstat_disk_total interface returns a perfstat_disk_total_t structure, which is defined in the
libperfstat.h file.
Selected fields from the perfstat_disk_total_t structure include:
Item Descriptor
number Number of disks
size Total disk size (in MB)
free Total free disk space (in MB)
xfers Total number of transfers to and from disk
Several other disk-related metrics, such as number of blocks read from and written to disk, are also
returned. For a complete list, see the perfstat_disk_total_t section in the libperfstat.h header file in Files
Reference.
The following code shows an example of how perfstat_disk_total is used:
#include <stdio.h>
#include <libperfstat.h>
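/* a minimal sketch (assumed body): one call retrieves the global disk totals,
* using only the fields listed in the table above */
int main()
{
perfstat_disk_total_t dinfo;
if (perfstat_disk_total(NULL, &dinfo, sizeof(perfstat_disk_total_t), 1) < 1) {
perror("perfstat_disk_total");
return 1;
}
printf("Disk statistics\n");
printf("---------------\n");
printf("number of disks : %d\n", dinfo.number);
printf("total disk space : %llu MB\n", dinfo.size);
printf("total free space : %llu MB\n", dinfo.free);
printf("total transfers : %llu\n", dinfo.xfers);
return 0;
}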
The preceding program emulates iostat's behavior and also shows how perfstat_disk_total is used.
perfstat_netinterface_total Interface
The perfstat_netinterface_total interface returns a perfstat_netinterface_total_t structure, which is
defined in the libperfstat.h file.
Selected fields from the perfstat_netinterface_total_t structure include:
Several other network interface-related metrics (such as the number of bytes sent and received) are
also returned. For a complete list, see the perfstat_netinterface_total_t section in the libperfstat.h
header file in Files Reference.
perfstat_partition_total Interface
The perfstat_partition_total interface returns a perfstat_partition_total_t structure, which is defined in
the libperfstat.h file.
Selected fields from the perfstat_partition_total_t structure include:
Item Descriptor
purr_coalescing PURR cycles consumed to coalesce data, if the calling partition is authorized
to see pool-wide statistics; otherwise set to zero
spurr_coalescing SPURR cycles consumed to coalesce data, if the calling partition is authorized
to see pool-wide statistics; otherwise set to zero
type Partition type
online_cpus Number of virtual processors currently allocated to the partition
online_memory Amount of memory currently allocated to the partition
Note: Page coalescing is a transparent operation wherein the hypervisor detects duplicate pages, directs
all user reads to a single copy, and reclaims duplicate physical memory pages
For a complete list, see the perfstat_partition_total_t section in the libperfstat.h header file.
The following code shows examples of how to use the perfstat_partition_total function.
The following example demonstrates how to emulate the lparstat -i command:
#include <stdio.h>
#include <stdlib.h>
#include <libperfstat.h>
perfstat_partition_total_t pinfo;
int rc;
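/* a minimal sketch (assumed continuation of the sample): retrieve the
* partition totals and print a few of the fields listed above */
rc = perfstat_partition_total(NULL, &pinfo, sizeof(perfstat_partition_total_t), 1);
if (rc != 1) {
perror("perfstat_partition_total");
exit(-1);
}
printf("Partition Name : %s\n", pinfo.name);
printf("Type : %s\n", pinfo.type.b.shared_enabled ? "Shared" : "Dedicated");
printf("Number of Virtual CPUs : %d\n", pinfo.online_cpus);
printf("Online Memory : %llu MB\n", pinfo.online_memory);
printf("Entitled Proc Capacity : %d\n", pinfo.entitled_proc_capacity);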
The program displays an output that is similar to the following example output:
The following example demonstrates emulating the lparstat command in default mode:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <libperfstat.h>
#include <sys/systemcfg.h>
#define INTERVAL_DEFAULT 2
#define COUNT_DEFAULT 10
#ifdef UTIL_AUTO
#define UTIL_MS 1
#define UTIL_PCT 0
#define UTIL_CORE 2
#define UTIL_PURR 0
#define UTIL_SPURR 1
void display_lpar_util_auto(int mode,int cpumode,int count,int interval);
#endif
void display_lpar_util(void);
if(collect_remote_node_stats)
{ /* perfstat_config needs to be called to enable cluster statistics collection */
rc = perfstat_config(PERFSTAT_ENABLE|PERFSTAT_CLUSTER_STATS, NULL);
if (rc == -1)
{
perror("cluster statistics collection is not available");
exit(-1);
}
}
#ifdef UTIL_AUTO
printf("Enter CPU mode.\n");
printf(" 0 PURR \n 1 SPURR \n");
scanf("%d",&cpumode);
printf("Enter print mode.\n");
printf(" 0 PERCENTAGE\n 1 MILLISECONDS\n 2 CORES \n");
scanf("%d",&mode);
if((mode>2)&& (cpumode>1))
{
#else
/* Iterate "count" times */
while (count > 0)
{
display_lpar_util();
sleep(interval);
count--;
}
#endif
if(collect_remote_node_stats)
{ /* Now disable cluster statistics by calling perfstat_config */
perfstat_config(PERFSTAT_DISABLE|PERFSTAT_CLUSTER_STATS, NULL);
}
return(0);
}
last_pcpu_user = lparstats->puser;
last_pcpu_sys = lparstats->psys;
last_pcpu_idle = lparstats->pidle;
last_pcpu_wait = lparstats->pwait;
last_lcpu_user = cpustats->user;
last_lcpu_sys = cpustats->sys;
last_lcpu_idle = cpustats->idle;
last_lcpu_wait = cpustats->wait;
last_busy_donated = lparstats->busy_donated_purr;
last_idle_donated = lparstats->idle_donated_purr;
last_busy_stolen = lparstats->busy_stolen_purr;
last_idle_stolen = lparstats->idle_stolen_purr;
}
printf("\n%5s %5s %6s %6s %5s %5s %5s %5s %4s %5s",
"-----", "----", "-----", "-----", "-----", "-----", "-----", "---", "----", "-----");
} else {
printf("\n%5s %5s %6s %6s %5s %5s %5s %4s %5s",
"%user", "%sys", "%wait", "%idle", "physc", "%entc", "lbusy", "vcsw", "phint");
disp_util_header = 0;
/* first iteration, we only read the data, print the header and save the data */
save_last_values(&cpustats, &lparstats);
return;
}
/* calculate physical processor tics during the last interval in user, system, idle and wait mode */
delta_pcpu_user = lparstats.puser - last_pcpu_user;
delta_pcpu_sys = lparstats.psys - last_pcpu_sys;
delta_pcpu_idle = lparstats.pidle - last_pcpu_idle;
delta_pcpu_wait = lparstats.pwait - last_pcpu_wait;
/* calculate clock tics during the last interval in user, system, idle and wait mode */
delta_lcpu_user = cpustats.user - last_lcpu_user;
delta_lcpu_sys = cpustats.sys - last_lcpu_sys;
delta_lcpu_idle = cpustats.idle - last_lcpu_idle;
delta_lcpu_wait = cpustats.wait - last_lcpu_wait;
/* calculate entitlement for this partition - entitled physical processors for this partition */
entitlement = (double)lparstats.entitled_proc_capacity / 100.0 ;
/* distribute unused physical processor tics among wait and idle proportionally to wait and idle in clock tics */
delta_pcpu_wait += unused_purr * ((double)delta_lcpu_wait / (double)(delta_lcpu_wait + delta_lcpu_idle));
delta_pcpu_idle += unused_purr * ((double)delta_lcpu_idle / (double)(delta_lcpu_wait + delta_lcpu_idle));
/* for SPLPAR, consider the entitled physical processor tics as the actual delta physical processor tics */
pcputime = entitled_purr;
}
else if (lparstats.type.b.donate_enabled) { /* if donation is enabled for this DLPAR */
/* calculate busy stolen and idle stolen physical processor tics during the last interval */
/* these physical processor tics are stolen from this partition by the hypervisor
* which will be used by wanting partitions */
delta_busy_stolen = lparstats.busy_stolen_purr - last_busy_stolen;
delta_idle_stolen = lparstats.idle_stolen_purr - last_idle_stolen;
/* calculate busy donated and idle donated physical processor tics during the last interval */
/* these physical processor tics are voluntarily donated by this partition to the hypervisor
* which will be used by wanting partitions */
delta_busy_donated = lparstats.busy_donated_purr - last_busy_donated;
delta_idle_donated = lparstats.idle_donated_purr - last_idle_donated;
/* add busy donated and busy stolen to the kernel bucket, as cpu
* cycles were donated / stolen when this partition is busy */
delta_pcpu_sys += delta_busy_donated;
delta_pcpu_sys += delta_busy_stolen;
/* distribute idle stolen to wait and idle proportionally to the logical wait and idle in clock tics, as
* cpu cycles were stolen when this partition is idle or in wait */
delta_pcpu_wait += delta_idle_stolen *
((double)delta_lcpu_wait / (double)(delta_lcpu_wait + delta_lcpu_idle));
delta_pcpu_idle += delta_idle_stolen *
((double)delta_lcpu_idle / (double)(delta_lcpu_wait + delta_lcpu_idle));
/* distribute idle donated to wait and idle proportionally to the logical wait and idle in clock tics, as
* cpu cycles were donated when this partition is idle or in wait */
delta_pcpu_wait += delta_idle_donated *
((double)delta_lcpu_wait / (double)(delta_lcpu_wait + delta_lcpu_idle));
delta_pcpu_idle += delta_idle_donated *
((double)delta_lcpu_idle / (double)(delta_lcpu_wait + delta_lcpu_idle));
/* add donated to the total physical processor tics for CPU usage calculation, as they were
* distributed to respective buckets accordingly */
pcputime += (delta_idle_donated + delta_busy_donated);
/* add stolen to the total physical processor tics for CPU usage calculation, as they were
* distributed to respective buckets accordingly */
pcputime += (delta_idle_stolen + delta_busy_stolen);
if (lparstats.type.b.pool_util_authority) {
/* Available physical processor units in the shared pool (app) */
printf("%5.2f ", (double)(lparstats.pool_idle_time - last_pit) /
(XINTFRAC*(double)delta_time_base));
}
save_last_values(&cpustats, &lparstats);
}
#ifdef UTIL_AUTO
void display_lpar_util_auto(int mode,int cpumode,int count,int interval)
{
float user_core_purr,kern_core_purr,wait_core_purr,idle_core_purr;
float user_core_spurr,kern_core_spurr,wait_core_spurr,idle_core_spurr,sum_core_spurr;
disp_util_header = 0;
/* first iteration, we only read the data, print the header and save the data */
}
while(count)
{
collect_metrics (&oldt, &lparstats);
sleep(interval);
collect_metrics (&newt, &lparstats);
data.type = UTIL_CPU_TOTAL;
data.curstat = &newt; data.prevstat= &oldt;
data.sizeof_data = sizeof(perfstat_cpu_total_t);
data.cur_elems = 1;
data.prev_elems = 1;
rc = perfstat_cpu_util(&data, &util,sizeof(perfstat_cpu_util_t), 1);
if(rc <= 0)
{
perror("Error in perfstat_cpu_util");
exit(-1);
}
delta_time_base = util.delta_time;
switch(mode)
{
case UTIL_PCT:
printf(" %5.1f %5.1f %5.1f %5.1f %5.4f \n",util.user_pct,util.kern_pct,util.wait_pct,util.idle_pct,util.physical_consumed);
break;
case UTIL_MS:
user_ms_purr=((util.user_pct*delta_time_base)/100.0);
kern_ms_purr=((util.kern_pct*delta_time_base)/100.0);
wait_ms_purr=((util.wait_pct*delta_time_base)/100.0);
idle_ms_purr=((util.idle_pct*delta_time_base)/100.0);
if(cpumode==UTIL_PURR)
{
printf(" %llu %llu %llu %llu %5.4f\n",user_ms_purr,kern_ms_purr,wait_ms_purr,idle_ms_purr,util.physical_consumed);
}
else if(cpumode==UTIL_SPURR)
{
user_ms_spurr=(user_ms_purr*util.freq_pct)/100.0;
kern_ms_spurr=(kern_ms_purr*util.freq_pct)/100.0;
wait_ms_spurr=(wait_ms_purr*util.freq_pct)/100.0;
sum_ms=user_ms_spurr+kern_ms_spurr+wait_ms_spurr;
idle_ms_spurr=delta_time_base-sum_ms;
}
break;
case UTIL_CORE:
user_core_purr=((util.user_pct*util.physical_consumed)/100.0);
kern_core_purr=((util.kern_pct*util.physical_consumed)/100.0);
wait_core_purr=((util.wait_pct*util.physical_consumed)/100.0);
idle_core_purr=((util.idle_pct*util.physical_consumed)/100.0);
user_core_spurr=((user_core_purr*util.freq_pct)/100.0);
kern_core_spurr=((kern_core_purr*util.freq_pct)/100.0);
wait_core_spurr=((wait_core_purr*util.freq_pct)/100.0);
if(cpumode==UTIL_PURR)
{
printf("%5.4f %5.4f %5.4f %5.4f
%5.4f\n",user_core_purr,kern_core_purr,wait_core_purr,idle_core_purr,util.physical_consumed);
}
else if(cpumode==UTIL_SPURR)
{
sum_core_spurr=user_core_spurr+kern_core_spurr+wait_core_spurr;
idle_core_spurr=util.physical_consumed-sum_core_spurr;
/* ... print SPURR-based core values ... */
}
break;
default:
printf("Incorrect usage\n");
}
count--;
}
}
#endif
The program displays an output that is similar to the following example output:
perfstat_tape_total Interface
The perfstat_tape_total interface returns a perfstat_tape_total_t structure, which is defined in the
libperfstat.h file.
Selected fields from the perfstat_tape_total_t structure include:
Item Descriptor
number Total number of tapes
size Total size of all tapes (in MB)
free Total free portion of all tapes (in MB)
rxfers Total number of read transfers from/to tape
xfers Total number of transfers from/to tape
Several other tape-related metrics (such as the number of bytes read and written) are also returned. For
a complete list, see the perfstat_tape_total section in the libperfstat.h header file.
The following code shows examples of how to use the perfstat_tape_total function.
#include <stdio.h>
#include <stdlib.h>
#include <libperfstat.h>
int main(){
perfstat_tape_total_t tinfo;
int rc;
/* retrieve the global tape statistics in a single structure */
rc = perfstat_tape_total(NULL, &tinfo, sizeof(perfstat_tape_total_t), 1);
if(rc == -1){
perror("perfstat_tape_total");
exit(-1);
}
if(rc==0){
printf("No tape found on the system\n");
exit(-1);
}
printf("Number of tapes=%d, size=%llu MB, free=%llu MB\n",
tinfo.number, tinfo.size, tinfo.free);
return(0);
}
The preceding program emulates diskstat behavior and also shows how perfstat_tape_total is used.
perfstat_partition_config interface
The perfstat_partition_config interface returns a perfstat_partition_config_t structure,
which is defined in the libperfstat.h file.
The selected fields from the perfstat_partition_config_t structure include:
Item Descriptor
partitionname Partition name
processorFamily Processor type
processorModel Processor model
machineID Machine ID
processorMHz Processor clock speed in megahertz
numProcessors Number of configured physical processors in frame
OSName Name of operating system
OSVersion Version of operating system
OSBuild Build of operating system
lcpus Number of logical CPUs
smtthreads Number of SMT threads
drives Total number of drives
nw_adapters Total number of network adapters
vcpus Minimum, maximum, and online virtual CPUs
cpucap Minimum, maximum, and online CPU capacity
entitled_proc_capacity Number of processor units that this partition is entitled to receive
cpucap_weightage Variable processor capacity weightage
mem_weightage Variable memory capacity weightage
cpupool_weightage Pool weightage
activecpusinpool Count of physical CPUs in the shared processor pool to which the
partition belongs
sharedpcpu Number of physical processors allocated for shared processor use
maxpoolcap Maximum processor capacity of partition's pool
entpoolcap Entitled processor capacity of partition's pool
For a complete list, see the perfstat_partition_config_t section in the libperfstat.h header file.
The usage of the code for the perfstat_partition_config API is as follows:
#include <libperfstat.h>
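/* a minimal sketch (assumed body): retrieve the partition configuration and
* print the fields that are shown in the report below */
int main()
{
perfstat_partition_config_t pinfo;
if (perfstat_partition_config(NULL, &pinfo,
sizeof(perfstat_partition_config_t), 1) < 1) {
perror("perfstat_partition_config");
return 1;
}
printf("==================Hardware Configuration==================\n");
printf("Processor Type = %s\n", pinfo.processorFamily);
printf("Processor Model = %s\n", pinfo.processorModel);
printf("Machine ID = %s\n", pinfo.machineID);
printf("==================Software Configuration==================\n");
printf("OS Name = %s\n", pinfo.OSName);
printf("OS Version = %s\n", pinfo.OSVersion);
printf("OS Build = %s\n", pinfo.OSBuild);
printf("====================LPAR Configuration====================\n");
printf("Number of Logical CPUs = %d\n", pinfo.lcpus);
printf("Number of SMT Threads = %d\n", pinfo.smtthreads);
printf("Number of Drives = %d\n", pinfo.drives);
printf("Number of NW Adapters = %d\n", pinfo.nw_adapters);
return 0;
}
The program displays an output that is similar to the following example output: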
==================Hardware Configuration==================
Processor Type = POWER_5
Processor Model = IBM,9133-55A
Machine ID = 061500H
==================Software Configuration==================
OS Name = AIX
OS Version = 7.1
OS Build = Feb 17 2011 15:57:15 1107A_71D
====================LPAR Configuration====================
Number of Logical CPUs = 2
Number of SMT Threads = 2
Number of Drives = 2
Number of NW Adapters = 2
bindid_to_lcpuid Subroutine
Purpose
Returns the logical CPU ID.
Library
perfstat library (libperfstat.a)
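Syntax
#include <libperfstat.h>
cpu_t bindid_to_lcpuid (id)
cpu_t id;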
Description
The bindid_to_lcpuid subroutine returns the logical CPU identifier of the bind CPU identifier. This
subroutine is defined in the libperfstat.h file.
Parameters
Item Description
id Specifies the bind CPU identifier that needs to be converted into the logical CPU
ID.
Return values
Upon successful completion, the logical CPU ID is returned. If unsuccessful, a value of -1 is returned and
the errno parameter is set to the appropriate error code.
Error codes
Error Description
EINVAL One of the parameters is not valid.
lcpuid_to_bindid Subroutine
Purpose
Returns the bind CPU ID.
Library
perfstat library (libperfstat.a)
Syntax
#include <libperfstat.h>
cpu_t lcpuid_to_bindid (id)
cpu_t id;
Description
The lcpuid_to_bindid subroutine returns the bind CPU identifier of the logical CPU identifier. This
subroutine is defined in the libperfstat.h file.
Parameters
Item Description
id Specifies the logical CPU identifier that needs to be converted into the bind CPU
identifier.
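Return values
Upon successful completion, the bind CPU ID is returned. If unsuccessful, a value of -1 is returned and
the errno parameter is set to the appropriate error code.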
Error codes
Error Description
EINVAL One of the parameters is not valid.
Component-Specific interfaces
Component-specific interfaces report metrics related to individual components on a system (such as a
processor, disk, network interface, or paging space).
All of the following AIX interfaces use the naming convention perfstat_subsystem, and use a common
signature:
Item Descriptor
perfstat_cpu Retrieves individual processor usage metrics
Note: This interface returns global values when called by an
application running inside WPAR.
The common signature used by all the component interfaces except perfstat_memory_page and
perfstat_hfistat_window is as follows:
Item Descriptor
perfstat_id_t *name Enter the name of the first component (for example hdisk2 for
perfstat_disk()) to obtain the statistics. A structure containing a char
* field is used instead of directly passing a char * argument to the
function to avoid allocation errors and to prevent the user from giving
a constant string as parameter. To start from the first component
of a subsystem, set the char* field of the name parameter to ""
(empty string). You can use macros such as FIRST_SUBSYSTEM (for
example, FIRST_CPU) defined in the libperfstat.h file.
The return value is -1 in case of error. Otherwise, the number of structures copied is returned. The field
name is either set to NULL or to the name of the next structure available.
An exception to this scheme is as follows: when name=NULL, userbuff=NULL, and desired_number=0,
the total number of structures available is returned.
To retrieve all structures of a given type, find the number of structures and allocate the required memory
to hold the structures. You must then call the appropriate API to retrieve all structures in one call. Another
method is to allocate a fixed set of structures and repeatedly call the API to get the next set of structures,
each time passing the name returned by the previous call. Start the process with the name set to "" or
FIRST_SUBSYSTEM, and repeat the process.
Minimizing the number of API calls, and therefore the number of system calls, leads to more efficient
code, so the two-call approach is preferred. Because the two-call approach can require a large amount of
memory to be allocated, the multiple-call approach is sometimes used instead; both are illustrated in the
following examples.
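A minimal sketch of the two-call approach, using the perfstat_cpu interface and the FIRST_CPU macro
described above (error handling shortened), follows:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libperfstat.h>
int main()
{
perfstat_id_t first;
perfstat_cpu_t *statp;
int i, tot;
/* call 1: with name=NULL, userbuff=NULL, and desired_number=0,
* the total number of structures available is returned */
tot = perfstat_cpu(NULL, NULL, sizeof(perfstat_cpu_t), 0);
if (tot <= 0) {
perror("perfstat_cpu");
exit(-1);
}
statp = malloc(tot * sizeof(perfstat_cpu_t));
/* call 2: start from the first logical processor and retrieve
* all structures in one call */
strcpy(first.name, FIRST_CPU);
tot = perfstat_cpu(&first, statp, sizeof(perfstat_cpu_t), tot);
for (i = 0; i < tot; i++)
printf("%s: user=%llu sys=%llu\n", statp[i].name, statp[i].user, statp[i].sys);
free(statp);
return 0;
}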
The following sections provide examples of the type of data returned and the code used for each of the
interfaces.
perfstat_bio_stats interface
The perfstat_bio_stats interface returns a set of structures of type perfstat_bio_dev_t, which is defined
in the libperfstat.h file. The API is successful only if the biostat tunable is set to 1. To set the biostat
tunable to 1, run the following command: raso -o biostat=1.
Selected fields from the perfstat_bio_dev_t structure include:
Item Descriptor
name Device name
devid 64-bit device ID
rbytes Number of bytes read
wbytes Number of bytes written
rerrs Number of read errors
werrs Number of write errors
rtime Aggregate time for reads
wtime Aggregate time for writes
nread Number of reads
nwrite Number of writes
The following program emulates blockio stats behavior and shows an example of how the
perfstat_bio_stats interface is used:
#include <stdio.h>
#include <stdlib.h>
#include <libperfstat.h>
# define STDEV_UPPERWORD(__x) (int)((__x & 0x0FFFFFFF00000000LL)>>32)
# define STDEV_LOWERWORD(__x) (int)(__x & 0xFFFFFFFF)
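/* a minimal sketch (assumed body): perfstat_bio_stats is assumed here to
* follow the common component-interface signature described earlier */
int main()
{
perfstat_id_t first;
perfstat_bio_dev_t *statp;
int i, tot;
tot = perfstat_bio_stats(NULL, NULL, sizeof(perfstat_bio_dev_t), 0);
if (tot <= 0) {
printf("no data; check that the biostat tunable is set (raso -o biostat=1)\n");
exit(-1);
}
statp = malloc(tot * sizeof(perfstat_bio_dev_t));
first.name[0] = '\0'; /* start from the first component */
tot = perfstat_bio_stats(&first, statp, sizeof(perfstat_bio_dev_t), tot);
for (i = 0; i < tot; i++)
printf("%s devid=(%d,%d) reads=%llu writes=%llu\n", statp[i].name,
STDEV_UPPERWORD(statp[i].devid), STDEV_LOWERWORD(statp[i].devid),
statp[i].nread, statp[i].nwrite);
free(statp);
return 0;
}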
The program displays an output that is similar to the following example output:
Item Descriptor
name Logical processor name (cpu0, cpu1, ...)
user Number of clock ticks spent in user mode
sys Number of clock ticks spent in system (kernel) mode
idle Number of clock ticks spent idle with no I/O pending
wait Number of clock ticks spent idle with I/O pending
syscall Number of system calls executed
Several other CPU-related metrics (such as number of forks, read, write, and execs) are also returned. For
a complete list, see the perfstat_cpu_t section in the libperfstat.h header.
The following code shows an example of how the perfstat_cpu interface is used:
#include <stdio.h>
#include <stdlib.h>
#include <libperfstat.h>
The program displays an output that is similar to the following example output:
In an environment where dynamic logical partitioning is used, the number of perfstat_cpu_t structures
available is equal to the ncpus_high field in the perfstat_cpu_total_t structure. This number represents
the highest index of any active processor since the last reboot. Kernel data structures that hold
performance metrics for processors are not deallocated when processors are taken offline or moved to a
different partition; they simply stop being updated. The ncpus field of the perfstat_cpu_total_t structure
represents the number of active processors, but the perfstat_cpu interface returns ncpus_high structures.
Applications can detect offline or moved processors by checking clock-tick increments. If the sum of the
user, sys, idle, and wait fields is identical for a given processor between two perfstat_cpu calls, that
processor has been offline for the complete interval. If the sum multiplied by 10 ms (the value of a clock
tick) does not match the time interval, the processor has not been online for the complete interval.
The preceding program emulates mpstat behavior and also shows how perfstat_cpu is used.
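That detection logic can be sketched as follows, assuming two samples of the same perfstat_cpu_t entry
named prev and cur, taken interval_ms milliseconds apart (all three names are illustrative):
u_longlong_t prev_ticks = prev.user + prev.sys + prev.idle + prev.wait;
u_longlong_t cur_ticks = cur.user + cur.sys + cur.idle + cur.wait;
if (cur_ticks == prev_ticks) {
/* no clock ticks at all: offline for the complete interval */
printf("%s was offline for the complete interval\n", cur.name);
} else if ((cur_ticks - prev_ticks) * 10 != interval_ms) {
/* 10 ms per clock tick: online for only part of the interval */
printf("%s was online for only part of the interval\n", cur.name);
}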
perfstat_bridgedadapters Interface
The perfstat_bridgedadapters interface returns a set of structures of type perfstat_seachildren_t,
which is defined in the libperfstat.h file.
The following program shows an example of how the perfstat_bridgedadapters interface is used:
/*
* NAME: showusage
* to display the usage
*
*/
int do_initialization(void)
{
perfstat_id_t first;
exit(1);
}
/* allocate enough memory for all the structures */
statp = (perfstat_netadapter_t *)malloc(tot * sizeof(perfstat_netadapter_t));
CHECK_FOR_MALLOC_NULL(statp);
statq = (perfstat_netadapter_t *)malloc(tot * sizeof(perfstat_netadapter_t));
CHECK_FOR_MALLOC_NULL(statq);
return(0);
}
/*
*Name: display_metrics
* collect the metrics and display them
*
*/
void display_metrics()
{
if(collect_remote_node_stats) {
strncpy(nodeid.u.nodename, nodename, MAXHOSTNAMELEN);
nodeid.spec = NODENAME;
/*
*Name: main
*
*/
if(collect_remote_node_stats)
{ /* perfstat_config needs to be called to enable cluster statistics collection */
rc = perfstat_config(PERFSTAT_ENABLE|PERFSTAT_CLUSTER_STATS, NULL);
if (rc == -1)
{
do_initialization();
display_metrics();
if(collect_remote_node_stats)
{ /* Now disable cluster statistics by calling perfstat_config */
perfstat_config(PERFSTAT_DISABLE|PERFSTAT_CLUSTER_STATS, NULL);
}
free(statp);
free(statq);
return 0;
}
perfstat_cpu_util interface
The perfstat_cpu_util interface returns a set of structures of type perfstat_cpu_util_t, which is
defined in the libperfstat.h file.
The perfstat_cpu_util interface includes the following fields:
Item Descriptor
cpu_id Holds CPU ID
entitlement Partition's entitlement
user_pct Percentage of utilization in user mode
kern_pct Percentage of utilization in kernel mode
idle_pct Percentage of utilization in idle mode
wait_pct Percentage of utilization in wait mode
physical_busy Physical CPU is busy
physical_consumed Total CPUs consumed by the partition
freq_pct Average frequency over the last interval in percentage
entitlement_pct Percentage of entitlement used
busy_pct Percentage of entitlement busy
idle_donated_pct Percentage of idle cycles donated
busy_donated_pct Percentage of busy cycles donated
idle_stolen_pct Percentage of idle cycles stolen
busy_stolen_pct Percentage of busy cycles stolen
float l_user_pct Percentage of utilization in user mode in terms of the logical
processor ticks
float l_kern_pct Percentage of utilization in kernel mode in terms of the logical
processor ticks
float l_idle_pct Percentage of utilization in idle mode in terms of the logical
processor ticks
float l_wait_pct Percentage of utilization in wait mode in terms of the logical
processor ticks
u_longlong_t delta_time Delta time in milliseconds for which the utilization is evaluated
#include <libperfstat.h>
#define PERIOD 5
void main()
{
perfstat_cpu_total_t *newt, *oldt;
perfstat_cpu_util_t *util;
perfstat_rawdata_t data;
int rc;
oldt = (perfstat_cpu_total_t*)malloc(sizeof(perfstat_cpu_total_t)*1);
if(oldt==NULL){
perror ("malloc");
exit(-1);
}
newt = (perfstat_cpu_total_t*)malloc(sizeof(perfstat_cpu_total_t)*1);
if(newt==NULL){
perror ("malloc");
exit(-1);
}
util = (perfstat_cpu_util_t*)malloc(sizeof(perfstat_cpu_util_t)*1);
if(util==NULL){
perror ("malloc");
exit(-1);
}
The following example code calculates the system utilization for each CPU by using the
perfstat_cpu_util interface:
#include <libperfstat.h>
#define PERIOD 5
void main()
{
perfstat_rawdata_t data;
perfstat_cpu_util_t *util;
perfstat_cpu_t *newt,*oldt;
perfstat_id_t id;
int i,cpu_count,rc;
data.cur_elems = cpu_count;
if(data.prev_elems != data.cur_elems)
{
perror("The number of CPUs has become different for defined period");
exit(-1);
}
/* allocate enough memory */
newt = (perfstat_cpu_t *)calloc(cpu_count,sizeof(perfstat_cpu_t));
util = (perfstat_cpu_util_t *)calloc(cpu_count,sizeof(perfstat_cpu_util_t));
if(newt == NULL || util == NULL)
{
perror("Memory Allocation Error");
exit(-1);
}
data.curstat = newt;
rc = perfstat_cpu(&id, newt, sizeof(perfstat_cpu_t), cpu_count);
if(rc <= 0)
{
perror("Error in perfstat_cpu");
exit(-1);
}
/* Calculate CPU Utilization Metrics*/
rc = perfstat_cpu_util(&data, util, sizeof(perfstat_cpu_util_t), cpu_count);
if(rc <= 0)
{
perror("Error in perfstat_cpu_util");
exit(-1);
}
printf("========= Per CPU Utilization Metrics =========\n");
printf("Utilization Metrics for a period of %d seconds\n",PERIOD);
printf("===============================================\n");
for ( i = 0;i<cpu_count;i++)
{
printf("Utilization metrics for CPU-ID = %s\n",util[i].cpu_id);
printf("User Percentage = %f\n",util[i].user_pct);
printf("System Percentage = %f\n",util[i].kern_pct);
printf("Idle Percentage = %f\n",util[i].idle_pct);
printf("Wait Percentage = %f\n",util[i].wait_pct);
printf("Physical Busy = %f\n",util[i].physical_busy);
printf("Physical Consumed = %f\n",util[i].physical_consumed);
printf("Freq Percentage = %f\n",util[i].freq_pct);
printf("Entitlement Used Percentage = %f\n",util[i].entitlement_pct);
printf("Entitlement Busy Percentage = %f\n",util[i].busy_pct);
printf("Idle Cycles Donated Percentage = %f\n",util[i].idle_donated_pct);
printf("Busy Cycles Donated Percentage = %f\n",util[i].busy_donated_pct);
printf("Idle Cycles Stolen Percentage = %f\n",util[i].idle_stolen_pct);
printf("Busy Cycles Stolen Percentage = %f\n",util[i].busy_stolen_pct);
printf("system percentage for logical cpu in ticks = %f\n",util[i].l_kern_pct);
printf("idle percentage for logical cpu in ticks = %f\n",util[i].l_idle_pct);
printf("wait percentage for logical cpu in ticks = %f\n",util[i].l_wait_pct);
printf("delta time in milliseconds = %llu \n",util[i].delta_time);
printf("\n\n");
}
printf("===========================================\n");
}
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <libperfstat.h>
#include <sys/systemcfg.h>
#define INTERVAL_DEFAULT 2
#define COUNT_DEFAULT 10
#ifdef UTIL_AUTO
#define UTIL_MS 1
#define UTIL_PCT 0
#define UTIL_CORE 2
#define UTIL_PURR 0
#define UTIL_SPURR 1
void display_lpar_util(void);
if(collect_remote_node_stats)
{ /* perfstat_config needs to be called to enable cluster statistics collection */
rc = perfstat_config(PERFSTAT_ENABLE|PERFSTAT_CLUSTER_STATS, NULL);
if (rc == -1)
{
perror("cluster statistics collection is not available");
exit(-1);
}
}
#ifdef UTIL_AUTO
printf("Enter CPU mode.\n");
printf(" 0 PURR \n 1 SPURR \n");
scanf("%d",&cpumode);
printf("Enter print mode.\n");
printf(" 0 PERCENTAGE\n 1 MILLISECONDS\n 2 CORES \n");
scanf("%d",&mode);
if((mode>2)&& (cpumode>1))
{
printf("Error: Invalid Input\n");
exit(0);
}
display_lpar_util_auto(mode,cpumode,count,interval);
#else
/* Iterate "count" times */
while (count > 0)
{
display_lpar_util();
sleep(interval);
count--;
}
#endif
if(collect_remote_node_stats)
{ /* Now disable cluster statistics by calling perfstat_config */
perfstat_config(PERFSTAT_DISABLE|PERFSTAT_CLUSTER_STATS, NULL);
}
return(0);
}
last_pcpu_user = lparstats->puser;
last_lcpu_user = cpustats->user;
last_lcpu_sys = cpustats->sys;
last_lcpu_idle = cpustats->idle;
last_lcpu_wait = cpustats->wait;
last_busy_donated = lparstats->busy_donated_purr;
last_idle_donated = lparstats->idle_donated_purr;
last_busy_stolen = lparstats->busy_stolen_purr;
last_idle_stolen = lparstats->idle_stolen_purr;
}
printf("\n%5s %5s %6s %6s %5s %5s %5s %5s %4s %5s",
"-----", "----", "-----", "-----", "-----", "-----", "-----", "---", "----", "-----");
} else {
printf("\n%5s %5s %6s %6s %5s %5s %5s %4s %5s",
"%user", "%sys", "%wait", "%idle", "physc", "%entc", "lbusy", "vcsw", "phint");
disp_util_header = 0;
/* first iteration, we only read the data, print the header and save the data */
save_last_values(&cpustats, &lparstats);
return;
}
/* calculate physical processor tics during the last interval in user, system, idle and wait mode */
delta_pcpu_user = lparstats.puser - last_pcpu_user;
delta_pcpu_sys = lparstats.psys - last_pcpu_sys;
delta_pcpu_idle = lparstats.pidle - last_pcpu_idle;
delta_pcpu_wait = lparstats.pwait - last_pcpu_wait;
/* calculate clock tics during the last interval in user, system, idle and wait mode */
delta_lcpu_user = cpustats.user - last_lcpu_user;
delta_lcpu_sys = cpustats.sys - last_lcpu_sys;
delta_lcpu_idle = cpustats.idle - last_lcpu_idle;
delta_lcpu_wait = cpustats.wait - last_lcpu_wait;
/* calculate entitlement for this partition - entitled physical processors for this partition */
entitlement = (double)lparstats.entitled_proc_capacity / 100.0 ;
/* distribute unused physical processor tics among wait and idle proportionally to wait and idle in clock tics */
delta_pcpu_wait += unused_purr * ((double)delta_lcpu_wait / (double)(delta_lcpu_wait + delta_lcpu_idle));
delta_pcpu_idle += unused_purr * ((double)delta_lcpu_idle / (double)(delta_lcpu_wait + delta_lcpu_idle));
/* for SPLPAR, consider the entitled physical processor tics as the actual delta physical processor tics */
pcputime = entitled_purr;
}
else if (lparstats.type.b.donate_enabled) { /* if donation is enabled for this DLPAR */
/* calculate busy stolen and idle stolen physical processor tics during the last interval */
/* these physical processor tics are stolen from this partition by the hypervisor
* and are used by other partitions that need them */
delta_busy_stolen = lparstats.busy_stolen_purr - last_busy_stolen;
delta_idle_stolen = lparstats.idle_stolen_purr - last_idle_stolen;
/* calculate busy donated and idle donated physical processor tics during the last interval */
/* these physical processor tics are voluntarily donated by this partition to the hypervisor
* and are used by other partitions that need them */
delta_busy_donated = lparstats.busy_donated_purr - last_busy_donated;
delta_idle_donated = lparstats.idle_donated_purr - last_idle_donated;
/* add busy donated and busy stolen to the kernel bucket, as cpu
* cycles were donated / stolen when this partition is busy */
delta_pcpu_sys += delta_busy_donated;
delta_pcpu_sys += delta_busy_stolen;
/* distribute idle stolen to wait and idle proportionally to the logical wait and idle in clock tics, as
* cpu cycles were stolen when this partition is idle or in wait */
delta_pcpu_wait += delta_idle_stolen *
((double)delta_lcpu_wait / (double)(delta_lcpu_wait + delta_lcpu_idle));
delta_pcpu_idle += delta_idle_stolen *
((double)delta_lcpu_idle / (double)(delta_lcpu_wait + delta_lcpu_idle));
/* distribute idle donated to wait and idle proportionally to the logical wait and idle in clock tics, as
* cpu cycles were donated when this partition is idle or in wait */
delta_pcpu_wait += delta_idle_donated *
((double)delta_lcpu_wait / (double)(delta_lcpu_wait + delta_lcpu_idle));
delta_pcpu_idle += delta_idle_donated *
((double)delta_lcpu_idle / (double)(delta_lcpu_wait + delta_lcpu_idle));
/* add donated to the total physical processor tics for CPU usage calculation, as they were
* distributed to respective buckets accordingly */
pcputime += (delta_idle_donated + delta_busy_donated);
/* add stolen to the total physical processor tics for CPU usage calculation, as they were
* distributed to respective buckets accordingly */
pcputime += (delta_idle_stolen + delta_busy_stolen);
if (lparstats.type.b.pool_util_authority) {
/* Available physical Processor units available in the shared pool (app) */
printf("%5.2f ", (double)(lparstats.pool_idle_time - last_pit) /
XINTFRAC*(double)delta_time_base);
}
save_last_values(&cpustats, &lparstats);
}
#ifdef UTIL_AUTO
void display_lpar_util_auto(int mode,int cpumode,int count,int interval)
{
float user_core_purr,kern_core_purr,wait_core_purr,idle_core_purr;
float user_core_spurr,kern_core_spurr,wait_core_spurr,idle_core_spurr,sum_core_spurr;
u_longlong_t user_ms_purr,kern_ms_purr,wait_ms_purr,idle_ms_purr,sum_ms;
u_longlong_t user_ms_spurr,kern_ms_spurr,wait_ms_spurr,idle_ms_spurr;
perfstat_rawdata_t data;
u_longlong_t delta_purr, delta_time_base;
double phys_proc_consumed, entitlement, percent_ent, delta_sec;
perfstat_partition_total_t lparstats;
static perfstat_cpu_total_t oldt,newt;
perfstat_cpu_util_t util;
int rc;
disp_util_header = 0;
/* first iteration, we only read the data, print the header and save the data */
while(count)
{
collect_metrics (&oldt, &lparstats);
sleep(interval);
collect_metrics (&newt, &lparstats);
data.type = UTIL_CPU_TOTAL;
data.curstat = &newt; data.prevstat= &oldt;
data.sizeof_data = sizeof(perfstat_cpu_total_t);
data.cur_elems = 1;
data.prev_elems = 1;
rc = perfstat_cpu_util(&data, &util,sizeof(perfstat_cpu_util_t), 1);
if(rc <= 0)
{
perror("Error in perfstat_cpu_util");
exit(-1);
}
delta_time_base = util.delta_time;
switch(mode)
{
case UTIL_PCT:
printf(" %5.1f %5.1f %5.1f %5.1f %5.4f \n",util.user_pct,util.kern_pct,util.wait_pct,util.idle_pct,util.physical_consumed);
break;
case UTIL_MS:
user_ms_purr=((util.user_pct*delta_time_base)/100.0);
kern_ms_purr=((util.kern_pct*delta_time_base)/100.0);
wait_ms_purr=((util.wait_pct*delta_time_base)/100.0);
idle_ms_purr=((util.idle_pct*delta_time_base)/100.0);
if(cpumode==UTIL_PURR)
{
printf(" %llu %llu %llu %llu %5.4f\n",user_ms_purr,kern_ms_purr,wait_ms_purr,idle_ms_purr,util.physical_consumed);
}
else if(cpumode==UTIL_SPURR)
{
user_ms_spurr=(user_ms_purr*util.freq_pct)/100.0;
kern_ms_spurr=(kern_ms_purr*util.freq_pct)/100.0;
wait_ms_spurr=(wait_ms_purr*util.freq_pct)/100.0;
sum_ms=user_ms_spurr+kern_ms_spurr+wait_ms_spurr;
idle_ms_spurr=delta_time_base-sum_ms;
printf(" %llu %llu %llu %llu %5.4f\n",user_ms_spurr,kern_ms_spurr,wait_ms_spurr,idle_ms_spurr,util.physical_consumed);
}
break;
case UTIL_CORE:
user_core_purr=((util.user_pct*util.physical_consumed)/100.0);
kern_core_purr=((util.kern_pct*util.physical_consumed)/100.0);
wait_core_purr=((util.wait_pct*util.physical_consumed)/100.0);
idle_core_purr=((util.idle_pct*util.physical_consumed)/100.0);
if(cpumode==UTIL_PURR)
{
printf("%5.4f %5.4f %5.4f %5.4f
%5.4f\n",user_core_purr,kern_core_purr,wait_core_purr,idle_core_purr,util.physical_consumed);
}
else if(cpumode==UTIL_SPURR)
{
user_core_spurr=(user_core_purr*util.freq_pct)/100.0;
kern_core_spurr=(kern_core_purr*util.freq_pct)/100.0;
wait_core_spurr=(wait_core_purr*util.freq_pct)/100.0;
sum_core_spurr=user_core_spurr+kern_core_spurr+wait_core_spurr;
idle_core_spurr=util.physical_consumed-sum_core_spurr;
printf("%5.4f %5.4f %5.4f %5.4f %5.4f\n",user_core_spurr,kern_core_spurr,wait_core_spurr,idle_core_spurr,util.physical_consumed);
}
break;
default:
printf("Incorrect usage\n");
return;
}
count--;
}
}
#endif
The program displays an output that is similar to the following example output:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <libperfstat.h>
#include <errno.h>
#include <wpars/wparcfg.h>
static int disp_util_header = 1;
/* #define UTIL_AUTO */
#ifdef UTIL_AUTO
#define UTIL_MS 1
#define UTIL_PCT 0
#define UTIL_CORE 2
#define UTIL_PURR 0
#define UTIL_SPURR 1
void display_metrics_global_auto(int mode,int cpumode,int count,int interval);
#endif
/* Convert 4K pages to MB */
#define AS_MB(X) ((X) * 4096/1024/1024)
/* For WPAR, use NULL else use the actual WPAR ID (for global) */
#define INTERVAL_DEFAULT 1
#define COUNT_DEFAULT 1
void initialise(void)
{
totalcinfo = (perfstat_cpu_total_t *)malloc(sizeof(perfstat_cpu_total_t));
CHECK_FOR_MALLOC_NULL(totalcinfo);
/*
* NAME: display_metrics_global
* used to display the metrics when called from global
*
*/
void display_metrics_global(void)
{
int i;
perfstat_id_t first;
strcpy(first.name, FIRST_CPU);
if(nflag){
strncpy(nodeid.u.nodename, nodename, MAXHOSTNAMELEN);
nodeid.spec = NODENAME;
if (perfstat_cpu_total_node(&nodeid, totalcinfo_last, sizeof(perfstat_cpu_total_t), 1) <= 0){
perror("perfstat_cpu_total_node:");
exit(1);
}
while(count)
{
sleep(interval);
if(nflag){
if (perfstat_cpu_total_node(&nodeid, totalcinfo, sizeof(perfstat_cpu_total_t), 1) <= 0){
perror("perfstat_cpu_total_node:");
exit(1);
}
printf("%s\t%#4.1f\t%#4.1f\t%#4.1f\t%#4.1f\t%#4.1d\n",cinfo[i].name,
((double)(delta_user)/(double)(delta_total) * 100.0),
((double)(delta_sys)/(double)(delta_total) * 100.0),
((double)(delta_wait)/(double)(delta_total) * 100.0),
((double)(delta_idle)/(double)(delta_total) * 100.0),
cinfo[i].state);
}
delta_user = totalcinfo->puser - totalcinfo_last->puser;
delta_sys = totalcinfo->psys - totalcinfo_last->psys;
delta_wait = totalcinfo->pwait - totalcinfo_last->pwait;
delta_idle = totalcinfo->pidle - totalcinfo_last->pidle;
delta_total= delta_user + delta_sys + delta_idle + delta_wait;
printf("%s\t%#4.1f\t%#4.1f\t%#4.1f\t%#4.1f\n\n","ALL",((double)(delta_user)/(double)(delta_total) * 100.0),
((double)(delta_sys)/(double)(delta_total) * 100.0),
((double)(delta_wait)/(double)(delta_total) * 100.0),
((double)(delta_idle)/(double)(delta_total) * 100.0));
count--;
save_last_values();
}
}
/*
*NAME: display_metrics_wpar
* used to display the metrics when called from wpar
*
*/
void display_metrics_wpar(void)
{
int i;
char last[5];
perfstat_id_wpar_t first;
/*first.spec = WPARNAME;*/
strcpy(first.name, "");
if (perfstat_wpar_total( NULL, &winfo, sizeof(perfstat_wpar_total_t), 1) <= 0){
perror("perfstat_wpar_total:");
exit(1);
}
while(count)
{
sleep(interval);
printf("%s\t%#4.1f\t%#4.1f\t%#4.1f\t%#4.1f\n",cinfo[i].name,((double)(delta_user)/(double)(delta_total) *
100.0),
((double)(delta_sys)/(double)(delta_total) * 100.0),
((double)(delta_wait)/(double)(delta_total) * 100.0),
((double)(delta_idle)/(double)(delta_total) * 100.0));
}
if (winfo.type.b.cpu_rset)
strcpy(last,"RST");
else
strcpy(last,"ALL");
printf("%s\t%#4.1f\t%#4.1f\t%#4.1f\t%#4.1f\n\n",last,((double)(delta_user)/(double)(delta_total) * 100.0),
((double)(delta_sys)/(double)(delta_total) * 100.0),
((double)(delta_wait)/(double)(delta_total) * 100.0),
((double)(delta_idle)/(double)(delta_total) * 100.0));
count--;
save_last_values();
}
/*
* NAME: display_metrics_wpar_from_global
* display metrics of wpar when called from global
*
*/
void display_metrics_wpar_from_global(void)
{
char last[5];
int i;
if (perfstat_wpar_total( &wparid, &winfo, sizeof(perfstat_wpar_total_t), 1) <= 0){
perror("perfstat_wpar_total:");
exit(1);
}
if (winfo.type.b.cpu_rset)
strcpy(last,"RST");
else
strcpy(last,"ALL");
strcpy(wparid.u.wparname,wpar);
printf("\n cpu\tuser\tsys\twait\tidle\n\n");
while(count)
{
sleep(interval);
printf("%s\t%#4.1f\t%#4.1f\t%#4.1f\t%#4.1f\n",cinfo[i].name,((double)(delta_user)/(double)(delta_total) * 100.0),
((double)(delta_sys)/(double)(delta_total) * 100.0),
((double)(delta_wait)/(double)(delta_total) * 100.0),
((double)(delta_idle)/(double)(delta_total) * 100.0));
}
count--;
save_last_values();
}
#ifdef UTIL_AUTO
void display_metrics_global_auto(int mode,int cpumode,int count,int interval)
{
float user_core_purr,kern_core_purr,wait_core_purr,idle_core_purr;
float user_core_spurr,kern_core_spurr,wait_core_spurr,idle_core_spurr,sum_core_spurr;
u_longlong_t user_ms_purr,kern_ms_purr,wait_ms_purr,idle_ms_purr,sum_ms;
u_longlong_t user_ms_spurr,kern_ms_spurr,wait_ms_spurr,idle_ms_spurr;
perfstat_rawdata_t data;
u_longlong_t delta_purr;
double phys_proc_consumed, entitlement, percent_ent, delta_sec;
perfstat_partition_total_t lparstats;
static perfstat_cpu_t *oldt,*newt;
perfstat_cpu_util_t *util;
int rc,cpu_count,i;
perfstat_id_t id;
while(count) {
/* take the "previous" per-CPU snapshot for this iteration */
cpu_count = perfstat_cpu(NULL, NULL,sizeof(perfstat_cpu_t),0);
data.type = UTIL_CPU;
oldt = (perfstat_cpu_t *)calloc(cpu_count, sizeof(perfstat_cpu_t));
strcpy(id.name, FIRST_CPU);
rc = perfstat_cpu(&id, oldt, sizeof(perfstat_cpu_t), cpu_count);
data.prevstat= oldt;
data.sizeof_data = sizeof(perfstat_cpu_t);
data.prev_elems = cpu_count;
sleep(interval);
/* Check how many perfstat_cpu_t structures are available after a defined period */
cpu_count = perfstat_cpu(NULL, NULL,sizeof(perfstat_cpu_t),0);
data.cur_elems = cpu_count;
if(data.prev_elems != data.cur_elems)
{
perror("The number of CPUs has become different for defined period");
exit(-1);
}
/* allocate the current snapshot and compute the utilization metrics */
newt = (perfstat_cpu_t *)calloc(cpu_count, sizeof(perfstat_cpu_t));
util = (perfstat_cpu_util_t *)calloc(cpu_count, sizeof(perfstat_cpu_util_t));
strcpy(id.name, FIRST_CPU);
rc = perfstat_cpu(&id, newt, sizeof(perfstat_cpu_t), cpu_count);
data.curstat = newt;
rc = perfstat_cpu_util(&data, util, sizeof(perfstat_cpu_util_t), cpu_count);
if(rc <= 0)
{
perror("Error in perfstat_cpu_util");
exit(-1);
}
switch(mode)
{
case UTIL_PCT:
for(i=0;i<cpu_count;i++)
printf("%d %5.1f %5.1f %5.1f %5.1f %5.7f
\n",i,util[i].user_pct,util[i].kern_pct,util[i].wait_pct,util[i].idle_pct,util[i].physical_consumed);
break;
case UTIL_MS:
for(i=0;i<cpu_count;i++)
{
user_ms_purr=((util[i].user_pct*util[i].delta_time)/100.0);
kern_ms_purr=((util[i].kern_pct*util[i].delta_time)/100.0);
wait_ms_purr=((util[i].wait_pct*util[i].delta_time)/100.0);
idle_ms_purr=((util[i].idle_pct*util[i].delta_time)/100.0);
if(cpumode==UTIL_PURR)
{
printf("%d\t %llu\t %llu\t %llu\t %llu\t
%5.4f\n",i,user_ms_purr,kern_ms_purr,wait_ms_purr,idle_ms_purr,util[i].physical_consumed);
}
else if(cpumode==UTIL_SPURR)
{
user_ms_spurr=(user_ms_purr*util[i].freq_pct)/100.0;
kern_ms_spurr=(kern_ms_purr*util[i].freq_pct)/100.0;
wait_ms_spurr=(wait_ms_purr*util[i].freq_pct)/100.0;
sum_ms=user_ms_spurr+kern_ms_spurr+wait_ms_spurr;
idle_ms_spurr=util[i].delta_time-sum_ms;
printf("%d\t %llu\t %llu\t %llu\t %llu\n",i,user_ms_spurr,kern_ms_spurr,wait_ms_spurr,idle_ms_spurr);
}
}
break;
case UTIL_CORE:
for(i=0;i<cpu_count;i++)
{
user_core_purr=((util[i].user_pct*util[i].physical_consumed)/100.0);
kern_core_purr=((util[i].kern_pct*util[i].physical_consumed)/100.0);
wait_core_purr=((util[i].wait_pct*util[i].physical_consumed)/100.0);
idle_core_purr=((util[i].idle_pct*util[i].physical_consumed)/100.0);
user_core_spurr=((user_core_purr*util[i].freq_pct)/100.0);
kern_core_spurr=((kern_core_purr*util[i].freq_pct)/100.0);
wait_core_spurr=((wait_core_purr*util[i].freq_pct)/100.0);
if(cpumode==UTIL_PURR)
{
printf("%d %5.4f %5.4f %5.4f %5.4f
%5.4f\n",i,user_core_purr,kern_core_purr,wait_core_purr,idle_core_purr,util[i].physical_consumed);
}
else if(cpumode==UTIL_SPURR)
{
sum_core_spurr=user_core_spurr+kern_core_spurr+wait_core_spurr;
idle_core_spurr=util[i].physical_consumed-sum_core_spurr;
printf("%d %5.4f %5.4f %5.4f %5.4f %5.4f\n",i,user_core_spurr,kern_core_spurr,wait_core_spurr,idle_core_spurr,util[i].physical_consumed);
}
}
break;
default:
printf("Incorrect usage\n");
return;
}
count--;
}
}
#endif
/*
*NAME: main
*
*/
cid = corral_getcid();
initialise();
display_configuration();
if(atflag)
display_metrics_wpar_from_global();
else if (cid)
display_metrics_wpar();
else
#ifdef UTIL_AUTO
printf("Enter CPU mode.\n");
printf(" 0 PURR \n 1 SPURR \n");
scanf("%d",&cpumode);
printf("Enter print mode.\n");
printf(" 0 PERCENTAGE\n 1 MILLISECONDS\n 2 CORES \n");
scanf("%d",&mode);
if((mode>2)&& (cpumode>1))
{
printf("Error: Invalid Input\n");
exit(0);
}
display_metrics_global_auto(mode,cpumode,count,interval);
#else
display_metrics_global();
#endif
if(nflag)
{ /* Now disable cluster statistics by calling perfstat_config */
perfstat_config(PERFSTAT_DISABLE|PERFSTAT_CLUSTER_STATS, NULL);
}
return(0);
}
The program displays an output that is similar to the following example output:
perfstat_diskadapter Interface
The perfstat_diskadapter interface returns a set of structures of type perfstat_diskadapter_t, which is
defined in the libperfstat.h file.
Selected fields from the perfstat_diskadapter_t structure include:
Several other disk adapter-related metrics (such as the number of blocks read from and written to the
adapter) are also returned. For a complete list, see the perfstat_diskadapter_t section in the libperfstat.h
header file.
The following program emulates the diskadapterstat behavior and also shows an example of how the
perfstat_diskadapter interface is used:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <libperfstat.h>
#include <errno.h>
#include <wpars/wparcfg.h>
/* Function prototypes */
/*
* NAME: do_initialization
* This function initializes the data structures.
* It also collects initial set of values.
*
* RETURNS:
* On successful completion:
* - returns 0.
* In case of error
* - exit with code 1.
*/
if (num_adapt == 0) {
printf("There are no disk adapters.\n");
exit(0);
}
if (num_adapt < 0) {
perror("perfstat_diskadapter: ");
exit(1);
}
return (0);
}
/*
*NAME: Showusage
* This function displays the usage
*/
/*
* NAME: do_cleanup
* This function frees the memory allocated for the perfstat structures.
*
*/
if (statq) {
free(statq);
}
}
/*
* NAME: collect_diskadapter_metrics
* This function collects the raw values into
* the specified structures and derives the metrics from the
* raw values
*
*/
void collect_diskadapter_metrics(void)
{
perfstat_id_t first;
unsigned long long delta_read, delta_write,delta_xfers, delta_xrate;
if(collect_remote_node_stats) {
strncpy(nodeid.u.nodename, nodename, MAXHOSTNAMELEN);
nodeid.spec = NODENAME;
strcpy(nodeid.name, FIRST_DISKADAPTER);
rc = perfstat_diskadapter_node(&nodeid ,statq, sizeof(perfstat_diskadapter_t),num_adapt);
}
else {
strcpy(first.name, FIRST_DISKADAPTER);
rc = perfstat_diskadapter(&first ,statq, sizeof(perfstat_diskadapter_t),num_adapt);
}
printf("\n%-8s %7s %8s %8s %8s %8s\n", " Name ", " Disks ", " Size ", " Free ", " ARS ", " AWS ");
printf("%-8s %7s %8s %8s %8s %8s\n", "======", "======", "======", "======", "=====", "=====");
if(collect_remote_node_stats) {
rc = perfstat_diskadapter_node(&nodeid, statp, sizeof(perfstat_diskadapter_t), num_adapt);
}
else {
rc = perfstat_diskadapter(&first ,statp, sizeof(perfstat_diskadapter_t),num_adapt);
}
/*
*NAME: main
*
*/
if(collect_remote_node_stats)
{ /* perfstat_config needs to be called to enable cluster statistics collection */
rc = perfstat_config(PERFSTAT_ENABLE|PERFSTAT_CLUSTER_STATS, NULL);
if(collect_remote_node_stats)
{ /* Now disable cluster statistics by calling perfstat_config */
perfstat_config(PERFSTAT_DISABLE|PERFSTAT_CLUSTER_STATS, NULL);
}
return (0);
}
The program displays an output that is similar to the following example output:
perfstat_disk Interface
The perfstat_disk interface returns a set of structures of type perfstat_disk_t, which is defined in the
libperfstat.h file.
Selected fields from the perfstat_disk_t structure include:
Item Descriptor
name Disk name (from ODM)
description Disk description (from ODM)
vgname Volume group name (from ODM)
size Disk size (in MB)
free Free space (in MB)
xfers Transfers to/from disk (in KB)
Several other disk-related metrics (such as number of blocks read from and written to disk, and adapter
names) are also returned. For a complete list, see the perfstat_disk_t section in the libperfstat.h header
file in Files Reference.
The following program emulates diskstat behavior and also shows an example of how the perfstat_disk
interface is used:
#include <stdio.h>
#include <stdlib.h>
#include <libperfstat.h>
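Only the include directives of that sample are reproduced above. A minimal sketch of the usual
pattern follows; it is not the original sample, and the printed fields are taken from the table
above (FIRST_DISK is the standard starting identifier defined in libperfstat.h):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libperfstat.h>

int main(void)
{
    int i, tot, ret;
    perfstat_disk_t *statp;
    perfstat_id_t first;

    /* check how many perfstat_disk_t structures are available */
    tot = perfstat_disk(NULL, NULL, sizeof(perfstat_disk_t), 0);
    if (tot <= 0) {
        printf("No disks found\n");
        exit(0);
    }
    /* allocate enough memory for all the structures */
    statp = calloc(tot, sizeof(perfstat_disk_t));
    /* ask to get all the structures available in one call */
    strcpy(first.name, FIRST_DISK);
    ret = perfstat_disk(&first, statp, sizeof(perfstat_disk_t), tot);
    if (ret <= 0) {
        perror("perfstat_disk");
        exit(-1);
    }
    for (i = 0; i < ret; i++)
        printf("%s (%s): size=%llu MB free=%llu MB xfers=%llu\n",
               statp[i].name, statp[i].vgname, statp[i].size,
               statp[i].free, statp[i].xfers);
    free(statp);
    return 0;
}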
perfstat_diskpath Interface
The perfstat_diskpath interface returns a set of structures of type perfstat_diskpath_t, which is defined
in the libperfstat.h file.
Selected fields from the perfstat_diskpath_t structure include:
Item Descriptor
name Path name (<disk_name>_Path<path_id>)
xfers Total transfers through this path (in KB)
Several other disk path-related metrics (such as the number of blocks read from and written through
the path) are also returned. For a complete list, see the perfstat_diskpath_t section in the libperfstat.h
header file.
The following code shows an example of how the perfstat_diskpath interface is used:
#include <stdio.h>
#include <stdlib.h>
#include <libperfstat.h>
perror("perfstat_diskpath");
exit(-1);
}
if (tot == 0)
{
perror("perfstat_diskpath");
exit(-1);
}
if (ret <= 0)
{
perror("perfstat_diskpath");
exit(-1);
}
The program displays an output that is similar to the following example output:
perfstat_fcstat Interface
The perfstat_fcstat interface returns a set of structures of type perfstat_fcstat_t, which is defined in the
libperfstat.h file.
The following program is an example of how the perfstat_fcstat interface is used:
/* The sample program displays the metrics *
* related to every individual *
* Fibre Channel adapter in the LPAR */
#include <stdio.h>
#include <stdlib.h>
#include <libperfstat.h>
/*
* NAME: do_initialization
* This function initializes the data structures.
* It also collects the initial set of values.
*
* RETURNS:
* On successful completion:
* - returns 0.
* In case of error
* - exits with code 1.
*/
int do_initialization(void)
{
/* check how many perfstat_fcstat_t structures are available */
if(collect_remote_node_stats) {
strncpy(nodeid.u.nodename, nodename, MAXHOSTNAMELEN);
nodeid.spec = NODENAME;
tot = perfstat_fcstat_node(&nodeid, NULL, sizeof(perfstat_fcstat_t), 0);
}
else if(fc_flag == 1 && wwpn_flag == 1)
{
tot = perfstat_fcstat_wwpn(NULL, NULL, sizeof(perfstat_fcstat_t), 0);
if(tot >= 1)
{
tot = 1;
}
else
{
printf("There is no FC adapter \n");
exit(-1);
}
}
else
{
tot = perfstat_fcstat(NULL, NULL, sizeof(perfstat_fcstat_t), 0);
}
if (tot <= 0) {
printf("There is no FC adapter\n");
exit(0);
}
/*
*Name: display_metrics
* collect the metrics and display them
*
*/
void display_metrics()
{
perfstat_id_t first;
perfstat_wwpn_id_t wwpn;
int ret=0, i=0;
if(collect_remote_node_stats) {
strncpy(nodeid.u.nodename, nodename, MAXHOSTNAMELEN);
nodeid.spec = NODENAME;
strcpy(nodeid.name , FIRST_NETINTERFACE);
ret = perfstat_fcstat_node(&nodeid, statq, sizeof(perfstat_fcstat_t), tot);
} else if((fc_flag == 1) && (wwpn_flag == 1)) {
}
memcpy(statq, statp, (tot * sizeof(perfstat_fcstat_t)));
count--;
}
}
/*
*Name: main
*
*/
if((fc_flag == 1))
{
if(fcadapter_name == NULL )
{
fprintf(stderr, "FC adapter Name should not be NULL");
exit(-1);
}
}
if(wwpn_flag == 1)
{
if(wwpn_id < 0 )
{
fprintf(stderr, "WWPN id should not be negavite ");
if(collect_remote_node_stats)
{ /* perfstat_config needs to be called to enable cluster statistics collection */
rc = perfstat_config(PERFSTAT_ENABLE|PERFSTAT_CLUSTER_STATS, NULL);
if (rc == -1)
{
perror("cluster statistics collection is not available");
exit(-1);
}
}
do_initialization();
display_metrics();
if(collect_remote_node_stats)
{ /* Now disable cluster statistics by calling perfstat_config */
perfstat_config(PERFSTAT_DISABLE|PERFSTAT_CLUSTER_STATS, NULL);
}
free(statp);
free(statq);
return 0;
}
perfstat_hfistat_window Interface
The perfstat_hfistat_window interface returns a set of structures of type perfstat_hfistat_window_t,
which is defined in the libperfstat.h file.
Selected fields from the perfstat_hfistat_window_t structure include:
Item Descriptor
pkts_sent The number of packets sent (56-bit counter).
pkts_dropped_sending The number of packets that were dropped from sending (40-bit counter).
pkts_received The number of packets that were received (56-bit counter).
perfstat_hfistat Interface
The perfstat_hfistat interface returns a set of structures of type perfstat_hfistat_t, which is defined in
the libperfstat.h file.
Selected fields from the perfstat_hfistat_t structure include:
Item Descriptor
cycles_blocked_sending The cycles that are blocked from sending.
link_retries The number of retries at the Link Level.
pkts_sent The aggregate number of packets sent.
pkts_dropped_sending The number of packets that were placed in the send first-in first-out (FIFO)
queue but dropped (not sent), regardless of window.
mmu_cache_hits The number of hits in the Nest Memory Management Unit cache.
mmu_cache_misses The number of misses in the Nest Memory Management Unit cache.
cycles_waiting_on_a_credit The cycles that are waiting on credit.
Item Descriptor
Ppsize Physical partition size (in MB)
Iocnt Number of read and write requests
Kbreads Number of kilobytes read
Kbwrites Number of kilobytes written
Several other logical-volume-related metrics (such as the logical volume name, volume group name,
and number of mirrors) are also returned. For a complete list, see the perfstat_logicalvolume_t
section in the libperfstat.h header file in Files Reference.
Note: The perfstat_config (PERFSTAT_ENABLE | PERFSTAT_LV, NULL) must be used to enable the
logical volume statistical collection.
The following code shows an example of how the perfstat_logicalvolume interface is used:
#include <stdio.h>
#include <stdlib.h>
#include <libperfstat.h>
int main(){
int lv_count,i, rc;
perfstat_id_t first;
perfstat_logicalvolume_t *lv;
/* logical volume statistics must be enabled first (see the note above) */
perfstat_config(PERFSTAT_ENABLE | PERFSTAT_LV, NULL);
strcpy(first.name, "");
/* check how many perfstat_logicalvolume_t structures are available */
lv_count = perfstat_logicalvolume(NULL, NULL, sizeof(perfstat_logicalvolume_t), 0);
lv = (perfstat_logicalvolume_t *)calloc(lv_count, sizeof(perfstat_logicalvolume_t));
rc = perfstat_logicalvolume(&first, lv, sizeof(perfstat_logicalvolume_t), lv_count);
if(rc <= 0){
perror("perfstat_logicalvolume");
exit(-1);
}
for(i=0;i<lv_count;i++){
printf("\n");
printf("Logical volume name=%s\n",lv[i].name);
printf("Volume group name=%s\n",lv[i].vgname);
printf("Physical partition size in MB=%lld\n",lv[i].ppsize);
printf("total number of logical paritions configured for this logical
volume=%lld\n",lv[i].logical_partitions);
printf("number of physical mirrors for each logical partition=%lu\n",lv[i].mirrors);
printf("Number of read and write requests=%lu\n",lv[i].iocnt);
The program displays an output that is similar to the following example output:
The preceding program shows how perfstat_logicalvolume is used.
perfstat_memory_page Interface
The perfstat_memory_page interface returns a set of structures of type perfstat_memory_page_t,
which is defined in the libperfstat.h file.
Selected fields from the perfstat_memory_page_t structure include:
Item Descriptor
psize Page size in bytes
real_total Amount of real memory (in units of psize)
real_free Amount of free real memory (in units of psize)
real_pinned Amount of pinned memory (in units of psize multiplied by 4)
Pgins Number of pages paged in
Several other memory-page-related metrics are also returned. For a complete list, see the
perfstat_memory_page_t section in the libperfstat.h header file.
The following program shows an example of how the perfstat_memory_page interface is used:
#include <stdio.h>
#include <stdlib.h>
#include <libperfstat.h>
int main(){
int i, total_psizes, avail_psizes;
perfstat_psize_t pagesize;
perfstat_memory_page_t *psize_mem_values;
/* check how many page sizes are available */
total_psizes = perfstat_memory_page(NULL, NULL, sizeof(perfstat_memory_page_t), 0);
if(total_psizes <= 0)
{
perror("perfstat_memory_page");
exit(-1);
}
psize_mem_values = (perfstat_memory_page_t *)calloc(total_psizes, sizeof(perfstat_memory_page_t));
if(psize_mem_values == NULL)
{
perror("Memory Allocation Error");
exit(-1);
}
pagesize.psize = FIRST_PSIZE;
avail_psizes = perfstat_memory_page(&pagesize, psize_mem_values, sizeof(perfstat_memory_page_t),
total_psizes);
if(avail_psizes < 1)
{
perror("display_psize_memory_stats: Unable to retrieve memory "
"statistics for the available page sizes.");
exit(-1);
}
for(i=0;i<avail_psizes;i++){
printf("Page size in bytes=%llu\n",psize_mem_values[i].psize);
printf("Number of real memory frames of this page size=%lld\n",psize_mem_values[i].real_total);
printf("Number of pages on free list=%lld\n",psize_mem_values[i].real_free);
printf("Number of pages pinned=%lld\n",psize_mem_values[i].real_pinned);
printf("Number of pages in use=%lld\n",psize_mem_values[i].real_inuse);
printf("Number of page faults =%lld\n",psize_mem_values[i].pgexct);
printf("Number of pages paged in=%lld\n",psize_mem_values[i].pgins);
printf("Number of pages paged out=%lld\n",psize_mem_values[i].pgouts);
printf("\n");
}
return 0;
}
The program displays an output that is similar to the following example output:
perfstat_netbuffer Interface
The perfstat_netbuffer interface returns a set of structures of type perfstat_netbuffer_t, which is
defined in the libperfstat.h file.
Selected fields from the perfstat_netbuffer_t structure include:
Item Descriptor
size Size of the allocation (string expressing size in bytes)
inuse Current allocation of this size
failed Failed allocation of this size
free Free list for this size
Several other allocation-related metrics (such as high-water mark and freed) are also returned. For a
complete list of other allocation-related metrics, see the perfstat_netbuffer_t section in the libperfstat.h
header file.
The following code shows an example of how the perfstat_netbuffer interface is used:
#include <stdio.h>
#include <stdlib.h>
#include <libperfstat.h>
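Only the include directives of the sample survive in this excerpt. A minimal sketch of the usual
pattern follows (not the original sample; the printed fields are from the table above, and
FIRST_NETBUFFER is the standard starting identifier defined in libperfstat.h):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libperfstat.h>

int main(void)
{
    int i, tot, ret;
    perfstat_netbuffer_t *statp;
    perfstat_id_t first;

    /* check how many perfstat_netbuffer_t structures are available */
    tot = perfstat_netbuffer(NULL, NULL, sizeof(perfstat_netbuffer_t), 0);
    if (tot <= 0) {
        perror("perfstat_netbuffer");
        exit(-1);
    }
    statp = calloc(tot, sizeof(perfstat_netbuffer_t));
    strcpy(first.name, FIRST_NETBUFFER);
    ret = perfstat_netbuffer(&first, statp, sizeof(perfstat_netbuffer_t), tot);
    if (ret <= 0) {
        perror("perfstat_netbuffer");
        exit(-1);
    }
    printf("%-10s %10s %10s %10s\n", "size", "inuse", "failed", "free");
    for (i = 0; i < ret; i++)
        printf("%-10s %10llu %10llu %10llu\n", statp[i].name,
               statp[i].inuse, statp[i].failed, statp[i].free);
    free(statp);
    return 0;
}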
The program displays an output that is similar to the following example output:
perfstat_netinterface Interface
The perfstat_netinterface interface returns a set of structures of type perfstat_netinterface_t, which is
defined in the libperfstat.h file.
Selected fields from the perfstat_netinterface_t structure include:
Several other network-interface related metrics (such as number of bytes sent and received, type,
and bitrate) are also returned. For a complete list of other network-interface-related metrics, see the
perfstat_netinterface_t section in the libperfstat.h header file in Files Reference.
The following code shows an example of how perfstat_netinterface is used:
#include <stdio.h>
#include <stdlib.h>
#include <libperfstat.h>
#include <net/if_types.h>
char *
decode(uchar type) {
switch(type) {
case IFT_LOOP:
return("loopback");
case IFT_ISO88025:
return("token-ring");
case IFT_ETHER:
return("ethernet");
default:
return("other");
}
}
perror("perfstat_netinterface");
exit(-1);
}
perror("perfstat_netinterface");
exit(-1);
}
input statistics:
number of packets : 306352
number of errors : 0
number of bytes : 24831776
output statistics:
number of packets : 62669
number of bytes : 11497679
number of errors : 0
input statistics:
number of packets : 336
number of errors : 0
number of bytes : 20912
output statistics:
number of packets : 336
number of bytes : 20912
number of errors : 0
The preceding program shows how perfstat_netinterface is used.
perfstat_netadapter Interface
The perfstat_netadapter interface returns a set of structures of type perfstat_netadapter_t, which is
defined in the libperfstat.h file.
Note: The perfstat_netadapter interface returns only the network Ethernet adapter statistics similar to
the entstat command.
The following program shows an example of how the perfstat_netadapter interface is used:
/* The sample program displays the metrics *
* related to every individual *
* network adapter in the LPAR*/
#include <stdio.h>
#include <stdlib.h>
#include <libperfstat.h>
#include <net/if_types.h>
/* define default interval and count values */
#define INTERVAL_DEFAULT 1
#define COUNT_DEFAULT 1
/*
* NAME: showusage
* to display the usage
*
*/
/*
* NAME: do_initialization
* This function initializes the data structures.
* It also collects the initial set of values.
*
* RETURNS:
* On successful completion:
* - returns 0.
* In case of error
* - exits with code 1.
*/
int do_initialization(void)
{
/* check how many perfstat_netadapter_t structures are available */
if(collect_remote_node_stats) {
return(0);
}
/*
*Name: display_metrics
* collect the metrics and display them
*
*/
void display_metrics()
{
perfstat_id_t first;
int ret, i;
if(collect_remote_node_stats) {
strncpy(nodeid.u.nodename, nodename, MAXHOSTNAMELEN);
nodeid.spec = NODENAME;
strcpy(nodeid.name , FIRST_NETINTERFACE);
ret = perfstat_netadapter_node(&nodeid, statq, sizeof(perfstat_netadapter_t), tot);
}
else {
strcpy(first.name , FIRST_NETINTERFACE);
ret = perfstat_netadapter( &first, statq, sizeof(perfstat_netadapter_t), tot);
}
if (ret < 0){
free(statp);
free(statq);
perror("perfstat_netadapter: ");
exit(1);
}
while (count)
{
sleep (interval);
if(collect_remote_node_stats)
{
ret = perfstat_netadapter_node(&nodeid, statp, sizeof(perfstat_netadapter_t), tot);
}
else {
ret = perfstat_netadapter(&first, statp, sizeof(perfstat_netadapter_t), tot);
}
/* print statistics for each of the interfaces */
for (i = 0; i < ret; i++)
{
printf(" Adapter name: %s \n", statp[i].name);
printf(" ======================== Transmit Statistics=====================\n");
printf(" Transmit Packets: %lld \n",
statp[i].tx_packets - statq[i].tx_packets);
printf(" Transmit Bytes: %lld \n",
statp[i].tx_bytes - statq[i].tx_bytes);
printf(" Transfer Interrupts : %lld \n",
statp[i].tx_interrupts - statq[i].tx_interrupts);
printf(" Transmit Errors : %lld \n",
statp[i].tx_errors - statq[i].tx_errors);
printf(" Packets Dropped at the time of Data Transmission : %lld \n",
statp[i].tx_packets_dropped - statq[i].tx_packets_dropped);
printf(" Transmit Queue Size: %lld \n",
statp[i].tx_queue_size - statq[i].tx_queue_size);
printf(" Transmit Queue Length :%lld \n",
statp[i].tx_queue_len - statq[i].tx_queue_len);
printf(" Transmit Queue Overflow : %lld \n",
statp[i].tx_queue_overflow - statq[i].tx_queue_overflow);
printf(" Broadcast Packets Transmitted: %lld \n",
statp[i].tx_broadcast_packets - statq[i].tx_broadcast_packets);
printf(" Multicast packets Transmitted: %lld \n",
statp[i].tx_multicast_packets - statq[i].tx_multicast_packets);
printf(" Lost Carrier Sense signal count : %lld \n",
statp[i].tx_carrier_sense - statq[i].tx_carrier_sense);
}
memcpy(statq, statp, (tot * sizeof(perfstat_netadapter_t)));
count--;
}
}
/*
*Name: main
*
*/
do_initialization();
display_metrics();
if(collect_remote_node_stats)
{ /* Now disable cluster statistics by calling perfstat_config */
perfstat_config(PERFSTAT_DISABLE|PERFSTAT_CLUSTER_STATS, NULL);
}
free(statp);
free(statq);
return 0;
}
perfstat_protocol Interface
The perfstat_protocol interface returns a set of structures of type perfstat_protocol_t, which consists of
a set of unions to accommodate the different sets of fields needed for each protocol, as defined in the
libperfstat.h file.
Selected fields from the perfstat_protocol_t structure include:
Item Descriptor
name Protocol name, which can be any of the following values: ip, ipv6, icmp, icmpv6, udp, tcp,
rpc, nfs, nfsv2, or nfsv3.
ipackets Number of input packets received using this protocol. This field exists only for the ip,
ipv6, udp, and tcp protocols.
opackets Number of output packets sent using this protocol. This field exists only for the ip,
ipv6, udp, and tcp protocols.
received Number of packets received using this protocol. This field exists only for the icmp
and icmpv6 protocols.
Many other network-protocol related metrics are also returned. For a complete list of network-protocol
related metrics, see the perfstat_protocol_t section in the libperfstat.h header file.
The following code shows an example of how the perfstat_protocol interface is used:
#include <stdio.h>
#include <string.h>
#include <libperfstat.h>
if (ret < 0)
{
perror("perfstat_protocol");
exit(-1);
}
retrieved += ret;
do {
printf("\nStatistics for protocol : %s\n", pinfo.name);
printf("-----------------------\n");
if (!strcmp(pinfo.name,"ip")) {
printf("number of input packets : %llu\n", pinfo.u.ip.ipackets);
printf("number of input errors : %llu\n", pinfo.u.ip.ierrors);
printf("number of output packets : %llu\n", pinfo.u.ip.opackets);
printf("number of output errors : %llu\n", pinfo.u.ip.oerrors);
} else if (!strcmp(pinfo.name,"ipv6")) {
printf("number of input packets : %llu\n", pinfo.u.ipv6.ipackets);
printf("number of input errors : %llu\n", pinfo.u.ipv6.ierrors);
printf("number of output packets : %llu\n", pinfo.u.ipv6.opackets);
printf("number of output errors : %llu\n", pinfo.u.ipv6.oerrors);
} else if (!strcmp(pinfo.name,"icmp")) {
printf("number of packets received : %llu\n", pinfo.u.icmp.received);
printf("number of packets sent : %llu\n", pinfo.u.icmp.sent);
printf("number of errors : %llu\n", pinfo.u.icmp.errors);
} else if (!strcmp(pinfo.name,"icmpv6")) {
printf("number of packets received : %llu\n", pinfo.u.icmpv6.received);
printf("number of packets sent : %llu\n", pinfo.u.icmpv6.sent);
printf("number of errors : %llu\n", pinfo.u.icmpv6.errors);
} else if (!strcmp(pinfo.name,"udp")) {
printf("number of input packets : %llu\n", pinfo.u.udp.ipackets);
printf("number of input errors : %llu\n", pinfo.u.udp.ierrors);
printf("number of output packets : %llu\n", pinfo.u.udp.opackets);
} else if (!strcmp(pinfo.name,"tcp")) {
printf("number of input packets : %llu\n", pinfo.u.tcp.ipackets);
printf("number of input errors : %llu\n", pinfo.u.tcp.ierrors);
printf("number of output packets : %llu\n", pinfo.u.tcp.opackets);
} else if (!strcmp(pinfo.name,"rpc")) {
printf("client statistics:\n");
printf("number of connection-oriented RPC requests : %llu\n",
pinfo.u.rpc.client.stream.calls);
printf("number of rejected connection-oriented RPCs : %llu\n",
pinfo.u.rpc.client.stream.badcalls);
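}
/* (the remaining protocol branches, nfs, nfsv2, and nfsv3, are omitted in
 * this excerpt; the loop termination below is a sketch, not the original
 * sample code. It assumes protid is the perfstat_id_t cursor initialized
 * with FIRST_PROTOCOL before the loop, and total is the count returned by
 * an initial perfstat_protocol(NULL, NULL, sizeof(perfstat_protocol_t), 0)
 * call) */
ret = perfstat_protocol(&protid, &pinfo, sizeof(perfstat_protocol_t), 1);
retrieved += ret;
} while (ret == 1 && retrieved < total);
return 0;
}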
The program displays an output that is similar to the following example output:
server statistics:
number of connection-oriented RPC requests : 0
number of rejected connection-oriented RPCs : 0
number of connectionless RPC requests : 0
number of rejected connectionless RPCs : 0
The preceding program emulates protocolstat behavior and also shows how perfstat_protocol is used.
Item Descriptor
mb_size Size of the paging space in MB
lp_size Size of the paging space in logical partitions
mb_used Portion of the paging space used in MB
Several other paging-space-related metrics (such as name, type, and active) are also returned. For a
complete list of other paging-space-related metrics, see the perfstat_pagingspace_t section in the
libperfstat.h header file in Files Reference.
The following code shows an example of how perfstat_pagingspace is used:
#include <stdio.h>
#include <stdlib.h>
#include <libperfstat.h>
pinfo = calloc(tot,sizeof(perfstat_pagingspace_t));
strcpy(first.name, FIRST_PAGINGSPACE);
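Only two statements of the sample are reproduced above. A minimal, self-contained sketch of the
full pattern follows (not the original sample; mb_size and mb_used are from the table above):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libperfstat.h>

int main(void)
{
    int i, tot, ret;
    perfstat_pagingspace_t *pinfo;
    perfstat_id_t first;

    /* check how many perfstat_pagingspace_t structures are available */
    tot = perfstat_pagingspace(NULL, NULL, sizeof(perfstat_pagingspace_t), 0);
    if (tot <= 0) {
        perror("perfstat_pagingspace");
        exit(-1);
    }
    pinfo = calloc(tot, sizeof(perfstat_pagingspace_t));
    strcpy(first.name, FIRST_PAGINGSPACE);
    ret = perfstat_pagingspace(&first, pinfo, sizeof(perfstat_pagingspace_t), tot);
    if (ret <= 0) {
        perror("perfstat_pagingspace");
        exit(-1);
    }
    for (i = 0; i < ret; i++)
        printf("%s: size=%llu MB, used=%llu MB\n",
               pinfo[i].name, pinfo[i].mb_size, pinfo[i].mb_used);
    free(pinfo);
    return 0;
}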
perfstat_process interfaces
The perfstat_process interface returns a set of structures of type perfstat_process_t, which is
defined in the libperfstat.h file.
Fields of the perfstat_process_t structure include:
Item Descriptor
pid Process ID
proc_name Name of the process
proc_priority Priority of the process
num_threads Thread count
proc_uid Information of the owner
proc_classid WLM class name
proc_size Virtual size of the process
proc_real_mem_data Real memory used for the data in kilobytes
proc_real_mem_text Real memory used for text in kilobytes
proc_virt_mem_data Virtual memory used for data in kilobytes
proc_virt_mem_text Virtual memory used for text in kilobytes
shared_lib_data_size Data size from shared library in kilobytes
heap_size Heap size in kilobytes
real_inuse Real memory in kilobytes used by the process, inclusive of all segments
virt_inuse Virtual memory in kilobytes used by the process, inclusive of all segments
pinned Pinned memory in kilobytes used by the process, inclusive of all segments
pgsp_inuse Paging space in kilobytes used, inclusive of all segments
filepages File pages in kilobytes used, including shared pages
real_inuse_map Real memory in kilobytes used for shared memory and memory-mapped regions
virt_inuse_map Virtual memory in kilobytes used for shared memory and memory-mapped regions
pinned_inuse_map Pinned memory in kilobytes used for shared memory and memory-mapped regions
ucpu_time User mode CPU time in milliseconds
scpu_time System mode CPU time in milliseconds
last_timebase Timebase counter
inBytes Bytes read from the disk
outBytes Bytes written to the disk
inOps In operations from disk
outOps Out operations to disk
The following code shows an example of how the perfstat_process interface is used:
#include <libperfstat.h>
void main()
{
perfstat_process_t *proct;
perfstat_id_t id;
int i,rc,proc_count;
/* check how many perfstat_process_t structures are available */
proc_count = perfstat_process(NULL, NULL, sizeof(perfstat_process_t), 0);
if(proc_count <= 0)
{
perror("Error in perfstat_process");
exit(-1);
}
proct = (perfstat_process_t *)calloc(proc_count, sizeof(perfstat_process_t));
if(proct == NULL)
{
perror("Memory Allocation Error");
exit(-1);
}
strcpy(id.name,"");
rc = perfstat_process(&id,proct,sizeof(perfstat_process_t),proc_count);
if(rc <= 0)
{
perror("Error in perfstat_process");
exit(-1) ;
}
The program displays an output that is similar to the following example output:
Number of Processes = 77
Credential Information
Owner Info = 0
WLM Class Name = 257
perfstat_process_util interface
The perfstat_process_util interface returns a set of structures of type perfstat_process_t,
which is defined in the libperfstat.h file.
The following is an example of code that uses the perfstat_process_util API:
#include <libperfstat.h>
#include <stdio.h>
#include <stdlib.h>
#define PERIOD 5
void main()
{
perfstat_process_t *cur, *prev;
perfstat_rawdata_t buf;
perfstat_process_t *proc_util;
perfstat_id_t id;
int cur_proc_count,prev_proc_count;
int i,rc;
prev_proc_count = perfstat_process(NULL, NULL,sizeof(perfstat_process_t),0);
if(prev_proc_count <= 0)
{
perror("Error in perfstat_process");
exit(-1) ;
}
prev = (perfstat_process_t *)calloc(prev_proc_count,sizeof(perfstat_process_t));
if(prev == NULL)
{
perror("Memory Allocation Error");
exit(-1) ;
}
strcpy(id.name,"");
rc = perfstat_process(&id,prev,sizeof(perfstat_process_t),prev_proc_count);
if(rc <= 0)
{
perror("Error in perfstat_process");
exit(-1) ;
}
sleep(PERIOD);
/* check how many perfstat_process_t structures are now available */
cur_proc_count = perfstat_process(NULL, NULL, sizeof(perfstat_process_t), 0);
if(cur_proc_count <= 0)
{
perror("Error in perfstat_process");
exit(-1);
}
cur = (perfstat_process_t *)calloc(cur_proc_count, sizeof(perfstat_process_t));
if(cur == NULL)
{
perror("Memory Allocation Error");
exit(-1);
}
strcpy(id.name,"");
rc = perfstat_process(&id, cur, sizeof(perfstat_process_t), cur_proc_count);
if(rc <= 0)
{
perror("Error in perfstat_process");
exit(-1);
}
bzero(&buf, sizeof(perfstat_rawdata_t));
buf.type = UTIL_PROCESS;
buf.curstat = cur;
buf.prevstat = prev;
buf.sizeof_data = sizeof(perfstat_process_t);
buf.cur_elems = cur_proc_count;
buf.prev_elems = prev_proc_count;
/* calculate process utilization over the interval; the return value is the
number of perfstat_process_t structures copied to the output buffer */
proc_util = (perfstat_process_t *)calloc(cur_proc_count, sizeof(perfstat_process_t));
rc = perfstat_process_util(&buf, proc_util, sizeof(perfstat_process_t), cur_proc_count);
if(rc <= 0)
{
perror("Error in perfstat_process_util");
exit(-1);
}
}
The program displays an output that is similar to the following example output:
Process ID = 1
User Mode CPU time = 0.000000
System Mode CPU time = 0.000000
Bytes Written to Disk = 0
Bytes Read from Disk = 0
In Operations from Disk = 0
Out Operations from Disk = 0
=====================================
Process ID = 196614
User Mode CPU time = 0.000000
System Mode CPU time = 0.000000
Bytes Written to Disk = 0
Bytes Read from Disk = 0
In Operations from Disk = 0
Out Operations from Disk = 0
=====================================
Process ID = 262152
User Mode CPU time = 0.000000
System Mode CPU time = 0.000000
Bytes Written to Disk = 0
Bytes Read from Disk = 0
In Operations from Disk = 0
Out Operations from Disk = 0
=====================================
perfstat_processor_pool_util interface
The perfstat_processor_pool_util interface returns a set of structures of type
perfstat_processor_pool_util_t, which is defined in the libperfstat.h file.
Item Descriptor
max_capacity Maximum pool processor capacity of the partition.
entitled_capacity Entitled pool processor capacity of the partition.
The following example shows the use of the perfstat_processor_pool_util API for system-level
utilization:
#include <libperfstat.h>
#include <sys/dr.h>
#include <sys/types.h>
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#define COUNT 2
#define INTERVAL 2
void main(int argc, char **argv)
{
perfstat_rawdata_t data;
perfstat_partition_total_t oldt,newt;
perfstat_processor_pool_util_t util,*uti;
static int once=0;
int rc;
u_longlong_t x=0;
int iInter=0,iCount=0;
int c;
while( (c = getopt(argc,argv,"i:c:"))!= EOF ){
switch(c) {
case 'i':
iInter=atoi(optarg);
break;
case 'c':
iCount=atoi(optarg);
break;
}
}
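The published sample ends after option parsing. The following completion is a sketch, not the
original code: it assumes the SHARED_POOL_UTIL raw-data type, and it treats max_capacity and
entitled_capacity (from the table above) as 64-bit counters.
/* default the interval and count when no options are given */
if (iInter == 0) iInter = INTERVAL;
if (iCount == 0) iCount = COUNT;
while (iCount--) {
/* take two partition snapshots, one interval apart */
rc = perfstat_partition_total(NULL, &oldt, sizeof(perfstat_partition_total_t), 1);
sleep(iInter);
rc = perfstat_partition_total(NULL, &newt, sizeof(perfstat_partition_total_t), 1);
/* describe the raw data and derive the pool utilization */
data.type = SHARED_POOL_UTIL;
data.curstat = &newt;
data.prevstat = &oldt;
data.sizeof_data = sizeof(perfstat_partition_total_t);
data.cur_elems = 1;
data.prev_elems = 1;
rc = perfstat_processor_pool_util(&data, &util, sizeof(perfstat_processor_pool_util_t), 1);
if (rc <= 0) {
perror("perfstat_processor_pool_util");
exit(-1);
}
printf("Max pool capacity=%llu Entitled pool capacity=%llu\n",
util.max_capacity, util.entitled_capacity);
}
}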
perfstat_tape Interface
The perfstat_tape interface returns a set of structures of type perfstat_tape_t, which is defined in the
libperfstat.h file.
Selected fields from the perfstat_tape_t structure include:
Item Descriptor
size Size of the tape (in MB)
free Free portion of the tape (in MB)
bsize Tape block size (in bytes)
paths_count Number of paths to the tape
Several other tape-related metrics are also returned. For a complete list of tape-related metrics,
see the perfstat_tape_t section in the libperfstat.h header file in Files Reference.
The following code shows an example of how the perfstat_tape interface is used:
#include <stdio.h>
#include <stdlib.h>
#include <libperfstat.h>
int main(){
int ret, tot, i;
perfstat_tape_t *statp;
perfstat_id_t first;
for(i=0;i<ret;i++){
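/* (the excerpt omits the setup; it would typically be the standard pattern:
 *    tot = perfstat_tape(NULL, NULL, sizeof(perfstat_tape_t), 0);
 *    statp = calloc(tot, sizeof(perfstat_tape_t));
 *    strcpy(first.name, FIRST_TAPE);  -- FIRST_TAPE is assumed here
 *    ret = perfstat_tape(&first, statp, sizeof(perfstat_tape_t), tot);
 * the loop body below is a sketch that prints the fields listed above) */
printf("%s: size=%llu MB free=%llu MB bsize=%llu paths=%d\n",
statp[i].name, statp[i].size, statp[i].free, statp[i].bsize, statp[i].paths_count);
}
return 0;
}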
perfstat_thread interfaces
The perfstat_thread interface returns a set of structures of type perfstat_thread_t, which is
defined in the libperfstat.h file.
Fields of the perfstat_thread_t structure include the following:
#include <libperfstat.h>
void main()
{
perfstat_thread_t *threadt;
perfstat_id_t id;
int i,rc,thread_count;
/* check how many perfstat_thread_t structures are available */
thread_count = perfstat_thread(NULL, NULL, sizeof(perfstat_thread_t), 0);
if(thread_count <= 0)
{
perror("Error in perfstat_thread");
exit(-1);
}
threadt = (perfstat_thread_t *)calloc(thread_count, sizeof(perfstat_thread_t));
if(threadt == NULL)
{
perror("Memory Allocation Error");
exit(-1);
}
strcpy(id.name,"");
rc = perfstat_thread(&id,threadt,sizeof(perfstat_thread_t),thread_count);
if(rc <= 0)
{
free(threadt);
perror("Error in perfstat_thread");
exit(-1) ;
}
The program displays an output that is similar to the following example output:
Process ID = 6553744
Thread ID = 12345
perfstat_thread_util interface
The perfstat_thread_util interface returns a set of structures of type perfstat_thread_t, which
is defined in the libperfstat.h file.
The following is an example of code for the perfstat_thread_util API:
#include <libperfstat.h>
#define PERIOD 5
void main()
{
perfstat_thread_t *cur, *prev;
perfstat_rawdata_t buf;
perfstat_thread_t *thread_util;
perfstat_id_t id;
int cur_thread_count,prev_thread_count;
int i,rc;
prev_thread_count = perfstat_thread(NULL, NULL,sizeof(perfstat_thread_t),0);
if(prev_thread_count <= 0)
{
perror("Error in perfstat_thread");
exit(-1) ;
}
prev = (perfstat_thread_t *)calloc(prev_thread_count,sizeof(perfstat_thread_t));
if(prev == NULL)
{
perror("Memory Allocation Error");
exit(-1) ;
}
strcpy(id.name,"");
prev_thread_count = perfstat_thread(&id,prev,sizeof(perfstat_thread_t),prev_thread_count);
if(prev_thread_count <= 0)
{
free(prev);
perror("Error in perfstat_thread");
exit(-1) ;
}
sleep(PERIOD);
/* check how many perfstat_thread_t structures are now available */
cur_thread_count = perfstat_thread(NULL, NULL, sizeof(perfstat_thread_t), 0);
if(cur_thread_count <= 0)
{
free(prev);
perror("Error in perfstat_thread");
exit(-1);
}
cur = (perfstat_thread_t *)calloc(cur_thread_count, sizeof(perfstat_thread_t));
if(cur == NULL)
{
free(prev);
perror("Memory Allocation Error");
exit(-1);
}
strcpy(id.name,"");
cur_thread_count = perfstat_thread(&id, cur, sizeof(perfstat_thread_t), cur_thread_count);
bzero(&buf, sizeof(perfstat_rawdata_t));
buf.type = UTIL_PROCESS;
buf.curstat = cur;
buf.prevstat = prev;
buf.sizeof_data = sizeof(perfstat_thread_t);
buf.cur_elems = cur_thread_count;
buf.prev_elems = prev_thread_count;
/* Calculate thread utilization. This returns the number of thread_util
structures that are copied to the output buffer. */
thread_util = (perfstat_thread_t *)calloc(cur_thread_count, sizeof(perfstat_thread_t));
rc = perfstat_thread_util(&buf, thread_util, sizeof(perfstat_thread_t), cur_thread_count);
if(rc <= 0)
{
perror("Error in perfstat_thread_util");
exit(-1);
}
}
The program displays an output that is similar to the following example output:
Process ID = 6160532
Thread ID = 123456
User Mode CPU time = 21.824531
System Mode CPU time = 0.000000
Bound CPU Id = 1
Related information
libperfstat.h file
perfstat_volumegroup Interface
The perfstat_volumegroup interface returns a set of structures of type perfstat_volumegroup_t, which
is defined in the libperfstat.h file.
Selected fields from the perfstat_volumegroup_t structure include:
Item Descriptor
Total_disks Total number of disks in the volume group
Active_disks Total number of active disks in the volume group
Iocnt Number of read and write requests
The following code shows an example of how the perfstat_volumegroup interface is used:
#include <stdio.h>
#include <stdlib.h>
#include <libperfstat.h>
int main(){
int vg_count, rc,i;
perfstat_id_t first;
perfstat_volumegroup_t *vg;
strcpy(first.name, "");
return 0;
}
The program displays an output that is similar to the following example output:
The preceding program shows how perfstat_volumegroup is used.
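Because the sample above omits the actual calls, a minimal, self-contained sketch follows. It is
not the original sample: the PERFSTAT_VG configuration flag is assumed to parallel the PERFSTAT_LV
flag used for logical volumes, and the printed fields are from the table above.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libperfstat.h>

int main(void)
{
    int i, vg_count, rc;
    perfstat_id_t first;
    perfstat_volumegroup_t *vg;

    /* enable volume group statistics collection (PERFSTAT_VG is assumed) */
    perfstat_config(PERFSTAT_ENABLE | PERFSTAT_VG, NULL);
    strcpy(first.name, "");
    /* check how many perfstat_volumegroup_t structures are available */
    vg_count = perfstat_volumegroup(NULL, NULL, sizeof(perfstat_volumegroup_t), 0);
    if (vg_count <= 0) {
        perror("perfstat_volumegroup");
        exit(-1);
    }
    vg = calloc(vg_count, sizeof(perfstat_volumegroup_t));
    rc = perfstat_volumegroup(&first, vg, sizeof(perfstat_volumegroup_t), vg_count);
    if (rc <= 0) {
        perror("perfstat_volumegroup");
        exit(-1);
    }
    for (i = 0; i < rc; i++)
        printf("%s: total_disks=%llu active_disks=%llu iocnt=%llu\n",
               vg[i].name, vg[i].total_disks, vg[i].active_disks, vg[i].iocnt);
    perfstat_config(PERFSTAT_DISABLE | PERFSTAT_VG, NULL);
    free(vg);
    return 0;
}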
WPAR Interfaces
The following are two types of WPAR interfaces:
• The metrics related to a set of components for a WPAR (such as processors, or memory).
• The specific metrics related to individual components on a WPAR (such as a processor, network
interface, or memory page).
All of the following WPAR interfaces use the naming convention perfstat_subsystem_total_wpar, and
use a common signature:
Item Descriptor
perfstat_cpu_total_wpar Retrieves WPAR processor summary usage metrics
perfstat_memory_total_wpar Retrieves WPAR memory summary usage metrics
perfstat_wpar_total Retrieves WPAR information metrics
perfstat_memory_page_wpar Retrieves WPAR memory page usage metrics
perfstat_subsystem_total_t *userbuff A memory area with enough space for the returned
structure.
int sizeof_struct The size of the perfstat_memory_total_wpar_t
structure.
int desired_number The number of different page size statistics to be
collected.
Upon successful completion, the return value is the number of structures that were copied (1 for
the _total interfaces). If there are errors, the return value is -1.
An exception to this scheme is perfstat_wpar_total. For this function, when name=NULL, userbuff=NULL
and desired_number=0, the total number of perfstat_wpar_total_t structures available is returned.
To retrieve all perfstat_wpar_total_t structures, select one of the following methods:
• Determine the number of structures and allocate the required memory to hold all structure at one time.
You can then call the appropriate API to retrieve all structures using one call.
• Allocate a fixed set of structures and repeatedly call the API to get the next number of structures, each
time passing the name returned by the previous call. Start the process by using one of the following
queries:
– wparname set to ""
– FIRST_WPARNAME
– wpar_id set to -1
– FIRST_WPARID
Repeat the process until the returned wparname is equal to "" or the returned wpar_id is equal to -1. A
minimal sketch of the first method follows.
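The following program is a sketch, not original sample code; it assumes the global environment and
abbreviates error handling:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <libperfstat.h>

int main(void)
{
    int i, tot, rc;
    perfstat_id_wpar_t wparid;
    perfstat_wpar_total_t *winfo;

    /* method 1: determine the number of structures, then fetch them all */
    tot = perfstat_wpar_total(NULL, NULL, sizeof(perfstat_wpar_total_t), 0);
    if (tot <= 0) {
        printf("No WPARs found in the system\n");
        exit(-1);
    }
    winfo = calloc(tot, sizeof(perfstat_wpar_total_t));
    /* start from the first WPAR name and fetch everything in one call */
    bzero(&wparid, sizeof(perfstat_id_wpar_t));
    wparid.spec = WPARNAME;
    strcpy(wparid.u.wparname, FIRST_WPARNAME);
    rc = perfstat_wpar_total(&wparid, winfo, sizeof(perfstat_wpar_total_t), tot);
    if (rc <= 0) {
        perror("perfstat_wpar_total");
        exit(-1);
    }
    for (i = 0; i < rc; i++)
        printf("%s (id %u)\n", winfo[i].name, winfo[i].wpar_id);
    free(winfo);
    return 0;
}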
The perfstat_id_wpar_total interface returns a set of structures of type perfstat_id_wpar_total_t, which
is defined in the libperfstat.h file. Selected fields from the perfstat_id_wpar_total_t structure include:
Item Descriptor
spec Select WPAR ID, WPAR Name, or the RSET Handle from the union
wpar_id Specifies the WPAR ID
wparname Specifies the WPAR Name
rset Specifies the RSET Handle of the rset associated with the WPAR
name Reserved for future use, must be NULL
perfstat_wpar_total Interface
The perfstat_wpar_total interface returns a set of structures of type perfstat_wpar_total_t, which is
defined in the libperfstat.h file.
Selected fields from the perfstat_wpar_total_t structure include:
Item Descriptor
Type WPAR type.
online_cpus The number of virtual processors currently allocated to the partition rset
or the number of virtual processors currently allocated to the system
partition.
online_memory The amount of memory currently allocated to the system partition.
cpu_limit The maximum limit of processor resources this WPAR consumes. The
processor limit is in 100ths of percentage units.
Several other WPAR-related metrics (such as the number of system calls, reads, writes, forks, and
execs, and the load average) are also returned. For a complete list, see the perfstat_wpar_total_t
section in the libperfstat.h header file in Files Reference.
The following program emulates wparstat behavior and also shows an example of how
perfstat_wpar_total is used from the global environment:
#include <stdio.h>
#include <stdlib.h>
#include <libperfstat.h>
int main(){
perfstat_wpar_total_t *winfo;
perfstat_id_wpar_t wparid;
int tot, rc, i;
/* check how many perfstat_wpar_total_t structures are available */
tot = perfstat_wpar_total(NULL, NULL, sizeof(perfstat_wpar_total_t), 0);
if (tot < 0) {
perror("Error in perfstat_wpar_total");
exit(-1);
}
if (tot == 0) {
printf("No WPARs found in the system\n");
exit(-1);
}
winfo = (perfstat_wpar_total_t *)calloc(tot, sizeof(perfstat_wpar_total_t));
wparid.spec = WPARNAME;
strcpy(wparid.u.wparname, FIRST_WPARNAME);
rc = perfstat_wpar_total(&wparid, winfo, sizeof(perfstat_wpar_total_t), tot);
if (rc < 0) {
perror("Error in perfstat_wpar_total");
exit(-1);
}
for(i=0;i<tot;i++){
printf("Name of the Workload Partition=%s\n",winfo[i].name);
printf("Workload partition identifier=%u\n",winfo[i].wpar_id);
printf("Number of Virtual CPUs in partition rset=%d\n",winfo[i].online_cpus);
printf("Amount of memory currently online in Global Partition=%lld\n",winfo[i].online_memory);
printf("Number of processor units this partition is entitled to
return(0);
}
The program displays an output that is similar to the following example output:
The following code shows an example of how perfstat_wpar_total is used from the WPAR environment:
#include <stdio.h>
#include <stdlib.h>
#include <libperfstat.h>
int main(){
perfstat_wpar_total_t *winfo;
perfstat_id_wpar_t wparid;
int tot, rc, i;
/* from within a WPAR, check how many structures are available */
tot = perfstat_wpar_total(NULL, NULL, sizeof(perfstat_wpar_total_t), 0);
if (tot < 0) {
perror("Error in perfstat_wpar_total");
exit(-1);
}
if (tot == 0) {
printf("No WPARs found in the system\n");
exit(-1);
}
winfo = (perfstat_wpar_total_t *)calloc(tot, sizeof(perfstat_wpar_total_t));
rc = perfstat_wpar_total(NULL, winfo, sizeof(perfstat_wpar_total_t), tot);
if (rc < 0) {
perror("Error in perfstat_wpar_total");
exit(-1);
}
for(i=0;i<tot;i++){
printf("Name of the Workload Partition=%s\n",winfo[i].name);
printf("Workload partition identifier=%u\n",winfo[i].wpar_id);
printf("Number of Virtual CPUs in partition rset=%d\n",winfo[i].online_cpus);
printf("Amount of memory currently online in Global Partition=%lld\n",winfo[i].online_memory);
printf("Number of processor units this partition is entitled to
receive=%d\n",winfo[i].entitled_proc_capacity);
printf("\n");
}
return(0);
}
perfstat_cpu_total_wpar Interface
The perfstat_cpu_total_wpar interface returns a set of structures of type perfstat_cpu_total_wpar_t,
which is defined in the libperfstat.h file.
Selected fields from the perfstat_cpu_total_wpar_t structure include:
Item Descriptor
processorHz Processor speed in Hertz (from ODM)
Description Processor type (from ODM)
Several other processor-related metrics (such as the number of system calls, reads, writes, forks,
and execs, and the load average) are also returned. For a complete list, see the
perfstat_cpu_total_wpar_t section in the libperfstat.h header file.
The following program emulates wparstat behavior and also shows an example of how
perfstat_cpu_total_wpar_t is used from the global environment:
#include <stdio.h>
#include <stdlib.h>
#include <libperfstat.h>
int main(){
perfstat_cpu_total_wpar_t *cpustats;
perfstat_id_wpar_t wparid;
perfstat_wpar_total_t *winfo;
int i,j,rc,totwpars;
/* check how many WPARs are available */
totwpars = perfstat_wpar_total(NULL, NULL, sizeof(perfstat_wpar_total_t), 0);
if (totwpars < 0) {
perror("Error in perfstat_wpar_total");
exit(-1);
}
if (totwpars == 0) {
printf("No WPARs found in the system\n");
exit(-1);
}
winfo = (perfstat_wpar_total_t *)calloc(totwpars, sizeof(perfstat_wpar_total_t));
wparid.spec = WPARNAME;
strcpy(wparid.u.wparname, FIRST_WPARNAME);
rc = perfstat_wpar_total(&wparid, winfo, sizeof(perfstat_wpar_total_t), totwpars);
if (rc <= 0) {
perror("Error in perfstat_wpar_total");
exit(-1);
}
cpustats=calloc(1,sizeof(perfstat_cpu_total_wpar_t));
rc = perfstat_cpu_total_wpar(&wparid, cpustats, sizeof(perfstat_cpu_total_wpar_t), 1);
if (rc != 1) {
perror("perfstat_cpu_total_wpar");
exit(-1);
}
for(j=0;j<rc;j++){
printf("Number of active logical processors in Global=%d\n",cpustats[j].ncpus);
printf("Processor description=%s\n",cpustats[j].description);
printf("Processor speed in Hz=%lld\n",cpustats[j].processorHZ);
printf("Number of process switches=%lld\n",cpustats[j].pswitch);
printf("Number of forks system calls executed=%lld\n",cpustats[j].sysfork);
printf("Length of the run queue=%lld\n",cpustats[j].runque);
The program displays an output that is similar to the following example output:
The following code shows an example of how perfstat_cpu_total_wpar is used from the WPAR
environment:
#include <stdio.h>
#include <stdlib.h>
#include <libperfstat.h>
int main(){
perfstat_cpu_total_wpar_t *cpustats;
perfstat_id_wpar_t wparid;
perfstat_wpar_total_t *winfo;
int i,j,rc,totwpars;
/* check how many WPARs are available */
totwpars = perfstat_wpar_total(NULL, NULL, sizeof(perfstat_wpar_total_t), 0);
if (totwpars < 0) {
perror("Error in perfstat_wpar_total");
exit(-1);
}
if (totwpars == 0) {
printf("No WPARs found in the system\n");
exit(-1);
}
winfo = (perfstat_wpar_total_t *)calloc(totwpars, sizeof(perfstat_wpar_total_t));
rc = perfstat_wpar_total(NULL, winfo, sizeof(perfstat_wpar_total_t), totwpars);
if (rc <= 0) {
perror("Error in perfstat_wpar_total");
exit(-1);
}
cpustats=calloc(1,sizeof(perfstat_cpu_total_wpar_t));
rc = perfstat_cpu_total_wpar(NULL, cpustats, sizeof(perfstat_cpu_total_wpar_t), 1);
if (rc != 1) {
perror("perfstat_cpu_total_wpar");
exit(-1);
}
for(j=0;j<rc;j++){
printf("Number of active logical processors in Global=%d\n",cpustats[j].ncpus);
printf("Processor description=%s\n",cpustats[j].description);
printf("Processor speed in Hz=%lld\n",cpustats[j].processorHZ);
printf("Number of process switches=%lld\n",cpustats[j].pswitch);
printf("Number of forks system calls executed=%lld\n",cpustats[j].sysfork);
printf("Length of the run queue=%lld\n",cpustats[j].runque);
printf("Length of the swap queue=%lld\n",cpustats[j].swpque);
}
}
}
Item Descriptor
real_total Amount of Global real memory (in units of 4 KB pages)
real_free Amount of Global free real memory (in units of 4 KB pages)
real_pinned Amount of WPAR pinned memory (in units of 4 KB pages)
Pgins Number of WPAR pages paged in
Pgouts Number of WPAR pages paged out
Several other memory-related metrics are also returned. For a complete list, see the
perfstat_memory_total_wpar_t section in the libperfstat.h header file.
The following program emulates wparstat behavior and also shows an example of how
perfstat_memory_total_wpar is used from the global environment:
#include <stdio.h>
#include <stdlib.h>
#include <libperfstat.h>
int main(){
perfstat_memory_total_wpar_t *memstats;
perfstat_id_wpar_t wparid;
perfstat_wpar_total_t *winfo;
int i,j,rc,totwpars;
/* check how many WPARs are available */
totwpars = perfstat_wpar_total(NULL, NULL, sizeof(perfstat_wpar_total_t), 0);
if (totwpars <= 0) {
perror("Error in perfstat_wpar_total");
exit(-1);
}
winfo = (perfstat_wpar_total_t *)calloc(totwpars, sizeof(perfstat_wpar_total_t));
wparid.spec = WPARNAME;
strcpy(wparid.u.wparname, FIRST_WPARNAME);
rc = perfstat_wpar_total(&wparid, winfo, sizeof(perfstat_wpar_total_t), totwpars);
if (rc <= 0) {
perror("Error in perfstat_wpar_total");
exit(-1);
}
for(i=0; i < totwpars; i++)
{
bzero(&wparid, sizeof(perfstat_id_wpar_t));
wparid.spec = WPARID;
wparid.u.wpar_id = winfo[i].wpar_id;
memstats=calloc(1,sizeof(perfstat_memory_total_wpar_t));
rc = perfstat_memory_total_wpar(&wparid, memstats, sizeof(perfstat_memory_total_wpar_t), 1);
if (rc != 1) {
perror("perfstat_memory_total_wpar");
exit(-1);
}
for(j=0;j<rc;j++){
printf("Global total real memory=%lld\n",memstats[j].real_total);
printf("Global free real memory=%lld\n",memstats[j].real_free);
printf("Real memory which is pinned=%lld\n",memstats[j].real_pinned);
The following code shows an example of how perfstat_memory_total_wpar is used from the WPAR
environment:
#include <stdio.h>
#include <stdlib.h>
#include <libperfstat.h>
int main(){
perfstat_memory_total_wpar_t *memstats;
perfstat_id_wpar_t wparid;
perfstat_wpar_total_t *winfo;
int i,j,rc,totwpars;
/* check how many WPARs are available */
totwpars = perfstat_wpar_total(NULL, NULL, sizeof(perfstat_wpar_total_t), 0);
if (totwpars <= 0) {
perror("Error in perfstat_wpar_total");
exit(-1);
}
winfo = (perfstat_wpar_total_t *)calloc(totwpars, sizeof(perfstat_wpar_total_t));
rc = perfstat_wpar_total(NULL, winfo, sizeof(perfstat_wpar_total_t), totwpars);
if (rc <= 0) {
perror("Error in perfstat_wpar_total");
exit(-1);
}
for(i=0; i < totwpars; i++)
{
bzero(&wparid, sizeof(perfstat_id_wpar_t));
wparid.spec = WPARID;
wparid.u.wpar_id = winfo[i].wpar_id;
memstats=calloc(1,sizeof(perfstat_memory_total_wpar_t));
rc = perfstat_memory_total_wpar(NULL, memstats, sizeof(perfstat_memory_total_wpar_t), 1);
if (rc != 1) {
perror("perfstat_memory_total_wpar");
exit(-1);
}
for(j=0;j<rc;j++){
printf("Global total real memory=%lld\n",memstats[j].real_total);
printf("Global free real memory=%lld\n",memstats[j].real_free);
printf("Real memory which is pinned=%lld\n",memstats[j].real_pinned);
printf("Real memory which is in use=%lld\n",memstats[j].real_inuse);
printf("Number of page faults=%lld\n",memstats[j].pgexct);
printf("Number of pages paged in=%lld\n",memstats[j].pgins);
printf("Number of pages paged out=%lld\n",memstats[j].pgouts);
}
}
}
Item Descriptor
psize Page size in bytes
real_total Amount of Global real memory (in units of the psize)
real_pinned Amount of WPAR pinned memory (in units of psize)
pgins Number of WPAR pages paged in
pgouts Number of WPAR pages paged out
Several other paging-space-related metrics (such as number of system calls, number of reads, writes,
forks, execs, and load average) are also returned. For a complete list of other paging-space-related
metrics, see the perfstat_memory_page_wpar_t section in the libperfstat.h header file.
The following program emulates vmstat behavior and also shows an example of how
perfstat_memory_page_wpar is used from the global environment:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libperfstat.h>

int main(){
    int i, psizes, rc;
    perfstat_memory_page_wpar_t *pageinfo;
    perfstat_id_wpar_t wparid;
    perfstat_psize_t psize;

    wparid.spec = WPARNAME;
    strcpy(wparid.u.wparname, "test");   /* the WPAR name "test" is illustrative */
    psize.psize = FIRST_PSIZE;

    /* Get the number of page sizes */
    psizes = perfstat_memory_page_wpar(&wparid, NULL, NULL, sizeof(perfstat_memory_page_wpar_t), 0);
    /* check for error */
    if (psizes <= 0) {
        perror("perfstat_memory_page_wpar");
        exit(-1);
    }
    /* Allocate enough memory, then retrieve the statistics for all page sizes */
    pageinfo = calloc(psizes, sizeof(perfstat_memory_page_wpar_t));
    rc = perfstat_memory_page_wpar(&wparid, &psize, pageinfo, sizeof(perfstat_memory_page_wpar_t), psizes);
    if (rc <= 0) {
        perror("perfstat_memory_page_wpar");
        exit(-1);
    }
    for (i = 0; i < rc; i++) {
        printf("Page size in bytes=%lld\n", pageinfo[i].psize);
        printf("Number of real memory frames of this page size=%lld\n", pageinfo[i].real_total);
        printf("Number of pages pinned=%lld\n", pageinfo[i].real_pinned);
        printf("Number of pages in use=%lld\n", pageinfo[i].real_inuse);
        printf("Number of page faults=%lld\n", pageinfo[i].pgexct);
        printf("Number of pages paged in=%lld\n", pageinfo[i].pgins);
        printf("Number of pages paged out=%lld\n", pageinfo[i].pgouts);
        printf("Number of page ins from paging space=%lld\n", pageinfo[i].pgspins);
        printf("Number of page outs from paging space=%lld\n", pageinfo[i].pgspouts);
        printf("Number of page scans by clock=%lld\n", pageinfo[i].scans);
        printf("Number of page steals=%lld\n", pageinfo[i].pgsteals);
    }
    return 0;
}
The following code shows an example of how perfstat_memory_page_wpar is used from the WPAR
environment:
#include <stdio.h>
#include <stdlib.h>
#include <libperfstat.h>
int main(){
    int i, psizes, rc;
    perfstat_memory_page_wpar_t *pageinfo;
    perfstat_psize_t psize;

    psize.psize = FIRST_PSIZE;

    /* Inside a WPAR the WPAR identifier is implicit, so NULL is passed as the name */
    /* Get the number of page sizes */
    psizes = perfstat_memory_page_wpar(NULL, NULL, NULL, sizeof(perfstat_memory_page_wpar_t), 0);
    /* check for error */
    if (psizes <= 0) {
        perror("perfstat_memory_page_wpar");
        exit(-1);
    }
    /* Allocate enough memory, then retrieve the statistics for all page sizes */
    pageinfo = calloc(psizes, sizeof(perfstat_memory_page_wpar_t));
    rc = perfstat_memory_page_wpar(NULL, &psize, pageinfo, sizeof(perfstat_memory_page_wpar_t), psizes);
    if (rc <= 0) {
        perror("perfstat_memory_page_wpar");
        exit(-1);
    }
    for (i = 0; i < rc; i++) {
        printf("Page size in bytes=%lld\n", pageinfo[i].psize);
        printf("Number of real memory frames of this page size=%lld\n", pageinfo[i].real_total);
        printf("Number of pages pinned=%lld\n", pageinfo[i].real_pinned);
        printf("Number of pages in use=%lld\n", pageinfo[i].real_inuse);
        printf("Number of page faults=%lld\n", pageinfo[i].pgexct);
        printf("Number of pages paged in=%lld\n", pageinfo[i].pgins);
        printf("Number of pages paged out=%lld\n", pageinfo[i].pgouts);
        printf("Number of page ins from paging space=%lld\n", pageinfo[i].pgspins);
        printf("Number of page outs from paging space=%lld\n", pageinfo[i].pgspouts);
        printf("Number of page scans by clock=%lld\n", pageinfo[i].scans);
        printf("Number of page steals=%lld\n", pageinfo[i].pgsteals);
    }
    return 0;
}
RSET Interfaces
The RSET interface reports processor metrics related to an RSET.
All of the following AIX 6.1 RSET interfaces use the naming convention perfstat_subsystem[_total]_rset,
and use a common signature:
Item Descriptor
perfstat_cpu_total_rset Retrieves processor summary metrics of the processors in an
RSET
perfstat_cpu_rset Retrieves per processor metrics of the processors in an RSET
int perfstat_cpu_rset(perfstat_id_wpar_t *name,
    perfstat_cpu_t *userbuff,
    int sizeof_struct,
    int desired_number);
int perfstat_cpu_total_rset(perfstat_id_wpar_t *name,
    perfstat_cpu_total_t *userbuff,
    int sizeof_struct,
    int desired_number);
Item Descriptor
perfstat_id_wpar_t *name Specifies the RSET identifier and the name of the first component (for
example, cpu0) for which statistics are desired. This structure contains the
specifier, which can be an RSETHANDLE, WPARID, or WPARNAME; a union
to specify the WPAR ID, the WPAR name, or the rsethandle; and a char *
field to specify the name of the first component. To start from the first
component of a subsystem, set the char * field of the name parameter to
"" (empty string). You can also use the macro FIRST_CPU defined in the
libperfstat.h file.
If there are no errors, the return value is set to the number of structures that were copied. If there are
errors, the return value is -1. The name field is either set to NULL or to the name of the next structure
available. An exception to this scheme: when name=NULL, userbuff=NULL, and desired_number=0, the
total number of structures available is returned.
To retrieve all structures of a given type, either ask first for their number, allocate enough memory to hold
them all at once, and then call the appropriate API to retrieve them all in one call; or allocate a fixed set of
structures and repeatedly call the API to get the next such number of structures, each time passing the
name returned by the previous call. Start the process with the name set to "" or FIRST_CPU, and repeat
until the name returned is equal to "".
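As a concrete illustration of the first (count-then-fetch) approach, the following minimal sketch uses the
plain perfstat_cpu interface; the RSET variants that follow use the same convention, and the fields printed
here are chosen only for illustration:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libperfstat.h>

int main(void)
{
    perfstat_id_t first;
    perfstat_cpu_t *buf;
    int total, copied, i;

    /* Step 1: ask how many perfstat_cpu_t structures are available */
    total = perfstat_cpu(NULL, NULL, sizeof(perfstat_cpu_t), 0);
    if (total <= 0) {
        perror("perfstat_cpu");
        exit(-1);
    }
    /* Step 2: allocate enough memory and fetch them all in one call */
    buf = calloc(total, sizeof(perfstat_cpu_t));
    if (buf == NULL) {
        perror("calloc");
        exit(-1);
    }
    strcpy(first.name, FIRST_CPU);   /* start from the first logical processor */
    copied = perfstat_cpu(&first, buf, sizeof(perfstat_cpu_t), total);
    if (copied <= 0) {
        perror("perfstat_cpu");
        exit(-1);
    }
    for (i = 0; i < copied; i++)
        printf("%s: user=%llu sys=%llu\n", buf[i].name, buf[i].user, buf[i].sys);
    free(buf);
    return 0;
}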
The following sections provide examples of the type of data returned and code using each of the
interfaces.
perfstat_cpu_rset interface
The perfstat_cpu_rset interface returns a set of structures of type perfstat_cpu_t, which is defined in the
libperfstat.h file.
Selected fields from the perfstat_cpu_t structure include:
Item Descriptor
name Logical processor name (cpu0, cpu1, and so on)
user Number of clock ticks spent in user mode
sys Number of clock ticks spent in system (kernel) mode
idle Number of clock ticks spent idle with no I/O pending
wait Number of clock ticks spent idle with I/O pending
syscall Number of system calls executed
Several other processor-related metrics (such as the number of forks, reads, writes, and execs) are also
returned. For a complete list of processor-related metrics, see the perfstat_cpu_t section in the
libperfstat.h header file.
The following code shows an example of how perfstat_cpu_rset is used from the global environment:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libperfstat.h>

int main(){
    int i, retcode, rsetcpus;
    perfstat_id_wpar_t wparid;
    perfstat_cpu_t *statp;

    wparid.spec = WPARNAME;
    strcpy(wparid.u.wparname, "test");   /* the WPAR name "test" is illustrative */

    /* Get the number of processors in the WPAR's RSET */
    rsetcpus = perfstat_cpu_rset(&wparid, NULL, sizeof(perfstat_cpu_t), 0);
    if (rsetcpus < 0) {
        perror("perfstat_cpu_rset");
        exit(-1);
    }
    statp = calloc(rsetcpus, sizeof(perfstat_cpu_t));
    if (!statp) {
        perror("calloc");
        exit(-1);
    }
    retcode = perfstat_cpu_rset(&wparid, statp, sizeof(perfstat_cpu_t), rsetcpus);
    if (retcode < 0) {
        perror("perfstat_cpu_rset");
        exit(-1);
    }
    for (i = 0; i < retcode; i++) {
        printf("Logical processor name=%s\n", statp[i].name);
        printf("Raw number of clock ticks spent in user mode=%lld\n", statp[i].user);
        printf("Raw number of clock ticks spent in system mode=%lld\n", statp[i].sys);
        printf("Raw number of clock ticks spent in idle mode=%lld\n", statp[i].idle);
        printf("Raw number of clock ticks spent in wait mode=%lld\n", statp[i].wait);
    }
    return 0;
}
The following code shows an example of how perfstat_cpu_rset is used from the WPAR environment:
#include <stdio.h>
#include <stdlib.h>
#include <libperfstat.h>
int main(){
    int i, retcode, rsetcpus;
    perfstat_cpu_t *statp;

    /* Inside a WPAR the identifier is implicit, so NULL is passed as the name */
    rsetcpus = perfstat_cpu_rset(NULL, NULL, sizeof(perfstat_cpu_t), 0);
    if (rsetcpus < 0) {
        perror("perfstat_cpu_rset");
        exit(-1);
    }
    statp = calloc(rsetcpus, sizeof(perfstat_cpu_t));
    if (!statp) {
        perror("calloc");
        exit(-1);
    }
    retcode = perfstat_cpu_rset(NULL, statp, sizeof(perfstat_cpu_t), rsetcpus);
    if (retcode < 0) {
        perror("perfstat_cpu_rset");
        exit(-1);
    }
    for (i = 0; i < retcode; i++) {
        printf("Logical processor name=%s\n", statp[i].name);
        printf("Raw number of clock ticks spent in user mode=%lld\n", statp[i].user);
        printf("Raw number of clock ticks spent in system mode=%lld\n", statp[i].sys);
        printf("Raw number of clock ticks spent in idle mode=%lld\n", statp[i].idle);
        printf("Raw number of clock ticks spent in wait mode=%lld\n", statp[i].wait);
    }
    return 0;
}
perfstat_cpu_total_rset interface
The perfstat_cpu_total_rset interface returns a set of structures of type perfstat_cpu_total_t, which is
defined in the libperfstat.h file.
Selected fields from the perfstat_cpu_total_t structure include:
Item Descriptor
processorHZ Processor speed in Hertz (from ODM)
description Processor type (from ODM)
ncpus Current number of active processors
ncpus_cfg Number of configured processors (maximum number of processors that this
copy of AIX can handle simultaneously)
ncpus_high Maximum number of active processors; that is, the maximum number of
active processors since the last reboot
user Total number of clock ticks spent in user mode
sys Total number of clock ticks spent in system (kernel) mode
idle Total number of clock ticks spent idle with no I/O pending
wait Total number of clock ticks spent idle with I/O pending
Several other processor-related metrics (such as the number of forks, reads, writes, and execs) are also
returned. For a complete list of processor-related metrics, see the perfstat_cpu_total_t section
in the libperfstat.h header file.
The following code shows an example of how the perfstat_cpu_total_rset interface is used from the
global environment:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libperfstat.h>

int main(){
    perfstat_cpu_total_t *cpustats;
    perfstat_id_wpar_t wparid;
    int rc, i;

    /* Get the number of perfstat_cpu_total_t structures available */
    rc = perfstat_cpu_total_rset(NULL, NULL, sizeof(perfstat_cpu_total_t), 0);
if (rc <= 0) {
perror("perfstat_cpu_total_rset");
exit(-1);
}
cpustats=calloc(rc,sizeof(perfstat_cpu_total_t));
if(cpustats==NULL){
perror("MALLOC error:");
exit(-1);
}
wparid.spec = WPARNAME;
strcpy(wparid.u.wparname,"test");
rc = perfstat_cpu_total_rset(&wparid, cpustats, sizeof(perfstat_cpu_total_t), rc);
if (rc <= 0) {
perror("perfstat_cpu_total_rset");
exit(-1);
}
for(i=0;i<rc;i++){
printf("Number of active logical processors=%d\n",cpustats[i].ncpus);
printf("Number of configured processors=%d\n",cpustats[i].ncpus_cfg);
printf("Processor description=%s\n",cpustats[i].description);
printf("Processor speed in Hz=%lld\n",cpustats[i].processorHZ);
printf("Raw total number of clock ticks spent in user mode=%lld\n",cpustats[i].user);
printf("Raw total number of clock ticks spent in system mode=%lld\n",cpustats[i].sys);
printf("Raw total number of clock ticks spent idle=%lld\n",cpustats[i].idle);
printf("Raw total number of clock ticks spent wait=%lld\n",cpustats[i].wait);
}
return 0;
}
The following code shows an example of how perfstat_cpu_total_rset is used from the WPAR
environment:
#include <stdio.h>
#include <stdlib.h>
#include <libperfstat.h>
int main(){
perfstat_cpu_total_t *cpustats;
perfstat_id_wpar_t wparid;
int rc,i;
rc = perfstat_cpu_total_rset(NULL,NULL,sizeof(perfstat_cpu_total_t),0);
if (rc <= 0) {
perror("perfstat_cpu_total_rset");
exit(-1);
}
cpustats=calloc(rc,sizeof(perfstat_cpu_total_t));
if(cpustats==NULL){
perror("MALLOC error:");
exit(-1);
}
/* Retrieve the statistics; inside a WPAR the identifier is implicit */
rc = perfstat_cpu_total_rset(NULL, cpustats, sizeof(perfstat_cpu_total_t), rc);
if (rc <= 0) {
    perror("perfstat_cpu_total_rset");
    exit(-1);
}
for (i = 0; i < rc; i++) {
    printf("Number of active logical processors=%d\n", cpustats[i].ncpus);
    printf("Number of configured processors=%d\n", cpustats[i].ncpus_cfg);
    printf("Processor description=%s\n", cpustats[i].description);
    printf("Processor speed in Hz=%lld\n", cpustats[i].processorHZ);
    printf("Raw total number of clock ticks spent in user mode=%lld\n", cpustats[i].user);
    printf("Raw total number of clock ticks spent in system mode=%lld\n", cpustats[i].sys);
    printf("Raw total number of clock ticks spent idle=%lld\n", cpustats[i].idle);
    printf("Raw total number of clock ticks spent wait=%lld\n", cpustats[i].wait);
}
return 0;
}
You can use the following AIX interfaces to refresh the cached metrics:
Parameter Usage
char *name Identifies the name of the component of the cached metric
that must be reset from the libperfstat API cache. If the value
of the parameter is NULL, this signifies all of the components.
u_longlong_t resetmask Identifies the category of the component if the value of the
name parameter is not NULL. The possible values are:
• FLUSH_CPUTOTAL
• FLUSH_DISK
• RESET_DISK_MINMAX
• FLUSH_DISKADAPTER
• FLUSH_DISKPATH
• FLUSH_NETINTERFACE
• FLUSH_PAGINGSPACE
• FLUSH_LOGICALVOLUME
• FLUSH_VOLUMEGROUP
If the value of the name parameter is NULL,
the resetmask parameter value consists of a
combination of values. For example: RESET_DISK_MINMAX|
FLUSH_CPUTOTAL|FLUSH_DISK
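As a brief sketch (using the perfstat_partial_reset signature implied by the parameters above), a
combined flush of the cached disk metrics and reset of the disk min/max counters looks like this:
#include <stdio.h>
#include <libperfstat.h>

int main(void)
{
    /* Flush every cached disk metric and reset the system's disk min/max counters */
    if (perfstat_partial_reset(NULL, FLUSH_DISK | RESET_DISK_MINMAX) < 0)
        perror("perfstat_partial_reset");
    return 0;
}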
perfstat_partial_reset Interface
The perfstat_partial_reset interface resets the specified cached metrics that are stored by the
libperfstat API.
The perfstat_partial_reset interface can also reset the system's minimum and maximum
counters related to disks and paths.
You can see how to use the perfstat_partial_reset interface in the following example code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libperfstat.h>

int main(){
    int i, retcode;
    perfstat_id_t first;
    perfstat_disk_t *statp;

    /* Get the number of disks and allocate enough memory for their metrics */
    retcode = perfstat_disk(NULL, NULL, sizeof(perfstat_disk_t), 0);
    statp = calloc(retcode, sizeof(perfstat_disk_t));

    /* At this point, we assume the disk free part changes due to chfs for example */
    /* if we get disk metrics here, the free field will be wrong as it was
     * cached by the libperfstat.
     */

    /* Flush the cached disk metrics, then query fresh values */
    perfstat_partial_reset(NULL, FLUSH_DISK);
    strcpy(first.name, FIRST_DISK);
    retcode = perfstat_disk(&first, statp, sizeof(perfstat_disk_t), retcode);

    for (i = 0; i < retcode; i++) {
        printf("Name of the disk=%s\n", statp[i].name);
        printf("Disk description=%s\n", statp[i].description);
        printf("Volume group name=%s\n", statp[i].vgname);
        printf("Size of the disk=%lld\n", statp[i].size);
        printf("Free portion of the disk=%lld\n", statp[i].free);
        printf("Disk block size=%lld\n", statp[i].bsize);
    }
    return 0;
}
The following common signature, in which subsystem stands for the specific subsystem name, is used by
the perfstat_subsystem_node interfaces, except for the perfstat_memory_page_node interface:
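int perfstat_subsystem_node(perfstat_id_node_t *name,
    perfstat_subsystem_t *userbuff,
    int sizeof_struct,
    int desired_number);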
The following table describes the usage of the parameters of the perfstat_subsystem_node interface:
Item Descriptor
perfstat_id_node_t *name Specifies the name of the node in the name->u.nodename field. The name
field must contain the name of the first component, for example, hdisk2 for
perfstat_disk_node(), where hdisk2 is the name of the disk for which you
require the statistics.
Note: When you specify a node name, the spec field must be initialized to NODENAME.
perfstat_subsystem_t *userbuff Points to a memory area that has enough space for the returned structures.
int sizeof_struct Set this parameter to sizeof(perfstat_subsystem_t).
int desired_number Specifies the number of structures of type perfstat_subsystem_t to copy
to userbuff.
The perfstat_subsystem_node interfaces return -1 on error; otherwise, they return the number of
structures copied. The name field is set to the name of the next available structure. As an exceptional
case, when userbuff equals NULL and desired_number equals 0, the total number of structures
available is returned.
The following example shows the usage of the perfstat_disk_node interface:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/param.h>
#include <libperfstat.h>
#define INTERVAL_DEFAULT 2
#define COUNT_DEFAULT 10

/* The node name is assumed to be supplied as the first command-line argument */
int main(int argc, char *argv[]){
    int i, ret, tot;
    int collect_remote_node_stats = (argc > 1);
    perfstat_id_node_t nodeid;
    perfstat_disk_t *statp;

    if(collect_remote_node_stats)
    {
        /* perfstat_config needs to be called to enable cluster statistics collection */
        ret = perfstat_config(PERFSTAT_ENABLE|PERFSTAT_CLUSTER_STATS, NULL);
        if (ret == -1)
        {
            perror("cluster statistics collection is not available");
            exit(-1);
        }
    }
    if(collect_remote_node_stats)
    {
        strncpy(nodeid.u.nodename, argv[1], MAXHOSTNAMELEN);
        nodeid.spec = NODENAME;   /* Remember nodename is already set */
        /* Get the number of disks on that node */
        tot = perfstat_disk_node(&nodeid, NULL, sizeof(perfstat_disk_t), 0);
        statp = calloc(tot, sizeof(perfstat_disk_t));
        /* Now set name to first interface */
        strcpy(nodeid.name, FIRST_DISK);
        ret = perfstat_disk_node(&nodeid, statp, sizeof(perfstat_disk_t), tot);
        for (i = 0; i < ret; i++)
            printf("disk=%s size=%llu MB free=%llu MB\n",
                   statp[i].name, statp[i].size, statp[i].free);
    }
    if(collect_remote_node_stats) {
        /* Now disable cluster statistics by calling perfstat_config */
        perfstat_config(PERFSTAT_DISABLE|PERFSTAT_CLUSTER_STATS, NULL);
    }
    return 0;
}
The following program emulates vmstat behavior and shows an example of using the
perfstat_memory_total_node interface to retrieve the virtual memory details of the remote node:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/param.h>
#include <libperfstat.h>
#define INTERVAL_DEFAULT 2
#define COUNT_DEFAULT 10

/* The node name is assumed to be supplied as the first command-line argument */
int main(int argc, char *argv[]){
    int rc;
    int collect_remote_node_stats = (argc > 1);
    char nodename[MAXHOSTNAMELEN];
    perfstat_id_node_t nodeid;
    perfstat_memory_total_t minfo;

    if (collect_remote_node_stats)
        strncpy(nodename, argv[1], MAXHOSTNAMELEN);
if(collect_remote_node_stats)
{
/* perfstat_config needs to be called to enable cluster statistics collection */
rc = perfstat_config(PERFSTAT_ENABLE|PERFSTAT_CLUSTER_STATS, NULL);
if (rc == -1)
{
perror("cluster statistics collection is not available");
exit(-1);
}
}
if(collect_remote_node_stats)
{
strncpy(nodeid.u.nodename, nodename, MAXHOSTNAMELEN);
nodeid.spec = NODENAME;
rc = perfstat_memory_total_node(&nodeid, &minfo, sizeof(perfstat_memory_total_t), 1);
}
else
{
rc = perfstat_memory_total(NULL, &minfo, sizeof(perfstat_memory_total_t), 1);
}
if (rc != 1) {
perror("perfstat_memory_total");
exit(-1);
}
printf("Memory statistics\n");
printf("-----------------\n");
printf("real memory size : %llu MB\n",
minfo.real_total*4096/1024/1024);
printf("reserved paging space : %llu MB\n",minfo.pgsp_rsvd);
printf("virtual memory size : %llu MB\n",
minfo.virt_total*4096/1024/1024);
printf("number of free pages : %llu\n",minfo.real_free);
printf("number of pinned pages : %llu\n",minfo.real_pinned);
printf("number of pages in file cache : %llu\n",minfo.numperm);
printf("total paging space pages : %llu\n",minfo.pgsp_total);
printf("free paging space pages : %llu\n", minfo.pgsp_free);
printf("used paging space : %3.2f%%\n",
(float)(minfo.pgsp_total-minfo.pgsp_free)*100.0/
(float)minfo.pgsp_total);
printf("number of paging space page ins : %llu\n",minfo.pgspins);
printf("number of paging space page outs : %llu\n",minfo.pgspouts);
printf("number of page ins : %llu\n",minfo.pgins);
printf("number of page outs : %llu\n",minfo.pgouts);
if(collect_remote_node_stats) {
/* Now disable cluster statistics by calling perfstat_config */
perfstat_config(PERFSTAT_DISABLE|PERFSTAT_CLUSTER_STATS, NULL);
}
}
The program displays an output that is similar to the following example output:
Memory statistics
-----------------
real memory size : 4096 MB
reserved paging space : 512 MB
virtual memory size : 4608 MB
number of free pages : 768401
number of pinned pages : 237429
number of pages in file cache : 21473
total paging space pages : 131072
free paging space pages : 128821
used paging space : 1.72%
number of paging space page ins : 0
The perfstat_cluster_total interface returns a structure of type perfstat_cluster_total_t, which is defined
in the libperfstat.h file. Selected fields from the perfstat_cluster_total_t structure include:
Item Descriptor
name Specifies the name of the cluster.
type Specifies the set of bits that describes the cluster.
num_nodes Specifies the number of nodes in the cluster.
node_data Points to a memory area that describes the details of all the nodes.
num_disks Specifies the number of disks in the cluster.
disk_data Points to a memory area that describes the details of all the disks.
For a complete list of parameters related to the perfstat_cluster_total_t structure, see the
libperfstat.h header file.
The following code example shows the usage of the perfstat_cluster_total interface:
#include <stdio.h>
#include <libperfstat.h>
typedef enum {
DISPLAY_DEFAULT = 0,
DISPLAY_NODE_DATA = 1,
DISPLAY_DISK_DATA = 2
} display_t;
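Because only a fragment of the original listing survives, here is a minimal sketch of a
perfstat_cluster_total call; the CLUSTERNAME specifier and the FIRST_CLUSTERNAME macro are
assumed to be defined in the libperfstat.h file:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/param.h>
#include <libperfstat.h>

int main(void)
{
    perfstat_id_node_t nodeid;
    perfstat_cluster_total_t cstats;
    int rc;

    /* Cluster statistics collection must be enabled first */
    rc = perfstat_config(PERFSTAT_ENABLE|PERFSTAT_CLUSTER_STATS, NULL);
    if (rc == -1) {
        perror("perfstat_config");
        exit(-1);
    }

    /* FIRST_CLUSTERNAME requests the first (local) cluster */
    strncpy(nodeid.u.nodename, FIRST_CLUSTERNAME, MAXHOSTNAMELEN);
    nodeid.spec = CLUSTERNAME;
    rc = perfstat_cluster_total(&nodeid, &cstats, sizeof(perfstat_cluster_total_t), 1);
    if (rc != 1) {
        perror("perfstat_cluster_total");
        exit(-1);
    }
    printf("cluster name=%s nodes=%d disks=%d\n",
           cstats.name, cstats.num_nodes, cstats.num_disks);

    perfstat_config(PERFSTAT_DISABLE|PERFSTAT_CLUSTER_STATS, NULL);
    return 0;
}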
The perfstat_node_list interface is used to retrieve the list of nodes in the perfstat_node_t
structure, which is defined in the libperfstat.h file. The following selected fields are from the
perfstat_node_t structure:
Item Descriptor
nodeid Specifies the identifier of the node.
nodename Specifies the name of the node.
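A minimal sketch of using the perfstat_node_list interface follows; the FIRST_NODENAME starting
specifier is assumed to be defined in the libperfstat.h file, and only the two fields described above are
printed:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/param.h>
#include <libperfstat.h>

int main(void)
{
    int i, num;
    perfstat_id_node_t nodeid;
    perfstat_node_t *node_list;

    /* Cluster statistics collection must be enabled first */
    perfstat_config(PERFSTAT_ENABLE|PERFSTAT_CLUSTER_STATS, NULL);

    strncpy(nodeid.u.nodename, FIRST_NODENAME, MAXHOSTNAMELEN);
    nodeid.spec = NODENAME;

    /* Ask for the number of nodes, then fetch the whole list */
    num = perfstat_node_list(&nodeid, NULL, sizeof(perfstat_node_t), 0);
    if (num <= 0) {
        perror("perfstat_node_list");
        exit(-1);
    }
    node_list = calloc(num, sizeof(perfstat_node_t));
    num = perfstat_node_list(&nodeid, node_list, sizeof(perfstat_node_t), num);
    for (i = 0; i < num; i++)
        printf("node %llu: %s\n",
               (unsigned long long)node_list[i].nodeid, node_list[i].nodename);

    perfstat_config(PERFSTAT_DISABLE|PERFSTAT_CLUSTER_STATS, NULL);
    return (0);
}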
Interface changes
Beginning with the following filesets, the rblks and wblks fields of libperfstat are expressed in blocks of
512 bytes in the perfstat_disk_total_t, perfstat_diskadapter_t, and perfstat_diskpath_t structures,
regardless of the actual block size used by the device for which metrics are being retrieved.
• bos.perf.libperfstat 4.3.3.4
• bos.perf.libperfstat 5.1.0.50
• bos.perf.libperfstat 5.2.0.10
Interface additions
Review the specific interfaces that are available for a fileset.
The following interfaces were added in the bos.perf.libperfstat 5.2.0 file set:
• perfstat_netbuffer
• perfstat_protocol
• perfstat_pagingspace
• perfstat_diskadapter
• perfstat_reset
The perfstat_diskpath interface was added in the bos.perf.libperfstat 5.2.0.10 file set.
The perfstat_partition_total interface was added in the bos.perf.libperfstat 5.3.0.0 file set.
The perfstat_partial_reset interface was added in the bos.perf.libperfstat 5.3.0.10 file set.
The following interfaces were added in the bos.perf.libperfstat 6.1.2 file set:
• perfstat_cpu_total_wpar
• perfstat_memory_total_wpar
• perfstat_cpu_total_rset
• perfstat_cpu_rset
Field additions
The following additions have been made to the specified file set levels.
u_longlong_t bread
u_longlong_t bwrite
u_longlong_t lread
u_longlong_t lwrite
u_longlong_t phread
u_longlong_t phwrite
u_longlong_t bread
u_longlong_t bwrite
u_longlong_t lread
u_longlong_t lwrite
u_longlong_t phread
u_longlong_t phwrite
u_longlong_t iget
u_longlong_t namei
u_longlong_t dirblk
u_longlong_t msg
u_longlong_t sema
The name field which returns the logical processor name is now of the form cpu0, cpu1, instead of proc0,
proc1 as it was in previous releases.
The following fields were added to perfstat_cpu_total_t:
u_longlong_t runocc
u_longlong_t swpocc
u_longlong_t iget
u_longlong_t namei
u_longlong_t dirblk
u_longlong_t msg
u_longlong_t sema
u_longlong_t rcvint
u_longlong_t xmtint
u_longlong_t mdmint
u_longlong_t tty_rawinch
u_longlong_t tty_caninch
u_longlong_t tty_rawoutch
u_longlong_t ksched
u_longlong_t koverf
u_longlong_t kexit
u_longlong_t rbread
u_longlong_t rcread
u_longlong_t rbwrt
u_longlong_t rcwrt
u_longlong_t traps
int ncpus_high
char adapter[IDENTIFIER_LENGTH]
u_longlong_t bitrate
u_longlong_t real_system
u_longlong_t real_user
u_longlong_t real_process
uint paths_count
u_longlong_t puser
u_longlong_t psyss
u_longlong_t pidle
u_longlong_t pwait
u_longlong_t redisp_sd0
u_longlong_t redisp_sd1
u_longlong_t redisp_sd2
u_longlong_t redisp_sd3
u_longlong_t redisp_sd4
u_longlong_t redisp_sd5
u_longlong_t migration_push
u_longlong_t migration_S3grq
u_longlong_t migration_S3pul
u_longlong_t invol_cswitch
u_longlong_t vol_cswitch
u_longlong_t runque
u_longlong_t bound
u_longlong_t decrintrs
u_longlong_t mpcrintrs
u_longlong_t mpcsintrs
u_longlong_t devintrs
u_longlong_t softintrs
u_longlong_t phantintrs
u_longlong_t puser
u_longlong_t psys
u_longlong_t pidle
u_longlong_t pwait
u_longlong_t decrintrs
u_longlong_t mpcrintrs
u_longlong_t mpcsintrs
u_longlong_t phantintrs
u_longlong_t q_full
u_longlong_t rserv
u_longlong_t rtimeout
u_longlong_t rfailed
u_longlong_t min_rserv
u_longlong_t max_rserv
u_longlong_t wserv
u_longlong_t wtimeout
u_longlong_t wfailed
u_longlong_t min_wserv
u_longlong_t max_wserv
u_longlong_t wq_depth
u_longlong_t wq_sampled
u_longlong_t wq_time
u_longlong_t wq_min_time
u_longlong_t wq_max_time
u_longlong_t q_sampled
perfstat_disk_t
perfstat_disk_total_t
perfstat_diskadapter_t
perfstat_diskpath_t
#define FLUSH_CPUTOTAL
#define FLUSH_DISK
#define RESET_DISK_MINMAX
#define FLUSH_DISKADAPTER
#define FLUSH_DISKPATH
#define FLUSH_PAGINGSPACE
#define FLUSH_NETINTERFACE
u_longlong_t reserved_pages
u_longlong_t reserved_pagesize
u_longlong_t idle_donated_purr
u_longlong_t idle_donated_spurr
u_longlong_t busy_donated_purr
u_longlong_t busy_donated_spurr
u_longlong_t idle_stolen_purr
u_longlong_t idle_stolen_spurr
u_longlong_t busy_stolen_purr
u_longlong_t busy_stolen_spurr
unsigned donate_capable
unsigned donate_enabled
u_longlong_t version
Structure additions
Review the specific structure additions that are available for different file sets.
The following structures are added in the bos.perf.libperfstat 6.1.2.0 file set:
perfstat_cpu_total_wpar_t
perfstat_cpu_total_rset_t
perfstat_cpu_rset_t
perfstat_wpar_total_t
perfstat_tape_t
perfstat_tape_total_t
perfstat_memory_page_t
perfstat_memory_page_wpar_t
perfstat_logicalvolume_t
perfstat_volumegroup_t
The following structures are added in the bos.perf.libperfstat 6.1.6.0 file set:
The following structures are added in the bos.perf.libperfstat 6.1.7.0 file set:
perfstat_hfistat_t
perfstat_hfistat_window_t
Kernel tuning
You can make permanent kernel-tuning changes without having to edit any rc files. This is achieved by
centralizing the reboot values for all tunable parameters in the /etc/tunables/nextboot stanza file. When
a system is rebooted, the values in the /etc/tunables/nextboot file are automatically applied.
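For illustration, the /etc/tunables/nextboot file is a stanza file with one stanza per tuning command; the
following hypothetical excerpt shows the general shape (the parameter values are made-up examples):
schedo:
        pacefork = "10"
vmo:
        minperm% = "5"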
The following commands are used to manipulate the nextboot file and other files containing a set of
tunable parameter values:
• The tunchange command is used to change values in a stanza file.
• The tunsave command is used to save values to a stanza file.
• The tunrestore command is used to apply a file; that is, to change all tunable parameter values to those
listed in a file.
• The tuncheck command must be used to validate a file created manually.
• The tundefault command is available to reset tunable parameters to their default values.
The preceding commands work on both current and reboot values.
All six tuning commands (no, nfso, vmo, ioo, raso, and schedo) use a common syntax and are available to
directly manipulate the tunable parameter values. Available options include making permanent changes
and displaying detailed help on each of the parameters that the command manages. A large majority of
tunable parameter values cannot be modified when the login session is initiated outside of the global
WPAR partition. Attempts to modify such a read-only tunable parameter value are refused by the
command, and a diagnostic message is written to standard error output.
A SMIT panel is also available to manipulate the current and reboot values for all tuning parameters, as
well as the files in the /etc/tunables directory.
Related information
bosboot command
no command
tunables command
-p When used in combination with -o, -d or -D, makes changes apply to both current
and reboot values; that is, turns on the updating of the /etc/tunables/nextboot
file in addition to the updating of the current value. This flag cannot be used
on Reboot and Bosboot type parameters because their current value cannot be
changed.
When used with -a or -o flag without specifying a new value, values are displayed
only if the current and next boot values for a parameter are the same. Otherwise,
NONE is displayed as the value.
-r When used in combination with -o, -d or -D flags, makes changes apply to reboot
values only; that is, turns on the updating of the /etc/tunables/nextboot file.
If any parameter of type Bosboot is changed, the user will be prompted to run
bosboot.
When used with -a or -o without specifying a new value, next boot values for
tunables are displayed instead of current values.
-x [tunable] Lists the characteristics of one or all tunables, one per line, using the following
comma-separated format:
tunable,current,default,reboot,min,max,unit,type,{dtunable }
-L [tunable] Lists the characteristics of one or all tunables, one per line, using the following
format:
NAME CUR DEF BOOT MIN MAX UNIT TYPE
DEPENDENCIES
where:
CUR = current value
DEF = default value
BOOT = reboot value
MIN = minimal value
MAX = maximum value
UNIT = tunable unit of measure
TYPE = parameter type: D (for Dynamic), S (for Static), R (for Reboot), B (for Bosboot),
M (for Mount), I (for Incremental), C (for Connect), and d (for Deprecated)
DEPENDENCIES = list of dependent tunable parameters, one per line
Any change (with -o, -d or -D) to a restricted tunable parameter will result in a message being displayed
to warn the user that a tunable of the restricted use type has been modified and, if the -r or -p options are
also specified on the command line, the user will be prompted for confirmation of the change. In addition,
at system reboot, the presence of restricted tunables modified to a value different from their default using
a command line specifying the -r or -p options will cause the addition of an error log entry identifying the
list of these modified tunables.
Any change (with the -o, -d, or -D flags) to a parameter of type Mount will result in a message being
displayed to warn the user that the change is only effective for future mountings.
Any change (with the -o, -d, or -D flags) to a parameter of type Connect will result in the inetd daemon
being restarted, and a message will be displayed to warn the user that the change is only effective for
future socket connections.
Any attempt to change (with -o, -d or -D flags ) a parameter of type Bosboot or Reboot without -r, will
result in an error message.
tunchange Command
The tunchange command is used to update one or more tunable stanzas in a file.
The following is the syntax for the tunchange command:
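tunchange -f Filename ( -t Stanza ( {-o Parameter=Value} | -D ) | -m Filename2 )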
The following is an example of how to unconditionally update the pacefork parameter in the /etc/
tunables/nextboot file. This should be done with caution because no warning will be printed if a
parameter of type bosboot was changed.
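Assuming the syntax above (the stanza is schedo because pacefork is a schedo tunable), the command
would be:
tunchange -f nextboot -t schedo -o pacefork=10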
The following is an example of how to clear the schedo stanza in the nextboot file.
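Assuming the syntax above, clearing the stanza uses the -D flag:
tunchange -f nextboot -t schedo -D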
The following is an example of how to merge the /home/admin/schedo_conf file with the current
nextboot file. If the file to merge contains multiple entries for a parameter, only the first entry will be
applied. If both files contain an entry for the same tunable, the entry from the file to merge will replace
the current nextboot file's value.
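Assuming the syntax above, the merge uses the -m flag:
tunchange -f nextboot -m /home/admin/schedo_conf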
The tunchange command is called by the tuning commands to implement the -p and -r flags using -f
nextboot.
tuncheck Command
The tuncheck command is used to validate a file.
The syntax of the tuncheck command is as follows:
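tuncheck [ -r [ -K ] | -p ] -f Filename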
The following examples validate a tunables file for usage on current values (no flag), on reboot values (-r
flag), or on reboot and Live Update values (-r and -K flags):
tuncheck -f mytunable
tuncheck -r -f nextboot
tuncheck -r -f /home/bill/my_nextboot
tuncheck -r -K -f nextliveupdate
tuncheck -r -K -f /home/bill/my_nextliveupdate
All tunable parameters in the specified nextliveupdate or my_nextliveupdate file are checked for range
and dependencies. If a problem is detected, a message similar to Parameter X is out of range or
Dependency problem between parameter A and B is issued. The -r and -p options control the
values that are used in dependency checking for parameters that are not listed in the file, as well as the
handling of proposed changes to parameters of type Incremental, Bosboot, and Reboot. When you use
the -K flag with the -r flag, the tuncheck command validates the file that contains the tunable parameter
values. The validated file is used during the next boot and Live Update operations.
Except when used with the -r option, checking is performed on parameters of type Incremental to make
sure that the value in the file is not less than the current value. If one or more parameters of type Bosboot
are listed in the file with a value different from the current value, the user is either prompted to run
bosboot (when -r is used) or an error message is displayed.
Parameters having dependencies are checked for compatible values. When one or more parameters in a
set of interdependent parameters are not listed in the file being checked, their values are assumed to be
either their current values (when the tuncheck command is called without -p or -r) or their default values.
This is because, when called without -r, the file is validated for applicability to the current values, while
with -r it is validated for use during reboot, when parameters not listed in the file are left at their default
values. Calling this command with -p is the same as calling it twice: once with no argument, and once with
the -r flag. This checks whether a file can be used both immediately and at reboot time.
Note: Users creating a file with an editor, or copying a file from another machine, must run the tuncheck
command to validate their file.
tunrestore Command
The tunrestore command is used to restore all the parameters from a file.
The following is the syntax for the tunrestore command:
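tunrestore [ -r [ -K ] ] -f Filename
tunrestore -R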
The following examples change the current values (first two commands), the reboot values (-r flag), or the
reboot and Live Update values (-r and -K flags) for all tunable parameters present in the file, if ranges,
dependencies, and incremental parameter rules are all satisfied:
tunrestore -f mytunable
tunrestore -f /etc/tunables/mytunable
tunrestore -r -f mytunable
tunrestore -r -K -f mytunable
If changes to parameters of type bosboot are detected, the user will be prompted to run the bosboot
command.
The following command can only be called from the /etc/inittab file and changes tunable parameters to
values from the /etc/tunables/nextboot file.
tunrestore -R
Any problem found or change made is logged in the /etc/tunables/lastboot.log file. A new /etc/
tunables/lastboot file is always created with the list of current values for all parameters. Any change
to restricted tunable parameters from their default values will cause the addition of an error log entry
identifying the list of these modified tunable parameters.
If filename does not exist, an error message is displayed. If the nextboot file does not exist, an error
message is displayed when you use the -r flag. If the nextliveupdate file does not exist, an error message
is displayed when you use the -K flag. If you use the -R flag, all the tuning parameters of a type other
than bosboot are set to their default value, and a nextboot file containing only an info stanza is
created. A warning is also logged in the lastboot.log file.
Except when -r is used, parameters requiring a call to bosboot and a reboot are not changed, but an error
message is displayed to indicate they could not be changed. When -r is used, if any parameter of type
bosboot needs to be changed, the user will be prompted to run bosboot. Parameters missing from the
file are simply left unchanged, except when -R is used, in which case missing parameters are set to their
default values. If the file contains multiple entries for a parameter, only the first entry will be applied, and
a warning will be displayed or logged (if called with -R).
tunsave Command
The tunsave command is used to save current tunable parameter values into a file.
The following is the syntax for the tunsave command:
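tunsave [ -a | -A ] { -f | -F } Filename [ -d Description ]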
For example, the following saves all of the current tunable parameter values that are different from their
default into the /etc/tunables/mytunable file.
tunsave -f mytunable
If the file already exists, an error message is printed instead. The -F flag must be used to overwrite an
existing file.
For example, the following saves all of the current tunable parameter values different from their default
into the /etc/tunables/nextboot file.
tunsave -f nextboot
If necessary, the tunsave command will prompt the user to run bosboot.
For example, the following commands save all of the current tunable parameter values, including the
parameters that are at their default values, into the mytunable file.
tunsave -A -f mytunable
tunsave -a -f mytunable
tunsave -a -f ./mytunable
With the -a flag, a line using the keyword DEFAULT is put in the file for each parameter that is set to its
default value. This essentially saves only the current changed values, while forcing all the other
parameters to their default values, which permits you to return to a known setup later using the
tunrestore command.
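Under that convention, a file saved with the -a flag would contain stanzas like the following hypothetical
excerpt, in which pacefork had been tuned but maxspin was still at its default:
schedo:
        pacefork = "10"
        maxspin = "DEFAULT"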
tundefault Command
The tundefault command is used to force all tuning parameters to be reset to their default value. The
-p flag makes changes permanent, while the -r flag defers changes until the next reboot operation. The
-K flag defers changes until the next Live Update operation.
The following is the syntax for the tundefault command:
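tundefault [ -p | -r [ -K ] ]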
The following example resets all tunable parameters to their default value, except the parameters of type
Bosboot and Reboot, and parameters of type Incremental set at values bigger than their default value.
tundefault
Error messages will be displayed for any parameter change that is not permitted.
The following example resets all the tunable parameters to their default value. It also updates the /etc/
tunables/nextboot file, and if necessary, offers to run bosboot, and displays a message warning that
rebooting is needed for all the changes to be effective.
tundefault -p
This command permanently resets all tunable parameters to their default values, returning the system to
a consistent state and making sure the state is preserved after the next reboot.
The following example clears all the stanzas in the /etc/tunables/nextboot file, and proposes bosboot if
necessary.
tundefault -r
When you use the -K flag with the -r flag as shown in the following example, the tundefault
command resets all tunable parameters in the specified nextboot and nextliveupdate files to its
default value. The -r flag clears the stanzas in the /etc/tunables/nextboot file, and prompts for
a bosboot operation if necessary. The -K flag sets the tunable parameters in the /etc/tunables/
nextliveupdate file to the default values if a non-default value was set.
tundefault -r -K
This command sets the reboot value of all tunable parameters to their default. For more information
about migration from a previous version of AIX and the compatibility mode that is automatically set up in
case of migration, see the Files Reference guide.
Recovery Procedure
If the machine becomes unstable with a given nextboot file, users should put the system into
maintenance mode, make sure the sys0 pre520tune attribute is set to disable, delete the nextboot file,
run the bosboot command, and reboot.

Select Save/Restore All Kernel & Network Parameters to manipulate all tuning parameter values at the
same time. To individually change tuning parameters managed by one of the tuning commands, select
any of the other lines.
Each of the options in this panel are explained in the following sections.
View Last Boot Parameters
All last boot parameters are listed stanza by stanza, retrieved from the /etc/tunables/lastboot file.
View Last Boot Log File
Displays the content of the file /etc/tunables/lastboot.log.
Save All Current Parameters for Next Boot
After selecting yes and pressing ENTER, all the current tuning parameter values are saved in the /etc/
tunables/nextboot file. Bosboot will be offered if necessary.
File name []
Description []
• File name: F4 displays the list of existing files. This list contains all the files in the /etc/tunables
directory except the special files nextboot, lastboot, lastboot.log, nextliveupdate, and
any special file that has the file name ending with .lvup. You must not specify any of these
reserved names for the files.
• Description: This field is written in the info stanza of the selected file.
After pressing ENTER, all of the current tuning parameter values will be saved in the selected stanza
file of the /etc/tunables directory.
Restore All Current Parameters from Last Boot Values
After selecting yes and pressing ENTER, all the tuning parameters will be set to values from the /etc/
tunables/lastboot file. Error messages will be displayed if any parameter of type Bosboot or Reboot
would need to be changed, which can only be done when changing reboot values.
Restore All Current Parameters from Saved Values
A select menu shows existing files in the /etc/tunables directory, except the special files
nextboot, lastboot, lastboot.log, nextliveupdate, and any special file that has the file
name ending with .lvup. After you press ENTER, the parameters present in the selected file in
the /etc/tunables directory will be set to the value listed if possible. Error messages will be
displayed if any parameter of type Bosboot or Reboot would need to be changed, which can't be done
on the current values. Error messages will also be displayed for any parameter of type Incremental
when the value in the file is smaller than the current value, and for out of range and incompatible
values present in the file. All possible changes will be made.
Reset All Current Parameters To Default Value
After pressing ENTER, each tunable parameter will be reset to its default value. Parameters of type
Bosboot and Reboot, are never changed, but error messages are displayed if they should have been
changed to get back to their default values.
Save All Next Boot Parameters
File name []
Restore All Next Boot Kernel Tuning Parameters from Last Boot Values
After selecting yes and pressing ENTER, all values from the lastboot file will be copied to the
nextboot file. If necessary, the user will be prompted to run bosboot, and warned that for all the
changes to be effective, the machine must be rebooted.
Restore All Next Boot Parameters from Saved Values
A select menu shows existing files in the /etc/tunables directory, except the special files nextboot,
lastboot, lastboot.log, nextliveupdate, and any special file that has the file name ending
with .lvup. After you select a file and press ENTER, all values from the selected file will be copied
to the nextboot file, if the file was successfully tunchecked first. If necessary, the user will be
prompted to run bosboot, and warned that for all the changes to be effective, rebooting the machine
is necessary.
Reset All Next Boot Parameters To Default Value
After pressing ENTER, the /etc/tunables/nextboot file will be cleared. If necessary, bosboot will be
proposed and a message indicating that a reboot is needed will be displayed.
Save All Next Boot and Next Live Update Parameters
File name []
Type or select values for the file name field. When you press F4, you can view a list of existing files
in the /etc/tunables directory except the special files nextboot, lastboot, lastboot.log,
nextliveupdate, and any special file that has the file name ending with .lvup. You must not
specify any of these reserved names for the files. After you press ENTER, the nextboot file is copied
to the specified /etc/tunables directory with the .lvup extension if the tuncheck command ran
successfully on the file.
Restore All Next Boot and Next Live Update Parameters from Saved Values
A select menu shows existing files in the /etc/tunables directory, except the special files
nextboot, lastboot, lastboot.log, nextliveupdate, and any special file that has the file
name ending with .lvup. You are prompted to select two files: nextboot and nextliveupdate
files. After you select the files and press ENTER, all values from the selected file for next boot are
copied to the nextboot file, and all values from the selected file for next Live Update operation
are copied to the nextliveupdate file, if the tuncheck command ran successfully on the files.
If necessary, you are prompted to run the bosboot operation. You are also warned that for all the
changes to be effective, you must reboot the system.
The behavior of each of the sub-panels is explained in the following sections, using the scheduler and
memory load control sub-panels as examples:
1. List All Characteristics of Tuning Parameters
The output of schedo -L is displayed.
2. Change/Show Current Scheduler and Memory Load Control Parameters
[Entry Field]
affinity_lim [7]
idle_migration_barrier [4]
fixed_pri_global [0]
maxspin [1]
pacefork [10]
sched_D [16]
sched_R [16]
timeslice [1]
%usDelta [100]
v_exempt_secs [2]
v_min_process [2]
v_repage_hi [2]
v_repage_proc [6]
v_sec_wait [4]
This panel is initialized with the current schedo values (output from the schedo -a command).
Any parameter of type Bosboot, Reboot or Static is displayed with no surrounding square bracket
indicating that it cannot be changed. From the F4 list, type or select values for the entry fields
corresponding to parameters to be changed. Clearing a value results in resetting the parameter to
its default value. The F4 list also shows minimum, maximum, and default values, the unit of the
parameter and its type. Selecting F1 displays the help associated with the selected parameter. The
text displayed will be identical to what is displayed by the tuning commands when called with the
-h option. Press ENTER after making all the required changes. Doing so will launch the schedo
command to make the changes. Any error message generated by the command, for values out of
range, incompatible values, or lower values for parameter of type Incremental, will be displayed to the
user.
3. The following is an example of the Change / Show Scheduler and Memory Load Control Parameters for
next boot panel.
[Entry Field]
affinity_lim [7]
idle_migration_barrier [4]
fixed_pri_global [0]
maxspin [1]
pacefork [10]
sched_D [16]
sched_R [16]
timeslice [1]
%usDelta [100]
v_exempt_secs [2]
v_min_process [2]
v_repage_hi [2]
v_repage_proc [6]
v_sec_wait [4]
This panel is similar to the previous panel, in that any parameter value can be changed except for
parameters of type Static. It is initialized with the values listed in the /etc/tunables/nextboot file,
completed with default values for the parameters not listed in the file. Type or select (from the F4 list)
values for the entry field corresponding to the parameters to be changed. Clearing a value results in
resetting the parameter to its default value. The F4 list also shows minimum, maximum, and default
values, the unit of the parameter, and its type. Pressing F1 displays the help associated with the
selected parameter. The text displayed will be identical to what is displayed by the tuning commands
when called with the -h option. Press ENTER after making all desired changes. Doing so will result in
the /etc/tunables/nextboot file being updated with the values modified in the panel, except for
out-of-range and incompatible values, for which an error message will be displayed instead. If
necessary, the user will be prompted to run bosboot.
4. The following is an example of the Save Current Scheduler and Memory Load Control Parameters for
Next Boot panel.
Save Current Scheduler and Memory Load Control Parameters for Next Boot
After pressing ENTER on this panel, all the current schedo parameter values will be saved in the /etc/
tunables/nextboot file. If any parameter of type Bosboot needs to be changed, the user will be
prompted to run bosboot.
5. The following is an example of the Reset Current Scheduler and Memory Load Control Parameters to
Default Values
Reset Current Scheduler and Memory Load Control Parameters to Default Value
After selecting yes and pressing ENTER on this panel, all the tuning parameters managed by the
schedo command will be reset to their default value. If any parameter of type Incremental, Bosboot,
or Reboot should have been changed, an error message will be displayed instead.
6. The following is an example of the Reset Scheduler and Memory Load Control Next Boot Parameters
To Default Values
After pressing ENTER, the schedo stanza in the /etc/tunables/nextboot file will be cleared. This will
defer changes until next reboot. If necessary, bosboot will be proposed.
Item Descriptor
PPID Parent process identifier
NICE Nice value for the process
PRI Priority of the process
DRSS Data resident set size
TRSS Text resident set size
STARTTIME Time when the command started
EUID Effective user identifier
RUID Real user identifier
EGID Effective group identifier
RGID Real group identifier
THCOUNT Number of threads used
CLASSID Identifier of the class which pertains to the WLM process
CLASSNAME Name of the class which pertains to the WLM process
TOTDISKIO Disk I/O for that process
NVCSW Number of voluntary context switches
NIVCSW Number of involuntary context switches
MINFLT Minor page faults
MAJFLT Major page faults
INBLK Input blocks
OUBLK Output blocks
You can use either the table properties or preference to display the metrics you are interested in. If you
choose to change the table properties, the new configuration values are set for the current session only.
If you change the preferences, the new configuration values are set for the next session of the procmon
tool.
There are two types of values listed in the process table:
• Real values
• Delta values
Real values are retrieved from the kernel and displayed in the process table. An example of a real value is
the PID, PPID, or TTY.
Delta values are values that are computed from the last-stored measurements. An example of a delta
value is the CPU percent for each process, which is computed using the values measured between
refreshes.
Below the process table, there is another table that displays the sum of the values for each column of the
process table. For example, this table might provide a good idea of the percentage of total CPU used by
the top 20 CPU-consuming processes.
You can refresh the data by either clicking on the Refresh button in the menu bar or by activating the
automatic refresh option through the menu bar. To save the statistics information, you can export the
table to any of the following file formats:
• XML
• HTML
• CSV
Item Descriptor
Name WPAR name
Hostname WPAR hostname
Type WPAR type, either System or Application
State WPAR state, which can have one of the following values:
Active, Defined, Transitional, Broken, Paused, Loaded, Error
Directory WPAR root directory
Nb. virtual PIDs Number of virtual PIDs running in this WPAR
Profiling tools
You can use profiling tools to identify which portions of the program are executed most frequently or
where most of the time is spent.
Profiling tools are typically used after a basic tool, such as the vmstat or iostat commands, shows that a
CPU bottleneck is causing a performance problem.
Before you begin locating hot spots in your program, you need a fully functional program and realistic data
values.
The timing commands report how long a program runs. The following sample output shows the format
used by the time command (first) and the seconds-only format used by the timex command (second):
real 0m26.72s
user 0m26.53s
sys 0m0.03s

real 26.70
user 26.55
sys 0.02
In this example, we see many calls to the mod8() and mod9() routines. As a starting point, examine the
source code to see why they are used so much. Another starting point could be to investigate why a
routine requires so much time.
Note: If the program you want to monitor uses a fork() system call, be aware that the parent and the child
create the same file (mon.out). To avoid this problem, change the current directory of the child process.
                                 called/total     parents
index %time    self descendents  called+self    name            index
                                 called/total     children
-----------------------------------------------
                                                <spontaneous>
[2]    64.6    0.00      40.62                 .__start [2]
              19.44      21.18       1/1           .main [1]
               0.00       0.00       1/1           .exit [37]
-----------------------------------------------
Usually the call graph report begins with a description of each column of the report, but it has been
deleted in this example. The column headings vary according to type of function (current, parent of
current, or child of current function). The current function is indicated by an index in brackets at the
beginning of the line. Functions are listed in decreasing order of CPU time used.
To read this report, look at the first index [1] in the left-hand column. The .main function is the current
function. It was started by .__start (the parent function is on top of the current function), and it, in turn,
calls .mod8 and .mod9 (the child functions are beneath the current function). All the accumulated time
of .main is propagated to .__start. The self and descendents columns of the children of the current
function add up to the descendents entry for the current function. The current function can have more
than one parent. Execution time is allocated to the parent functions based on the number of times they
are called.
Flat profile
The flat profile sample is the second part of the cwhet.gprof file.
The following is an example of the cwhet.gprof file:
granularity: each sample hit covers 4 byte(s) Total time: 62.85 seconds
Note: If the program you want to monitor uses a fork() system call, be aware that by default, the parent
and the child create the same file, gmon.out. To avoid this problem, use the GPROF environment variable.
You can also use the GPROF environment variable to profile multi-threaded applications.
Event-based profiling
Event-based profiling is triggered by any one of the software-based events or any Performance Monitor
event that occurs on the processor.
The primary advantages of event-based profiling over time-based profiling are the following:
• Routine addresses are visible even when interrupts are disabled.
• The profiling event can be varied.
• The sampling frequency can be varied.
With event-based profiling, ticks that occur while interrupts are disabled are charged to the proper
routines. Also, you can select the profiling event and sampling frequency. The profiling event determines
the trigger for the interrupt and the sampling frequency determines how often the interrupt occurs. After
the specified number of occurrences of the profiling event, an interrupt is generated and the executing
instruction is recorded.
The default type of profiling event is processor cycles. The following are various types of software-based
events:
• Emulation interrupts (EMULATION)
• Alignment interrupts (ALIGNMENT)
• Instruction Segment Lookaside Buffer misses (ISLBMISS)
• Data Segment Lookaside Buffer misses (DSLBMISS)
The sampling frequency for the software-based events is specified in milliseconds and the supported
range is 1 to 500 milliseconds. The default sampling frequency is 10 milliseconds.
The following command generates an interrupt every 5 milliseconds and retrieves the record for the last
emulation interrupt:
# tprof -E EMULATION -f 5
The following command generates an interrupt every 100 milliseconds and records the contents of the
Sampled Instruction Address Register, or SIAR:
# tprof -E -f 100
Event-based profiling uses the SIAR, which contains the address of an instruction close to the executing
instruction. For example, if the profiling event is PM_FPU0_FIN, which means that floating-point unit 0
has finished executing an instruction, the address recorded from the SIAR might not be that of the
floating-point instruction itself, but of an instruction close to it.
The translation miss score differs from the actual translation miss rate because it is based on sampled
references; it is computed as the ratio of the number of translation misses to the number of translation
buffer accesses observed in the samples. Sampling has the effect of reducing the denominator (the
number of translation buffer accesses) in this ratio faster than the numerator (the number of translation
misses). As a result, the translation miss score tends to overestimate the actual translation miss rate at
increasing sampling rates. Thus, the translation miss score should be interpreted as a relative measure
for comparing the effectiveness of different projections rather than as a predictor of actual translation
miss rates.
The translation miss score is directly affected by larger page sizes: growing the page size reduces the
translation miss score. The performance projection report includes both a cold translation miss score
(such as compulsory misses) and a total translation miss score (such as compulsory and capacity misses).
The cold translation miss score provides a useful lower bound; if growing the page size has reduced the
translation miss score to the cold translation miss score, then all capacity translation misses have been
eliminated and further increases in page size can only have negligible additional benefits.
Data profiling
The tprof -b command turns on basic data profiling and collects data access information.
The summary section reports access information across kernel data, library data, user global data, and
stackheap sections for each process, as shown in the following example:
Table 8. An example of the data profiling report for the /usr/bin/dd process.
Process PID TID Total Kernel User Shared Other
tlbref 327688 757943 60.49 0.07 59.71 0.38 0.00
Kernel: 0.04%
lib: 0.00%
u_global: 0.00%
When used with the -s, -u, -k, and -e flags, the tprof command's data profiling reports the most-used
data structures (exported data symbols) in shared libraries, binaries, the kernel, and kernel extensions.
The -B flag also reports the functions that use the data structures.
The second table shown is an example of the data profiling report for the /usr/bin/dd process. The
example report shows that the __start data structure is the most used data structure in the /usr/bin/dd
process, based on the samples collected. Below each data structure is a list of functions (right aligned)
that use the data structure, reported along with their share and source, as shown in the following
example:
Subroutine % Source
.noconv 11.29 /usr/bin/dd
.main 0.14 /usr/bin/dd
.read 0.07 glink.s
.setobuf 0.05 /usr/bin/dd
.rpipe 0.04 /usr/bin/dd
.flsh 0.04 /usr/bin/dd
.write 0.04 glink.s
.wbuf 0.02 /usr/bin/dd
.rbuf 0.02 /usr/bin/dd
Data % Source
__start 7.80 /usr/bin/dd
.noconv 6.59 /usr/bin/dd
The version1.prof file reports how many CPU ticks for each of the programs that were running on the
system while the version1 program was running.
The following is an example of what the version1.prof file contains:
Profile: ./version1
Total Ticks For All Processes (./version1) = 1637
Profile: ./version1
Total Ticks For ./version1[245974] (./version1) = 1637
The first section of the report summarizes the results by program, regardless of the process ID, or PID. It
shows the number of different processes, or Freq, that ran each program at some point.
The second section of the report displays the number of ticks consumed by, or on behalf of, each process.
In the example, the version1 program used 1637 ticks itself and 35 ticks occurred in the kernel on behalf
of the version1 process.
The third section breaks down the user ticks associated with the executable program being profiled. It
reports the number of ticks used by each function in the executable program and the percentage of the
total run's CPU ticks (7504) that each function's ticks represent. Since the system's CPUs were mostly
idle, most of the 7504 ticks are idle ticks.
To see what percentage of the busy time this program took, subtract the wait threads' CPU ticks, which
are the idle CPU ticks, from the total number of ticks, and then divide the program's CPU ticks by that
difference:
Program CPU ticks / (Total ticks - Idle CPU ticks) = % busy time of program
1637 / (7504 - 5810) =
1637 / 1694 = 0.97
The example above creates a trace1.prof file, which gives you a CPU profile of the system while the trace
command was running.
Security
Any user of the machine can run the svmon command. It uses two different mechanisms to allow two
different views for a non-root user.
The views are determined as follows:
• When RBAC authorization is used, the user has the same view as the root user if the user's role is
defined with the aix.system.stat authorization.
• When RBAC is not used, or when the user does not have the aix.system.stat authorization, the user's
reports are limited to the user's own environment and processes.
You can view the complete details of the RBAC in Files Reference.
# cat .svmonrc
summary=basic
segment=category
pgsz=on
Note:
• When an option is not recognized in the file, it is ignored.
• When an option is defined more than once, only the last value will be used.
Item Descriptor
Inuse Number of frames containing pages (expressed in <unit>) used by the report entities.
Pin Number of frames containing pinned pages (expressed in <unit>) used by the report entities.
Pgsp Number of pages (expressed in <unit>) allocated in the paging space by the report entities.
Virtual Number of pages (expressed in <unit>) allocated in the virtual space by the report entities.
# svmon -G -i 5 3
Example:
# svmon -G -O commandline=on
• -O unit=[auto,page,KB,MB,GB]: this option is set to page by default. In this case, the reported metrics
for each segment are in the segment page size:
– s are 4 KB pages
– m are 64 KB pages
– L are 16 MB pages
– S are 16 GB pages
When auto, KB, MB, or GB is used, only the 3 most significant digits are displayed. Be
careful when interpreting the results with a unit other than page. When the auto setting is selected, the
abbreviated unit is specified immediately after each metric (K for kilobytes, M for megabytes, or G for
gigabytes).
Examples:
This is the same report using different unit options:
# svmon -G -O unit=page
Unit: page
==============================================================================
size inuse free pin virtual available
memory 1048576 220617 827959 113371 194382 819969
pg space 131072 1280
# svmon -G -O unit=GB
Unit: GB
==============================================================================
size inuse free pin virtual available
memory 4.00 0.84 3.16 0.43 0.74 3.13
pg space 0.50 0
# svmon -G -O unit=auto
Unit: auto
==============================================================================
size inuse free pin virtual available
memory 4.00G 860.78M 3.16G 442.86M 758.29M 3.13G
pg space 512.00M 5.00M
# svmon -P 1 -O range=on
Unit: page
-------------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual
1 init 16874 8052 0 16858
• -O pidlist=on and -O pidlist=number: adds either the list of PIDs of processes or the number of
processes using this segment. It also adds either the user name or the command name corresponding
to each PID. When the -@ flag is added, the WPAR name is also added.
Example:
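The output for this option is not reproduced here; a representative invocation, built only from flags
described in this section, might be:
# svmon -P 1 -O segment=on,pidlist=on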
• -O filename=on: Each persistent segment's complete, corresponding file name is shown. Note that
because files can be deeply nested, running the svmon command with this flag, or with the -S and -i
flags, can take significantly more time.
Example:
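A representative invocation (output omitted) might be:
# svmon -P 1 -O segment=on,filename=on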
• -O mapping=on: adds information about the source segment and the mapping segment when a
segment is used to map another segment. If this option is used, source segments not belonging to
the process address space are listed in the report and marked with an asterisk (*). Note that they are
also taken into account in the process-level summary's number calculations.
Example:
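A representative pair of invocations for the comparison described next (output omitted) might be:
# svmon -P 266414 -O segment=on,mapping=off
# svmon -P 266414 -O segment=on,mapping=on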
In these examples, the mapping option adds or removes the mapped source segments that are not
in the address space of process 266414. There is a difference of four pages (three pages
from segment 191338, and one page from segment 131332) in the Inuse consumption between -O
mapping=off and -O mapping=on.
• -O sortseg=[inuse | pin | pgsp | virtual]: by default, all segments are sorted in decreasing order
of real memory usage (the Inuse metric) for each entity (user, process, command, segment). Sorting
options for the report include the following:
– Inuse: real memory used
– Pin: pinned memory used
– Pgsp: paging space memory used
– Virtual: virtual memory used
Examples:
# svmon -P 1 -O unit=KB,segment=on
Unit: KB
-------------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual
1 init 67752 32400 0 67688
# svmon -P 1 -O unit=KB,segment=on,sortseg=pin
Unit: KB
-------------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual
1 init 67752 32400 0 67688
• -O mpss=[on | off]: breaks down the metrics for multiple page size segments, by page size.
Examples:
# svmon -P 1 -O segment=on,mpss=on
Unit: page
-------------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual
1 init 14557 5492 0 14541
sm pages are separated into s and m pages. The metrics reported are in the unit of the page size: s
pages are 4 KB and m pages are 64 KB.
• -O shmid=[on | off]: displays the shared memory IDs associated with shared memory segments. This
option does not work when you run it inside a WPAR.
Examples:
Unit: page
-------------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual
221326 java 20619 6326 9612 27584
Additional -O options
Review the additional -O options for the svmon command.
The following additional options are available:
• -O process=on: adds, for a given entity, the memory statistics of the processes belonging to the entity
(user name or command name). If you specify the -@ flag, each process report is followed by a line that
shows the WPAR name. This option is only valid for the User and the Command reports.
All reports containing two or more entities can be filtered and/or sorted with the following options:
• -O filtercat=[off | exclusive | kernel | shared | unused | unattached]: this option filters the output by
segment category. You can specify more than one filter at a time.
Note: Use the unattached filter value with the -S report because unattached segments cannot be
owned by a process or command.
Examples:
# svmon -P 1 -O unit=KB,segment=on,sortseg=pin,filtercat=off
Unit: KB
-------------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual
1 init 58684 28348 0 58616
# svmon -P 1 -O unit=KB,segment=on,sortseg=pin,filtercat=shared
Unit: KB
-------------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual
1 init 58684 28348 0 58616
• -O filtertype=[off | working | persistent | client]: this option allows you to filter on the Type column of
the segment details. You can specify more than one filter at a time.
Examples:
Unit: page
===============================================================================
Command Inuse Pin Pgsp Virtual
yes 16256 6300 80 16271
...............................................................................
SYSTEM segments Inuse Pin Pgsp Virtual
7088 6288 64 7104
...............................................................................
EXCLUSIVE segments Inuse Pin Pgsp Virtual
112 12 0 111
...............................................................................
SHARED segments Inuse Pin Pgsp Virtual
9056 0 16 9056
Unit: page
===============================================================================
Command Inuse Pin Pgsp Virtual
yes 1 0 0 0
...............................................................................
EXCLUSIVE segments Inuse Pin Pgsp Virtual
1 0 0 0
Unit: page
===============================================================================
Command Inuse Pin Pgsp Virtual
yes 16255 6300 80 16271
...............................................................................
SYSTEM segments Inuse Pin Pgsp Virtual
7088 6288 64 7104
...............................................................................
EXCLUSIVE segments Inuse Pin Pgsp Virtual
111 12 0 111
...............................................................................
SHARED segments Inuse Pin Pgsp Virtual
9056 0 16 9056
• -O filterpgsz=[off | s | m | L | S]: this option filters the segments based on their page size. Multiple
page size segments can be selected by using multiple code letters in the form <min_size><max_size>: -O
filterpgsz="sm s" selects the small page segments and the multiple page size segments with small and
medium pages.
For the -P report, however, the behavior is slightly different: the report contains all the
processes that have at least one page of the size specified with the -O filterpgsz option, and for these
processes, svmon displays all of their segments, whatever their page size.
Examples:
# svmon -P -O segment=on,filterpgsz=L
Unit: page
-------------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual
270450 ptxtst_shm_al 21674 17136 0 21658
Unit: page
===============================================================================
User Inuse Pin Pgsp Virtual
root 12288 12288 0 12288
The previous two examples illustrate the difference in behavior with -P. In reports other than -P, for the given
entity, only the pages of the given size are kept in the report.
Reports details
Review the output for the svmon command reports.
To display compact report of memory expansion information (in a system with Active Memory Expansion
enabled), enter:
# svmon -G -O summary=longame
Unit: page
---------------------------------------------------------------------------------------------
Active Memory Expansion
---------------------------------------------------------------------------------------------
   Size    Inuse     Free    DXMSz UCMInuse  CMInuse     TMSz     TMFr     CPSz
 262144   152625    43055    67640    98217    54408   131072     6787    26068
Global report
To print the Global report, specify the -G flag. The Global report displays a system-wide detailed real
memory view of the machine. This report contains various summaries; only the memory and inuse
summaries are always displayed.
When the -O summary option is not used, or when it is set to -O summary=basic, the column headings
used in global reports summaries are:
memory
Specifies statistics describing the use of memory, including:
size
Number of frames (size of real memory)
Tip: This does not include the free frames that have been made unusable by the memory sizing
tool, the rmss command.
ucomprsd
Displays detailed information about the uncompressed pool, including:
CurSz
Current size of the uncompressed pool.
%Cur
Percentage of true memory used by the uncompressed pool.
TgtSz
Target size of the uncompressed pool needed to achieve the target memory expansion factor.
%Tgt
Percentage of true memory that will be used by the uncompressed pool when the target memory
expansion factor is achieved.
comprsd
Displays detailed information about the compressed pool, including:
CurSz
Current size of the compressed pool.
%Cur
Percentage of true memory used by the compressed pool.
TgtSz
Target size of the compressed pool needed to achieve the target memory expansion factor.
%Tgt
Percentage of true memory that will be used by the compressed pool when the target memory
expansion factor is achieved.
%Max
Percentage of true memory that will be used by the compressed pool when the compressed pool
achieves maximum size.
CRatio
Compression ratio.
AME
Displays the following information:
txf
Target Memory Expansion Factor
cxf
Current Memory Expansion Factor
dxf
Deficit factor to reach the target expansion factor
dxm
Deficit memory to reach the target expansion
# svmon -O summary=basic,unit=auto,pgsz=on
or
# svmon -G -O unit=auto,pgsz=on
Unit: auto
-------------------------------------------------------------------------------
size inuse free pin virtual available
memory 31.0G 2.85G 28.1G 1.65G 2.65G 27.3G
pg space 512.00M 13.4M
The memory size of the system is 31 GB. This size is split into 2.85 GB of in-use frames and
28.1 GB of free frames. 1.65 GB is pinned in memory, 2.65 GB is allocated in the system virtual
space, and 27.3 GB is available to be used as computational data by new processes.
The inuse and pin values include the pages reserved for the 16 MB page memory pool (80 MB).
The size of the paging space is 512 MB, where 13.4 MB are used.
The pinned frames (1.65 GB) are composed of working segment pinned pages (688.57 MB) and 924.95
MB of other pinned pages (used by the kernel, for example), not counting the memory that is unused but
pinned by the 16 MB page pool.
The frames containing pages (2.85 GB) are composed of working segment pages (2.65 GB)
and client segment pages (124.55 MB), not counting the memory that is only reserved but counted
as inuse from the 16 MB pool.
# svmon -G -O unit=MB,pgsz=on,affinity=on
Unit: MB
-------------------------------------------------------------------------------
size inuse free pin virtual available
memory 31744.00 3055.36 28688.64 1838.84 2859.78 27911.33
pg space 512.00 14.7
In this example, taken on a dedicated LPAR partition, we added the domain affinity metrics. The 31744
MB of memory is split into two memory affinity domains:
– Domain 0 contains 15606.18 MB of memory, with 1475.13 MB used and 14131.05 MB free.
– Domain 1 contains 16128 MB of memory, with 1538.35 MB used and 14589.65 MB free.
• To display detailed affinity domain information, enter:
# svmon -G -O unit=MB,pgsz=on,affinity=detail
Unit: MB
-------------------------------------------------------------------------------
size inuse free pin virtual available
memory 31744.00 3055.70 28688.30 1838.91 2860.11 27910.99
pg space 512.00 14.7
In this example, the breakdown by affinity domain is also shown in the per-page size
report. This option takes some time to execute.
• On a shared partition, attempting to display affinity domain information results in:
# svmon -G -O unit=MB,pgsz=on,affinity=on
Unit: MB
-------------------------------------------------------------------------------
# svmon -O summary=longreal
Unit: page
------------------------------------------------------------------------
Memory
-----------------------------------------------------------------------
Size Inuse Free Pin Virtual Available Pgsp
262144 187219 74925 82515 149067 101251 131072
The metrics reported here are identical to the metrics in the basic format. There is a memory size of
262144 frames, with 187219 frames in use and 74925 frames remaining. 149067 pages are allocated in
virtual memory and 101251 frames are available.
• To display global memory statistics in MB units at regular intervals, enter:
# svmon -G -O unit=MB,summary=shortreal -i 60 5
Unit: MB
-------------------------------------------------------------------------------
Size Inuse Free Pin Virtual Available Pgsp
1024.00 709.69 314.31 320.89 590.74 387.95 512.00
1024.00 711.55 312.39 320.94 592.60 386.02 512.00
1024.00 749.10 274.89 322.89 630.15 348.53 512.00
1024.00 728.08 295.93 324.57 609.11 369.57 512.00
1024.00 716.79 307.21 325.66 597.50 381.16 512.00
This example shows how to monitor the whole system by taking a memory snapshot every 60 seconds
for 5 minutes.
• To display detailed memory expansion information (in a system with Active Memory Expansion
enabled), enter:
# svmon -G -O summary=ame
Unit: page
--------------------------------------------------------------------------------------
size inuse free pin virtual available mmode
memory 262144 152619 43061 73733 154779 41340 Ded-E
ucomprsd - 98216 -
comprsd - 54403 -
pg space 131072 1212
• To display memory expansion information with true memory snapshot turned-off (in a system with
Active Memory Expansion enabled), enter:
# svmon -G -O summary=ame,tmem=off
Unit: page
--------------------------------------------------------------------------------------
size inuse free pin virtual available mmode
memory 262144 152619 43061 73733 154779 41340 Ded-E
ucomprsd - 98216 -
comprsd - 54403 -
pg space 131072 1212
User report
The User report displays the memory usage statistics for the specified login names or, when no argument is
specified, for all users.
To print the user report, specify the -U flag. This report contains all the columns detailed in the common
summary metrics, as well as its own columns, defined here:
User
Indicates the user name
If processes owned by this user use pages of a size other than the base 4 KB page size, and the -O
pgsz=on option is set, these statistics are followed by breakdown statistics for each page size. The
metrics reported in this per-page size summary are reported in the page size unit by default.
Note:
• If you specify the -@ flag without an argument, these statistics are followed by the users'
assignments to WPARs. This information is shown with an additional WPAR column displaying the WPAR
name where the user was found.
• If you specify the -O activeusers=on option, users that do not use memory (Inuse is 0 pages)
are not shown in the report.
Examples
1. To display per user memory consumption statistics, enter:
# svmon -U
Unit: page
===============================================================================
User Inuse Pin Pgsp Virtual
root 56007 16070 0 54032
daemon 14864 7093 0 14848
guest 14705 7087 0 14632
bin 0 0 0 0
sys 0 0 0 0
adm 0 0 0 0
uucp 0 0 0 0
nobody 0 0 0 0
This command gives a summary of all the users using memory on the system. This report uses the
default sorting key: the Inuse column. Since no -O option was specified, the default unit (page) is used.
Each page is 4 KB.
The Inuse column, which is the total number of pages in real memory from segments that are used by
all the processes of the root user, shows 56007 pages. The Pin column, which is the total number of
pinned pages from those same segments, shows 16070 pages.
Unit: auto
###############################################################################
######## WPAR : Global
###############################################################################
===============================================================================
User Inuse Pin Pgsp Virtual
root 155.49M 49.0M 0K 149.99M
daemon 69.0M 34.8M 0K 68.9M
###############################################################################
######## WPAR : wp0
###############################################################################
===============================================================================
User Inuse Pin Pgsp Virtual
root 100.20M 35.4M 0K 96.4M
###############################################################################
######## WPAR : wp1
###############################################################################
===============================================================================
User Inuse Pin Pgsp Virtual
root 100.20M 35.4M 0K 96.4M
###############################################################################
######## WPAR : wp2
###############################################################################
===============================================================================
User Inuse Pin Pgsp Virtual
root 100.14M 35.4M 0K 96.3M
In this case, we run in each WPAR context and we want details about every user in all the
WPARs running on the system. Because some users are not active, we keep only the
active users by adding the -O activeusers=on option on the command line. Each WPAR has a root user,
which in this example consumes the same amount of memory, since each one runs the exact same list
of processes. The root user of the Global WPAR uses more memory because more processes run
in the Global WPAR than in any other WPAR.
Command report
The Command report displays the memory usage statistics for the specified command names. To print the
command report, specify the -C flag.
This report contains all the columns detailed in the common summary metrics, as well as its own columns, defined
here:
Command
Indicates the command name.
If processes running this command use pages of a size other than the base 4 KB page size, and the -O
pgsz=on option is set, these statistics are followed by breakdown statistics for each page size. The
metrics reported in this per-page size summary are reported in the page size unit by default.
Examples:
1. To display memory statistics about the yes command, with breakdown by process and categorized
detailed statistics by segment, enter:
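The command that produced the following output is not shown in this excerpt; based on the options
described in this section, a plausible form is:
# svmon -C yes -O process=on,segment=category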
Unit: page
===============================================================================
...............................................................................
SYSTEM segments Inuse Pin Pgsp Virtual
6336 5488 0 6336
...............................................................................
EXCLUSIVE segments Inuse Pin Pgsp Virtual
37 4 0 36
...............................................................................
SHARED segments Inuse Pin Pgsp Virtual
8032 0 0 8032
In this example, we are looking at the yes command. The report is divided into several sub-reports. The
summary line for the command displays the Inuse memory, the pinned pages in memory, and the paging space
and virtual pages used by the command. The -O process=on option adds the process section, which
lists the processes for this command.
2. To display memory statistics about the yes command, with breakdown by process and statistics by
segment including file names, enter:
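A plausible form of the command, reusing options described earlier, is:
# svmon -C yes -O process=on,segment=on,filename=on,pidlist=on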
Unit: page
===============================================================================
Command Inuse Pin Pgsp Virtual
yes 14405 5492 0 14404
This report displays, for each segment, its list of PIDs when the segment is in a process address space.
It also displays the file name of all client and persistent segments.
3. To display memory statistics about the init command, with breakdown by process, enter:
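A plausible form of the command, given the -@ usage described next, is:
# svmon -@ -C init -O process=on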
In a WPAR context, the -@ flag combined with the -O process=on flag adds WPAR information to the
report. This example shows which init process belongs to which WPAR.
Process report
The process report displays the memory usage statistics for all processes or for the specified processes. To print
the process report, specify the -P flag.
This report contains all the columns detailed in the common summary metrics, as well as its own columns, defined
here:
Pid
Indicates the process ID.
Command
Indicates the command the process is running.
If processes use pages of a size other than the base 4 KB page size, and the -O pgsz=on option is set, these
statistics are followed by breakdown statistics for each page size. The metrics reported in this per-page
size summary are reported in the page size unit by default.
After the process information is displayed, svmon displays information about all the segments that the
process used. Information about segments is described in the Segment report section.
Note:
• If you specify the -@ flag, the svmon command displays two additional lines that show the virtual pid
and the WPAR name of the process. If the virtual pid is not valid, a dash sign (-) is displayed.
• The -O affinity flag, supported by the -P option, gives details on domain affinity for the process when set
to on, and for each of the segments when set to detail. Note that memory affinity information is not
available for shared partitions.
Examples:
1. To display the top 10 list of processes in terms of real memory usage in KB unit, enter:
# svmon -P -O unit=KB,summary=basic,sortentity=inuse -t 10
Unit: KB
-------------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual
344254 java 119792 22104 0 102336
209034 xmwlm 68612 21968 0 68256
262298 IBM.CSMAgentR 60852 22032 0 60172
270482 rmcd 60844 21996 0 60172
336038 IBM.ServiceRM 59588 22032 0 59344
225432 IBM.DRMd 59408 22040 0 59284
204900 sendmail 59240 21968 0 58532
266378 rpc.statd 59000 21980 0 58936
168062 snmpdv3ne 58700 21968 0 58508
131200 errdemon 58496 21968 0 58108
This example gives the top 10 processes that consume the most real memory. The report is sorted by the
Inuse count: 119792 KB for the java process, 68612 KB for the xmwlm daemon, and so on. The other
metrics are KB pinned in memory, KB of paging space, and KB of virtual memory.
2. To display information about all the non-empty segments of a process, enter:
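The command is omitted in this excerpt; given the filterprop option shown in a later example, a
plausible form is:
# svmon -P 221326 -O segment=on,filterprop=notempty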
Unit: page
-------------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual
221326 java 20619 6326 9612 27584
The detailed section displays information about each non-empty segment used by process 221326.
This includes the virtual (Vsid) and effective (Esid) segment identifiers. The type of the segment is also
displayed, along with its description, which consists of a textual description of the segment, including the
volume name and i-node of the file for persistent segments.
The report also details the size of the pages backing the segment (Psize column), where s denotes
4 KB pages, L denotes 16 MB pages, and sm denotes multiple page sizes (small and medium pages in this
case); the number of pages in memory (Inuse column); the number of pinned pages (Pin column);
the number of pages used in the paging space (Pgsp column); and the number of virtual pages
(Virtual column).
3. To display information about all the non-empty segments used by a process, including the
corresponding shared memory IDs and affinity domain data, enter:
# svmon -P 221326 -O
commandline=on,segment=on,affinity=on,shmid=on,filterprop=notempty
Unit: page
-------------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual
221326 java 20619 6326 9612 27584
Domain affinity Npages
0 29345
1 11356
Unit: page
-------------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual
209034 xmwlm 15978 5492 0 15929
5. To display only non-empty segments and add a per page size breakdown for segments with multiple
page sizes, enter:
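A plausible form of the command, reusing options shown earlier, is:
# svmon -P 209034 -O segment=on,mpss=on,filterprop=notempty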
The two previous examples show the difference in the values reported in the Inuse, Pin, Pgsp, and
Virtual columns with MPSS pages. On this system, sm pages are used by process 209034. The
metrics in the first report are in 4 KB pages (the smaller page size), while when the breakdown
by page size is displayed with the -O mpss=on option, s pages are reported in 4 KB pages and m pages
in 64 KB pages. So, for segment 19288, the first example gives 1477 x 4 KB = 5908 KB, and the
second example gives 5 x 4 KB + 92 x 64 KB = 5908 KB. Dashes are shown in the Pgsp and Virtual
memory columns for the client segments because these metrics are meaningless for this type of segment.
6. To display detailed information about mapping segments for a process, in KB unit, enter:
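A plausible form of the command is:
# svmon -P 274676 -O unit=KB,segment=on,mapping=on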
Unit: KB
-------------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual
274676 ptxtstmmap 57276 21968 0 57256
The mapping option is used in this case to also show mmapped segments that are not in the address
space of the process. The process 274676 has created a shared memory file (client segment b2ba);
this segment is used by mmap segments (11660, d65c, 13662, 4655, 1350) that are not in the
address space of the process. The mmap segment of the process gives the list of all mmapped segments
and their associated source (b2ba/13662, ...).
The process 340216 has created a private memory file; no extra mmap segments are displayed because
all the segments that use this resource are private to the process and are already shown by
default.
# svmon -W -O unit=page,commandline=on,timestamp=on
In this example, all the WLM classes of the system are reported. Since no sort option was specified,
the Inuse metric (real memory usage) is the sorting key. The class System uses 121231 pages in real
memory. 94597 frames are pinned. The number of pages reserved or used in paging space is 19831.
The number of pages allocated in the virtual space is 135505.
2. To display memory statistics about all WLM classes and subclasses in the system, enter:
In this example, all the WLM classes and subclasses of the system are reported. Since no sort
option was specified, the Inuse metric (real memory usage) is the sorting key. The class System uses
120928 pages in real memory; they are split into 120928 pages in the System Default subclass and
no pages in the Shared subclass.
# svmon -T -O unit=page
Unit: page
===============================================================================
Tier Inuse Pin Pgsp Virtual
0 137187 61577 2282 110589
===============================================================================
Superclass Inuse Pin Pgsp Virtual
System 81655 61181 2282 81570
Unclassified 26797 384 0 2107
Default 16863 12 0 15040
Shared 11872 0 0 11872
Unmanaged 0 0 0 0
1 9886 352 0 8700
===============================================================================
Superclass Inuse Pin Pgsp Virtual
myclass 9886 352 0 8700
All the superclasses of all the defined tiers are reported. Each tier has a summary header with the
Inuse, Pin, Paging space, and Virtual memory metrics, followed by the list of all its classes.
2. To display memory statistics about all WLM tiers, superclasses and classes in the system, enter:
Details at the subclass level can also be displayed for each class of each tier.
3. To display memory statistics about a particular WLM superclass in a tier, with segment and per page
size details, enter:
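The command is omitted in this excerpt. A plausible form, using only the -O options named in the
explanation below, is shown here; the exact syntax for selecting the tier and superclass is not shown
in this excerpt and is therefore left out:
# svmon -T -O pgsz=on,segment=on,pidlist=on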
Unit: page
===============================================================================
Tier Superclass Inuse Pin Pgsp Virtual
0 myclass2 36 4 0 36
The statistics of all the subclasses of the superclass myclass2 in tier 0 are reported. The
distribution between the different page sizes is displayed by the -O pgsz=on option. Then, as -O
segment=on is specified, the subclass statistics are followed by their segment statistics. Finally, as -O
pidlist=on is specified, for each segment the list of processes that use it is displayed.
Segment report
To print the segment report, specify the -S flag.
This report contains all the columns detailed in the common summary metrics, as well as its own columns, defined
here:
Vsid
Indicates the virtual segment ID. Identifies a unique segment in the VMM.
Esid
Indicates the effective segment ID. The Esid is only valid when the segment belongs to only one
process (that is, only one address space). When provided, it indicates how the segment is used by the
process. If the Vsid segment is mapped by several processes (that is, several address spaces), then this
field contains a dash (-).
Unit: page
Information about each segment in the list is displayed. The Esid column contains information only
when -O pidlist=on is specified, because the Esid has a meaning only in the address space of a
process. In this case, since the segment 3393e5 belongs to the process 168138, the Esid is reported;
in all other cases, no information is displayed. The segment 11c02 is the kernel pinned heap. The
segment 2c4158 has no special characteristics. The segment 2c10da corresponds to a file whose device
is /dev/hd2 and whose inode number is 4183. The Paging space and Virtual fields of the segment
2c10da are not meaningful (because it is a client segment). The segment 1b1a34 is a 16 MB page
segment that contains 2 pages of 16 MB (equivalent to 8192 pages of 4 KB).
2. To display information about all unattached segments in the system, enter:
# svmon -S -O filtercat=unattached
Unit: page
In this example, the report contains all the segments coming from processes that allocated
shared memory areas and exited without freeing those memory areas.
3. To display the top 10 (in real memory consumption or sorted by the inuse field) text segments with
their corresponding file name, enter:
# svmon -S -t 10 -O unit=auto,filterprop=text,filename=on
Unit: auto
The -O filename=on option, in this case, displays the file name of each client text segment.
The amount of memory used by every segment is shown with its unit identifier because of the -O
unit=auto option. The segment 1a0cb holds 7.62 MB of real memory and no pinned memory. The
paging space and virtual memory metrics are meaningless for client segments. The description of the segment
f23e is truncated because the default format of the report is 80 columns. The -O format=180 or -O
format=nolimit option could be used to display the full path of this file.
Unit: page
-------------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual
381050 yes 11309 9956 0 11308
Detailed report
The detailed report displays information about the pages owned by a segment and, on demand, the
frames these pages are mapped to. To print the detailed report, specify the -D flag.
Several fields are presented before the listing of the pages used:
Segid
The segment identifier.
Type
The type of the segment.
PSize
The size of the pages inside the segment.
Address Range
Ranges in which frames are used by this segment.
Size of paging space allocation
Size of the paging space allocated to the segment, when applicable.
#svmon -D b9015
Segid: b9015
Type: client
PSize: s (4 KB)
Address Range: 0..9 : 122070..122070
The segment b9015 is a client segment with 11 pages. None of them are pinned.
The page 122070 is physically the page dcd6 in the extended segment 208831.
Page Psize Frame Pin Ref Mod ExtSegid ExtPage Pincount State Swbits
65483 s 72235 Y N N - - 1/0 Hidden 88000000
65353 s 4091 Y N N - - 1/0 Hidden 88000000
65352 s 4090 Y N N - - 1/0 Hidden 88000000
65351 s 4089 Y N N - - 1/0 Hidden 88000000
65350 s 1010007 N N N - - 0/0 In-Use 88020000
65349 s 1011282 N N N - - 0/0 In-Use 88020000
65354 s 992249 N N N - - 0/0 In-Use 88020000
65494 s 1011078 N N N - - 0/0 In-Use 88020000
0 s 12282 N N N - - 0/0 In-Use 88820000
1 s 12281 N N N - - 0/0 In-Use 88820000
2 s 64632 N N N - - 0/0 In-Use 88a20000
3 s 64685 N N N - - 0/0 In-Use 88a20000
4 s 64630 N N N - - 0/0 In-Use 88a20000
5 s 64633 N N N - - 0/0 In-Use 88820000
The frame 72235 is pinned, not referenced, and not modified; it is in the Hidden state, and it does not pertain
to an extended segment or to a large page segment.
XML report
To print the XML report, specify the -X option.
By default the report is printed to standard output. The -o filename flag allows you to redirect the report
to a file. When the -O affinity option is used, affinity information is added to the report.
Note: The -O affinity=detail option can take a long time to compute.
The extension of XML reports is .svm. To prevent a report from being overwritten, the -O overwrite=off option
can be specified (by default, this option is set to on).
This XML file uses an XML Schema Definition (XSD), which can be found in the file /usr/lib/perf/
svmon_measurement.xsd. This schema is self-documented and thus can be used by anyone to build
custom applications using the XML data provided in these reports.
The data provided in this file is a snapshot view of the whole machine. It contains enough data to build an
equivalent of the -G, -P, -S, -W, -U, and -C options.
Makefile
The include files rely on define directives, which must be properly set. They are defined with the
-D preprocessor flag.
• _AIX specifies the include files that generate code for AIX.
• _BSD is required for proper BSD compatibility.
An example of a Makefile that helps to build a sample program follows:
RsiCons: RsiCons.c
$(CC) -o RsiCons RsiCons.c $(CFLAGS) $(LIBS)
RsiCons1: RsiCons1.c
$(CC) -o RsiCons1 RsiCons1.c $(CFLAGS) $(LIBS)
chmon: chmon.c
$(CC) -o chmon chmon.c $(CFLAGS) $(LIBS) -lcurses
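The CC, CFLAGS, and LIBS variables are assumed to be defined elsewhere in the Makefile. A hypothetical
set of definitions, consistent with the define directives described above, might look as follows; the
library name is an assumption to check against the installed samples:
CC     = cc
CFLAGS = -D_AIX -D_BSD
LIBS   = -lSpmi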
If the system that is used to compile does not support ANSI function prototypes, include the
-D_NO_PROTO flag.
RSI handle
An RSI handle is a pointer to a data structure of type RsiHandleStructx. Prior to using any other RSI
call, a data-consumer program must use the RSiInit subroutine to allocate a table of RSI handles. An
RSI handle from the table is initialized when you open the logical connection to a host and that RSI
handle must be specified as an argument on all subsequent subroutines to the same host. Only one of
the internal fields of the RSI handle should be used by the data-consumer program, namely the pointer
to received network packets, pi. Only in very special cases will you ever need to use this pointer, which
is initialized by RSiOpenx and must never be modified by a data-consumer program. If your program
changes any field in the RSI handle structure, results are highly unpredictable. The RSI handle is defined
in /usr/include/sys/Rsi.h.
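A minimal sketch of this initialization sequence follows. The subroutine names are the ones described in
this section, but the argument lists are assumptions for illustration; check /usr/include/sys/Rsi.h for the
actual prototypes.
/* Sketch of a data-consumer skeleton; argument lists are assumed. */
#include <sys/Rsi.h>

int main(void)
{
    /* Allocate a table of one RSI handle before any other RSI call. */
    struct RsiHandleStructx *rsh = RSiInit(1);       /* assumed signature */
    if (rsh == NULL)
        return 1;

    /* Open the logical connection to a host; the timeout and buffer
     * values are placeholders, and the real call takes more arguments. */
    if (RSiOpenx(rsh, 100, 2048, "localhost") != 0)  /* assumed signature */
        return 1;

    /* ... define a statset, request data feeds, process packets ... */

    RSiClosex(rsh);   /* release all memory for the host and free the handle */
    return 0;
}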
SpmiStatVals
A single data value is represented by a structure defined in /usr/include/sys/Spmidef.h as struct
SpmiStatVals. Be aware that none of the fields defined in the structure may be modified by application
programs. The two handles in the structure are symbolic references to contexts and statistics and should
not be confused with pointers. The last three fields are updated whenever a data_feed packet is
received. These fields are as follows:
Item Descriptor
val The latest actual contents of the statistics data field.
val_change The difference (delta value) between the latest actual contents of the statistics data field and the previous value observed.
error An error code as defined by the enum Error in the /usr/include/sys/Spmidef.h file.
Note: The two value fields are defined as union Value, which means that the actual data fields may
be long or float, depending on flags in the corresponding SpmiStat structure. The SpmiStat structure
cannot be accessed directly from the StatVals structure (the pointer is not valid, as previously mentioned).
Therefore, to determine the type of data in the val and val_change fields, you must have saved the
SpmiStat structure as returned by the RSiPathAddSetStatx subroutine. This is rather clumsy, so the
RSiGetValuex subroutine does everything for you and you do not need to keep track of SpmiStat
structures.
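A hedged sketch of that usage follows; the return type and argument order of RSiGetValuex are
assumptions to verify against the header files.
/* During processing of a data_feed packet (sketch; signature assumed). */
struct SpmiStatVals *sv;   /* handle obtained when the value was added */
float value;

value = RSiGetValuex(rsh, sv);   /* decodes val and val_change, handling
                                    the long/float distinction internally */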
The SpmiStat structure is used to describe a statistic. It is defined in the /usr/include/sys/
Spmidef.h file as struct SpmiStat. If you ever need information from this data structure (apart
from information that can be returned by the RSiStatGetPathx subroutine), be sure to save it as it is
returned by the RSiPathAddSetStatx subroutine.
The RSiGetValuex subroutine provides another way of getting access to an SpmiStat structure but can
only do so while a data feed packet is being processed.
The xmtopas daemon accepts the definition of sets of statistics that are to be extracted simultaneously
and sent to the data-consumer program in a single data packet. The structure that describes such a
set is also defined in the /usr/include/sys/Spmidef.h file; some of its fields are as follows:
Item Descriptor
error Returns a zero value if the SPMI's last attempt to read the data values for a set of peer statistics was successful. Otherwise, this field contains an error code as defined in the sys/Spmidef.h file.
avail_resp Used to return the number of peer statistic data values that meet the selection criteria (threshold). The max_responses field determines the maximum number of entries actually returned.
count Contains the number of elements returned in the items array. This number is the number of data values that met the selection criteria (threshold), capped at max_responses.
Resynchronizing
Network connections can go bad, hosts can go down, interfaces can be taken down, and processes can
stop functioning.
In the case of the xmtopas protocol, such situations usually result in one or more of the following:
• Missing packets
• Resynchronizing requests
Missing packets
Responses to outstanding requests are not received, which generates a timeout. That is fairly easy to cope
with, because the data-consumer program has to handle other error return codes anyway. It also results
in expected data feeds not being received, and your program may want to test for this happening. The proper
way to handle this situation is to use the RSiClosex function to release all memory related to the dead
host and to free the RSI handle. After this is done, the data-consumer program may attempt another
RSiOpenx to the remote system or may simply exit.
Resynchronizing requests
Whenever an xmtopas daemon hears from a given data-consumer program on a particular host for
the first time, it responds with a packet of the i_am_back type, effectively prompting the data-consumer
program to resynchronize with the daemon. Also, when the daemon attempts to reconnect to the data-
consumer programs that it was talking to when it was killed or died, it sends an i_am_back packet.
It is important that you understand how the xmtopas daemon handles “first time contacted.” It is based
upon tables internal to the daemon. Those tables identify all the data-consumers that the daemon knows
about. Be aware that a data-consumer program is known by the host name of the host where it executes,
suffixed by the IP port number used to talk to the daemon. Each running data-consumer program is
identified uniquely, as are multiple running copies of the same data-consumer program.
Whenever a data-consumer program exits in an orderly manner, it alerts the daemon that it intends to exit, and the
daemon removes it from its internal tables. If, however, the data-consumer program does not
request data feeds from the daemon for some time, the daemon detects that the data consumer has lost
interest and removes it from its tables, as described in Life and Death of xmtopas.
Example:
If the specified port range is 3001 to 3003, RSI communication uses ports 3001, 3002, and 3003. Only
three RSI agents can listen on these ports, and any subsequent RSI communication fails.
Finally, lines 34 through 36 prepare an initial value path name for the main processing loop of the
data-consumer program. This is the method followed to create the value path names. Then, the main
processing loop in the internal lststats function is called. If this function returns, issue an RSiClosex
call and exit the program.
Defining a Statset
Eventually, you want the sample data-consumer program to receive data feeds from the xmtopas
daemon. Thus, start by preparing the SpmiStatSet, which defines the set of statistics in which you are
interested. This is done with the RSiCreateStatSetx subroutine.
In the sample program, the SpmiStatSet is created in the local lststats function shown previously in
lines 6 through 10.
Lines 12 through 19 invoke the local function addstat (Adding Statistics to the Statset), which finds
all the CPU-related statistics in the context hierarchy and initializes the arrays to collect and print the
information. The first two lines expand the value path name passed to the function by appending CPU/
cpu0. The resulting string is the path name of the context where all CPU-related statistics for cpu0 are
found.
The use of RSiPathGetCxx by the sample program is shown in lines 8 through 12. Following that, in lines
14 through 30, two subroutines are used to get all the statistics values defined for the CPU context. This is
done by using RSiFirstStatx and RSiNextStatx subroutines.
In lines 20-21, the short name of the context (“cpu0”) and the short name of the statistic are saved in
two arrays for use when printing the column headings. Lines 22-24 construct the full path name of the
statistics value by concatenating the full context path name and the short name of the value. This is
necessary to proceed with adding the value to the SpmiStatSet with the RSiPathAddSetStatx. The
value is added by using the lines 25 and 26.
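The sample code itself is not reproduced here; a condensed sketch of the traversal-and-add pattern
just described follows, with assumed prototypes, and with the context pointer (cx) and its path name
(cxpath) taken as given:
/* Sketch: walk the statistics of one context, build each full value
 * path name, and add it to the statset (prototypes are assumptions;
 * snprintf requires <stdio.h>). */
char *name, *descr, path[256];
struct SpmiStatSet *ssp = RSiCreateStatSetx(rsh);
struct SpmiStatLink *sl;

for (sl = RSiFirstStatx(rsh, cx, &name, &descr);
     sl != NULL;
     sl = RSiNextStatx(rsh, cx, sl, &name, &descr))
{
    /* Full path = context path + "/" + short value name. */
    snprintf(path, sizeof(path), "%s/%s", cxpath, name);
    RSiPathAddSetStatx(rsh, ssp, path);
}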
Actual processing of received statistics values is done by the lines 20-24. It involves the use of the library
RSiGetValuex subroutine. The following is an example of output from the sample program RsiCons1:
$ RsiCons1 umbra
Traversing contexts
The adddisk function in the following list shows how the RSiFirstCxx, RSiNextCxx, and the
RSiInstantiatex subroutines are combined with RSiPathGetCxx to make sure all subcontexts
are accessed. The sample program's addstat internal function is used to add the statistics of
each subcontext.
The output from the RsiCons program when run on the xmtopas daemon on an AIX operating system
host is shown in the following example.
$ RsiCons encee
nobroadcast
birte.austin.ibm.com
gatea.almaden.ibm.com
umbra
This example shows that the hosts to monitor do not necessarily have to be in the same domain or on a
local network. However, remote monitoring across a low-speed communications line is unlikely to
be popular with the other users of that communications line, or with yourself.
Be aware that whenever you want to monitor remote hosts that are not on the same subnet as the
data-consumer host, you must specify the broadcast address of the other subnets or all the host names of
those hosts in the $HOME/Rsi.hosts file. The reason is that IP broadcasts do not propagate through IP
routers or gateways.
The following example illustrates a situation where you want to do broadcasting on all local interfaces,
want to broadcast on the subnet identified by the broadcast address 129.49.143.255, and also want to
invite the host called umbra. (The subnet mask corresponding to the broadcast address in this example is
255.255.240.0 and the range of addresses covered by the broadcast is 129.49.128.0 - 129.49.143.255.)
129.49.143.255
umbra
If the RSiInvitex subroutine detects that the name server is inoperative or has an abnormally long
response time, it returns the IP addresses of hosts rather than the host names. If the name server fails
after the list of hosts is partly built, the same host may appear twice: once with its IP address and once
with its host name.
Another sample program written to the data-consumer API is the chmon program. Source code for
the program is in the /usr/samples/perfmgr/chmon.c file. The chmon program is also stored as an
executable during the installation of the Manager component. The command syntax follows:
chmon [-iseconds_interval] [-pno_of_processes] [hostname]
Item Descriptor
seconds_interval Is the interval between observations. Must be
specified in seconds. No blanks must be entered
between the flag and the interval. Defaults to 5
seconds.
no_of_processes Is the number of “hot” processes to be shown.
A process is considered “hotter” the more CPU it
uses. No blanks must be entered between the flag
and the count field. Defaults to 0 (no) processes.
hostname Is the host name of the host to be monitored.
Default is the local host. The sample program exits
after 2,000 observations have been taken, or when
you type the letter “q” in its window.
lspv -l hdisk*
BBBV
This section provides volume group configuration details by displaying the output of volume group
listing commands.
BBBP
This section provides configuration details and performance statistics by displaying the output of a set
of commands that are executed during a time interval. The commands are executed in the following
format: <command name> <Parameters>. This format represents the command that was run to get
the output for this section. For details about the fields that are displayed in the command output,
refer to the corresponding command man pages. The list of commands that are run by the nmon tool
follows:
/usr/sbin/lsconf
Recorded always.
/usr/sbin/lsps -a
Recorded always.
/usr/bin/lparstat -i
Recorded always.
/usr/bin/emstat -a 1 2
Recorded always.
/usr/bin/mpstat -d
Recorded always.
/usr/sbin/lssrad -av
Recorded always.
/usr/sbin/vmo -F -L
Recorded always.
/usr/bin/ipcs -a
Recorded always.
/usr/bin/vmstat -v
Recorded always.
/usr/bin/vmstat -s
Recorded always.
cat /etc/wlm/current/classes
Recorded always.
cat /etc/wlm/current/rules
Recorded always.
cat /etc/wlm/current/limits
Recorded always.
cat /etc/wlm/current/shares
Recorded always.
/usr/sbin/mount
Recorded always.
/usr/sbin/lsattr -El aio0
Recorded always.
/usr/bin/oslevel -rq
Recorded always.
/usr/bin/oslevel -s
Recorded always.
/usr/bin/ps v
Recorded always.
/usr/bin/netstat -rn
BBBP (continued)
Commands that are related to the shared storage pool (SSP) are executed even when SSP data collection
is not explicitly enabled. You can use the -y sub=ssp option to enable SSP data collection explicitly.
This section contains the following fields:
/usr/sbin/lscluster -m
Displays the output of the lscluster -m command that provides the details of the cluster and
the nodes that belong to the cluster.
/usr/ios/cli/ioscli lssp -clustername
Displays the output of the lssp -clustername <clustername> command that provides the
details of the shared storage pool that belongs to the cluster.
Note: Complete I/O server command line interface (ioscli) path of the command is specified in the
root mode of VIOS.
/usr/ios/cli/ioscli lssp -clustername -sp -bd
Provides the logical unit (LU) details of the shared storage pool.
Note: Complete I/O server command line interface (ioscli) path of the command is specified in the
root mode of VIOS.
Physical location <-> PartitionNumber:MTM
Provides the following details for all backing devices of the VIOS:
• Physical location of the backing device.
• Virtual I/O client partition ID to which the device is mapped.
• Machine Type Model (MTM) number of that device.
Client-Id : MTM <-> VTD
Provides the following mapping information for all virtual target devices:
• VIOS client partition ID.
• MTM number.
• Virtual Target Device (VTD) name.
VTD <-> BACKING DEVICE
Provides the following mapping information for all virtual target devices:
• Virtual target device name.
• <Backing device name>.<unique device identifier>
BBBP, Disk Statistics
This section indicates the beginning of the list of SSP disks. This section is the
continuation of the SSP section, which is recorded by default when the -y sub=ssp option is not
specified explicitly. This section contains the following fields:
Disk Name
SSP disk name.
I/O statistics
The I/O statistics section in the nmon recording file contains statistics about disk, disk adapter, Enterprise
Storage Server (ESS) disks, disk groups, and file system.
The following sections in the nmon recording file are used to identify the I/O statistics:
BBBSSP Fibre Channel adapter
Records the utilization statistics of the Fibre Channel adapter. This utilization section contains the
following fields:
FCREAD, Fibre Channel Read KB/s, <FCNAMES>
Total data in KB that is read on the adapter per second.
FCWRITE, Fibre Channel Write KB/s
Total amount of data in KB that is written to the adapter per
second.
FCXFERIN, Fibre Channel Transfers In/s
Total number of read requests per second on the adapter.
FCXFEROUT, Fibre Channel Transfers Out/s
Total number of write requests per second on the adapter.
BBBVFC, <Metrics list>
If the Fibre Channel statistics are enabled by using the ^ option, this
section is also recorded. This section contains the following fields:
vfchost name
Virtual Fibre Channel host adapter name.
client name
Name of the client partition that has the virtual Fibre Channel
adapter.
WWPN
Worldwide port number.
FC Adapter Name
Fibre Channel port name.
NPIV, Virtual FC Adapter, <metrics list>
Records NPIV utilization statistics. This section contains the following fields:
<NPIV_Name>_ read-KB/s
Total amount of data in KB that is read per second on this adapter.
<NPIV_Name>_ write-KB/s
Total amount of data in KB that is written to this adapter per second.
<NPIV_Name>_ reads/s
Number of read requests per second on this adapter.
<NPIV_Name>_ writes/s
Number of write requests per second on this adapter.
<NPIV_Name>_ port_speed
Speed of this adapter in GB per second.
JFS*
Records the Journaled File System (JFS) statistics. The proc file system is not recorded because it is a
pseudo file system. JFS statistics are recorded by default. This section contains the following fields:
JFSFILE,JFS Filespace %Used, JFS names
Percentage of file space used by this JFS over the total space
allocated to it.
JFSINODE, JFS Inode %Used, JFS list
Percentage of inode usage by the JFS over the total inode files
present on the LPAR.
Kernel statistics
The PROC section of the nmon recording file contains kernel statistics.
Kernel statistics are enabled by default. The following fields are available in the kernel statistics section:
Runnable
Number of runnable processes that are ready to run per second in the run queue. The run queue is
maintained by the process scheduler and contains the list of threads that are ready to be dispatched.
Swap-in
Length of the swap queue per second, that is, the number of ready processes that are waiting to
be paged in per second. The swap queue contains the list of processes that are ready to run but are
currently swapped out.
pswitch
Number of process context switches per second.
syscall
Number of system calls that are run per second.
Memory statistics
The memory statistics contain the memory-specific information and metrics.
The following sections in the nmon recording file are used to identify the memory statistics:
MEM
Provides statistics on the global memory utilization. This statistic is recorded by default. This section
contains the following fields:
Real free %
Percentage of free physical RAM relative to the total RAM.
Virtual free %
Percentage of free paging space relative to the total allocated paging space.
Real free (MB)
Available RAM space in MB.
Virtual free (MB)
Available paging space in MB.
Real total (MB)
Total physical RAM in MB.
Virtual total (MB)
Total paging space size in MB.
Physical (MB)
Physical memory, in MB, that is allocated to the shared memory partitions from the shared
memory pool at a specific time.
Note: This field is displayed only for an Active Memory Sharing (AMS) partition.
Size of the compressed pool (MB)
Size of the compressed pool.
Note: This field is displayed only for the Active Memory Expansion (AME)-enabled partition.
Process statistics
The process statistics contain the statistics of the top processes that are running on the logical partition. A
top process is a process with high CPU consumption. By default, the threshold for CPU consumption
is 0.1%; you can configure this threshold value. Processes whose CPU consumption is greater than or equal
to the threshold value are recorded by the nmon recording tool.
The following sections in the nmon recording file are used to identify the process statistics.
WLM statistics
The WLM section in the recorded file contains the Workload Manager (WLM) statistics. The WLM statistics
are related to CPU, memory, and I/O.
You can use the -W option to enable the Workload Manager statistics collection in the nmon recording.
Workload Manager (WLM) is designed to provide increased control to the system administrator on how the
process scheduler, virtual memory manager (VMM), and the disk I/O subsystem allocate resources to the
processes.
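For example, a file recording that includes WLM statistics might be started as follows. The -f, -s, and
-c flags (record to a file, sampling interval in seconds, and number of samples) are standard nmon
recording flags; the values here are illustrative:
# nmon -f -W -s 60 -c 120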
The different tags that represent WLM statistics follow:
WLMCPU, CPU percent for <WLMCount> Classes, <WLMClasses>
If WLM classes WLMClass1 and WLMClass2 exist in the LPAR, the nmon recording contains a
WLMCPU section for those classes. For each sample, the percentage of CPU used by the corresponding
WLM class is recorded.
WLMMEM, Memory percent for <WLMCount> Classes, <WLMClasses>
Percentage of real memory that is used by corresponding WLM class.
WLMBIO, Block IO percent for <WLMCount> Classes, <WLMClasses>
Percentage of disk I/O bandwidth that is used by the corresponding WLM class.
For license inquiries regarding double-byte character set (DBCS) information, contact the IBM Intellectual
Property Department in your country or send inquiries, in writing, to:
Such information may be available, subject to appropriate terms and conditions, including in some cases,
payment of a fee.
Portions of this code are derived from IBM Corp. Sample Programs.
© Copyright IBM Corp. _enter the year or years_.
For more information about the use of various technologies, including cookies, for these purposes,
see IBM's Privacy Policy at http://www.ibm.com/privacy and IBM's Online Privacy Statement at
http://www.ibm.com/privacy/details, in the section entitled "Cookies, Web Beacons and Other Technologies,"
and the "IBM Software Products and Software-as-a-Service Privacy Statement" at http://www.ibm.com/
software/info/product-privacy.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be
trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at
Copyright and trademark information at www.ibm.com/legal/copytrade.shtml.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or
its affiliates.
UNIX is a registered trademark of The Open Group in the United States and other countries.