CP Performance Optimization Guide
Table of Contents
Preface
Open Performance Architecture Overview
    SecureXL
    CoreXL
    ClusterXL
    Packet flows
Optimizing Server Hardware and Operating System
    Hyper-Threading
    NIC Properties
    CPU Speed
    ARP Cache Table
Optimizing Network Performance
    Working with SecureXL
    Working with CoreXL
    Working with ClusterXL
    Memory Allocation
    SmartView Tracker Logs and dmesg Output
Optimizing the Session Rate
    Working with SecureXL
    Working with ClusterXL
    Improving NAT Session Rate
References
All three technologies can work together to maximize their unique advantages.
SecureXL
SecureXL is a technology that enables offloading security processing to processing units
(hardware or software). This allows fast processing of the traffic and enables high-speed
performance.
The firewall module handles the first packet of a connection and offloads the relevant
information to the SecureXL device, which then processes all subsequent packets of that
connection. The firewall can also offload connection templates to the SecureXL device. In
this case, a new connection that matches a template can be created in the device, and the
firewall does not process even the first packet. This feature is designed to optimize the
connection establishment rate.
CoreXL
CoreXL is a technology that allows Firewall and IPS security code to run on multiple
processors concurrently. The CoreXL layer handles traffic that cannot be processed by the
SecureXL device, or traffic that requires deep packet inspection.
CoreXL is able to provide near linear scalability of performance, based on the number of
processing cores on a single machine. This increase in performance is achieved without
requiring any changes to management or network topology.
In a CoreXL gateway, the firewall kernel is replicated so that each replicated copy (instance)
runs on a processing core. These instances handle traffic concurrently, and each instance is
a complete and independent inspection kernel.
ClusterXL
A Security Gateway Cluster is a group of identical gateways that are connected, so that if
one fails, another immediately takes its place.
ClusterXL provides an infrastructure that ensures that no data is lost in case of a failover,
because each Gateway Cluster member is aware of the connections passing through the
other members via state synchronization.
When ClusterXL is set to High Availability mode, it designates one of the cluster members as
the active machine and the rest of the members are kept in a stand-by mode. All traffic is
directed to the active member. The active member updates the stand-by members of any
state changes, so that if the active member goes down, they can be immediately substituted
for it.
In this mode, only the processing power of a single machine is utilized.
When ClusterXL is set to Load Sharing mode, you can distribute network traffic between the
cluster members. Unlike High Availability mode, where only a single member is active at any
given time, in Load Sharing mode all the cluster members are active. The whole cluster is
responsible for assigning a portion of the traffic to each cluster member and this usually
leads to an increase in total throughput of the cluster.
Multicast mode - all packets sent to the cluster reach all the members in the cluster. Each
member then decides whether it should process the packets or not. This mode presents
better performance figures for connections establishment rate than Unicast mode.
Unicast mode - a single cluster member, referred to as the pivot, receives all the packets
sent to the cluster. The pivot is then responsible for distributing the packets to the other
cluster members, creating a Load Sharing mechanism. The pivot member still acts as a
firewall module that processes packets. However, the other members can take on
processing tasks for the pivot in order to reduce its total load.
NOTE: To support ClusterXL Load Sharing Multicast, extra configuration settings may be
required on the connected router. For more information on ClusterXL Load Sharing Multicast
configuration mode, see the R70 ClusterXL Administration Guide.
Packet flows
When SecureXL is enabled, a packet enters the firewall and first reaches the SecureXL
device. The device can choose to handle the packet in three ways:
1. Acceleration path - The packet is completely handled by the SecureXL device. It is
processed and sent back again to the network. This path does all the IPS processing
when CoreXL is disabled.
2. Medium path - The packet is handled by the SecureXL device, except for IPS
processing. The CoreXL layer passes the packet to one of the firewall instances, to
perform IPS processing. This path is only available when CoreXL is enabled.
3. Firewall path - The SecureXL device is unable to process the packet. It is passed on to
the CoreXL layer and then to one of the instances, for full firewall processing. This path
also processes all packets when SecureXL is disabled.
[Figure: Packet flow with SecureXL and CoreXL. Traffic handled entirely by the
Performance Pack takes the accelerated path; the dispatcher queues medium-path and
firewall-path traffic to the firewall instances (Instance 0 through Instance N).]
Optimizing Server Hardware and Operating System
If you are using a Check Point appliance, you only need to refer to the ARP Cache Table
section.
Hyper-Threading
Hyper-Threading can negatively impact the performance of the R70 Security Gateway. It
is recommended that you disable this capability.
NIC Properties
This configuration is only for an open server. There are four issues related to the NIC that
can affect performance of the R70 Security Gateway.
1. HCL support
You should verify that you are using certified NICs with the following link:
http://www.checkpoint.com/services/techsupport/hcl/index.html
2. PCI Express
You should use the PCI-Express NICs, because they have better performance than
PCI-X NICs.
3. Speed
Use ethtool <interface name> to verify that the NIC is working at the desired
speed and using full-duplex settings.
4. Statistics
Use ethtool -S <interface name> to check statistics for the NICs. A properly working
system should display minimal rx/tx drop/error counters.
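As a quick sanity check, the speed and duplex fields of ethtool-style output can be parsed in a short shell snippet. This is a sketch only: it runs against a captured sample of the output (the interface name and values are hypothetical), since a live NIC may not be available when testing the snippet itself.

```shell
# Sketch: sanity-check NIC speed/duplex by parsing ethtool-style output.
# A captured sample is used here (interface name and values hypothetical).
sample='Settings for eth0:
        Speed: 1000Mb/s
        Duplex: Full
        Link detected: yes'

# Split each line on ": " and take the value field.
speed=$(printf '%s\n' "$sample" | awk -F': ' '/Speed/ {print $2}')
duplex=$(printf '%s\n' "$sample" | awk -F': ' '/Duplex/ {print $2}')

if [ "$speed" = "1000Mb/s" ] && [ "$duplex" = "Full" ]; then
    echo "NIC OK: $speed full-duplex"
else
    echo "WARNING: check NIC settings (speed=$speed duplex=$duplex)"
fi
```

On a live gateway, replace the sample with the real output of ethtool <interface name>.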
CPU Speed
This configuration is only for an open server. If performance is low, use the cat
/proc/cpuinfo command to extract information about the CPU model and speed. You
may be able to improve performance by upgrading to a CPU with a higher clock speed.
ARP Cache Table
NOTE: You should also increase the ARP Cache table if you are testing large subnets that
are directly connected to the gateway without a router.
Edit the /etc/sysctl.conf file and run the sysctl -p command. This change survives a
reboot. (See Example 1.)
Alternatively, run the sysctl -w command directly. This change does not survive a reboot.
(See Example 2.)
The following examples demonstrate how to increase the number of ARP entries to 4096, to
allow for 4096 IPs.
Example 1
net.ipv4.neigh.default.gc_thresh3 = 4096
net.ipv4.neigh.default.gc_thresh2 = 2048
Example 2
sysctl -w net.ipv4.neigh.default.gc_thresh3=4096
sysctl -w net.ipv4.neigh.default.gc_thresh2=2048
o QoS
o The first packet of any new TCP session, unless a template exists.
o The first packet of any session that requires NAT.
o The first packet of any new UDP session, unless a template exists.
o All traffic that matches a service that uses a resource.
o All traffic that is supposed to be dropped or rejected, according to the rule base
(consider enabling Drop Templates - see below).
2. Review and tune the firewall policy and IPS protections (refer to sk33250 and the R70
IPS Administration Guide).
An interactive menu is displayed that provides the option to enable or disable traffic
acceleration by selecting Enable/Disable Check Point SecureXL. Select Enable to enable
accelerated traffic, or Disable to disable it.
IPS Protections
Some protections can have an adverse effect on the performance of the gateways on which
they are activated. These protections either consume more resources or apply to common
types of traffic.
Protections with a high performance impact may also reduce network performance.
IPS Exceptions
For protections which prevent SecureXL from accelerating traffic, the IPS exception
mechanism allows SecureXL to accelerate connections that match the exception rules.
For example:
“Network Quota” protection in R70 does not disable SecureXL templates on connections
that match the protection's exception rules.
For further information regarding IPS, refer to the R70 IPS Administration Guide.
Drop Templates
You should enable drop templates to improve the Security Gateways’ performance when a
large part of the traffic matches a drop rule. This feature allows Performance Pack to handle
the drops. This feature is disabled by default.
3. Select Firewall-1>SecureXL.
4. Check enable_drop_templates.
The following table contains CLI commands that can help you manage drop templates:
Command Result
Drop templates (fwaccel stats -d) contains an index of ranges. If you correlate the
index with sim ranges, then you can better understand the practical ranges for drop
templates and when it is appropriate to use them.
CPU Roles
The cores in a multi-core machine can assume several roles, including:
Distributing non-accelerated packets among kernel instances for IPS and Firewall
inspection.
Traffic entering network interface cards (NICs) is directed to a processing core running the
SND. The association of a particular interface with a processing core is called the interface’s
affinity with that core. This affinity causes the interface’s traffic to be directed to that core and
then SND runs on that core.
Kernel instance
A firewall kernel instance is configured to run on a particular core which is responsible for the
following:
Regarding the firewall daemon, this can be useful when there is massive logging that
consumes a lot of CPU resources.
IMPORTANT: Under normal circumstances, it is not recommended for the SND and an
instance to share a core. However, it is necessary in the following cases:
1. When using a machine with only two cores. It is better for both SND and instances
to share cores, instead of giving each only one core.
2. When you know that almost all of the packets are being processed in the
accelerated path, and you want to assign all CPUs to this path. If the instances do
not receive significant work, then it is appropriate to share the cores.
The following table describes the default configuration of cores and kernel instances:

Number of cores    Default number of kernel instances
1                  CoreXL is disabled
2                  2
4                  3
8                  6
For more information on configuring the cores, refer to the CP R70 Firewall Administration
Guide.
1. Use the fw ctl affinity -l -r command to understand the role of each CPU.
You can view the cores that are handling kernel instances.
2. Cores that do not have a kernel instance running are for SND to use. The interfaces'
affinity should only be mapped to these cores.
a. If SND cores are more heavily used than instance cores - you may want to
decrease the number of instances, to allow SND to use another core.
b. If instance cores are more heavily used than SND cores - you may want to
increase the number of instances, to share the work among more instances.
NOTE: After the top command is entered, you need to press 1 to view usage per CPU. To
make this the default view, press SHIFT+W.
If Performance Pack is disabled, all interfaces' affinities are mapped to a single core. If you
have more than one core available, you should change the affinity of some interfaces to use
the other cores.
1. Run the top command to display how the SND cores are being used.
For more information, refer to the “sim affinity” section in the R70 Performance Pack
Administration Guide.
NOTE: If interface affinities are attached to a specific core, then you should avoid setting the
affinity of the fwd daemon to that core. In general, it is recommended to associate each core
with only one of the following components: network interfaces, kernel firewall instances, or
user space processes/daemons. You should avoid having more than one of these
components attached to the same core.
When you set affinities for Check Point daemons (such as the fwd daemon), they are loaded
at boot from the fwaffinity.conf configuration text file located at: $FWDIR/conf.
n fwd <cpuid>
where <cpuid> is the number of the processing core to be set as the affinity of the
fwd daemon.
For example, to set core #2 as the affinity of the fwd daemon, add to the file:
n fwd 2
You must reboot the server for the fwaffinity.conf settings to take effect.
After reboot, you can verify the configuration by running the command: fw ctl
affinity -l -r.
# fw ctl affinity -l -r
CPU 0: Mgmt Lan1 Lan2
CPU 1: Lan3 Lan4
CPU 2: fwd
CPU 3: fw_4
CPU 4: fw_3
CPU 5: fw_2
In lab staging tests (when running with CoreXL) you should use many source and/or
destination IPs. Usually, several hundred distinct IP pairs should be sufficient to balance the
connections amongst the kernel instances. Do not use an extremely high number of IPs,
because this may make the templates ineffective.
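A distinct set of lab IP pairs can be produced with a short loop. This is a sketch under the assumption that the 10.0.0.0/24 and 172.16.0.0/24 ranges are unused in the lab; any free ranges work equally well.

```shell
# Sketch: generate a few hundred distinct source/destination IP pairs for
# a lab test. The 10.0.0.0/24 and 172.16.0.0/24 ranges are assumed free.
for i in $(seq 1 254); do
    echo "10.0.0.$i 172.16.0.$i"
done > /tmp/ip_pairs.txt

pairs=$(wc -l < /tmp/ip_pairs.txt)
echo "$pairs distinct IP pairs generated"
```

Feed the resulting list to your traffic generator so connections spread across the kernel instances without exceeding the point where templates become ineffective.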
Some of the SmartDefense protections require the connection to be sticky - all packets of
the connection must be handled by the same cluster member. Network performance can be
reduced when a sticky connection is combined with asymmetric routing. For example:
Flush and ACK - The return packet for this connection is not going to be handled by the
original cluster member. The original member holds the packet until it is synchronized and
acknowledged by the other member.
Forwarding - A cluster member forwards packets to the member that handled the first
packet of the connection.
Memory Allocation
Memory allocation failures can reduce the performance of the system.
NOTE: A lab test for achieving best performance is not valid if a memory allocation failure
occurs during it. For example, do not run a lab test with so many concurrent connections
that memory allocation fails.
2. Search for failures in kmem and smem. (These values are bolded in the following
example.)
Note: Even though failures in hmem are legitimate, they might impact performance,
especially when CoreXL is enabled. For optimal performance, there should not be any failed
memory allocations.
On open servers, you can install more memory. However, the maximum amount of
memory that can be used by the kernel is 2 GB.
This message is issued whenever a cluster member changes its state. The log text
specifies the new state of the member.
Either an error was detected by the pnote device, or the device has not reported its state
for a number of seconds (as set by the “timeout” option of the pnote).
For more information on the dmesg log see the R70 ClusterXL Administration Guide.
Concurrent Connections
You should ensure that the total number of concurrent connections is appropriate to the TCP
end timeout. Too many concurrent connections can impede the performance of the R70
Security Gateway.
You can calculate the maximum number of concurrent connections by multiplying the
session establishment rate by the TCP end timeout (by default, 20 seconds).
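The calculation above reduces to simple arithmetic. In the sketch below, the session rate is an assumed figure for illustration; 20 seconds is the default TCP end timeout mentioned above.

```shell
# Worked example: maximum concurrent connections =
# session establishment rate x TCP end timeout.
session_rate=5000        # new sessions per second (assumed figure)
tcp_end_timeout=20       # default TCP end timeout, in seconds
max_concurrent=$(( session_rate * tcp_end_timeout ))
echo "expected concurrent connections: $max_concurrent"   # 100000
```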
NOTE: To test the session rate, many connections need to be opened. You must ensure that
the test is not limited by the maximum number of connections in order for the test to be valid.
NOTE: When Aggressive Aging is enabled and the number of concurrent connections is
near the limit, there can be a performance impact.
Aggressive Aging
Aggressive Aging is triggered when memory consumption is high: the R70 Security
Gateway deletes some connections to reduce consumption. It removes old connections,
starting with TCP sessions that were closed at least 3 seconds ago. Aggressive Aging
reduces the number of concurrent connections to prevent memory exhaustion. However,
when Aggressive Aging starts deleting connections, there is a noticeable performance
impact.
NOTE: Aggressive Aging can invalidate a performance test. For best results, you should
ensure that Aggressive Aging is not active during the test. You should disable it, or run the
fw ctl pstat command to make sure that less than 70% of the machine's memory is
used by the test. For more information on machine memory, refer to the Memory Allocation
section.
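The 70% guideline above can be expressed as a simple guard. The memory figures below are hypothetical sample values, not live fw ctl pstat output; on a real gateway you would substitute the totals reported by that command.

```shell
# Sketch: the "less than 70% of memory used" guideline as a guard.
# Sample figures only - substitute values from fw ctl pstat.
total_kb=4194304         # total kernel memory (sample value, 4 GB)
used_kb=2516582          # memory currently in use (sample value)
pct=$(( used_kb * 100 / total_kb ))

if [ "$pct" -lt 70 ]; then
    echo "memory usage ${pct}%: safe to run the performance test"
else
    echo "memory usage ${pct}%: Aggressive Aging may trigger"
fi
```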
Templates
In order to accelerate connection establishment, there is a mechanism that attempts to
"group together" all connections that match a specific service but have a different source
port. When the first packet of the first connection in such a group is seen, it is processed by
the firewall, which offloads the connection to the SecureXL device. The firewall also offloads
a “template”, which allows the device to accelerate all other connections in this group. When
the first packet of another connection in this group arrives, the acceleration device can
handle it by itself. This "grouping" allows the acceleration device to handle almost all
packets, including even the first packet of most connections.
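The grouping idea above can be sketched with standard shell tools: strip the source-port column from a connection list and count the distinct keys that remain. The connection list below is hypothetical, and the key (src IP, dst IP, dst port, protocol) is a simplification of what the actual template mechanism matches on.

```shell
# Hypothetical connection list: src_ip dst_ip dst_port proto src_port.
# Connections that differ only in source port share one template key.
cat > /tmp/conns.txt <<'EOF'
10.0.0.1 192.168.1.5 80 tcp 33001
10.0.0.1 192.168.1.5 80 tcp 33002
10.0.0.1 192.168.1.5 80 tcp 33003
10.0.0.2 192.168.1.5 443 tcp 40000
EOF

# Drop the source-port column; the distinct rows that remain correspond
# to templates.
awk '{print $1, $2, $3, $4}' /tmp/conns.txt | sort -u > /tmp/templates.txt
templates=$(wc -l < /tmp/templates.txt)
echo "$templates templates cover $(wc -l < /tmp/conns.txt) connections"
```

Here two templates cover all four connections: only the first connection of each group touches the firewall path.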
If templates are not being created, then there is a rule that is preventing a template from
being created. Refer to the section, Using Templates with Rules for more information.
The connections cannot be grouped because the source port is not the only variation. A
template is not created for these connections and the first packet is handled by the firewall
path.
o Dynamic object
Delayed Notification
A SecureXL device may create a connection that matches a template, and notify the firewall
about the connection only after a period of time. This feature further enhances the
connection rate of the SecureXL device.
The fwaccel stats command indicates the total number of delayed connections
(delayed TCP conns.)
The fwaccel templates command indicates the delayed time for each template under
the DLY entry.
If you are using a single gateway device – Delayed Notification is enabled by default.
State Synchronization
State Synchronization enables all machines in the cluster to be aware of the connections
passing through each of the other machines. It ensures that if there is a failure in a cluster
member, connections that were handled by the failed machine are maintained by the other
machines. However, State Synchronization has some performance cost and occasionally
under heavy load, sync packets could even be lost.
These problems are more likely to occur in load sharing configurations and after failover.
Sync at Risk
A sync at risk condition occurs when a cluster member is not able to send delta syncs to
another cluster member at the required rate. When this happens, the sending member has
to throw away unacknowledged delta syncs, and the receiving member might therefore
receive partial (inconsistent) information.
These problems generally do not occur in High Availability configurations. However, there
may be a problem after failover.
Connectivity problems are more critical in Load Sharing configurations and especially in
asymmetric routing configurations. Even when there is no asymmetric routing, “global”
information (not per-connection) can be lost and cause connectivity issues.
1. A significant portion of the traffic crossing the cluster uses a particular service. If you
do not synchronize this service, then the amount of synchronization traffic is reduced
and cluster performance is enhanced.
2. The service usually opens short connections, whose loss may not be noticed. DNS
(over UDP) and HTTP are typically responsible for most connections, and generally
have very short life and inherent recoverability at the application level. However,
services which typically open long connections, such as FTP, should always be
synchronized.
3. Configurations that ensure bi-directional stickiness for all connections do not require
synchronization to operate (only to maintain High Availability). Such configurations
include:
o ClusterXL in a Load Sharing mode with clear connections (no VPN or static
NAT.)
o OPSEC clusters that guarantee full stickiness (refer to the OPSEC cluster's
documentation.)
When a connection is being delayed, the other cluster members are not immediately notified.
Thus, this connection is not synchronized to the other members. Delayed Synchronization
can significantly reduce the amount of synchronization traffic and improve performance.
However, if there is a failover, these connections would be terminated and connectivity
would be lost. You should consider the relative advantages and disadvantages of enabling
Delayed Synchronization.
1. From the Service tab, double-click on the desired service. The Service Properties
window opens.
4. Click OK.
1. Disable SecureXL. However, this also significantly lowers the overall packet rate,
throughput, and IPS performance.
Or
References
CP R70 Firewall Administration Guide