CP Performance Optimization Guide

Table of Contents
Preface
Open Performance Architecture Overview
  SecureXL
  CoreXL
  ClusterXL
  Packet flows
Optimizing Server Hardware and Operating System
  Hyper-Threading
  NIC Properties
  CPU Speed
  ARP Cache Table
Optimizing Network Performance
  Working with SecureXL
  Working with CoreXL
  Working with ClusterXL
  Memory Allocation
  SmartView Tracker Logs and dmesg Output
Optimizing the Session Rate
  Working with SecureXL
  Working with ClusterXL
  Improving NAT Session Rate
References
©2009 Check Point Software Technologies Ltd. All rights reserved.
Classification: [Unrestricted]—For everyone

Preface
This document describes how to optimize the performance of the Security Gateway for
version R70 and later versions. This document also provides an overview of some of the
Firewall technologies in order to provide a basic understanding of how to configure the
gateway parameters to best optimize network performance.
Open Performance Architecture Overview

The R70 Security Gateway includes the Open Performance Architecture, a framework of
technologies designed to accelerate security performance. This framework includes:
• SecureXL - Accelerates traffic using specialized hardware or software
• CoreXL - Utilizes multiple cores
• ClusterXL - Utilizes multiple machines for redundancy and Load Sharing
All three technologies can work together to maximize their unique advantages.
SecureXL
SecureXL is a technology that enables offloading security processing to processing units
(hardware or software). This allows fast processing of the traffic and enables high-speed
performance.
The firewall module handles the first packet of a connection and offloads the relevant
information to the SecureXL device. Thus the SecureXL device is allowed to process all the
subsequent packets. The firewall can also offload connection templates to the SecureXL
device. In this case, a new connection that matches the template can be created in the
device and the firewall does not even process the first packet. This feature is designed to
optimize the connection establishment rate.
Performance Pack is a SecureXL device implemented in software, which is designed to
benefit from multi-core CPU architectures.
CoreXL
CoreXL is a technology that allows Firewall and IPS security code to run on multiple
processors concurrently. The CoreXL layer accelerates traffic that cannot be handled by the
SecureXL device or traffic that requires deep packet inspection.
CoreXL is able to provide near linear scalability of performance, based on the number of
processing cores on a single machine. This increase in performance is achieved without
requiring any changes to management or network topology.
In a CoreXL gateway, the firewall kernel is replicated so that each replicated copy (instance)
runs on a processing core. These instances handle traffic concurrently, and each instance is
a complete and independent inspection kernel.

ClusterXL
ClusterXL is a software-based Load Sharing and High Availability solution that distributes
network traffic among the members of a cluster of redundant Security Gateways. It also
provides transparent failover between machines in a cluster.
A Security Gateway Cluster is a group of identical gateways that are connected, so that if
one fails, another immediately takes its place.
ClusterXL provides an infrastructure that ensures that no data is lost in case of a failover,
because each Gateway Cluster member is aware of the connections passing through the
other members via state synchronization.
ClusterXL Operation Modes

ClusterXL can be configured to operate in three different modes:
• High Availability Mode
• Load Sharing Multicast Mode
• Load Sharing Unicast Mode
Each mode has its relative advantages and disadvantages.
High Availability Mode
When ClusterXL is set to High Availability mode, it designates one of the cluster members as
the active machine and the rest of the members are kept in a stand-by mode. All traffic is
directed to the active member. The active member updates the stand-by members of any
state changes, so that if the active member goes down, they can be immediately substituted
for it.
In this mode you only utilize the processing power of a single machine.
Load Sharing Mode
When ClusterXL is set to Load Sharing mode, you can distribute network traffic between the
cluster members. Unlike High Availability mode, where only a single member is active at any
given time, in Load Sharing mode all the cluster members are active. The whole cluster is
responsible for assigning a portion of the traffic to each cluster member and this usually
leads to an increase in total throughput of the cluster.

ClusterXL offers two separate Load Sharing solutions: Multicast mode and Unicast mode.
The difference between the two modes is how the members receive the packets sent to the
cluster.
• Multicast mode - all packets sent to the cluster reach all the members in the cluster. Each
member then decides whether it should process the packets or not. This mode achieves a
better connection establishment rate than Unicast mode.
• Unicast mode - a single cluster member, referred to as the pivot, receives all the packets
sent to the cluster. The pivot is then responsible for propagating the packets to other cluster
members, creating a Load Sharing mechanism. The pivot member still acts as a firewall
module that processes packets. However, the other members can perform other tasks for
the pivot in order to reduce its total load.
NOTE: To support ClusterXL Load Sharing Multicast, extra configuration settings may be
required on the connected router. For more information on ClusterXL Load Sharing Multicast
configuration mode, see the R70 ClusterXL Administration Guide.
Packet flows
When SecureXL is enabled, a packet enters the firewall and first reaches the SecureXL
device. The device can choose to handle the packet in three ways:
1. Acceleration path - The packet is completely handled by the SecureXL device. It is
processed and sent back again to the network. This path does all the IPS processing
when CoreXL is disabled.
2. Medium path - The packet is handled by the SecureXL device, except for IPS
processing. The CoreXL layer passes the packet to one of the firewall instances, to
perform IPS processing. This path is only available when CoreXL is enabled.
3. Firewall path - The SecureXL device is unable to process the packet. It is passed on to
the CoreXL layer and then to one of the instances, for full firewall processing. This path
also processes all packets when SecureXL is disabled.
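The three paths above can be summarized as a small decision function. This is only an illustrative sketch of the rules as listed; the flag names (securexl_enabled, corexl_enabled, can_accelerate, needs_ips) are hypothetical and do not correspond to gateway parameters.

```python
from enum import Enum


class Path(Enum):
    ACCELERATED = "acceleration path"
    MEDIUM = "medium path"
    FIREWALL = "firewall path"


def select_path(securexl_enabled, corexl_enabled, can_accelerate, needs_ips):
    """Pick the packet path according to the three rules listed above."""
    # Firewall path: SecureXL is disabled, or the device cannot process the packet.
    if not securexl_enabled or not can_accelerate:
        return Path.FIREWALL
    # Medium path: only available with CoreXL; IPS work goes to a firewall instance.
    if corexl_enabled and needs_ips:
        return Path.MEDIUM
    # Acceleration path: handled completely by the SecureXL device
    # (including IPS processing when CoreXL is disabled).
    return Path.ACCELERATED


print(select_path(True, True, True, True).value)
```

The real SecureXL device considers far more packet properties; the sketch only captures how the SecureXL and CoreXL settings determine which paths are available.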

The following diagram displays the three different packet flows.
[Figure: packet flow diagram. Labels: Performance Pack, Dispatcher, Queues, Firewall
Instances 0 through N; Accelerated Path, Medium Path, Firewall Path.]

Optimizing Server Hardware and Operating System
The configuration of the server's hardware and operating system can affect the performance
of the R70 Security Gateway. A server that is not configured properly diminishes
network performance. Some of these configurations are only relevant for an
open server. To optimize performance, the server should conform to the following
configurations.
If you are using a Check Point appliance, you only need to refer to the ARP Cache Table
section.
Hyper-Threading
Hyper-Threading can have a negative impact on the performance of the R70 Security Gateway.
It is recommended that you disable this capability.
If you are using a Check Point appliance, Hyper-Threading is disabled by default.
NIC Properties
This configuration is only for an open server. There are four issues related to the NIC that
can affect performance of the R70 Security Gateway.
1. HCL support
Verify that you are using certified NICs by checking the Hardware Compatibility List (HCL) at:
http://www.checkpoint.com/services/techsupport/hcl/index.html
2. PCI Express
Use PCI Express NICs, because they perform better than PCI-X NICs.
3. Speed
Use ethtool <interface name> to verify that the NIC is working at the desired
speed and using full-duplex settings.
4. Statistics
Use ethtool -S <interface name> to check statistics for the NIC. A properly working system
should display minimal rx/tx drop/error statistics.
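As a rough aid for the statistics check, the following sketch scans ethtool statistics output (ethtool -S) for non-zero drop or error counters. The sample text is hypothetical and only assumes the usual "name: value" layout; actual counter names depend on the NIC driver.

```python
import re

# Hypothetical sample in the usual "name: value" layout of ethtool -S output.
SAMPLE = """\
NIC statistics:
     rx_packets: 1048576
     tx_packets: 983040
     rx_errors: 0
     tx_errors: 0
     rx_dropped: 12
     tx_dropped: 0
"""


def suspicious_counters(ethtool_output):
    """Return {counter: value} for non-zero counters whose name mentions drop/err."""
    found = {}
    for line in ethtool_output.splitlines():
        match = re.match(r"\s*(\S+):\s+(\d+)\s*$", line)
        if not match:
            continue  # header line or a non-numeric value
        name, value = match.group(1), int(match.group(2))
        if value > 0 and re.search(r"drop|err", name):
            found[name] = value
    return found


print(suspicious_counters(SAMPLE))
```

A non-empty result is only a hint to look closer; a handful of drops over a long uptime may still count as "minimal".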
CPU Speed
This configuration is only for an open server. If performance is low, use the cat
/proc/cpuinfo command to extract information about the CPU model and speed. You
may be able to improve performance by upgrading to a CPU with a higher clock speed.

ARP Cache Table
This configuration is relevant to a Check Point appliance and an open server. The default
limit of the kernel ARP Cache table is 1024 entries. You can increase the number of entries
to improve network performance. You should increase the ARP Cache table if the dmesg
command displays the message “Neighbour table overflow”.
NOTE: You should also increase the ARP Cache table if you are testing large subnets that
are directly connected to the gateway without a router.
To change the number of ARP entries:
The number of ARP entries is controlled by the net.ipv4.neigh.default.gc_thresh3
parameter. There are two ways to change the number of ARP entries:
• Edit the /etc/sysctl.conf file and run the sysctl -p command. This change
survives boot. (See Example 1.)
• Run the sysctl command. This change does not survive boot. (See Example 2.)
The following examples demonstrate how to increase the number of ARP entries to 4096, to
allow for 4096 IPs.
Example 1
Modify the /etc/sysctl.conf file to include the lines:
net.ipv4.neigh.default.gc_thresh3 = 4096
net.ipv4.neigh.default.gc_thresh2 = 2048
Run the sysctl -p command for the change to take effect.
Example 2
Run the commands:
sysctl -w net.ipv4.neigh.default.gc_thresh3=4096
sysctl -w net.ipv4.neigh.default.gc_thresh2=2048
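The check and the resulting settings can be sketched as follows. The helper names are hypothetical, and setting gc_thresh2 to half of gc_thresh3 is an assumption taken from the ratio used in Examples 1 and 2, not a documented rule.

```python
OVERFLOW_MESSAGE = "Neighbour table overflow"


def arp_table_overflowed(dmesg_text):
    """True if the dmesg output contains the ARP cache overflow message."""
    return OVERFLOW_MESSAGE in dmesg_text


def sysctl_lines(max_entries):
    """Build /etc/sysctl.conf lines for the requested number of ARP entries."""
    # Assumption: gc_thresh2 is half of gc_thresh3, as in Examples 1 and 2.
    return [
        f"net.ipv4.neigh.default.gc_thresh3 = {max_entries}",
        f"net.ipv4.neigh.default.gc_thresh2 = {max_entries // 2}",
    ]


if arp_table_overflowed("... Neighbour table overflow ..."):
    print("\n".join(sysctl_lines(4096)))
```

The printed lines match Example 1 and still need to be applied with sysctl -p.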

Optimizing Network Performance
This section discusses factors which affect network performance.
Working with SecureXL

This section discusses how SecureXL can have an impact on network performance.
Conditions that Preclude Accelerated Traffic

When SecureXL is enabled, all traffic should be accelerated. However, traffic is not
accelerated in the following cases:
• Some features, when enabled, disable SecureXL altogether. For example:
o ClusterXL sticky decision function
o QoS
• The first packet of any new TCP session, unless a template exists.
• The first packet of any session that requires NAT.
• The first packet of any new UDP session, unless a template exists.
• All traffic that matches a service that uses a resource.
• All traffic that is supposed to be dropped or rejected, according to the rule base (consider
enabling Drop Templates - see below).
• All traffic whose source or destination is the gateway itself.
• All traffic that matches a rule with user authentication or session authentication.
• All traffic that requires anti virus or anti spam filtering.
• Traffic other than TCP, UDP, GRE, or ESP.
• All multicast traffic.
• All fragmented traffic.
• All traffic with IP options.
• RST packets, when the "Spoofed Reset Protection" feature is activated.
• Traffic that is suspected to violate firewall protections, such as TCP sequence verification
(packets with abnormal sequences) or anti-spoofing (packets which come from an
unexpected interface).
Managing Non-Accelerated Traffic

Usually, the majority of network traffic should be accelerated when you are running
SecureXL. If you suspect that the majority of traffic is non-accelerated, you may need to
analyze SecureXL logs to identify the cause.

There are two actions that you can perform:
1. Confirm that the majority of the traffic is non-accelerated.
2. Review and tune the firewall policy and IPS protections (refer to sk33250 and the R70
IPS Administration Guide).
Confirming Non-Accelerated Traffic

Use the fwaccel stats command to verify the amount of non-accelerated traffic
compared to accelerated traffic. In the following example there are 124 accelerated packets
and 766,058 packets that are non-accelerated.
# fwaccel stats
Name                  Value            Name                  Value
--------------------  ---------------  --------------------  ---------------
conns created         480              conns deleted         471
temporary conns       0                templates             0
nat conns             0                accel packets         124
accel bytes           13360            F2F packets           766058
ESP enc pkts          0                ESP enc err           0
ESP dec pkts          0                ESP dec err           0
ESP other err         0                espudp enc pkts       0
espudp enc err        0                espudp dec pkts       0
espudp dec err        0                espudp other err      0
AH enc pkts           0                AH enc err            0
AH dec pkts           0                AH dec err            0
AH other err          0                memory used           0
free memory           0                acct update interval  3600
current total conns   8                TCP violations        0
conns from templates  0                TCP conns             4
delayed TCP conns     0                non TCP conns         4
delayed nonTCP conns  0                F2F conns             8
F2F bytes             48076865         crypt conns           0
enc bytes             0                dec bytes             0
Name (Statistic Parameter)  Explanation
accel packets               Number of accelerated packets
accel bytes                 Number of accelerated traffic bytes
F2F packets                 Number of packets handled by the Security Gateway in the
                            firewall path (slow path)
conns from templates        Number of connections created from templates
F2F bytes                   Number of traffic bytes handled by the Security Gateway in the
                            firewall path
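To compare accelerated and non-accelerated traffic without reading the counters by eye, the output can be parsed along these lines. This is a sketch that assumes the two-pairs-per-line layout of the fwaccel stats output shown in this section; the function names are hypothetical.

```python
import re

# Excerpt of the fwaccel stats output from this section.
SAMPLE = """\
Name Value Name Value
-------------------- --------------- -------------------- ---------------
nat conns 0 accel packets 124
accel bytes 13360 F2F packets 766058
F2F bytes 48076865 crypt conns 0
"""

# A counter name (letters, digits, spaces) followed by its integer value.
PAIR = re.compile(r"([A-Za-z][A-Za-z0-9 ]*?)\s+(\d+)(?=\s|$)")


def parse_fwaccel_stats(text):
    """Return {counter name: value} for every name/value pair in the output."""
    stats = {}
    for line in text.splitlines():
        for name, value in PAIR.findall(line):
            stats[name.strip()] = int(value)
    return stats


def accelerated_share(stats):
    """Fraction of packets that were accelerated (accel vs. F2F packets)."""
    accel, f2f = stats["accel packets"], stats["F2F packets"]
    total = accel + f2f
    return accel / total if total else 0.0


stats = parse_fwaccel_stats(SAMPLE)
print(f"accelerated share: {accelerated_share(stats):.4%}")
```

For the counters in this example the accelerated share is roughly 0.016 percent, which confirms that almost all traffic is taking the firewall path.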

TIP: You can use the following commands to enable debugging in SecureXL and
Performance Pack in order to understand and identify causes for non-accelerated traffic.
Command                             Explanation
fw ctl debug -buf 32000             Set debug buffer
fwaccel dbg + offload               Debug SecureXL offload mechanism
sim dbg + f2f                       Debug Performance Pack forward-to-firewall incidents
fw ctl kdebug -T -f > debug.txt &   Forward debug output to a file
NOTE: Enabling debug might have a negative impact on performance.
To disable debug:
• Run the sim dbg resetall and fw ctl debug 0 commands.
Disabling Performance Pack

If the majority of traffic cannot be accelerated, disabling the Performance Pack might
improve performance.
To disable Performance Pack:
• Run the cpconfig command.
An interactive menu is displayed. Select Enable/Disable Check Point SecureXL, then select
Disable to disable accelerated traffic, or Enable to enable it again.
IPS Protections
Some protections can have an adverse effect on the performance of the gateways on which
they are activated. These protections either use more resources or apply to common
types of traffic.
• Protections with a critical performance impact normally prevent SecureXL from
accelerating the traffic and can significantly reduce network performance.
• Protections with a high performance impact may also reduce network performance.
Protections that have a critical or high performance impact should only be activated when
their severity is critical or high, or when they are specifically needed. If your gateways
experience heavy traffic load, be careful when activating high/critical performance impact
protections on profiles that affect a large number of mixed (client and server) machines.
IPS Exceptions
For protections which prevent SecureXL from accelerating traffic, the IPS exception
mechanism allows SecureXL to accelerate connections that match the exception rules.
For example:
• The “Network Quota” protection in R70 does not disable SecureXL templates on connections
that match the protection's exception rules.
• The IP ID Masking and TTL Masking (Fingerprint Scrambling) protections do not disable
templates and acceleration on connections that match these protections' exception rules.
For further information regarding IPS, refer to the R70 IPS Administration Guide.
Drop Templates
You should enable drop templates to improve the Security Gateway's performance when a
large part of the traffic matches a drop rule. This feature allows Performance Pack to handle
the drops. This feature is disabled by default.
To enable drop templates:
1. Open Policy>Global Properties from the SmartDashboard.
2. Select the SmartDashboard Customization window and click Configure.
3. Select Firewall-1>SecureXL.
4. Check enable_drop_templates.
The following table contains CLI commands that can help you manage drop templates:
Command                 Result
fwaccel stat            Check the status of drop templates
fwaccel templates -d    View current drop templates
fwaccel stats -d        Get statistics about drop templates
sim ranges -a           View the Security Gateway's rule base ranges
                        (output goes to /var/log/messages)
The output of fwaccel stats -d contains an index of ranges. If you correlate the
index with the output of sim ranges, you can better understand the practical ranges for drop
templates and when it is appropriate to use them.

Working with CoreXL
This section discusses how CoreXL can have an impact on network performance.
CPU Roles
The cores in a multi-core machine can assume several roles, including:
• Secure Network Dispatcher (SND)
• Kernel Instance
• Daemon
Secure Network Dispatcher (SND)
This role is responsible for:
• Processing incoming traffic from the network interfaces.
• If Performance Pack is running - processing packets which can be accelerated
(acceleration path).
• Distributing non-accelerated packets among kernel instances for IPS and Firewall
inspection.
Traffic entering network interface cards (NICs) is directed to a processing core running the
SND. The association of a particular interface with a processing core is called the interface’s
affinity with that core. This affinity causes the interface’s traffic to be directed to that core and
the SND to run on that core.
Kernel Instance
A firewall kernel instance is configured to run on a particular core, and is responsible for the
following:
• Firewall processing (firewall path)
• IPS processing (medium path)
Traffic which is not accelerated by Performance Pack is forwarded to one of the instances
for further processing.

Daemon
The firewall daemon (fwd) and other daemons can be configured to run on a dedicated
core.
For the firewall daemon, this can be useful when heavy logging consumes a lot of CPU
resources.
IMPORTANT: Under normal circumstances, it is not recommended for the SND and an
instance to share a core. However, it is necessary in the following cases:
1. When using a machine with only two cores. It is better for both SND and instances
to share cores, instead of giving each only one core.
2. When you know that almost all of the packets are being processed in the
accelerated path, and you want to assign all CPUs to this path. If the instances do
not receive significant work, then it is appropriate to share the cores.
Balancing Core Utilization

A CPU core can become overloaded and create a performance bottleneck. You
should balance the CPU usage between the cores to optimize performance.
Optimizing Core Utilization

In some cases, you should change the default configuration and divide the cores between
kernel instances and SND for optimal performance.
The following table describes the default configuration of cores and kernel instances:
Number of Cores    Number of Kernel Instances
1                  CoreXL is disabled
2                  2
4                  3
8                  6
For more information on configuring the cores, refer to the CP R70 Firewall Administration
Guide.
To optimize core utilization:
1. Use the fw ctl affinity -l -r command to understand the role of each CPU.
You can view the cores that are handling kernel instances.
2. Cores that do not have a kernel instance running are for SND to use. The interfaces'
affinity should only be mapped to these cores.
3. Run the top command to see which cores are heavily utilized.
a. If SND cores are more heavily used than instance cores - you may want to
decrease the number of instances, to allow SND to use another core.
b. If instance cores are more heavily used than SND cores - you may want to
increase the number of instances, to share the work among more instances.
To increase or decrease the number of instances, use the CoreXL
configuration menu in cpconfig.
NOTE: After you start the top command, press 1 to view usage per CPU. To
make this the default view, press SHIFT+W.
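The decision in step 3 can be sketched as a comparison of average utilization. The function name and the 10-point margin are hypothetical choices for illustration; the guide does not define a threshold.

```python
def recommend(snd_load, instance_load, margin=10.0):
    """Compare average per-core utilization (percent, as read from top).

    snd_load: utilization of the cores used by SND.
    instance_load: utilization of the cores running kernel instances.
    margin: hypothetical tolerance before a change is suggested.
    """
    snd_avg = sum(snd_load) / len(snd_load)
    inst_avg = sum(instance_load) / len(instance_load)
    if snd_avg - inst_avg > margin:
        # SND cores are busier: free a core for SND.
        return "decrease instances"
    if inst_avg - snd_avg > margin:
        # Instance cores are busier: spread the work over more instances.
        return "increase instances"
    return "balanced"


print(recommend([90.0], [30.0, 35.0, 40.0]))
```

Any change suggested this way is still applied through the CoreXL configuration menu in cpconfig.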
Distributing Interfaces to the Cores

You should distribute the interfaces' affinity equally between the cores which are available for
SND processing. The default configuration is:
• If Performance Pack is enabled - interface affinity is handled in automatic mode. In this
mode, Performance Pack determines affinity based on the load. You may want to switch to
manual mode and set interface affinity yourself, which may improve performance.
• If Performance Pack is disabled - the affinity of all interfaces is mapped to a single core. If
you have more than one core available, you should change the affinity of some interfaces to
use the other cores.
To distribute the interfaces:
1. Run the top command to display how the SND cores are being used.
2. If the cores are unbalanced, you should distribute the interfaces.
o If Performance Pack is enabled - run the sim affinity -s command to
use static affinity to balance the interfaces between the SND cores.
o If Performance Pack is disabled - run the fw ctl affinity -s command
to use static affinity to balance the interfaces between the SND cores.
Working with Cores

Here are some important tips to remember when you are working with cores.
• You should map heavily used interfaces' affinity to separate cores.
• If Performance Pack is enabled and you have a pair of interfaces that serve the same
connections, then you should map the interfaces' affinity to the same core. In most cases,
Performance Pack’s automatic affinity provides the optimal utilization. If this is not the case,
it is recommended to manually set the affinity of interfaces using the sim
affinity -s command.
For more information, refer to the “sim affinity” section in the R70 Performance Pack
Administration Guide.
Additional performance tips can be found in sk33250.
Allocating a Core for Heavy Logging

If the gateway is performing heavy logging, it may be advisable to allocate a processing core
to the fwd daemon, which performs the logging. Like adding a core for the SND, this
reduces the number of cores available for kernel instances.
To allocate a processing core to the fwd daemon:
1. Reduce the number of kernel instances using cpconfig.
2. Set the fwd daemon affinity, as detailed below.
Setting the fwd Daemon Affinity

Check which processing cores are running the kernel instances and which cores are
handling interface traffic with the fw ctl affinity -l -r command. Set the fwd
daemon affinity to the remaining core in order to allocate it to the fwd daemon.
NOTE: If interface affinities are attached to a specific core, then you should avoid setting the
affinity of the fwd daemon to that core. In general, it is recommended to dedicate a core
to only one of the following components: network interfaces, kernel firewall instances, or
user space processes/daemons. You should avoid having more than one of these components
attached to the same core.
When you set affinities for Check Point daemons (such as the fwd daemon), they are loaded
at boot from the fwaffinity.conf configuration text file located at: $FWDIR/conf.
Edit the file by adding the following line:
n fwd <cpuid>
where <cpuid> is the number of the processing core to be set as the affinity of the
fwd daemon.
For example, to set core #2 as the affinity of the fwd daemon, add to the file:
n fwd 2
• You must reboot the server for the fwaffinity.conf settings to take effect.
• After reboot, you can verify the configuration by running the command: fw ctl
affinity -l -r.
Here is an example of the output:
# fw ctl affinity -l -r
CPU 0: Mgmt Lan1 Lan2
CPU 1: Lan3 Lan4
CPU 2: fwd
CPU 3: fw_4
CPU 4: fw_3
CPU 5: fw_2
CPU 6: fw_1
CPU 7: fw_0
All: cprid cpd
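The recommendation to keep interfaces, kernel instances, and daemons on separate cores can be checked against this output with a short script. This is a sketch: the classification rule (names of the form fw_N are kernel instances, a small hypothetical list of daemon names, everything else an interface) is an assumption for illustration.

```python
import re

# The fw ctl affinity -l -r output shown above.
SAMPLE = """\
CPU 0: Mgmt Lan1 Lan2
CPU 1: Lan3 Lan4
CPU 2: fwd
CPU 3: fw_4
CPU 4: fw_3
CPU 5: fw_2
CPU 6: fw_1
CPU 7: fw_0
All: cprid cpd
"""


def classify(name, daemons=("fwd", "cpd", "cprid")):
    """Guess the component type from its name (hypothetical rule)."""
    if re.fullmatch(r"fw_\d+", name):
        return "instance"
    if name in daemons:
        return "daemon"
    return "interface"


def core_roles(output):
    """Return {cpu id: set of component types attached to that core}."""
    roles = {}
    for line in output.splitlines():
        match = re.match(r"CPU (\d+):\s*(.*)$", line)
        if match:
            names = match.group(2).split()
            roles[int(match.group(1))] = {classify(n) for n in names}
    return roles


def shared_cores(roles):
    """Cores that carry more than one component type."""
    return sorted(cpu for cpu, kinds in roles.items() if len(kinds) > 1)


print(shared_cores(core_roles(SAMPLE)))
```

For the sample output no core is shared, so the list is empty. The "All:" line is ignored because those daemons are not pinned to a single core.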
VPN and VoIP Traffic

With CoreXL, VPN tunnel establishment and VoIP control connections are processed in
firewall instance 0. This means that CoreXL does not provide scalability for these scenarios.
If Performance Pack is enabled, then the VPN traffic and VoIP data connections are
accelerated by the Performance Pack and pass through the acceleration path to achieve low
latency and high performance.
Firewall and IPS Inspection

When you are running CoreXL, optimal performance is achieved when the connections are
load balanced across the instances and all the cores are working in parallel. See the
Balancing Core Utilization section for more information.
In lab staging tests (when running with CoreXL) you should use many source and/or
destination IPs. Usually, several hundred distinct IP pairs should be sufficient to balance the
connections amongst the kernel instances. Do not use an extremely high number of IPs,
because this may make the templates ineffective.
Working with ClusterXL

This section discusses how ClusterXL can have an impact on network performance.
Static NAT with SmartDefense Protections

Using Static NAT with SmartDefense protections can result in circumstances where
asymmetric routing between the cluster members has a negative impact on network
performance. Asymmetric routing, or a non-sticky connection, occurs when one member in a
Load Sharing configuration handles one direction of the connection and a different member
handles the other direction.
Some of the SmartDefense protections require the connection to be sticky - all of its packets
must be handled by the same cluster member. Network performance can be reduced when a
protection that requires stickiness is combined with asymmetric routing. For example:
• Flush and ACK - The return packet for this connection is not going to be handled by the
original cluster member. The original member holds the packet until it is synchronized and
acknowledged by the other member.
• Forwarding - A cluster member forwards packets to the member that handled the first
packet of the connection.
Memory Allocation
Memory allocation failures can reduce the performance of the system.
NOTE: If a memory allocation failure occurs, you should not perform lab tests for achieving
best performance. For example, do not perform a lab test if there are too many concurrent
connections.
To view if memory allocations have failed:
1. Run the fw ctl pstat command.
2. Search for failures in kmem and smem (the failed alloc and failed free counters in the
following example).
Sample output of memory allocations:

Machine Capacity Summary:
  Memory used: 20% (165MB out of 823MB) - below low watermark
  Concurrent Connections: 0% (25 out of 999900) - below low watermark
  Aggressive Aging is not active
Hash kernel memory (hmem) statistics:
  Total memory allocated: 257949696 bytes in 62914 4KB blocks using 62 pools
  Initial memory allocated: 20971520 bytes (Hash memory extended by 236978176 bytes)
  Memory allocation limit: 862978048 bytes using 512 pools
  Total memory bytes used: 5977548 unused: 251972148 (97.68%) peak: 72292464
  Total memory blocks used: 1726 unused: 61188 (97%) peak: 17803
  Allocations: 95277797 alloc, 0 failed alloc, 95201809 free
System kernel memory (smem) statistics:
  Total memory bytes used: 392118672 peak: 420534724
  Blocking memory bytes used: 422932 peak: 958412
  Non-Blocking memory bytes used: 391695740 peak: 419576312
  Allocations: 5894755 alloc, 0 failed alloc, 5893180 free, 0 failed free
Kernel memory (kmem) statistics:
  Total memory bytes used: 139844652 peak: 204210436
  Allocations: 101171165 alloc, 0 failed alloc, 101094365 free, 0 failed free
  External Allocations: 0 for packets, 2660 for SXL
NOTE: Even though failures in hmem are legitimate, they might impact performance, especially
when CoreXL is enabled. For optimal performance, there should not be any failed memory
allocations.
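The failed allocation counters can be pulled out of the fw ctl pstat output with a few lines of parsing. This sketch assumes the section headings and the "N failed alloc" wording shown in the sample output; the function name is hypothetical.

```python
import re

# Excerpt of the fw ctl pstat output from this section.
SAMPLE = """\
Hash kernel memory (hmem) statistics:
Allocations: 95277797 alloc, 0 failed alloc, 95201809 free
System kernel memory (smem) statistics:
Allocations: 5894755 alloc, 0 failed alloc, 5893180 free, 0 failed free
Kernel memory (kmem) statistics:
Allocations: 101171165 alloc, 0 failed alloc, 101094365 free, 0 failed free
"""


def failed_allocations(pstat_output):
    """Return {"hmem" | "smem" | "kmem": failed alloc count}."""
    failures = {}
    section = None
    for line in pstat_output.splitlines():
        header = re.search(r"\((hmem|smem|kmem)\) statistics", line)
        if header:
            section = header.group(1)  # remember which memory pool follows
            continue
        count = re.search(r"(\d+) failed alloc", line)
        if section and count:
            failures[section] = int(count.group(1))
    return failures


print(failed_allocations(SAMPLE))
```

Any non-zero value in the result indicates that lab measurements taken at that point are unlikely to reflect best performance.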
Resolving memory problems

Here are some possible solutions to memory allocation problems:
• On open servers, you can install more memory. However, the maximum amount of
memory that can be used by the kernel is 2 GB.
• You can decrease the TCP end timeout.
• You can decrease the number of concurrent connections to reduce memory consumption.

SmartView Tracker Logs and dmesg Output
You can use SmartView Tracker logs and dmesg output to help you detect problematic
events that can impede network performance. You may encounter one or more of the
following events: cluster failovers, cluster overload synchronization, memory problems, and
dropped packets.
Sample SmartView Tracker Logs

The following SmartView Tracker logs are examples of events that can impede network
performance:
- member [ID] ([IP]) <is active|is down|is stand-by|is initializing> ([REASON]).
This message is issued whenever a cluster member changes its state. The log text
specifies the new state of the member.
- [DEVICE] on member [ID] ([IP]) detected a problem ([REASON]).
Either an error was detected by the pnote device, or the device has not reported its state
for a number of seconds (as set by the “timeout” option of the pnote).
- interface [INTERFACE NAME] of member [ID] ([IP]) is down (receive <up|down>,
transmit <up|down>).
This message is issued whenever an interface encounters a problem, either in receiving
or transmitting packets. Note that in this case the interface may still be working properly,
as far as the OS is concerned, but is unable to communicate with other cluster members
due to a faulty cluster configuration.
Sample dmesg Log

The following dmesg log is an example of an event that can impede network performance:
FW-1: State synchronization is in risk. Please examine your synchronization network to
avoid further problems!
For more information on the dmesg log, see the R70 ClusterXL Administration Guide.

Optimizing the Session Rate
This section discusses factors which affect session rate and can have an impact on
performance.
Working with SecureXL

This section discusses how SecureXL can have an impact on session rate.
Concurrent Connections
You should ensure that the total number of concurrent connections is appropriate to the TCP
end timeout. Too many concurrent connections can impede the performance of the R70
Security Gateway.
You can calculate the maximum number of concurrent connections by multiplying the
session establishment rate by the TCP end timeout (by default, 20 seconds).
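The calculation above can be sketched as follows. This is only an illustration of the arithmetic: the function names and the example session rates are hypothetical, and the table limit of 1,000,000 is taken from the sample output in this section rather than from any particular deployment.

```python
def estimated_concurrent_connections(session_rate_per_sec: float,
                                     tcp_end_timeout_sec: float = 20.0) -> float:
    """Session establishment rate multiplied by the TCP end timeout.

    The default of 20 seconds is the default TCP end timeout stated in this guide.
    """
    return session_rate_per_sec * tcp_end_timeout_sec


def fits_in_table(session_rate_per_sec: float, table_limit: int,
                  tcp_end_timeout_sec: float = 20.0) -> bool:
    """True if the estimated number of connections stays within the table limit."""
    estimate = estimated_concurrent_connections(session_rate_per_sec,
                                                tcp_end_timeout_sec)
    return estimate <= table_limit


# Hypothetical test load: 10,000 new sessions per second, default timeout.
print(estimated_concurrent_connections(10_000))  # 200000.0
# A hypothetical 60,000 sessions per second would need 1,200,000 entries,
# which exceeds a limit of 1,000,000 unless the timeout is reduced.
print(fits_in_table(60_000, 1_000_000))          # False
print(fits_in_table(60_000, 1_000_000, 10.0))    # True
```

The last two lines show why reducing the TCP end timeout is one way to keep a session rate test from being capped by the connections table.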
NOTE: To test session rate many connections need to be opened. You must ensure that the
test is not limited by the maximum number of connections in order for the test to be valid.
To compare the number of concurrent connections with the maximum limit of connections:
1. Use the fw tab -t connections command to display the maximum limit of the
connections table.
For example:
[Expert@cpmodule]# fw tab -t connections
localhost:
-------- connections --------
dynamic, id 8158, attributes: keep, sync, aggressive aging, kbuf
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31, expires 25, refresh,
limit 1000000, hashsize 1048576, free function c2f372c0 0, post
sync handler c2f2b230
2. Use the fw tab -t connections -s command to find out the current and peak
number of entries in the connections table.
For example:
[Expert@cpmodule]# fw tab -t connections -s
HOST NAME ID #VALS #PEAK #SLINKS
localhost connections 8158 26 244914 40

3. If the peak number of connections has reached the limit, you must perform one of the
following actions:
o Reduce the TCP end timeout.
a) From SmartDashboard, select Policy > Global Properties. The Global
Properties window opens.
b) Select Stateful Inspection.
c) Decrease the number in the TCP end timeout: field.
o Increase the maximum concurrent connections.
a) From SmartDashboard, double-click the gateway object. The
Check Point Gateway window opens.
b) Select Capacity Optimization.
c) Increase the number in the Maximum concurrent connections: field.
NOTE: When Aggressive Aging is enabled and the number of concurrent connections is
near the limit, there can be a performance impact.
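The comparison in steps 1 to 3 can be automated with a short script. The following is a minimal sketch that parses the two sample outputs shown above. It assumes the output keeps the same layout (a "limit N" attribute in the table header, and the column order HOST NAME ID #VALS #PEAK #SLINKS in the summary); the function names are hypothetical and are not part of any Check Point tool.

```python
import re


def parse_limit(header_output: str) -> int:
    """Return the 'limit N' attribute from `fw tab -t connections` header output."""
    match = re.search(r"\blimit\s+(\d+)", header_output)
    if match is None:
        raise ValueError("no 'limit' attribute found in header output")
    return int(match.group(1))


def parse_summary(summary_output: str, table: str = "connections") -> dict:
    """Return #VALS and #PEAK for `table` from `fw tab -t <table> -s` output."""
    for line in summary_output.splitlines():
        fields = line.split()
        # Assumed column order: HOST NAME ID #VALS #PEAK #SLINKS
        if len(fields) == 6 and fields[1] == table:
            return {"vals": int(fields[3]), "peak": int(fields[4])}
    raise ValueError(f"no summary row found for table {table!r}")


def peak_reached_limit(peak: int, limit: int) -> bool:
    """True if the peak number of connections has reached the table limit."""
    return peak >= limit


# Sample output copied from the examples in this section.
HEADER = """dynamic, id 8158, attributes: keep, sync, aggressive aging, kbuf
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31, expires 25, refresh,
limit 1000000, hashsize 1048576, free function c2f372c0 0, post
sync handler c2f2b230"""
SUMMARY = """HOST NAME ID #VALS #PEAK #SLINKS
localhost connections 8158 26 244914 40"""

limit = parse_limit(HEADER)
summary = parse_summary(SUMMARY)
print(limit, summary["peak"], peak_reached_limit(summary["peak"], limit))
```

With the sample values, the peak (244914) is well below the limit (1000000), so neither of the actions in step 3 is needed.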
Aggressive Aging
Aggressive Aging is triggered when memory consumption is high, and the R70 Security
Gateway deletes some connections to reduce consumption. It destroys old connections,
particularly closed TCP sessions, which were closed at least 3 seconds ago. Aggressive
Aging reduces the number of concurrent connections to prevent memory exhaustion.
However, when Aggressive Aging starts deleting connections, there is a noticeable
performance impact.
NOTE: Aggressive Aging can invalidate a performance test. For best results, you should
ensure that Aggressive Aging is not active during the test. You should disable it, or run the
fw ctl pstat command to make sure that less than 70% of the machine's memory is
used by the test. For more information on machine memory, refer to the Memory Allocation
section.
Templates
In order to accelerate connection establishment, there is a mechanism that attempts to
"group together" all connections that match a specific service but have a different source
port. When the first packet of the first connection in such a group is seen, it is processed by
the firewall, which offloads the connection to the SecureXL device. The firewall also offloads
a “template”, which allows the device to accelerate all other connections in this group. When
the first packet of another connection in this group arrives, the acceleration device can
handle it by itself. This "grouping" allows the acceleration device to handle almost all
packets, including even the first packet of most connections.
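The grouping idea can be illustrated with a small conceptual model. This is not how SecureXL is implemented: the fields that make up the template key here (source IP, destination IP, destination port, protocol) are an assumption chosen to match the description above, in which connections in a group differ only in their source port.

```python
from typing import List, NamedTuple, Set, Tuple


class Connection(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


def template_key(conn: Connection) -> Tuple[str, str, int, str]:
    """Everything except the source port (assumed key, for illustration only)."""
    return (conn.src_ip, conn.dst_ip, conn.dst_port, conn.protocol)


def first_packet_paths(connections: List[Connection]) -> List[str]:
    """For each new connection, report which path handles its first packet."""
    templates: Set[Tuple[str, str, int, str]] = set()
    paths = []
    for conn in connections:
        key = template_key(conn)
        if key in templates:
            # A template already exists, so the device handles it by itself.
            paths.append("accelerated")
        else:
            # First connection of the group: the firewall processes it and
            # offloads a template for the rest of the group.
            templates.add(key)
            paths.append("firewall")
    return paths


conns = [
    Connection("10.0.0.1", 40001, "192.0.2.10", 80, "tcp"),
    Connection("10.0.0.1", 40002, "192.0.2.10", 80, "tcp"),   # same group
    Connection("10.0.0.1", 40003, "192.0.2.10", 443, "tcp"),  # different service
]
print(first_packet_paths(conns))  # ['firewall', 'accelerated', 'firewall']
```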

To verify that templates are being created:
- Run the fwaccel stat command.

Here is a sample output of the fwaccel stat command. The second line (Accept Templates :
enabled) indicates that templates are being created.
Accelerator Status : on
Accept Templates : enabled
Drop Templates : disabled
Accelerator Features : Accounting, NAT, Cryptography, Routing,
HasClock, Templates, Synchronous, IdleDetection,
Sequencing, TcpStateDetect, AutoExpire,
DelayedNotif, TcpStateDetectV2, CPLS, WireMode,
DropTemplates, Streaming, MultiFW, AntiSpoofing,
DoS Defender
Cryptography Features : Tunnel, UDPEncapsulation, MD5, SHA1, NULL,
3DES, DES, CAST, CAST-40, AES-128, AES-256,
ESP, LinkSelection, DynamicVPN, NatTraversal,
EncRouting
If templates are not being created, a rule may be preventing template creation. Refer to
the Using Templates with Rules section for more information.
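If you collect fwaccel stat output from several gateways, the check can be scripted. The following is a minimal sketch that assumes the "Name : value" layout of the sample output above; the function name is hypothetical, and the output layout may differ between versions.

```python
def accept_templates_enabled(fwaccel_stat_output: str) -> bool:
    """Return True if the 'Accept Templates' line reports 'enabled'."""
    for line in fwaccel_stat_output.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "Accept Templates":
            return value.strip().lower() == "enabled"
    raise ValueError("no 'Accept Templates' line found")


# First three lines of the sample output shown above.
SAMPLE = """Accelerator Status : on
Accept Templates : enabled
Drop Templates : disabled"""

print(accept_templates_enabled(SAMPLE))  # True
```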
Conditions that Prevent Using Templates

There are several conditions that can prevent a template from being created or from being
effective:
- The connections cannot be grouped because the source port is not the only variation. A
template is not created for these connections and the first packet is handled by the firewall
path.
- Traffic which requires NAT does not use a template.
- VPN traffic does not use a template.
- Complex connections (FTP, H323, etc.) do not use a template.
- Non-TCP/UDP traffic does not use a template.
Using Templates with Rules

Some rules in SmartDashboard can prevent a template from being created. All traffic
that matches such a rule is affected, as is traffic that matches any rule below it. In
SmartDashboard, you should place all rules that can use a template at the top of the rule
base (unless this violates other considerations). After you have changed the rule base,
SecureXL automatically creates new templates for grouped connections.

Here are rules that can prevent a template from being created:
- Rules with the following objects:
o Time object
o Port range object
o Dynamic object
- Rules with a service that has a handler (protocol type) enabled.
- Rules with "complex" services (that is, services that have anything specified in the Match
field, or Enable reply from any port, of their Advanced section).
- Rules with RPC/DCOM/DCE-RPC services.
- Rules with client authentication or session authentication.
- When SYN Defender or Small PMTU features are activated.
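The effect of rule order can be sketched with a simplified model of a rule base. The Rule class, the feature labels, and both functions are hypothetical and exist only for this illustration; they model the statement above that a template-blocking rule also affects every rule below it.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical labels for the rule properties listed above.
BLOCKING_FEATURES = {
    "time_object", "port_range_object", "dynamic_object", "service_handler",
    "complex_service", "rpc_service", "client_auth", "session_auth",
}


@dataclass
class Rule:
    name: str
    features: Set[str] = field(default_factory=set)


def blocks_templates(rule: Rule) -> bool:
    return bool(rule.features & BLOCKING_FEATURES)


def rules_that_can_use_templates(rule_base: List[Rule]) -> List[str]:
    """Names of the rules above the first template-blocking rule."""
    usable = []
    for rule in rule_base:
        if blocks_templates(rule):
            break  # this rule and every rule below it are affected
        usable.append(rule.name)
    return usable


def template_friendly_order(rule_base: List[Rule]) -> List[Rule]:
    """Stable reorder: rules that can use templates first.

    Reordering a real rule base can change which rule matches a packet,
    so treat this as a starting point for review, not an automatic fix.
    """
    return sorted(rule_base, key=blocks_templates)


rule_base = [Rule("http"), Rule("backup", {"time_object"}), Rule("dns")]
print(rules_that_can_use_templates(rule_base))   # ['http']
reordered = template_friendly_order(rule_base)
print(rules_that_can_use_templates(reordered))   # ['http', 'dns']
```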
Delayed Notification
A SecureXL device may create a connection that matches a template, and notify the firewall
about the connection only after a period of time. This feature further enhances the
connection rate of the SecureXL device.
- The fwaccel stats command indicates the total number of delayed connections
(delayed TCP conns). Refer to the Managing Non-Accelerated Traffic section for more
information.
- The fwaccel templates command indicates the delay time for each template under
the DLY entry.
On a single gateway, Delayed Notification is enabled by default.
On a ClusterXL gateway, Delayed Notification is disabled by default.
Working with ClusterXL

This section discusses how ClusterXL can have an impact on session rate.
State Synchronization
State Synchronization enables all machines in the cluster to be aware of the connections
passing through each of the other machines. It ensures that if there is a failure in a cluster
member, connections that were handled by the failed machine are maintained by the other
machines. However, State Synchronization has some performance cost and occasionally
under heavy load, sync packets could even be lost.

If you receive the following error message when running dmesg, there may be
connectivity problems:
"FW-1: State synchronization is in risk. Please examine your
synchronization network to avoid further problems !"
These problems are more likely to occur in load sharing configurations and after failover.
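A saved copy of the dmesg output can be scanned for this message. The following is a minimal sketch; the function name and the surrounding sample lines are hypothetical, and only the message text is taken from this guide.

```python
from typing import List

SYNC_AT_RISK = "State synchronization is in risk"


def sync_at_risk_lines(dmesg_output: str) -> List[str]:
    """Return every line of dmesg output that contains the sync warning."""
    return [line for line in dmesg_output.splitlines() if SYNC_AT_RISK in line]


# Hypothetical dmesg excerpt; only the FW-1 line comes from this guide.
SAMPLE = """eth0: link up
FW-1: State synchronization is in risk. Please examine your synchronization network to avoid further problems !
eth1: link up"""

hits = sync_at_risk_lines(SAMPLE)
print(len(hits))  # 1
```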
Sync at Risk
A sync at risk condition occurs when a cluster member is not able to send delta syncs to
another cluster member at the required rate. When this happens, the sending member has
to throw away unacknowledged delta syncs, and the receiving member might therefore
receive partial (inconsistent) information.
A sync at risk condition might result in connectivity problems.
- These problems generally do not occur in High Availability configurations. However, there
may be a problem after failover.
- Connectivity problems are more critical in Load Sharing configurations and especially in
asymmetric routing configurations. Even when there is no asymmetric routing, “global”
information (not per-connection) can be lost and cause connectivity issues.
Resolving a Sync at Risk Condition

You can resolve a sync at risk condition by deciding not to synchronize a particular
service. Do this only if ALL of the following conditions are true:
1. A significant portion of the traffic crossing the cluster uses a particular service. If you
do not synchronize this service, then the amount of synchronization traffic is reduced
and cluster performance is enhanced.
2. The service usually opens short connections, whose loss may not be noticed. DNS
(over UDP) and HTTP are typically responsible for most connections, and generally
have very short life and inherent recoverability at the application level. However,
services which typically open long connections, such as FTP, should always be
synchronized.
3. The configuration ensures bi-directional stickiness for all connections. Such
configurations do not require synchronization to operate (only to maintain High
Availability). They include:
o Any cluster in High Availability mode (for example, ClusterXL New HA or
Nokia VRRP).
o ClusterXL in a Load Sharing mode with clear connections (no VPN or static
NAT).
o OPSEC clusters that guarantee full stickiness (refer to the OPSEC cluster's
documentation).

Delayed Synchronization and ClusterXL
In a ClusterXL configuration, the SecureXL Delayed Synchronization feature is disabled by
default. You may want to enable Delayed Synchronization to improve session rate.
When a connection is being delayed, the other cluster members are not immediately notified.
Thus, this connection is not synchronized to the other members. Delayed Synchronization
can significantly reduce the amount of synchronization traffic and improve performance.
However, if there is a failover, these connections would be terminated and connectivity
would be lost. You should consider the relative advantages and disadvantages of enabling
Delayed Synchronization.
To enable Delayed Synchronization from SmartDashboard:
1. From the Service tab, double-click on the desired service. The Service Properties
window opens.
2. Click Advanced…. The Advanced Service Properties window opens.
3. Select the Start Synchronizing checkbox.
4. Click OK.
Improving NAT Session Rate

You can disable SecureXL to improve the NAT session rate.
To improve NAT session rate:
1. Disable SecureXL. However, this also significantly lowers overall packet rate,
throughput, and IPS performance.
2. Do one of the following:
- Decrease the TCP end timeout to 2 seconds.
Refer to the Concurrent Connections section for more information on
decreasing the TCP end timeout.
Or
- Increase the dispatcher connection table hash size by editing
$FWDIR/modules/fwkern.conf to include fwmultik_gconn_tab_hsize=8388608
and then rebooting the machine. However, this change reduces the maximum
number of concurrent connections.
References
CP R70 Firewall Administration Guide
CP R70 PerformancePack Administration Guide
CP R70 ClusterXL Administration Guide
CP R70 IPS Administration Guide

©2009 Check Point Software Technologies Ltd. All rights reserved. 1

Classification: [Unrestricted]—For everyone
