Michael Endrizzi
Director of Services and Training
mendrizzi@midpointtech.com
• Overview
• Linux Review
• SecureXL
• CoreXL
• Balancing CoreXL Tips
Linux 2.6.18-XXcp: all based on Linux
[Figure: balancing overview. User space: SmartDashboard (edit policy), FWD (logging), syslogd, VS1, VS2, Unix ps. Kernel space: Linux kernel, Linux TCP/IP, NICs. Numbered tuning points:]
0) Rule Processing: SecureXL speedup
1) Interface Affinity: SND (Secure Network Distributor)
2) Fw kernel Instance Affinity
3) CP Process Affinity
4) Linux Process Affinity
© 2013 Midpoint Technology, Inc. 952-837-6206 – sales@midpointtech.com
1/24/2015 Proprietary and Confidential: No part of this document may be reproduced without permission from
Midpoint Technology, Inc
NGF Components
Kernel Space
SPLAT/GAIA Kernel (fwk)
(Security Enforcement)
cpd
• Note: fwk was moved to user mode
fwk
• With large number of VS’s, kernel was
getting too big
fwd
vpnd
Eth0: 10.2.1.101/24
Eth0 : 172.17.1.111/24
Eth3: 172.17.2.2/24
Eth0:1: 10.2.2.101/24 Eth2: 10.2.2.253/24
Eth0:2 : 172.17.2.111/24
Eth0: 10.2.1.153/24
Eth1:1 : 172.17.0.1/24
VB: internal/internet
Eth0:1 : 10.2.0.153/24
VB: Host/Host#1
Balancing Check Point Systems
• Overview
• Linux Review
• SecureXL
• CoreXL
• Balancing CoreXL Tips
• Linux Overview
• Threads
• Network Processing
Monolithic Kernel
Kernel Mod Mod
Mod ule
Kernel ule ule
Kernel
• Linux
• Linus Torvalds – still heavily involved, vs. design by committee vs. free-for-all
• Pre-emptive kernel – Most kernel tasks can be pre-empted for higher
priority tasks
• Modular – Kernel functions can be dynamically created/removed
Check Point implements firewall subsystem in these modules
• Multi-processor support
• Threads = Processes (Unique to Linux – will explain later)
• Users can see internal kernel data in sysfs file system. Looking glass
into kernel internals
Introduced to reduce the time a kernel task held on to the processor locking
out other tasks. Overall increase in efficiency and multi-processing support.
http://www.amazon.com/Linux-Kernel-Development-3rd-Edition/dp/0672329468
Linux Kernel Basics
• Linux Overview
• Threads
• Network Processing
• CPU Instructions
• Data
• Security attributes
Are you chasing a CPU or memory problem? Need to know how lack of
memory will slow a system and make it feel like it is a CPU problem.
‘top’->’f’->’u’
If you ever want to figure out why a process is swapping, you need
to be able to know what parts of a process are taking too much space.
Virtual
Size
Physical
Memory
Parameters
GROW
Program Counter
Local Variables
To 0 memory
Stack Code
Length: 1000 bytes
Length: 256 bytes
301 LOGIN:
+ username +
Packets
Mail Fragmented
Program Running as Super-User
Mike
UserName[256] Password
Environment
To High Memory
+
Virtual memory is 3 things:
1) Allows processes to think they own all memory
2) Allows processes to ignore physical memory limitations
3) Paging system: uses disk to swap out sleeping data to a temp storage area, to free up
physical memory for active processes.
10GB disk swap space
Assuming all processes needed 3GB of user space, we would need at a minimum
12GB (3GB+3GB+3GB+3GB) user + 1GB kernel = 13GB
of physical and swap space to hold all these programs.
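The sizing arithmetic can be checked with a few lines. This is only a minimal sketch: the function name is made up for illustration, and the four equal 3GB processes are read off the 3GB+3GB+3GB+3GB sum in the slide.

```python
def required_memory_gb(process_user_gb, kernel_gb):
    """Total physical + swap space needed to hold every process plus the kernel."""
    return sum(process_user_gb) + kernel_gb

# Four processes of 3 GB user space each, plus 1 GB of kernel (figures from the slide).
total_gb = required_memory_gb([3, 3, 3, 3], kernel_gb=1)
print(total_gb)  # 13
```

Anything above the installed physical memory has to live in swap, which is where the paging slowdown comes from.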
User Level
• CPU Instructions • CPU Instructions
• Data • Data
• Security attributes • Security attributes
Kernel Level
User Level
+
• Kernel Data Structures
• Superuser privileges
http://www.amazon.com/Linux-Kernel-Development-3rd-Edition/dp/0672329468
Kernel
Process A Process B
Kernel
Process A Process B
Kernel
OS sees 4
‘logical’
processors
Logical
Processors
Threads
Cores
Physical
CPU http://blogs.msdn.com/b/gauravseth/archive/2006/03/20/555519.aspx
What are CPU Threads
Enabling Symmetric Multi-Threading (SMT) or HyperThreading (HT) doubles the number of logical processors.
• Works just like Linux thread processing where a process (web server) has 2 threads (2 clients requesting
pages) and the kernel can preemptively multi-task the two threads so it seems like they are parallel
processing.
• Without HT: each Linux thread gets a time-slice from the Linux kernel, but only 1 thread runs at a time.
• With HT: at the hardware level there is a mini Linux-like kernel that can multiplex 2 threads concurrently.
So the two threads could conceptually start and finish within 1 kernel time slice instead of 2 separate time
slices.
• Performance improvement: roughly 30% (unverified) on CPU-intensive work. I/O-intensive work could theoretically be slower.
• The two threads share a cache
With Hyperthreading, Managed by HW
TIME
http://blogs.msdn.com/b/gauravseth/archive/2006/03/20/555519.aspx
OS sees 8
HP DL380p – 8 Logical Processors ‘logical’
processors
No Hyper
Logical
Threading Processors
Threads
Cache
Cores
Physical
CPU
Example: CheckPoint 12600
12 core
Process A Process B
Threads were created to support concurrent processing using SHARED data and NOT copying
data between processes or resource heavy context switching.
SHARED DATA
• Virtual Memory Space
• CPU Instructions
• Processor state
• Kernel threads
• Security attributes
• File system
Thread C
• Signals
Thread B
Thread A
Thread context switches are very fast because the kernel only has to save a few
registers (TCB) (approx. 96 bytes vs. approx. 1K+ for a heavy process switch). This is
because all the non-saved data is shared between the threads, so it is live data and there is no
need to save and restore it. In addition, threads can see into each other's address space,
because everything in the process is shared.
Thread A Thread B
• Thread registers Mini- Context Switch
• Program counter
• Stack pointer
Kernel
SHARED DATA
• Virtual Memory Space
• CPU Instructions
• Processor state
• Kernel threads
• Security attributes
• File system
Thread C
• Signals
Thread B
Thread A
Thread B Thread C
Which CPU?
Thread A
Which CPU?
You can watch threads switching between processors to see which processor is busy
So a Linux process is like a beehive that keeps all the common data/honey. The threads
are like the bees that go off and do their work and bring back data/honey where it is all
shared between them.
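The beehive picture can be tried out directly. The sketch below is an illustration only: the names `hive` and `bee` are invented, and it uses Python's standard `threading` module. CPython threads are native OS threads, but the interpreter lock means this shows data sharing, not a parallel speed-up.

```python
import threading

hive = []                      # shared by every thread in the process
hive_lock = threading.Lock()   # protects the shared list

def bee(name, trips):
    """Each worker thread adds its results directly to the shared list."""
    for trip in range(trips):
        with hive_lock:
            hive.append((name, trip))

bees = [threading.Thread(target=bee, args=(f"bee-{i}", 100)) for i in range(3)]
for t in bees:
    t.start()
for t in bees:
    t.join()

print(len(hive))  # 300: every thread wrote into the same list, nothing was copied
```

With separate processes instead of threads, each worker would get its own copy of `hive` and the results would have to be sent back explicitly.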
• Linux Overview
• Threads
• Network Processing
Interrupt Handler
Kernel
Poll
Hey WAKE UP
A lower-priority interrupt handler (e.g. the network card's) disables its own interrupt line
so that it is not interrupted again from the same source and can run uninterrupted
(unless a higher-priority interrupt comes along).
Kernel
LOW PRIORITY
HIGH PRIORITY
System Clock
APICs are the hardware that manages interrupts. A motherboard has an I/O APIC that
interfaces with the hardware and talks to the LOCAL APIC controllers embedded within the
CPU.
In SMP environments, where IRQs can be handled by multiple CPUs, APICs can be
dynamically programmed by the kernel to direct IRQs to a specific CPU, balancing out
the handling of IRQs from external devices.
http://www.amazon.com/Linux-Kernel-Development-Robert-Love/dp/0672329468/ref=sr_sp-atf_title_1_1?s=books&ie=UTF8&qid=1389542475&sr=1-1&keywords=linux+kernel+development Page: 3322
Interrupts are not Process/Threads
This is why you can’t see interrupts in a ‘ps’ or a ‘top’, different data structure than
processes/threads. Very much like them but CANNOT SLEEP/BLOCK!
User Process
Kernel
Driver Transmit Functions Interrupt Handler
Device Driver
User Process
Kernel
DMA Interface
Host
Initiate Data DMA chip
Transfer!!!
eth0
TCP/IP Stack
DMA
Space
http://www.ece.rice.edu/~willmann/teng_nics_overview.html#overview
http://www.amazon.com/Essential-Device-Drivers-Sreekrishnan-Venkateswaran/dp/0132396556 Page 505
http://www.linuxfoundation.org/collaborate/workgroups/networking/kernel_flow
Monitor # of Interrupts per Device
/proc/interrupts keeps track of the # of hw interrupts per interface, per CPU, since boot.
Linux will use CPU 0 as the default for network cards until the system gets busy, then it tries to
rebalance between CPUs (see eth0).
Linux will start with CPU 0 handling all interrupts. (Open question: does Linux auto-balance?)
Interrupt affinity is used by Check Point CoreXL, as we will see in the next section.
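To see which CPU is servicing each interface, the counters in /proc/interrupts can be parsed with a short script. This is a minimal sketch: the function names are made up, and the sample text is invented data that only mimics the layout of the file on a two-CPU box.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {irq_label: (per_cpu_counts, description)}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    cpu_count = len(lines[0].split())          # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:cpu_count]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        table[label.strip()] = (counts, description)
    return table

def busiest_cpu(counts):
    """Index of the CPU that has handled the most interrupts for one IRQ line."""
    return max(range(len(counts)), key=counts.__getitem__)

# Invented sample, shaped like /proc/interrupts output on a two-CPU box.
SAMPLE = """\
           CPU0       CPU1
  0:        120          0   IO-APIC-edge      timer
 32:     900000       1500   PCI-MSI-edge      eth0
 33:       2000     450000   PCI-MSI-edge      eth1
"""

for irq, (counts, description) in parse_interrupts(SAMPLE).items():
    print(irq, description, "-> CPU", busiest_cpu(counts))
```

On a live gateway the same functions could be fed `open("/proc/interrupts").read()`. Because the counters are cumulative since boot, two readings taken a few seconds apart give a better picture of the current load.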
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-cpu-irq.html
IRQ 32
CPU 1
http://en.wikipedia.org/wiki/Check_Point_VPN-1
Linux Tidbits:
GPL (GNU General Public License):
• Custom mods to Linux itself have to be shared with the Linux
community as source code. There is a version 2.6.18cp (status not known)
• Code can be compiled with the GNU compiler while keeping the source
private. CP uses internal kernel modules to do this.
• Overview
• Linux Review
• SecureXL
• CoreXL
• Balancing CoreXL Tips
Symptom Verify
https://downloads.checkpoint.com/fileserver/SOURCE/direct/ID/7513/FILE/CoreXL_Advanced_Configuration_Guide.pdf
User mode
fwk
Kernel mode
Firewall Dispatcher(fwkdrvr)
SND Performance Pack Packet Handler
(SecureXL acceleration)
When Check Point moved the fw kernel from the Linux kernel to User Mode, they
left only a little bit of code to work with the firewall dispatcher in place. Other than
that it was a clean compile of the User mode kernel…This was not a massive rewrite
User mode
Kernel mode
Firewall Dispatcher (fwkdrvr)
SND
VS0 SecureXL VS1 SecureXL VS2 SecureXL
SecureXL acceleration
1) Subsequent packets from a
single connection
2) Subsequent packets from
the same source IP, same dest
IP and same dest port
(multiple HTTP requests to
same dest)
Packet Journey Thru SecureXL
[Figure: packet path from the NIC through the Linux kernel to Linux user space. Recoverable labels:]
• Hardware interrupt: eth0 needs service. Can’t be interrupted (stops whole processor, single thread).
• Device driver (eth0 interface): hogs the processor; just transfers data.
• Big job! Schedule this as a software interrupt. Software interrupts can be interrupted, and concurrent processing of SIs is possible, unlike HW interrupts.
• SND picks the instance to process the packet.
• Can’t accelerate: send on to a specific FW instance (F2F).
• FW Instance 1, FW Instance 2, FW Instance 3 (Core 3)
• Outbound to another VS
• Shared Memory
SecureXL Demo - VSX
[Screenshot callouts:]
• State table
• Accept connections
• DOS drops
• NAT in SecureXL
• Total acceleration
• “C”: current counts
• # from accept templates
• NAT performed by SecureXL
• PXL: PSL + SecureXL
• IPS packets
• Connections sent to firewall, NOT XL / slow path
• If you have any random issues, immediately turn off SecureXL to determine
if there is a difference
• Using ‘top’ to monitor performance, turn SecureXL on/off and see what %SI is
doing
• Might have to distribute SecureXL across multiple cores if %SI is busy and doesn’t
autobalance. See next section.
• Monitor stats to make sure both state table and connection templates are being used
• Move HTTP 1.0 type protocols to the top of the rulebase so they get hit
• Avoid protocols that disable connection template acceleration (more on this at end)
• Overview
• Linux Review
• SecureXL
• CoreXL
• Balancing CoreXL Tips
3) CP Process
VS1 VS2
2) Fw kernel 4) Linux
Affinity Instance
Process
Affinity
FWD syslogd
logging
1) Interface SND
Secure
Affinity
Network
Distributor
• Interface Affinity
• Instance Affinity
• Process Affinity
• Linux Process Affinity
• Interface affinity is a grey zone. It could be included in both SecureXL and CoreXL
• Interface affinity can be used with a SecureXL license and no CoreXL license
• Interface affinity can also be used without a CoreXL or SecureXL license;
it is a Linux function
• Here, interface affinity is grouped with CoreXL for completeness and
topic flow
SND is responsible for managing interfaces assigned to that core. If there are multiple
CPUs handling different interfaces, then each CPU has a different SND.
MQ enables assigning multiple interrupts per interface. Certain interface cards have
multiple TxRx queues per interface. Src/Dst flows are tied to a queue. Then queues
are assigned IRQs and tied to specific processors. This technique optimizes CPU
cache utilization.
https://greenhost.nl/2013/04/10/multi-queue-network-interfaces-with-smp-on-linux/
IRQ 2 IRQ 3
IRQ 1 IRQ 1
• R77.10
• Only on appliances…needs the right hardware and drivers
• Supports increased throughput, not so much increased number of sessions
• Based on src/dst assigned to a CPU. So a single high throughput src/dst will
only use 1 CPU and not take advantage of multiple CPUs.
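Why a single busy src/dst pair stays on one CPU can be shown with a toy model. This is a sketch under stated assumptions: real multi-queue NICs compute the flow hash in hardware, and CRC32 is used here only as a stable stand-in. The queue-to-CPU mapping is hypothetical; the addresses are taken from the interface labels earlier in the deck.

```python
import zlib

def queue_for_flow(src_ip, dst_ip, num_queues):
    """Map a src/dst pair to one receive queue; the same pair always gets the same queue."""
    key = f"{src_ip}->{dst_ip}".encode()
    return zlib.crc32(key) % num_queues

# Hypothetical layout: four queues, each queue's IRQ pinned to one CPU.
queue_to_cpu = {0: 0, 1: 1, 2: 2, 3: 3}

flows = [("10.2.1.101", "172.17.1.111"), ("10.2.1.153", "172.17.2.2")]
for src, dst in flows:
    q = queue_for_flow(src, dst, num_queues=4)
    print(src, dst, "queue", q, "cpu", queue_to_cpu[q])
```

However many packets a given pair sends, it always hashes to the same queue and therefore the same CPU. Multi-queue only spreads load when there are many distinct flows.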
In this simple environment SND has ability to concurrently run on any core (that isn’t running
a fw instance), but by Linux default it chooses to run on CPU 0. (Probably not good because
all interfaces and processes will use CPU 0).
So if you have interfaces that are dropping packets, you might want to check this if CPU 0
is busy. POINT: Even though the configuration seems balanced, you need to verify!
NOTE: Interface settings will survive reboot (BUT not CoreXL settings (next)).
SND SND
When an interface is set to All, it will attempt to use a CPU that is NOT
being used by a firewall instance. But it will try to use a CPU that is being
used by another interface, in order to keep the local CPU cache fresh.
So under low CPU usage, most interfaces will default to ALL (below) and
be autobalanced as CPU and interface activity picks up. The default for ALL
is CPU 0, until CPU activity picks up.
Every 60 seconds, CoreXL examines the CPUs to see if they are busy.
If they are busy it will rebalance interfaces to non-busy CPUs.
(fwkernel and Linux processes rebalance every 1-2 seconds).
(NOTE: I do NOT know how to set back to autobalance once you hard set the interfaces,
except by reboot on NGF or factory defaults on VSX)
Below the fw kernel saw the CPU going to 80 si% and rebalanced the interfaces
from ‘all’ to ‘eth1:1’, gave eth1 its own CPU
Before
After
This tells us that the CPUs have not become busy since reboot and the fw kernel
has not done any rebalancing. If the ifconfig shows packet drops, then you have
different issues than CPU not being able to handle the load.
$FWDIR/conf
• Interface Affinity
• Instance Affinity
• Process Affinity
• Linux Process Affinity
VS1 VS2
2) Firewall
Instances
FWD syslogd
logging
SND
Secure
Network
Distributor
• Note the difference between HT, Dual Core and Dual Processor. Note where the caches are! Remember this
when you assign interfaces so you keep the cache hot
• Works just like Linux thread processing where a process (web server) has 2 threads (2 clients requesting
pages) and the kernel can preemptively multi-task the two threads so it seems like they are parallel
processing.
• Without HT: each Linux thread gets a time-slice from the Linux kernel, but only 1 thread runs at a time.
• With HT: at the hardware level there is a mini Linux-like kernel that can multiplex 2 threads concurrently.
So the two threads could conceptually start and finish within 1 kernel time slice instead of 2 separate time
slices.
• Performance improvement: roughly 30% (unverified)
Without Hyperthreading, Managed by Kernel
Thread 1 Thread 2 Thread 1 Thread 2
With Hyperthreading, Managed by HW
http://blogs.msdn.com/b/gauravseth/archive/2006/03/20/555519.aspx
TIME
Implemented in R77+
Restrictions:
• Only enhances performance of IPS/AB/AV CPU intensive functions and
NOT I/O operations. Too many interrupts may actually slow it down.
• Supported only on R77+ GAIA
• Only on Check Point Appliances
• Has to be enabled in the BIOS
• Does not work with large number of HIDE NAT connections. Each CPU has
pre-allocated # of HIDE NAT slots. If one CPU uses all its HIDE NAT slots
then it can’t handle new HIDE NAT connections.
In NGF there is only 1 firewall. With CoreXL, the kernel will replicate itself X times,
depending on how many firewall instances you setup. Each instance will parallel process
network traffic with the SAME shared rulebase and state tables.
Each instance has an ‘affinity’ for a specific processor.
Here you can see the firewalls are individual KERNEL threads inside the kernel with a parent
of PID 1 – ‘init’. Kernel threads usually come in thread groups of size 1, unlike user space.
VSX VS’s are different from NGF. Each VS has 1 firewall instance that is executed by 1
corresponding Linux OS user-mode process (a simplification, but close enough to make the point).
VS VS VS VS
VS instances are implemented with Linux threads (user mode but mapped onto
kernel threads so they can be scheduled by the kernel)
[Figure: fw kernel instances and VS instances compared as processes and threads. Listed attributes:]
• Virtual Memory Space
• CPU Instructions
• Processor state
• Kernel threads
• Security attributes
1= CoreXL OFF
2+ = CoreXL ON
pstree -p
Note that the default assignment is probably adequate for 90% of cases, unless you have a lot of busy network
interfaces, helper processes, or Linux processes that interfere with the firewall instances. For example, if eth0 used 100% of
CPU0, then you might want to move the firewall instance.
From sk98348
# of Cores    # of FW Instances    # of SNDs
1             1                    0 (CoreXL disabled)
2             2                    2
4             3                    1 (reserved for SND)
6-20          # cores - 2          2
21-30         # cores - 4          4
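The default split in the table can be written as a small lookup. This is a minimal sketch: the function name is invented, it encodes only the rows shown, and core counts the table does not list (3, 5, or more than 30) are rejected rather than guessed.

```python
def default_corexl_split(cores):
    """Return (firewall_instances, snd_cores) for the core counts listed in the table."""
    if cores == 1:
        return 1, 0          # CoreXL disabled
    if cores == 2:
        return 2, 2
    if cores == 4:
        return 3, 1
    if 6 <= cores <= 20:
        return cores - 2, 2
    if 21 <= cores <= 30:
        return cores - 4, 4
    raise ValueError(f"{cores} cores is not covered by the table")

print(default_corexl_split(12))  # (10, 2)
```

For the 12-core example used earlier, the table gives 10 firewall instances and 2 SND cores.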
NGF: Assign Affinity
When you re-assign an instance to a CPU, you are telling the instance to only
use THAT CPU…when the CPU is free. So it is a double-edged sword:
1. GOOD: Guarantee the cache will be always hot on that CPU
2. BAD: What if that CPU is busy with other assigned processes…Has to wait till
end of the other process timeslice to get CPU time. You could up its priority.
So make sure you choose a CPU that is NOT assigned to any other process if
possible.
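Underneath, pinning is the ordinary Linux CPU-affinity mechanism. The sketch below uses Python's `os.sched_setaffinity`, which is available on Linux only, to pin the current process to one CPU and then restore it. It illustrates the generic Linux call, not the Check Point affinity command, and the helper name `pin_to_cpu` is made up.

```python
import os

def pin_to_cpu(cpu, pid=0):
    """Restrict a process (0 = the caller) to a single CPU; return the previous CPU set."""
    previous = os.sched_getaffinity(pid)
    os.sched_setaffinity(pid, {cpu})
    return previous

allowed = os.sched_getaffinity(0)      # CPUs this process may currently use
target = min(allowed)                  # pick one that is guaranteed to be allowed
previous = pin_to_cpu(target)
print(os.sched_getaffinity(0))         # now only {target}
os.sched_setaffinity(0, previous)      # restore the original set
```

While pinned, the process competes only for that one CPU. That is the trade-off described above: a hot cache, but a wait if the CPU is busy.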
2.2.2.2 1.1.1.1
CORE 0
fwk
VS
fwd instance
vpnd
VS0
Impact of Setting Affinity
The SRC column shows at what level the affinity command for the process
was issued. ‘V’ means the command was issued to ALL the components
of a VS. ‘P’ means only to the firewall instances. ‘I’ means a single fw instance.
Here are the affinity configurations that the VSs use to set their affinities. As
you set affinities at the different levels, these files will begin to appear.
This is how a VS instance knows what affinity to use. If there is no config file
at the I instance level, then it goes up to the P Process level config file, etc.
The “I” in the SRC column means the affinity config comes from the ‘I’
config file
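The fall-through from the instance level up to the VS level can be sketched as a simple lookup. Everything here is hypothetical: each config file is represented as a list of CPUs, or `None` when the file does not exist. The real file names and formats are not shown in the slides.

```python
def resolve_affinity(instance_cfg, process_cfg, vs_cfg):
    """Return (source_level, cpus) from the most specific level that has a setting."""
    for level, cfg in (("I", instance_cfg), ("P", process_cfg), ("V", vs_cfg)):
        if cfg is not None:
            return level, cfg
    return None, None

print(resolve_affinity(None, [2, 3], [0, 1, 2, 3]))  # ('P', [2, 3])
```

The first element of the result plays the role of the SRC column: it records which level supplied the affinity that is actually in effect.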
SETTING AFFINITY (not CoreXL Firewall Instance count) must be done manually in both members
and does not impact cluster status
https://sc1.checkpoint.com/documents/R76/CP_R76_ClusterXL_AdminGuide/7298.htm
Cluster XL Hints
[Figure: Linux process / firewall instance pairs]
top:
F     Sort
  j   Processor
  u   Number of page faults
f     Add columns
  j   Processor
  u   Number of page faults
1     Show all processors
i,z   Just show running processes
c     Command vs process name
ps:
-e    All processes
-L    All threads
psr   Processor id
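The per-thread processor id can also be read straight from /proc, which is handy in scripts. This is a sketch, assuming a Linux /proc filesystem. The function name is invented, and it reads field 39 ("processor") of each thread's stat file, as documented in proc(5).

```python
import os

def thread_processors(pid="self"):
    """Return {thread_id: cpu_last_run_on} by reading /proc/<pid>/task/*/stat."""
    result = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        with open(f"{task_dir}/{tid}/stat") as stat_file:
            stat = stat_file.read()
        # The command name sits in parentheses and may contain spaces,
        # so split only what follows the last ')'.
        fields = stat.rsplit(")", 1)[1].split()
        result[int(tid)] = int(fields[36])   # field 39 of stat: processor
    return result

print(thread_processors())
```

Calling it repeatedly for a firewall process shows threads moving between CPUs, or staying put once affinity has been set.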
• Interface Affinity
• Instance Affinity
• Process Affinity
• Linux Process Affinity
3) CP Process
VS1 VS2 4) Linux
Affinity Process
FWD syslogd
logging
SND
Secure
Network
Distributor
NOTE: /opt/CPsuite-R75.40VS/fw1/conf/vsaffinity_exception.conf is a list of LINUX processes that are not impacted by
the affinity command. You have to edit this list to modify LINUX process affinity.
FWD
Logging/HA
• Overview
• Linux Review
• SecureXL
• CoreXL
• Balancing CoreXL Tips
Symptom Verify
eth0 to eth3
The following traffic is neither throughput (state tables) nor connection-rate (accept
templates) accelerated by SecureXL:
• Traffic types other than TCP, UDP, PIM, GRE, ESP
• First packets of any new TCP session, unless a "template" exists
• First packet in a UDP session
• Traffic matching certain Firewall rules:
  - rules with a service that uses a resource
  - rules for dropping or rejecting traffic
  - rules where the source or destination is the gateway itself
  - rules with a Security Server
  - rules with user authentication
  - rules with session authentication
The following traffic is not connection-rate (accept templates) accelerated by SecureXL, and it will stop
template building in the rulebase from the point where it is found:
• Non-TCP/UDP connections such as PIM, GRE, ESP, ICMP
• Protocols that are not connection intensive such as SMTP, FTP, RPC, NFS, NNTP, NTP
• Complex connections such as IPSec VPN, FTP, H.323, etc.
• Traffic in an environment using NAT (for security, NAT addresses can change and can be shared)
Check SecureXL and Rulebase
ICMP prevents SecureXL accept connection templates from accelerating HTTP 1.0-type
connections. ICMP rules need to move to the end of the rulebase.
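The effect of rule order can be illustrated with a toy rulebase check. This is a simplified model, not Check Point's actual logic: the blocker set covers only the protocol names listed on the previous slide, the rulebases are hypothetical, and rule matching is reduced to a service name.

```python
# Services that, per the list above, stop accept-template building for rules below them.
TEMPLATE_BLOCKERS = {"icmp", "pim", "gre", "esp", "smtp", "ftp", "rpc", "nfs", "nntp", "ntp"}

def rules_without_templates(rulebase):
    """Return the rules at or below the first template-blocking rule."""
    for position, (name, service) in enumerate(rulebase):
        if service.lower() in TEMPLATE_BLOCKERS:
            return [rule_name for rule_name, _ in rulebase[position:]]
    return []

# Hypothetical rulebases, top to bottom.
bad_order = [("ping", "icmp"), ("web", "http"), ("dns", "dns")]
good_order = [("web", "http"), ("dns", "dns"), ("ping", "icmp")]

print(rules_without_templates(bad_order))   # ['ping', 'web', 'dns']
print(rules_without_templates(good_order))  # ['ping']
```

With the ICMP rule at the top, the HTTP rule loses its templates. With ICMP at the end, only the ICMP rule itself is affected.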
All interfaces
on eth0!
Busy FWD
Busy core 0
Router
Router
11 1 1
VSX VSX
Traffic states cached Chassis Chassis
locally and SecureXL will
not send it through user
mode kernel.
CoreXL will cache states in
processor CPU dedicated
to those interfaces
SND SND
VLAN
VLAN
VLAN
VLAN
Core 0 Core 1
Tuning Tips
? ?
Using ‘top’ watch the running processes by typing ‘i’. This will only list the running
processes. Then use ‘f’ and ‘u’ for listing page faults.
Below is a real live MLM that is faulting heavily on one of the logging daemons. You can
have a fast CPU and add more CPUs, but fwd will spend most of its time swapping pages.
If you have a lot of active processes that are NOT faulting, then you have 4
options:
1. More CPUs
2. Faster CPUs
3. Set affinity on the busiest processes so they are using their hot caches
and not spending time flushing caches
4. Set the priority (nice value) of the busiest processes to -15 so that they get a bigger chunk
of the timeslice
Goal is to allocate evenly. Obviously this is only a guess, but you’d have
to evaluate your system with the following commands to have more accurate
measurement
Symptom Verify
fw_4 10(cache 1)
fw_5 11 (cache 1)
Sync 0 All 0 4 7