Seminar: Linux Performance Tuning Knobs

Xuekun Hu PRC Scalability Lab

SSG/PRC Scalability Lab

Intel Confidential Slide 1

My experience of building Linux knowledge:
- man pages
- src/Document/*
- Google
- Ask people
- Summarize and write down


Agenda
- Know your system and workloads
- Basic performance tuning knobs
  - System tuning knobs for limitations
  - File system tuning
  - TCP/IP stack tuning
  - Memory system tuning
- Small workloads


Know your system and workloads
- /proc/cpuinfo : CPU info
- /proc/meminfo : memory info
- dmesg : system info
- fdisk -l : disk info
- sar : OS, network, and CPU related counters
    sar -bBcrwqWuR -P ALL -n DEV [interval] [count]
- iostat : disk related counters
    iostat -d -k -t -x /dev/sd* [interval]
- vmstat : virtual memory
    vmstat -n [interval]
- netstat : network statistics
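A minimal wrapper can sample these collectors together. The script below is a sketch: the interval, count, and /tmp/perfdata output directory are illustrative choices, and each tool is skipped if it is not installed.

```shell
#!/bin/sh
# collect.sh - sample the counters above in one synchronized pass.
# INTERVAL/COUNT and the output directory are illustrative, not prescriptive.
INTERVAL=1
COUNT=2
OUT=/tmp/perfdata
mkdir -p "$OUT"

# Launch each collector in the background so the samples line up in time;
# skip any tool that is not present on this system.
command -v sar    >/dev/null && sar -bBcrwqWuR -P ALL -n DEV "$INTERVAL" "$COUNT" > "$OUT/sar.txt" &
command -v iostat >/dev/null && iostat -d -k -t -x "$INTERVAL" "$COUNT" > "$OUT/iostat.txt" &
command -v vmstat >/dev/null && vmstat -n "$INTERVAL" "$COUNT" > "$OUT/vmstat.txt" &
wait
echo "collection finished: $OUT"
```

In a real run the interval and count would be raised (e.g. 5-second samples for the length of the benchmark) so the data covers the whole measurement phase.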



Most Useful Counters - Memory

pswpin/s
  What it counts: O/S swapping
  When to be concerned: values > 200, or significantly higher than baseline
  What it could mean: need more memory

kbmemfree
  What it counts: last observed amount of memory available, in KB
  When to be concerned: below 500 MB or above 1 GB
  What it could mean: if below 500 MB, might need to add memory; if above
  1 GB, the app might be able to utilize more memory
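Both counters can also be read directly from /proc between sar samples. A Linux-only sketch (note that /proc/vmstat's pswpin is a cumulative count, unlike sar's per-second rate):

```shell
#!/bin/sh
# Read free memory and the cumulative swap-in counter straight from /proc.
# On a non-Linux system the files are absent and "unavailable" is printed.
free_kb=$(awk '$1 == "MemFree:" {print $2}' /proc/meminfo 2>/dev/null)
swapins=$(awk '$1 == "pswpin"   {print $2}' /proc/vmstat 2>/dev/null)
echo "kbmemfree: ${free_kb:-unavailable} KB"
echo "pswpin (cumulative pages swapped in): ${swapins:-unavailable}"
```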


Most Useful Counters - Network Interface

rxbyt/s, txbyt/s
  What it counts: measure of bandwidth
  When to be concerned: confirm whether the network is bottlenecked
  What it could mean: if > 100 MB/s, the system is approaching GbE network
  saturation

rxdrop/s, txdrop/s
  What it counts: packets discarded / network errors
  When to be concerned: values above 0
  What it could mean: packets dropped per second because of a lack of space
  in Linux buffers

NIC interrupts/s
  What it counts: NIC interrupts per second
  When to be concerned: values above 10,000
  What it could mean: the NIC interrupt rate is too high; set the interrupt
  moderation rate


Most Useful Counters - Physical Disk

await
  What it counts: latency of the disk subsystem
  When to be concerned: counter value is significantly higher than baseline
  What it could mean: underperforming disk(s); storage subsystem not
  optimally configured

avgqu-sz
  What it counts: number of disk reads/writes waiting to be issued
  When to be concerned: greater than 2, or significantly greater than
  baseline
  What it could mean: need to add more storage


Most Useful Counters - Processor

1 - %idle
  What it counts: CPU utilization
  When to be concerned: increases from a baseline (e.g. more CPUs, higher
  frequency) without an increase in performance; less than 99% if
  benchmarking
  What it could mean: inefficient use of the CPU(s); workload needs to be
  optimized

%system
  What it counts: time the CPU spends in kernel mode (processing interrupts)
  When to be concerned: significantly higher than baseline, or higher
  than 40%
  What it could mean: increase in I/O (possibly inefficient); malfunctioning
  device

%user
  What it counts: time the CPU spends in user mode (running applications)
  When to be concerned: --
  What it could mean: decrease in I/O


Most Useful Counters - System

Context switches/sec
  What it counts: how often the CPU switches from user to kernel mode, or
  between threads
  When to be concerned: above 10,000
  What it could mean: too many interrupts (inefficient I/O, malfunctioning
  device); too many threads competing for resources
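A quick way to estimate this rate without sar is to difference the cumulative ctxt counter in /proc/stat; a Linux-only sketch:

```shell
#!/bin/sh
# Context switches per second, computed from two /proc/stat samples 1s apart.
a=$(awk '$1 == "ctxt" {print $2}' /proc/stat 2>/dev/null)
if [ -n "$a" ]; then
    sleep 1
    b=$(awk '$1 == "ctxt" {print $2}' /proc/stat)
    rate=$((b - a))
else
    rate=unavailable   # /proc/stat is Linux-only
fi
echo "context switches/sec: $rate"
```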


Tuning Knobs


System limits

# ulimit -n
110490
  Max number of file descriptors allowed per process.
  Raise it to fix "unable to open file descriptor" errors.

/proc/sys/fs/file-max = 254108
  Max number of file descriptors allowed system-wide.

/proc/sys/kernel/pid_max = 65536
  Max number of processes/threads.

/proc/sys/vm/max_map_count = 131072
  Max number of memory map areas a process may have.
  Increase it when you want to create more than 32k threads in one process.

/proc/sys/kernel/shmmax = 268435456
/proc/sys/kernel/shmall = 4194304
  Matter when an application (like Oracle or DB2) uses the Sys V shared
  memory system calls (shm*).

# ulimit -a   (lists all per-process limits)
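The current values can be inspected in one pass; a small sketch (values differ per system, and the /proc/sys entries exist only on Linux):

```shell
#!/bin/sh
# Print the per-process fd limit and the system-wide knobs discussed above.
echo "per-process fd limit (ulimit -n): $(ulimit -n)"
for knob in fs/file-max kernel/pid_max vm/max_map_count \
            kernel/shmmax kernel/shmall; do
    # "n/a" is printed where the /proc entry does not exist.
    printf '%s = %s\n' "$knob" "$(cat /proc/sys/$knob 2>/dev/null || echo n/a)"
done
```

Persistent changes typically go in /etc/security/limits.conf for the ulimit values and /etc/sysctl.conf for the /proc/sys knobs.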

File system tuning

# mount -t ext3 -o noatime,nodiratime,noacl /dev/sda1 /mnt

I/O scheduler:
- cfq (completely fair queuing)
- deadline
- noop
- as (anticipatory)

ramdisk / ramfs / tmpfs
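Put together, the steps above look like the following. The commands are printed rather than executed, since they need root and /dev/sda is only an example device:

```shell
#!/bin/sh
# Sketch of the file system tuning steps: remount without atime updates,
# switch the I/O scheduler, and mount a tmpfs ramdisk. Printed, not run.
cmds=$(cat <<'EOF'
mount -o remount,noatime,nodiratime /dev/sda1 /mnt
cat /sys/block/sda/queue/scheduler             # show available schedulers
echo deadline > /sys/block/sda/queue/scheduler
mount -t tmpfs -o size=512m tmpfs /mnt/ramdisk
EOF
)
printf '%s\n' "$cmds"
```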


Network Tuning

net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 1800
net.core.wmem_max = 8388608
net.core.rmem_max = 8388608
net.ipv4.tcp_rmem / net.ipv4.tcp_wmem
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_max_tw_buckets = 450000
net.core.somaxconn = 20480
net.core.netdev_max_backlog = 30000
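These knobs are applied at runtime with sysctl -w, or persisted in /etc/sysctl.conf and reloaded with sysctl -p. A sketch, printed rather than executed since changing them requires root; the two knobs shown stand in for the whole list:

```shell
#!/bin/sh
# Sketch: applying the TCP settings above at runtime, then reloading the
# persistent configuration file. Printed, not run (needs root).
cmds=$(cat <<'EOF'
sysctl -w net.ipv4.tcp_fin_timeout=30
sysctl -w net.core.somaxconn=20480
sysctl -p    # reload /etc/sysctl.conf after editing it
EOF
)
printf '%s\n' "$cmds"
```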

Network Tuning (cont.)

Ensure the latest driver version is being used.

If all CPUs have high privileged time, try affinitizing the NIC to one CPU:
  # echo 1 > /proc/irq/26/smp_affinity
or
  # echo 0xff > /proc/irq/26/smp_affinity
Or, if one CPU is saturated, try using irqbalance to spread interrupts.

Verify or tweak the following NIC properties:

Transmit and receive descriptors
  Should generally be around 256; the sender's transmit descriptors should
  equal the receiver's receive descriptors.
  # modprobe e1000 RxDescriptors=4096 TxDescriptors=4096

Offloads: receive IP checksum / receive TCP checksum / TCP segmentation /
transmit IP checksum / transmit TCP checksum
  These are usually enabled by default, and in most cases that is the best
  setting.
  # ethtool -K eth0 tso on

Interrupt moderation rate
  Using this feature can reduce CPU utilization; the best setting is
  workload-dependent.
  # modprobe e1000 InterruptThrottleRate=3000

Transmit queue length
  # ifconfig eth0 txqueuelen 40000
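On drivers that support it, descriptor rings and offloads can also be changed at runtime with ethtool instead of module reload parameters. A sketch; eth0 is an example, and the commands are printed rather than executed since they need root and real hardware:

```shell
#!/bin/sh
# Sketch: runtime equivalents of the modprobe parameters above, for NIC
# drivers that support them. Printed, not run.
cmds=$(cat <<'EOF'
ethtool -g eth0                  # show current ring (descriptor) sizes
ethtool -G eth0 rx 4096 tx 4096  # grow the descriptor rings
ethtool -k eth0                  # show offload settings
ethtool -K eth0 tso on           # enable TCP segmentation offload
EOF
)
printf '%s\n' "$cmds"
```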


Memory Tuning

vm.min_free_kbytes
  Forces the Linux VM to keep a minimum number of kilobytes free.

vm.nr_hugepages
  Much server software, like Oracle, DB2, and JRockit, supports huge pages.

If working with a NUMA system, balance memory across nodes and use software
flags to take advantage of NUMA:
  numactl

Or simply add more memory.
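A typical sequence combining the two knobs; printed rather than executed, since the page count, node number, and ./myapp are illustrative and the steps need root:

```shell
#!/bin/sh
# Sketch: reserve huge pages, verify, then bind a process to one NUMA node
# so its CPUs and memory stay local. Printed, not run.
cmds=$(cat <<'EOF'
echo 512 > /proc/sys/vm/nr_hugepages          # reserve 512 huge pages
grep Huge /proc/meminfo                       # verify the reservation
numactl --cpunodebind=0 --membind=0 ./myapp   # keep CPU and memory on node 0
EOF
)
printf '%s\n' "$cmds"
```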



Application & General Tuning

- Ensure the application is built using the optimal compiler and flags
- Research application-specific startup flags or configurables
- Try enabling large pages if the application supports them
- If clients are being used to drive the workload, consider adding
  additional clients when their CPU utilization is over 50%


Small workloads


Workload Categories

1 Business Processing (ERP, CRM, OLTP, E-commerce)
  SAP, SPECjbb2005, Oracle Apps, SPECjAppServer2004, Volanomark,
  TPC-C, Sysbench-MySQL, DBT2-MySQL, Dell DVD Store
2 IT Infrastructure (File & Print, Networking, Proxy Caching, Security,
  Systems Management)
  Dbench, NetBench, LLCBench, Netperf, Tbench, Httperf, Jmeter
3 Decision Support (Data Warehousing/Data Mart, Data Analysis/Data Mining)
  TPC-H
4 Collaborative (Email, Workgroup)
  MMB3, Lotus NotesBench
5 Application Development
  SPECint, SPECfp, SPECint_rate, SPECfp_rate
6 Web Infrastructure (Streaming Media, Web Serving)
  WMLS, SPECweb2005, WebBench, SPECweb99_SSL
7 Technical (OS, Scientific/Engineering)
  Linpack, HPC Suite*, SPECint, SPECfp, Stream, Matrix Multiply,
  Adobe After Effects, blastp, GunGard
8 System Component (File system I/O)
  LmBench, Aim, Aiostress, Iozone, Iometer, TIOBench


SysBench-MySQL

Application
  SysBench AT (OLTP) on MySQL. SysBench AT is an open-source set of several
  MySQL workloads. The AT case is a set of mixed read and write queries,
  dominated by reads. SysBench AT is multi-threaded, capable of generating
  up to 96 concurrent threads; the number of threads is controlled by a
  command-line parameter. The performance metric is transactions per second.
  Database size is ~10 million rows, about 2.4 GB of disk space. Source
  code can be downloaded from http://sysbench.sourceforge.net (SysBench
  version 0.4.7).

Demonstrates
  Platform capability with an open-source database; benefit of
  Hyper-Threading Technology; performance benefit from larger cache
  (>10% going from 1M to 2M).

Workload
  Database workload dominated by reads, 1 insert per transaction. A warm-up
  phase ensures that data and metadata are virtually all memory resident
  (cached) for the measurement phase.

OS
  Linux 64-bit SMP

Platform
  Workstations or servers


Linpack

Application
  Long-standing, industry-recognized benchmark from
  http://www.netlib.org/performance/html/PDStop.html. Solves systems of
  linear equations in a dense matrix; adheres to the Linpack standard.
  Uses block matrix algorithms and the Intel Math Kernel Library (MKL).
  Built-in parallel processing capability with OpenMP. Measures
  floating-point performance in FLOPS (floating point operations per
  second). Intel-created binary (only), version 2.1.3.

Demonstrates
  Floating-point computation performance potential; achieves a very high
  percentage of tpp1; near-linear scaling with clock speed and number of
  CPUs; minimal benefit from larger cache (>1M); use of memory >4GB via
  run-time parameters (a 15K matrix needs ~2GB of memory).

Workload
  Entirely compute intensive; runs best with Hyper-Threading Technology off.

OS
  Linux & Windows

Platform
  Workstations, servers


IOMeter

Application
  I/O subsystem measurement and characterization tool. Open source:
  http://www.iometer.org/

Demonstrates
  Can measure performance of disk and network controllers; bandwidth and
  latency capabilities of buses; network throughput to attached drives;
  shared bus performance; system-level hard drive performance; system-level
  network performance.

OS
  Windows, Linux, Netware

Platform
  One server


IOZone

Application
  Open-source file system benchmark, similar to bonnie++
  (http://www.coker.com.au/bonnie++/). Download from http://www.iozone.org

Demonstrates
  Comparing the ability of a computer to complete single tasks

Workload
  Measures operations such as read, write, fwrite, pread, and so on.

OS
  Windows, Linux, Solaris

Platform
  One server


LMbench

Application
  A simple and portable benchmark suite used to test bandwidth and latency,
  including bandwidth benchmarks, latency benchmarks, and miscellaneous
  benchmarks. Workload authors: Larry McVoy and Carl Staelin. GPL.
  Download from http://www.bitmover.com/lmbench/

Demonstrates
  Tests for bandwidth and latency; potential issues with timing and clock
  resolution

Workload
  Measures system behavior: memory bandwidth, cache bandwidth, system
  calls, context switches

OS
  Linux

Platform
  One machine


Netperf

Application
  Netperf is a benchmark that can be used to measure the performance of
  many different types of networking, including TCP and UDP, DLPI, Unix
  domain sockets, and SCTP. Not GPL (HP copyright) but freely
  distributable. Download from
  http://www.netperf.org/netperf/NetperfPage.html

Demonstrates
  Tests for both unidirectional throughput and end-to-end latency

OS
  Linux

Platform
  2 machines: server and client
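A typical two-machine run looks like the following; the hostname is an example, and the commands are printed rather than executed since they need netperf installed on both ends:

```shell
#!/bin/sh
# Sketch: netserver runs on the server machine, netperf drives it from the
# client. Printed, not run.
cmds=$(cat <<'EOF'
netserver                                   # start on the server machine
netperf -H server-host -t TCP_STREAM -l 60  # bulk throughput, 60 s
netperf -H server-host -t TCP_RR -l 60      # request/response latency
EOF
)
printf '%s\n' "$cmds"
```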


Httperf

Application
  A tool for measuring web server performance, with a flexible facility
  for generating various HTTP workloads

Workload
  Generates HTTP GET requests

Usage
  Not a benchmark but a tool; open source (Linux/Windows)
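A representative invocation; the host, rate, and connection counts are examples only, printed rather than executed:

```shell
#!/bin/sh
# Sketch: drive a web server with 150 new connections/sec, one GET each.
# Printed, not run (needs httperf installed and a reachable server).
cmds=$(cat <<'EOF'
httperf --server www.example.com --port 80 --uri /index.html \
        --rate 150 --num-conns 27000 --num-calls 1
EOF
)
printf '%s\n' "$cmds"
```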


Discussion
- Anything else related to Linux you want to talk about?
- Which workloads are you using?
- Scripts to collect performance data?
