My experience of building Linux knowledge
  man
  src/Document/*
  Google
  Ask people
  Summarize and write down
Agenda
  Know your system and workloads
  Basic performance tuning knobs
    System tuning knobs for limitations
    File system tuning
    TCP/IP stack tuning
    Memory system tuning
  Small workloads
Know your system and workloads
  /proc/cpuinfo : CPU info
  /proc/meminfo : memory info
  dmesg : system info
  fdisk -l : disk info
  sar : OS, network, and CPU related counters
sar -bBcrwqWuR -P ALL -n DEV [interval] [count]
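A minimal sketch of a collection wrapper around the sar command above, so that every run uses the same flags and can be compared against a baseline. The function name `collect_sar` and the `SAR` override variable are assumptions made for illustration. The `-c` and `-R` flags are left out on the assumption that newer sysstat releases may not accept them.

```shell
# collect_sar INTERVAL COUNT OUTFILE
# Runs sar with the counters used in this deck and saves the text report.
# SAR is a hypothetical override (for example a stub function) for dry runs.
collect_sar() {
    "${SAR:-sar}" -bBrwqWu -P ALL -n DEV "$1" "$2" > "$3"
}

# Example: 12 samples, 5 seconds apart
# collect_sar 5 12 "sar-$(hostname)-$(date +%Y%m%d-%H%M).txt"
```

Keeping one report per run makes it easier to judge a counter as "significantly higher than baseline".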
sar counters to watch

O/S swapping
  Watch for: values > 200 or significantly higher than baseline
kbmemfree
  Watch for: below 500 MB or above 1 GB
txbyt/s, rxbyt/s
  Watch for: > 100 MB/s; the system is approaching GbE network saturation
rxdrop/s, txdrop/s
  Packets dropped per second because of a lack of space in Linux buffers
NIC interrupts/s
  If too high, set the interrupt moderation rate
await, avgqu-sz
  avgqu-sz: number of disk reads/writes waiting to be issued
  Watch for: greater than 2 or significantly greater than a baseline
  Possible causes: underperforming disk(s); storage subsystem not optimally configured; need to add more storage
1 - %idle
  CPU utilization
  Watch for: increases from a baseline (e.g. CPU, frequency) without an increase in performance; less than 99% if benchmarking
%system
  Time the CPU spends in kernel mode (processing interrupts)
  Watch for: significantly higher than a baseline or higher than 40%
%user
  Time the CPU spends in user mode (running apps)
Context switches/sec
  Possible causes: too many interrupts (inefficient I/O, malfunctioning device); too many threads competing for resources
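The rule-of-thumb thresholds above can be turned into a small check. This is a sketch only: the function name `check_counter` and the short counter labels are assumptions, and the limits are the slide's values (avgqu-sz > 2, %system > 40, network bytes > 100 MB/s), not universal ones.

```shell
# check_counter NAME VALUE
# Prints "NAME warn" when VALUE exceeds the slide threshold, else "NAME ok".
# NAME is one of: avgqu-sz, system, txbyt, rxbyt (labels chosen for this sketch).
check_counter() {
    awk -v name="$1" -v val="$2" 'BEGIN {
        limit["avgqu-sz"] = 2          # queued disk requests
        limit["system"]   = 40         # percent of CPU time in kernel mode
        limit["txbyt"]    = 100000000  # bytes per second, about 100 MB/s
        limit["rxbyt"]    = 100000000
        if (!(name in limit)) { print "unknown counter: " name; exit 2 }
        status = (val + 0 > limit[name]) ? "warn" : "ok"
        print name, status
    }'
}

# Example: check_counter system 55
```

A baseline comparison is still needed for the "significantly higher than baseline" cases; fixed limits only catch the obvious ones.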
Tuning Knobs
System limits
#ulimit -n 110490
  Maximum number of file descriptors allowed per process
  Fixes "unable to open file descriptor" errors
/proc/sys/fs/file-max = 254108
  Maximum number of file descriptors allowed system-wide
/proc/sys/kernel/pid_max = 65536
  Maximum number of processes/threads
/proc/sys/vm/max_map_count = 131072
  Maximum number of memory map areas a process may have
  Increase it when you want to create more than 32k threads in one process
#ulimit -a
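Before changing any of these limits it helps to record the current values. A minimal sketch follows; the function name `show_limits` and the `PROC_ROOT` override are assumptions made so the function can be pointed at a copy of the tree.

```shell
# show_limits
# Prints the limits from this slide as "name = value" lines.
# PROC_ROOT is a hypothetical override; it defaults to /proc.
show_limits() {
    root=${PROC_ROOT:-/proc}
    echo "ulimit -n = $(ulimit -n)"
    for knob in fs/file-max kernel/pid_max vm/max_map_count; do
        echo "$knob = $(cat "$root/sys/$knob")"
    done
}

# Example: show_limits > limits-before.txt
```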
SSG/PRC Scalability Lab
Intel Confidential Slide 11
ramdisk/ramfs/tmpfs
Network Tuning
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
  (tcp_tw_recycle breaks clients behind NAT and was removed in Linux 4.12; skip it on current kernels)
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 1800
net.core.wmem_max = 8388608
net.core.rmem_max = 8388608
net.ipv4.tcp_rmem/net.ipv4.tcp_wmem
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_max_tw_buckets = 450000
net.core.somaxconn = 20480
net.core.netdev_max_backlog = 30000
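To see which of these settings differ from what a machine is running now, something like the following can be used. It is a sketch: the function name `diff_sysctl` and the `SYSCTL` override are assumptions, and it only handles single-valued keys (not tcp_rmem/tcp_wmem).

```shell
# diff_sysctl
# Reads "key = value" lines on stdin and prints the keys whose live value
# differs from the wanted one.
# SYSCTL is a hypothetical override for testing; it defaults to sysctl.
diff_sysctl() {
    while IFS=' =' read -r key want; do
        [ -n "$key" ] || continue
        have=$("${SYSCTL:-sysctl}" -n "$key")
        [ "$have" = "$want" ] || echo "$key: have $have, want $want"
    done
}

# Example:
# printf '%s\n' 'net.core.somaxconn = 20480' 'net.ipv4.tcp_fin_timeout = 30' | diff_sysctl
```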
Offload
  Receive IP Checksum / Receive TCP Checksum / TCP Segmentation / Transmit IP Checksum / Transmit TCP Checksum
  These are usually enabled by default, and in most cases this is the best setting.
#ethtool -K eth0 tso on
Interrupt Moderation Rate
  Using this feature can reduce CPU utilization. The best setting is workload-dependent.
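Since the offloads are usually on already, it is worth checking their state before changing anything. The filter below is a sketch that reads `ethtool -k` output on stdin. The function name `offload_summary` is an assumption, and the feature names assume the output format of recent ethtool versions.

```shell
# offload_summary
# Prints "feature state" for checksum and TCP segmentation offload,
# reading the output of `ethtool -k IFACE` on stdin.
offload_summary() {
    awk -F': ' '/^(rx-checksumming|tx-checksumming|tcp-segmentation-offload):/ { print $1, $2 }'
}

# Example: ethtool -k eth0 | offload_summary
```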
Memory Tuning
vm.min_free_kbytes
  Forces the Linux VM to keep a minimum number of kilobytes free
vm.nr_hugepages
  Many server applications, such as Oracle, DB2, and JRockit, support huge pages
If working with a NUMA system, balance memory and use software flags to take advantage of NUMA
numactl
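The value for vm.nr_hugepages is a page count, not a size, so it has to be derived from the amount of memory the application should get. A minimal sketch, assuming 2 MB huge pages (check Hugepagesize in /proc/meminfo); the function name `hugepages_needed` is hypothetical.

```shell
# hugepages_needed SIZE_MB [PAGE_KB]
# Number of huge pages needed to back SIZE_MB of memory, rounded up.
# PAGE_KB defaults to 2048, an assumed 2 MB huge page size.
hugepages_needed() {
    page_kb=${2:-2048}
    echo $(( ($1 * 1024 + page_kb - 1) / page_kb ))
}

# Example: reserve pages for a 4 GB buffer pool
# sysctl -w vm.nr_hugepages="$(hugepages_needed 4096)"
```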
Small workloads
Workload category (workloads): benchmarks

1 Business Processing (ERP, CRM, OLTP, E-commerce)
  TPC-C, Sysbench-MySQL, DBT2-MySQL, Dell DVD Store
2 IT Infrastructure (File & Print, Networking, Proxy Caching, Security, Systems Management)
  Dbench, NetBench, LLCBench, Netperf, Tbench, Httperf, Jmeter
3 Decision Support (Data Warehousing/Data Mart, Data Analysis/Data Mining)
  TPC-H
4 Collaborative (Email, Workgroup)
  MMB3, Lotus NotesBench
5 Application Development (Application Development)
  SPECint, SPECfp, SPECint_rate, SPECfp_rate
6 Web Infrastructure (Streaming Media, Web Serving)
  WMLS, SPECweb2005, WebBench, SPECweb99_SSL
7 Technical (Scientific/Engineering)
  Linpack, HPC Suite*, SPECint, SPECfp, Stream, Matrix Multiply, Adobe After Effects, blastp, GunGard
8 OS (System component, File system io)
  LmBench, Aim, Aiostress, Iozone, Iometer, TIOBench
SysBench-MySQL
Application
  SysBench AT (OLTP) MySQL
  SysBench AT is an open source set of several MySQL workloads. The AT case is a set of mixed read and write queries, dominated by reads.
  SysBench AT is multi-threaded, capable of generating up to 96 concurrent threads. The number of threads is controlled by a command-line parameter.
  The performance metric is transactions per second.
  Database size is ~10 million rows, about 2.4 GB of disk space.
  Source code can be downloaded from http://sysbench.sourceforge.net
  SysBench version 0.4.7
Demonstrates
  Platform capability with an open source database
  Benefit of Hyper-Threading Technology
  Performance benefit from larger cache (>10% 1M to 2M)
Workload
  Database workload dominated by reads, 1 insert per transaction
  Warm-up phase ensures that data and metadata are virtually all memory resident (cached) for the measurement phase
OS
  Linux 64 bit SMP
Platform
  Workstations or servers
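As an illustration of the "number of threads is controlled by a command-line parameter" point, a run might be wrapped like this. It is a sketch using sysbench 0.4 option names (later sysbench releases renamed them). The function name `run_sysbench_oltp`, the `SYSBENCH` override, and the database name and user are placeholders, and the table is assumed to have been created with a matching `prepare` step.

```shell
# run_sysbench_oltp THREADS SECONDS
# Runs the OLTP test against an already prepared 10-million-row table.
# SYSBENCH is a hypothetical override for dry runs; sbtest/root are placeholders.
run_sysbench_oltp() {
    "${SYSBENCH:-sysbench}" --test=oltp \
        --mysql-db=sbtest --mysql-user=root \
        --oltp-table-size=10000000 \
        --num-threads="$1" --max-time="$2" --max-requests=0 \
        run
}

# Example: 16 client threads for 60 seconds
# run_sysbench_oltp 16 60
```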
Linpack
Application
  Linpack
  Long-standing, industry-recognized benchmark from http://www.netlib.org/performance/html/PDStop.html
  Solves systems of linear equations in dense matrix; adheres to Linpack standard
  Uses block matrix algorithms and Intel Math Kernel Library (MKL)
  Built-in parallel processing capability with OpenMP
  Measures floating point performance in FLOPS (floating point operations per second)
  Intel created binary (only) version 2.1.3
Demonstrates
  Floating point computation performance potential
  Achieves very high percentage of tpp1
  Near-linear scaling with clock speed and number of CPUs
  Minimal benefit from larger cache (>1M)
Workload
  Use of memory >4GB (via run-time parameters). 15K matrix = ~2GB memory
  Entirely compute intensive
  Runs best with Hyper-Threading Technology off
OS
  Linux and Windows
Platform
  Workstations, servers
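The "15K matrix = ~2GB memory" figure follows from the matrix size if each element is an 8-byte double, which is an assumption here. A small sketch of the arithmetic; the function name `linpack_matrix_bytes` is hypothetical.

```shell
# linpack_matrix_bytes N
# Bytes needed to hold an N x N matrix, assuming 8-byte double-precision elements.
linpack_matrix_bytes() {
    echo $(( $1 * $1 * 8 ))
}

# Example: linpack_matrix_bytes 15000
```

For N = 15000 this gives 1,800,000,000 bytes, which is where the roughly 2 GB comes from; going past 4 GB takes a matrix dimension above roughly 23,000.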
IOMeter
Application
  IOMeter
  I/O subsystem measurement and characterization tool
  Open source, http://www.iometer.org/
Demonstrates
  Can measure performance of disk and network controllers, bandwidth and latency capabilities of buses, network throughput to attached drives, shared bus performance, system-level hard drive performance, system-level network performance
OS
  Windows, Linux, NetWare
Platform
  One server
IOZone
Application
  IOZone
  Comparing the ability of a computer to complete single tasks
  Open source
  Similar to bonnie++ (http://www.coker.com.au/bonnie++/)
  Download from http://www.iozone.org
Workload
  Measures operations such as read, write, fwrite, pread, and so on
OS
  Windows, Linux, Solaris
Platform
  One server
LMbench
Application
  LMbench
  A simple and portable benchmark used to test bandwidth and latency
  Includes bandwidth, latency, and miscellaneous benchmarks
  Workload authors: Larry McVoy and Carl Staelin
  GPL
  Download from http://www.bitmover.com/lmbench/
Demonstrates
  Test for bandwidth and latency
  Potential issues on timing and clock resolution
Workload
  Measures system behavior: memory BW, cache BW, system call, context switch
OS
  Linux
Platform
  One machine
Netperf
Application
  Netperf
  Netperf is a benchmark that can be used to measure the performance of many different types of networking, including TCP and UDP, DLPI, Unix Domain Sockets, and SCTP. It provides tests for both unidirectional throughput and end-to-end latency.
  Not GPL, HP copyright, but freely distributable
  Download from http://www.netperf.org/netperf/NetperfPage.html
Demonstrates
  Tests for both unidirectional throughput and end-to-end latency
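The two kinds of test mentioned above, unidirectional throughput and end-to-end latency, correspond to netperf's TCP_STREAM and TCP_RR tests. A minimal sketch, assuming netserver is already running on the target host; the function name `netperf_pair` and the `NETPERF` override are hypothetical.

```shell
# netperf_pair HOST SECONDS
# Runs a bulk-throughput test, then a request/response (latency) test.
# NETPERF is a hypothetical override for dry runs; it defaults to netperf.
netperf_pair() {
    "${NETPERF:-netperf}" -H "$1" -l "$2" -t TCP_STREAM
    "${NETPERF:-netperf}" -H "$1" -l "$2" -t TCP_RR
}

# Example: 30-second runs against a host named server1 (placeholder)
# netperf_pair server1 30
```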
Httperf
Application
  Httperf
  A tool for measuring web server performance, with a flexible facility for generating various HTTP workloads
Workload
  Generates HTTP GET requests
Usage
  Not a benchmark but a tool
  Open source (Linux/Windows)
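A typical way to generate a fixed-rate stream of GET requests with httperf is sketched below. The function name `httperf_get` and the `HTTPERF` override are assumptions, and port 80 is assumed for the target server.

```shell
# httperf_get HOST URI RATE CONNS
# Opens CONNS connections at RATE connections per second, one GET per connection.
# HTTPERF is a hypothetical override for dry runs; it defaults to httperf.
httperf_get() {
    "${HTTPERF:-httperf}" --server "$1" --port 80 --uri "$2" \
        --rate "$3" --num-conns "$4"
}

# Example: 1000 connections at 100 per second (host name is a placeholder)
# httperf_get server1 /index.html 100 1000
```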
Discussion
  Anything else related to Linux you want to talk about?
  Workloads you are using?
  Scripts to collect performance data?