
DETERMINING CPU RESOURCE USAGE FOR LINUX AND UNIX

Roger Snowden, Center of Expertise, Oracle, November 13, 2007

ABSTRACT
Even though Unix and Linux systems run on Symmetric Multi Processor architectures, concurrent processing is still constrained by the physical reality that only one process can be running on a CPU at any moment. When demand for CPU exceeds capacity, performance is adversely affected. Therefore, it is important for system administrators to understand their CPU usage and plan hardware resources accordingly. This article discusses simple methods to evaluate and analyze current CPU usage. It does not purport to provide full-scale capacity planning methods, nor does it provide guidance for statistical analysis. This article is intended for system administrators, database administrators, and managers who wish to determine CPU utilization on Unix and Linux platforms and to conduct basic capacity planning.

PROCESS SCHEDULING AND STATES


Since there are typically more processes than CPUs, operating systems provide a scheduling mechanism to permit sharing of CPUs. In Unix and Linux systems, this scheduling mechanism switches processes between three states, as illustrated below.

At any given moment, a process is generally in one of three states: running, suspended (sleeping) or ready-to-run (in the ready queue, or run queue).

RUNNING
In the running state, a process is actually executing instructions on a CPU. Such a process is said to be on CPU, and it continues to run until it is interrupted by the operating system or voluntarily yields the CPU. Interrupts occur for one of two reasons: 1) another process with a higher priority requires CPU, or 2) another process with similar priority requires CPU and the scheduler is allocating CPU time by a round-robin or first-in-first-out time-sharing algorithm. When a process leaves the running state, it enters either the suspended state or the ready-to-run state, as described below.

SUSPENDED
When a process must wait for a resource that will take considerable time, such as a disk I/O operation, it enters the suspended, or sleeping, state, during which it is dormant. A process can enter this state voluntarily, by requesting that the operating system wake it up after a predetermined time interval (typical behavior of Oracle wait events). Alternatively, the operating system can place it in the suspended state when it requests a resource that is expected to take considerable time, such as a disk read or a network socket message. When the awaited operation is complete, or the predetermined sleep time has elapsed, the process wakes up and is placed into the ready-to-run state.

READY-TO-RUN
Processes that are ready to run are entered in the run queue and are said to be in the ready-to-run state. The run queue is an ordered structure that allows the operating system kernel to select the next process to be placed on CPU for execution, since only one process at a time can actually be executing on a given CPU. Each CPU on the system will have its own run queue. As the name implies, ready-to-run processes require CPU resources. A process may have its execution interrupted because its time slice has elapsed, or be preempted because another process has a higher priority. In either case it is moved directly from the CPU to the run queue and does not enter the suspended state. Thus, many processes competing for CPU time will be in the run queue concurrently, increasing the length of the queue. Other than inevitable momentary transitions through the run queue, a non-zero run queue length is an indicator of demand for CPU resources in excess of CPU capacity. Contrast this with situations where many processes are competing for I/O and transition from CPU to the suspended state before entering the run queue. When many processes are suspended, the run queue may be small or empty, and CPU resources are in fact plentiful. This explains why surges in I/O on a system often result in dramatically reduced CPU utilization: processes are sleeping, not waiting for a CPU to become available.
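The three states and the moves between them can be summarized as a small transition table. This Python sketch encodes only the simplified model described above. Real kernels track more states (for example, stopped and zombie processes). The names `State`, `TRANSITIONS`, `can_move` and `run_queue_length` are hypothetical, chosen for illustration.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"          # executing instructions on a CPU
    SUSPENDED = "suspended"      # sleeping, e.g. waiting for disk or network I/O
    READY = "ready-to-run"       # sitting in a CPU run queue


# Simplified model: which states a process may move to from each state.
TRANSITIONS = {
    State.RUNNING: {State.SUSPENDED, State.READY},  # blocks on a resource, or is preempted / time slice expires
    State.SUSPENDED: {State.READY},                 # awaited event completes or sleep interval ends
    State.READY: {State.RUNNING},                   # scheduler dispatches it onto a CPU
}


def can_move(src: State, dst: State) -> bool:
    """Return True if the simplified three-state model allows src -> dst."""
    return dst in TRANSITIONS[src]


def run_queue_length(states) -> int:
    """Count processes currently in the ready-to-run state."""
    return sum(1 for s in states if s is State.READY)
```

Note that a suspended process never goes straight back to the CPU: it must pass through the run queue first. This is why a woken process can still be delayed on a CPU-bound system.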

ORACLE WAIT EVENTS


In a busy Oracle database server system, most user (session) processes will typically be in the suspended state, either waiting for I/O completion or waiting for a request message from the client. For example, between SQL parse/execute/fetch cycles, the Oracle user process is normally shown by the v$session view to be in the "SQL*Net message from client" wait event, and that user process will be sleeping. When an Oracle process is in a wait event, the Oracle instance itself tracks this and Oracle "knows" it is waiting. Ordinarily the Oracle session notes the system time just before entering a wait event, notes the system time as it wakes up to continue execution, then compares the times to determine how long it waited. This information gets added to the session and system statistics as wait time for the particular resource for which the process was waiting. Wait event information is then reported in such views as v$session and is accumulated in such reporting vehicles as AWR and Statspack.
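The timing pattern just described can be sketched as follows. This is not Oracle's code. It is a minimal Python illustration that assumes a hypothetical per-process accumulator (`wait_time_us`, `wait_count`) and a hypothetical wrapper `timed_wait` around any blocking call.

```python
import time
from collections import defaultdict

# Hypothetical accumulators standing in for per-session wait statistics.
wait_time_us = defaultdict(int)
wait_count = defaultdict(int)


def timed_wait(event_name, blocking_call, *args):
    """Note the clock before and after a blocking call and charge the
    difference, in microseconds, to the named wait event."""
    start = time.monotonic_ns()            # time just before entering the wait
    result = blocking_call(*args)          # process sleeps here
    elapsed_us = (time.monotonic_ns() - start) // 1000  # time once running again
    wait_time_us[event_name] += elapsed_us
    wait_count[event_name] += 1
    return result
```

For example, `timed_wait("SQL*Net message from client", time.sleep, 0.05)` charges roughly 50,000 microseconds to that event. In this sketch, the measured interval also includes any time the process spends in the run queue after it wakes and before it is back on CPU.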

ORACLE LOST TIME


Since all processes in the ready-to-run state are in the run queue structure at the same time, the length of the run queue is an effective measure of CPU demand in excess of capacity. When an Oracle process migrates from CPU to ready-to-run, the Oracle session has no indication that it is entering the non-running state. It therefore does not track run queue time as waiting for CPU. Hence, time spent in the run queue is generally "lost" time with respect to Oracle statistical metrics. This phenomenon can be seen in Oracle SQL trace files, both in the raw trace file and in the tkprof-formatted output. Observe the trace fragments shown below:
PARSE#2:c=20000,e=19091,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=489408753167

PARSE#2:c=20000,e=58484,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=488803627448

SQL Trace (extended) fragments

The first line was taken during a period when the run queue was measured at zero, so the server was not CPU bound at all. The c= field shows the CPU time consumed by the parse step, as reported by the operating system. The e= field represents the elapsed time for the parse event, as tracked by the Oracle process. All times are reported in microseconds. This operating system, Linux, rounds its reported CPU time to the nearest centisecond. Even so, it is clear that the two values are essentially the same.

The second line was taken from a period when the run queue was 10, so the server was clearly CPU bound. In this case, the CPU time consumed by the parse step is still 20,000 microseconds, but the elapsed time recorded by the session is nearly three times that value. A large portion of the execution time was spent in the run queue, and the Oracle session is not aware that it is in the non-running state. It can therefore only track the overall clock time that elapsed from the start of the parse to its completion. Here, approximately one third of the total elapsed time was spent actually executing parse code. Nearly two thirds was spent in the ready-to-run state, waiting for an available CPU.

When a step with no recorded waits shows a significant difference between reported CPU time and elapsed time, this is evidence that the process is CPU bound. Essentially, time spent in run queues is a symptom of demand for CPU in excess of capacity, so tracking run queue sizes is an effective indicator of CPU saturation. However, processes transition between the three states as a matter of course. It is therefore typical for at least one process to be in the ready-to-run state at almost any given moment, although it may not stay in the queue long. So, while some processes may spend at least some time waiting for CPU, the time spent in the run queue may be insignificant.

For that reason, a large-scale, busy production server may well show an average run queue size greater than zero. A non-zero run queue size is not necessarily an indicator of CPU saturation, so we need more information to determine CPU utilization and saturation.
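The CPU-versus-elapsed comparison can be automated for raw trace lines. The following is a minimal Python sketch. It assumes the `name=value` layout of the fragments above and that no waits were recorded for the step, and the helper names are hypothetical. It reports the fraction of elapsed time not covered by CPU time, which, under that assumption, approximates the share of time spent in the run queue.

```python
import re

FIELD = re.compile(r"(\w+)=(\d+)")


def parse_trace_line(line):
    """Return the name=value fields of a raw trace line such as 'PARSE#2:c=...,e=...'."""
    _, _, fields = line.partition(":")          # drop the 'PARSE#2' prefix
    return {name: int(value) for name, value in FIELD.findall(fields)}


def unaccounted_fraction(line):
    """Fraction of elapsed time (e=) not covered by CPU time (c=).

    Clamped at zero, because c= is rounded to the nearest centisecond on
    Linux and can slightly exceed e= on an idle system."""
    f = parse_trace_line(line)
    cpu, elapsed = f["c"], f["e"]
    return max(0, elapsed - cpu) / elapsed
```

For the two fragments above, the first line yields 0 (CPU time slightly exceeds elapsed time because of rounding). The second yields about 0.66, which matches the "nearly two thirds" estimate in the text.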

UTILIZATION AND IDLE %


The vmstat utility, available on all Unix and Linux systems, gives an overall indication of CPU utilization through the % idle column, usually shown as "id". When a system is completely idle, "id" will be 100. When a system is completely busy, "id" will be 0. To get a sense of overall CPU utilization versus capacity, we can consider the "r" (run queue) and "id" (% idle) columns together. Observe the sample vmstat report, taken at two-second intervals. Notice that as the run queue (r) values increase, the % idle (id) values decrease, although not in lockstep. Essentially, as the CPUs become completely busy, the run queue size will suddenly increase. This is typical of any queueing system when a point of saturation is reached.

procs memory swap io system cpu
 r b swpd free buff cache si so bi bo in cs us sy id wa
[sample lines taken at two-second intervals; the values are not recoverable from the source]
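To examine the r and id columns together in a program, it helps to locate columns by their header names. Column sets vary between platforms and vmstat versions; some Linux versions, for example, add an st column. The following Python sketch assumes the plain two-line vmstat header shown above. The function name and the sample values in the test are hypothetical.

```python
def r_and_id(vmstat_text):
    """Return (r, id) pairs from plain vmstat output, locating the columns
    by their header names rather than by fixed positions."""
    pairs = []
    header = None
    for line in vmstat_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "r":                     # column-heading line
            header = tokens
            continue
        # data lines are all-numeric and have one value per heading
        if header and len(tokens) == len(header) and all(t.isdigit() for t in tokens):
            row = dict(zip(header, map(int, tokens)))
            pairs.append((row["r"], row["id"]))
    return pairs
```

Percent utilization for each sample is then simply `100 - id`.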

The graph below illustrates congestion in a typical queueing system: as workload increases, response time increases. The increase is gradual until the saturation point is reached, after which response time rises dramatically. In Oracle database servers, this sudden, steep increase can be observed in disk I/O, in internal waits such as latches, pins or enqueues, or as a sharp increase in CPU run queue length.

In the graph, the horizontal axis represents arrivals. In our CPU case, this would be the arrival of processes at the run queue, that is, processes waiting for CPU. As the arrival rate increases, so does wait time. Once the capacity of the CPU is reached, the backlog of arrivals grows without bound, and application response time increases dramatically.
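The shape of this curve can be illustrated with the textbook single-server M/M/1 queue. In that model, mean response time is R = S / (1 - U) for service time S and utilization U. This is a simplification used here for illustration, not a model used in this article. A multi-CPU server is closer to an M/M/c queue, but the curve has the same general shape.

```python
def mm1_response_time(service_time, utilization):
    """Mean response time of an M/M/1 queue, R = S / (1 - U).

    Only defined for 0 <= U < 1; at U >= 1 the queue grows without bound.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


if __name__ == "__main__":
    service_ms = 10  # assumed service time per request, for illustration only
    for u in (0.5, 0.8, 0.9, 0.95, 0.99):
        print(f"U = {u:.0%}  R = {mm1_response_time(service_ms, u):.0f} ms")
```

With S = 10 ms, the loop prints response times of 20, 50, 100, 200 and 1000 ms for utilizations of 50%, 80%, 90%, 95% and 99%. Response time doubles between 90% and 95% utilization, then grows fivefold by 99%.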

VARIANCE VERSUS AVERAGE


In measuring usage of a resource against its capacity, averages are often used. While average utilization can be useful in predicting resource needs, it can also be misleading. Consider the chart below, derived from vmstat output:

[Chart: Utilization, "Busy CPU". % Util, Avg % Util and Run Queue plotted against Time; vertical axis: Utilization, 0 to 100.]

This chart was derived from a vmstat report from a busy, but not overloaded, system. The id column was converted into % utilization by subtracting the idle percentage from 100 (idle is the complement of utilization). As the middle horizontal dotted line shows, the average utilization is 60%. This implies remaining capacity, or headroom, of 40%. However, the chart shows utilization peaking at nearly 100% at regular intervals during this sampling period. Only by considering variance (the peaks and valleys of a data sample) can you get a meaningful picture of what is going on. Suppose the system administrator of this server believes there is actually 40% unused CPU capacity and increases the workload without increasing CPU capacity. The result will almost certainly be increased response time and degraded performance. Considering only variance, however, can be equally problematic. Consider the same server under a higher workload:
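The conversion and the comparison of average against peaks can be sketched in a few lines of Python. The function name and the idle values used to exercise it are hypothetical. Population standard deviation (`statistics.pstdev`) serves as one simple measure of variance.

```python
from statistics import mean, pstdev


def utilization_summary(idle_values):
    """Convert vmstat 'id' samples to % utilization (100 - id) and summarize
    both the average and the spread around it."""
    util = [100 - idle for idle in idle_values]
    return {
        "avg": mean(util),
        "peak": max(util),
        "stdev": pstdev(util),
        "headroom_at_avg": 100 - mean(util),   # what the average suggests
        "headroom_at_peak": 100 - max(util),   # what is left at the busiest sample
    }
```

Take hypothetical id samples of 60, 2, 70, 1, 65 and 42. The average utilization is 60%, for an apparent headroom of 40%. Yet the peak is 99%, which leaves only 1% headroom at the busiest moment.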

[Chart: Utilization, "Saturated CPU". % Utilization, Run Queue and Avg RunQ plotted against Time; vertical axis: Utilization, Run Queue.]

In this case, the % Utilization line is nearly perfectly straight, without variance, at 100%. When a server's CPUs are saturated with work, they cannot become busier than 100%. At the point of saturation, processes begin to wait for CPU in the run queue. Correspondingly, the Run Queue line near the bottom of the chart now shows considerable variance. As pointed out earlier, when such a bottleneck is reached, congestion grows sharply. This would become evident if the workload were increased beyond this point. So, when evaluating CPU utilization, consider both average utilization and its variance. Also consider the average and variance of run queue length.

There is no universal rule of thumb to determine the right run queue size. On some servers, under busy conditions, it may be perfectly normal for the run queue to average 4 processes while the server performs well. This is because processes normally transition through the run queue, even if the time spent in the queue is minimal. The best way to establish normal performance metrics is to capture information, such as vmstat output, during busy periods when performance is acceptable. This baseline allows comparisons to be made during heavier, poor-performance periods, or during pre-production testing in preparation for an upgrade.
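A baseline comparison of run queue statistics might look like the following Python sketch. The function names and sample values are hypothetical. No threshold is built in, consistent with the point that there is no universal rule for the right run queue size; the ratio and spread are only inputs for judgment.

```python
from statistics import mean, pstdev


def describe(samples):
    """Average, population standard deviation and maximum of a list of samples."""
    return {"avg": mean(samples), "stdev": pstdev(samples), "max": max(samples)}


def compare_to_baseline(baseline_runq, current_runq):
    """Compare run queue samples captured during a busy period with acceptable
    performance against samples from the period under investigation."""
    base = describe(baseline_runq)
    cur = describe(current_runq)
    ratio = cur["avg"] / base["avg"] if base["avg"] else float("inf")
    return {"baseline": base, "current": cur, "avg_ratio": ratio}
```

For example, a baseline of 3, 4, 5, 4 against a current period of 6, 12, 9, 13 gives an average ratio of 2.5, with a wider spread in the current period.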

TOOLS FOR THE JOB


All hardware vendors provide tools capable of measuring CPU utilization. Many are quite specialized and effective for capacity planning. However, vmstat is universally available and entirely adequate for the level of analysis discussed in this article. Better yet, it is free of charge.

OSWATCHER
Oracle Support's Center of Expertise has developed a script-based tool, OSWatcher. It captures the output not only of vmstat but also of other available operating system performance monitoring tools, such as top, iostat and mpstat. OSWatcher is available from Metalink as note 301137.1. It is a shell script tool and will run on Unix and Linux servers. It operates as a background shell process and runs the native operating system utilities at configurable intervals, usually 30 seconds. It retains an archive of the output for a default period of 48 hours. These values may be increased to obtain and retain more information when evaluating performance, and to capture baseline information during important cycle-end periods. Oracle recommends that customers download and install OSWatcher on all production and test servers that need to be monitored. In the output captured by OSWatcher, the vmstat report contains three lines of output for each sample and resembles this text:
Linux OSW v2.1.0 XXXsrv44
zzz ***Mon Oct 29 07:00:37 EDT 2007
procs memory swap io system cpu
 r b swpd free buff cache si so bi bo in cs us sy id wa
[three sample lines; the values are not recoverable from the source]
zzz ***Mon Oct 29 07:01:39 EDT 2007
procs memory swap io system cpu
 r b swpd free buff cache si so bi bo in cs us sy id wa
[three sample lines; the values are not recoverable from the source]

OSWatcher slightly alters the format of vmstat output. It adds identifying information at the beginning of each hourly archive file and places a timestamp line (beginning with zzz) before each sample. Each sample consists of three lines of output. The first two lines should be ignored, and only the last of the three lines from each sample used for analysis. The first line of any vmstat report contains cumulative information (averages since the last reboot) and does not reflect the sampled interval. The second line is influenced by the startup of vmstat itself, as launched by OSWatcher, and should also be disregarded. Only the third line is accurate and useful for analysis purposes.
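The "keep only the third line" rule can be sketched in Python as a generator over an archive's text. It mirrors the logic of the Perl script in Appendix A but is not that script. The function name is hypothetical, and the sketch assumes each sample begins with a zzz line followed by three numeric vmstat lines. Resetting the counter at each zzz line keeps a truncated sample from shifting every later one.

```python
def third_lines(archive_text):
    """Yield (timestamp_line, tokens) for the third numeric vmstat line
    after each 'zzz' timestamp line in an OSWatcher vmstat archive."""
    current_stamp = None
    count = 0
    for line in archive_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "zzz":                 # start of a new sample
            current_stamp = line.strip()
            count = 0
            continue
        # numeric data lines; header lines ("procs", "r b ...") are skipped
        if len(tokens) > 1 and tokens[0].isdigit() and tokens[1].isdigit():
            count += 1
            if count == 3:                     # discard the first two lines
                yield current_stamp, tokens
```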

PERL SCRIPT
To make OSWatcher vmstat archives easier to analyze, a Perl script is provided. The script accepts a vmstat file in OSWatcher archive format and produces a space-separated file with column headings, suitable for import into a spreadsheet program such as Microsoft Excel. Each line begins with a timestamp integer, representing the number of seconds since January 1, 1970 on most machines. This field is placed in each line to permit calculation of the time interval between OSWatcher samples, since there is no other simple way of getting that information. Subtracting the timestamp of the first line from that of the second gives the number of seconds between OSWatcher vmstat lines. The output of the script will look like this:
timestamp r b swpd free buff cache si so bi bo in cs us sy id wa
[one line per OSWatcher sample; the values are not recoverable from the source]
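The timestamp conversion described above can be sketched in Python with `time.strptime` and `time.mktime`; the Perl script performs the same step with `timelocal` from Time::Local. The function name is hypothetical. The time zone token (EDT) is ignored, and the time is interpreted in the local zone of the machine running the code, as `timelocal` does.

```python
import time


def osw_stamp_to_epoch(stamp_line):
    """Convert an OSWatcher line like 'zzz ***Mon Oct 29 07:00:37 EDT 2007'
    to seconds since January 1, 1970, using the local time zone."""
    tokens = stamp_line.split()
    # tokens: ['zzz', '***Mon', 'Oct', '29', '07:00:37', 'EDT', '2007']
    month, day, clock, year = tokens[2], tokens[3], tokens[4], tokens[6]
    parsed = time.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")
    return int(time.mktime(parsed))
```

Subtracting the results for the two timestamps in the OSWatcher sample above, 07:00:37 and 07:01:39, gives the 62-second interval between those samples.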

To use the file, open it in Excel as a .txt file and choose Delimited in the import wizard. On the next wizard screen, select Space as the delimiter. The fields should line up neatly, with the column headings across the top. To produce a line chart, use the chart wizard and select both the r and id columns. This produces a two-line chart of run queue and percent idle values. To chart percent utilization instead, create another column to the right of the imported table and label it % Util or something similar. Compute each cell in that column as 100 minus the contents of the id column, then use that column in your chart. In addition, you can use the data analysis wizard to create statistical descriptions of your data. A tutorial on statistical analysis is beyond the scope of this paper, however. The complete text of the Perl script is included as Appendix A. No warranty or representations are made for the script, other than that it worked with OSWatcher 2.0.1 on a Linux machine, Red Hat EL (kernel 2.6.9-42.0.10.0.1), at the time of development and testing. For questions or comments on the script, please contact the author directly at roger.snowden@oracle.com.

APPENDIX A
Perl source for osw_vmstat_parse.pl
#!/usr/bin/perl -w
## osw_vmstat_parse.pl  Utility to parse OSWatcher vmstat archives for spreadsheets
## November 9, 2007
## Copyright (c) 2007 Oracle, all rights reserved.
## Author: Roger Snowden, Oracle Support, CoE  roger.snowden@oracle.com
## History
## 11/09/2007 rsnowden: initial cut, v1.0
##
## Usage:
##   perl osw_vmstat_parse.pl <filename>
## where <filename> is the name of an individual OSW vmstat archive file.
## The utility will produce an output file in the form of:
##   osw_vm_yymmdd_hhmm.txt
## where yymmdd_hhmm is year, month, day, hours and minutes taken from the
## time portion of the input archive file name.
##
use strict;
use Time::Local;
use File::Basename;

## build hash table of month name to month number (0-11, as timelocal expects)
my %Month = (
    "Jan" => 0, "Feb" => 1, "Mar" => 2, "Apr" => 3, "May" => 4,  "Jun" => 5,
    "Jul" => 6, "Aug" => 7, "Sep" => 8, "Oct" => 9, "Nov" => 10, "Dec" => 11,
);

# initialize stuff here
my $version     = "1.0";
my $carlage     = 0;         ## handy counter of numeric lines in the current sample
my $definedMeta = "FALSE";
my $lineCount   = 0;         ## to store overall "good" lines
my %ReverseColumns;          ## column position -> column name
my @columnNames;             ## column names, in original order
my %data;                    ## column name -> list of sampled values
my @timestamp;               ## epoch seconds for each "good" line
my ($timeseconds, $platform, $oswVersion);

if (!$ARGV[0]) {
    print "\n";
    print "osw_vmstat_parse.pl Version ", $version, " Usage:\n";
    print "perl osw_vmstat_parse.pl <filename>\n";
    print "where <filename> is a vmstat archived file produced by OSWatcher.\n";
    exit 1;
}

my $fileName = $ARGV[0];
# will be in this form: hostname.oracle.com_vmstat_07.11.09.1300.dat
# chop up name (directory part removed first), grab date
my ($host, $dummy, $date_part) = split("_", basename($fileName));
my $year  = substr($date_part, 0, 2);
my $month = substr($date_part, 3, 2);
my $day   = substr($date_part, 6, 2);
my $hours = substr($date_part, 9, 4);

## while we have the values, build output file name: osw_vm_yymmdd_hhmm.txt
my $fileOut = "osw_vm_" . $year . $month . $day . "_" . $hours . ".txt";

open(my $in, "<", $fileName) or die "File ", $fileName, " cannot be opened.\n";
while (my $line = <$in>) {
    # first, chop into array of tokens (split on runs of whitespace)
    my @thisline = split(" ", $line);
    next unless @thisline;

    # grab version of OSW, e.g.  Linux OSW v2.1.0 hostname
    if (defined $thisline[1] && $thisline[1] eq "OSW") {
        ($platform, $oswVersion) = ($thisline[0], $thisline[2]);
        next;
    }

    # parse out date/timestamp, e.g.  zzz ***Mon Oct 29 10:01:24 EDT 2007
    if ($thisline[0] eq "zzz") {
        my (undef, undef, $mon, $mday, $time, undef, $yr) = @thisline;
        my ($hh, $min, $sec) = split(":", $time);
        $timeseconds = timelocal($sec, $min, $hh, $mday, $Month{$mon}, $yr);
        $carlage = 0;        ## start counting afresh for each sample
        next;
    }

    next if $thisline[0] eq "procs";    # do nothing, header thing

    if ($thisline[0] eq "r" && $definedMeta eq "FALSE") {    # col hdrs
        @columnNames = @thisline;       ## save off a "list" of columns, in original order
        $ReverseColumns{$_} = $thisline[$_] for 0 .. $#thisline;
        $definedMeta = "TRUE";
        next;
    }

    ## now eat first two lines of vmstat output, keep the third
    if ($thisline[0] =~ /^\d+$/ && defined $thisline[1] && $thisline[1] =~ /^\d+$/) {
        $carlage++;
        if ($carlage == 3) {    # ...and the number of the counting shall be three.
            for my $bucketNumber (0 .. $#thisline) {
                push @{ $data{ $ReverseColumns{$bucketNumber} } }, $thisline[$bucketNumber];
            }
            push @timestamp, $timeseconds;
            $carlage = 0;       ## reset to a value suitable for carl
            $lineCount++;       ## bump "good" lines, so we know later
        }
    }
}
close($in);

## knock out a text file of stuff: osw_vm_yymmdd_hhmm.txt
open(my $out, ">", $fileOut) or die "Cannot open ", $fileOut, " for writing\n";
print {$out} join(" ", "timestamp", @columnNames), "\n";
for my $i (0 .. $lineCount - 1) {
    print {$out} join(" ", $timestamp[$i], map { $data{$_}[$i] } @columnNames), "\n";    ## never forget the line feed!
}
close($out);
