
AIX for System Administrators

Practical Guide to AIX (and Linux on Power)


IOSTAT - FCSTAT:

IOPS (I/Os per second) for a disk is limited by queue_depth/(average IO service time). Assuming a queue_depth of 3 and an average IO service time of 10 ms, this equals 300 IOPS for the hdisk, and for many applications this may not be enough throughput.
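A quick sanity check of that arithmetic with bc (a minimal sketch; the queue_depth of 3 and the 10 ms service time are just the example values above):

# echo "3 / 0.010" | bc        <-- queue_depth / average IO service time in seconds
300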

IO is queued at each layer it travels through:

- filesystem (filesystem buffers)
- hdisk driver (queue_depth)
- adapter driver (num_cmd_elems)
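The queue-related attribute at the hdisk and adapter layers can be checked with lsattr (a hedged sketch; hdisk0 and fcs0 are placeholder device names, substitute your own):

# lsattr -El hdisk0 -a queue_depth        <-- hdisk layer
# lsattr -El fcs0 -a num_cmd_elems        <-- adapter layer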

-----------------------------------

Adapter I/O:

There are no native adapter statistics in AIX; they are derived from the disk statistics. The adapter busy% is simply the sum of the disk busy%, so if the adapter busy% is, for example, 350%, then you have the equivalent of 3.5 disks busy on that adapter. Or it could be 7 disks at 50% busy, or 14 disks at 25% busy, and so on.

There is no way to determine the true adapter busy%, and in fact it is not clear what it would really mean. The adapter has a dedicated on-board CPU that is always busy (probably with no real OS), and we don't run nmon on these adapter CPUs to find out what they are really doing.
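A hedged way to approximate an adapter busy% yourself (the hdisk names are hypothetical; list the disks that sit behind the adapter in question):

# iostat -d hdisk4 hdisk5 hdisk6 hdisk7 5 1        <-- one 5-second sample of the member disks
(sum the "% tm_act" column, e.g. 13.9 + 20.1 + 19.7 + 18.7 = 72.4% "adapter busy")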

-----------------------------------

IOSTAT:

iostat -s                 print the system throughput report since boot
iostat -d hdisk1 5        display statistics for only 1 disk (hdisk1), every 5 seconds
iostat -a 5               show adapter statistics as well

iostat -DRTl 5 2          show 2 reports at 5-second intervals:
    -D    extended report
    -R    reset min/max values at each interval
    -T    add a timestamp
    -l    long listing mode

Disks:          xfers                                    read                          write                         queue
         %tm     bps    tps   bread   bwrtn    rps   avg   max  time fail    wps   avg   max  time fail    avg   min   max   avg   avg  serv
         act                                         serv  serv  outs              serv  serv  outs       time  time  time  wqsz  sqsz qfull
hdisk4   13.9  212.2K  13.9  156.7K   55.5K    9.6  14.9  40.4     0    0    4.4   0.6   1.1     0    0    0.0   0.0   0.0   0.0   0.0   0.0
hdisk5   20.1  246.4K  16.5  204.8K   41.6K   12.4  16.2  49.8     0    0    4.2   0.6   1.3     0    0    0.0   0.0   0.0   0.0   0.0   0.0
hdisk6   19.7  282.4K  17.1  257.9K   24.5K   14.7  13.6  37.3     0    0    2.4   0.6   1.1     0    0    0.0   0.0   0.0   0.0   0.0   0.0
hdisk7   18.7  300.3K  20.3  215.4K   84.9K   13.1  13.9  39.7     0    0    7.2   0.7   5.6     0    0    0.0   0.0   0.0   0.0   0.0   0.0

-----------------------------------

xfers (transfers):

%tm act:
percent of time the device was active (shows whether disk load is balanced correctly or not, e.g. 1 disk used heavily while the others are not)

bps:
amount of data transferred (read or written) per second (default is in bytes per second)

tps:
number of transfers per second (a transfer is an I/O request; multiple logical requests can be combined into 1 I/O request of a different size)

bread:
amount of data read per second (default is in bytes per second)

bwrtn:
amount of data written per second (default is in bytes per second)

%tm_act:
The OS has a sensor that regularly asks the disk whether it is busy or not. When the disk answers "I'm busy" half of the time, %tm_act will be 50%; if the disk answers "I'm busy" every time, %tm_act will be 100%, and so on. A disk answers "busy" when there are requested operations (read or write) not yet fulfilled. If many very small requests go to the disk, the chance that the sensor asks exactly when one such operation is still open goes up, much more so than the real activity of the disk.

So "100% busy" does not necessarily mean the disk is at the edge of its transfer bandwidth. It could mean that the disk is getting relatively few but big requests (for example, streaming I/O), but it could also mean that the disk is getting a lot of relatively small requests, so that it is occupied most of the time without using its complete transfer bandwidth. To find out which is the case, analyse the corresponding "bread" and "bwrtn" columns from iostat.
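For example, from the hdisk4 row of the sample output above, the average transfer size is roughly bps/tps (a minimal arithmetic check using the sample values, nothing more):

# echo "scale=1; 212.2 / 13.9" | bc        <-- ~15 KB per transfer, i.e. fairly small IOs
15.2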

-----------------------------------

read/write:

rps/wps:
number of read/write transfers per second

avgserv:
average service time per read/write transfer (default is in milliseconds)

timeouts:
number of read/write timeouts per second

fail:
number of failed read/write requests per second

-----------------------------------

queue (wait queue):

avgtime:
average time spent in the wait queue, waiting to get sent to the disk because the disk's queue is full (default is in milliseconds)

avgwqsz:

average wait queue size (waiting to be sent to the disk)

avgsqsz:

average service queue size (this can't exceed queue_depth for the disk)

sqfull:

number of times the service queue becomes full per second (that is, the disk is not accepting any more service requests)

-----------------------------------

From the application's point of view, the length of time to do an IO is its service time plus the time it waits in the hdisk wait queue. Time spent in the queue indicates that increasing queue_depth may improve performance; for correct tuning check the maximum numbers as well. If avgwqsz is often > 0, then increase queue_depth. If sqfull in the first report is high, then increase queue_depth.

When you increase the queue_depths (so more IOs are sent to the disk), the IO service times are likely to increase, but throughput will also increase. If IO service times start approaching the disk timeout value, then you're submitting more IOs than the disk can handle:

# lsattr -El hdisk400 | grep timeout
rw_timeout    40    READ/WRITE time out value    True

If read/write max service times are 30 or 60 seconds or close to what the read/write time out value is set to, this likely indicates a command timeout or some type of error recovery the disk driver experienced.
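If the queue statistics point to a queue that is too small, queue_depth can be raised with chdev (a hedged sketch; hdisk400 and the value 16 are placeholders, and -P only stages the change in the ODM, so it takes effect once the disk is reconfigured or the system is rebooted):

# lsattr -El hdisk400 -a queue_depth        <-- current value
# chdev -l hdisk400 -a queue_depth=16 -P    <-- applied at the next reconfigure/reboot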

-----------------------------------

A good general rule for tuning queue_depths is to tune until these rates are achieved:

average read service time of ~15 ms; with write cache, average write service time of ~2 ms (writes typically go to the write cache first)

Typically, for large disk subsystems that aren't overloaded, IO service times will average around 5-10 ms. When small random reads start averaging greater than 15 ms, this indicates the storage is getting busy.

-----------------------------------

For tuning, we can set up these categories:

1. We're filling up the queues and IOs are waiting in the hdisk or adapter drivers

2. We're not filling up the queues, and IO service times are good

3. We're not filling up the queues, and IO service times are poor

4. We're not filling up the queues, and we're sending IOs to the storage faster than it can handle and it loses the IOs

#2: this is what we want to reach
#3: indicates a bottleneck beyond the hdisk (probably in the adapter, the SAN fabric, or on the storage box side)
#4: should be avoided (if the storage loses IOs, the IO will time out at the host and the recovery code will resubmit it; in the meantime the application is waiting for this IO)

-----------------------------------

As a general rule of thumb, if %tm_act is greater than 70%, it is probably better to migrate some data to other disks as well. Moving data to less busy drives can obviously help ease this burden. Generally speaking, the more drives your data hits, the better.

%iowait: percentage of time the CPU is idle AND there is at least one I/O in progress (all CPUs averaged together). High I/O wait does not necessarily mean an I/O bottleneck, and zero I/O wait does not mean there is no I/O bottleneck; still, if %iowait is above about 25%, the system is probably I/O bound.
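Two standard places to read %iowait on a running system (the 5-second interval and 2 samples are arbitrary choices):

# iostat -t 5 2        <-- tty/cpu report, "% iowait" column
# vmstat 5 2           <-- "wa" column of the cpu section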

-----------------------------------


FCSTAT:

# fcstat fcs0

FIBRE CHANNEL STATISTICS REPORT: fcs0

World Wide Node Name: 0x20000000C9F170E8
World Wide Port Name: 0x10000000C9F170E8      <-- WWPN

Port Speed (supported): 8 GBIT                <-- FC adapter speed
Port Speed (running):   8 GBIT

Error Frames: 0                               <-- both of these affect IO: damaged frames
Dumped Frames: 0                              <-- and discarded frames

FC SCSI Adapter Driver Information
  No DMA Resource Count: 37512                <-- IOs queued at the adapter due to lack of DMA resources (max_xfer_size)
  No Adapter Elements Count: 104848           <-- number of times since boot an IO was temporarily blocked due to an inadequate num_cmd_elems value
  No Command Resource Count: 13915968         <-- same as above; non-zero values indicate that increasing num_cmd_elems may help improve IO service times

# lsattr -El fcs0 | egrep 'xfer|num|dma'
lg_term_dma      0x800000    Long term DMA                                         True
max_xfer_size    0x100000    Maximum Transfer Size                                 True
num_cmd_elems    2048        Maximum number of COMMANDS to queue to the adapter    True

When the default value is used (max_xfer_size=0x100000), the memory area is 16 MB in size. When this attribute is set to any other allowable value (say 0x200000), the memory area is 128 MB in size. At AIX 6.1 TL2 or later a change was made for virtual FC adapters so that the DMA memory area is always 128 MB, even with the default max_xfer_size. This memory area is a DMA memory area, but it is different from the DMA memory area controlled by the lg_term_dma attribute (which is used for IO control). The default lg_term_dma value of 0x800000 is usually adequate.

So for heavy IO and especially for large IOs (such as for backups) it's recommended to set max_xfer_size=0x200000. Like the hdisk queue_depth attribute, changing the num_cmd_elems value requires stopping use of the resources or a reboot.
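A hedged example of staging those adapter changes with chdev (fcs0 and the values shown are placeholders; -P defers the change until the adapter is reconfigured or the system is rebooted, and on NPIV setups the VIOS side should be changed first, as one of the comments below points out):

# lsattr -Rl fcs0 -a max_xfer_size          <-- list the allowable values first
# chdev -l fcs0 -a num_cmd_elems=2048 -a max_xfer_size=0x200000 -P
# lsattr -El fcs0 | egrep 'xfer|num'        <-- verify after the reconfigure/reboot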

Labels: PERFORMANCE

11 comments:

Anonymous (August 26, 2013):
Hi, could you explain the GPFS concept?

Reply from Habib Khilji (December 11, 2013):
Check the following link and also google: http://www-03.ibm.com/systems/software/gpfs/

Comment:
Hi, can you please help me find out how many eth cards and FC cards I have in each VIO server, based on the info below?

VioS1:
ent0 Available 02-00 4-Port Gigabit Ethernet PCI-Express Adapter (e414571614102004)
ent1 Available 02-01 4-Port Gigabit Ethernet PCI-Express Adapter (e414571614102004)
ent2 Available 02-02 4-Port Gigabit Ethernet PCI-Express Adapter (e414571614102004)
ent3 Available 02-03 4-Port Gigabit Ethernet PCI-Express Adapter (e414571614102004)
fcs0 Available 03-00 8Gb PCI Express Dual Port FC Adapter (df1000f114108a03)
fcs1 Available 03-01 8Gb PCI Express Dual Port FC Adapter (df1000f114108a03)

VioS2:
fcs0 Available 01-00 8Gb PCI Express Dual Port FC Adapter (df1000f114108a03)
fcs1 Available 01-01 8Gb PCI Express Dual Port FC Adapter (df1000f114108a03)

Reply from Naunihal Singh (December 30, 2013):
VIOS Server 1: you have 1 quad-port Ethernet card and 1 dual-port FC card.
VIOS Server 2: you have only 1 dual-port FC card.

Comment:
Great blog. Thank you so much.

Comment:
Nicely explained, thanks a lot for this information.

Comment:
Hello, very good article. One note on "max_xfer_size": if you're using NPIV, you have to change the value on the VIOS first, then reboot (or rmdev -Rl and cfgmgr), and only then change it on the client LPAR. If you fail to do this in this order, you may find yourself with cfgmgr errors or LED 554 when you boot the LPAR. In fact, I just solved my problem thanks to Chris's AIX Blog (https://www.ibm.com/developerworks/community/blogs/cgaix/entry/multibos_saved_my_bacon1?lang=es).

Comment:
Hi, thanks for your valuable info. I have a query: I have 2 VIOS and 8 LPARs in a frame.
2 VIOS - updating lg_term_dma and max_xfer_size, then rebooting
7 LPARs - updating lg_term_dma and max_xfer_size, then rebooting
1 LPAR - NOT updating the values of lg_term_dma and max_xfer_size
So the updated values on the VIOS will not affect the LPAR which was not updated?
Thanks & Regards

Comment:
Very good deep dive infos. Thank you.

Reply from aix (April 9, 2015):
You're welcome :)

© aix4admins.blogspot.com (2015) - Unauthorized use of this material is strictly prohibited