Mary Lovelace
Otavio Rocha Filho
Katja Gebuhr
Ivo Gomilsek
Ronda Hruby
Paulo Neto
Jon Parkes
Leandro Torolho
ibm.com/redbooks
7521edno.fm
International Technical Support Organization

SAN Volume Controller Best Practices and Performance Guidelines

October 2011
SG24-7521-02
Note: Before using this information and the product it supports, read the information in Notices on page iii.
Third Edition (October 2011)

This edition applies to Version 6, Release 2 of the IBM System Storage SAN Volume Controller. This document was created or updated on February 16, 2012.
© Copyright International Business Machines Corporation 2011. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
7521spec.fm
Notices
This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml

The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:
AIX, alphaWorks, DB2, developerWorks, DS4000, DS6000, DS8000, Easy Tier, Enterprise Storage Server, FlashCopy, Global Technology Services, GPFS, HACMP, IBM, Nextra, pSeries, Redbooks, Redbooks (logo), Storwize, System p, System Storage, System x, System z, Tivoli, XIV, z/OS
The following terms are trademarks of other companies:

ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office.

Microsoft, Windows NT, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Disk Magic, and the IntelliMagic logo are trademarks of IntelliMagic BV in the United States, other countries, or both.

NetApp, and the NetApp logo are trademarks or registered trademarks of NetApp, Inc. in the U.S. and other countries.

Oracle, JD Edwards, PeopleSoft, Siebel, and TopLink are registered trademarks of Oracle Corporation and/or its affiliates.

QLogic, and the QLogic logo are registered trademarks of QLogic Corporation. SANblade is a registered trademark in the United States.

VMware, the VMware "boxes" logo and design are registered trademarks or trademarks of VMware, Inc. in the United States and/or other jurisdictions.

Intel Xeon, Intel, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States, other countries, or both.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.
7521TOC.fm
Contents
Notices . . . . . . . . . . iii
Trademarks . . . . . . . . . . iv
Preface . . . . . . . . . . xv
The team that wrote this book . . . . . . . . . . xv
Become a published author . . . . . . . . . . xvi
Comments welcome . . . . . . . . . . xvii
Summary of changes . . . . . . . . . . xix
October 2011, Third Edition . . . . . . . . . . xix
Part 1. Configuration guidelines and best practices . . . . . . . . . . 1
Chapter 1. SVC update . . . . . . . . . . 3
1.1 SVC V5.1 enhancements and changes . . . . . . . . . . 4
1.2 SVC V6.1 enhancements and changes . . . . . . . . . . 5
1.3 SVC V6.2 enhancements and changes . . . . . . . . . . 7
1.4 Contents of this book . . . . . . . . . . 8
1.4.1 Part 1 - Configuration Guidelines . . . . . . . . . . 8
1.4.2 Part 2 - Performance Best Practices . . . . . . . . . . 9
1.4.3 Part 3 - Monitoring, Maintenance and Troubleshooting . . . . . . . . . . 10
1.4.4 Part 4 - Practical Scenarios . . . . . . . . . . 10
Chapter 2. SAN topology . . . . . . . . . . 11
2.1 SVC SAN topology . . . . . . . . . . 12
2.1.1 Redundancy . . . . . . . . . . 12
2.1.2 Topology basics . . . . . . . . . . 12
2.1.3 ISL oversubscription . . . . . . . . . . 13
2.1.4 Single switch SVC SANs . . . . . . . . . . 14
2.1.5 Basic core-edge topology . . . . . . . . . . 14
2.1.6 Four-SAN core-edge topology . . . . . . . . . . 15
2.1.7 Common topology issues . . . . . . . . . . 17
2.1.8 Split clustered system / Stretch clustered system . . . . . . . . . . 20
2.2 SAN switches . . . . . . . . . . 22
2.2.1 Selecting SAN switch models . . . . . . . . . . 22
2.2.2 Switch port layout for large edge SAN switches . . . . . . . . . . 23
2.2.3 Switch port layout for director-class SAN switches . . . . . . . . . . 23
2.2.4 IBM System Storage/Brocade b-type SANs . . . . . . . . . . 23
2.2.5 IBM System Storage/Cisco SANs . . . . . . . . . . 25
2.2.6 SAN routing and duplicate WWNNs . . . . . . . . . . 25
2.3 Zoning . . . . . . . . . . 26
2.3.1 Types of zoning . . . . . . . . . . 26
2.3.2 Pre-zoning tips and shortcuts . . . . . . . . . . 27
2.3.3 SVC internode communications zone . . . . . . . . . . 28
2.3.4 SVC storage zones . . . . . . . . . . 28
2.3.5 SVC host zones . . . . . . . . . . 31
2.3.6 Sample standard SVC zoning configuration . . . . . . . . . . 33
2.3.7 Zoning with multiple SVC clustered systems . . . . . . . . . . 37
2.3.8 Split storage subsystem configurations . . . . . . . . . . 37
2.4 Switch Domain IDs . . . . . . . . . . 37
2.5 Distance extension for remote copy services . . . . . . . . . . 37
2.5.1 Optical multiplexors . . . . . . . . . . 38
2.5.2 Long-distance SFPs/XFPs . . . . . . . . . . 38
2.5.3 Fibre Channel: IP conversion . . . . . . . . . . 38
2.6 Tape and disk traffic sharing the SAN . . . . . . . . . . 38
2.7 Switch interoperability . . . . . . . . . . 39
2.8 IBM Tivoli Storage Productivity Center . . . . . . . . . . 39
2.9 iSCSI support . . . . . . . . . . 40
2.9.1 iSCSI initiators and targets . . . . . . . . . . 40
2.9.2 iSCSI Ethernet configuration . . . . . . . . . . 40
2.9.3 Security and performance . . . . . . . . . . 40
2.9.4 Failover of port IP addresses and iSCSI names . . . . . . . . . . 41
2.9.5 iSCSI protocol limitations . . . . . . . . . . 41
Chapter 3. SAN Volume Controller clustered system . . . . . . . . . . 43
3.1 Advantages of virtualization . . . . . . . . . . 44
3.1.1 How does the SVC fit into your environment . . . . . . . . . . 45
3.2 Scalability of SVC clustered systems . . . . . . . . . . 45
3.2.1 Advantage of multi-clustered systems as opposed to single-clustered systems . . . . . . . . . . 46
3.2.2 Growing or splitting SVC clustered systems . . . . . . . . . . 47
3.3 Clustered system upgrade . . . . . . . . . . 51
Chapter 4. Backend storage . . . . . . . . . . 53
4.1 Controller affinity and preferred path . . . . . . . . . . 54
4.2 Considerations for DS4000/DS5000 . . . . . . . . . . 54
4.2.1 Setting DS4000/DS5000 so both controllers have the same WWNN . . . . . . . . . . 54
4.2.2 Balancing workload across DS4000/DS5000 controllers . . . . . . . . . . 55
4.2.3 Ensuring path balance prior to MDisk discovery . . . . . . . . . . 56
4.2.4 ADT for DS4000/DS5000 . . . . . . . . . . 56
4.2.5 Selecting array and cache parameters . . . . . . . . . . 56
4.2.6 Logical drive mapping . . . . . . . . . . 58
4.3 Considerations for DS8000 . . . . . . . . . . 58
4.3.1 Balancing workload across DS8000 controllers . . . . . . . . . . 58
4.3.2 DS8000 ranks to extent pools mapping . . . . . . . . . . 59
4.3.3 Mixing array sizes within a Storage Pool . . . . . . . . . . 59
4.3.4 Determining the number of controller ports for DS8000 . . . . . . . . . . 60
4.3.5 LUN masking . . . . . . . . . . 60
4.3.6 WWPN to physical port translation . . . . . . . . . . 62
4.4 Considerations for XIV . . . . . . . . . . 63
4.4.1 Cabling considerations . . . . . . . . . . 63
4.4.2 Host options and settings for IBM XIV systems . . . . . . . . . . 64
4.4.3 Restrictions . . . . . . . . . . 65
4.5 Considerations for V7000 . . . . . . . . . . 65
4.5.1 Defining internal storage . . . . . . . . . . 65
4.5.2 Configuring IBM Storwize V7000 storage systems . . . . . . . . . . 66
4.6 Considerations for Third Party storage . . . . . . . . . . 66
4.6.1 Pathing considerations for EMC Symmetrix/DMX and HDS . . . . . . . . . . 66
4.7 Medium error logging . . . . . . . . . . 67
4.8 Mapping physical LBAs to volume extents . . . . . . . . . . 67
4.9 Using Tivoli Storage Productivity Center to identify storage controller boundaries . . . . . . . . . . 67
Chapter 5. Storage pools and Managed Disks . . . . . . . . . . 71
5.1 Availability considerations for Storage Pools . . . . . . . . . . 72
5.2 Selecting storage subsystems . . . . . . . . . . 72
5.3 Selecting the Storage Pool . . . . . . . . . . 73
5.3.1 Selecting the number of arrays per Storage Pool . . . . . . . . . . 73
5.3.2 Selecting LUN attributes . . . . . . . . . . 74
5.3.3 Considerations for IBM XIV Storage System . . . . . . . . . . 75
5.4 SVC quorum disk considerations . . . . . . . . . . 76
5.5 Tiered storage . . . . . . . . . . 79
5.6 Adding MDisks to existing Storage Pools . . . . . . . . . . 80
5.6.1 Checking access to new MDisks . . . . . . . . . . 80
5.6.2 Persistent reserve . . . . . . . . . . 80
5.6.3 Renaming MDisks . . . . . . . . . . 80
5.7 Restriping (balancing) extents across a Storage Pool . . . . . . . . . . 81
5.7.1 Installing prerequisites and the SVCTools package . . . . . . . . . . 81
5.7.2 Running the extent balancing script . . . . . . . . . . 82
5.8 Removing MDisks from existing Storage Pools . . . . . . . . . . 85
5.8.1 Migrating extents from the MDisk to be deleted . . . . . . . . . . 85
5.8.2 Verifying an MDisk's identity before removal . . . . . . . . . . 85
5.8.3 LUNs to MDisk translation . . . . . . . . . . 86
5.9 Remapping managed MDisks . . . . . . . . . . 94
5.10 Controlling extent allocation order for volume creation . . . . . . . . . . 95
5.11 Moving an MDisk between SVC clusters . . . . . . . . . . 97
Chapter 6. Volumes . . . . . . . . . . 99
6.1 Volume Overview . . . . . . . . . . 100
6.1.1 Thin-provisioned volumes . . . . . . . . . . 100
6.1.2 Space allocation . . . . . . . . . . 101
6.1.3 Thin-provisioned volume performance . . . . . . . . . . 101
6.1.4 Limits on Virtual Capacity of Thin-provisioned volumes . . . . . . . . . . 102
6.1.5 Testing an application with Thin-provisioned volume . . . . . . . . . . 103
6.2 What is volume mirroring . . . . . . . . . . 103
6.2.1 Creating or adding a mirrored volume . . . . . . . . . . 103
6.2.2 Availability of mirrored volumes . . . . . . . . . . 104
6.2.3 Mirroring between controllers . . . . . . . . . . 104
6.3 Creating Volumes . . . . . . . . . . 104
6.3.1 Selecting the Storage Pool . . . . . . . . . . 105
6.3.2 Changing the preferred node within an I/O Group . . . . . . . . . . 106
6.3.3 Moving a volume to another I/O Group . . . . . . . . . . 106
6.4 Volume migration . . . . . . . . . . 108
6.4.1 Image type to striped type migration . . . . . . . . . . 109
6.4.2 Migrating to image type volume . . . . . . . . . . 109
6.4.3 Migrating with volume mirroring . . . . . . . . . . 111
6.5 Preferred paths to a volume . . . . . . . . . . 111
6.5.1 Governing of volumes . . . . . . . . . . 112
6.6 Cache mode and cache-disabled volumes . . . . . . . . . . 114
6.6.1 Underlying controller remote copy with SVC cache-disabled volumes . . . . . . . . . . 114
6.6.2 Using underlying controller flash copy with SVC cache disabled volumes . . . . . . . . . . 115
6.6.3 Changing cache mode of volumes . . . . . . . . . . 116
6.7 The effect of load on storage controllers . . . . . . . . . . 118
6.8 Setting up FlashCopy services . . . . . . . . . . 119
6.8.1 Steps to making a FlashCopy volume with application data integrity . . . . . . . . . . 120
6.8.2 Making multiple related FlashCopy volumes with data integrity . . . . . . . . . . 122
6.8.3 Creating multiple identical copies of a volume . . . . . . . . . . 124
6.8.4 Creating a FlashCopy mapping with the incremental flag . . . . . . . . . . 124
6.8.5 Thin-provisioned FlashCopy . . . . . . . . . . 125
6.8.6 Using FlashCopy with your backup application . . . . . . . . . . 126
6.8.7 Using FlashCopy for data migration . . . . . . . . . . 126
6.8.8 Summary of FlashCopy rules . . . . . . . . . . 127
6.8.9 IBM Tivoli Storage FlashCopy Manager . . . . . . . . . . 128
6.8.10 IBM System Storage Support for Microsoft Volume Shadow Copy Service . . . . . . . . . . 128
Chapter 7. Remote Copy services . . . . . . . . . . 131
7.1 Remote Copy services: an introduction . . . . . . . . . . 132
7.1.1 Common terminology and definitions . . . . . . . . . . 133
7.1.2 Intercluster link . . . . . . . . . . 134
7.2 SVC functions by release . . . . . . . . . . 135
7.2.1 What is new in SVC 6.2 . . . . . . . . . . 135
7.2.2 Remote copy features by release . . . . . . . . . . 138
7.3 Terminology and functional concepts . . . . . . . . . . 138
7.3.1 Remote copy partnerships and relationships . . . . . . . . . . 139
7.3.2 Global Mirror control parameters . . . . . . . . . . 139
7.3.3 Global Mirror partnerships and relationships . . . . . . . . . . 140
7.3.4 Asynchronous remote copy . . . . . . . . . . 142
7.3.5 Understanding Remote Copy write operations . . . . . . . . . . 142
7.3.6 Asynchronous remote copy . . . . . . . . . . 142
7.3.7 Global Mirror write sequence . . . . . . . . . . 143
7.3.8 Importance of write ordering . . . . . . . . . . 144
7.3.9 Colliding writes . . . . . . . . . . 144
7.3.10 Link speed, latency, and bandwidth . . . . . . . . . . 145
7.3.11 Choosing a link capable of supporting GM applications . . . . . . . . . . 146
7.3.12 Remote Copy Volumes: Copy directions and default roles . . . . . . . . . . 147
7.4 Intercluster (Remote) link . . . . . . . . . . 147
7.4.1 SAN configuration overview . . . . . . . . . . 147
7.4.2 Switches and ISL oversubscription . . . . . . . . . . 148
7.4.3 Zoning . . . . . . . . . . 148
7.4.4 Distance extensions for the Intercluster Link . . . . . . . . . . 149
7.4.5 Optical multiplexors . . . . . . . . . . 150
7.4.6 Long-distance SFPs/XFPs . . . . . . . . . . 150
7.4.7 Fibre Channel: IP conversion . . . . . . . . . . 150
7.4.8 Configuration of intercluster (long distance) links . . . . . . . . . . 150
7.4.9 Link quality . . . . . . . . . . 151
7.4.10 Hops . . . . . . . . . . 152
7.4.11 Buffer credits . . . . . . . . . . 153
7.5 Global Mirror design points . . . . . . . . . . 154
7.5.1 Global Mirror parameters . . . . . . . . . . 155
7.5.2 chcluster and chpartnership commands . . . . . . . . . . 156
7.5.3 How GM Bandwidth is distributed . . . . . . . . . . 156
7.5.4 1920 errors . . . . . . . . . . 160
7.6 Global Mirror planning . . . . . . . . . . 160
7.6.1 Summary of Metro Mirror and Global Mirror rules . . . . . . . . . . 160
7.6.2 Planning overview . . . . . . . . . . 161
7.6.3 Planning specifics . . . . . . . . . . 162
7.7 Global Mirror use cases . . . . . . . . . . 164
7.7.1 Synchronize a Remote Copy relationship . . . . . . . . . . 164
7.7.2 Setting up GM relationships: saving bandwidth and resizing volumes . . . . . . . . . . 165
7.7.3 Master and auxiliary volumes and switching their roles . . . . . . . . . . 166
7.7.4 Migrating a Metro Mirror relationship to Global Mirror . . . . . . . . . . 167
7.7.5 Multiple Cluster Mirroring (MCM)
7.7.6 Performing three-way copy service functions
7.7.7 When to use storage controller Advanced Copy Services functions
7.7.8 Using Metro Mirror or Global Mirror with FlashCopy
7.7.9 Global Mirror upgrade scenarios
7.8 Inter-cluster MM / GM source as FC target
7.9 States and steps in the GM relationship
7.9.1 Global Mirror states
7.9.2 Disaster Recovery and GM/MM states
7.9.3 State definitions
7.10 1920 errors
7.10.1 Diagnosing and fixing 1920
7.10.2 Focus areas for 1920 errors (the usual suspects)
7.10.3 Recovery
7.10.4 Disabling gmlinktolerance feature
7.10.5 Cluster error code 1920: check list for diagnosis
7.11 Monitoring Remote Copy relationships
Chapter 8. Hosts
8.1 Configuration recommendations
8.1.1 Recommended host levels and host object name
8.1.2 The number of paths
8.1.3 Host ports
8.1.4 Port masking
8.1.5 Host to I/O Group mapping
8.1.6 Volume size as opposed to quantity
8.1.7 Host volume mapping
8.1.8 Server adapter layout
8.1.9 Availability as opposed to error isolation
8.2 Host pathing
8.2.1 Preferred path algorithm
8.2.2 Path selection
8.2.3 Path management
8.2.4 Dynamic reconfiguration
8.2.5 Volume migration between I/O Groups
8.3 I/O queues
8.3.1 Queue depths
8.4 Multipathing software
8.5 Host clustering and reserves
8.5.1 AIX
8.5.2 SDD compared to SDDPCM
8.5.3 Virtual I/O server
8.5.4 Windows
8.5.5 Linux
8.5.6 Solaris
8.5.7 VMware
8.6 Mirroring considerations
8.6.1 Host-based mirroring
8.7 Monitoring
8.7.1 Automated path monitoring
8.7.2 Load measurement and stress tools
Contents
7521TOC.fm
Chapter 9. SVC 6.2 performance highlights
9.1 SVC continuing performance enhancements
9.2 Solid State Drives (SSDs) and Easy Tier
9.2.1 Internal SSDs Redundancy
9.2.2 Performance scalability and I/O Groups
9.3 Real Time Performance Monitor
Chapter 10. Backend performance considerations
10.1 Workload considerations
10.2 Tiering
10.3 Storage controller considerations
10.3.1 Backend IO capacity
10.4 Array considerations
10.4.1 Selecting the number of LUNs per array
10.4.2 Selecting the number of arrays per storage pool
10.5 I/O ports, cache, throughput considerations
10.5.1 Back-end queue depth
10.5.2 MDisk transfer size
10.6 SVC extent size
10.7 SVC cache partitioning
10.8 DS8000 considerations
10.8.1 Volume layout
10.8.2 Cache
10.8.3 Determining the number of controller ports for DS8000
10.8.4 Storage pool layout
10.8.5 Extent size
10.9 XIV considerations
10.9.1 LUN size
10.9.2 IO ports
10.9.3 Storage pool layout
10.9.4 Extent size
10.10 Storwize V7000 considerations
10.10.1 Volume setup
10.10.2 IO ports
10.10.3 Storage pool layout
10.10.4 Extent size
10.11 DS5000 considerations
10.11.1 Selecting array and cache parameters
10.11.2 Considerations for controller configuration
10.11.3 Mixing array sizes within the storage pool
10.11.4 Determining the number of controller ports for DS4000
Chapter 11. Easy Tier
11.1 Overview of Easy Tier
11.2 Easy Tier concepts
11.2.1 SSD arrays and MDisks
11.2.2 Disk tiers
11.2.3 Single tier storage pools
11.2.4 Multiple tier storage pools
11.2.5 Easy Tier process
11.2.6 Easy Tier operating modes
11.2.7 Easy Tier activation
11.3 Easy Tier implementation considerations
11.3.1 Prerequisites
11.3.2 Implementation rules
11.3.3 Limitations
11.4 Measuring and activating Easy Tier
11.4.1 Measuring by using the Storage Advisor Tool
11.5 Using Easy Tier with the SVC CLI
11.5.1 Initial cluster status
11.5.2 Turning on Easy Tier evaluation mode
11.5.3 Creating a multitier storage pool
11.5.4 Setting the disk tier
11.5.5 Checking a volume's Easy Tier mode
11.5.6 Final cluster status
11.6 Using Easy Tier with the SVC GUI
11.6.1 Setting the disk tier on MDisks
11.6.2 Checking Easy Tier status
11.7 Solid State Drives
Chapter 12. Applications
12.1 Application workloads
12.1.1 Transaction-based workloads
12.1.2 Throughput-based workloads
12.1.3 Storage subsystem considerations
12.1.4 Host considerations
12.2 Application considerations
12.2.1 Transaction environments
12.2.2 Throughput environments
12.3 Data layout overview
12.3.1 Layers of volume abstraction
12.3.2 Storage administrator and AIX LVM administrator roles
12.3.3 General data layout recommendations
12.3.4 Database strip size considerations (throughput workload)
12.3.5 LVM volume groups and logical volumes
12.4 Database Storage
12.5 Data layout with the AIX virtual I/O (VIO) server
12.5.1 Overview
12.5.2 Data layout strategies
12.6 Volume size
12.7 Failure boundaries
Part 2. Management, monitoring and troubleshooting
Chapter 13. Monitoring
13.1 Using Tivoli Storage Productivity Center to analyze the SVC
13.1.1 IBM SAN Volume Controller (SVC) or Storwize V7000
13.2 SVC considerations
13.2.1 SVC traffic
13.2.2 SVC best practice recommendations for performance
13.3 Storwize V7000 considerations
13.3.1 Storwize V7000 traffic
13.3.2 Storwize V7000 best practice recommendations for performance
13.4 Top 10 reports for SVC and Storwize V7000
13.4.1 Top 10 for SVC and Storwize V7000 #1: I/O Group Performance reports
13.4.2 Top 10 for SVC and Storwize V7000 #2: Node Cache Performance reports
13.4.3 Top 10 for SVC #3: Managed Disk Group Performance reports
13.4.4 Top 10 for SVC and Storwize V7000 #5-9: Top Volume Performance reports
13.4.5 Top 10 for SVC and Storwize V7000 #10: Port Performance reports
13.5 Reports for Fabric and Switches
13.5.1 Switches reports: Overview
13.5.2 Switch Port Data Rate performance
13.6 Case study: Server - performance problem with one server
13.7 Case study: Storwize V7000 - disk performance problem
13.8 Case study: Top volumes response time and I/O rate performance report
13.9 Case study: SVC and Storwize V7000 performance constraint alerts
13.10 Case study: Fabric - monitor and diagnose performance
13.11 Case study: Using Topology Viewer to verify SVC and Fabric configuration
13.11.1 Ensuring that all SVC ports are online
13.11.2 Verifying SVC port zones
13.11.3 Verifying paths to storage
13.11.4 Verifying host paths to the Storwize V7000
13.12 Using SVC or Storwize V7000 GUI for real-time monitoring
13.13 Gathering manually the SVC statistics
Chapter 14. Maintenance
14.1 Automating SVC and SAN environment documentation
14.1.1 Naming Conventions
14.1.2 SAN Fabrics documentation
14.1.3 SVC
14.1.4 Storage
14.1.5 Technical Support Information
14.1.6 Tracking Incident & Change tickets
14.1.7 Automated Support Data collection
14.1.8 Subscribing for SVC support information
14.2 Storage Management IDs
14.3 Standard operating procedures
14.3.1 Allocate and de-allocate volumes to hosts
14.3.2 Add and remove hosts in SVC
14.4 SVC Code upgrade
14.4.1 Prepare for upgrade
14.4.2 SVC Upgrade from 5.1 to 6.2
14.4.3 Upgrade SVC clusters participating in MM or GM
14.4.4 SVC upgrade
14.5 SAN modifications
14.5.1 Cross-referencing HBA WWPNs
14.5.2 Cross-referencing LUNids
14.5.3 HBA replacement
14.6 SVC Hardware Upgrades
14.6.1 Add SVC nodes to an existing cluster
14.6.2 Upgrade SVC nodes in an existing cluster
14.6.3 Move to a new SVC cluster
14.7 Wrap up
Chapter 15. Troubleshooting and diagnostics
15.1 Common problems
15.1.1 Host problems
15.1.2 SVC problems
15.1.3 SAN problems
15.1.4 Storage subsystem problems
15.2 Collecting data and isolating the problem
15.2.1 Host data collection
15.2.2 SVC data collection
15.2.3 SAN data collection
15.2.4 Storage subsystem data collection
15.3 Recovering from problems
15.3.1 Solving host problems
15.3.2 Solving SVC problems
15.3.3 Solving SAN problems
15.3.4 Solving back-end storage problems
15.4 Mapping physical LBAs to volume extents
15.4.1 Investigating a medium error using lsvdisklba
15.4.2 Investigating thin-provisioned volume allocation using lsmdisklba
15.5 Medium error logging
15.5.1 Host-encountered media errors
15.5.2 SVC-encountered medium errors
Part 3. Practical examples
Chapter 16. SVC scenarios
16.1 SVC upgrade with CF8 nodes and internal SSDs
16.2 Move an AIX server to another LPAR
16.3 Migration to new SVC using Copy Services
16.4 SVC Scripting
Related publications
IBM Redbooks publications
Other resources
Referenced Web sites
How to get IBM Redbooks publications
Help from IBM
Index
7521pref.fm
Preface
This IBM Redbooks publication captures several best practices based on field experience and describes the performance gains that can be achieved by implementing the IBM System Storage SAN Volume Controller at the V6.2 level. This book is intended for experienced storage, SAN, and SVC administrators and technicians. Readers are expected to have an advanced knowledge of the SAN Volume Controller (SVC) and SAN environment, and we recommend these books as background reading:
- IBM System Storage SAN Volume Controller, SG24-6423
- Introduction to Storage Area Networks, SG24-5470
- Using the SVC for Business Continuity, SG24-7371
Jon Parkes is a Level 3 Service Specialist at IBM UK, Hursley. He has over 15 years of experience testing and developing disk drives, storage products, and applications. He is experienced in managing product test and product quality assurance activities and in providing technical advocacy to clients. For the past four years, he has specialised in the test and support of the SAN Volume Controller and Storwize V7000 product range.
Otavio Rocha Filho is a SAN Storage Specialist for Strategic Outsourcing at the IBM Brazil Global Delivery Center in Hortolandia. Since joining IBM in 2007, he has been the SAN Storage subject matter expert (SME) for many of its international customers. Working in Information Technology since 1988, he has been dedicated to storage solution design, implementation, and support since 1998, deploying the latest in Fibre Channel and SAN technology since its early years. Otavio's certifications include Open Group Master IT Specialist, Brocade SAN Manager, and ITIL Service Management Foundation.
Leandro Torolho is an IT Specialist for IBM Global Services in Brazil. With a background in UNIX and backup, he is currently a SAN Storage subject matter expert (SME) working on implementation and support for its international customers. He holds a Bachelor's degree in Computer Science from USCS/SCS, São Paulo, Brazil, as well as a postgraduate qualification in Computer Networks from FASP/SP, São Paulo, Brazil. He has 10 years of experience in Information Technology and is AIX, TSM, and ITIL certified.
We extend our thanks to the many people who contributed to this book. In particular, we thank the development and PFE teams in Hursley, England.
The authors of the previous edition of this book were:
Katja Gebuhr
Alex Howell
Nik Kjeldsen
Jon Tate
We also want to thank the following people for their contributions:
Lloyd Dean
Parker Grannis
Brian Sherman
Bill Wiegand
Comments welcome
Your comments are important to us! We want our books to be as helpful as possible. Send us your comments about this book or other IBM Redbooks publications in one of the following ways:
- Use the online Contact us review IBM Redbooks publications form found at: ibm.com/redbooks
- Send your comments in an e-mail to: redbooks@us.ibm.com
- Mail your comments to: IBM Corporation, International Technical Support Organization, Dept. HYTD, Mail Station P099, 2455 South Road, Poughkeepsie, NY 12601-5400
7521chang.fm
Summary of changes
This section describes the technical changes made in this edition of the book and in previous editions. This edition might also include minor corrections and editorial changes that are not identified. Summary of Changes for SG24-7521-02 for SAN Volume Controller Best Practices and Performance Guidelines as created or updated on February 16, 2012.
New information
- SVC 6.2 function
- Space-Efficient VDisks
- SVC Console
- VDisk Mirroring
7521p01.fm
Part 1
7521Update20111209.fm
Chapter 1.
SVC update
In this chapter, we summarize the enhancements made to the SAN Volume Controller (SVC) since version 4.3.0 and explain terminology that has changed from previous SVC releases. We also provide a summary of the contents of this book.
- Optional second management IP address configured on the eth1 port: The existing SVC node hardware has two Ethernet ports. Until SVC 4.3, only one Ethernet port (eth0) was used for cluster configuration. In SVC V5.1, a second cluster IP address can optionally be configured on the eth1 port.
- Added interoperability: Interoperability with new storage controllers, host operating systems, fabric devices, and other hardware. An updated list can be found at: https://www-304.ibm.com/support/docview.wss?uid=ssg1S1003553
- Withdrawal of support for 2145-4F2 nodes (32-bit): As stated before, SVC V5.1 supports only SVC 2145 engines that use 64-bit hardware.
- SVC Entry Edition allows up to 250 drives, running only on 2145-8A4 nodes: The SVC Entry Edition uses a per-disk-drive charge unit and now may be used for storage configurations of up to 250 disk drives.
- Temporary withdrawal of support for SSDs on the 2145-CF8 nodes: At the time of writing, 2145-CF8 nodes using internal Solid State Drives (SSDs) are unsupported with V6.1.0.x code (fixed in version 6.2).
- Added interoperability: Interoperability with new storage controllers, host operating systems, fabric devices, and other hardware. An updated list can be found at: https://www-304.ibm.com/support/docview.wss?uid=ssg1S1003697
- Removal of the 15-character maximum name length restriction: SVC V6.1 supports object names up to 63 characters. Previous levels supported only up to 15 characters.
- SVC code upgrades: The SVC Console code has been removed; now you only need to update the SVC code. The upgrade from SVC V5.1 requires the use of the former console interface or the command line. After the upgrade is complete, you can remove the existing ICA (console) application from your SSPC or Master Console. The new GUI is launched through a web browser pointing to the SVC IP address.
- SVC to back-end controller I/O change: SVC V6.1 allows variable block sizes up to 256 KB, compared with the 32 KB supported in previous versions. This is handled automatically by the SVC system without requiring any user control.
- Scalability: The maximum extent size increased four times, to 8 GB. With an extent size of 8 GB, the total storage capacity manageable per cluster is 32 PB. The maximum volume size increased to 1 PB. The maximum number of WWNNs increased to 1024, allowing up to 1024 back-end storage subsystems to be virtualized.
- SVC and Storwize V7000 interoperability: The virtualization layer of IBM Storwize V7000 is built upon the IBM SAN Volume Controller technology. SVC V6.1 is the first version supported in this environment.
- Terminology change: To coincide with new and existing IBM products and functions, several common terms have changed and are incorporated in the SAN Volume Controller information. The following table shows the current and previous usage of the changed common terms.
Table 1-1 Terminology mapping table 6.1.0
- event (previous term: error) — An occurrence of significance to a task or system. Events can include completion or failure of an operation, a user action, or the change in state of a process.
- host mapping (previous term: VDisk-to-host mapping) — The process of controlling which hosts have access to specific volumes within a cluster.
- storage pool (previous term: managed disk (MDisk) group) — A collection of storage capacity that provides the capacity requirements for a volume.
- thin provisioning (previous term: space-efficient) — The ability to define a storage unit (full system, storage pool, volume) with a logical capacity size that is larger than the physical capacity assigned to that storage unit.
- volume (previous term: virtual disk (VDisk)) — A discrete unit of storage on disk, tape, or other data recording medium that supports some form of identifier and parameter list, such as a volume label or input/output control.
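The capacity figures above (8 GB extents giving 32 PB per system) follow from a fixed extent count per clustered system. A minimal sketch, assuming the documented maximum of 4,194,304 extents per system (this constant is our assumption, taken from the SVC configuration limits; verify against S1003799):

```python
# Sketch: SVC manageable capacity = extents_per_system * extent_size.
# The 4,194,304 (2**22) extents-per-system figure is an assumption from
# the published configuration limits, not from this chapter.
MAX_EXTENTS_PER_SYSTEM = 4 * 1024 * 1024  # 2**22

def max_capacity_tib(extent_size_mib):
    """Total virtualized capacity (in TiB) for a given extent size in MiB."""
    return MAX_EXTENTS_PER_SYSTEM * extent_size_mib / (1024 * 1024)

for size_mib in (16, 256, 1024, 8192):
    print(f"extent {size_mib:>5} MiB -> {max_capacity_tib(size_mib):>8.0f} TiB")
```

With the maximum 8192 MiB extent size, this yields 32,768 TiB (32 PiB), matching the 32 PB per-cluster figure above.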
- Licensing change: removal of the physical site boundary. The licensing for SVC systems (formerly clusters) within the same country belonging to the same customer can be aggregated in a single license.
- FlashCopy is now licensed on the main source volumes. SVC V6.2 changes the way FlashCopy is licensed so that only the main source in a FlashCopy relationship is counted. Previously, if cascaded FlashCopy was set up, multiple source volumes had to be licensed.
- Added interoperability. Interoperability with new storage controllers, host operating systems, fabric devices, and other hardware. An updated list can be found at: https://www-304.ibm.com/support/docview.wss?uid=ssg1S1003797
- Exceeding the entitled virtualization license is allowed for 45 days from the installation date for the purpose of migrating data from one system to another. With the benefit of virtualization, SVC allows customers to bring new storage systems into their storage environment and quickly and easily migrate data from their existing storage systems to the new ones. To facilitate this migration, IBM allows customers to temporarily (45 days from the date of installation of the SVC) exceed their entitled virtualization license for the purpose of migrating data from one system to another.
- Terminology change. The following table shows the current and previous usage of the changed common terms.
Table 1-2 Terminology mapping table 6.2.0
- clustered system or system (previous term: cluster) — A collection of nodes that are placed in pairs (I/O groups) for redundancy, which provide a single management interface.
Chapter 6. Volumes
In this chapter, we discuss several aspects of and options for creating volumes (formerly VDisks). We describe how to create, manage, and migrate volumes across I/O groups. We expand on thin-provisioned volumes, presenting performance and limit considerations. We also explain how to take advantage of the mirroring and FlashCopy capabilities.
Chapter 8. Hosts
In this chapter we provide recommendations about host attachment and pathing configuration with regard to performance and scalability. We discuss host-side considerations for volume operations such as expanding a volume or migrating between I/O groups. Host clustering and the underlying reserve policy are also discussed here.
7521SAN Fabric20111209.fm
Chapter 2.
SAN topology
The IBM System Storage SAN Volume Controller (SVC) has unique SAN fabric configuration requirements that differ from what you might be used to in your storage infrastructure. A quality SAN configuration can help you achieve a stable, reliable, and scalable SVC installation; conversely, a poor SAN environment can make your SVC experience considerably less pleasant. This chapter provides you with information to tackle this topic. Note: As with any of the information in this book, you must check the IBM System Storage SAN Volume Controller V6.2.0 - Software Installation and Configuration Guide, GC27-2286, and IBM System Storage SAN Volume Controller 6.2.0 Configuration Limits and Restrictions, S1003799, for limitations, caveats, updates, and so on that are specific to your environment. Do not rely on this book as the last word in SVC SAN design. Also, anyone planning an SVC installation must be knowledgeable about general SAN design principles. Refer to the IBM System Storage SAN Volume Controller Support web page for updated documentation before implementing your solution. The web site is: http://www-947.ibm.com/support/entry/portal/Overview/Hardware/System_Storage/Storage_software/Storage_virtualization/SAN_Volume_Controller_(2145) Note: All document citations in this book refer to the 6.2 version of the SVC product documents. If you use a different version, refer to the corresponding edition of the documents. As you read this chapter, remember that this is a best practices book based on field experiences. Although there are many other possible (and supported) SAN configurations not covered in this chapter, we do not consider them best practices.
2.1.1 Redundancy
One of the fundamental SVC SAN requirements is to create two (or more) entirely separate SANs that are not connected to each other over Fibre Channel in any way. The easiest way is to construct two SANs that are mirror images of each other. Technically, the SVC supports using just a single SAN (appropriately zoned) to connect the entire SVC. However, we do not recommend this design in any production environment. In our experience, we do not recommend this design in development environments either, because a stable development platform is important to programmers, and an extended outage in the development environment can cause an expensive business impact. For a dedicated storage test platform, however, it might be acceptable.
communicate. Conversely, storage traffic and inter-node traffic must never cross an ISL, except during migration scenarios. High-bandwidth-utilization servers (such as tape backup servers) must also be on the same SAN switches as the SVC node ports. Putting them on a separate switch can cause unexpected SAN congestion problems, and putting a high-bandwidth server on an edge switch is a waste of ISL capacity. If at all possible, plan for the maximum size configuration that you ever expect your SVC installation to reach. As you will see later in this chapter, the design of the SAN can change radically for larger numbers of hosts. Modifying the SAN later to accommodate a larger-than-expected number of hosts might produce a poorly designed SAN; moreover, it can be difficult, expensive, and disruptive to your business. Planning for the maximum size does not mean that you need to purchase all of the SAN hardware initially; it only requires you to design the SAN with the maximum size in mind. Always deploy at least one extra ISL per switch. Not doing so exposes you to consequences ranging from complete path loss (this is bad) to fabric congestion (this is even worse). The SVC does not permit the number of hops between the SVC clustered system and the hosts to exceed three, which is typically not a problem.
The RPQ process involves a review of your proposed SAN design to ensure that it is reasonable for your proposed environment.
Figure 2-1 Core-edge topology (SVC nodes attached to two core switches; hosts attached to edge switches)
Figure 2-2 Four-SAN core-edge topology (SVC nodes attached to four core switches; hosts attached to edge switches)
While certain clients have chosen to simplify management by connecting the SANs together into pairs with a single ISL, we do not recommend this design. With only a single ISL connecting fabrics, a small zoning mistake can quickly lead to severe SAN congestion. Using the SVC as a SAN bridge: With the ability to connect an SVC clustered system to four SAN fabrics, it is possible to use the SVC as a bridge between two SAN environments (with two fabrics in each environment). This configuration can be useful for sharing resources between the SAN environments without merging them. It is also useful if you have devices with different SAN requirements present in your installation. When using the SVC as a SAN bridge, pay special attention to any restrictions and requirements that might apply to your installation.
Figure 2-3 Spread out disk paths (SVC-to-storage traffic should be zoned to never travel over the ISLs between the SVC-attach and non-SVC-attach switches)
If you have this type of topology, it is extremely important to zone the SVC so that it only sees paths to the storage subsystems on the same SAN switch as the SVC nodes. Implementing a storage subsystem host port mask might also be feasible here. Note: This type of topology means you must have more restrictive zoning than what is detailed in 2.3.6, Sample standard SVC zoning configuration on page 33. Because of the way that the SVC load balances traffic between the SVC nodes and MDisks, the amount of traffic that transits your ISLs will be unpredictable and vary significantly. If you
have the capability, you might want to use either Cisco Virtual SANs (VSANs) or Brocade Traffic Isolation to dedicate an ISL to high-priority traffic. However, as stated before, internode and SVC to backend storage communication should never cross ISLs.
(Figure: old and new switches connected by ISLs. SVC-to-storage traffic should be zoned and masked to never travel over these links, but they should be zoned for intracluster communications.)
This design is a valid configuration, but you must take certain precautions: As stated in "Accidentally accessing storage over ISLs" on page 17, zone and LUN-mask the SAN and storage subsystems so that you do not access the storage subsystems over the ISLs. This design means that your storage subsystems will need connections to both the old and new SAN switches. Have two dedicated ISLs between the two switches on each SAN with no data traffic traveling over them. The reason for this design is that if this link ever becomes congested or lost, you might experience problems with your SVC clustered system if there are also issues at the same time on the other SAN. If you can, set a 5% traffic threshold alert on the ISLs so that you know if a zoning mistake has allowed any data traffic over the links.
Note: It is not a best practice to use this configuration to perform mirroring between I/O Groups within the same clustered system. Also, you must never split the two nodes in an I/O Group between various SAN switches within the same SAN fabric. The optional 8 Gbps LW SFPs in the 2145-CF8 and 2145-CG8 allow you to split an SVC I/O group across long distances, as described in Split clustered system / Stretch clustered system on page 20.
Some service actions require physical access to all SAN Volume Controller nodes in a system. If nodes in a split clustered system are separated by more than 100 meters, service actions might require multiple service personnel. Figure 2-5 illustrates an example of a split clustered system configuration. When used in conjunction with volume mirroring, this configuration provides a high availability solution that is tolerant of a failure at a single site.
Figure 2-5 A split clustered system with a quorum disk located at a third site
Quorum placement
A split clustered system configuration locates the active quorum disk at a third site. If communication is lost between the primary and secondary sites, the site with access to the active quorum disk continues to process transactions. If communication is lost to the active quorum disk, an alternative quorum disk at another site can become the active quorum disk. Although a system of SAN Volume Controller nodes can be configured to use up to three quorum disks, only one quorum disk can be elected to resolve a situation where the system is partitioned into two sets of nodes of equal size. The purpose of the other quorum disks is to provide redundancy if a quorum disk fails before the system is partitioned. Note: SSD managed disks should not be chosen for quorum disk purposes, because SSD lifespan depends on write workload.
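The tie-break rule above can be illustrated with a toy model (this is our simplification for illustration, not SVC's actual election algorithm): the larger half of a partitioned system survives, and on an even split, the half that can still reach the active quorum disk wins.

```python
def surviving_site(site1_nodes, site2_nodes, quorum_reachable_from):
    """Toy tie-break model. The larger half of a partitioned system survives;
    on an even split, the site that can reach the active quorum disk wins.
    quorum_reachable_from: set of site names that still see the quorum disk."""
    if site1_nodes != site2_nodes:
        return "site1" if site1_nodes > site2_nodes else "site2"
    # Equal halves: only one site can win the quorum race (we check in order;
    # in reality only one site can reach the disk after a partition).
    for site in ("site1", "site2"):
        if site in quorum_reachable_from:
            return site
    return None  # neither site can reach quorum: processing stops

# Even split of a four-node system; only site2 still sees the quorum disk:
print(surviving_site(2, 2, {"site2"}))  # -> site2
```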
Configuration summary
Generally, when the nodes in a system have been split among sites, configure the SAN Volume Controller system this way:
- Site 1: Half of the SAN Volume Controller system nodes + one quorum disk candidate
- Site 2: Half of the SAN Volume Controller system nodes + one quorum disk candidate
- Site 3: Active quorum disk
Disable the dynamic quorum configuration by using the chquorum command with the override yes option.
Note: Some fix levels of 6.2.0.x do not support split clustered systems. Check the following flash for the latest details: https://www-304.ibm.com/support/docview.wss?uid=ssg1S1003853
IBM sells and supports SAN switches from both of the major SAN vendors, in the following product portfolios:
- IBM System Storage b-type/Brocade SAN portfolio
- IBM System Storage/Cisco SAN portfolio
Fabric Watch
The Fabric Watch feature found in newer IBM/Brocade-based SAN switches can be useful, because the SVC relies on a healthy, properly functioning SAN. Fabric Watch is a SAN health monitor designed to enable real-time proactive awareness of the health, performance, and security of each switch. It automatically alerts SAN managers to predictable problems in order to help avoid costly failures. It tracks a wide spectrum of fabric elements, events, and counters. Fabric Watch allows you to configure the monitoring and measuring frequency for each switch and fabric element and to specify notification thresholds. Whenever these thresholds are exceeded, Fabric Watch automatically provides notification using several methods, including e-mail messages, SNMP traps, and log entries, or even posts alerts to Data Center Fabric Manager (DCFM). The components that Fabric Watch monitors are grouped in classes as follows:
- Environment (for example, temperature)
- Fabric (zone changes, fabric segmentation, E_Port down, among others)
- Field Replaceable Unit (provides an alert when a part replacement is needed)
- Performance Monitor (for instance, RX and TX performance between two devices)
- Port (monitors port statistics and takes action based on the configured thresholds and actions; actions may include port fencing)
- Resource (RAM, flash, memory, and CPU)
- Security (monitors different security violations on the switch and takes action based on the configured thresholds and actions)
- SFP (monitors the physical aspects of an SFP, such as voltage, current, RXP, TXP, and state changes in physical ports)
By implementing Fabric Watch, you benefit from improved availability through proactive notification. Furthermore, it allows you to reduce troubleshooting and root cause analysis (RCA) times. Fabric Watch is an optionally licensed feature of Fabric OS; however, it is already included in the base licensing of the new IBM System Storage b-Series switches.
Bottleneck detection
A bottleneck is a situation where the frames of a fabric port cannot get through as fast as they should. In this condition, the offered load is greater than the achieved egress throughput on the affected port. The bottleneck detection feature does not require any additional license. It identifies and alerts you to ISL or device congestion, as well as device latency conditions in the fabric. Bottleneck detection also enables you to prevent degradation of throughput in the fabric and to reduce the time it takes to troubleshoot SAN performance problems. Bottlenecks are reported through RASlog alerts and SNMP traps, and you can set alert thresholds for the severity and duration of the bottleneck. Starting in Fabric OS 6.4.0, you configure bottleneck detection on a per-switch basis, with per-port exclusions.
Virtual Fabrics
Virtual Fabrics adds the capability for physical switches to be partitioned into independently managed logical switches. There are multiple advantages to implementing Virtual Fabrics, such as hardware consolidation, improved security, and resource sharing by several customers, among others. The following IBM System Storage platforms are Virtual Fabrics-capable:
- SAN768B
- SAN384B
- SAN80B-4
- SAN40B-4
To configure Virtual Fabrics, you do not need to install any additional license.
- Smooth fabric migrations during technology refresh projects.
- In conjunction with tunneling protocols (such as FCIP), connectivity between fabrics over long distances.
Integrated Routing (IR) is a licensed feature that allows 8-Gbps FC ports of the SAN768B and SAN384B, among others, to be configured as EX_Ports (or VEX_Ports) supporting Fibre Channel routing. With IR-capable switches or directors, in conjunction with the respective license, you do not need to deploy external FC routers or FC router blades for FC-FC routing. For more information about the IBM System Storage b-type/Brocade products, refer to the following IBM Redbooks publications:
- Implementing an IBM b-type SAN with 8 Gbps Directors and Switches, SG24-6116
- IBM System Storage b-type Multiprotocol Routing: An Introduction and Implementation, SG24-7544
Port Channels
To ease the required planning efforts for future SAN expansions, ISLs/Port Channels can be made up of any combination of ports in the switch, which means that it is not necessary to reserve special ports for future expansions when provisioning ISLs. Instead, you can use any free port in the switch for expanding the capacity of an ISL/Port Channel.
Cisco VSANs
Virtual SANs (VSANs) let you achieve improved SAN scalability, availability, and security by allowing multiple Fibre Channel SANs to share a common physical infrastructure of switches and ISLs. These benefits are achieved through independent Fibre Channel services and traffic isolation between VSANs. Using Inter-VSAN Routing (IVR), you can establish a data communication path between initiators and targets located on different VSANs without merging the VSANs into a single logical fabric. Because VSANs may group ports across multiple physical switches, enhanced inter-switch links (EISLs) can be used to carry traffic belonging to multiple VSANs (VSAN trunking). The main VSAN implementation advantages are hardware consolidation, improved security, and resource sharing by several independent organizations, such as customers. It is possible to use Cisco VSANs, combined with inter-VSAN routes, to isolate the hosts from the storage arrays. This arrangement provides little benefit for a great deal of added configuration complexity. However, VSANs with inter-VSAN routes can be useful for fabric migrations from non-Cisco vendors onto Cisco fabrics, or in other short-term situations. VSANs can also be useful if you have a storage array directly attached by hosts in conjunction with some space virtualized through the SVC. (In this instance, it is best to use separate storage ports for the SVC and the hosts. We do not advise using inter-VSAN routes to enable port sharing.)
can be triggered erroneously if an SVC port from fabric A is zoned through a SAN router so that an SVC port from the same node in fabric B can log into the fabric A port. To prevent this situation from happening, whenever you implement advanced SAN FCR functions, be careful to ensure that the routing configuration is correct.
2.3 Zoning
Because the SVC differs from traditional storage devices, properly zoning it into your SAN fabric is a common source of misunderstanding and errors. Despite the misunderstandings and errors, zoning the SVC into your SAN fabric is not particularly complicated. Note: Errors caused by improper SVC zoning are often fairly difficult to isolate, so create your zoning configuration carefully. Here are the basic SVC zoning steps:
1. Create the SVC internode communications zone.
2. Create the SVC clustered system.
3. Create the SVC back-end storage subsystem zones.
4. Assign back-end storage to the SVC.
5. Create the host-SVC zones.
6. Create the host definitions on the SVC.
The zoning scheme that we describe next is slightly more restrictive than the zoning described in the IBM System Storage SAN Volume Controller V6.2.0 - Software Installation and Configuration Guide, GC27-2286. The Configuration Guide is a statement of what is supported, whereas this publication is a statement of our understanding of the best way to set up zoning, even if other ways are possible and supported.
Aliases
We strongly recommend that you use zoning aliases when creating your SVC zones if they are available on your particular type of SAN switch. Zoning aliases make your zoning easier to configure and understand and leave fewer possibilities for error. One approach is to include multiple members in one alias, because zoning aliases can normally contain multiple members (just like zones). We recommend that you create these aliases:
- One that holds all the SVC node ports on each fabric
- One for each storage subsystem (or controller blade, in the case of DS4x00 units)
- One for each I/O Group port pair (that is, it needs to contain port 2 of the first node in the I/O Group and port 2 of the second node in the I/O Group)
Host aliases can be omitted in smaller environments, as in our lab environment.
(Figure: controller ports CtrlA and CtrlB zoned to the SVC nodes across SAN fabrics A and B.)
For more information about zoning the IBM System Storage DS5000 or DS4000 with the SVC, refer to the following IBM Redbooks publication: IBM Midrange System Storage Implementation and Best Practices Guide, SG24-6363.
To take advantage of the combined capabilities of SVC and XIV, you should zone two ports (one per fabric) from each interface module with the SVC ports. You need to decide which XIV ports you are going to use for the connectivity with the SVC. If you do not use and do not plan to use XIV remote mirroring, you must change the role of port 4 from initiator to target on all XIV interface modules and use ports 1 and 3 from every interface module into the fabric for the SVC attachment. Otherwise, you must use ports 1 and 2 from every interface module instead of ports 1 and 3. Each HBA port on the XIV interface module is designed and set to sustain up to 1400 concurrent I/Os; however, port 3 will only sustain up to 1000 concurrent I/Os if port 4 is defined as an initiator. In Figure 2-8 we show how to zone an XIV frame as an SVC storage controller.
Note: Only single rack XIV configurations are supported by SVC. Multiple single racks can be supported where each single rack is seen by SVC as a single controller.
Figure 2-8 Zoning an XIV frame as an SVC storage controller
Storwize V7000
Storwize V7000 external storage systems can present volumes to a SAN Volume Controller. A Storwize V7000 system, however, cannot present volumes to another Storwize V7000 system. To zone the Storwize V7000 as an SVC back-end storage controller, the minimum requirement is that every SVC node has the same Storwize V7000 view, which should be at least one port per Storwize V7000 canister. Figure 2-9 shows an example of how the SVC can be zoned with the Storwize V7000.
Figure 2-9 Zoning the SVC with the Storwize V7000
(Figure: hosts Foo and Bar zoned to the SVC nodes of I/O Group 0 through zones such as Bar_Slot2_SAN_A and Bar_Slot8_SAN_B.)
The IBM System Storage SAN Volume Controller V6.2.0 - Software Installation and Configuration Guide, GC27-2286, discusses putting many hosts into a single zone as a supported configuration under certain circumstances. While this design usually works fine, instability in one of your hosts can trigger all sorts of impossible-to-diagnose problems in the other hosts in the zone. For this reason, you should have only a single host in each zone (single initiator zones). It is a supported configuration to have eight paths to each volume, but this design provides no performance benefit (indeed, under certain circumstances, it can even reduce performance), and it does not improve reliability or availability by any significant degree. To obtain the best overall performance of the system and to prevent overloading, the workload to each SAN Volume Controller port must be equal. This typically involves zoning approximately the same number of host Fibre Channel ports to each SAN Volume Controller Fibre Channel port.
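The even-workload recommendation can be sketched as a simple round-robin assignment of host HBA ports to SVC ports (all names and counts here are illustrative, not from a real configuration):

```python
from itertools import cycle

def balance_hosts(host_hbas, svc_ports):
    """Round-robin host HBA ports across SVC node ports so that each SVC
    port receives roughly the same number of host logins."""
    assignment = {port: [] for port in svc_ports}
    rr = cycle(svc_ports)
    for hba in host_hbas:
        assignment[next(rr)].append(hba)
    return assignment

# Illustrative: four hosts with two HBAs each, spread over one port pair.
hbas = [f"host{h}_hba{a}" for h in range(1, 5) for a in ("0", "1")]
ports = ["SVC_Group0_Port1", "SVC_Group0_Port3"]
result = balance_hosts(hbas, ports)
for port, members in result.items():
    print(port, len(members))  # 4 HBAs per SVC port
```

In practice, this bookkeeping is done in the zoning configuration rather than in code; the point is simply that the host-port count per SVC port stays equal.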
The reason that we do not just assign one HBA to each of the paths is that, for any specific volume, one node serves solely as a backup node (a preferred node scheme is used), so the load is never going to be balanced for that particular volume. It is better to load balance by I/O Group instead, and let the volumes be automatically assigned to nodes.
(Figure: hosts Peter, Barry, Jon, Ian, Thorsten, Ronda, Deon, and Foo attached across Switch A and Switch B.)
Aliases
Unfortunately, you cannot nest aliases, so several of these WWPNs appear in multiple aliases. Also, do not be concerned if none of your WWPNs looks like the example; we made a few of them up when writing this book. Note that certain switch vendors (for example, McDATA) do not allow multiple-member aliases, but you can still create single-member aliases. While creating single-member aliases
does not reduce the size of your zoning configuration, it still makes it easier to read than a mass of raw WWPNs. For the alias names, we have appended SAN_A on the end where necessary to distinguish that these alias names are the ports on SAN A. This system helps if you ever have to perform troubleshooting on both SAN fabrics at one time.
DS4k_23K45_Blade_A_SAN_A:
20:04:00:a0:b8:17:44:32
20:04:00:a0:b8:17:44:33

DS4k_23K45_Blade_B_SAN_A:
20:05:00:a0:b8:17:44:32
20:05:00:a0:b8:17:44:33

DS8k_34912_SAN_A:
50:05:00:63:02:ac:01:47
50:05:00:63:02:bd:01:37
50:05:00:63:02:7f:01:8d
50:05:00:63:02:2a:01:fc
Zones
Remember when naming your zones that they cannot have the same names as aliases. Here is our sample zone set, utilizing the aliases that we have just defined.
SVC_Cluster_Zone_SAN_A: SVC_Cluster_SAN_A
SVC_DS4k_23K45_Zone_Blade_A_SAN_A: SVC_Cluster_SAN_A
WinPeter_Slot3: 21:00:00:e0:8b:05:41:bc; SVC_Group0_Port1
WinBarry_Slot7: 21:00:00:e0:8b:05:37:ab; SVC_Group0_Port3
WinJon_Slot1: 21:00:00:e0:8b:05:28:f9; SVC_Group1_Port1
WinIan_Slot2: 21:00:00:e0:8b:05:1a:6f; SVC_Group1_Port3
AIXRonda_Slot6_fcs1: 10:00:00:00:c9:32:a8:00; SVC_Group0_Port1
AIXThorsten_Slot2_fcs0: 10:00:00:00:c9:32:bf:c7; SVC_Group0_Port3
AIXDeon_Slot9_fcs3: 10:00:00:00:c9:32:c9:6f; SVC_Group1_Port1
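On Brocade FOS-based switches, aliases and zones like these samples are typically created with the alicreate, zonecreate, and cfgadd/cfgenable commands. The following is an untested configuration sketch: the SVC WWPNs and the PROD_SAN_A configuration name are made up for illustration, and quoting conventions vary by FOS level, so verify against your switch documentation.

```shell
# Illustrative only: create a SAN A alias and zone from the samples above.
# The 50:05:07:68:... WWPNs are placeholders, not real SVC ports.
alicreate "SVC_Group0_Port1", "50:05:07:68:01:40:11:11; 50:05:07:68:01:40:22:22"
alicreate "WinPeter_Slot3", "21:00:00:e0:8b:05:41:bc"

zonecreate "WinPeter_Slot3_Zone", "WinPeter_Slot3; SVC_Group0_Port1"

# Add the zone to the active configuration and enable it.
cfgadd "PROD_SAN_A", "WinPeter_Slot3_Zone"
cfgenable "PROD_SAN_A"
```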
default value is 1500, with a maximum of 9000. An MTU of 9000 (jumbo frames) reduces CPU utilization and increases efficiency: it reduces the per-frame overhead and increases the payload. Jumbo frames provide improved iSCSI performance. Hosts may use standard NICs or Converged Network Adapters. For standard NICs, you need to use the operating system's iSCSI host attachment software driver. Converged Network Adapters are able to offload TCP/IP processing, and some of them even offload the iSCSI protocol. These intelligent adapters release CPU cycles for the main host applications. For a complete list of supported software and hardware iSCSI host attachment drivers, consult the SAN Volume Controller Supported Hardware List, Device Driver, Firmware and Recommended Software Levels V6.2, S1003797: https://www-304.ibm.com/support/docview.wss?uid=ssg1S1003797
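The payload gain from jumbo frames is easy to quantify. A minimal sketch, assuming 40 bytes of IP plus TCP header per packet and ignoring Ethernet framing and iSCSI PDU headers (our simplifications):

```python
def payload_fraction(mtu, ip_tcp_overhead=40):
    """Fraction of each IP packet that carries data, for a given MTU.
    Ignores Ethernet framing and iSCSI PDU headers (simplification)."""
    return (mtu - ip_tcp_overhead) / mtu

for mtu in (1500, 9000):
    print(f"MTU {mtu}: {payload_fraction(mtu):.2%} payload")
```

The absolute efficiency gain looks small, but a 9000-byte MTU also means roughly one-sixth as many packets for the same data, which is where the CPU savings come from.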
7521SVC Cluster.fm
Chapter 3.
pilot projects and yet can grow to manage extremely large storage environments up to 32 PB of virtualized storage.
Table 3-1 Maximum configurations for an I/O Group
- SAN Volume Controller nodes: Eight. Arranged as four I/O Groups.
- I/O Groups: Four. Each containing two nodes.
- Volumes per I/O Group: 2048. Includes managed-mode and image-mode volumes.
- Host IDs per I/O Group: 256 (Cisco, Brocade, or McDATA); 64 (QLogic). A host object may contain both Fibre Channel ports and iSCSI names.
- Host ports (FC and iSCSI) per I/O Group: 512 (Cisco, Brocade, or McDATA); 128 (QLogic).
- Metro/Global Mirror volume capacity per I/O Group: 1024 TB. There is a per-I/O-Group limit of 1024 TB on the amount of primary and secondary volume address space that can participate in Metro/Global Mirror relationships. This maximum configuration consumes all 512 MB of bitmap space for the I/O Group and allows no FlashCopy bitmap space. The default is 40 TB, which consumes 20 MB of bitmap memory.
- FlashCopy volume capacity per I/O Group: 1024 TB. This is a per-I/O-Group limit on the amount of FlashCopy mappings using bitmap space from a given I/O Group. This maximum configuration consumes all 512 MB of bitmap space for the I/O Group and allows no Metro Mirror or Global Mirror bitmap space. The default is 40 TB, which consumes 20 MB of bitmap memory.
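The bitmap figures imply a linear relationship: 40 TB of mirrored capacity consumes 20 MB of bitmap memory (0.5 MB per TB), up to the 512 MB / 1024 TB per-I/O-Group ceiling. A sketch of that arithmetic:

```python
BITMAP_MB_PER_TB = 20 / 40   # 0.5 MB of bitmap memory per TB mirrored
BITMAP_CEILING_MB = 512      # total bitmap space per I/O Group

def bitmap_mb(mirrored_tb):
    """Bitmap memory (MB) consumed by a given Metro/Global Mirror capacity."""
    needed = mirrored_tb * BITMAP_MB_PER_TB
    if needed > BITMAP_CEILING_MB:
        raise ValueError("exceeds the 1024 TB per-I/O-Group limit")
    return needed

print(bitmap_mb(40))    # default: 20.0 MB
print(bitmap_mb(1024))  # maximum: 512.0 MB
```

Note that FlashCopy and Metro/Global Mirror draw on the same 512 MB of bitmap space per I/O Group, so in practice the two must share it.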
Table 3-2 Maximum SVC clustered system limits
- SAN Volume Controller nodes: Eight. Arranged as four I/O Groups.
- MDisks: 4096. The maximum number of logical units that can be managed by SVC. This number includes disks that have not been configured into storage pools.
- Volumes per clustered system: 8192. Includes managed-mode and image-mode volumes. The maximum requires an eight-node clustered system.
- Total storage capacity manageable by SVC: 32 PB. The maximum requires an extent size of 8192 MB to be used.
- Host objects (IDs) per clustered system: 1024 (Cisco, Brocade, and McDATA fabrics); 155 (CNT); 256 (QLogic). A host object may contain both Fibre Channel ports and iSCSI names.
- Host ports per clustered system: 2048 (Cisco, Brocade, and McDATA fabrics); 310 (CNT); 512 (QLogic).
If you exceed one of the current maximum configuration limits for a fully deployed SVC clustered system, you can scale out by adding a new SVC clustered system and distributing the workload to it. Because the current maximum configuration limits can change, use the following link to get a complete table of the current SVC restrictions: https://www-304.ibm.com/support/docview.wss?uid=ssg1S1003799 Splitting an SVC system or having a secondary SVC system provides you with the ability to implement a disaster recovery option in the environment. Having two SVC clustered systems in two locations allows work to continue even if one site is down. With the SVC Advanced Copy functions, you can copy data from the local primary environment to a remote secondary site. The maximum configuration limits apply here as well. Another advantage of having two clustered systems is the option of using the SVC Advanced Copy functions. The licensing is based on:
- The total amount of storage (in gigabytes) that is virtualized
- The Metro Mirror and Global Mirror capacity in use (primary and secondary)
- The FlashCopy source capacity in use
In each case, the number of terabytes (TBs) to order for Metro Mirror and Global Mirror is the total number of source TBs and target TBs participating in the copy operations. FlashCopy is licensed on the main source volumes in FlashCopy relationships.
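The Metro Mirror and Global Mirror licensing rule (source TBs plus target TBs) can be sketched as follows (the relationship capacities are illustrative):

```python
def mirror_license_tb(relationships):
    """Metro/Global Mirror capacity to license: the sum of the source and
    target TBs of every relationship participating in copy operations."""
    return sum(src_tb + tgt_tb for src_tb, tgt_tb in relationships)

# Illustrative: a 10 TB and a 25 TB volume, each mirrored 1:1 to a remote site.
print(mirror_license_tb([(10, 10), (25, 25)]))  # -> 70
```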
Note: This move involves an outage from the host system point of view, because the worldwide port name (WWPN) from the subsystem (SVC I/O Group) changes.

You can use the volume managed-mode to image-mode migration to move workload from one SVC clustered system to a new SVC clustered system. Migrate a volume from managed mode to image mode, reassign the disk (logical unit number (LUN) masking) from your storage subsystem point of view, introduce the disk to your new SVC clustered system, and use the image-mode to managed-mode migration. We describe this scenario in Chapter 6, Volumes on page 99.

Note: This scenario also involves an outage to your host systems and the I/O to the involved SVC volumes.

From a user perspective, the first option (creating an SVC clustered system and starting to put workload on it) is the easiest way to expand your system workload. The second is more difficult, involves more steps (replication services), and requires more preparation in advance. The third option (managed-mode to image-mode migration) involves the longest outage to the host systems, and therefore, we do not prefer this option.

It is not very common to reduce the number of I/O Groups. It can happen when replacing old nodes with new, more powerful ones. It can also occur in a remote partnership when more bandwidth is required on one side and there is spare bandwidth on the other side.

Adding or upgrading SVC node hardware

If you have a clustered system of six or fewer nodes of older hardware, and you have purchased new hardware, you can choose to either start a new clustered system for the new hardware or add the new hardware to the old clustered system. Both configurations are supported. While both options are practical, we recommend that you add the new hardware to your existing clustered system. This recommendation is only true if, in the short term, you are not scaling the environment beyond the capabilities of this clustered system.
By utilizing the existing clustered system, you maintain the benefit of managing just one clustered system. Also, if you are using mirror copy services to the remote site, you might be able to continue to do so without having to add SVC nodes at the remote site.
Upgrading hardware
You have a couple of choices to upgrade an existing SVC system's hardware. The choices depend on the size of the existing clustered system.
Up to six nodes
If your clustered system has up to six nodes, you have these options available:
- Add the new hardware to the clustered system, migrate volumes to the new nodes, and then retire the older hardware when it is no longer managing any volumes. This method requires a brief outage to the hosts to change the I/O Group for each volume.
- Swap out one node in each I/O Group at a time and replace it with the new hardware. We recommend that you engage an IBM service support representative (IBM SSR) to help you with this process. You can perform this swap without an outage to the hosts.
Up to eight nodes
If your clustered system has eight nodes, the options are similar:
- Swap out a node in each I/O Group one at a time and replace it with the new hardware. We recommend that you engage an IBM SSR to help you with this process. You can perform this swap without an outage to the hosts, and you need to swap a node in one I/O Group at a time. Do not change all I/O Groups in a multi-I/O Group clustered system at one time.
- Move the volumes to another I/O Group so that all volumes are on three of the four I/O Groups. You can then remove the remaining I/O Group with no volumes and add the new hardware to the clustered system. As each pair of new nodes is added, volumes can be moved to the new nodes, leaving another old I/O Group pair that can be removed. After all the old pairs are removed, the last two new nodes can be added, and if required, volumes can be moved onto them. Unfortunately, this method requires several outages to the hosts, because volumes are moved between I/O Groups. This method might not be practical unless you need to implement the new hardware over an extended period of time, and the first option is not practical for your environment.
Furthermore, certain concurrent upgrade paths are only available through an intermediate level. Refer to the following Web page for more information, SAN Volume Controller Concurrent Compatibility and Code Cross Reference (S1001707): https://www-304.ibm.com/support/docview.wss?uid=ssg1S1001707
Chapter 4.
Backend storage
In this chapter, we describe the aspects and characteristics to consider when planning the attachment of a backend storage device to be virtualized by an IBM System Storage SAN Volume Controller (SVC).
The same information can be obtained from the Controller section when viewing the Storage Subsystem Profile from the Storage Manager GUI, which lists the WWPN and WWNN information for each host port:

World-wide port identifier: 20:27:00:80:e5:17:b5:bc
World-wide node identifier: 20:06:00:80:e5:17:b5:bc

If the controllers are set up with different WWNNs, run the SameWWN.script script that is bundled with the Storage Manager client download file to change them.

Caution: This procedure is intended for the initial configuration of the DS4000/DS5000. The script must not be run in a live environment, because all hosts accessing the storage subsystem will be affected by the changes.
A larger number of disks in an array increases the rebuild time for disk failures, which can have a negative effect on performance. Additionally, more disks in an array increase the probability of having a second drive fail within the same array prior to the rebuild completion of an initial drive failure, which is an inherent exposure to the RAID 5 architecture. Best practice: For the DS4000/DS5000, we recommend array widths of 4+p and 8+p.
Segment size
With direct-attached hosts, considerations are often made to align device data partitions to physical drive boundaries within the storage controller. For the SVC, aligning device data partitions to physical drive boundaries within the storage controller is less critical based on the caching that the SVC provides and based on the fact that there is less variation in its I/O profile, which is used to access back-end disks. For the SVC, the only opportunity for full stride writes occurs with large sequential workloads, and in that case, the larger the segment size is, the better. Larger segment sizes can adversely affect random I/O, however. The SVC and controller cache do a good job of hiding the RAID 5 write penalty for random I/O, and therefore, larger segment sizes can be accommodated. The primary consideration for selecting segment size is to ensure that a single host I/O will fit within a single segment to prevent accessing multiple physical drives. Testing has shown that the best compromise for handling all workloads is to use a segment size of 256 KB. Best practice: We recommend a segment size of 256 KB as the best compromise for all workloads.
a. For the newest models (on firmware 7.xx and higher) use 8 KB.
A7 Assigned Normal 5 (6+P+S) S26 R7 146.0 ENT
dscli> lsrank -l
Date/Time: Aug 9, 2008 2:23:18 AM CEST IBM DSCLI Version: 5.2.410.299 DS: IBM.2107-75L2321
ID Group State  datastate Array RAIDtype extpoolID extpoolnam stgtype exts usedexts
======================================================================================
R0 0     Normal Normal    A0    5        P0        extpool0   fb      779  779
R1 1     Normal Normal    A1    5        P1        extpool1   fb      779  779
R2 0     Normal Normal    A2    5        P2        extpool2   fb      779  779
R3 1     Normal Normal    A3    5        P3        extpool3   fb      779  779
R4 1     Normal Normal    A4    5        P5        extpool5   fb      779  779
R5 0     Normal Normal    A5    5        P4        extpool4   fb      779  779
R6 1     Normal Normal    A6    5        P7        extpool7   fb      779  779
R7 0     Normal Normal    A7    5        P6        extpool6   fb      779  779
Cache
For the DS8000, you cannot tune the array and cache parameters. The arrays will be either 6+p or 7+p, depending on whether the array site contains a spare. The segment size (the contiguous amount of data that is written to a single disk) is 256 KB for fixed block volumes. Caching for the DS8000 is done on a 64 KB track boundary.
insufficient number of the larger arrays are available to handle access to the higher capacity. In order to avoid this situation, ensure that the smaller capacity arrays do not represent more than 50% of the total number of arrays within the Storage Pool. Best practice: When mixing 6+p arrays and 7+p arrays in the same Storage Pool, avoid having smaller capacity arrays comprise more than 50% of the arrays.
The DS8000 populates Fibre Channel (FC) adapters across two to eight I/O enclosures, depending on the configuration. Each I/O enclosure represents a separate hardware domain. Ensure that adapters configured to different SAN networks do not share the same I/O enclosure, as part of the goal of keeping redundant SAN networks isolated from each other.

Best practices that we recommend:
- Configure a minimum of eight ports per DS8000.
- Configure 16 ports per DS8000 when more than 48 ranks are presented to the SVC cluster.
- Configure a maximum of two ports per four-port DS8000 adapter.
- Configure adapters across redundant SAN networks from different I/O enclosures.
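These sizing rules can be expressed as a small helper; the following is a sketch under the stated guidance, with function names of our own choosing (not an IBM utility):

```python
# Sketch of the DS8000 port best practices above: eight ports minimum,
# 16 ports when more than 48 ranks are presented, and at most two ports
# used per four-port adapter (so the ports span more hardware domains).
def recommended_ports(ranks):
    return 16 if ranks > 48 else 8

def adapters_needed(ports, max_ports_per_adapter=2):
    # Ceiling division: each adapter contributes at most two usable ports.
    return -(-ports // max_ports_per_adapter)

print(recommended_ports(64), adapters_needed(16))  # 16 8
```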
Example 4-2 The showvolgrp command output

dscli> showvolgrp V0
Date/Time: August 3, 2011 3:03:15 PM PDT IBM DSCLI Version: 7.6.10.511 DS: IBM.2107-75L3001
Name SVCCF8
ID   V0
Type SCSI Mask
Vols 1001 1002 1003 1004 1005 1006 1007 1008 1101 1102 1103 1104 1105 1106 1107 1108
Example 4-3 on page 61 shows the lshostconnect output from the DS8000. Here, you can see that all eight ports of the two-node cluster are assigned to the same volume group (V0) and, therefore, have been assigned to the same 16 LUNs.
Example 4-3 The lshostconnect command output dscli> lshostconnect Date/Time: August 3, 2011 3:04:13 PM PDT IBM DSCLI Version: 7.6.10.511 DS: IBM.2107-75L3001 Name ID WWPN HostType Profile portgrp volgrpID ESSIOport =========================================================================================== SVCCF8_N1P1 0000 500507680140BC24 San Volume Controller 0 V0 I0003,I0103 SVCCF8_N1P2 0001 500507680130BC24 San Volume Controller 0 V0 I0003,I0103 SVCCF8_N1P3 0002 500507680110BC24 San Volume Controller 0 V0 I0003,I0103 SVCCF8_N1P4 0003 500507680120BC24 San Volume Controller 0 V0 I0003,I0103 SVCCF8_N2P1 0004 500507680140BB91 San Volume Controller 0 V0 I0003,I0103 SVCCF8_N2P3 0005 500507680110BB91 San Volume Controller 0 V0 I0003,I0103 SVCCF8_N2P2 0006 500507680130BB91 San Volume Controller 0 V0 I0003,I0103 SVCCF8_N2P4 0007 500507680120BB91 San Volume Controller 0 V0 I0003,I0103 dscli>
Additionally, you can see from the lshostconnect output that only the SVC WWPNs are assigned to V0.

Important: Data corruption can occur if LUNs are assigned to both SVC nodes and non-SVC nodes, that is, direct-attached hosts.

Next, we show you how the SVC sees these LUNs if the zoning is properly configured. The Managed Disk Link Count (mdisk_link_count) represents the total number of MDisks presented to the SVC cluster by that specific controller. Example 4-4 shows the storage controller general details output via the SVC command-line interface (CLI).
Example 4-4 lscontroller command output IBM_2145:svccf8:admin>svcinfo lscontroller DS8K75L3001 id 1 controller_name DS8K75L3001 WWNN 5005076305FFC74C mdisk_link_count 16
max_mdisk_link_count 16 degraded no vendor_id IBM product_id_low 2107900 product_id_high product_revision 3.44 ctrl_s/n 75L3001FFFF allow_quorum yes WWPN 500507630500C74C path_count 16 max_path_count 16 WWPN 500507630508C74C path_count 16 max_path_count 16 IBM_2145:svccf8:admin>
In this case, we can see that the Managed Disk Link Count is 16, which is correct for our example. Example 4-4 on page 61 also shows the storage controller port details. Here, a path_count represents a connection from a single node to a single LUN. Because we have two nodes and sixteen LUNs in this example configuration, we expect to see a total of 32 paths with all paths evenly distributed across the available storage ports. We have validated that this configuration is correct, because we see sixteen paths on one WWPN and sixteen paths on the other WWPN for a total of 32 paths.
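This path arithmetic (paths = nodes x MDisks, spread evenly across the controller WWPNs) can be checked mechanically; the following is an illustrative sketch with our own function names, not SVC code:

```python
# Verify per-WWPN path counts against the expected nodes x MDisks total,
# as in the example above (2 nodes x 16 LUNs = 32 paths, 16 per WWPN).
def paths_ok(per_wwpn_path_counts, nodes, mdisk_link_count):
    expected_total = nodes * mdisk_link_count
    evenly_spread = max(per_wwpn_path_counts) - min(per_wwpn_path_counts) <= 1
    return sum(per_wwpn_path_counts) == expected_total and evenly_spread

print(paths_ok([16, 16], nodes=2, mdisk_link_count=16))  # True
print(paths_ok([20, 12], nodes=2, mdisk_link_count=16))  # False (uneven)
```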
WWPN format for DS8000 = 50050763030XXYNNN
- XX = adapter location within the storage controller
- Y = port number within the 4-port adapter
- NNN = unique identifier for the storage controller

I/O bay and slot to XX values:
B1: S1=00 S2=01 S4=03 S5=04
B2: S1=08 S2=09 S4=0B S5=0C
B3: S1=10 S2=11 S4=13 S5=14
B4: S1=18 S2=19 S4=1B S5=1C
B5: S1=20 S2=21 S4=23 S5=24
B6: S1=28 S2=29 S4=2B S5=2C
B7: S1=30 S2=31 S4=33 S5=34
B8: S1=38 S2=39 S4=3B S5=3C

Port to Y values: P1=0 P2=4 P3=8 P4=C
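The bay/slot/port mapping can be captured as lookup tables. The decoder below is our own illustration (the dictionaries transcribe the mapping, but the function is not IBM code); it takes the XX and Y fields directly rather than parsing a full WWPN:

```python
# XX -> (I/O bay, slot) and Y -> port, transcribed from the mapping above.
BAY_SLOT = {
    "00": ("B1", "S1"), "01": ("B1", "S2"), "03": ("B1", "S4"), "04": ("B1", "S5"),
    "08": ("B2", "S1"), "09": ("B2", "S2"), "0B": ("B2", "S4"), "0C": ("B2", "S5"),
    "10": ("B3", "S1"), "11": ("B3", "S2"), "13": ("B3", "S4"), "14": ("B3", "S5"),
    "18": ("B4", "S1"), "19": ("B4", "S2"), "1B": ("B4", "S4"), "1C": ("B4", "S5"),
    "20": ("B5", "S1"), "21": ("B5", "S2"), "23": ("B5", "S4"), "24": ("B5", "S5"),
    "28": ("B6", "S1"), "29": ("B6", "S2"), "2B": ("B6", "S4"), "2C": ("B6", "S5"),
    "30": ("B7", "S1"), "31": ("B7", "S2"), "33": ("B7", "S4"), "34": ("B7", "S5"),
    "38": ("B8", "S1"), "39": ("B8", "S2"), "3B": ("B8", "S4"), "3C": ("B8", "S5"),
}
PORT = {"0": "P1", "4": "P2", "8": "P3", "C": "P4"}

def locate_port(xx, y):
    """Map the XX (adapter location) and Y (port) fields to (bay, slot, port)."""
    bay, slot = BAY_SLOT[xx.upper()]
    return bay, slot, PORT[y.upper()]

print(locate_port("0B", "8"))  # ('B2', 'S4', 'P3')
```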
SVC supports a maximum of 16 ports from any disk system. The IBM XIV System supports from 8 to 24 FC ports, depending on the configuration (from 6 to 15 modules). Table 4-3 indicates port usage for each IBM XIV System configuration.
Table 4-3 Number of SVC ports and XIV modules

Number of IBM XIV Modules | Modules with FC ports | FC ports available on IBM XIV | Ports used per card on IBM XIV | Number of SVC ports utilized
6  | 4,5         | 8  | 1 | 4
9  | 4,5,7,8     | 16 | 1 | 8
10 | 4,5,7,8     | 16 | 1 | 8
11 | 4,5,7,8,9   | 20 | 1 | 10
12 | 4,5,7,8,9   | 20 | 1 | 10
13 | 4,5,6,7,8,9 | 24 | 1 | 12
14 | 4,5,6,7,8,9 | 24 | 1 | 12
15 | 4,5,6,7,8,9 | 24 | 1 | 12
Creating a host object for SVC for an IBM XIV type 2810
Although a single host instance can be created for use in defining and then implementing the SVC, the ideal host definition for use with SVC is to consider each node of the SVC (a minimum of two) an instance of a cluster. When creating the SVC host definition, first select Add Cluster and give the SVC host definition a name. Next, select Add Host and give the first node instance a name, making sure to click the Cluster drop-down box and select the SVC cluster that you just created.
After these have been added, repeat the steps for each instance of a node in the cluster. From there, right-click a node instance and select Add Port. In Figure 4-3 on page 65, note that four ports per node can be added to ensure the host definition is accurate.
By implementing the SVC as listed above, host management is ultimately simplified, and statistical metrics are more effective, because performance can be determined at the node level instead of the SVC cluster level. For example, after the SVC is successfully configured with the XIV Storage System, if an evaluation of the volume management at the I/O Group level is needed to ensure efficient utilization among the nodes, a comparison of the nodes can be achieved using the XIV Storage System statistics.
4.4.3 Restrictions
Here we list restrictions for using the XIV as backend storage for the SVC.
Make sure that you select one disk drive per enclosure and that each enclosure selected is part of the same chain (when possible). The recommendation when defining V7000 internal storage is to create a 1-to-1-to-1 relationship, meaning one Storage Pool to one MDisk (array) to one volume, and then map the volume to the SVC host.

Note: SVC level 6.2 supports V7000 MDisks larger than 2 TB.

Because the V7000 can have mixed disk drive types, such as SSD, SAS, and Nearline SAS drives, pay attention when mapping V7000 volumes to the SVC Storage Pools (as MDisks), and assign the same disk drive type (array) to a Storage Pool with the same characteristics. For example, assume that you have two V7000 arrays, where one (model A) is configured as RAID 5 using 300 GB SAS drives and the other (model B) is configured as RAID 5 using 2 TB Nearline SAS drives. When mapping to the SVC, assign model A to one specific Storage Pool (pool A) and model B to another specific Storage Pool (pool B).

Important: Make sure that you use the same extent size value on both sides (V7000 and SVC). We recommend an extent size of 256 MB.
when attached to the SVC, because the SVC enforces a WWNN maximum of four per storage controller. Because of this behavior, you must be sure to group the ports if you want to connect more than four target ports to an SVC.
4.9 Using Tivoli Storage Productivity Center to identify storage controller boundaries
It is often desirable to map the virtualization layer to determine which volumes and hosts are utilizing resources for a specific hardware boundary on the storage controller. For example, when a specific hardware component, such as a disk drive, is failing, and the administrator is
interested in performing an application-level risk assessment. Information learned from this type of analysis can lead to actions taken to mitigate risks, such as scheduling application downtime, performing volume migrations, and initiating FlashCopy. Tivoli Storage Productivity Center allows the mapping of the virtualization layer to occur quickly, and using Tivoli Storage Productivity Center eliminates mistakes that can be made by using a manual approach. Figure 4-4 on page 68 shows how a failing disk on a storage controller can be mapped to the MDisk that is being used by an SVC cluster. To display this panel, click Physical Disk → RAID5 Array → Logical Volume → MDisk.
Figure 4-5 completes the end-to-end view by mapping the MDisk through the SVC to the attached host. Click MDisk → MDGroup → VDisk → host disk.
Chapter 5. Storage pools and Managed Disks
A significant consideration when comparing native performance characteristics between storage subsystem types is the amount of scaling that is required to meet the performance objectives. While lower performing subsystems can typically be scaled to meet performance objectives, the additional hardware that is required lowers the availability characteristics of the SVC cluster. Remember that all storage subsystems possess an inherent failure rate; therefore, the failure rate of a Storage Pool becomes the failure rate of the storage subsystem times the number of units. Of course, there might be other factors that lead you to select one storage subsystem over another, such as utilizing available resources or a requirement for additional features and functions, such as the IBM System z attach capability.
aggregate load increases across all available arrays. For example, if you have a total of eight arrays and are striping across all eight arrays, your performance is much better than if you were striping across only four arrays. However, if the eight arrays are divided into two LUNs each and are also included in another Storage Pool, the performance advantage drops as the load of Storage Pool2 approaches that of Storage Pool1, which means that when workload is spread evenly across all Storage Pools, there will be no difference in performance. More arrays in the Storage Pool have more of an effect with lower performing storage controllers. So, for example, we require fewer arrays from a DS8000 than we do from a DS4000 to achieve the same performance objectives. Table 5-1 shows the recommended number of arrays per Storage Pool that is appropriate for general cases. Again, when it comes to performance, there can always be exceptions. Refer to Chapter 10, Backend performance considerations on page 233.
Table 5-1 Recommended number of arrays per Storage Pool

Controller type    | Arrays per Storage Pool
DS4000/DS5000      | 4 - 24
DS6000/DS8000      | 4 - 12
IBM Storwize V7000 | 4 - 12
Table 5-2 provides our recommended guidelines for array provisioning on IBM storage subsystems.
Table 5-2 Array provisioning

Controller type                  | LUNs per array
IBM System Storage DS4000/DS5000 | 1
IBM System Storage DS6000/DS8000 | 1 - 2
IBM Storwize V7000               | 1
The selection of LUN attributes for Storage Pools requires these primary considerations:
- Selecting the array size
- Selecting the LUN size
- The number of LUNs per array
- The number of physical disks per array

Important: We generally recommend that LUNs are created to use the entire capacity of the array.

All LUNs (MDisks) used to create a Storage Pool must have the same performance characteristics. If MDisks of varying performance levels are placed in the same Storage Pool, the performance of the Storage Pool can be reduced to the level of the poorest performing MDisk. Likewise, all LUNs must also possess the same availability characteristics. Remember that the SVC does not provide any Redundant Array of Independent Disks (RAID) capabilities within a Storage Pool. The loss of access to any one of the MDisks within the Storage Pool impacts the entire Storage Pool. However, with the introduction of Volume Mirroring in SVC 4.3, you can protect against the loss of a Storage Pool by mirroring a volume across multiple Storage Pools. Refer to Chapter 6, Volumes on page 99 for more information.

We recommend these best practices for LUN selection within a Storage Pool:
- LUNs are the same type.
- LUNs are the same RAID level.
- LUNs are the same RAID width (number of physical disks in the array).
- LUNs have the same availability and fault tolerance characteristics.
- MDisks created on LUNs with varying performance and availability characteristics must be placed in separate Storage Pools.
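A homogeneity check of this kind can be automated against an inventory; the following is an illustrative sketch with an assumed data model (it is not an SVC command):

```python
# Check that candidate MDisks for one Storage Pool share type, RAID level,
# and RAID width, per the LUN selection best practices above.
def same_characteristics(mdisks):
    """mdisks: list of dicts with 'type', 'raid_level', and 'raid_width'."""
    first = mdisks[0]
    keys = ("type", "raid_level", "raid_width")
    return all(m[k] == first[k] for m in mdisks for k in keys)

pool = [{"type": "FC", "raid_level": 5, "raid_width": 8},
        {"type": "FC", "raid_level": 5, "raid_width": 8}]
mixed = pool + [{"type": "SATA", "raid_level": 5, "raid_width": 8}]
print(same_characteristics(pool), same_characteristics(mixed))  # True False
```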
SVC has a maximum of 511 LUNs that can be presented from the IBM XIV System, and SVC does not currently support dynamically expanding the size of an MDisk. As the IBM XIV System configuration grows from 6 to 15 modules, use the SVC rebalancing script (refer to 5.7, Restriping (balancing) extents across a Storage Pool on page 81) to restripe volume extents to include new MDisks. For a fully populated rack, with 12 ports, you should create 48 volumes of 1632 GB each.

Tip: Always use the largest volumes possible without exceeding 2 TB.

Table 5-3 shows the number of 1632 GB LUNs created, depending on the XIV capacity:
Table 5-3 Values using 1632 GB LUNs

Number of LUNs (MDisks) at 1632 GB each | IBM XIV System TB used | IBM XIV System TB capacity available
16 | 26.1 | 27
26 | 42.4 | 43
30 | 48.9 | 50
33 | 53.9 | 54
37 | 60.4 | 61
40 | 65.3 | 66
44 | 71.8 | 73
48 | 78.3 | 79
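The sizing in Table 5-3 follows directly from dividing the available capacity by the 1632 GB LUN size; a minimal sketch (our own helper names, decimal TB as the XIV reports it):

```python
# Sketch of the XIV sizing rule above: carve the available capacity into
# LUNs of 1632 GB each (the largest size that stays under 2 TB).
LUN_SIZE_GB = 1632

def lun_count(available_tb):
    # Number of whole 1632 GB LUNs that fit into the available capacity.
    return int(available_tb * 1000 // LUN_SIZE_GB)

def tb_used(luns):
    return round(luns * LUN_SIZE_GB / 1000, 1)

print(lun_count(79), tb_used(48))  # 48 78.3
```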
The best use of the SVC virtualization solution with the XIV Storage System can be achieved by executing LUN allocation using these basic parameters: Allocate all LUNs, known to the SVC as MDisks, to one Storage Pool. If multiple IBM XIV Storage Systems are being managed by SVC, there should be a separate Storage Pool for each physical IBM XIV System. This design provides a good queue depth on the SVC to drive XIV adequately. Use 1 GB or larger extent sizes because this large extent size ensures that data is striped across all XIV Storage System drives.
Important: Do not assign internal SVC SSD drives as quorum disks.

Even when there is only a single storage subsystem with multiple Storage Pools created from it, the quorum disks must be allocated from several Storage Pools to avoid an array failure causing the loss of the quorum. You can reallocate quorum disks from either the SVC GUI or the SVC command-line interface (CLI). To list the SVC cluster quorum MDisks and view their number and status, issue the svcinfo lsquorum command, as shown in Example 5-1.
Example 5-1 lsquorum command
IBM_2145:ITSO-CLS4:admin>svcinfo lsquorum
quorum_index status id name   controller_id controller_name active object_type
0            online 0  mdisk0 0             ITSO-4700       yes    mdisk
1            online 1  mdisk1 0             ITSO-4700       no     mdisk
2            online 2  mdisk2 0             ITSO-4700       no     mdisk
To move one of your SVC Quorum MDisks from one MDisk to another, or from one storage subsystem to another, use the svctask chquorum command as shown in Example 5-2.
Example 5-2 chquorum command
IBM_2145:ITSO-CLS4:admin>svctask chquorum -mdisk 9 2
IBM_2145:ITSO-CLS4:admin>svcinfo lsquorum
quorum_index status id name   controller_id controller_name active object_type
0            online 0  mdisk0 0             ITSO-4700       yes    mdisk
1            online 1  mdisk1 0             ITSO-4700       no     mdisk
2            online 2  mdisk9 1             ITSO-XIV        no     mdisk
As you can see in Example 5-2, quorum index 2 has been moved from mdisk2 on the ITSO-4700 controller to mdisk9 on the ITSO-XIV controller.

Note: Although the deprecated setquorum command still works, we recommend using the chquorum command to change the quorum association.

The cluster uses the quorum disk for two purposes: as a tie breaker in the event of a SAN fault, when exactly half of the nodes that were previously members of the cluster are present; and to hold a copy of important cluster configuration data. There is only one active quorum disk in a cluster; however, the cluster uses three MDisks as quorum disk candidates. The cluster automatically selects the actual active quorum disk from the pool of assigned quorum disk candidates. If a tiebreaker condition occurs, the one-half portion of the cluster nodes that is able to reserve the quorum disk after the split has occurred locks the disk and continues to operate. The other half stops its operation. This design prevents both sides from becoming inconsistent with each other.
Note: To be considered eligible as a quorum disk, an MDisk must meet these criteria:
- The MDisk must be presented by a disk subsystem that is supported to provide SVC quorum disks.
- The controller must have been manually allowed to be a quorum disk candidate by using the svctask chcontroller -allowquorum yes command.
- The MDisk must be in managed mode (no image-mode disks).
- The MDisk must have sufficient free extents to hold the cluster state information, plus the stored configuration metadata.
- The MDisk must be visible to all of the nodes in the cluster.

There are special considerations concerning the placement of the active quorum disk for stretched or split cluster and split I/O Group configurations. Details are available at this website:

http://www-01.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003311

Note: Running an SVC cluster without a quorum disk can seriously affect your operation. A lack of available quorum disks for storing metadata prevents any migration operation (including a forced MDisk delete). Mirrored volumes can be taken offline if there is no quorum disk available, because the synchronization status for mirrored volumes is recorded on the quorum disk.

During the normal operation of the cluster, the nodes communicate with each other. If a node is idle for a few seconds, a heartbeat signal is sent to ensure connectivity with the cluster. If a node fails for any reason, the workload that is intended for it is taken over by another node until the failed node has been restarted and readmitted to the cluster (which happens automatically). If the microcode on a node becomes corrupted, resulting in a failure, the workload is transferred to another node. The code on the failed node is repaired, and the node is readmitted to the cluster (again, all automatically).

The number of extents required depends on the extent size for the Storage Pool containing the MDisk. Table 5-4 provides the number of extents reserved for quorum use by extent size.
Table 5-4 Number of extents reserved by extent size

Extent size (MB) | Number of extents reserved for quorum use
16   | 17
32   | 9
64   | 5
128  | 3
256  | 2
512  | 1
1024 | 1
2048 | 1
4096 | 1
8192 | 1
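The reserved capacity implied by Table 5-4 is simply extents times extent size; a minimal sketch with the table transcribed as a lookup (the helper name is ours):

```python
# Table 5-4 as a lookup: extent size in MB -> extents reserved for quorum.
QUORUM_EXTENTS = {16: 17, 32: 9, 64: 5, 128: 3, 256: 2,
                  512: 1, 1024: 1, 2048: 1, 4096: 1, 8192: 1}

def quorum_reserved_mb(extent_size_mb):
    # Reserved capacity in MB is reserved extents x extent size.
    return QUORUM_EXTENTS[extent_size_mb] * extent_size_mb

print(quorum_reserved_mb(16), quorum_reserved_mb(256))  # 272 512
```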
Note that when multiple tiers of storage exist on the same SVC cluster, you might also want to indicate the storage tier in the name as well. For example, you can use R5 and R10 to differentiate RAID levels or you can use T1, T2, and so on to indicate defined tiers. Best practice: Use a naming convention for MDisks that associates the MDisk with its corresponding controller and array within the controller, for example, DS8K_R5_12345.
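A naming convention like this is easy to apply consistently with a trivial helper; the following name builder is hypothetical, shown only to make the convention concrete:

```python
# Hypothetical MDisk name builder following the convention above:
# controller family, RAID level (or tier), and the array or serial identifier.
def mdisk_name(controller, raid_or_tier, array_id):
    return f"{controller}_{raid_or_tier}_{array_id}"

print(mdisk_name("DS8K", "R5", "12345"))  # DS8K_R5_12345
```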
- The lib\IBM\SVC.pm file, which must be copied to the Perl lib directory. With ActivePerl installed in C:\Perl, copy it to C:\Perl\lib\IBM\SVC.pm.
- The examples\balance\balance.pl file, which is the rebalancing script.
IBM_2145:itsosvccl1:admin>lsmdisk -filtervalue "mdisk_grp_name=itso_ds45_18gb"
id name status mode mdisk_grp_id mdisk_grp_name capacity ctrl_LUN_# controller_name UID
0 mdisk0 online managed 1 itso_ds45_18gb 18.0GB 0000000000000000 itso_ds4500 600a0b80001744310000011a4888478c00000000000000000000000000000000
1 mdisk1 online managed 1 itso_ds45_18gb 18.0GB 0000000000000001 itso_ds4500 600a0b8000174431000001194888477800000000000000000000000000000000
2 mdisk2 online managed 1 itso_ds45_18gb 18.0GB 0000000000000002 itso_ds4500 600a0b8000174431000001184888475800000000000000000000000000000000
3 mdisk3 online managed 1 itso_ds45_18gb 18.0GB 0000000000000003 itso_ds4500 600a0b8000174431000001174888473e00000000000000000000000000000000
4 mdisk4 online managed 1 itso_ds45_18gb 18.0GB 0000000000000004 itso_ds4500 600a0b8000174431000001164888472600000000000000000000000000000000
5 mdisk5 online managed 1 itso_ds45_18gb 18.0GB 0000000000000005 itso_ds4500 600a0b8000174431000001154888470c00000000000000000000000000000000
6 mdisk6 online managed 1 itso_ds45_18gb 18.0GB 0000000000000006 itso_ds4500 600a0b800017443100000114488846ec00000000000000000000000000000000
7 mdisk7 online managed 1 itso_ds45_18gb 18.0GB 0000000000000007 itso_ds4500 600a0b800017443100000113488846c000000000000000000000000000000000
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk0
id number_of_extents copy_id
0 64 0
2 64 0
1 64 0
4 64 0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk1
id number_of_extents copy_id
0 64 0
2 64 0
1 64 0
4 64 0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk2
id number_of_extents copy_id
0 64 0
2 64 0
1 64 0
4 64 0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk3
id number_of_extents copy_id
0 64 0
2 64 0
1 64 0
4 64 0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk4
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk5
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk6
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk7
The balance.pl script was then run on the Master Console using this command:

C:\SVCTools\examples\balance>perl balance.pl itso_ds45_18gb -k "c:\icat.ppk" -i 9.43.86.117 -r -e

In this command:
- itso_ds45_18gb is the Storage Pool to be rebalanced.
- -k "c:\icat.ppk" gives the location of the PuTTY private key file, which is authorized for administrator access to the SVC cluster.
- -i 9.43.86.117 gives the IP address of the cluster.
- -r requires that the optimal solution is found. If this option is not specified, the extents can still be somewhat unevenly spread at completion, but omitting -r often requires fewer migration commands and less time. If time is important, it might be preferable to omit -r at first and then rerun the command with -r if the solution is not good enough.
- -e specifies that the script actually runs the extent migration commands. Without this option, it merely prints the commands that it would have run. You can use this behavior to check that the series of steps is logical before committing to the migration.

In this example, with 4 x 8 GB volumes, the migration completed in around 15 minutes. You can use the svcinfo lsmigrate command to monitor progress; this command shows a percentage for each extent migration command issued by the script. After the script completed, we checked that the extents had been correctly rebalanced, as Example 5-4 shows. In a test run of 40 minutes of I/O (25% random, 70/30 R/W) to the four volumes, performance for the balanced Storage Pool was around 20% better than for the unbalanced Storage Pool.
Example 5-4 The lsmdiskextent output showing a balanced Storage Pool
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk0 id number_of_extents copy_id 0 32 0 2 32 0 1 32 0 4 32 0 IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk1 id number_of_extents copy_id 0 32 0 2 32 0 1 32 0
Chapter 5. Storage pools and Managed Disks
4 31 0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk2
id number_of_extents copy_id
0 32 0
2 32 0
1 32 0
4 32 0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk3
id number_of_extents copy_id
0 32 0
2 32 0
1 32 0
4 32 0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk4
id number_of_extents copy_id
0 32 0
2 32 0
1 32 0
4 33 0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk5
id number_of_extents copy_id
0 32 0
2 32 0
1 32 0
4 32 0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk6
id number_of_extents copy_id
0 32 0
2 32 0
1 32 0
4 32 0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk7
id number_of_extents copy_id
0 32 0
2 32 0
1 32 0
4 32 0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk14
id number_of_extents copy_id
5 16 0
3 16 0
6 16 0
8 13 1
9 23 0
8 25 0
Specify the -force flag on the svctask rmmdisk command, or select the corresponding check box in the GUI. Either action causes the SVC to automatically move all used extents on the MDisk to the remaining MDisks in the Storage Pool. Alternatively, you might want to perform the extent migrations manually; otherwise, the automatic migration will randomly allocate extents to MDisks (and areas of MDisks). After all extents have been manually migrated, the MDisk removal can proceed without the -force flag.
the Controller LUN Number, you must check that you are managing the correct storage controller and check that you are looking at the mappings for the correct SVC host object.
Tip: Renaming your back-end storage controllers as recommended also helps with MDisk identification.
For details about how to correlate back-end volumes (LUNs) to MDisks, refer to the next section, 5.8.3, LUNs to MDisk translation.
DS4000
Identify DS4000 volumes by using the Logical Drive ID together with the LUN Number associated with the host mapping. The following example refers to these values:
Logical Drive ID = 600a0b80001744310000c60b4e2eb524
LUN Number = 3
To identify the Logical Drive ID using the Storage Manager software, right-click a volume and select the Properties option. See Figure 5-1 on page 87 for an example.
To identify your LUN Number, go to the Mappings View, select your SVC Host Group, and then look at the LUN column on the right side. See Figure 5-2 for an example.
To correlate the above LUN with its corresponding MDisk, look at the MDisk details and check the UID field. The first 32 hexadecimal digits of the MDisk UID field (600a0b80001744310000c60b4e2eb524) should be exactly the same as your DS4000 Logical Drive ID. Then make sure that the associated DS4000 LUN Number matches the SVC ctrl_LUN_#: convert your DS4000 LUN Number to hexadecimal and check the last two hexadecimal digits of the SVC ctrl_LUN_# field. In our example in Figure 5-3 on page 88, it is 0000000000000003.
Note: The command line interface (CLI) references the Controller LUN Number as ctrl_LUN_#, and the graphical user interface (GUI) references it as LUN.
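The DS4000 correlation check described above can be sketched in Python. The helper name is hypothetical, for illustration only; the values come from the example in the text:

```python
def ds4000_matches_mdisk(logical_drive_id, lun_number, mdisk_uid, ctrl_lun):
    """Return True if a DS4000 logical drive corresponds to an SVC MDisk.

    The first 32 hex digits of the MDisk UID must equal the DS4000
    Logical Drive ID, and the last two hex digits of ctrl_LUN_# must
    equal the DS4000 LUN Number converted to hexadecimal.
    """
    id_ok = mdisk_uid[:32].lower() == logical_drive_id.lower()
    lun_ok = int(ctrl_lun[-2:], 16) == lun_number
    return id_ok and lun_ok

print(ds4000_matches_mdisk(
    "600a0b80001744310000c60b4e2eb524",              # Logical Drive ID
    3,                                               # DS4000 LUN Number
    "600a0b80001744310000c60b4e2eb524" + "0" * 32,   # MDisk UID (64 digits)
    "0000000000000003"))                             # SVC ctrl_LUN_#
# True
```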
DS8000
The LUN ID only uniquely identifies LUNs within the same storage controller. If multiple storage devices are attached to the same SVC cluster, the LUN ID needs to be combined with the WWNN attribute in order to uniquely identify LUNs within the SVC cluster. To get the worldwide node name (WWNN) of the DS8000 controller, take the first 16 digits of the MDisk UID and change the first digit from 6 to 5; for example, 6005076305ffc74c becomes 5005076305ffc74c. The DS8000 LUN, when viewed as the SVC ctrl_LUN_#, decodes as 40XX40YY00000000, where XX is the LSS (Logical Subsystem) and YY is the LUN within the LSS. The LUN ID as seen by the DS8000 is the four digits starting at the 29th digit of the MDisk UID:
6005076305ffc74c000000000000100700000000000000000000000000000000
Figure 5-4 on page 89 shows LUN ID fields that are displayed from the DS8000 Storage Manager.
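The DS8000 decoding rules above can be sketched as a small Python helper (the function name is hypothetical; the slicing follows the digit positions stated in the text):

```python
def decode_ds8000(mdisk_uid, ctrl_lun):
    """Decode DS8000 identifiers from SVC MDisk fields.

    Returns (wwnn, lun_id_from_ctrl_lun, lun_id_from_uid):
    - WWNN: first 16 hex digits of the UID, leading 6 changed to 5
    - ctrl_LUN_# has the form 40XX40YY00000000 (XX = LSS, YY = LUN)
    - the 4-digit LUN ID also sits at digits 29-32 of the UID
    """
    wwnn = "5" + mdisk_uid[1:16]
    lss = ctrl_lun[2:4]                # XX
    lun = ctrl_lun[6:8]                # YY
    lun_id_from_uid = mdisk_uid[28:32]
    return wwnn, lss + lun, lun_id_from_uid

uid = "6005076305ffc74c000000000000100700000000000000000000000000000000"
print(decode_ds8000(uid, "4010400700000000"))
# ('5005076305ffc74c', '1007', '1007')
```

Both derivations of the LUN ID should agree, which gives a quick consistency check when correlating MDisks by hand.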
From the MDisk details panel in Figure 5-5, the Controller LUN Number field is 4010400700000000, which decodes to LSS 0x10 and LUN 0x07, that is, LUN ID 0x1007.
We can also identify the storage controller from the Storage Subsystem field as DS8K75L3001, which had been manually assigned.
IBM XIV
Identify XIV volumes by using the volume Serial Number together with the LUN Number associated with the host mapping. The following example refers to these values:
Serial Number = 897
LUN Number = 2
To identify the volume Serial Number, right-click a volume and select the Properties option. See Figure 5-6 for an example.
To identify your LUN Number, go to the Volumes by Hosts view, expand your SVC Host Group, and then refer to the LUN column. See Figure 5-7 on page 91 for an example.
The MDisk UID field is composed partly of the controller WWNN, digits 2 to 13. You can check those digits with the svcinfo lscontroller command, as shown in Example 5-6.
Example 5-6 lscontroller command
IBM_2145:tpcsvc62:admin>svcinfo lscontroller 10
id 10
controller_name controller10
WWNN 5001738002860000
...
The correlation can now be performed by taking the first 16 digits from the MDisk UID field. Digits 2 to 13 correspond to the controller WWNN, as shown above. Digits 14 to 16 are the XIV volume Serial Number (897) in hexadecimal format (which results in 381 hex). See the translation details below:
0017380002860381000000000000000000000000000000000000000000000000
0017380002860 = controller WWNN (digits 2 to 13)
381 = XIV volume Serial Number converted to hex
To correlate the SVC ctrl_LUN_#, take the XIV volume number, convert it to hexadecimal format, and then check the last three hexadecimal digits of the SVC ctrl_LUN_#. In our example, it is 0000000000000002, as shown in Figure 5-8 on page 92.
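The XIV conversions above reduce to two small hexadecimal extractions. These Python helpers are hypothetical names for illustration, using the UID and ctrl_LUN_# values from the example:

```python
def xiv_serial_from_uid(mdisk_uid):
    """Digits 14 to 16 of the MDisk UID hold the XIV volume Serial
    Number in hexadecimal."""
    return int(mdisk_uid[13:16], 16)

def xiv_lun_from_ctrl_lun(ctrl_lun):
    """The last three hex digits of ctrl_LUN_# give the LUN number of
    the host mapping."""
    return int(ctrl_lun[-3:], 16)

uid = "0017380002860381000000000000000000000000000000000000000000000000"
print(xiv_serial_from_uid(uid))                   # 897 (0x381)
print(xiv_lun_from_ctrl_lun("0000000000000002"))  # 2
```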
V7000
The IBM Storwize V7000 solution is built upon the IBM SAN Volume Controller (SVC) technology base and uses similar terminology, so correlating V7000 volumes with SVC MDisks can be confusing at first. Looking at the V7000 side first, check the volume UID that was presented to the SVC host. See Figure 5-9 on page 93 for an example.
Next, check the SCSI ID number for that specific volume on the Host Maps tab. This value will be used to match the SVC ctrl_LUN_# (in hexadecimal format). See Figure 5-10 for an example.
On the SVC side, look at the MDisk details and compare the MDisk UID field with the V7000 volume UID; the first 32 hexadecimal digits should be exactly the same. See Figure 5-11 for an example.
Then, double-check that the SVC ctrl_LUN_# is the V7000 SCSI ID number in hexadecimal format. In our example, it is 0000000000000004.
page 87 for an example of the Logical Drive Properties panel for a DS4000 logical drive. This panel shows Logical Drive ID (UID) and SSID.
To change the extent allocation so that extents alternate between even and odd extent pools, the MDisks can be removed from the Storage Pool and then re-added in a new order. Table 5-6 on page 96 shows the order in which the MDisks were re-added so that the extent allocation alternates between even and odd extent pools.
Table 5-6 MDisks re-added

LUN ID  MDisk ID  MDisk name  Controller resource (DA pair/extent pool)
1000    1         mdisk01     DA2/P0
1100    4         mdisk04     DA0/P9
1001    2         mdisk02     DA6/P16
1101    5         mdisk05     DA4/P23
1002    3         mdisk03     DA7/P30
1102    6         mdisk06     DA5/P39
There are two options available for volume creation. We describe both options along with the differences between them:
Option A: Explicitly select the candidate MDisks within the Storage Pool that will be used (through the command line interface, CLI, only). When the MDisk list is explicitly selected, the extent allocation round-robins across the MDisks in the order that they appear on the list, starting with the first MDisk on the list:
- Example A1: Creating a volume with MDisks from the explicit candidate list order md001, md002, md003, md004, md005, and md006. The volume extent allocations begin at md001 and round-robin around the explicit MDisk candidate list, so the volume is distributed in the order md001, md002, md003, md004, md005, and md006.
- Example A2: Creating a volume with MDisks from the explicit candidate list order md003, md001, md002, md005, md006, and md004. The volume extent allocations begin at md003 and round-robin around the explicit MDisk candidate list, so the volume is distributed in the order md003, md001, md002, md005, md006, and md004.
Option B: Do not explicitly select the candidate MDisks within the Storage Pool (through the CLI or GUI). When the MDisk list is not explicitly defined, the extents are allocated across the MDisks in the order that they were added to the Storage Pool, and the MDisk that receives the first extent is randomly selected:
- Example B1: Creating a volume with MDisks from the candidate list order md001, md002, md003, md004, md005, and md006 (the order in which the MDisks were added to the Storage Pool).
The volume extent allocations then begin at a random MDisk starting point (assume md003 is randomly selected) and round-robin around the MDisk candidate list based on the order that the MDisks were originally added to the Storage Pool. In this case, the volume is allocated in the order md003, md004, md005, md006, md001, and md002. Be aware that when you create striped volumes and specify the MDisk order, a poorly planned order can place the first extent of several volumes on a single MDisk, which can lead to poor performance for workloads that place a large I/O load on the first extent of each volume, or that create multiple sequential streams.
Recommendation: For day-to-day administration, create striped volumes without specifying the MDisk order.
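The round-robin behavior in the examples above can be sketched in a few lines of Python. This is an illustrative model, not SVC code; the function name is hypothetical:

```python
def striped_allocation(extent_count, mdisk_list, start_index=0):
    """Return the MDisk that receives each extent of a striped volume.

    Extents are handed out round-robin over mdisk_list, beginning at
    start_index. With an explicit MDisk list, the start is the first
    list entry; otherwise the SVC picks the starting MDisk at random.
    """
    return [mdisk_list[(start_index + i) % len(mdisk_list)]
            for i in range(extent_count)]

mdisks = ["md001", "md002", "md003", "md004", "md005", "md006"]
# Example B1: the random start falls on md003, then pool order applies.
print(striped_allocation(8, mdisks, start_index=2))
# ['md003', 'md004', 'md005', 'md006', 'md001', 'md002', 'md003', 'md004']
```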
3. Check by using svcinfo lsvdisk that the volume is no longer displayed. You must wait until it is removed to allow cached data to destage to disk.
4. Change the back-end storage LUN mappings to prevent the source SVC cluster from seeing the disk, and then make it available to the target cluster.
5. Perform an svctask detectmdisk command on the target cluster.
6. Import the MDisk to the target cluster. If it is not a Thin-provisioned volume, use the svctask mkvdisk command with the -image option. If it is a Thin-provisioned volume, you also need two other options:
- -import instructs the SVC to look for thin volume metadata on the specified MDisk.
- -rsize indicates that the disk is Thin-provisioned. The value given to -rsize must be at least the amount of space that the source cluster used on the Thin-provisioned volume. If it is smaller, a 1862 error will be logged. In this case, delete the volume and try the svctask mkvdisk command again.
The volume is now online. If it is not, and the volume is Thin-provisioned, check the SVC error log for an 1862 error; if one is present, it indicates why the volume import failed (for example, metadata corruption). You might then be able to use the repairsevdiskcopy command to correct the problem.
Chapter 6.
Volumes
In this chapter, we discuss volumes (formerly VDisks) and the usage of FlashCopy. We describe creating volumes, managing them, and migrating them across I/O Groups.
Note: We strongly recommend that you enable a warning threshold (by email or SNMP trap) when working with Thin-provisioned volumes, on both the volume and the Storage Pool side, especially when you do not use the Auto-Expand mode. Otherwise, the thin volume will go offline if it runs out of space.
Autoexpand will not cause the real capacity to grow much beyond the virtual capacity. The real capacity can be manually expanded to more than the maximum that is required by the current virtual capacity, and the contingency capacity will be recalculated. A thin-provisioned volume can be converted nondisruptively to a fully allocated volume, or vice versa, by using the volume mirroring function. For example, you can add a thin-provisioned copy to a fully allocated primary volume and then remove the fully allocated copy from the volume after they are synchronized. The fully allocated to thin-provisioned migration procedure uses a zero-detection algorithm so that grains containing all zeros do not cause any real capacity to be used.
Tip: Consider using thin-provisioned volumes as targets in FlashCopy relationships.
Thin-provisioned volumes only save capacity if the host server does not write to the whole volume. Whether a thin-provisioned volume works well partly depends on how the file system allocates space: certain file systems (for example, NTFS, the NT File System) write across the whole volume before overwriting deleted files, while other file systems reuse space in preference to allocating new space. File system behavior can be moderated by tools, such as defrag, or by managing storage using host Logical Volume Managers (LVMs). The thin-provisioned volume is also dependent on how applications use the file system; for example, certain applications only delete log files when the file system is nearly full.
Note: There is no single recommendation for thin-provisioned volumes and best performance or practice. As already explained, it depends on what is used in the particular environment. For the absolute best performance, use fully allocated volumes instead of thin-provisioned volumes. For more considerations on performance, refer to Part 1, Performance best practices on page 225.
Table 6-2 Maximum thin volume virtual capacities for given grain size

Grain size (KB)   Max thin virtual capacity (GB)
32                260,000
64                520,000
128               1,040,000
256               2,080,000
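The pattern in Table 6-2 is linear: doubling the grain size doubles the maximum virtual capacity. This small sketch (a hypothetical helper anchored at the table's 32 KB row) reproduces the table values:

```python
def max_thin_virtual_capacity_gb(grain_size_kb):
    """Maximum thin volume virtual capacity grows linearly with grain
    size; anchored at 260,000 GB for the 32 KB grain of Table 6-2."""
    return 260_000 * grain_size_kb // 32

for grain_kb in (32, 64, 128, 256):
    print(grain_kb, max_thin_virtual_capacity_gb(grain_kb))
```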
This might be helpful in these cases:
- If it is known that the already formatted MDisk space will be used for mirrored volumes.
- If it is not required that the copies be synchronized.
Note: Migrating volumes across I/O Groups is a disruptive action. Therefore, it is best to specify the correct I/O Group at the time of volume creation.
By default, the preferred node, which owns a volume within an I/O Group, is selected on a load-balancing basis. At the time of volume creation, the workload to be put on the volume might not be known, but it is important to distribute the workload evenly across the SVC nodes within an I/O Group. The preferred node cannot easily be changed. If you need to change the preferred node, refer to 6.3.2, Changing the preferred node within an I/O Group on page 106.
The maximum number of volumes per I/O Group is 2048. The maximum number of volumes per cluster is 8192 (eight-node cluster).
The smaller the extent size that you select, the finer the granularity of the space a volume occupies on the underlying storage controller. A volume occupies an integer number of extents, but its length does not need to be an integer multiple of the extent size. The length does need to be an integer multiple of the block size. Any space left over between the last logical block in the volume and the end of the last extent in the volume is unused. A small extent size minimizes this unused space. The counterargument is that the smaller the extent size, the smaller the total storage capacity that the SVC can virtualize. The extent size does not affect performance. For most clients, extent sizes of 128 MB or 256 MB give a reasonable balance between volume granularity and cluster capacity. There is no longer a default value set. The extent size is set during Managed Disk (MDisk) Group creation.
Important: Volumes can only be migrated between Storage Pools that have the same extent size, except for mirrored volumes; the two copies can be in different Storage Pools with different extent sizes.
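The extent granularity trade-off can be made concrete with a short sketch. The helper is hypothetical and the volume size is an invented example; the point is only that a volume occupies a whole number of extents, so a larger extent size can waste more of the final extent:

```python
def extent_usage(volume_size_mb, extent_size_mb):
    """Return (extents_used, unused_mb) for a volume.

    A volume occupies a whole number of extents, so any space between
    the end of the volume and the end of its last extent is unused.
    """
    extents = -(-volume_size_mb // extent_size_mb)   # ceiling division
    return extents, extents * extent_size_mb - volume_size_mb

# A 10,000 MB volume with 256 MB extents leaves 240 MB of the last
# extent unused; with 16 MB extents, nothing is wasted here.
print(extent_usage(10_000, 256))  # (40, 240)
print(extent_usage(10_000, 16))   # (625, 0)
```

The flip side, as noted above, is that a smaller extent size reduces the total capacity the cluster can virtualize, because the cluster manages a bounded number of extents.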
As mentioned in the first section of this chapter, a volume can be created as thin-provisioned or fully allocated, in one of three modes (striped, sequential, or image), and with one or two copies (volume mirroring). With very few exceptions, configure volumes using striping mode.
Note: Electing to use sequential mode over striping requires a detailed understanding of the data layout and workload characteristics in order to avoid negatively impacting system performance.
As you can also see from Figure 6-1, changing the preferred node is disruptive to host traffic, so the best practice for performing this operation is:
a. Cease I/O operations to the volume.
b. Disconnect the volume from the host operating system. For example, in Windows, remove the drive letter.
c. On the SVC, unmap the volume from the host.
d. On the SVC, change the preferred node.
e. On the SVC, remap the volume to the host.
f. Rediscover the volume on the host.
g. Resume I/O operations on the host.
When migrating a volume between I/O Groups, you can specify the preferred node, if desired, or you can let the SVC assign the preferred node. Ensure that when you migrate a volume to a new I/O Group, you quiesce all I/O operations for the volume. Determine the hosts that use this volume and make sure it is properly zoned to the target SVC I/O Group. Stop or delete any FlashCopy mappings or Metro/Global Mirror relationships that use this volume. To check whether the volume is part of a relationship or mapping, issue the svcinfo lsvdisk command that is shown in Example 6-1, where vdiskname/id is the name or ID of the volume.
Example 6-1 Output of lsvdisk command
IBM_2145:svccf8:admin>svcinfo lsvdisk TEST_1
id 2
name TEST_1
IO_group_id 0
IO_group_name io_grp0
status online
mdisk_grp_id many
mdisk_grp_name many
capacity 1.00GB
type many
formatted no
mdisk_id many
mdisk_name many
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018205E12000000000000002
...
Look for the FC_id and RC_id fields. If these fields are not blank, the volume is part of a mapping or a relationship. The procedure is:
1. Cease I/O operations to the volume.
2. Disconnect the volume from the host operating system. For example, in Windows, remove the drive letter.
3. Stop any copy operations.
4. Issue the command to move the volume (refer to Example 6-2). This command does not work while there is data in the SVC cache that is to be written to the volume. After two minutes, the data automatically destages if no other condition forces an earlier destaging.
5. On the host, rediscover the volume. For example, in Windows, run a rescan, then either mount the volume or add a drive letter. Refer to Chapter 8, Hosts on page 191.
6. Resume copy operations as required.
7. Resume I/O operations on the host.
After any copy relationships are stopped, you can move the volume across I/O Groups with a single command in an SVC:
svctask chvdisk -iogrp newiogrpname/id vdiskname/id
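The FC_id/RC_id check can be automated by parsing saved lsvdisk output. This Python sketch is a hypothetical helper (the SVC CLI itself provides no such scripting aid); it treats the first word of each line as the field name and reports whether either copy-services field is populated:

```python
def in_copy_relationship(lsvdisk_output):
    """Given the text output of `svcinfo lsvdisk <volume>`, report
    whether the volume is part of a FlashCopy mapping or Metro/Global
    Mirror relationship, that is, whether FC_id or RC_id is non-blank."""
    fields = {}
    for line in lsvdisk_output.splitlines():
        parts = line.split(None, 1)
        if parts:
            fields[parts[0]] = parts[1] if len(parts) > 1 else ""
    return bool(fields.get("FC_id") or fields.get("RC_id"))

sample = """id 2
name TEST_1
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018205E12000000000000002"""
print(in_copy_relationship(sample))  # False: both fields are blank
```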
In this command, newiogrpname/id is the name or ID of the I/O Group to which you move the volume and vdiskname/id is the name or ID of the volume. Example 6-2 shows the command to move the volume named TEST_1 from its existing I/O Group, io_grp0, to io_grp1.
Example 6-2 Command to move a volume to another I/O Group
IBM_2145:svccf8:admin>svctask chvdisk -iogrp io_grp1 TEST_1
Migrating volumes between I/O Groups can cause problems if the old definitions of the volumes are not removed from the configuration before importing the volumes to the host. Migrating volumes between I/O Groups is not a dynamic configuration change; it must be done with the hosts shut down. Then, follow the procedure listed in Chapter 8, Hosts on page 191 for the reconfiguration of SVC volumes to hosts. We recommend that you remove the stale configuration and reboot the host to reconfigure the volumes that are mapped to a host. For details about how to dynamically reconfigure the IBM Subsystem Device Driver (SDD) for a specific host operating system, refer to Multipath Subsystem Device Driver: User's Guide, GC52-1309, where this procedure is described in great depth.
Note: Do not move a volume to an offline I/O Group under any circumstances. You must ensure that the I/O Group is online before moving the volumes to avoid any data loss.
This command will not work if there is any data in the SVC cache, which has to be flushed out first. There is a -force flag; however, this flag discards the data in the cache rather than flushing it to the volume. If the command fails due to outstanding I/Os, it is better to wait a couple of minutes after which the SVC will automatically flush the data to the volume. Note: Using the -force flag can result in data integrity issues.
Migrating a volume from one Storage Pool to another is non-disruptive to the host application using the volume. Depending on the workload of the SVC, there might be a slight performance impact. For this reason, we recommend that you migrate a volume from one Storage Pool to another when there is a relatively low load on the SVC.
Rule: For the migration to be acceptable, the source and destination storage pool must have the same extent size. Note that volume mirroring can also be used to migrate a volume between storage pools. This method can be used if the extent sizes of the two pools are not the same. Below we discuss the best practices to follow when you perform volume migrations.
IBM_2145:svccf8:admin>svctask migratevdisk -mdiskgrp MDG1DS4K -threads 4 -vdisk Migrate_sample
This command migrates our volume, Migrate_sample, to the Storage Pool, MDG1DS4K, and uses four threads while migrating. Note that instead of using the volume name, you can use its ID number. You can monitor the migration process by using the svcinfo lsmigrate command, as shown in Example 6-4.
Example 6-4 Monitoring the migration process
IBM_2145:svccf8:admin>svcinfo lsmigrate
migrate_type MDisk_Group_Migration
progress 0
migrate_source_vdisk_index 3
migrate_target_mdisk_grp 2
max_thread_count 4
migrate_source_vdisk_copy_id 0
IBM_2145:svccf8:admin>
In order to migrate a striped type volume to an image type volume, you must be able to migrate to an available unmanaged MDisk. The destination MDisk must be greater than or equal to the size of the volume you want to migrate. Regardless of the mode in which the volume starts, it is reported as managed mode during the migration. Both of the MDisks involved are reported as being in image mode during the migration. If the migration is interrupted by a cluster recovery, the migration will resume after the recovery completes. You must perform these command line steps: 1. To determine the name of the volume to be moved, issue the command: svcinfo lsvdisk The output is in the form that is shown in Example 6-5.
Example 6-5 The lsvdisk output
IBM_2145:svccf8:admin>svcinfo lsvdisk -delim :
id:name:IO_group_id:IO_group_name:status:mdisk_grp_id:mdisk_grp_name:capacity:type:FC_id:FC_name:RC_id:RC_name:vdisk_UID:fc_map_count:copy_count:fast_write_state:se_copy_count
0:NYBIXTDB02_T03:0:io_grp0:online:3:MDG4DS8KL3331:20.00GB:striped:::::60050768018205E12000000000000000:0:1:empty:0
1:NYBIXTDB02_2:0:io_grp0:online:0:MDG1DS8KL3001:5.00GB:striped:::::60050768018205E12000000000000007:0:1:empty:0
2:TEST_1:0:io_grp0:online:many:many:1.00GB:many:::::60050768018205E12000000000000002:0:2:empty:0
3:Migrate_sample:0:io_grp0:online:2:MDG1DS4K:2.00GB:striped:::::60050768018205E12000000000000012:0:1:empty:0
2. In order to migrate the volume, you need the name of the MDisk to which you will migrate it. Example 6-6 shows the command that you use.
Example 6-6 The lsmdisk command output
IBM_2145:svccf8:admin>lsmdisk -delim :
id:name:status:mode:mdisk_grp_id:mdisk_grp_name:capacity:ctrl_LUN_#:controller_name:UID:tier
0:D4K_ST1S12_LUN1:online:managed:2:MDG1DS4K:20.0GB:0000000000000000:DS4K:600a0b8000174233000071894e2eccaf00000000000000000000000000000000:generic_hdd
1:mdisk0:online:array:3:MDG4DS8KL3331:136.2GB::::generic_ssd
2:D8K_L3001_1001:online:managed:0:MDG1DS8KL3001:20.0GB:4010400100000000:DS8K75L3001:6005076305ffc74c000000000000100100000000000000000000000000000000:generic_hdd
...
33:D8K_L3331_1108:online:unmanaged:::20.0GB:4011400800000000:DS8K75L3331:6005076305ffc747000000000000110800000000000000000000000000000000:generic_hdd
34:D4K_ST1S12_LUN2:online:managed:2:MDG1DS4K:20.0GB:0000000000000001:DS4K:600a0b80001744310000c6094e2eb4e400000000000000000000000000000000:generic_hdd
From this command, we can see that D8K_L3331_1108 is a candidate for the image type migration, because it is unmanaged.
3. We now have enough information to enter the command to migrate the volume to image type, as shown in Example 6-7 on page 111.
IBM_2145:svccf8:admin>svctask migratetoimage -vdisk Migrate_sample -threads 4 -mdisk D8K_L3331_1108 -mdiskgrp IMAGE_Test
4. If there is no unmanaged MDisk to which to migrate, you can remove an MDisk from a Storage Pool. However, you can only remove an MDisk from a Storage Pool if there are enough free extents on the remaining MDisks in the group to migrate any used extents of the MDisk that you are removing.
By default, the SVC assigns ownership of even-numbered volumes to one node of a caching pair and the ownership of odd-numbered volumes to the other node. It is possible for the ownership distribution in a caching pair to become unbalanced if volume sizes are
significantly different between the nodes or if the volume numbers assigned to the caching pair are predominantly even or odd. To provide the flexibility to avoid this problem, the ownership of a specific volume can be explicitly assigned to a specific node when the volume is created. A node that is explicitly assigned as the owner of a volume is known as the preferred node. Because hosts are expected to access volumes through the preferred nodes, those nodes can become overloaded. When a node becomes overloaded, volumes can be moved to other I/O Groups, because the ownership of a volume cannot be changed after the volume is created. We described this situation in 6.3.3, Moving a volume to another I/O Group on page 106.
SDD is aware of the preferred paths that the SVC sets per volume. SDD uses a load-balancing and optimizing algorithm when failing over paths; that is, it tries the next known preferred path. If this effort fails and all preferred paths have been tried, it load balances on the non-preferred paths until it finds an available path. If all paths are unavailable, the volume goes offline. It can therefore take time to perform path failover when multiple paths go offline. SDD also performs load balancing across the preferred paths where appropriate.
IBM_2145:svccf8:admin>svcinfo lsvdisk TEST_1
id 2
name TEST_1
IO_group_id 0
IO_group_name io_grp0
status online
mdisk_grp_id many
mdisk_grp_name many
capacity 1.00GB
type many
formatted no
mdisk_id many
mdisk_name many
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018205E12000000000000002
throttling 0
preferred_node_id 2
fast_write_state empty
cache readwrite
...
The throttle setting of zero indicates that no throttling has been set. Having checked the volume, you can then run the chvdisk command. To modify just the throttle setting, we run:
svctask chvdisk -rate 40 -unitmb TEST_1
Running the lsvdisk command now gives us the output that is shown in Example 6-9.
Example 6-9 Output of lsvdisk command
IBM_2145:svccf8:admin>svcinfo lsvdisk TEST_1
id 2
name TEST_1
IO_group_id 0
IO_group_name io_grp0
status online
mdisk_grp_id many
mdisk_grp_name many
capacity 1.00GB
type many
formatted no
mdisk_id many
mdisk_name many
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018205E12000000000000002
virtual_disk_throttling (MB) 40
preferred_node_id 2
fast_write_state empty
cache readwrite
...
This example shows that the throttle setting (virtual_disk_throttling) is 40 MBps on this volume. If we had set the throttle to an I/O rate by using the I/O parameter, which is the default setting, we do not use the -unitmb flag:
svctask chvdisk -rate 2048 TEST_1
You can see in Example 6-10 that the throttle setting has no unit parameter, which means that it is an I/O rate setting.
IBM_2145:svccf8:admin>svctask chvdisk -rate 2048 TEST_1
IBM_2145:svccf8:admin>svcinfo lsvdisk TEST_1
id 2
name TEST_1
IO_group_id 0
IO_group_name io_grp0
status online
mdisk_grp_id many
mdisk_grp_name many
capacity 1.00GB
type many
formatted no
mdisk_id many
mdisk_name many
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018205E12000000000000002
throttling 2048
preferred_node_id 2
fast_write_state empty
cache readwrite
...
Note: An I/O governing rate of 0 (displayed as virtual_disk_throttling in the CLI output of the lsvdisk command) does not mean that zero IOPS (or MBps) can be achieved. It means that no throttle is set.
rather than through the SVC. You can use the SVC copy services with the image mode volume representing the primary site of the controller remote copy relationship. It does not make sense to use SVC copy services with the volume at the secondary site, because the SVC does not see the data flowing to this LUN through the controller. Figure 6-2 shows the relationships between the SVC, the volume, and the underlying storage controller for a cache-disabled volume.
6.6.2 Using underlying controller FlashCopy with SVC cache-disabled volumes
Where FlashCopy is used in the underlying storage controller, the controller LUNs for both the source and the target must be mapped through the SVC as image mode disks with the SVC cache disabled, as shown in Figure 6-3 on page 116. Note that it is, of course, possible to access either the source or the target of the FlashCopy from a host directly rather than through the SVC.
IBM_2145:svccf8:admin>svctask mkvdisk -name VDISK_IMAGE_1 -iogrp 0 -mdiskgrp IMAGE_Test -vtype image -mdisk D8K_L3331_1108
Virtual Disk, id [9], successfully created
IBM_2145:svccf8:admin>svcinfo lsvdisk VDISK_IMAGE_1
id 9
name VDISK_IMAGE_1
IO_group_id 0
IO_group_name io_grp0
status online
mdisk_grp_id 5
mdisk_grp_name IMAGE_Test
capacity 20.00GB
type image
formatted no
mdisk_id 33
mdisk_name D8K_L3331_1108
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018205E12000000000000014
throttling 0
preferred_node_id 1
fast_write_state empty
cache readwrite
udid
fc_map_count 0
sync_rate 50
copy_count 1
se_copy_count 0
...
IBM_2145:svccf8:admin>svctask chvdisk -cache none VDISK_IMAGE_1
IBM_2145:svccf8:admin>svcinfo lsvdisk VDISK_IMAGE_1
id 9
name VDISK_IMAGE_1
IO_group_id 0
IO_group_name io_grp0
status online
mdisk_grp_id 5
mdisk_grp_name IMAGE_Test
capacity 20.00GB
type image
formatted no
mdisk_id 33
mdisk_name D8K_L3331_1108
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018205E12000000000000014
throttling 0
preferred_node_id 1
fast_write_state empty
cache none
udid
fc_map_count 0
sync_rate 50
copy_count 1
se_copy_count 0
...
Note: By default, volumes are created with the cache mode enabled (readwrite), but you can specify the cache mode during volume creation by using the -cache option.
Thus, to calculate the average I/O rate per volume before overloading the Storage Pool, use this formula:
I/O rate = (I/O capability) / (number of volumes + weighting factor)
So, using the example Storage Pool as defined previously: if we added 20 volumes to the Storage Pool, that Storage Pool was able to sustain 5250 IOPS, and there were two FlashCopy mappings that also have random reads and writes, the maximum I/O rate per volume is:
5250 / (20 + 28) = 110
Note that this is an average I/O rate, so if half of the volumes sustain 200 I/Os and the other half of the volumes sustain 20 I/Os, the average is still 110 IOPS.
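The formula above is a one-liner in practice. This hypothetical helper reproduces the worked example from the text:

```python
def average_io_per_volume(io_capability, volume_count, weighting_factor):
    """Average I/O rate each volume can sustain before the Storage Pool
    is overloaded: the pool's I/O capability divided by the volume
    count plus the weighting factor for copy services."""
    return io_capability / (volume_count + weighting_factor)

# 5250 IOPS capability, 20 volumes, weighting factor 28 for the two
# FlashCopy mappings with random reads and writes:
print(average_io_per_volume(5250, 20, 28))  # 109.375, about 110 IOPS
```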
Conclusion
As you can see from the previous examples, Tivoli Storage Productivity Center is an extremely useful and powerful tool for analyzing and solving performance problems. If you want a single parameter to monitor to gain an overview of your system's performance, it is the read and write response times for both volumes and MDisks. This parameter shows everything that you need in one view, and it is the key day-to-day performance validation metric. It is relatively easy to notice that a system that usually had 2 ms writes and 6 ms reads suddenly has 10 ms writes and 12 ms reads and is getting overloaded. A general monthly check of CPU usage shows you how the system is growing over time and highlights when it is time to add a new I/O Group (or cluster). In addition, there are useful rules for OLTP-type workloads, such as the maximum I/O rates for back-end storage arrays, but for batch workloads, it really is a case of "it depends."
it is pointless if the operating system, or more importantly, the application, cannot use the copied disk.
Data stored to a disk from an application normally goes through these steps:
1. The application records the data using its defined application programming interface. Certain applications might first store their data in application memory before sending it to disk at a later time. Normally, subsequent reads of a block just written get the block from memory if it is still there.
2. The application sends the data to a file. The file system accepting the data might buffer it in memory for a period of time.
3. The file system sends the I/O to a disk controller after a defined period of time (or even based on an event).
4. The disk controller might cache its write in memory before sending the data to the physical drive. If the SVC is the disk controller, it stores the write in its internal cache before sending the I/O to the real disk controller.
5. The data is stored on the drive.
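To make the buffering at each layer concrete, the following toy model (the layer names and block labels are ours, not SVC code) shows why a point-in-time copy taken at the controller layer can miss writes still buffered higher up:

```python
# Toy model of the buffered write path in steps 1-5. Each list is a layer;
# a write lands in the application buffer first and only reaches the next
# layer when that layer is flushed.
drive = []             # step 5: data on the physical drive
controller_cache = []  # step 4: disk controller / SVC cache
fs_cache = []          # steps 2-3: file system buffers
app_buffer = []        # step 1: application memory

def write(block):
    app_buffer.append(block)       # the application records the data

def flush(src, dst):
    dst.extend(src)                # move all buffered blocks down one layer
    src.clear()

write("txn-1"); write("txn-2")
flush(app_buffer, fs_cache)        # application sends the data to a file
write("txn-3")                     # still only in application memory

# A point-in-time copy taken at step 4 sees only what reached the
# controller layer; nothing has been flushed that far yet.
snapshot = list(controller_cache) + list(drive)
print(snapshot)  # []
```

None of the three writes is visible in the snapshot, which is exactly the integrity exposure that flushing before a FlashCopy is meant to close.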
At any point in time, there might be any number of unwritten blocks of data in any of these steps, waiting to go to the next step. It is also important to realize that the order of the data blocks created in step 1 might not be the same order that is used when sending the blocks to steps 2, 3, or 4. So it is possible that, at any point in time, data arriving in step 4 might be missing a vital component that has not yet been sent from step 1, 2, or 3. FlashCopy copies are normally created from the data that is visible at step 4. So, to maintain application integrity, when a FlashCopy is created, any I/O that is generated in step 1 must make it to step 4 before the FlashCopy is started. In other words, there must not be any outstanding write I/Os in steps 1, 2, or 3. If there were outstanding write I/Os, the copy of the disk that is created at step 4 is likely to be missing those transactions, and these missing I/Os can make the copy unusable.
IBM_2145:svccf8:admin>svcinfo lsvdisk -delim :
id:name:IO_group_id:IO_group_name:status:mdisk_grp_id:mdisk_grp_name:capacity:type:FC_id:FC_name:RC_id:RC_name:vdisk_UID:fc_map_count:copy_count:fast_write_state:se_copy_count
0:NYBIXTDB02_T03:0:io_grp0:online:3:MDG4DS8KL3331:20.00GB:striped:::::60050768018205E12000000000000000:0:1:empty:0
1:NYBIXTDB02_2:0:io_grp0:online:0:MDG1DS8KL3001:5.00GB:striped:::::60050768018205E12000000000000007:0:1:empty:0
3:Vdisk_1:0:io_grp0:online:2:MDG1DS4K:2.00GB:striped:::::60050768018205E12000000000000012:0:1:empty:0
9:VDISK_IMAGE_1:0:io_grp0:online:5:IMAGE_Test:20.00GB:image:::::60050768018205E12000000000000014:0:1:empty:0
...
The VDISK_IMAGE_1 volume, which is used in our example, is an image-mode volume. In this case, you need to know its exact size in bytes. In Example 6-13 on page 121, we use the -bytes parameter of the svcinfo lsvdisk command to find its exact size. Thus, the target volume must be created with a size of 21474836480 bytes, not 20 GB.
Example 6-13 Find the exact size of an image mode volume using the command line interface
IBM_2145:svccf8:admin>svcinfo lsvdisk -bytes VDISK_IMAGE_1
id 9
name VDISK_IMAGE_1
IO_group_id 0
IO_group_name io_grp0
status online
mdisk_grp_id 5
mdisk_grp_name IMAGE_Test
capacity 21474836480
type image
formatted no
mdisk_id 33
mdisk_name D8K_L3331_1108
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018205E12000000000000014
...
3. Create a target volume of the required size, as identified from the source volume. The target volume can be an image, sequential, or striped mode volume; the only requirement is that it must be exactly the same size as the source volume. The target volume can be cache-enabled or cache-disabled.
4. Define a FlashCopy mapping, making sure that you have the source and target disks defined in the correct order. (If you use your newly created volume as the source and the existing host's volume as the target, you will destroy the data on the host's volume if you start the FlashCopy.)
5. As part of the define step, you can specify a copy rate from 0 to 100. The copy rate determines how quickly the SVC copies the data from the source volume to the target volume. With the copy rate set to 0 (NOCOPY), the SVC copies only blocks that have changed on the source volume since the mapping was started (or on the target volume, if the target volume is mounted read/write to a host).
6. Prepare the FlashCopy mapping. The prepare process can take several minutes to complete, because it forces the SVC to flush any outstanding write I/Os belonging to the source volume to the storage controller's disks. After the preparation completes, the mapping has a Prepared status, and the source volume behaves as though it were a cache-disabled volume until the FlashCopy mapping is either started or deleted.
Note: If you create a FlashCopy mapping where the source volume is the target volume of an active Metro Mirror relationship, you add additional latency to that existing Metro Mirror relationship (and possibly affect the host that is using the source volume of that Metro Mirror relationship as a result). The reason for the additional latency is that the FlashCopy prepare process disables the cache on the FlashCopy source volume (which is the target volume of the Metro Mirror relationship), and thus all write I/Os from the Metro Mirror relationship must be committed to the storage controller before completion is returned to the host.
7. After the FlashCopy mapping is prepared, quiesce the host by forcing the host and the application to stop I/Os and flush any outstanding write I/Os to disk. This process is different for each application and for each operating system. One guaranteed way to quiesce the host is to stop the application and unmount the volume from the host.
8. As soon as the host completes its flushing, start the FlashCopy mapping. The FlashCopy starts extremely quickly (at most, a few seconds).
9. When the FlashCopy mapping has started, unquiesce your application (or mount the volume and start the application), at which point the cache is re-enabled for the source volume. The FlashCopy continues to run in the background and ensures that the target volume is an exact copy of the source volume as it was when the FlashCopy mapping was started.

You can perform step 1 on page 120 through step 5 on page 121 while the host that owns the source volume performs its typical daily activities (that is, no downtime). While step 6 on page 121 is running, which can last several minutes, there might be a delay in I/O throughput, because the cache on the volume is temporarily disabled. Step 7 must be performed when the application I/O is completely stopped (or suspended). However, steps 8 and 9 complete quickly, and application unavailability is minimal. The target FlashCopy volume can now be assigned to another host, and it can be used for read or write even though the FlashCopy process has not completed.

Note: If you intend to use the target volume on the same host as the source volume, at the same time that the source volume is visible to that host, you might need to perform additional preparation steps to enable the host to access volumes that are identical.
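The mapping-related CLI steps can be sketched as small helpers that assemble the command strings (you would still run them against the cluster over an SSH session). The helper names and the target volume name are ours; svctask mkfcmap, prestartfcmap, and startfcmap are the SVC commands involved, but verify the options against your SVC code level before use.

```python
# Hypothetical helpers that assemble the SVC CLI commands for defining,
# preparing, and starting a FlashCopy mapping (steps 4, 6, and 8).
def mkfcmap(source, target, copyrate=0):
    # copyrate 0 (NOCOPY) copies only changed blocks after the start
    return (f"svctask mkfcmap -source {source} -target {target} "
            f"-copyrate {copyrate}")

def prestartfcmap(mapping_id):
    # Prepare: flushes cache for the source volume (can take minutes)
    return f"svctask prestartfcmap {mapping_id}"

def startfcmap(mapping_id):
    # Start only after the host has quiesced and flushed its writes
    return f"svctask startfcmap {mapping_id}"

cmds = [mkfcmap("VDISK_IMAGE_1", "VDISK_IMAGE_1_TGT"),
        prestartfcmap(0),
        startfcmap(0)]
print(cmds[0])
# svctask mkfcmap -source VDISK_IMAGE_1 -target VDISK_IMAGE_1_TGT -copyrate 0
```

Keeping the command construction in one place makes the source/target ordering, which is destructive if reversed, easy to review before anything runs.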
1. Your host is currently writing to the volumes as part of its daily activities. These volumes will become the source volumes in our FlashCopy mappings.
2. Identify the size and type (image, sequential, or striped) of each source volume. If any of the source volumes is an image mode volume, you need to know its size in bytes. If any of the source volumes are sequential or striped mode volumes, their size as reported by the SVC GUI or SVC command line is sufficient.
3. Create a target volume of the required size for each source volume identified in the previous step. The target volumes can be image, sequential, or striped mode volumes; the only requirement is that each must be exactly the same size as its source volume. The target volumes can be cache-enabled or cache-disabled.
4. Define a FlashCopy Consistency Group. This Consistency Group will be linked to each FlashCopy mapping that you define, so that data integrity is preserved across the volumes.
5. Define a FlashCopy mapping for each source volume, making sure that you have the source disk and the target disk defined in the correct order. (If you use any of your newly created volumes as a source and the existing host's volume as the target, you will destroy the data on the host's volume if you start the FlashCopy.) When defining the mapping, make sure that you link it to the FlashCopy Consistency Group that you defined in the previous step. As part of defining the mapping, you can specify a copy rate from 0 to 100. The copy rate determines how quickly the SVC copies the source volumes to the target volumes. With the copy rate set to 0 (NOCOPY), the SVC copies only blocks that have changed on any source volume since the Consistency Group was started (or on the target volume, if the target volume is mounted read/write to a host).
6. Prepare the FlashCopy Consistency Group.
This preparation process can take several minutes to complete, because it forces the SVC to flush any outstanding write I/Os belonging to the volumes in the Consistency Group to the storage controller's disks. After the preparation process completes, the Consistency Group has a Prepared status, and all source volumes behave as though they were cache-disabled volumes until the Consistency Group is either started or deleted.

Note: If you create a FlashCopy mapping where the source volume is the target volume of an active Metro Mirror relationship, this mapping adds additional latency to that existing Metro Mirror relationship (and possibly affects the host that is using the source volume of that Metro Mirror relationship as a result). The reason for the additional latency is that the FlashCopy Consistency Group preparation process disables the cache on all source volumes (which might be target volumes of a Metro Mirror relationship), and thus all write I/Os from the Metro Mirror relationship must be committed to the storage controller before the complete status is returned to the host.

7. After the Consistency Group is prepared, quiesce the host by forcing the host and the application to stop I/Os and flush any outstanding write I/Os to disk. This process differs for each application and for each operating system. One guaranteed way to quiesce the host is to stop the application and unmount the volumes from the host.
8. As soon as the host completes its flushing, start the Consistency Group. The FlashCopy start completes extremely quickly (at most, a few seconds).
9. When the Consistency Group has started, unquiesce your application (or mount the volumes and start the application), at which point the cache is re-enabled. The FlashCopy continues to run in the background and preserves the data that existed on the volumes when the Consistency Group was started.

Step 1 on page 123 through step 6 on page 123 can be performed while the host that owns the source volumes performs its typical daily duties (that is, no downtime). While step 6 on page 123 is running, which can take several minutes, there might be a delay in I/O throughput, because the cache on the volumes is temporarily disabled. You must perform step 7 on page 123 when the application I/O is completely stopped (or suspended). However, steps 8 and 9 complete quickly, and application unavailability is minimal. The target FlashCopy volumes can now be assigned to another host and used for read or write even though the FlashCopy processes have not completed.

Note: If you intend to use any of the target volumes on the same host as their source volumes, at the same time that the source volumes are visible to that host, you might need to perform additional preparation steps to enable the host to access volumes that are identical.
You can use thin-provisioned volumes for cascaded FlashCopy and multiple target FlashCopy. It is also possible to mix thin-provisioned with fully allocated volumes, and thin-provisioned volumes can be used for incremental FlashCopy too, but using thin-provisioned volumes for incremental FlashCopy only makes sense if both the source and the target are thin-provisioned.

The recommendations for thin-provisioned FlashCopy are:
- The thin-provisioned volume grain size must be equal to the FlashCopy grain size.
- The thin-provisioned volume grain size must be 64 KB for the best performance and the best space efficiency.

The exception is where the thin-provisioned target volume is going to become a production volume (that is, it will be subjected to ongoing heavy I/O). In this case, the 256 KB thin-provisioned grain size is recommended to provide better long-term I/O performance at the expense of a slower initial copy.

Note: Even if the 256 KB thin-provisioned volume grain size is chosen, it is still beneficial to keep the FlashCopy grain size at 64 KB. It is then possible to minimize the performance impact to the source volume, even though this size increases the I/O workload on the target volume. Clients with extremely large numbers of FlashCopy/Remote Copy relationships might still be forced to choose a 256 KB grain size for FlashCopy due to constraints on the amount of bitmap memory.
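The grain-size guidance above can be captured as a tiny decision helper. This is a planning sketch only (the function name is ours); it encodes the recommendations in this section, not an SVC API.

```python
def thin_grain_size_kb(target_becomes_production=False):
    """Recommended thin-provisioned volume grain size per the guidance above:
    64 KB for the best performance and space efficiency, or 256 KB when the
    thin target will carry ongoing heavy production I/O."""
    return 256 if target_becomes_production else 64

# Keeping the FlashCopy grain size at 64 KB minimizes the impact on the
# source volume, even when the thin volume itself uses 256 KB grains
# (at the cost of more I/O on the target).
FLASHCOPY_GRAIN_KB = 64

print(thin_grain_size_kb())      # 64
print(thin_grain_size_kb(True))  # 256
```

Note the bitmap-memory caveat above: very large FlashCopy/Remote Copy configurations can still force a 256 KB FlashCopy grain size regardless of this preference.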
6. Zone the hosts to the SVC (while maintaining their current zone to their storage) so that you can discover and define the hosts to the SVC.
7. At an appropriate time, install the IBM SDD onto the hosts that will soon use the SVC for storage. If you have performed testing to ensure that the host can use both SDD and the original driver, you can perform this step anytime before the next step.
8. Quiesce or shut down the hosts so that they no longer use the old storage.
9. Change the masking on the LUNs on the old storage controller so that the SVC is now the only user of the LUNs. You can change this masking one LUN at a time so that you can discover them (in the next step) one at a time and not mix any LUNs up.
10. Use svctask detectmdisk to discover the LUNs as MDisks. We recommend that you also use svctask chmdisk to rename the MDisks to something more meaningful.
11. Define a volume from each LUN and note its exact size (to the number of bytes) by using the svcinfo lsvdisk command.
12. Define a FlashCopy mapping and start the FlashCopy mapping for each volume by using the steps in 6.8.1, "Steps to making a FlashCopy volume with application data integrity" on page 120.
13. Assign the target volumes to the hosts and then restart your hosts. Your host sees the original data, with the exception that the storage is now an IBM SVC LUN.

With these steps, you have made a copy of the existing storage, and the SVC has not been configured to write to the original storage. Thus, if you encounter any problems with these steps, you can reverse everything that you have done, assign the old storage back to the host, and continue without the SVC. By using FlashCopy in this example, any incoming writes go to the new storage subsystem, and any read requests for data that has not yet been copied to the new subsystem automatically come from the old subsystem (the FlashCopy source). You can alter the FlashCopy copy rate, as appropriate, to ensure that all the data is copied to the new controller.
After the FlashCopy completes, you can delete the FlashCopy mappings and the source volumes. After all the LUNs have been migrated across to the new storage controller, you can remove the old storage controller from the SVC node zones and then, optionally, remove the old storage controller from the SAN fabric. You can also use this process if you want to migrate to a new storage controller and not keep the SVC after the migration. At step 2 on page 126, make sure that you create LUNs that are the same size as the original LUNs. Then, at step 11, use image mode volumes. When the FlashCopy mappings complete, you can shut down the hosts and map the storage directly to them, remove the SVC, and continue on the new storage controller.
Be careful if you want to map the target FlashCopy volume to the same host that already has the source volume mapped to it. Check that your operating system supports this configuration. Keep these considerations in mind:
- The target volume must be the same size as the source volume; however, the target volume can be a different type (image, striped, or sequential mode) or have different cache settings (cache-enabled or cache-disabled).
- If you stop a FlashCopy mapping or a Consistency Group before it has completed, you lose access to the target volumes. If the target volumes are mapped to hosts, they will have I/O errors.
- A volume cannot be a source in one FlashCopy mapping and a target in another FlashCopy mapping.
- A volume can be the source for up to 256 targets.
- Starting with SVC 6.2.0, you can create a FlashCopy mapping that uses a target volume that is part of a remote copy relationship. This capability enables the reverse feature to be used in conjunction with a disaster recovery implementation. It also enables fast failback from a consistent copy held on a FlashCopy target volume at the auxiliary cluster to the master copy.
6.8.10 IBM System Storage Support for Microsoft Volume Shadow Copy Service
The SAN Volume Controller provides support for the Microsoft Volume Shadow Copy Service and Virtual Disk Service. The Microsoft Volume Shadow Copy Service can provide a point-in-time (shadow) copy of a Windows host volume while the volume is mounted and files are in use. The Microsoft Virtual Disk Service provides a single vendor and technology-neutral interface for managing block storage virtualization, whether done by operating system software, RAID storage hardware, or other storage virtualization engines.

The following components are used to provide support for the service:
- SAN Volume Controller
- The cluster CIM server
- IBM System Storage hardware provider, known as the IBM System Storage Support for Microsoft Volume Shadow Copy Service and Virtual Disk Service software
- Microsoft Volume Shadow Copy Service
- The vSphere Web Services (when in a VMware virtual platform)

The IBM System Storage hardware provider is installed on the Windows host. To provide the point-in-time shadow copy, the components complete the following process:
1. A backup application on the Windows host initiates a snapshot backup.
2. The Volume Shadow Copy Service notifies the IBM System Storage hardware provider that a copy is needed.
3. The SAN Volume Controller prepares the volumes for a snapshot.
4. The Volume Shadow Copy Service quiesces the software applications that are writing data on the host and flushes file system buffers to prepare for the copy.
5. The SAN Volume Controller creates the shadow copy using the FlashCopy Copy Service.
6. The Volume Shadow Copy Service notifies the writing applications that I/O operations can resume and notifies the backup application that the backup was successful.

The Volume Shadow Copy Service maintains a free pool of volumes for use as a FlashCopy target and a reserved pool of volumes. These pools are implemented as virtual host systems on the SAN Volume Controller. For more details on how to implement and work with IBM System Storage Support for Microsoft Volume Shadow Copy Service, refer to Implementing the IBM System Storage SAN Volume Controller V6.1, SG24-7933-00.
7521CopyServices.fm
Chapter 7. Remote Copy services
When considering a remote copy solution, it is essential that we consider each of these processes and the traffic that they generate on the SAN and the intercluster link. It is important to
understand how much traffic the SAN can take without disruption, and how much traffic your application and copy services processes generate. Successful implementation depends on taking a holistic approach, where we consider all components and their associated properties. This includes host application sensitivity, local and remote SAN configurations, local and remote cluster and storage configuration, and the intercluster link.
The diagram in Figure 7-1 on page 134 shows some of these definitions in pictorial form.
A successful implementation of an intercluster remote copy service depends on the quality and configuration of the intercluster link (ISL). The intercluster link must be able to provide dedicated bandwidth for remote copy traffic.
Link latency is the time taken by data to move across a network from one location to another, and it is measured in milliseconds. The longer the time, the greater the performance impact.

Link bandwidth is the network capacity to move data, as measured in millions of bits per second (Mbps) or billions of bits per second (Gbps).

The term bandwidth is also used in the following contexts:
- Storage bandwidth: The ability of the back-end storage to process I/O. It measures the amount of data (in bytes) that can be sent in a specified amount of time.
- GM partnership bandwidth (parameter): The rate at which background write synchronization is attempted (in MB/s).
Warning: With SVC version 5.1.0, the bandwidth parameter must be explicitly defined by the client when making a MM/GM partnership. Previously, a default value of 50 MB/s was used. The removal of the default is intended to stop users from using the default bandwidth with a link that does not have sufficient capacity.

Intercluster communication: As well as supporting mirrored foreground and background I/O, a proportion of the link is also used to carry traffic associated with the exchange of low-level messaging between the nodes of the local and remote clusters. A dedicated amount of the link bandwidth is required for:
- The exchange of heartbeat messages
- The initial configuration of intercluster partnerships
Summary
The intercluster link bandwidth, as shown in Figure 7-2 on page 135, must be capable of supporting the combined traffic of:
- Mirrored foreground writes, as generated by foreground processes at peak times
- Background write synchronization, as defined by the GM bandwidth parameter
- Intercluster communication (heartbeat messaging)
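As a rough sizing aid, the summary above amounts to a simple sum of the three traffic types. The function name and the figures below are purely illustrative; the inputs must come from your own peak-workload measurements and partnership settings.

```python
def required_link_bandwidth_mbps(peak_foreground_write_mbps,
                                 gm_background_copy_mbps,
                                 heartbeat_mbps):
    """Minimum intercluster link capacity: the sum of mirrored foreground
    writes at peak, the GM background-copy (synchronization) rate, and the
    intercluster messaging (heartbeat) allowance."""
    return (peak_foreground_write_mbps
            + gm_background_copy_mbps
            + heartbeat_mbps)

# Illustrative numbers only (MB/s), not recommendations:
print(required_link_bandwidth_mbps(80, 25, 2))  # 107
```

A link sized below this sum forces the traffic types to compete, which shows up first as slower foreground writes at the application.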
Zoning considerations
The zoning requirements have been revised and are covered in detail in the following flash: https://www-304.ibm.com/support/docview.wss?uid=ssg1S1003634. They are also covered in "Intercluster (Remote) link" on page 147 of this chapter.
the time, in some disaster recovery scenarios, in which the MM/GM relationship is in an inconsistent state.
If corruption occurs on source volume A, or the relationship stops and becomes inconsistent, we might want to recover from the last incremental FlashCopy that was taken. Unfortunately, before SVC 6.2.0, recovering meant the destruction of the MM/GM relationship, because the Remote Copy must not be running when a FlashCopy process changes the state of the volume. If both processes ran concurrently, a given volume could be subject to simultaneous data changes.
Figure 7-4 illustrates the limitation. In release 6.1 and before, a FlashCopy target could not be the source of a Remote Copy (Global or Metro Mirror) relationship. You could take a FlashCopy of a Remote Copy secondary to protect consistency when resynchronizing, or to record an important state of the disk, but you could not copy it back to the secondary volume without deleting the Remote Copy relationship, and recreating the Remote Copy relationship means copying everything back to the primary volume again.

Figure 7-4 Remote copy of FlashCopy target volumes
Destruction of the MM/GM relationship means that a complete background copy would be required before the relationship was once again in a consistent synchronized state, which would mean an extended period of time in which the host applications were unprotected.
With the release of 6.2.0, the relationship does not need to be destroyed, and a consistent synchronized state can be achieved more quickly. This means that host applications are unprotected for a reduced period of time.

Note: The SVC has always supported the ability to FlashCopy away from either a MM/GM source or target volume; that is, volumes in remote copy relationships have always been able to act as source volumes of a FlashCopy relationship. One caveat: when you prepare a FlashCopy mapping, the SVC puts the source volumes into a temporary cache-disabled state. This temporary state adds additional latency to the remote copy relationship, because I/Os that are normally committed to the SVC cache must now be destaged directly to the back-end storage controller.
Global Mirror
- Release 4.1.1: Initial release of Global Mirror (asynchronous remote copy).
- Release 4.2.0: Increased the size of the non-volatile bitmap space, raising the copyable volume space to 16 TB, and allowed 40 TB of remote copy per I/O Group.
- Release 5.1.0: Introduced Multi-Cluster Mirroring.
- Release 6.2.0: Allowed a Metro or Global Mirror disk to be a FlashCopy target.
Metro Mirror
- Release 1.1.0: Initial release of remote copy.
- Release 2.1.0: Initial release as Metro Mirror.
- Release 4.1.1: The algorithms employed to maintain synchronization through error recovery were changed to leverage the same non-volatile journal as Global Mirror.
- Release 4.2.0: Increased the size of the non-volatile bitmap space, raising the copyable volume space to 16 TB, and allowed 40 TB of remote copy per I/O Group.
- Release 5.1.0: Introduced Multi-Cluster Mirroring.
- Release 6.2.0: Allowed a Metro or Global Mirror disk to be a FlashCopy target.
If you are looking for in-depth information about setting up remote copy partnerships and relationships, or administering remote copy relationships, refer to Implementing the IBM System Storage SAN Volume Controller V6.1, SG24-7933.
Note: Although mirrored foreground writes are performed asynchronously, they are interrelated, at a GM process level, with foreground write I/O. Slow responses along the intercluster link can lead to a backlog of GM process events, or an inability to secure process resources on remote nodes. This in turn delays GM's ability to process foreground writes, and hence causes slower writes at the application level.
The bandwidth and gmlinktolerance features used with GM are further defined by:
- relationship_bandwidth_limit: The maximum resynchronization limit, at the relationship level.
- gm_max_hostdelay: The maximum acceptable delay of host I/O attributable to GM.
The GM partnership bandwidth parameter specifies the rate, in megabytes per second (MBps), at which the background write resynchronization processes are attempted, that is, the total bandwidth that they consume. With SVC release 5.1.0, the granularity of control at a volume relationship level for background write resynchronization can be additionally modified using the relationship_bandwidth_limit parameter. Unlike the partnership bandwidth parameter, it does have a default value, which is 25 MB/s. The parameter defines, at a cluster-wide level, the maximum rate at which background write resynchronization from an individual source volume to its target volume is attempted. Background write resynchronization is attempted at the lower of these two parameters.

Note: The term "background write (re)synchronization", when used in conjunction with the SVC, may also be referred to as "GM background copy" within this and other IBM publications.

Asynchronous GM does add some additional overhead to foreground write I/O, because it requires a dedicated portion of the intercluster link bandwidth to function. Controlling this overhead is
critical with respect to foreground write I/O performance, and it is achieved through the use of the gmlinktolerance parameter. This parameter defines the amount of time that GM processes can run on a poorly performing link without adversely affecting foreground write I/O. By setting the gmlinktolerance time limit parameter, you define a safety valve that suspends GM processes so that foreground application write activity continues at acceptable performance levels. When creating a GM partnership, a default limit of 300 seconds is used, but this value is adjustable. The parameter can also be set to 0, which effectively turns off the safety valve, meaning that a poorly performing link could adversely affect foreground write I/O.

The gmlinktolerance parameter does not define:
- What constitutes a poorly performing link
- The latency that is acceptable for host applications

With release 5.1.0, using the gm_max_hostdelay parameter, you define what constitutes a poorly performing link. By using gm_max_hostdelay, you can specify the maximum allowable overhead increase in processing foreground write I/O, in milliseconds, that is attributable to the effect of running GM processes. If this threshold limit is exceeded, the link is considered to be performing poorly, the gmlinktolerance parameter comes into play, and the Global Mirror link tolerance timer starts counting down. The gm_max_hostdelay threshold value defines the maximum allowable additional impact that Global Mirror operations can add to the response times of foreground writes on Global Mirror source volumes. The parameter may be used to increase the threshold limit from its default value of 5 milliseconds.
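The interplay of these parameters can be summarized in two small sketches. The function names are ours; the defaults (25 MB/s relationship limit, 5 ms gm_max_hostdelay, 300 s gmlinktolerance) follow the text above, but this is a conceptual model, not SVC code.

```python
def effective_resync_rate(partnership_bandwidth_mbps,
                          relationship_bandwidth_limit_mbps=25):
    """Background write resynchronization runs at the lower of the GM
    partnership bandwidth and the per-relationship limit (default 25 MB/s)."""
    return min(partnership_bandwidth_mbps, relationship_bandwidth_limit_mbps)

def gm_should_stop(extra_host_delay_ms, seconds_over_threshold,
                   gm_max_hostdelay_ms=5, gmlinktolerance_s=300):
    """Safety-valve sketch: if GM adds more than gm_max_hostdelay to
    foreground writes for longer than gmlinktolerance, suspend GM.
    Setting gmlinktolerance to 0 disables the valve entirely."""
    if gmlinktolerance_s == 0:
        return False
    return (extra_host_delay_ms > gm_max_hostdelay_ms
            and seconds_over_threshold >= gmlinktolerance_s)

print(effective_resync_rate(50))                      # 25
print(gm_should_stop(12, 400))                        # True
print(gm_should_stop(12, 400, gmlinktolerance_s=0))   # False
```

The last call shows why disabling the tolerance timer is risky: a link that adds 12 ms to every foreground write would never trigger the safety valve.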
mkpartnership command
The mkpartnership command establishes a one-way Metro Mirror or Global Mirror partnership between the local cluster and a remote cluster. When making a partnership, the client must set a Remote Copy bandwidth rate (in MBps), which specifies the proportion of the total intercluster link bandwidth that is used for MM/GM background copy operations.

Note: To establish a fully functional Metro Mirror or Global Mirror partnership, you must issue this command from both clusters.
mkrcrelationship command
After the partnership is established, a Global Mirror relationship can be created between volumes of equal size on the Master (local) and Auxiliary (remote) clusters. The volumes on the local cluster are Master Volumes and have an initial role as the source volumes. The volumes on the remote cluster are defined as Auxiliary Volumes and have an initial role as the target volumes.

Notes:
- After the initial synchronization is complete, the copy direction can be changed, and the roles of the Master and Auxiliary volumes can swap; that is, the source becomes the target.
- As with FlashCopy, volumes can be maintained in Consistency Groups.

After background (re)synchronization is complete, a Global Mirror relationship provides and maintains a consistent mirrored copy of a source volume at a target volume, but without requiring the hosts connected to the local cluster to wait for the full round-trip delay of the long-distance intercluster link; that is, it performs the same function as Metro Mirror remote copy, but over longer distances, using links with higher latency.

Note: Global Mirror is an asynchronous remote copy service. Writes to the target volume are made asynchronously, meaning that for a host write to the source volume, the host receives confirmation that the write is complete before the I/O completes on the target volume.
(Figure: the incoming (1) write from the host passes transparently through the RC component of the software stack and into cache on the Master volume's cluster, where the write is (2) acknowledged.)
The process that marshals the sequential sets of I/Os operates at the remote cluster, and so is not subject to the latency of the long distance link. Definition: A consistent image is defined as point-in-time (PIT) consistency. Figure 7-7 shows that a write operation to the master volume is acknowledged back to the host issuing the write before the write operation is mirrored to the cache for the auxiliary volume.
(Figure 7-7: (1) The foreground write from the host is processed by the RC component and then cached. (2) The foreground write is acknowledged as complete by the SVC to the host application. Some time later, (3) a Mirrored Foreground Write is sent to the Auxiliary volume, and (4) the Mirrored Foreground Write is acknowledged.)
With Global Mirror, write completion is confirmed to the host before the write completes at the Auxiliary Volume. When a write is sent to a Master Volume, it is assigned a sequence number. Mirrored writes sent to the Auxiliary Volume are committed in sequence-number order. If a write is issued while another write is outstanding, it might be given the same sequence number. This functionality operates to maintain a consistent image at the Auxiliary Volume at all times. It identifies sets of I/Os that are active concurrently at the primary VDisk, assigns an order to those sets, and applies these sets of I/Os in the assigned order at the Auxiliary Volume. If a further write is received from a host while the secondary write to the same block is still active, even though the primary write might have completed, the new write at the Auxiliary Volume is delayed until the previous write has completed.
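The sequence-numbering scheme just described can be sketched as a toy model (hypothetical Python for illustration, not SVC code): mirrored writes may arrive at the remote cluster out of order, but they are committed strictly in sequence-number order, so a write whose predecessor has not yet arrived is held back.

```python
class SecondaryVolume:
    """Commits mirrored writes strictly in sequence-number order."""

    def __init__(self):
        self.next_seq = 0
        self.pending = {}    # seq -> (block, data), received but not yet committable
        self.committed = []  # the order in which writes were actually applied

    def receive(self, seq, block, data):
        # Mirrored writes may arrive out of order over the intercluster link.
        self.pending[seq] = (block, data)
        # Apply every write that is now next in sequence.
        while self.next_seq in self.pending:
            self.committed.append(self.pending.pop(self.next_seq))
            self.next_seq += 1

aux = SecondaryVolume()
# Writes arrive out of order: sequence number 1 before 0.
aux.receive(1, "LBA-X", "B")
print(aux.committed)          # still empty: seq 1 is held until seq 0 arrives
aux.receive(0, "LBA-X", "A")
print(aux.committed)          # both commit, in sequence order: A then B
```

The point of the sketch is that a consistent image exists at the secondary at every moment: no write is ever applied ahead of an earlier-numbered write, which is what makes the target usable for recovery at any point in time.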
latency of the long distance link. These two elements of the protocol ensure that the throughput of the total cluster can be grown by increasing cluster size while maintaining consistency across a growing data set. In a failover scenario, where the secondary site needs to become the master source of data, certain updates might be missing at the secondary site. Therefore, any applications that will use this data must have an external mechanism for recovering the missing updates and reapplying them, such as a transaction log replay.
These numbers correspond to the numbers in Figure 7-8: (1) First write is performed from the host to LBA X. (2) Host is provided acknowledgment that the write is complete even though the mirrored write to the auxiliary volume has not yet completed. (1) and (2) occur asynchronously with the first write. (3) Second write is performed from the host, also to LBA X; if this write occurs prior to (2), the write is written to the journal file. (4) Host is provided acknowledgment that the second write is complete.
Link speed
The speed of a communication link determines how much data can be transported and how long the transmission takes. The faster the link, the more data can be transferred within a given amount of time.
Latency
Latency is the time taken by data to move across a network from one location to another, and is measured in milliseconds. The longer the time, the greater the performance impact. Latency is bounded by the speed of light (c = 3 x 10^8 m/s; in a vacuum, light takes about 3.3 microsec/km, where microsec represents microseconds, one millionth of a second). The bits of data travel at about two-thirds the speed of light in an optical fiber cable, which gives roughly 5 microsec/km.
However, some latency is added when packets are processed by switches and routers and then forwarded to their destination. While the speed of light may seem infinitely fast, over continental and global distances latency becomes a noticeable factor. There is a direct relationship between distance and latency: speed-of-light propagation dictates about one millisecond of latency for every 100 miles. For some synchronous remote copy solutions, even a few milliseconds of additional delay may be unacceptable. Latency is a difficult challenge because, unlike bandwidth, spending more money for higher link speeds does not reduce it.
Tip: SCSI write over Fibre Channel requires two round trips per I/O operation, so we have 2 (round trips) x 2 (legs per round trip) x 5 microsec/km = 20 microsec/km. At 50 km we have an additional latency of 20 microsec/km x 50 km = 1000 microsec = 1 msec (msec represents millisecond); each SCSI I/O has one msec of additional service time. At 100 km it becomes two msec of additional service time.
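The Tip's arithmetic can be expressed as a small helper (a sketch; the 5 microsec/km one-way fiber figure and the two round trips per SCSI write are taken from the text above):

```python
FIBER_ONE_WAY_US_PER_KM = 5     # ~5 microseconds per km one way in optical fiber
ROUND_TRIPS_PER_SCSI_WRITE = 2  # a SCSI write over FC needs two round trips

def added_service_time_ms(distance_km):
    """Extra latency, in milliseconds, a SCSI write incurs over distance_km of fiber."""
    one_round_trip_us = 2 * FIBER_ONE_WAY_US_PER_KM * distance_km  # out and back
    total_us = ROUND_TRIPS_PER_SCSI_WRITE * one_round_trip_us
    return total_us / 1000.0    # microseconds -> milliseconds

print(added_service_time_ms(50))   # 50 km  -> 1.0 ms, matching the Tip
print(added_service_time_ms(100))  # 100 km -> 2.0 ms
```

This is pure propagation delay; switch, router, and protocol-conversion overheads discussed elsewhere in this chapter come on top of it.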
Bandwidth
Bandwidth, with respect to Fibre Channel networks, is the network capacity to move data as measured in millions of bits per second (Mbps) or billions of bits per second (Gbps). In storage terms, bandwidth measures the amount of data that can be sent in a specified amount of time. Storage applications issue read and write requests to storage devices, and these requests are satisfied at a certain speed, commonly called the data rate. Usually disk and tape device data rates are measured in bytes per unit of time, not in bits. Most modern storage device LUNs or volumes can manage sequential sustained data rates in the order of 10 MBps to 80-90 MBps; some manage higher rates. For example, an application writes to disk at 80 MBps. Assuming a conversion ratio of 1 MB to 10 Mbits (this is reasonable because it accounts for protocol overhead), we have a data rate of 800 Mbps. It is always useful to check and make sure that you correctly correlate MBps to Mbps.

Warning: When setting up a GM Partnership using mkpartnership, the -bandwidth parameter does not refer to the general bandwidth characteristic of the links between a local and remote cluster; instead, it refers to the background copy (or write resynchronization) rate, as determined by the client, that the intercluster link can sustain.
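The MBps-to-Mbps rule of thumb in the example can be captured in a couple of lines (a sketch; the 1 MB to 10 Mbit factor, which bakes in protocol overhead, comes from the text above):

```python
MBIT_PER_MB = 10  # rule of thumb: 1 MByte/s ~ 10 Mbit/s once protocol overhead is included

def mbps_to_mbits(mb_per_sec):
    """Convert a storage data rate in MBps to the link rate it implies in Mbps."""
    return mb_per_sec * MBIT_PER_MB

print(mbps_to_mbits(80))  # the 80 MBps application in the example -> 800 Mbps
```

Using the rule of thumb rather than the exact factor of 8 bits per byte builds a sensible safety margin into link sizing.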
These rules are considered in greater detail in the section Global Mirror parameters on page 154.
(Figure: copy direction and volume roles. Initially the copy direction runs from the Master Volume (role: Primary) to the Auxiliary Volume (role: Secondary). When the copy direction is reversed, the Auxiliary Volume takes the Primary role and the Master Volume takes the Secondary role.)
Warning: When the direction of the relationship is changed, the roles of the volumes are altered. A consequence of this is that the read/write properties are also changed. This means the master volume takes on a secondary role and becomes read-only.
Basic Topology and Problems: Due to the nature of Fibre Channel, it is extremely important to avoid inter-switch link (ISL) congestion; this applies equally within individual SANs and across the intercluster link. While Fibre Channel (and the SVC) can, under most circumstances, handle a host or storage array that has become overloaded, the mechanisms in Fibre Channel for dealing with congestion in the fabric itself are not effective. The problems caused by fabric congestion can range anywhere from dramatically slow response time all the way to storage access loss. These issues are common to all high-bandwidth SAN devices and are inherent to Fibre Channel; they are not unique to the SVC. When a Fibre Channel network becomes congested, the Fibre Channel switches stop accepting additional frames until the congestion clears, and they may also drop frames. Congestion can quickly move upstream in the fabric and prevent the end devices (such as the SVC) from communicating anywhere. This behavior is referred to as head-of-line blocking, and while modern SAN switches internally have a non-blocking architecture, head-of-line blocking still exists as a SAN fabric problem. Head-of-line blocking can result in your SVC nodes being unable to communicate with your storage subsystems or mirror their write caches, just because you have a single congested link leading to an edge switch.
7.4.3 Zoning
The zoning requirements have been revised and are covered in detail by the following flash: https://www-304.ibm.com/support/docview.wss?uid=ssg1S1003634
Multi-Cluster Mirroring is supported from release 5.1.0, which by its nature increases the potential to zone multiple clusters' nodes together in usable (future-proof) configurations; however, this is not the recommended configuration.
Abstract
SVC nodes in Metro or Global Mirror inter-cluster partnerships may experience lease expiry reboot events if an inter-cluster link to a partner system becomes overloaded. These reboot events may occur on all nodes simultaneously, leading to a temporary loss of host access to Volumes.
Content
If an inter-cluster link becomes severely and abruptly overloaded, it is possible for the local fibre channel fabric to become congested to the extent that no fibre channel ports on the local SVC nodes are able to perform local intra-cluster heartbeat communication. This may result in the nodes experiencing lease expiry events, in which a node will reboot in order to attempt to re-establish communication with the other nodes in the system. If all nodes lease expire simultaneously, this may lead to a loss of host access to Volumes for the duration of the reboot events.
Workaround
The default zoning recommendation for inter-cluster Metro and Global Mirror partnerships has now been revised to ensure that, if link-induced congestion occurs, only two of the four fibre channel ports on each node can be subjected to this congestion. The remaining two ports on each node remain unaffected, and are therefore able to continue performing intra-cluster heartbeat communication without interruption. The revised zoning recommendation is as follows: For each node in a clustered system, exactly two fibre channel ports should be zoned to exactly two fibre channel ports from each node in the partner system. This implies that for each system, there will be two ports on each SVC node that have no remote zones, only local zones.
If dual-redundant ISLs are available, then the two ports from each node should be split evenly between the two ISLs; that is, one port from each node should be zoned across each ISL. Local system zoning should continue to follow the standard requirement for all ports on all nodes in a clustered system to be zoned to one another.
Note: Distance extension must only be utilized for links between SVC clusters. It must not be used for intra-cluster links. Technically, distance extension is supported for relatively short distances, such as a few kilometers (or miles). Refer to the IBM System Storage SAN Volume Controller Restrictions, S1003903, for details explaining why this arrangement is not recommended.
If the link between the sites is configured with redundancy, so that it can tolerate single failures, the link must be sized so that the bandwidth and latency statements continue to be accurate even during single failure conditions.
The decibel (dB) is a convenient way of expressing an amount of signal loss or gain within a system, or the amount of loss or gain caused by some component of a system. When signal power is lost, you never lose a fixed amount of power; the rate at which you lose power is not linear. Instead, you lose a portion of the power: one half, one quarter, and so on. This makes it difficult to add up the lost power along a signal's path through the network if you measure signal loss in watts. For example, suppose a signal drops to half its power through a bad connection, and then to a quarter of that on a bent cable. You cannot add 1/2 plus 1/4 to find the total loss; you must multiply 1/2 by 1/4. This makes calculating the total loss of a large network in linear units both time-consuming and difficult. Decibels, though, are logarithmic, allowing you to calculate the total loss/gain characteristics of a system just by adding them up. Keep in mind that they scale logarithmically: if your signal gains 3 dB, the signal doubles in power; if your signal loses 3 dB, the signal halves in power. It is important to remember that the decibel is a ratio of signal powers: you must have a reference point. For example, you can say, "There is a 5 dB drop over that connection," but you cannot say, "The signal is 5 dB at the connection." A decibel is not a measure of signal strength; instead, it is a measure of signal power loss or gain. A decibel milliwatt (dBm) is a measure of signal strength. People often confuse dBm with dB. A dBm is the signal power in relation to one milliwatt.
A signal power of zero dBm is one milliwatt, a signal power of three dBm is two milliwatts, six dBm is four milliwatts, and so on. Do not be misled by minus signs; a minus sign has nothing to do with signal direction. The more negative the dBm value, the closer the power level gets to zero. A good link has a very small rate of frame loss. A retransmission occurs when a frame is lost, directly impacting performance. SVC aims to support retransmissions at 0.2 / 0.1.
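The additive property of decibels described above can be checked numerically (a sketch using the standard 10*log10 power-ratio definition; nothing here is SVC-specific):

```python
import math

def to_db(power_ratio):
    """Express a linear power ratio as decibels."""
    return 10 * math.log10(power_ratio)

def from_db(db):
    """Convert decibels back to a linear power ratio."""
    return 10 ** (db / 10)

# Two lossy elements: one leaves 1/2 of the power, the next leaves 1/4 of that.
loss_a_db = to_db(1 / 2)   # about -3 dB
loss_b_db = to_db(1 / 4)   # about -6 dB
# In linear units the fractions multiply (1/2 x 1/4 = 1/8);
# in dB the same answer falls out of simple addition.
print(from_db(loss_a_db + loss_b_db))

# dBm is absolute, referenced to one milliwatt: 0 dBm = 1 mW.
def dbm_to_mw(dbm):
    return 10 ** (dbm / 10)

print(dbm_to_mw(0), dbm_to_mw(3))  # 1 mW, and roughly 2 mW
```

Note that 3 dBm is 10^0.3, about 1.995 mW, which is why "3 dB doubles the power" is a rule of thumb rather than an exact identity.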
7.4.10 Hops
The hop count as such is not increased by the inter-site connection architecture. For example, if our SAN extension is based on DWDM, the DWDM components are transparent to the number of hops. The hop count limit within a fabric is set by the fabric devices' (switch or director) operating system, and it is used to derive a frame hold time value for each fabric device. This hold time value is the maximum amount of time that a frame can be held in a switch before it is dropped or a fabric-busy condition is returned. For example, a frame may be held if its destination port is not available. The hold time is derived from a formula using the error detect time-out value and the resource allocation time-out value. A discussion of these fabric values is beyond the scope of this book; further information can be found in IBM TotalStorage: SAN Product, Design, and Optimization Guide, SG24-6384. If these times become excessive, the fabric experiences undesirable timeouts. Every extra hop is considered to add about 1.2 microseconds of latency to the transmission. Currently, SVC Remote Copy Services supports three hops when protocol conversion exists. That means that if you have DWDM extended between primary and secondary sites, three SAN directors or switches can exist between the primary and secondary SVC.
Asynchronous remote copy of volumes dispersed over metropolitan scale distances is supported. SVC implements the Global Mirror relationship between a volume pair:
SVC supports intracluster Global Mirror, where both volumes belong to the same cluster (and I/O Group). However, this functionality is better suited to Metro Mirror.
SVC supports intercluster Global Mirror, where each volume belongs to its own separate SVC cluster. A given SVC cluster can be configured for partnership with between one and three other clusters. This is Multi-Cluster Mirroring (introduced in Release 5.1.0).
Warnings: Clusters running software 6.1.0 or higher cannot form partnerships with clusters running software lower than 4.3.1. SAN Volume Controller clusters cannot form partnerships with Storwize V7000 clusters and vice versa.
Intercluster and intracluster Global Mirror can be used concurrently within a cluster for separate relationships.
SVC does not require a control network or fabric to be installed to manage Global Mirror. For intercluster Global Mirror, the SVC maintains a control link between the two clusters. This control link is used to control the state and to coordinate updates at either end. The control link is implemented on top of the same FC fabric connection that the SVC uses for Global Mirror I/O. Note: Although not separate, this control link does require a dedicated portion of intercluster link bandwidth.
SVC implements a configuration state model that maintains the Global Mirror configuration and state through major events, such as failover, recovery, and re-synchronization.
SVC implements flexible re-synchronization support, enabling it to resynchronize volume pairs that have experienced write I/Os to both disks and to re-synchronize only those regions that are known to have changed.
Colliding writes are supported.
An optional feature for Global Mirror permits a delay simulation to be applied on writes that are sent to auxiliary volumes.
Remote Copy maintains write consistency: it ensures that, while the primary VDisk and the secondary VDisk are synchronized, the VDisks stay synchronized even in the case of a failure in the primary cluster or other failures that cause the results of writes to be uncertain.
Of particular importance with respect to GM/MM are the following parameters:

partnership (GM) bandwidth: Specifies the rate, in megabytes per second (MBps), at which the (background copy) write resynchronization process is attempted. From release 5.1.0 onwards, this parameter has no default value (previously 50 MBps).

relationship_bandwidth_limit (default 25): (Optional) Specifies the background copy bandwidth in megabytes per second (MBps), from 1 - 1000. The default is 25 MBps. This parameter operates cluster-wide and defines the maximum background copy bandwidth that any relationship can adopt. The existing background copy bandwidth settings defined on a partnership continue to operate, with the lower of the partnership and VDisk rates attempted. Note: Do not set this value higher than the default without establishing that the higher bandwidth can be sustained.

gm_link_tolerance (default 300): (Optional) Specifies the length of time, in seconds, for which an inadequate intercluster link is tolerated for a Global Mirror operation. The parameter accepts values from 20 to 86,400 seconds in steps of 10 seconds. The default is 300 seconds. You can disable the link tolerance by entering a value of zero (0) for this parameter. Note: For later releases there is no default setting; this parameter must be explicitly defined by the client.

gm_max_host_delay (default 5): (Optional) Specifies the maximum time delay, in milliseconds, above which the Global Mirror link tolerance timer starts counting down. This threshold value determines the additional impact that Global Mirror operations can add to the response times of the Global Mirror source volumes. You can use this parameter to increase the threshold from the default value of 5 milliseconds.

gm_inter_cluster_delay_simulation (default 0): (Optional) Specifies the intercluster delay simulation, which simulates the Global Mirror round-trip delay between two clusters, in milliseconds. The default is 0; the valid range is 0 to 100 milliseconds.

gm_intra_cluster_delay_simulation (default 0): (Optional) Specifies the intracluster delay simulation, which simulates the Global Mirror round-trip delay within a cluster, in milliseconds. The default is 0; the valid range is 0 to 100 milliseconds.
svctask chpartnership -stop cluster1
For more details on using MM/GM commands, see the Redbook Implementing the IBM System Storage SAN Volume Controller, SG24-7933, or use the command-line help option (-h).
Figure 7-10 shows an example of Global Mirror resources that are not optimized. Volumes from the Local Cluster are replicated to the Remote Cluster, where all volumes with a preferred node of Node 1 are replicated to target volumes that also have a preferred node of Node 1. With this configuration, the Remote Cluster Node 1 resources reserved for Local Cluster Node 2 are not used, nor are the Remote Cluster Node 2 resources reserved for Local Cluster Node 1.
If the configuration is changed to the configuration shown in Figure 7-11, all Global Mirror resources for each node are used, and SVC Global Mirror operates with better performance than that of the configuration shown in Figure 7-10.
Increasing Latency of Foreground I/O: If the GM bandwidth parameter is set too high with respect to the actual intercluster link capability, the background copy resynchronization writes consume too much of the intercluster link, starving the link of the ability to service asynchronous Mirrored Foreground Writes. Delays in processing the Mirrored Foreground Writes increase the latency of the foreground I/O as perceived by applications.

Read I/O overload of Primary Storage: If the GM bandwidth parameter (background copy rate) is set too high, the additional read I/Os associated with background copy writes can overload the storage at the primary site and delay foreground (read and write) I/Os.

Write I/O overload of Auxiliary Storage: If the GM bandwidth parameter (background copy rate) is set too high for the storage at the secondary site, background copy writes overload the secondary storage, and again delay the asynchronous Mirrored Foreground Write I/Os.

Note: An increase in the peak foreground workload also has a detrimental effect on foreground I/O, by pushing more mirrored foreground write traffic along the intercluster link (which may not have the bandwidth to sustain it) and potentially overloading the primary storage.

To set the background copy bandwidth optimally, make sure that you take into consideration all aspects of your environment. The three biggest contributing resources are the primary storage, the intercluster link bandwidth, and the secondary storage. As discussed, changes in the environment, or in its loading, may result in foreground I/O being impacted. SVC provides the client with a means of monitoring, and a parameter for controlling, how foreground I/O is affected by running remote copy processes. SVC code monitors the delivery of the Mirrored Foreground Writes, and if the latency or performance of these writes extends beyond a (predefined or client-defined) limit for a defined period of time, the remote copy relationship is suspended.
This cut-off valve parameter is called gmlinktolerance.
To detect these specific scenarios, Global Mirror measures the time taken to perform the messaging to assign and record the sequence number for a write I/O. If this process exceeds the expected average over a period of 10 seconds, then this period is treated as being overloaded. Users set maxhostdelay and gmlinktolerance to control how the software responds to these delays. maxhostdelay is a value in milliseconds that can go up to 100. Every 10 seconds, Global Mirror takes a sample of all Global Mirror writes and determines how much delay it added. If over half of these writes were delayed by more than maxhostdelay, that sample period is marked as bad. The software keeps a running count of bad periods: each time there is a bad period, this count goes up by one; each time there is a good period, this count goes down by one, to a minimum value of 0. If the link is overloaded for a number of consecutive seconds greater than the gmlinktolerance value, then a 1920 (or another GM-related error code) is recorded against the volume that has consumed the most Global Mirror resource over recent time. A period without overload decrements the count of consecutive periods of overload, so an error log will also be raised if, over any given period of time, the amount of time in overload exceeds the amount of non-overloaded time by gmlinktolerance.
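The sampling and counting behaviour described above can be sketched as a toy model (hypothetical Python, not SVC code; it assumes 10-second sample periods and fires once the accumulated bad periods cover gmlinktolerance seconds, per the text):

```python
SAMPLE_PERIOD_S = 10  # Global Mirror samples its writes every 10 seconds

def fires_1920(delays_per_period_ms, max_host_delay_ms, gm_link_tolerance_s):
    """Walk the sample periods; return True if the 1920 error condition fires.

    delays_per_period_ms: for each 10 s period, the list of delays (ms) that
    GM added to each write in that period.
    """
    bad_periods_needed = gm_link_tolerance_s / SAMPLE_PERIOD_S
    bad_count = 0
    for delays in delays_per_period_ms:
        over = sum(1 for d in delays if d > max_host_delay_ms)
        if delays and over > len(delays) / 2:  # over half the writes delayed: bad
            bad_count += 1
        else:                                  # good period: count decays toward 0
            bad_count = max(0, bad_count - 1)
        if bad_count >= bad_periods_needed:
            return True
    return False

# Minimum settings (1 ms, 20 s): two consecutive bad periods suffice.
periods = [[5.0], [5.0]]  # one slow write in each 10 s sample period
print(fires_1920(periods, max_host_delay_ms=1, gm_link_tolerance_s=20))  # True
```

Note how a good period between two bad ones resets progress toward the trigger, which is why sustained rather than momentary overload is what normally produces a 1920.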
Edge Case
The worst possible situation is achieved by setting the gm_max_host_delay and gmlinktolerance parameters to their minimum settings (1 ms and 20 s). With these settings, only two consecutive bad sample periods are needed before a 1920 error condition fires. If the foreground write I/O load is very light, say a single I/O in 20 s, then with some very unlucky timing, a single "Bad I/O" (that is, a write I/O that took over 1 ms in RC) that spans the boundary of two 10 s sample periods could theoretically be counted as two bad periods and trigger a 1920. A higher gmlinktolerance or gm_max_host_delay setting, or a higher I/O load, would all reduce the risk of encountering this edge case.
We recommend that you use a SAN performance monitoring tool, such as IBM Tivoli Storage Productivity Center, which allows you to continuously monitor the SAN components for error conditions and performance problems. Tivoli Storage Productivity Center can alert you as soon as there is a performance problem or if a Global Mirror (or Metro Mirror) link has been automatically suspended by the SVC. A remote copy relationship that remains stopped without intervention can severely impact your recovery point objective. Additionally, restarting a link that has been suspended for a long period of time adds burden to your links while the synchronization catches up. The gmlinktolerance parameter of the remote copy partnership must be set to an appropriate value. The default value of 300 seconds (5 minutes) is appropriate for most clients. If you plan to perform SAN maintenance that might impact SVC GM relationships:
Pick a maintenance window where application I/O workload is reduced for the duration of the maintenance.
Disable the gmlinktolerance feature or increase the gmlinktolerance value (meaning that application hosts might see extended response times from Global Mirror Volumes).
Stop the Global Mirror relationships.
Problems are not always related to Remote Copy Services or the intercluster link, but rather to hot spots on the disk subsystems. Be sure these problems are resolved. Is your secondary storage capable of handling the additional workload it receives? This is basically the same back-end workload as generated by the primary applications.
To maximize the number of I/Os that applications can perform to Global Mirror and Metro Mirror Volumes:
Global Mirror and Metro Mirror Volumes at the remote cluster must be in dedicated MDisk Groups. The MDisk Groups must not contain non-mirror Volumes.
Storage controllers must be configured to support the mirror workload that is required of them, which might be achieved by:
Dedicating storage controllers to only Global Mirror and Metro Mirror Volumes
Configuring the controller to guarantee sufficient quality of service for the disks used by Global Mirror and Metro Mirror
Ensuring that physical disks are not shared between Global Mirror or Metro Mirror Volumes and other I/O
Verifying that MDisks within a mirror MDisk Group are similar in their characteristics (for example, Redundant Array of Independent Disks (RAID) level, physical disk count, and disk speed)
4. Stop each mirror relationship by using the -access option, which enables write access to the target VDisks. We will need this write access later.
5. Make a copy of the source Volume to the alternate media by using the dd command to copy the contents of the Volume to tape. Another option might be using your backup tool (for example, IBM Tivoli Storage Manager) to make an image backup of the Volume.
Note: Even though the source is being modified while you are copying the image, the SVC is tracking those changes. The image that you create might already have some of the changes and is likely to have missed some of the changes as well. When the relationship is restarted, the SVC will apply all of the changes that occurred since the relationship was stopped in step 4. After all the changes are applied, you will have a consistent target image.
6. Ship your media to the remote site and apply the contents to the targets of the Metro/Global Mirror relationship; you can mount the Metro Mirror and Global Mirror target Volumes on a UNIX server and use the dd command to copy the contents of the tape to the target Volume. If you used your backup tool to make an image of the Volume, follow
the instructions for your tool to restore the image to the target Volume. Do not forget to remove the mount, if this is a temporary host.
Note: It does not matter how long it takes to get your media to the remote site and perform this step; however, the quicker you can get the media to the remote site and loaded, the sooner the SVC is running and maintaining the Metro Mirror and Global Mirror relationships.
7. Unmount the target volumes from your host. When you start the Metro Mirror and Global Mirror relationship later, the SVC will stop write access to the Volume while the mirror relationship is running.
8. Start your Metro Mirror and Global Mirror relationships. While the mirror relationship catches up, the target Volume is not usable at all. As soon as it reaches Consistent Copying, your remote Volume is ready for use in a disaster.
7. Start the new Metro Mirror or Global Mirror relationship.
8. Remap the source Volumes to the host if you unmapped them in step 3.
9. Start the host and the application.
Extremely important: If the relationship is not stopped in the consistent state, or if any host I/O takes place between stopping the old Metro Mirror or Global Mirror relationship and starting the new Metro Mirror or Global Mirror relationship, those changes will never be mirrored to the target volumes. As a result, the data on the source and target volumes is not exactly the same, and the SVC will be unaware of the inconsistency.
Software level restrictions for Multiple Cluster Mirroring:
A partnership between a cluster running 6.1.0 and a cluster running a version earlier than 4.3.1 is not supported.
Clusters in a partnership where one cluster is running 6.1.0 and the other is running 4.3.1 cannot participate in additional partnerships with other clusters.
Clusters that are all running either 6.1.0 or 5.1.0 can participate in up to three cluster partnerships.
Note: SVC 6.1 supports object names up to 63 characters. Previous levels only supported up to 15 characters. When SVC 6.1 clusters are partnered with 4.3.1 and 5.1.0 clusters, various object names will be truncated at 15 characters when displayed from 4.3.1 and 5.1.0 clusters.
Figure 7-13 shows four clusters in a star topology, with cluster A at the center, where cluster A can be a central DR site for the three other locations. Using a star topology, you can migrate applications by using a process like the one described in the following example:
1. Suspend the application at A.
2. Remove the A-B relationship.
3. Create the A-C relationship (or alternatively, the B-C relationship).
4. Synchronize to cluster C, and ensure that A-C is established.
Three clusters in a triangle topology: a potential use case here could be that data center B is being migrated to C. If we assume that data center A is the host production site, and that both B and C are DR sites, it is possible to migrate different applications at different times using a process like:
1. Suspend the application at A.
2. Take down the A-B relationship.
3. Create the A-C relationship (or alternatively, the B-C relationship).
4. Synchronize to C, and ensure that A-C is established.
Migrating different applications over a series of weekends in this way provides a phased migration capability.
Figure 7-15 shows a fully connected mesh where every cluster has a partnership with each of the three other clusters. This allows volumes to be replicated between any pair of clusters.
Note: This configuration is not recommended, unless relationships are needed between every pair of clusters. Intercluster zoning should be restricted to where necessary only.
Note that although clusters can have up to three partnerships, a volume can be part of only one Remote Copy relationship, for example, A-B.
This configuration is unsupported, because five clusters are indirectly connected. If the cluster can detect this situation at the time of the fourth mkpartnership command, the command is rejected with an error message. Sometimes detection is not possible; in that case, an error appears in the error log of each cluster in the connected set.
Important: The SVC only supports copy services between two clusters.
In Figure 7-18, the primary site uses SVC copy services (Global Mirror or Metro Mirror) to replicate to the secondary site. In the event of a disaster at the primary site, the storage administrator enables access to the target volume at the secondary site, and the business application continues processing. While the business continues processing at the secondary site, the storage controller copy services replicate to the third site.
The storage controller might, as part of its Advanced Copy Services function, take a LUN offline or suspend reads or writes. The SVC does not understand why this happens; therefore, the SVC might log errors when these events occur. If you mask target LUNs to the SVC and rename your MDisks as you discover them, and the Advanced Copy Services function later prohibits access to the LUN as part of its processing, the MDisk might be discarded and rediscovered with an SVC-assigned MDisk name.
One method of avoiding this latency is to temporarily stop the Metro Mirror or Global Mirror relationship before preparing the FlashCopy mapping. When the Metro Mirror or Global Mirror relationship is stopped, the SVC records all changes that occur to the source volumes and applies those changes to the target when the remote copy mirror is restarted. The steps to temporarily stop the Metro Mirror or Global Mirror relationship before preparing the FlashCopy mapping are:
1. Stop each mirror relationship by using the -access option, which enables write access to the target volumes. We will need this access later.
2. Make a copy of the source volume to the alternate media by using the dd command to copy the contents of the volume to tape. Another option is to use your backup tool (for example, IBM Tivoli Storage Manager) to make an image backup of the volume.
Note: Even though the source is being modified while you are copying the image, the SVC is tracking those changes. The image that you create might already contain part of the changes and is likely to have missed part of them as well. When the relationship is restarted, the SVC applies all changes that have occurred since the relationship was stopped in step 1. After all the changes are applied, you will have a consistent target image.
3. Ship your media to the remote site and apply the contents to the targets of the Metro Mirror or Global Mirror relationship. You can mount the Metro Mirror and Global Mirror target volumes on a UNIX server and use the dd command to copy the contents of the tape to the target volume. If you used a backup tool to make an image of the volume, follow the instructions for your tool to restore the image to the target volume. Do not forget to remove the mount if this is a temporary host.
Note: It does not matter how long it takes to get your media to the remote site and perform this step. However, the sooner you can get it to the remote site and loaded, the sooner the SVC is running and maintaining the Metro Mirror and Global Mirror relationships.
4. Unmount the target volumes from your host. When you start the Metro Mirror and Global Mirror relationship later, the SVC stops write access to the volume while the mirror relationship is running.
5. Start your Metro Mirror and Global Mirror relationships. While the mirror relationship catches up, the target volume is not usable at all. As soon as it reaches the ConsistentSynchronized state, your remote volume is ready for use in a disaster.
Additional notes: If all clusters are running software version 5.1.0 or later, each cluster can be partnered with up to three other clusters; this configuration is what enables Multiple Cluster Mirroring. If a cluster is running a software level earlier than version 5.1.0, it can be partnered with only one other cluster.
Figure 7-20 Considerations for MM / GM and FlashCopy relationships prior to SVC 6.2
Figure 7-21 Considerations for MM / GM and FlashCopy relationships with SVC 6.2
Then, the administrator must ensure that the following commands are issued:
A new Global Mirror relationship is created, by using mkrcrelationship with the -sync flag.
The new relationship is started, by using startrcrelationship with the -clean flag.
Attention: Failure to perform these steps correctly can cause Global Mirror to report the relationship as consistent when it is not, thereby creating a data loss or data integrity exposure for hosts accessing data on the auxiliary volume.
Otherwise, you must specify the -force option, and the Global Mirror relationship then enters the InconsistentCopying state while the background copy is started.
All new write I/O received since the relationship started is processed through the background copy processes, and as such is subject to the sequencing and ordering of the Metro Mirror or Global Mirror internal processes, which differ from the real-world ordering of the application.
At background copy completion, the relationship enters the ConsistentSynchronized state. All new write I/O is replicated as it is received from the host. In a ConsistentSynchronized relationship, the primary and secondary volumes differ only in regions where writes from the host are outstanding. In this state, the target volume is also available in read-only mode.
As the state diagram shows, there are two possible states that a relationship can enter from ConsistentSynchronized:
ConsistentStopped: the state entered when a 1920 error is posted.
Idling: both source and target volumes have a common point-in-time consistent state, and both are made available in read/write mode. Write available means that both could be used to service host applications, but any additional writing to volumes in this state causes the relationship to become inconsistent.
Note: Moving on from this point usually involves a period of inconsistent copying and therefore a loss of redundancy. Errors occurring in this state become even more critical, because an InconsistentStopped volume does not provide a known consistent level of redundancy; it is unavailable for either read-only or read/write access.
InconsistentStopped
InconsistentStopped is a connected state. In this state, the master is accessible for read and write I/O, but the auxiliary is inaccessible for either read or write I/O. A copy process needs to be started to make the auxiliary consistent. This state is entered when the relationship or Consistency Group was InconsistentCopying and has either suffered a persistent error or received a stop command that has caused the copy process to stop. A start command causes the relationship or Consistency Group to move to the InconsistentCopying state. A stop command is accepted, but has no effect. If the relationship or Consistency Group becomes disconnected, the auxiliary side transitions to InconsistentDisconnected. The master side transitions to IdlingDisconnected.
InconsistentCopying
InconsistentCopying is a connected state. In this state, the master is accessible for read and write I/O, but the auxiliary is inaccessible for either read or write I/O.
This state is entered after a start command is issued to an InconsistentStopped relationship or Consistency Group. It is also entered when a forced start is issued to an Idling or ConsistentStopped relationship or Consistency Group.
In this state, a background copy process runs, which copies data from the master to the auxiliary volume. In the absence of errors, an InconsistentCopying relationship is active, and the copy progress increases until the copy process completes. In certain error situations, the copy progress might freeze or even regress.
A persistent error or stop command places the relationship or Consistency Group into the InconsistentStopped state. A start command is accepted, but has no effect.
If the background copy process completes on a stand-alone relationship, or on all relationships for a Consistency Group, the relationship or Consistency Group transitions to the ConsistentSynchronized state.
If the relationship or Consistency Group becomes disconnected, the auxiliary side transitions to InconsistentDisconnected. The master side transitions to IdlingDisconnected.
ConsistentStopped
ConsistentStopped is a connected state. In this state, the auxiliary contains a consistent image, but it might be out-of-date with respect to the master. This state can arise when a relationship is in the ConsistentSynchronized state and experiences an error that forces a Consistency Freeze. It can also arise when a relationship is created with the CreateConsistentFlag set to true.
Normally, following an I/O error, subsequent write activity causes updates to the master, and the auxiliary is no longer synchronized (the synchronized flag is set to false). In this case, to reestablish synchronization, consistency must be given up for a period. A start command with the -force option must be used to acknowledge this situation, and the relationship or Consistency Group transitions to InconsistentCopying. Issue this command only after all of the outstanding events are repaired.
In the unusual case where the master and auxiliary are still synchronized (perhaps following a user stop, when no further write I/O was received), a start command takes the relationship to ConsistentSynchronized; no -force option is required. Also, in this unusual case, a switch command is permitted that moves the relationship or Consistency Group to ConsistentSynchronized and reverses the roles of the master and the auxiliary.
If the relationship or Consistency Group becomes disconnected, the auxiliary side transitions to ConsistentDisconnected. The master side transitions to IdlingDisconnected.
An informational status log is generated every time a relationship or Consistency Group enters the ConsistentStopped state with a status of Online. This event can be configured to raise an SNMP trap, providing a trigger to automation software to consider issuing a start command following a loss of synchronization.
ConsistentSynchronized
This is a connected state. In this state, the master volume is accessible for read and write I/O. The auxiliary volume is accessible for read-only I/O. Writes that are sent to the master volume are sent to both master and auxiliary volumes. Either successful completion must be received for both writes, or the write must be failed to the host, or a state transition out of the ConsistentSynchronized state must occur before the write is completed to the host.
A stop command takes the relationship to the ConsistentStopped state. A stop command with the -access parameter takes the relationship to the Idling state. A switch command leaves the relationship in the ConsistentSynchronized state, but reverses the master and auxiliary roles. A start command is accepted, but has no effect.
If the relationship or Consistency Group becomes disconnected, the same transitions are made as for ConsistentStopped.
Idling
Idling is a connected state. Both master and auxiliary disks operate in the master role. Consequently, both master and auxiliary disks are accessible for write I/O.
In this state, the relationship or Consistency Group accepts a start command. Global Mirror maintains a record of regions on each disk that received write I/O while Idling. This record is used to determine what areas need to be copied following a start command.
The start command must specify the new copy direction. A start command can cause a loss of consistency if either volume in any relationship has received write I/O, which is indicated by the synchronized status. If the start command leads to a loss of consistency, you must specify the -force parameter.
Following a start command, the relationship or Consistency Group transitions to ConsistentSynchronized if there is no loss of consistency, or to InconsistentCopying if there is a loss of consistency. Also, while in this state, the relationship or Consistency Group accepts a -clean option on the start command.
If the relationship or Consistency Group becomes disconnected, both sides change their state to IdlingDisconnected.
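The connected-state behavior described in the preceding sections can be summarized as a transition table. The following Python sketch is purely illustrative (it is not SVC code); event names such as start_force and stop_access are our own shorthand for the commands and options described in the text, and only the transitions described here are modeled:

```python
# Hypothetical sketch of the Metro Mirror / Global Mirror connected states.
# State names follow the text; event names are shorthand for CLI actions
# (start_force = start with -force, stop_access = stop with -access).
TRANSITIONS = {
    ("InconsistentStopped", "start"): "InconsistentCopying",
    ("InconsistentStopped", "stop"): "InconsistentStopped",          # accepted, no effect
    ("InconsistentCopying", "copy_complete"): "ConsistentSynchronized",
    ("InconsistentCopying", "stop"): "InconsistentStopped",
    ("InconsistentCopying", "persistent_error"): "InconsistentStopped",
    ("ConsistentStopped", "start_force"): "InconsistentCopying",
    ("ConsistentStopped", "start"): "ConsistentSynchronized",        # only if still synchronized
    ("ConsistentSynchronized", "stop"): "ConsistentStopped",
    ("ConsistentSynchronized", "stop_access"): "Idling",
    ("ConsistentSynchronized", "switch"): "ConsistentSynchronized",  # roles reversed
    ("Idling", "start"): "ConsistentSynchronized",                   # no loss of consistency
    ("Idling", "start_force"): "InconsistentCopying",                # consistency given up
}

def next_state(state: str, event: str) -> str:
    """Return the next relationship state; raises KeyError for unmodeled events."""
    return TRANSITIONS[(state, event)]
```

For example, a relationship started from InconsistentStopped moves to InconsistentCopying, and on background copy completion it reaches ConsistentSynchronized.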
that the remote copies are consistent. This work can add a delay to the local write; normally, this delay is low. Users set the maxhostdelay and gmlinktolerance parameters to control how the software responds to these delays. maxhostdelay is a value in milliseconds and can be set up to 100. Every 10 seconds, Global Mirror takes a sample of all Global Mirror writes and determines how much delay it added. If over half of these writes were delayed by more than maxhostdelay, that sample period is marked as bad. The software keeps a running count of bad periods: each bad period increases the count by one, and each good period decreases it by one, to a minimum value of 0. The gmlinktolerance parameter dictates the maximum allowable count of bad periods. It is given in seconds, in multiples of 10; its value is divided by 10 and used as the maximum bad-period count. Thus, if gmlinktolerance is 300 seconds, the maximum bad-period count is 30. When this count is reached, the 1920 error is raised. Bad periods do not need to be consecutive: 10 bad periods, followed by 5 good periods, followed by 10 bad periods, results in a bad-period count of 15.
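The bad-period accounting described above can be sketched as a short simulation. This is an illustrative model of the documented behavior, not SVC code:

```python
def bad_period_count(periods, gmlinktolerance=300):
    """Simulate the running bad-period counter described in the text.

    periods: iterable of booleans, one per 10-second sample window
             (True = bad period, that is, over half of the sampled
             Global Mirror writes exceeded maxhostdelay).
    Returns (final_count, fired), where fired is True if the count ever
    reached gmlinktolerance / 10, the point at which a 1920 is raised.
    """
    limit = gmlinktolerance // 10   # e.g. 300 s -> maximum bad-period count of 30
    count = 0
    fired = False
    for bad in periods:
        if bad:
            count += 1
        elif count > 0:
            count -= 1              # good periods decrement, floored at zero
        if count >= limit:
            fired = True
    return count, fired
```

Running the example from the text, 10 bad periods, then 5 good, then 10 bad, leaves a bad-period count of 15 and does not trigger a 1920 at the default gmlinktolerance of 300 seconds.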
Note: Contact IBM Level 2 Support for assistance in collecting log information for 1920 errors. They can provide collection scripts that can be used during problem re-creates, or deployed during proof-of-concept activities.
Intercluster link
For diagnostic purposes, ask the following questions regarding the intercluster link:
Was link maintenance being performed? Consider hardware or software maintenance associated with the intercluster link, for example, updating firmware or adding capacity.
Is the intercluster link overloaded? Indications of an overload can be found by statistical analysis of inter-node communications and storage controller performance, using I/O stats and/or Tivoli Storage Productivity Center. Using Tivoli Storage Productivity Center, you can check the storage metrics from before the GM relationships were stopped (this period might be tens of minutes, depending on gmlinktolerance). Diagnose an overloaded link by using the following indicators:
a) High response time for inter-node communication. An overloaded long-distance link causes high response times in the inter-node messages sent by the SVC. If delays persist, the messaging protocols exhaust their tolerance elasticity, and the GM protocol is forced to delay handling new foreground writes while waiting for resources to free up.
b) Storage metrics (before the 1920 error was posted).
Chapter 7. Remote Copy services
If the write throughput on the target volume is approximately equal to your link bandwidth, it is extremely likely that your link is overloaded. Check what is driving this condition: does peak foreground write activity exceed the bandwidth, or does a combination of this peak I/O and background copy exceed the link capacity?
Source volume write throughput approaches the link bandwidth: This write throughput represents only the I/O performed by the application hosts. If this number approaches the link bandwidth, you might need to upgrade the link's bandwidth, reduce the foreground write I/O that the application is attempting to perform, or reduce the number of Remote Copy relationships.
Target volume write throughput greater than source volume write throughput: This condition suggests a high level of background copy in addition to the mirrored foreground write I/O. Under these circumstances, decrease the GM partnership's background copy rate parameter to bring the combined mirrored foreground I/O and background copy I/O rate back within the remote link's bandwidth.
Storage metrics (after the 1920 error was posted):
Source volume write throughput after the GM relationships were stopped: If write throughput increases greatly (by 30% or more) after the GM relationships were stopped, the application host was attempting to perform more I/O than the remote link can sustain. While the GM relationships are active, the overloaded remote link causes higher response times to the application host, which in turn decreases the throughput of application host I/O at the source volume. Once the GM relationships have stopped, the application host I/O sees lower response times, and the true write throughput returns. To resolve this issue: i) increase the remote link bandwidth, ii) reduce the application host I/O, or iii) reduce the number of GM relationships.
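As a rough aid, the two bandwidth checks described above can be expressed as simple arithmetic. This is an illustrative sketch only; the throughput numbers would come from your I/O stats or Tivoli Storage Productivity Center data:

```python
def link_overloaded(foreground_mbps, background_mbps, link_mbps):
    """True if mirrored foreground writes plus background copy exceed the link."""
    return foreground_mbps + background_mbps > link_mbps

def host_was_throttled(write_mbps_before, write_mbps_after, threshold=0.30):
    """True if source write throughput rose by `threshold` (30% by default)
    after the GM relationships stopped - the signature, described in the text,
    of a host attempting more I/O than the remote link can sustain."""
    return write_mbps_after >= write_mbps_before * (1 + threshold)
```

For example, 80 MBps of foreground writes plus 30 MBps of background copy overloads a 100 MBps link, and a jump from 100 MBps to 140 MBps after stopping the relationships indicates the link was throttling the host.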
Storage Controllers
Investigate the primary and remote storage controllers, starting at the remote site. If the back-end storage at the secondary cluster is overloaded, or another problem impacts the cache there, the GM protocol fails to keep up; this similarly exhausts the gmlinktolerance elasticity and has a similar impact at the primary cluster.
Are the storage controllers at the remote cluster overloaded (performing slowly)? Use Tivoli Storage Productivity Center to obtain the back-end write response time for each MDisk at the remote cluster. A response time for any individual MDisk that exhibits a sudden increase of 50 ms or more, or that is higher than 100 ms, generally indicates a problem with the back end.
Note: Any MDisk on the remote back-end storage controller that is providing poor response times can be the underlying cause of a 1920 error, that is, if the response prevents application I/O from proceeding at the rate required by the application host, the gmlinktolerance threshold is exceeded and the 1920 error is raised. However, if you have followed the specified back-end storage controller requirements and have been running without problems until recently, it is most likely that the error was caused by a decrease in controller performance due to maintenance actions or a hardware failure of the controller. Check the following items:
Is there an error condition on the storage controller, such as media errors, a failed physical disk, or a recovery activity (such as a RAID array rebuild) taking additional bandwidth? If there is an error, fix the problem and restart the Global Mirror relationships. If there is no error, consider whether the secondary controller is capable of processing the required level of application host I/O. It might be possible to improve the performance of the controller by:
Adding more, or faster, physical disks to a RAID array
Changing the RAID level of the array
Changing the controller's cache settings (and checking that the cache batteries are healthy, if applicable)
Changing other controller-specific configuration parameters
Are the storage controllers at the primary site overloaded? Analyze the performance of the primary back-end storage by using the same steps you use for the remote back-end storage. The main effect of bad performance is to limit the amount of I/O that can be performed by application hosts; therefore, back-end storage at the primary site must be monitored regardless of Global Mirror. However, if bad performance continues for a prolonged period, a false 1920 error might be flagged. For example, the algorithms that assess the impact of running Global Mirror can incorrectly interpret slow foreground write activity, and the slow background write activity associated with it, as being slow as a consequence of running Global Mirror, and the Global Mirror relationships will stop.
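The back-end response-time thresholds given above (a sudden increase of 50 ms or more, or a value higher than 100 ms) can be expressed as a simple check. This sketch is illustrative only; the response times would come from Tivoli Storage Productivity Center:

```python
def mdisk_backend_alert(prev_ms, curr_ms):
    """Flag an MDisk whose back-end write response time shows a sudden
    increase of 50 ms or more, or exceeds 100 ms outright, per the
    guidance in the text."""
    return (curr_ms - prev_ms) >= 50 or curr_ms > 100
```

For example, a jump from 10 ms to 65 ms trips the sudden-increase rule, and a steady 101 ms trips the absolute threshold.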
7.10.3 Recovery
After a 1920 error has occurred, the Global Mirror auxiliary VDisks are no longer in the ConsistentSynchronized state. The cause of the problem must be established and fixed before the relationship can be restarted; once restarted, the relationship needs to resynchronize. During this period, the data on the Metro Mirror or Global Mirror auxiliary VDisks on the secondary cluster is inconsistent, and the VDisks cannot be used as backup disks by your applications.
Note: If the relationship stopped in a consistent state, it is possible to use the data on the auxiliary volume at the remote cluster as a backup. Creating a FlashCopy of this volume before restarting the relationship gives additional data protection: the FlashCopy volume maintains the current, consistent image until the Metro Mirror or Global Mirror relationship is again synchronized and back in a consistent state.
To ensure that the system has the capacity to handle the background copy load, you might want to delay restarting the Metro Mirror or Global Mirror relationship until a quiet period. If the required link capacity is not available, you might experience another 1920 error, and the Metro Mirror or Global Mirror relationship will stop in an inconsistent state.
svcinfo lsrcconsistgrp -filtervalue state=consistent_stopped -nohdr -delim : |
while IFS=: read id name mci mcn aci acn p state junk; do
  echo "Restarting group: $name ($id)"
  svctask startrcconsistgrp -force $name
  echo "Clearing errors..."
  svcinfo lserrlogbyrcconsistgrp -unfixed $name |
  while read id type fixed snmp err_type node seq_num junk; do
    if [ "$id" != "id" ]; then
      echo "Marking $seq_num as fixed"
      svctask cherrstate -sequencenumber $seq_num
    fi
  done
done

svcinfo lsrcrelationship -filtervalue state=consistent_stopped -nohdr -delim : |
while IFS=: read id name mci mcn mvi mvn aci acn avi avn p cg_id cg_name state junk; do
  if [ "$cg_id" == "" ]; then
    echo "Restarting relationship: $name ($id)"
    svctask startrcrelationship -force $name
    echo "Clearing errors..."
    svcinfo lserrlogbyrcrelationship -unfixed $name |
    while read id type fixed snmp err_type node seq_num junk; do
      if [ "$id" != "id" ]; then
        echo "Marking $seq_num as fixed"
        svctask cherrstate -sequencenumber $seq_num
      fi
    done
  fi
done
On the secondary cluster, review the maintenance logs to determine whether the cluster was operating with reduced capability at the time the error was reported. The reduced capability might be due to a software upgrade, hardware maintenance to a 2145 node, maintenance to a back-end disk subsystem, or maintenance to the SAN.
On the secondary 2145 cluster, correct any errors that are not fixed.
On the intercluster link, review the logs of each link component for any incidents that would cause reduced capability at the time of the error. Ensure that the problems are fixed.
On the primary and secondary clusters reporting the error, examine the internal I/O stats.
On the intercluster link, examine the performance of each component by using an appropriate SAN productivity monitoring tool to ensure that the components are operating as expected. Resolve any issues.
The response time must also be less than 100 ms. If the response time is greater than 100 ms, application hosts might see extended response times if the SVC's cache becomes full.
Write data rate for Global Mirror MDisk groups at the remote cluster: This data rate indicates the amount of data that is being written by Global Mirror. If this number approaches either the intercluster link bandwidth or the storage controller throughput limit, be aware that further increases can cause overloading of the system, and monitor this number appropriately.
Note: IBM Support has a number of automated systems that support analysis of Tivoli Storage Productivity Center data. These systems rely on the default naming conventions (file names) being used. The default names for Tivoli Storage Productivity Center files are: StorageSubsystemPerformanceByXXXXXX.csv, where XXXXXX is IOGroup, ManagedDiskGroup, ManagedDisk, Node, or Volume.
Hints and Tips for Tivoli Storage Productivity Center stats collection
Analysis requires either Tivoli Storage Productivity Center statistics (CSV) or SVC raw statistics (XML). You can export statistics from your Tivoli Storage Productivity Center instance. Because these files grow large very quickly, you might want to limit their size. For instance, you can filter the stats files so that individual records that are below a certain threshold are not exported.
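As an illustration of this kind of filtering, the following sketch trims a CSV export so that rows below a threshold are dropped. The column names here are placeholders, not the actual Tivoli Storage Productivity Center headers; match them to your own export:

```python
import csv
import io

def filter_stats(csv_text, column, threshold):
    """Return CSV text containing only the rows of `csv_text` whose `column`
    value is at or above `threshold`. The column name is a placeholder -
    substitute the header used in your Tivoli Storage Productivity Center
    export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # Keep only records at or above the chosen threshold.
        if float(row[column]) >= threshold:
            writer.writerow(row)
    return out.getvalue()
```

For example, filtering on a hypothetical WriteMBps column with a threshold of 10 drops any volume row reporting less than 10 MBps.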
7521Hosts.fm
Chapter 8.
Hosts
This chapter describes best practices for monitoring host systems attached to the SAN Volume Controller (SVC). A host system is an Open Systems computer that is connected to the switch through a Fibre Channel (FC) interface.
The most important part of tuning, troubleshooting, and performance considerations for a host attached to an SVC lies in the host itself. There are three major areas of concern:
Using multipathing and bandwidth (the physical capability of the SAN and back-end storage)
Understanding how your host performs I/O and the types of I/O
Using measurement and test tools to determine host performance and for tuning
This chapter supplements the IBM System Storage SAN Volume Controller V6.2.0 Information Center and guides at:
https://www-304.ibm.com/support/docview.wss?uid=ssg1S4000968
http://publib.boulder.ibm.com/infocenter/svc/ic/index.jsp
We have measured the effect of multipathing on performance. The differences in performance are generally minimal, but they can reduce performance by almost 10% for specific workloads. These numbers were produced with an AIX host running the IBM Subsystem Device Driver (SDD) against the SVC. The host was tuned specifically for performance by adjusting queue depths and buffers. We tested a range of reads and writes, random and sequential, cache hits and misses, at 512 byte, 4 KB, and 64 KB transfer sizes. Table 8-1 on page 193 shows the effects of multipathing.
Table 8-1 4.3.0 effect of multipathing on write performance

R/W test                              Four paths     Eight paths    Difference
Write Hit 512 b Sequential IOPS       81 877         74 909         -8.6%
Write Miss 512 b Random IOPS          60 510.4       57 567.1       -5.0%
70/30 R/W Miss 4K Random IOPS         130 445.3      124 547.9      -5.6%
70/30 R/W Miss 64K Random MBps        1 810.8138     1 834.2696     1.3%
50/50 R/W Miss 4K Random IOPS         97 822.6       98 427.8       0.6%
50/50 R/W Miss 64K Random MBps        1 674.5727     1 678.1815     0.2%
Although these measurements were taken with 4.3.0 SVC code, the effect of the number of paths on performance is not expected to change with subsequent SVC versions.
This LUN mapping is called the Small Computer System Interface ID (SCSI ID), and the SVC software automatically assigns the next available ID if none is specified. There is also a unique identifier on each volume, called the LUN serial number.
The best practice recommendation is to allocate the SAN boot OS volume as the lowest SCSI ID (zero for most hosts) and then allocate the various data disks. While not required, if you share a volume among multiple hosts, control the SCSI ID so that the IDs are identical across the hosts. This consistency ensures ease of management at the host level. If you are using image mode to migrate a host into the SVC, allocate the volumes in the same order that they were originally assigned on the host from the back-end storage.
An invocation example:
svcinfo lshostvdiskmap -delim :
The resulting output:
id:name:SCSI_id:vdisk_id:vdisk_name:wwpn:vdisk_UID
2:host2:0:10:vdisk10:0000000000000ACA:6005076801958001500000000000000A
2:host2:1:11:vdisk11:0000000000000ACA:6005076801958001500000000000000B
2:host2:2:12:vdisk12:0000000000000ACA:6005076801958001500000000000000C
2:host2:3:13:vdisk13:0000000000000ACA:6005076801958001500000000000000D
2:host2:4:14:vdisk14:0000000000000ACA:6005076801958001500000000000000E
For example, vdisk10 has a unique device identifier (UID) of 6005076801958001500000000000000A, while the SCSI ID that host2 uses for access is 0.
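Colon-delimited output such as this is easy to post-process. The following sketch (our own helper, not part of the SVC CLI) parses the output into dictionaries keyed by the header row, using the sample output above:

```python
def parse_hostvdiskmap(lines):
    """Parse colon-delimited `svcinfo lshostvdiskmap -delim :` output
    into a list of dicts keyed by the header row."""
    header = lines[0].split(":")
    return [dict(zip(header, line.split(":"))) for line in lines[1:]]

# Two rows of the sample output from the text:
sample = [
    "id:name:SCSI_id:vdisk_id:vdisk_name:wwpn:vdisk_UID",
    "2:host2:0:10:vdisk10:0000000000000ACA:6005076801958001500000000000000A",
    "2:host2:1:11:vdisk11:0000000000000ACA:6005076801958001500000000000000B",
]
maps = parse_hostvdiskmap(sample)
```

Each entry then exposes the SCSI ID and vdisk_UID by name, which is convenient when auditing mappings across many hosts.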
svcinfo lsvdiskhostmap -delim : EEXCLS_HBin01 id:name:SCSI_id:host_id:host_name:wwpn:vdisk_UID 950:EEXCLS_HBin01:14:109:HDMCENTEX1N1:10000000C938CFDF:600507680191011D48000000000 00466 950:EEXCLS_HBin01:14:109:HDMCENTEX1N1:10000000C938D01F:600507680191011D48000000000 00466 950:EEXCLS_HBin01:13:110:HDMCENTEX1N2:10000000C938D65B:600507680191011D48000000000 00466 950:EEXCLS_HBin01:13:110:HDMCENTEX1N2:10000000C938D3D3:600507680191011D48000000000 00466 950:EEXCLS_HBin01:14:111:HDMCENTEX1N3:10000000C938D615:600507680191011D48000000000 00466 950:EEXCLS_HBin01:14:111:HDMCENTEX1N3:10000000C938D612:600507680191011D48000000000 00466 950:EEXCLS_HBin01:14:112:HDMCENTEX1N4:10000000C938CFBD:600507680191011D48000000000 00466 950:EEXCLS_HBin01:14:112:HDMCENTEX1N4:10000000C938CE29:600507680191011D48000000000 00466 950:EEXCLS_HBin01:14:113:HDMCENTEX1N5:10000000C92EE1D8:600507680191011D48000000000 00466 950:EEXCLS_HBin01:14:113:HDMCENTEX1N5:10000000C92EDFFE:600507680191011D48000000000 00466 If using IBM multipathing software (IBM Subsystem Device Driver (SDD) or SDDDSM), the command datapath query device shows the vdisk_UID (unique identifier) and so enables easier management of volumes. The SDDPCM equivalent command is pcmpath query device.
The SCSI ID field in the host mapping might not be unique for a volume for a host, because it does not completely define the uniqueness of the LUN; the target port is also used as part of the identification. If there are two I/O Groups of volumes assigned to a host port, one set starts with SCSI ID 0 and then increments (by default), and the SCSI IDs for the second I/O Group also start at zero and then increment by default. Refer to Example 8-1 on page 196 for a sample of this type of host map. Volumes s-0-6-4 and s-1-8-2 both have a SCSI ID of 1, yet they have different LUN serial numbers.
Example 8-1 Host mapping for one host from two I/O Groups
IBM_2145:ITSOCL1:admin>svcinfo lshostvdiskmap senegal
id name    SCSI_id vdisk_id vdisk_name wwpn             vdisk_UID
0  senegal 1       60       s-0-6-4    210000E08B89CCC2 60050768018101BF28000000000000A8
0  senegal 2       58       s-0-6-5    210000E08B89CCC2 60050768018101BF28000000000000A9
0  senegal 3       57       s-0-5-1    210000E08B89CCC2 60050768018101BF28000000000000AA
0  senegal 4       56       s-0-5-2    210000E08B89CCC2 60050768018101BF28000000000000AB
0  senegal 5       61       s-0-6-3    210000E08B89CCC2 60050768018101BF28000000000000A7
0  senegal 6       36       big-0-1    210000E08B89CCC2 60050768018101BF28000000000000B9
0  senegal 7       34       big-0-2    210000E08B89CCC2 60050768018101BF28000000000000BA
0  senegal 1       40       s-1-8-2    210000E08B89CCC2 60050768018101BF28000000000000B5
0  senegal 2       50       s-1-4-3    210000E08B89CCC2 60050768018101BF28000000000000B1
0  senegal 3       49       s-1-4-4    210000E08B89CCC2 60050768018101BF28000000000000B2
0  senegal 4       42       s-1-4-5    210000E08B89CCC2 60050768018101BF28000000000000B3
0  senegal 5       41       s-1-8-1    210000E08B89CCC2 60050768018101BF28000000000000B4
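Example 8-1 illustrates why vdisk_UID, not the SCSI ID, should be used to identify volumes. The following sketch (illustrative only, using a subset of the mappings from Example 8-1) finds (host, SCSI ID) pairs that map to more than one vdisk_UID:

```python
from collections import defaultdict

# (host, SCSI_id, vdisk_UID) triples taken from Example 8-1: the same SCSI ID
# appears once per I/O Group, so only vdisk_UID uniquely identifies the LUN.
mappings = [
    ("senegal", 1, "60050768018101BF28000000000000A8"),  # s-0-6-4, first I/O Group
    ("senegal", 1, "60050768018101BF28000000000000B5"),  # s-1-8-2, second I/O Group
    ("senegal", 2, "60050768018101BF28000000000000A9"),  # s-0-6-5
    ("senegal", 2, "60050768018101BF28000000000000B1"),  # s-1-4-3
]

by_scsi_id = defaultdict(set)
for host, scsi_id, uid in mappings:
    by_scsi_id[(host, scsi_id)].add(uid)

# (host, SCSI ID) pairs that refer to more than one distinct volume.
duplicates = {k: v for k, v in by_scsi_id.items() if len(v) > 1}
```

Here both SCSI ID 1 and SCSI ID 2 on host senegal resolve to two distinct vdisk_UIDs, one per I/O Group.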
Example 8-2 shows the datapath query device output of this Windows host. Note that the order of the two I/O Groups' volumes is reversed from the host map: volume s-1-8-2 is first, followed by the rest of the LUNs from the second I/O Group, then volume s-0-6-4 and the rest of the LUNs from the first I/O Group. Most likely, Windows discovered the second set of LUNs first. However, the relative order within an I/O Group is maintained.
Example 8-2 Using datapath query device for the host map
DEV#: 0 DEVICE NAME: Disk1 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B5
============================================================================
Path#                Adapter/Hard Disk    State   Mode     Select  Errors
0          Scsi Port2 Bus0/Disk1 Part0    OPEN    NORMAL        0       0
SAN Volume Controller Best Practices and Performance Guidelines
7521Hosts.fm
1 2 3
Scsi Port2 Bus0/Disk1 Part0 Scsi Port3 Bus0/Disk1 Part0 Scsi Port3 Bus0/Disk1 Part0
1342 0 1444
0 0 0
DEV#: 1 DEVICE NAME: Disk2 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B1
============================================================================
Path#              Adapter/Hard Disk   State   Mode     Select  Errors
   0    Scsi Port2 Bus0/Disk2 Part0    OPEN    NORMAL     1405       0
   1    Scsi Port2 Bus0/Disk2 Part0    OPEN    NORMAL        0       0
   2    Scsi Port3 Bus0/Disk2 Part0    OPEN    NORMAL     1387       0
   3    Scsi Port3 Bus0/Disk2 Part0    OPEN    NORMAL        0       0

DEV#: 2 DEVICE NAME: Disk3 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B2
============================================================================
Path#              Adapter/Hard Disk   State   Mode     Select  Errors
   0    Scsi Port2 Bus0/Disk3 Part0    OPEN    NORMAL     1398       0
   1    Scsi Port2 Bus0/Disk3 Part0    OPEN    NORMAL        0       0
   2    Scsi Port3 Bus0/Disk3 Part0    OPEN    NORMAL     1407       0
   3    Scsi Port3 Bus0/Disk3 Part0    OPEN    NORMAL        0       0

DEV#: 3 DEVICE NAME: Disk4 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B3
============================================================================
Path#              Adapter/Hard Disk   State   Mode     Select  Errors
   0    Scsi Port2 Bus0/Disk4 Part0    OPEN    NORMAL     1504       0
   1    Scsi Port2 Bus0/Disk4 Part0    OPEN    NORMAL        0       0
   2    Scsi Port3 Bus0/Disk4 Part0    OPEN    NORMAL     1281       0
   3    Scsi Port3 Bus0/Disk4 Part0    OPEN    NORMAL        0       0

DEV#: 4 DEVICE NAME: Disk5 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B4
============================================================================
Path#              Adapter/Hard Disk   State   Mode     Select  Errors
   0    Scsi Port2 Bus0/Disk5 Part0    OPEN    NORMAL        0       0
   1    Scsi Port2 Bus0/Disk5 Part0    OPEN    NORMAL     1399       0
   2    Scsi Port3 Bus0/Disk5 Part0    OPEN    NORMAL        0       0
   3    Scsi Port3 Bus0/Disk5 Part0    OPEN    NORMAL     1391       0

DEV#: 5 DEVICE NAME: Disk6 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A8
============================================================================
Path#              Adapter/Hard Disk   State   Mode     Select  Errors
   0    Scsi Port2 Bus0/Disk6 Part0    OPEN    NORMAL     1400       0
   1    Scsi Port2 Bus0/Disk6 Part0    OPEN    NORMAL        0       0
   2    Scsi Port3 Bus0/Disk6 Part0    OPEN    NORMAL     1390       0
   3    Scsi Port3 Bus0/Disk6 Part0    OPEN    NORMAL        0       0

DEV#: 6 DEVICE NAME: Disk7 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A9
============================================================================
Path#              Adapter/Hard Disk   State   Mode     Select  Errors
   0    Scsi Port2 Bus0/Disk7 Part0    OPEN    NORMAL     1379       0
   1    Scsi Port2 Bus0/Disk7 Part0    OPEN    NORMAL        0       0
Chapter 8. Hosts
   2    Scsi Port3 Bus0/Disk7 Part0    OPEN    NORMAL     1412       0
   3    Scsi Port3 Bus0/Disk7 Part0    OPEN    NORMAL        0       0
DEV#: 7 DEVICE NAME: Disk8 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000AA
============================================================================
Path#              Adapter/Hard Disk   State   Mode     Select  Errors
   0    Scsi Port2 Bus0/Disk8 Part0    OPEN    NORMAL        0       0
   1    Scsi Port2 Bus0/Disk8 Part0    OPEN    NORMAL     1417       0
   2    Scsi Port3 Bus0/Disk8 Part0    OPEN    NORMAL        0       0
   3    Scsi Port3 Bus0/Disk8 Part0    OPEN    NORMAL     1381       0

DEV#: 8 DEVICE NAME: Disk9 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000AB
============================================================================
Path#              Adapter/Hard Disk   State   Mode     Select  Errors
   0    Scsi Port2 Bus0/Disk9 Part0    OPEN    NORMAL        0       0
   1    Scsi Port2 Bus0/Disk9 Part0    OPEN    NORMAL     1388       0
   2    Scsi Port3 Bus0/Disk9 Part0    OPEN    NORMAL        0       0
   3    Scsi Port3 Bus0/Disk9 Part0    OPEN    NORMAL     1413       0

DEV#: 9 DEVICE NAME: Disk10 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A7
=============================================================================
Path#              Adapter/Hard Disk   State   Mode     Select  Errors
   0    Scsi Port2 Bus0/Disk10 Part0   OPEN    NORMAL     1293       0
   1    Scsi Port2 Bus0/Disk10 Part0   OPEN    NORMAL        0       0
   2    Scsi Port3 Bus0/Disk10 Part0   OPEN    NORMAL     1477       0
   3    Scsi Port3 Bus0/Disk10 Part0   OPEN    NORMAL        0       0

DEV#: 10 DEVICE NAME: Disk11 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B9
=============================================================================
Path#              Adapter/Hard Disk   State   Mode     Select  Errors
   0    Scsi Port2 Bus0/Disk11 Part0   OPEN    NORMAL        0       0
   1    Scsi Port2 Bus0/Disk11 Part0   OPEN    NORMAL    59981       0
   2    Scsi Port3 Bus0/Disk11 Part0   OPEN    NORMAL        0       0
   3    Scsi Port3 Bus0/Disk11 Part0   OPEN    NORMAL    60179       0

DEV#: 11 DEVICE NAME: Disk12 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000BA
=============================================================================
Path#              Adapter/Hard Disk   State   Mode     Select  Errors
   0    Scsi Port2 Bus0/Disk12 Part0   OPEN    NORMAL    28324       0
   1    Scsi Port2 Bus0/Disk12 Part0   OPEN    NORMAL        0       0
   2    Scsi Port3 Bus0/Disk12 Part0   OPEN    NORMAL    27111       0
   3    Scsi Port3 Bus0/Disk12 Part0   OPEN    NORMAL        0       0

Sometimes, a host discovers everything correctly at its initial configuration but does not keep up with later dynamic changes in the configuration. The SCSI ID is therefore extremely important. For more discussion about this topic, refer to 8.2.4, Dynamic reconfiguration on page 201.
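Because the discovery order on the host differs from the host map, devices should be correlated by LUN serial number (vdisk_UID) rather than by SCSI ID or position. The following sketch joins saved command output on the serial number; the file names and the trimmed sample data are illustrative only:

```shell
# Trimmed, illustrative samples of saved "lshostvdiskmap" and
# "datapath query device" output (hypothetical file names)
cat > hostmap.txt <<'EOF'
senegal 1 s-0-6-4 60050768018101BF28000000000000A8
senegal 1 s-1-8-2 60050768018101BF28000000000000B5
EOF
cat > datapath.txt <<'EOF'
DEV#: 0 DEVICE NAME: Disk1 SERIAL: 60050768018101BF28000000000000B5
DEV#: 5 DEVICE NAME: Disk6 SERIAL: 60050768018101BF28000000000000A8
EOF
# Join the two listings on the LUN serial number (vdisk_UID)
awk 'NR==FNR { vol[$4] = $3; next }
     /SERIAL:/ {
       for (i = 1; i <= NF; i++) {
         if ($i == "NAME:")   d = $(i+1)
         if ($i == "SERIAL:") s = $(i+1)
       }
       if (s in vol) print vol[s], "->", d
     }' hostmap.txt datapath.txt | tee matched.txt
```

The same join works regardless of the order in which Windows enumerated the disks, which is the point: the serial number is stable, and the SCSI ID and device position are not.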
Table 8-3 shows the change in throughput for the case of 16 devices and random 4 KB read-miss I/O when the preferred node is used as opposed to the non-preferred node that is shown in Table 8-2.
Table 8-3 The 16-device random 4 KB read-miss throughput (IOPS)
  Preferred node (owner):  105 274.3
  Non-preferred node:       90 292.3
  Delta:                    14 982
In Table 8-4, we show the effect of using the non-preferred paths compared to the preferred paths on read performance.
Table 8-4 Random (1 TB) 4 KB read response time (4.1 nodes, usecs)
  Preferred node (owner):  5 074
  Non-preferred node:      5 147
  Delta:                      73
Table 8-5 shows the effect of using non-preferred nodes on write performance.
Table 8-5 Random (1 TB) 4 KB write response time (4.2 nodes, usecs)
  Preferred node (owner):  5 346
  Non-preferred node:      5 433
  Delta:                      87
IBM SDD software, SDDDSM software, and SDDPCM software recognize the preferred nodes and utilize the preferred paths.
error-prone and not recommended. However, it is possible to change the SVC volume presentation to a host if you keep several key issues in mind. Hosts do not dynamically reprobe storage unless prompted by an external change or by a user manually initiating rediscovery; most operating systems do not automatically notice a change in a disk allocation. Saved device database information, such as the Windows registry or the AIX Object Data Manager (ODM) database, continues to be used.
Removing volumes and then later allocating new volumes to the host
The problem surfaces when a user removes a host map on the SVC as part of removing a volume. After a volume is unmapped from the host, the device becomes unavailable, and the SVC reports that there is no such disk on this port. Running datapath query device after the removal shows a closed, offline, invalid, or dead state, as shown here:

Windows host:

DEV#: 0 DEVICE NAME: Disk1 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018201BEE000000000000041
============================================================================
Path#              Adapter/Hard Disk   State   Mode      Select  Errors
   0    Scsi Port2 Bus0/Disk1 Part0    CLOSE   OFFLINE        0       0
   1    Scsi Port3 Bus0/Disk1 Part0    CLOSE   OFFLINE      263       0

AIX host:

DEV#: 189 DEVICE NAME: vpath189 TYPE: 2145 POLICY: Optimized
SERIAL: 600507680000009E68000000000007E6
============================================================================
Path#    Adapter/Hard Disk   State     Mode      Select  Errors
   0     fscsi0/hdisk1654    DEAD      OFFLINE        0       0
   1     fscsi0/hdisk1655    DEAD      OFFLINE        2       0
   2     fscsi1/hdisk1658    INVALID   NORMAL         0       0
   3     fscsi1/hdisk1659    INVALID   NORMAL         1       0

The next time that a new volume is allocated and mapped to that host, the SCSI ID will be reused if it is allowed to default, and the host can confuse the new device with the old device definition that is left over in the device database or system memory. It is possible to end up with two devices that use identical device definitions in the device database, as in the following example: vpath189 and vpath190 have the same hdisk definitions although they actually contain different device serial numbers, and the path fscsi0/hdisk1654 exists in both vpaths.
DEV#: 189 DEVICE NAME: vpath189 TYPE: 2145 POLICY: Optimized
SERIAL: 600507680000009E68000000000007E6
============================================================================
Path#    Adapter/Hard Disk   State   Mode     Select  Errors
   0     fscsi0/hdisk1654    CLOSE   NORMAL        0       0
   1     fscsi0/hdisk1655    CLOSE   NORMAL        2       0
   2     fscsi1/hdisk1658    CLOSE   NORMAL        0       0
   3     fscsi1/hdisk1659    CLOSE   NORMAL        1       0

DEV#: 190 DEVICE NAME: vpath190 TYPE: 2145 POLICY: Optimized
SERIAL: 600507680000009E68000000000007F4
============================================================================
Path#    Adapter/Hard Disk   State   Mode      Select  Errors
   0     fscsi0/hdisk1654    OPEN    NORMAL         0       0
   1     fscsi0/hdisk1655    OPEN    NORMAL   6336260       0
   2     fscsi1/hdisk1658    OPEN    NORMAL         0       0
   3     fscsi1/hdisk1659    OPEN    NORMAL   6326954       0

The multipathing software (SDD) recognizes that there is a new device, because at configuration time it issues an inquiry command and reads the mode pages. However, if the user did not remove the stale configuration data, the Object Data Manager (ODM) entries for the old hdisks and vpaths remain and confuse the host, because the SCSI ID-to-device serial number mapping has changed. You can avoid this situation by removing the hdisk and vpath information from the device configuration database (rmdev -dl vpath189, rmdev -dl hdisk1654, and so forth) before mapping new devices to the host and running discovery. Removing the stale configuration and rebooting the host is the recommended procedure for reconfiguring the volumes that are mapped to a host. Another process that might cause host confusion is expanding a volume. The SVC notifies a host through the SCSI check condition "Mode parameters changed", but not all hosts can automatically discover the change, and they might confuse LUNs or continue to use the old size. Review the IBM System Storage SAN Volume Controller V6.2.0 - Software Installation and Configuration Guide, GC27-2286, for more details and supported hosts:
https://www-304.ibm.com/support/docview.wss?uid=ssg1S7003570
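The cleanup that is described above can be outlined as the following AIX command sequence. The device names come from the example; cfallvpath is an SDD utility whose availability depends on your SDD level, so treat this as an outline rather than a drop-in script:

```shell
# Remove the stale vpath and its underlying hdisks from the ODM
# before mapping new volumes (device names from the example above)
rmdev -dl vpath189
rmdev -dl hdisk1654
rmdev -dl hdisk1655
rmdev -dl hdisk1658
rmdev -dl hdisk1659
# After mapping the new volumes on the SVC, rediscover the devices
cfgmgr
# Rebuild the SDD vpath configuration (SDD utility; verify for your SDD level)
cfallvpath
```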
C:\Program Files\IBM\Subsystem Device Driver>datapath query device

DEV#: 0 DEVICE NAME: Disk1 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A0
============================================================================
Path#              Adapter/Hard Disk   State   Mode      Select  Errors
   0    Scsi Port2 Bus0/Disk1 Part0    OPEN    NORMAL         0       0
   1    Scsi Port2 Bus0/Disk1 Part0    OPEN    NORMAL   1873173       0
   2    Scsi Port3 Bus0/Disk1 Part0    OPEN    NORMAL         0       0
   3    Scsi Port3 Bus0/Disk1 Part0    OPEN    NORMAL   1884768       0

DEV#: 1 DEVICE NAME: Disk2 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF280000000000009F
============================================================================
Path#              Adapter/Hard Disk   State   Mode      Select  Errors
   0    Scsi Port2 Bus0/Disk2 Part0    OPEN    NORMAL         0       0
   1    Scsi Port2 Bus0/Disk2 Part0    OPEN    NORMAL   1863138       0
   2    Scsi Port3 Bus0/Disk2 Part0    OPEN    NORMAL         0       0
   3    Scsi Port3 Bus0/Disk2 Part0    OPEN    NORMAL   1839632       0

If you just quiesce the host I/O and then migrate the volumes to the new I/O Group, you get closed offline paths for the old I/O Group and open normal paths to the new I/O Group. However, these devices do not work correctly, and there is no way to remove the stale paths without rebooting. Note the change in the pathing in Example 8-4 for device 0, SERIAL 60050768018101BF28000000000000A0.
Example 8-4 Windows volume moved to new I/O Group dynamically showing the closed offline paths
DEV#: 0 DEVICE NAME: Disk1 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A0
============================================================================
Path#              Adapter/Hard Disk   State    Mode      Select  Errors
   0    Scsi Port2 Bus0/Disk1 Part0    CLOSED   OFFLINE        0       0
   1    Scsi Port2 Bus0/Disk1 Part0    CLOSED   OFFLINE  1873173       0
   2    Scsi Port3 Bus0/Disk1 Part0    CLOSED   OFFLINE        0       0
   3    Scsi Port3 Bus0/Disk1 Part0    CLOSED   OFFLINE  1884768       0
   4    Scsi Port2 Bus0/Disk1 Part0    OPEN     NORMAL         0       0
   5    Scsi Port2 Bus0/Disk1 Part0    OPEN     NORMAL        45       0
   6    Scsi Port3 Bus0/Disk1 Part0    OPEN     NORMAL         0       0
   7    Scsi Port3 Bus0/Disk1 Part0    OPEN     NORMAL        54       0

DEV#: 1 DEVICE NAME: Disk2 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF280000000000009F
============================================================================
Path#              Adapter/Hard Disk   State   Mode      Select  Errors
   0    Scsi Port2 Bus0/Disk2 Part0    OPEN    NORMAL         0       0
   1    Scsi Port2 Bus0/Disk2 Part0    OPEN    NORMAL   1863138       0
   2    Scsi Port3 Bus0/Disk2 Part0    OPEN    NORMAL         0       0
   3    Scsi Port3 Bus0/Disk2 Part0    OPEN    NORMAL   1839632       0

To change the I/O Group, you must first flush the cache within the nodes in the current I/O Group to ensure that all data is written to disk. The SVC command-line interface (CLI) guide recommends that you suspend I/O operations at the host level. The recommended way to quiesce the I/O is to take the volume groups offline, remove the saved configuration (AIX ODM) entries, such as the hdisks and vpaths that are planned for removal, and then gracefully shut down the hosts. Migrate the volume to the new I/O Group and power up the host, which will discover the new I/O Group. If the stale configuration data was not removed prior to the shutdown, remove it from the stored host device databases (such as the ODM on an AIX host) at this point. For Windows hosts, the stale registry information is normally ignored after a reboot. Performing volume migrations in this way prevents stale configuration issues.
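On the AIX side, the recommended quiesce-and-migrate sequence can be sketched as follows. The volume group and device names are placeholders, and the SVC-side command syntax must be verified against the CLI guide for your code level:

```shell
# On the AIX host: quiesce I/O and remove the saved definitions
varyoffvg appvg                     # take the volume group offline (placeholder name)
rmdev -dl vpath12                   # vpath planned for migration (placeholder)
rmdev -dl hdisk12
rmdev -dl hdisk13
shutdown -F                         # gracefully shut down the host

# On the SVC (verify the exact syntax in the CLI guide for your release):
# svctask chvdisk -iogrp io_grp1 <volume_name>

# Power the host back on; discovery finds the volume in the new I/O Group
```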
intervention. After the SVC has Q I/Os outstanding for a single MDisk (that is, it is waiting for Q I/Os to complete), it will not submit any more I/O until some I/O completes; any new I/O requests for that MDisk are queued inside the SVC. The graph in Figure 8-1 on page 206 shows the effect on host volume queue depth for a simple configuration of 32 volumes and one host.
Figure 8-1 (4.3.0) IOPS compared to queue depth for 32 volumes tests on a single host
Figure 8-2 shows another example of queue depth sensitivity for 32 volumes on a single host.
Figure 8-2 (4.3.0) MBps compared to queue depth for 32 volume tests on a single host
While these measurements were taken with 4.3.0 code, the effect that queue depth will have on performance is the same regardless of SVC code version.
Persistent reserve refers to a set of Small Computer Systems Interface-3 (SCSI-3) standard commands and command options that provide SCSI initiators with the ability to establish, preempt, query, and reset a reservation policy with a specified target device. The functionality provided by the persistent reserve commands is a superset of the legacy reserve/release commands. The persistent reserve commands are incompatible with the legacy reserve/release mechanism, and target devices can only support reservations from either the legacy mechanism or the new mechanism. Attempting to mix persistent reserve commands with legacy reserve/release commands will result in the target device returning a reservation conflict error.
Legacy (SCSI-2) reserve and release mechanisms reserved the entire LUN (volume) for exclusive use down a single path, which prevents access from any other host, or even from the same host through a different host adapter. The persistent reserve design establishes a method and interface through a reserve policy attribute for SCSI disks, which specifies the type of reservation (if any) that the OS device driver will establish before accessing data on the disk. Four values are supported for the reserve policy:
- No_reserve: No reservations are used on the disk.
- Single_path: Legacy reserve/release commands are used on the disk.
- PR_exclusive: Persistent reservation is used to establish exclusive host access to the disk.
- PR_shared: Persistent reservation is used to establish shared host access to the disk.
When a device is opened (for example, when the AIX varyonvg command opens the underlying hdisks), the device driver checks the ODM for a reserve_policy and a PR_key_value and opens the device appropriately. For persistent reserve, each host attached to the shared disk must use a unique registration key value.
Clearing reserves
It is possible to accidentally leave a reserve on an SVC volume, or even on an SVC MDisk, during migration into the SVC or when reusing disks for another purpose. Several tools are available on the hosts to clear these reserves. The easiest tools to use are the lquerypr (AIX SDD host) and pcmquerypr (AIX SDDPCM host) commands. There is also a menu-driven Windows SDD/SDDDSM tool, the Windows Persistent Reserve Tool, which is called PRTool.exe and is installed automatically when SDD or SDDDSM is installed:
C:\Program Files\IBM\Subsystem Device Driver>PRTool.exe
It is also possible to clear SVC volume reserves by removing all the host mappings when the SVC code is at 4.1.0 or higher. Example 8-5 shows how to determine whether there is a reserve on a device by using the AIX SDD lquerypr command on a reserved hdisk.
Example 8-5 The lquerypr command
[root@ktazp5033]/reserve-checker-> lquerypr -vVh /dev/hdisk5
connection type: fscsi0
open dev: /dev/hdisk5
Attempt to read reservation key...
Attempt to read registration keys...
Read Keys parameter
Generation : 935
Additional Length: 32
Key0 : 7702785F
Key1 : 7702785F
Key2 : 770378DF
Key3 : 770378DF
Reserve Key provided by current host = 7702785F
Reserve Key on the device: 770378DF

This example shows that the device is reserved by a different host. The advantage of the vV parameter is that the full persistent reserve keys on the device are shown, as well as the errors if the command fails. The following failing pcmquerypr command to clear the reserve shows the error:

# pcmquerypr -ph /dev/hdisk232 -V
connection type: fscsi0
open dev: /dev/hdisk232
couldn't open /dev/hdisk232, errno=16

Use the AIX include file errno.h to find out what the 16 indicates. This error indicates a busy condition, which can indicate a legacy reserve or a persistent reserve from another host (or from this host through a different adapter). However, certain AIX technology levels (TLs) have a diagnostic open issue that prevents the pcmquerypr command from opening the device to display the status or to clear a reserve.
The following hint and tip gives more information about the older AIX TL levels that break the pcmquerypr command:
http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&uid=ssg1S1003122&loc=en_US&cs=utf-8&lang=en
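The decision that Example 8-5 illustrates — compare the registration key of the current host with the reserve key on the device — can be scripted around the lquerypr output. This sketch parses a saved sample in the -vVh format shown above:

```shell
# Saved sample of the key lines from "lquerypr -vVh" (format as in Example 8-5)
cat > lquerypr.out <<'EOF'
Reserve Key provided by current host = 7702785F
Reserve Key on the device: 770378DF
EOF
mine=$(awk -F' = ' '/provided by current host/ { print $2 }' lquerypr.out)
dev=$(awk -F': ' '/Reserve Key on the device/ { print $2 }' lquerypr.out)
if [ "$mine" = "$dev" ]; then
  echo "reserve is held by this host"
else
  echo "reserve is held by another host (key $dev)"
fi
```

With the sample keys above, the script reports that another host holds the reserve, which is exactly the case where clearing the reserve from that other host (or via the SVC host mappings) is needed.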
8.5.1 AIX
The following topics describe items specific to AIX.
Transaction-based settings
The host attachment script (devices.fcp.disk.IBM.rte or devices.fcp.disk.IBM.mpio.rte) sets the default attribute values for the SVC hdisks. You can modify these values, but the defaults are a good starting point. There are also HBA parameters that are useful to set for higher performance or for large hdisk configurations. All changeable attribute values can be changed by using the AIX chdev command. The AIX settings that most directly affect transaction performance are the queue_depth hdisk attribute and num_cmd_elem in the HBA attributes.
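For example, the two transaction-oriented attributes can be inspected and raised with lsattr and chdev; hdisk2 and fcs0 are placeholder device names, and the values shown are illustrative starting points, not universal recommendations:

```shell
# Show the current values (hdisk2 and fcs0 are placeholder names)
lsattr -El hdisk2 -a queue_depth
lsattr -El fcs0 -a num_cmd_elem
# Raise them; -P defers the change to the next reboot, which is
# required while the devices are in use
chdev -l hdisk2 -a queue_depth=32 -P
chdev -l fcs0 -a num_cmd_elem=1024 -P
```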
Throughput-based settings
In a throughput-based environment, you might want to decrease the queue depth setting to a value smaller than the host attachment default. In a mixed application environment, do not lower the num_cmd_elem setting, because other logical drives might need this higher value to perform; in a purely high-throughput workload, this value has no effect. Best practice: The recommended starting values for high-throughput sequential I/O environments are lg_term_dma = 0x400000 or 0x800000 (depending on the adapter type) and max_xfr_size = 0x200000. Test your host with the default settings first, and then make these tuning changes to verify whether they actually enhance performance for your specific host configuration and workload.
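Applied with chdev, the starting values above look as follows; fcs0 is a placeholder adapter name, and adapter-specific maximums should be checked first:

```shell
# Illustrative starting point for high-throughput sequential I/O
# (fcs0 is a placeholder adapter name; check adapter-specific limits)
chdev -l fcs0 -a lg_term_dma=0x800000 -a max_xfr_size=0x200000 -P
```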
Multipathing
When the AIX operating system was first developed, multipathing was not embedded within the device drivers; therefore, each path to an SVC volume was represented by an AIX hdisk. The SVC host attachment script devices.fcp.disk.ibm.rte sets up the predefined attributes within the AIX database for SVC disks, and these attributes have changed with each iteration of the host attachment and AIX technology levels. Both SDD and Veritas DMP use the hdisks for multipathing control. The host attachment is also used for other IBM storage devices; it allows the AIX device driver configuration methods to properly identify and configure SVC (2145), IBM DS6000 (1750), and IBM DS8000 (2107) LUNs:
http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&dc=D410&q1=host+attachment&uid=ssg1S4000106&loc=en_US&cs=utf-8&lang=en
SDD
IBM Subsystem Device Driver (SDD) multipathing software has been developed and updated consistently over the last decade and is an extremely mature multipathing technology. The SDD software also supports many other IBM storage types directly connected to AIX, such as the 2107. SDD algorithms for handling multipathing have also evolved: the throttling mechanisms within SDD that controlled overall I/O bandwidth in SDD releases 1.6.1.0 and lower have evolved to be vpath-specific, controlled by qdepth_enable in later releases. SDD uses the persistent reserve functions, placing a persistent reserve on the device in place of the legacy reserve when the volume group is varied on. However, if IBM HACMP is installed, HACMP controls the persistent reserve usage, depending on the type of varyon used; enhanced concurrent volume groups (VGs) have no reserves (varyonvg -c for enhanced concurrent VGs, and varyonvg for regular VGs, which use the persistent reserve). Datapath commands are a powerful method for managing the SVC storage and pathing. The output shows the LUN serial number of the SVC volume and which vpath and hdisk represent that SVC LUN. Datapath commands can also change the multipath selection algorithm; the default, and the recommended best practice with SDD, is load balancing over four paths. The datapath query device output shows a roughly balanced number of selects on each preferred path to the SVC:
DEV#: 12 DEVICE NAME: vpath12 TYPE: 2145 POLICY: Optimized
SERIAL: 60050768018B810A88000000000000E0
====================================================================
Path#    Adapter/Hard Disk   State   Mode    Select  Errors
   0     …                   …       …       …            0
   1     …                   …       …       …            0
   2     …                   …       …       …            0
   3     …                   …       …       …            0
We recommend that you verify that, during normal operation, the selects are occurring on the preferred paths (use datapath query device -l). Also, verify that you have the correct connectivity.
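This check can be automated over saved output: with the -l option, non-preferred paths are flagged with an asterisk, so any large select count on a starred path suggests that the host is not using the preferred paths. A sketch, assuming the output format shown in this chapter:

```shell
# Saved sample of "datapath query device -l" path lines;
# an asterisk after the path number marks a non-preferred path
cat > dpquery.out <<'EOF'
   0* fscsi0/hdisk10  OPEN NORMAL     12 0
   1  fscsi0/hdisk11  OPEN NORMAL 150321 0
   2* fscsi1/hdisk12  OPEN NORMAL      9 0
   3  fscsi1/hdisk13  OPEN NORMAL 149876 0
EOF
# Sum the select counts on preferred (unstarred) vs non-preferred (starred) paths
awk '$1 ~ /\*$/ { np += $(NF-1); next } { p += $(NF-1) }
     END { printf "preferred=%d non-preferred=%d\n", p, np }' dpquery.out |
  tee select-summary.txt
```

A healthy SDD configuration shows nearly all selects on the preferred (unstarred) paths, as in this sample.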
SDDPCM
As Fibre Channel technologies matured, AIX was enhanced with native multipathing support called Multipath I/O (MPIO). This structure allows a storage manufacturer to create software plug-ins for its specific storage. The IBM SVC version of this plug-in is called SDDPCM, which requires the host attachment script devices.fcp.disk.ibm.mpio.rte:
http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&dc=D410&q1=host+attachment&uid=ssg1S4000203&loc=en_US&cs=utf-8&lang=en
SDDPCM and AIX MPIO have been continually improved since their release; we recommend that you stay at the latest release levels of this software. The preferred path indicator for SDDPCM is not displayed until after the device has been opened for the first time, which differs from SDD, which displays the preferred path immediately after configuration. SDDPCM features four types of reserve policies:
- No_reserve policy
- Exclusive host access single path policy
- Persistent reserve exclusive host policy
- Persistent reserve shared host access policy
The usage of the persistent reserve now depends on the reserve_policy hdisk attribute. Change this policy to match your storage security requirements. There are three path selection algorithms:
- Failover
- Round-robin
- Load balancing
The latest SDDPCM code (2.1.3.0 and later) adds improvements in failed path reclamation by a health checker, a failback error recovery algorithm, Fibre Channel dynamic device tracking, and support for SAN boot devices on MPIO-supported storage devices.
With SDDPCM and HACMP, enhanced concurrent volume groups require the no_reserve policy for both concurrent and non-concurrent resource groups; HACMP therefore uses a software locking mechanism instead of implementing persistent reserves. HACMP used with SDD does use persistent reserves, based on the type of varyonvg that was executed.
SDDPCM pathing
SDDPCM pcmpath commands are the best way to understand configuration information about the SVC storage allocation. The following pcmpath query device output shows how much can be determined about the connections to the SVC from this host:

DEV#: 0 DEVICE NAME: hdisk0 TYPE: 2145 ALGORITHM: Load Balance
SERIAL: 6005076801808101400000000000037B
======================================================================
Path#    Adapter/Path Name   State   Mode     Select  Errors
   0     fscsi0/path0        OPEN    NORMAL   155009       0
   1     fscsi1/path1        OPEN    NORMAL   155156       0

In this example, both paths are being used for the SVC connections. These counts are not the normal select counts for a properly mapped SVC, and two paths are not an adequate number of paths. Use the -l option on pcmpath query device to check whether these paths are both preferred paths; if they are, one SVC node must be missing from the host's view. Using the -l option shows an asterisk on both paths, indicating that a single node is visible to the host (and it is the non-preferred node for this volume):

   0*    fscsi0/path0        OPEN    NORMAL     9795       0
   1*    fscsi1/path1        OPEN    NORMAL     9558       0
This information indicates a problem that needs to be corrected. If zoning in the switch is correct, perhaps this host was rebooted while one SVC node was missing from the fabric.
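The same check can be scripted over saved pcmpath output: fewer than four path lines per device, or selects accumulating on starred (non-preferred) paths, warrants investigation. The sample data reproduces the two-path case above:

```shell
# Saved sample of "pcmpath query device -l" output for one device;
# starred path numbers indicate non-preferred paths
cat > pcmpath.out <<'EOF'
DEV#: 0  DEVICE NAME: hdisk0  TYPE: 2145  ALGORITHM: Load Balance
   0*   fscsi0/path0   OPEN   NORMAL   9795   0
   1*   fscsi1/path1   OPEN   NORMAL   9558   0
EOF
paths=$(grep -c 'fscsi' pcmpath.out)
starred=$(grep -c '^[[:space:]]*[0-9][0-9]*\*' pcmpath.out)
echo "paths=$paths starred=$starred"
[ "$paths" -ge 4 ]   || echo "WARNING: only $paths paths configured"
[ "$starred" -eq 0 ] || echo "WARNING: selects are landing on non-preferred paths"
```

On this sample, both warnings fire, which matches the diagnosis in the text: one SVC node is missing from the host's view.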
Veritas
Veritas DMP multipathing is also supported for the SVC. It requires certain AIX APARs and the Veritas Array Support Library, as well as a version of the host attachment script devices.fcp.disk.ibm.rte that recognizes the 2145 devices as hdisks rather than MPIO hdisks. In addition to the normal ODM databases that contain hdisk attributes, several Veritas filesets contain configuration data:
/dev/vx/dmp
/dev/vx/rdmp
/etc/vxX.info
Storage reconfiguration of volumes presented to an AIX host requires cleanup of the AIX hdisks and these Veritas filesets.
There are two types of volumes that you can create on a VIOS:
- Physical volume (PV) VSCSI disks
- Logical volume (LV) VSCSI disks
PV VSCSI hdisks are entire LUNs from the VIOS point of view, and they appear as volumes from the virtual I/O client (VIOC) point of view. If you are concerned about the failure of a VIOS and have configured redundant VIOSs for that reason, you must use PV VSCSI hdisks, because an LV VSCSI hdisk cannot be served up from multiple VIOSs. LV VSCSI hdisks reside in LVM volume groups (VGs) on the VIOS and cannot span PVs in that VG or be striped LVs. Due to these restrictions, we recommend using PV VSCSI hdisks. Multipath support for SVC attachment to a Virtual I/O Server is provided by either SDD or MPIO with SDDPCM. Where Virtual I/O Server SAN boot or dual Virtual I/O Server configurations are required, only MPIO with SDDPCM is supported. Due to this restriction, we recommend using MPIO with SDDPCM with the latest SVC-supported levels, as shown at:
https://www-304.ibm.com/support/docview.wss?uid=ssg1S1003797#_VIOS
Many questions about VIOS usage are answered at the following website:
http://www14.software.ibm.com/webapp/set2/sas/f/vios/documentation/faq.html
One common question is how to migrate data into a VIO environment, or how to reconfigure storage on a VIOS; this question is addressed at the previous link. Many clients want to know whether SCSI LUNs can be moved between the physical and virtual environments as is. That is, given a physical SCSI device (LUN) with user data on it that resides in a SAN environment, can this device be allocated to a VIOS and then provisioned to a client partition and used by the client as is? The answer is no; this function is not supported at this time, and the device cannot be used as is. Virtual SCSI devices are new devices when created, and the data must be put on them after creation, which typically requires backing up the data in the physical SAN environment and restoring it onto the volume.
Possible future enhancements to VIO
Due in part to the differences in disk format that we just described, VIO is currently supported for new disk installations only. AIX, VIO, and SDD development are working on changes to make this migration easier in the future. One enhancement is to use the UDID or IEEE method of disk identification. If you use the UDID method, it might be possible to contact IBM technical support for a migration method that does not require restoration. A quick and simple way to determine whether a backup and restoration is necessary is to run the command lquerypv -h /dev/hdisk## 80 10 to read the PVID off the disk. If the output differs between the VIOS and the VIOC, you must use backup and restore.
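The PVID comparison can be scripted: capture the lquerypv output on both the VIOS and the VIOC and compare the PVID words at offset 0x80. The sample dumps below are fabricated to illustrate the mismatch case:

```shell
# Fabricated sample dumps of "lquerypv -h /dev/hdiskN 80 10", one per side
cat > pvid_vios.txt <<'EOF'
00000080   00C1F1F2 EA8B4C31 00000000 00000000  |......L1........|
EOF
cat > pvid_vioc.txt <<'EOF'
00000080   00C9A3D4 11223344 00000000 00000000  |.........."3D...|
EOF
pvid() { awk '{ print $2 $3 }' "$1"; }   # the PVID words that follow the offset
if [ "$(pvid pvid_vios.txt)" = "$(pvid pvid_vioc.txt)" ]; then
  echo "PVIDs match"
else
  echo "PVIDs differ: backup and restore is required"
fi
```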
8.5.4 Windows
There are two multipathing driver options released for Windows Server 2003 hosts. Windows Server 2003 device driver development has concentrated on the storport.sys driver, which has significant interoperability differences from the older scsiport driver set. Additionally, Windows has released a native multipathing I/O option with a storage-specific plug-in. SDDDSM was designed to support these newer methods of interfacing with Windows Server 2003. In order to release new enhancements more quickly, the newer hardware architectures (64-bit EM64T, and so forth) are tested only on the SDDDSM code stream; therefore, only SDDDSM packages are available for them. The older version of the SDD multipathing driver works with the scsiport drivers. This version is required for Windows 2000 servers, because storport.sys is not available there. The SDD software is also available for Windows Server 2003 servers when the scsiport HBA drivers are used.
When SDD or SDDDSM is installed, the reserve and release functions described here are translated into the proper persistent reserve and release equivalents to allow load balancing and multipathing from each host.
Tunable parameters
With Windows operating systems, queue depth settings are the responsibility of the host adapters and are configured through the BIOS settings. The procedure varies from vendor to vendor; refer to your manufacturer's instructions for how to configure your specific cards, and to the IBM SAN Volume Controller Information Center (host attachment chapter):
http://publib.boulder.ibm.com/infocenter/svc/ic/index.jsp?topic=/com.ibm.storage.svc.console.doc/svc_FChostswindows_cover.html
Queue depth is also controlled by the Windows application program, which controls how many I/O commands it allows to be outstanding before waiting for completion. The queue depth might have to be adjusted based on the overall I/O Group queue depth calculation in 8.3.1, Queue depths. For IBM FAStT FC2-133 (and other QLogic-based) HBAs, the queue depth is known as the execution throttle, which can be set with either the QLogic SANSurfer tool or in the BIOS of the QLogic-based HBA by pressing Ctrl+Q during the startup process.
8.5.5 Linux
IBM has decided to transition SVC multipathing support from IBM SDD to Linux native DM-MPIO multipathing (listed as Device Mapper Multipath in the table). Veritas DMP is also
available for certain kernels. Refer to the SAN Volume Controller Supported Hardware List, Device Driver, Firmware and Recommended Software Levels V6.2 to determine which versions of each Linux kernel require SDD, DM-MPIO, or Veritas DMP support:
https://www-304.ibm.com/support/docview.wss?uid=ssg1S1003797#_RH21
Some kernels allow a choice of multipathing driver, indicated by a horizontal bar between the multipathing driver choices for the specific kernel shown to the left. If your kernel is not listed as supported, contact your IBM marketing representative to request a Request for Price Quotation (RPQ) for your specific configuration. Certain types of clustering are now supported; however, the multipathing software choice is tied to the type of cluster and HBA driver. For example, Veritas Storage Foundation is supported for certain hardware/kernel combinations, but it also requires Veritas DMP multipathing. Contact IBM marketing for RPQ support if you need Linux clustering in your specific environment and it is not listed.
Tunable parameters
Linux performance is influenced by HBA parameter settings and queue depth. Aside from the overall queue depth calculation for the I/O group described in 8.3.1, Queue depths, there are also per-HBA adapter/type maximums, with recommended settings in the SVC 6.2.0 Information Center. Refer to the settings for each specific HBA type and the general Linux OS tunable parameters in the IBM SAN Volume Controller Information Center (Host Attachment chapter) at:
http://publib.boulder.ibm.com/infocenter/svc/ic/index.jsp?topic=/com.ibm.storage.svc431.console.doc/svc_linover_1dcv35.html
In addition to the I/O and OS parameters, Linux also has tunable file system parameters. You can use the tune2fs command to increase file system performance based on your specific configuration: the journal mode and size can be changed, and directories can be indexed. For details, see:
http://www.ibm.com/developerworks/linux/library/l-lpic1-v3-104-2/index.html?ca=dgr-lnxw06TracjLXFilesystems
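As an illustration of these file system tunables, the following tune2fs invocations are a sketch only; /dev/sdb1 is a placeholder device, and whether each option helps depends on your workload, so test on a non-production file system first:

```shell
# Placeholder device -- substitute your own ext3 file system (unmounted).
DEV=/dev/sdb1

# Enable hashed b-tree directory indexing (speeds up large directories).
tune2fs -O dir_index $DEV

# Set writeback as the default journal data mode (fastest, least strict ordering).
tune2fs -o journal_data_writeback $DEV

# Resize the journal to 128 MB.
tune2fs -J size=128 $DEV
```

These are standard e2fsprogs options; the right journal mode and size for your configuration depend on the write pattern of your applications.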
8.5.6 Solaris
There are two options for multipathing support on Solaris hosts: Symantec/VERITAS Volume Manager or Solaris MPxIO. Choose between them depending on your file system requirements and the OS levels in the latest interoperability matrix:
https://www-304.ibm.com/support/docview.wss?uid=ssg1S1003797#_Sun58
IBM SDD is no longer supported, because its features are now available natively in the Solaris MPxIO multipathing driver. If SDD support is still needed, contact your IBM marketing representative to request a Request for Price Quotation (RPQ) for your specific configuration.
Solaris MPxIO
SAN boot and clustering support are available for the 5.9 and 5.10 OS levels, depending on the multipathing driver and HBA choices. Releases of SVC code prior to 4.3.0 did not support load balancing with the MPxIO software. Configure your SVC host object with the type attribute set to tpgs if you want to run MPxIO on your Sun SPARC host. For example:
svctask mkhost -name new_name_arg -hbawwpn wwpn_list -type tpgs
In this command, -type specifies the type of host. Valid entries are hpux, tpgs, or generic. The tpgs option enables extra target port group support; the default is generic. For guidance on configuring the MPxIO software for OS 5.10 and using SVC volumes, refer to the following document:
http://download.oracle.com/docs/cd/E19957-01/819-0139/ch_3_admin_multi_devices.html
# vxdmpadm listenclosure all
ENCLR_NAME    ENCLR_TYPE   ENCLR_SNO          STATUS
============================================================
OTHER_DISKS   OTHER_DISKS  OTHER_DISKS        CONNECTED
SAN_VC0       SAN_VC       0200628002faXX00   CONNECTED
For the latest Veritas patch levels, refer to:
https://sort.symantec.com/patch/matrix
To check the installed Symantec/VERITAS version:
showrev -p | grep vxvm
To check which IBM Array Support Libraries (ASLs) are configured into the volume manager:
vxddladm listsupport | grep -i ibm
After installing a new ASL with pkgadd, you need to either reboot or issue vxdctl enable. To list the ASLs that are active, run vxddladm listsupport.
vxdmpadm listenclosure all
ENCLR_NAME    ENCLR_TYPE   ENCLR_SNO     STATUS
============================================================
OTHER_DISKS   OTHER_DISKS  OTHER_DISKS   CONNECTED
Disk          Disk         DISKS         DISCONNECTED
8.5.7 VMware
Review the SAN Volume Controller Supported Hardware List, Device Driver, Firmware and Recommended Software Levels V6.2 to determine the ESX levels that are supported, and whether you plan to utilize the newly available SVC 6.2 support of the VMware vStorage APIs:
https://www-304.ibm.com/support/docview.wss?uid=ssg1S1003797#_VMVAAI
SVC 6.2.0 adds support for the VMware vStorage APIs. SVC implements storage-related tasks that were previously performed by VMware, which helps improve efficiency and frees up server resources for other, more mission-critical tasks. The new functions include full copy, block zeroing, and hardware-assisted locking. If you are not using the new API functionality, the recommended minimum supported VMware level is now 3.5. If lower versions are required, contact your IBM marketing representative and ask about the submission of an RPQ for support. The necessary patches and procedures will be supplied after the specific configuration has been reviewed and approved. Host attachment recommendations are available in the IBM System Storage SAN Volume Controller V6.2.0 Information Center and guides at:
http://publib.boulder.ibm.com/infocenter/svc/ic/index.jsp?topic=/com.ibm.storage.svc.console.doc/svc_over_1dcur0.html
to honor the SVC preferred node, which is discovered via the TPGS command. Path failover is automatic in both cases. If Round Robin is used, path failback might not return to a preferred node path; therefore, we recommend manually checking pathing after any maintenance or problems have occurred.
Multipathing configuration maximums
The maximum supported configuration for the VMware multipathing software is:
- A total of 256 SCSI devices
- Four paths to each volume
Note: Each path to a volume equates to a single SCSI device.
For more information about VMware and SVC, VMware storage and zoning recommendations, HBA settings, and attaching volumes to VMware, refer to Implementing the IBM System Storage SAN Volume Controller V6.1, SG24-7933:
http://www.redbooks.ibm.com/redpieces/abstracts/sg247933.html
8.7 Monitoring
A consistent set of monitoring tools is available when IBM SDD, SDDDSM, or SDDPCM is used as the multipathing software in the various OS environments. Examples earlier in this chapter showed how the datapath query device and datapath query adapter commands can be used for path monitoring. Path performance can also be monitored via the datapath commands, for example datapath query devstats (or pcmpath query devstats). The datapath query devstats command shows performance information for a single device, all devices, or a range of devices. Example 8-6 shows the output of datapath query devstats for two devices.
Example 8-6 The datapath query devstats command output
C:\Program Files\IBM\Subsystem Device Driver>datapath query devstats

Total Devices : 2

Device #: 0
=============
               I/O:        SECTOR:       Transfer Size:
Total Read     1755189     14168026      <= 512    271
Total Write    1749581     153842715     <= 4k     2337858
Active Read    0           0             <= 16K    104
Active Write   0           0             <= 64K    1166537
Maximum        3           256           > 64K     0

Device #: 1
=============
               I/O:        SECTOR:       Transfer Size:
Total Read     20353800    162956588     <= 512    296
Total Write    9883944     451987840     <= 4k     27128331
Active Read    0           0             <= 16K    215
Active Write   1           128           <= 64K    3108902
Maximum        4           256           > 64K     0
An adapter-level statistics command is also available: datapath query adaptstats (also mapped to pcmpath query adaptstats). Refer to Example 8-7 for a two-adapter example.
Example 8-7 The datapath query adaptstats output
C:\Program Files\IBM\Subsystem Device Driver>datapath query adaptstats

Adapter #: 0
=============
               I/O:        SECTOR:
Total Read     11060574    88611927
Total Write    5936795     317987806
Active Read    0           0
Active Write   0           0
Maximum        2           256

Adapter #: 1
=============
               I/O:        SECTOR:
Total Read     11048415    88512687
Total Write    5930291     317726325
Active Read    0           0
Active Write   1           128
Maximum        2           256
You can clear these counters so that you can script their use to cover a precise amount of time. The commands also allow you to select the devices to report on as a range, a single device, or all devices. The command to clear the counts is datapath clear device count.
Xdd is a tool for measuring and analyzing disk performance characteristics on single systems or clusters of systems. It was designed by Thomas M. Ruwart of I/O Performance, Inc. to provide consistent and reproducible measurement of the sustained transfer rate of an I/O subsystem. It is a command line-based tool that grew out of the UNIX community and has been ported to run in Windows environments as well. Xdd is free software distributed under the GNU General Public License, and is available for download at:
http://www.ioperformance.com/products.htm
The Xdd distribution comes with all the source code necessary to install Xdd and the companion timeserver and gettime utility programs. DS4000 Best Practices and Performance Tuning Guide, SG24-6363-02, has detailed descriptions of how to use these measurement and test tools:
http://www.redbooks.ibm.com/abstracts/sg246363.html?Open
7521p02_perf.fm
Part 1
7521Performance.fm
Chapter 9.
Note: Items marked with (*) are optional. In the CG8 model, a node can have either SSD drives or the 10 Gbps iSCSI interfaces, but not both.
In July 2007, an SVC with 8 nodes (model 8G4) running code version 4.2 delivered 272,505.19 SPC-1 IOPS. In February 2010, an SVC with 6 nodes (model CF8) running code version 5.1 delivered 380,489.30 SPC-1 IOPS. For details on each of these benchmarks, see the documents posted at the URLs below. Also check the Storage Performance Council website for the latest published SVC benchmarks.
http://www.storageperformance.org/benchmark_results_files/SPC-1/IBM/A00087_IBM_DS8700_SVC-5 .1-6node/a00087_IBM_DS8700_SVC5.1-6node_full-disclosure-r1.pdf http://www.storageperformance.org/results/a00052_IBM-SVC4.2_SPC1_full-disclosure.pdf
Figure 9-1 on page 229 compares the performance of two different SVC clusters, each with a single I/O group, under a series of different workloads. The first case is a 2-node 8G4 cluster running SVC version 4.3, and the second is a 2-node CF8 cluster running SVC version 5.1. The workload labels mean:
SR / SW: sequential read / sequential write
RH / RM / WH / WM: read or write, cache hit / cache miss
512b / 4K / 64K: block size
70/30: mixed profile, 70% read and 30% write
Figure 9-1 Performance comparison of 2-node 8G4 (SVC 4.3) and CF8 (SVC 5.1) clusters
When discussing enterprise storage solutions, raw I/O performance is important, but it is not everything: to date, IBM has shipped more than 22,500 SVC engines, running in more than 7,200 SVC systems. In 2008 and 2009, across the entire installed base, SVC delivered better than five nines (99.999%) availability. Check the IBM SVC website for the latest information on SVC:
http://www.ibm.com/systems/storage/software/virtualization/svc
Figure 9-2 SVC 5.1 2-node cluster with internal SSDs: throughput for a variety of workloads
The recommended configuration and use of SSDs in SVC 6.2, either installed internally in the SVC nodes or in the managed storage controllers, are covered in other chapters of this Redbook; see Chapters 10, 11, and 12 for details.
Note: While this Redbook provides several recommendations on how to fine-tune your existing SVC and get the best from it, not only in I/Os per second but also in ease of management, there are many more possible scenarios than we can cover here. We strongly encourage you to contact your IBM representative and Storage Techline for advice if you have a highly demanding storage environment. They have the knowledge and tools to provide the best-fitting, tailor-made SVC solution for your needs.
Table 9-2 RAID levels for internal SSDs

RAID-0 (Striped)
  What you will need: 1-4 drives, all in a single node.
  When to use it: when Volume Mirror is on external MDisks.
  For best performance: a pool should only contain arrays from a single IO Group.

RAID-1 (Easy Tier)
  What you will need: 2 drives, one in each node of the IO Group.
  When to use it: when using Easy Tier and/or both mirrors are on SSDs.
  For best performance: an Easy Tier pool should only contain arrays from a single IO Group, and the external MDisks in this pool should only be used by the same IO Group.

RAID-10 (Mirrored)
  For best performance: a pool should only contain arrays from a single IO Group. Recommended over Volume Mirroring.
We recommend that you check this screen periodically for possible hot spots that might be developing in your SVC environment. To view this screen in the GUI, go to the Home page and select Performance in the top left menu. The SVC GUI then starts plotting the charts, so give it a few moments until the graphs appear. You can position your cursor over a particular point on a curve to see details such as the actual value and time for that point. SVC plots a new point every five seconds and shows you the last five minutes of data. Also try changing the System Statistics setting in the top left corner to see details for a particular node. The SVC Performance Monitor does not store performance data for later analysis; its display only shows what happened in the last five minutes. While it can provide valuable input to help diagnose a performance problem in real time, it does not trigger performance alerts or provide the long-term trends required for capacity planning. For that, you need a tool capable of collecting and storing performance data for long periods and presenting the corresponding reports, such as IBM TotalStorage Productivity Center (TPC). See Chapter 14, Monitoring, for details.
7521BEPerfConsid.fm
Chapter 10.
10.2 Tiering
You can use the SVC to create tiers of storage, in which each tier has different performance characteristics, by including only MDisks that have the same performance characteristics within an MDisk group (MDG). So, if you have a storage infrastructure with, for example, three classes of storage, you create each volume from the MDG whose class of storage most closely matches the volume's expected performance characteristics. Because migrating between storage pools (MDGs) is non-disruptive to users, it is an easy task to migrate a volume to another storage pool if the actual performance differs from what was expected.
Note: If there is uncertainty about which storage pool (SP) to create a volume in, initially use the pool with the lowest performance and then move the volume up to a higher-performing pool later if required.
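Such a non-disruptive move is done with the svctask migratevdisk command; the volume and pool names below are placeholders for your own object names:

```shell
# Move volume vdisk0 into a higher-performing pool; -threads (1-4)
# controls how many migration threads work in parallel.
svctask migratevdisk -mdiskgrp HIGH_PERF_POOL -threads 4 -vdisk vdisk0
```

The migration proceeds in the background while the host continues to access the volume.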
RAID arrays are created on the storage subsystem as a placement for LUNs, which are assigned to the SVC as managed disks. The performance of a particular RAID array depends on the following:
- The type of drives used in the array (for example, 15K FC, 10K SAS, 7.2K SATA, SSD)
- The number of drives used in the array
- The type of RAID used (for example, RAID 10, RAID 5, RAID 6)
Table 10-1 shows conservative rule-of-thumb numbers for random IO performance that can be used in the calculations.
Table 10-1 Disk IO rates
Disk type          Number of IOps
FC 15K / SAS 15K   160
FC 10K / SAS 10K   120
SATA 7.2K          75
The next important parameter that has to be considered when calculating the IO capacity of a RAID array is the write penalty. The write penalty for various RAID array types is shown in Table 10-2.
Table 10-2 RAID write penalty
RAID type  Sustained failures  Number of disks  Write penalty
RAID 5     1                   N+1              4
RAID 10    1 (minimum)         2xN              2
RAID 6     2                   N+2              6
RAID 5 and RAID 6 do not suffer from the write penalty when full-stripe writes (also called stride writes) are performed; in this case the write penalty is 1. With this and the information about how many disks are in each array, we can calculate the read and write IO capacity of a particular array. In Table 10-3 we calculate the IO capacity for an example RAID array with eight 15K FC drives.
Table 10-3 RAID array (8 drives) IO capacity
RAID type  Read-only IO capacity (IOps)  Write-only IO capacity (IOps)
RAID 5     7 x 160 = 1120                (8 x 160)/4 = 320
RAID 10    8 x 160 = 1280                (8 x 160)/2 = 640
RAID 6     6 x 160 = 960                 (8 x 160)/6 = 213
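These rule-of-thumb figures are easy to encode; the following Python sketch simply reproduces the arithmetic of Table 10-3 (the per-drive IOps and write penalties are the conservative values from the text, not measurements):

```python
# Conservative write penalties and the number of drives that do not serve
# reads (parity drives), following the convention used in Table 10-3.
WRITE_PENALTY = {"RAID 5": 4, "RAID 10": 2, "RAID 6": 6}
NON_DATA_DRIVES = {"RAID 5": 1, "RAID 10": 0, "RAID 6": 2}

def array_io_capacity(raid_type, drives, drive_iops):
    """Return (read_only_iops, write_only_iops) for one RAID array."""
    read_only = (drives - NON_DATA_DRIVES[raid_type]) * drive_iops
    write_only = drives * drive_iops // WRITE_PENALTY[raid_type]
    return read_only, write_only

# Eight 15K FC drives (160 IOps each), as in Table 10-3:
for raid in ("RAID 5", "RAID 10", "RAID 6"):
    print(raid, array_io_capacity(raid, 8, 160))
```

The output matches the table: RAID 5 gives (1120, 320), RAID 10 gives (1280, 640), and RAID 6 gives (960, 213).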
In most current-generation storage subsystems, write operations are cached and handled asynchronously, meaning that the write penalty is hidden from the user. Of course, heavy and steady random writes can cause write-cache destaging to fall behind, and in this situation the speed of the array is limited to the speed defined by the number of drives and the RAID array type. The numbers from Table 10-3 on page 236 are
covering the worst-case scenario and do not take into account any read or write cache efficiency.
Storage pool IO capacity
If we are using 1:1 LUN (SVC managed disk) to array mapping, then the array IO capacity is already the IO capacity of the managed disk. The IO capacity of the SVC storage pool is the sum of the IO capacities of all managed disks in that pool. For example, if we have 10 managed disks from RAID arrays with 8 disks each, as used in our example, then the IO capacity of the storage pool is as shown in Table 10-4.
Table 10-4 Storage pool IO capacity
RAID type  Read-only IO capacity (IOps)  Write-only IO capacity (IOps)
RAID 5     10 x 1120 = 11200             10 x 320 = 3200
RAID 10    10 x 1280 = 12800             10 x 640 = 6400
RAID 6     10 x 960 = 9600               10 x 213 = 2130
The IO capacity of the RAID 5 storage pool would range from 3200 IOps when the workload pattern at the RAID array level is 100% write, to 11200 IOps when the workload pattern is 100% read. It is important to understand that this is the workload pattern generated by the SVC toward the storage subsystem, which is not necessarily the same as the pattern from the host to the SVC, because of SVC cache usage. If more than one managed disk (LUN) is used per array, then each managed disk gets a portion of the array's IO capacity. For example, if we had two LUNs per eight-disk array and only one of the managed disks from each array were used in the storage pool, then the IO capacity for 10 managed disks would be as shown in Table 10-5.
Table 10-5 Storage pool IO capacity with two LUNs per array
RAID type  Read-only IO capacity (IOps)  Write-only IO capacity (IOps)
RAID 5     10 x 1120/2 = 5600            10 x 320/2 = 1600
RAID 10    10 x 1280/2 = 6400            10 x 640/2 = 3200
RAID 6     10 x 960/2 = 4800             10 x 213/2 = 1065
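The pool-level capacity then follows by summation; this sketch reproduces Tables 10-4 and 10-5 under the stated assumption that the LUNs on an array are evenly utilized:

```python
def pool_io_capacity(array_read_iops, array_write_iops, mdisks, luns_per_array=1):
    """IO capacity of a storage pool built from `mdisks` managed disks,
    each getting an even 1/luns_per_array share of its array's capacity."""
    return (mdisks * array_read_iops // luns_per_array,
            mdisks * array_write_iops // luns_per_array)

print(pool_io_capacity(1120, 320, 10))      # RAID 5, 1 LUN/array -> (11200, 3200), Table 10-4
print(pool_io_capacity(1120, 320, 10, 2))   # RAID 5, 2 LUNs/array -> (5600, 1600), Table 10-5
```

If the second LUN on each array is idle, the pool can reach the Table 10-4 figures instead, as the text explains next.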
The numbers shown in Table 10-5 are valid when both LUNs on each array are evenly utilized. If the second LUNs on the arrays participating in the storage pool are idle, the storage pool can achieve the numbers shown in Table 10-4 on page 237. In an environment with two LUNs per array, the second LUN can also utilize the entire IO capacity of the array and cause the LUN used for the SVC storage pool to get fewer available IOps. If the second LUN on those arrays is also used for an SVC storage pool, the cumulative IO capacity of the two storage pools is equal to that of one storage pool with one LUN per array.
Storage subsystem cache influence
The SVC storage pool IO capacity numbers calculated in Table 10-5 did not take into account caching at the storage subsystem level, only the raw RAID array performance. Just as hosts using the SVC have a read/write pattern and cache efficiency in their workload, the SVC also has a read/write pattern and cache efficiency toward the storage subsystem. Let us look at an example of the following host-to-SVC IO pattern:
70:30:50 - 70% reads, 30% writes, 50% read cache hits
Read-related IOps generated from the host IO = Host IOps x 0.7 x 0.5
Write-related IOps generated from the host IO = Host IOps x 0.3
Table 10-6 shows the relation from host IOps to the SVC backend IOps.
Table 10-6 Host to SVC backend IO map
Host IOps  Pattern   Read IOps  Write IOps  Total IOps
2000       70:30:50  700        600         1300
The total IOps from Table 10-6 is the number of IOps that will be sent from the SVC to the storage pool on the storage subsystem. Because the SVC is acting as a host toward the storage subsystem, we can also assume some read/write pattern and read cache hit rate on this traffic. As we can see from the table above, the 70:30 read/write pattern with the 50% cache hit rate from the host to the SVC causes an approximately 54:46 read/write pattern for the SVC traffic to the storage subsystem. If we apply the same 50% read cache hit rate, we get 950 IOps that will be sent to the RAID arrays that are part of the storage pool inside the storage subsystem, as shown in Table 10-7.
Table 10-7 SVC to storage subsystem IO map
SVC IOps  Pattern   Read IOps  Write IOps  Total IOps
1300      54:46:50  350        600         950
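The two-stage cache arithmetic above can be sketched as one helper applied twice, once for the host-to-SVC stage and once for the SVC-to-subsystem stage (using fractions instead of percentages is an implementation choice here):

```python
def downstream_io(iops, read_frac, write_frac, read_hit_frac):
    """Reads that miss the cache, plus all writes, pass to the next layer."""
    reads = round(iops * read_frac * (1 - read_hit_frac))
    writes = round(iops * write_frac)
    return reads, writes, reads + writes

# Host -> SVC backend: 2000 IOps, 70:30:50 pattern (Table 10-6)
h_read, h_write, h_total = downstream_io(2000, 0.70, 0.30, 0.50)

# SVC -> storage subsystem: the resulting ~54:46 pattern, 50% hit again (Table 10-7)
s_read, s_write, s_total = downstream_io(h_total, h_read / h_total,
                                         h_write / h_total, 0.50)

print(h_read, h_write, h_total)   # 700 600 1300
print(s_read, s_write, s_total)   # 350 600 950
```

The 2000 host IOps thus shrink to 950 backend IOps before the write penalty is applied.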
Note: These calculations are valid only when an IO generated from the host to the SVC generates exactly one IO from the SVC to the storage subsystem. If, for example, the SVC combines several host IOs into one storage subsystem IO, a higher IO capacity can be achieved. It is also important to understand that IO with a larger block size decreases the RAID array IO capacity, so it is quite possible that combining the IOs will not increase the total array IO capacity as viewed from the host perspective. The drive IO capacity numbers used in the above calculations are for small block sizes (for example, 4K-32K). To simplify this example, we assume that the number of IOps generated on the path from the host to the SVC and from the SVC to the storage subsystem remains the same. If we account for the write penalty, then the total IOps toward the RAID array for the above host example is as shown in Table 10-8.
Table 10-8 RAID array total utilization
RAID type  Host IOps  SVC IOps  RAID array IOps  RAID array IOps with write penalty
RAID 5     2000       1300      950              350 + 4 x 600 = 2750
RAID 10    2000       1300      950              350 + 2 x 600 = 1550
RAID 6     2000       1300      950              350 + 6 x 600 = 3950
Based on the above calculations, we can make a generic formula to calculate the available host IO capacity from a given RAID/storage pool IO capacity. Let us assume we have the following parameters:
R - Host read ratio (%)
W - Host write ratio (%)
C1 - SVC read cache hits (%)
C2 - Storage subsystem read cache hits (%)
WP - Write penalty for the RAID array
XIO - RAID array/storage pool IO capacity
The host IO capacity (HIO) can then be calculated using the following formula:
HIO = XIO / (R*C1*C2/1000000 + W*WP/100)
As we can see, the host IO capacity is lower than the storage pool IO capacity when the denominator in the above formula is greater than 1. If we want to calculate the write percentage (W) in the IO pattern at which the host IO capacity becomes lower than the storage pool capacity, we can use the following formula:
W <= 99.9 / (WP - C1*C2/10000)
We can see that the write percentage (W) mainly depends on the write penalty of the RAID array. Table 10-9 shows the break-even value for W with a read cache hit rate of 50% at both the SVC and storage subsystem level.
Table 10-9 W % break-even
RAID type  Write penalty (WP)  W % break-even
RAID 5     4                   26.64%
RAID 10    2                   57.08%
RAID 6     6                   17.37%
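The HIO and break-even formulas can be checked numerically; this sketch transcribes them exactly as given in the text, with percentages as whole numbers:

```python
def host_io_capacity(xio, r, w, c1, c2, wp):
    """HIO = XIO / (R*C1*C2/1000000 + W*WP/100), as in the text."""
    return xio / (r * c1 * c2 / 1_000_000 + w * wp / 100)

def w_break_even(wp, c1, c2):
    """Write percentage above which host capacity falls below pool capacity."""
    return 99.9 / (wp - c1 * c2 / 10_000)

# Break-even values of Table 10-9 (50% cache hits at both levels):
for wp in (4, 2, 6):
    print(round(w_break_even(wp, 50, 50), 2))

# RAID 5 pool of Table 10-4 with the 70:30:50 host pattern:
print(int(host_io_capacity(11200, 70, 30, 50, 50, 4)))
```

This prints 26.64, 57.09 (the table truncates 57.085 to 57.08), 17.37, and a host IO capacity of 8145 IOps, matching Table 10-10.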
The W % break-even value from Table 10-9 is a good reference for deciding which RAID level to use if we want to maximally utilize the storage subsystem backend RAID arrays from the write workload perspective. With the above formulas we can also calculate the host IO capacity for our example storage pool from Table 10-4 on page 237, with the 70:30:50 IO pattern (read:write:cache hit) from the host side and a 50% read cache hit rate on the storage subsystem. The results are shown in Table 10-10.
Table 10-10 Host IO example capacity
RAID type  Storage pool IO capacity (IOps)  Host IO capacity (IOps)
RAID 5     11200                            8145
RAID 10    12800                            16516
RAID 6     9600                             4860
As already mentioned, the above formula assumes that there is no IO grouping at the SVC level. With SVC code 6.x, the default backend read and write IO size is 256K. A possible scenario is that a host reads or writes multiple (for example, 8) aligned 32K blocks from/to the SVC, and the SVC combines them into one IO on the backend side. In such cases the above formulas would need to be adjusted, and the available host IO for the particular storage pool would increase.
FlashCopy
Using FlashCopy on a volume can generate additional load on the backend. It is important to understand that until the FlashCopy (FC) target is fully copied, or when copy rate 0 is used, IO to the FC target causes IO load on the FC source. Once the FC target is fully copied, read/write IOs are served independently of the source read/write IO requests. The combinations shown in Table 10-11 are possible when copy rate 0 is used, or when the target FC volume is not fully copied and IOs are executed in an uncopied area.
Table 10-11 FlashCopy IO operations
IO operation                                                        Source writes  Source reads  Target writes  Target reads
1x read IO from source                                              0              1             0              0
1x write IO to source                                               1              1             1              0
1x write IO to source to an already copied area (copy rate > 0)     1              0             0              0
1x read IO from target                                              0              1             0              0
1x read IO from target from an already copied area (copy rate > 0)  0              0             0              1
1x write IO to target                                               0              1             1              0
1x write IO to target to an already copied area (copy rate > 0)     0              0             1              0
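Encoded as a lookup, these per-operation backend IO counts look as follows. Note that the target-side rows were partly reconstructed from standard FlashCopy copy-on-write behavior, so treat this as an illustration rather than a definitive transcription:

```python
# (source_writes, source_reads, target_writes, target_reads) per host IO
# against an uncopied grain (or copy rate 0); "(copied)" entries are the
# already-copied / copy-rate > 0 variants.
FC_IO_MAP = {
    "read source":            (0, 1, 0, 0),
    "write source":           (1, 1, 1, 0),  # copy-on-write: read old data, copy to target, write source
    "write source (copied)":  (1, 0, 0, 0),
    "read target":            (0, 1, 0, 0),  # redirected to the source
    "read target (copied)":   (0, 0, 0, 1),
    "write target":           (0, 1, 1, 0),  # a partial-grain write still pulls the grain from the source
    "write target (copied)":  (0, 0, 1, 0),
}

# Backend IOs triggered by one host write to an uncopied source grain:
print(sum(FC_IO_MAP["write source"]))   # 3
```

A single host write can thus triple into three backend IOs, which is the multiple-IO overhead discussed next.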
As we can see, certain IO operations incur a multiple-IO overhead, which can cause performance degradation on the source and target volumes. It is especially important to understand that if the source and target FC volumes share the same backend storage pool, as shown in Figure 10-1, this further influences performance.
Figure 10-1 FlashCopy source and target volume in the same storage pool
When frequent FC operations are executed and we do not want too much impact on the performance of the source FC volumes, it is recommended to put the target FC volumes in a different storage pool that does not share the same backend disks, or even, if possible, on a separate backend controller, as shown in Figure 10-2.
Figure 10-2 Source and target FlashCopy volumes on different storage pools
When there is a need for heavy IO on the target FC volume, for example when an FC target of a database is used for data mining, it is recommended to wait until the FC copy is completed before the target volume is used.
When the volumes participating in FC operations are large, the time required for a full copy may not be acceptable. In this situation it is recommended to use an incremental FC approach. In this setup the initial copy lasts longer, but all subsequent copies only copy the changes, thanks to FC change tracking on the source and target volumes. Incremental copying is performed much faster, usually within an acceptable time frame, so there is no need to use the target volumes during the copy operation. An example of this approach is shown in Figure 10-3.
With this approach we will achieve minimal impact on the source FC volume.
Thin provisioning
The thin provisioning (TP) function also affects the performance of a volume, because it generates additional IOs. TP is implemented using a B-tree directory, which is stored in the storage pool in the same way as the actual data. The real capacity of the volume consists of the virtual capacity and the space used for the directory, as shown in Figure 10-4.
There are four possible IO scenarios for TP volumes: Write to an unallocated region
242
7521BEPerfConsid.fm
a. Directory lookup indicates the region is unallocated
b. SVC allocates space and updates the directory
c. Data and directory are written to disk
Write to an allocated region
a. Directory lookup indicates the region is already allocated
b. Data is written to disk
Read of an unallocated region (unusual)
a. Directory lookup indicates the region is unallocated
b. SVC returns a buffer of 0x00s
Read of an allocated region
a. Directory lookup indicates the region has been allocated
b. Data is read from disk
As we can see from the list above, a single host IO request to a TP volume can result in multiple IOs on the backend side because of the related directory lookup. The following are key elements to consider when using TP volumes:
1. If possible, stripe all TP volumes across many backend disks. If TP volumes are used to reduce the number of required disks, this can also result in a performance penalty on those TP volumes.
2. Do not use TP volumes where high I/O performance is required.
3. TP volumes require more I/O capacity because of the directory lookups; for truly random workloads this can generate 2x more workload on the backend disks. The directory I/O requests are two-way write-back cached, the same as the fast-write cache, which means that some applications will perform better because the directory lookup is served from the cache.
4. TP volumes require more CPU processing on the SVC nodes, so the performance per I/O group will be lower. A rule of thumb is that the I/O capacity of the I/O group can be only 50% when using only TP volumes.
5. A smaller grain size can have more influence on performance, because it requires more directory I/O. It is recommended to use a larger grain size (for example, 256K) for host I/O where larger amounts of write data are expected.
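The four scenarios can be modeled as a rough backend IO count per host IO. This is an illustrative simplification (it treats the directory lookup as one backend IO unless it is served from the write-back cache), not the exact SVC implementation:

```python
def tp_backend_ios(op, allocated, directory_cached=False):
    """Approximate backend IOs for one host IO to a thin-provisioned volume."""
    ios = 0 if directory_cached else 1      # directory lookup
    if op == "write" and not allocated:
        ios += 2                            # directory update + data write
    elif op == "write":
        ios += 1                            # data write
    elif op == "read" and allocated:
        ios += 1                            # data read
    # a read of an unallocated region just returns zeros -- no data IO
    return ios

print(tp_backend_ios("write", allocated=False))                        # 3 -- worst case
print(tp_backend_ios("read", allocated=True, directory_cached=True))   # 1 -- same as a fat volume
```

This makes the roughly 2x overhead for truly random workloads easy to see: every cache-missing host IO carries at least one extra directory IO.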
For certain workloads, the combination of the TP and FlashCopy (FC) functions can have a significant impact on the performance of target FC volumes. This is related to the fact that FlashCopy starts to copy the volume from its end, so when the target FC volume is thin provisioned, the last block will be physically at the beginning of the volume's allocation on the backend storage, as shown in Figure 10-6.
In the case of a sequential workload, as shown in Figure 10-6, the data at the physical level (backend storage) will be read/written from the end toward the beginning. The underlying storage subsystem is then unable to recognize the sequential operation, which causes performance degradation for that particular I/O operation.
It is also worthwhile to consider the effect of aggregate load across multiple storage pools. Striping a workload across multiple arrays clearly has a positive effect on performance when the resources are dedicated, but the performance gains diminish as the aggregate load increases across all available arrays. For example, if you have a total of eight arrays and stripe across all eight, your performance is much better than if you striped across only four. However, if the eight arrays are divided into two LUNs each and are also included in another storage pool (SP), the performance advantage drops as the load of SP2 approaches that of SP1; when the workload is spread evenly across all SPs, there is no difference in performance. More arrays in the storage pool have more of an effect with lower-performing storage controllers, due to cache and RAID calculation constraints, because RAID is usually calculated in the main processor rather than on dedicated processors. So, for example, we require fewer arrays from a DS8000 than from, say, a DS5000 to achieve the same performance objectives. This difference is primarily related to the internal capabilities of each storage subsystem and will vary based on the workload. Table 10-13 on page 246 shows the recommended number of arrays per storage pool for general cases. Again, when it comes to performance, there can always be exceptions.
Table 10-13 Recommended number of arrays per storage pool

Controller type    Arrays per storage pool
                   4 - 24
                   4 - 24
                   4 - 24
                   4 - 12
                   4 - 12
As Table 10-13 shows, the recommended number of arrays per storage pool is smaller for high-end storage subsystems. Those subsystems deliver higher performance per array, even when the number of disks in the array is the same. The difference results from better multilayer caching and from specialized processors for RAID calculations.

It is important to understand the following points:
You must consider the number of MDisks per array along with the number of arrays per MDG to understand aggregate MDG loading effects.
You can achieve availability improvements without compromising performance objectives.

Prior to version 6.2 of the SVC code, the SVC cluster used only one path to a managed disk; all other paths were standby paths. When managed disks are recognized by the cluster, active paths are assigned in a round-robin fashion. To utilize all eight ports in one I/O group, there should be at least eight managed disks from a particular back-end storage subsystem. With one managed disk per array, this means at least eight arrays from each back-end storage subsystem.
Q = ((6 ports * 1000 per port) / 4 nodes) / 150 MDisks = 10

With the sample configuration, each MDisk has a queue depth of 10. SVC 4.3.1 introduced dynamic sharing of queue resources based on workload. MDisks with a high workload can now borrow unused queue allocation from less busy MDisks on the same storage system. Although the values are calculated internally and this enhancement provides better sharing, it is still important to consider queue depth when deciding how many MDisks to create.
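The calculation above can be sketched as a small helper. This is an illustration only (the function name is invented; the SVC computes this value internally):

```python
def mdisk_queue_depth(ports, queue_per_port, nodes, mdisks):
    """Q = ((ports * queue_per_port) / nodes) / mdisks, truncated to an integer."""
    return int((ports * queue_per_port) / nodes / mdisks)

# Sample configuration: 6 controller ports, 1000 queued commands per port,
# a 4-node SVC cluster, and 150 MDisks.
q = mdisk_queue_depth(ports=6, queue_per_port=1000, nodes=4, mdisks=150)
print(q)  # 10
```

Halving the MDisk count doubles the per-MDisk queue depth, which is one reason the number of MDisks presented from a controller matters.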
Host I/O
In SVC versions prior to 6.x, the maximum back-end transfer size resulting from host I/O under normal I/O is 32 KB. A host I/O that is larger than 32 KB is therefore broken into several I/Os sent to the back-end storage, as shown in Figure 10-7. In this example, the transfer size of the I/O from the host side is 256 KB.
In such a case, the I/O utilization of the back-end storage ports can be a multiple of the number of I/Os coming from the host side. This is especially true for sequential workloads, where the I/O block size tends to be larger than in traditional random I/O. To address this, the back-end block I/O size for reads and writes was increased to 256 KB in SVC versions 6.x, as shown in Figure 10-8.
The internal cache track size is 32 KB; therefore, when an I/O arrives at the SVC, it is split across the appropriate number of cache tracks. For the above example, this is eight 32 KB cache tracks. Although the back-end I/O block size can be up to 256 KB, a particular host I/O can be smaller, so read and write operations to the back-end managed disks can range from 512 bytes to 256 KB. The same is true for the cache, because tracks are populated only to the size of the I/O. For example, a 60 KB I/O fits in two tracks, where the first track is fully populated with 32 KB and the second holds only 28 KB. If the host I/O request is larger than 256 KB, it is split into 256 KB chunks, where the last chunk can be partial depending on the size of the host I/O.
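The splitting described above can be sketched as follows. This is a toy illustration of the arithmetic, not SVC code; the helper name is invented:

```python
CACHE_TRACK_KB = 32    # SVC internal cache track size
BACKEND_MAX_KB = 256   # maximum back-end transfer size in SVC 6.x

def split(size_kb, chunk_kb):
    """Split an I/O into full chunks plus an optional partial tail."""
    full, rest = divmod(size_kb, chunk_kb)
    return [chunk_kb] * full + ([rest] if rest else [])

# A 256 KB host write occupies eight fully populated cache tracks.
print(split(256, CACHE_TRACK_KB))   # [32, 32, 32, 32, 32, 32, 32, 32]
# A 60 KB write occupies two tracks: 32 KB + 28 KB.
print(split(60, CACHE_TRACK_KB))    # [32, 28]
# A 600 KB host request goes to the back end as 256 + 256 + 88 KB.
print(split(600, BACKEND_MAX_KB))   # [256, 256, 88]
```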
FlashCopy I/O
The transfer size for FlashCopy can be 64 KB or 256 KB, because the grain size of FlashCopy is 64 KB or 256 KB. Any write that changes data within a 64 KB or 256 KB grain results in a single 64 KB or 256 KB read from the source and write to the target.
Coalescing writes
The SVC coalesces writes up to the 32 KB track size if the writes reside in the same track prior to destage; for example, 4 KB is written into a track, and another 4 KB is written to another location in the same track. The track moves to the bottom of the least recently used (LRU) list in the cache upon the second write, and it now contains 8 KB of actual data. This process can continue until the track reaches the top of the LRU list and is destaged; the data is written to the back-end disk and removed from the cache. Any contiguous data within the track is coalesced for the destage.

Sequential writes
The SVC does not employ a caching algorithm for explicit sequential detect, which means coalescing of writes in the SVC cache has a random component to it. For example, 4 KB writes to VDisks translate to a mix of 4 KB, 8 KB, 16 KB, 24 KB, and 32 KB transfers to the MDisks, with decreasing probability as the transfer size grows. Although larger transfer sizes tend to be more efficient, this varying transfer size has no effect on the controller's ability to detect and coalesce sequential content to achieve full stride writes.

Sequential reads
The SVC uses prefetch logic for staging reads based on statistics maintained on 128 MB regions. If the sequential content is sufficiently high within a region, prefetch occurs with 32 KB reads.
The size of the SVC extent also defines how many extents are used for a particular volume. The example of two different extent sizes shown in Figure 10-9 illustrates that fewer extents are required with the larger extent size.
The extent size and the number of managed disks in the storage pool define the extent distribution for striped volumes. The example in Figure 10-10 shows two cases: in the first, the ratio of volume size to extent size equals the number of managed disks in the storage pool; in the second, this ratio is not equal to the number of managed disks.
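The two cases can be illustrated with a small round-robin sketch (a simplification under the assumption of plain round-robin placement starting at the first MDisk; the function name is invented):

```python
def extent_distribution(volume_gb, extent_gb, mdisks):
    """Return how many extents of a striped volume land on each MDisk."""
    extents = volume_gb // extent_gb       # extents needed by the volume
    counts = [0] * mdisks
    for i in range(extents):               # round-robin striping across MDisks
        counts[i % mdisks] += 1
    return counts

# Volume-to-extent ratio equals the MDisk count: perfectly even distribution.
print(extent_distribution(volume_gb=8, extent_gb=1, mdisks=8))
# [1, 1, 1, 1, 1, 1, 1, 1]

# Ratio does not divide evenly: some MDisks carry one extent more than others.
print(extent_distribution(volume_gb=10, extent_gb=1, mdisks=8))
# [2, 2, 1, 1, 1, 1, 1, 1]
```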
For even storage pool utilization, it is recommended to align the size of volumes and extents so that an even extent distribution is achieved. Because volumes are typically used from the beginning of the volume, this does not bring performance improvements by itself, and it applies only to non-thin provisioned volumes. Tip: Where possible in relation to the whole cluster size, align the extent size to the underlying back-end storage, for example to an internal array stride size.
The effect of SVC cache partitioning is that no single SP occupies more than its upper limit of cache capacity with write data. The upper limit is the point at which the SVC cache starts to limit incoming I/O rates for volumes created from the SP. If a particular SP reaches the upper limit, it experiences the same result as a global cache resource that is full: host writes are serviced on a one-out, one-in basis as the cache destages writes to the back-end storage. However, only writes targeted at the full SP are limited; all I/O destined for other (non-limited) SPs continues normally. Read I/O requests for the limited SP also continue normally. However, because the SVC is destaging write data at a rate that is obviously greater than the controller can actually sustain (otherwise, the partition would not reach the upper limit), reads are serviced equally slowly. The main thing to remember is that the partitioning limits write I/Os only. In general, a 70/30 or 50/50 ratio of read to write operations is observed. Of course, there are applications, or workloads, that perform 100% writes; however, write cache hits are much less of a benefit than read cache hits. A write always hits the cache. If modified data already resides in the
cache, it is overwritten, which might save a single destage operation. However, read cache hits provide a much more noticeable benefit, saving seek and latency time at the disk layer. In all benchmarking tests performed, even with a single active SP, good-path SVC I/O group throughput remains the same as it was before the introduction of SVC cache partitioning. For in-depth information about SVC cache partitioning, we recommend the following IBM Redpaper publication: IBM SAN Volume Controller 4.2.1 Cache Partitioning, REDP-4426-00
The DS8700 and DS8800 models do not have the 2 TB LUN limit. It is recommended to use a single LUN to rank mapping, as shown in Figure 10-12.
In this setup, there are as many extent pools as there are ranks, and the extent pools are evenly divided between both internal servers (server0 and server1). With both approaches, the SVC distributes the workload evenly across ranks by striping the volumes across LUNs. One benefit of one rank per extent pool is that the physical LUN placement can be easily determined when required, for example during performance analysis.
The drawback of such a setup is that when additional ranks are added and integrated into existing SVC storage pools, the existing volumes have to be restriped either manually or with scripts.
With this design, it is important that the LUN size is defined so that each LUN has the same number of extents on each rank (the extent size is 1 GB). In the example above, this means the LUN has a size of N x 10 GB. With this approach, utilization of the DS8000 at the rank level is balanced. If an additional rank is added to the configuration, the existing DS8000 LUNs (SVC managed disks) can be rebalanced using a DS8000 Easy Tier manual operation so that optimal resource utilization of the DS8000 is achieved. There is then no need to restripe volumes at the SVC level.
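The LUN sizing rule can be expressed as a one-line calculation (a sketch assuming a 10-rank pool as in the example above; the helper name is invented):

```python
EXTENT_GB = 1  # DS8000 extent size used in the example

def lun_size_gb(extents_per_rank, ranks):
    """LUN size so that each rank in the pool holds the same number of extents."""
    return extents_per_rank * ranks * EXTENT_GB

# With 10 ranks, a LUN of N x 10 GB places exactly N one-GB extents on each rank.
print(lun_size_gb(extents_per_rank=5, ranks=10))  # 50 -> a 50 GB LUN
```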
Extent pools
The number of extent pools on the DS8000 depends on the rank setup. As described above, a minimum of two extent pools is required to utilize both servers inside the DS8000 evenly. In all cases, an even number of extent pools provides the most even distribution of resources.
Balance the ranks across odd and even extent pools. Failing to do this can result in considerable performance degradation due to uneven device adapter loading. The DS8000 assigns server (controller) affinity to ranks when they are added to an extent pool. Ranks that belong to an even-numbered extent pool have an affinity to server0, and ranks that belong to an odd-numbered extent pool have an affinity to server1. Figure 10-14 on page 256 shows an example of a configuration that results in a 50% reduction in available bandwidth. Notice how arrays on each of the DA pairs are only being accessed by one of the adapters. In this case, all ranks on DA pair 0 have been added to even-numbered extent pools, which means that they all have an affinity to server0, and therefore, the adapter in server1 is sitting idle. Because this condition is true for all four DA pairs, only half of the adapters are actively performing work. This condition can also occur on a subset of the configured DA pairs.
Example 10-1 shows what this invalid configuration looks like from the CLI output of the lsarray and lsrank commands. The important thing to notice here is that arrays residing on the same DA pair contain the same group number (0 or 1), meaning that they have affinity to the same DS8000 server (server0 is represented by group0 and server1 is represented by group1). As an example of this situation, arrays A0 and A4 can be considered. They are both attached to DA pair 0, and in this example, both arrays are added to an even-numbered extent pool (P0 and P4). Doing so means that both ranks have affinity to server0 (represented by group0), leaving the DA in server1 idle.
Example 10-1 Command output - lsarray and lsrank dscli> lsarray -l Date/Time: Aug 8, 2008 8:54:58 AM CEST IBM DSCLI Version:5.2.410.299 DS: IBM.2107-75L2321 Array State Data RAID type arsite Rank DA Pair DDMcap(10^9B) diskclass =================================================================================== A0 Assign Normal 5 (6+P+S) S1 R0 0 146.0 ENT A1 Assign Normal 5 (6+P+S) S9 R1 1 146.0 ENT
A2   5   R2   2
A3   5   R3   3
A4   5   R4   0
A5   5   R5   1
A6   5   R6   2
A7   5   R7   3
dscli> lsrank -l
Date/Time: Aug 8, 2008 8:52:33 AM CEST IBM DSCLI Version: 5.2.410.299 DS: IBM.2107-75L2321
ID Group State datastate Array RAIDtype extpoolID extpoolnam stgtype exts usedexts
======================================================================================
R0 0 Normal Normal A0 5 P0 extpool0 fb 779 779
R1 1 Normal Normal A1 5 P1 extpool1 fb 779 779
R2 0 Normal Normal A2 5 P2 extpool2 fb 779 779
R3 1 Normal Normal A3 5 P3 extpool3 fb 779 779
R4 0 Normal Normal A4 5 P4 extpool4 fb 779 779
R5 1 Normal Normal A5 5 P5 extpool5 fb 779 779
R6 0 Normal Normal A6 5 P6 extpool6 fb 779 779
R7 1 Normal Normal A7 5 P7 extpool7 fb 779 779
Figure 10-15 shows an example of a correct configuration that balances the workload across all four DA pairs.
Example 10-2 shows what this correct configuration looks like from the CLI output of the lsrank command. The configuration from the lsarray output remains unchanged. Notice that arrays residing on the same DA pair are now split between groups 0 and 1. Looking at arrays A0 and A4 again shows that they now have different affinities (A0 to group0, A4 to group1). To achieve this correct configuration, the change compared to Example 10-1 on page 256 is that array A4 now belongs to an odd-numbered extent pool (P5).
Example 10-2 Command output
dscli> lsrank -l
Date/Time: Aug 9, 2008 2:23:18 AM CEST IBM DSCLI Version: 5.2.410.299 DS: IBM.2107-75L2321
ID Group State datastate Array RAIDtype extpoolID extpoolnam stgtype exts usedexts
======================================================================================
R0 0 Normal Normal A0 5 P0 extpool0 fb 779 779
R1 1 Normal Normal A1 5 P1 extpool1 fb 779 779
R2 0 Normal Normal A2 5 P2 extpool2 fb 779 779
R3 1 Normal Normal A3 5 P3 extpool3 fb 779 779
R4 1 Normal Normal A4 5 P5 extpool5 fb 779 779
R5 0 Normal Normal A5 5 P4 extpool4 fb 779 779
R6 1 Normal Normal A6 5 P7 extpool7 fb 779 779
R7 0 Normal Normal A7 5 P6 extpool6 fb 779 779
10.8.2 Cache
For the DS8000, you cannot tune the array and cache parameters. The arrays are either 6+P or 7+P, depending on whether the array site contains a spare, and the segment size (the contiguous amount of data that is written to a single disk) is 256 KB for fixed block volumes. Caching for the DS8000 is done on a 64 KB track boundary.
The DS8000 populates Fibre Channel (FC) adapters across two to eight I/O enclosures, depending on the configuration. Each I/O enclosure represents a separate hardware domain. Ensure that adapters configured to different SAN networks do not share the same I/O enclosure, as part of the goal of keeping redundant SAN networks isolated from each other. An example of DS8800 connections with 16 I/O ports on eight 8-port adapters is shown in Figure 10-16. In this case, two ports per adapter are used.
An example of DS8800 connections with four I/O ports on two 4-port adapters is shown in Figure 10-17. In this case, two ports per adapter are used.
Best practices that we recommend:
Configure a minimum of four ports per DS8000.
Configure 16 ports per DS8000 when more than 48 ranks are presented to the SVC cluster.
Configure a maximum of two ports per 4-port DS8000 adapter and four ports per 8-port DS8000 adapter.
Configure adapters across redundant SAN networks from different I/O enclosures.
These factors define the performance and size attributes of the DS8000 LUNs, which act as managed disks for SVC SPs. As discussed in previous sections, an SVC storage pool should have managed disks (MDs) with the same performance and capacity characteristics, because this is required for even DS8000 utilization. Tip: It is recommended that the main characteristics of the storage pool are described in its name. For example, a pool on a DS8800 with 146 GB 15K FC disks in RAID5 could be named DS8800_146G15KFCR5. Figure 10-18 shows an example of a DS8700 storage pool layout based on disk type and RAID level. In this case, ranks with RAID5 6+P+S and 7+P are combined in the same storage pool, as are RAID10 2+2+2P+2S and 3+3+2P. With this approach, some volumes, or parts of volumes, are striped only over MDs (LUNs) on the arrays/ranks that have no spare disk. Because those MDs effectively have one more spindle, this can also compensate for the performance requirements, as more extents are placed on them. This approach simplifies management of the SPs, because it allows a smaller number of SPs to be used. Four SPs are defined in this scenario:
146GB 15K R5 - DS8700_146G15KFCR5
300GB 10K R5 - DS8700_300G10KFCR5
450GB 15K R10 - DS8700_450G15KFCR10
450GB 15K R5 - DS8700_450G15KFCR5
Figure 10-18 DS8700 storage pools based on disk type and RAID level
To achieve a configuration that is totally optimized from the RAID perspective, the configuration would include SPs based on the exact number of disks included in the array/rank, as shown in Figure 10-19.
Figure 10-19 DS8700 storage pools with exact number of disks in the array/rank
With this setup, seven SPs are defined instead of four. The complexity of management increases, because more pools have to be managed, but from the performance perspective the back end is completely balanced at the RAID level. Configurations with so many different disk types in one storage subsystem are not common; usually one DS8000 system has a maximum of two types of disks, and different disk types are installed in different systems. An example of such a setup on the DS8800 is shown in Figure 10-20.
Figure 10-20 DS8800 storage pool setup with two types of disks
Although it is possible to span a storage pool across multiple back-end systems, as shown in Figure 10-21, it is recommended to keep storage pools bound inside a single DS8000 for availability reasons.
Best practices that we recommend:
Use the same type of arrays (disk and RAID type) in the storage pool.
Minimize the number of storage pools. If one or two types of disks are used, two storage pools can be used per DS8000: one for 6+P+S arrays and one for 7+P arrays if RAID5 is used, and likewise for RAID10 with 2+2+2P+2S and 3+3+2P.
Spread the storage pool across both internal servers (server0 and server1). This means using LUNs from extent pools with affinity to server0 and LUNs with affinity to server1 in the same storage pool.
Where performance is not the main goal, a single storage pool can be used, mixing LUNs from arrays with a different number of disks (spindles).
An example of a DS8800 with two storage pools, for 6+P+S RAID5 and 7+P arrays, is shown in Figure 10-22.
Table 10-18 XIV with 2TB disks and 1669GB LUNs (Gen3)

Number of XIV modules installed   Number of LUNs (MDisks) at 1669GB each   IBM XIV System TB used   IBM XIV System TB capacity available
6      33      55.1      55.7
9      52      86.8      88
10     61      101.8     102.6
11     66      110.1     111.5
12     75      125.2     125.9
13     80      133.5     134.9
14     89      148.5     149.3
15     96      160.2     161.3
Table 10-19 XIV with 3TB disks and 2185GB LUNs (Gen3)

Number of XIV modules installed   Number of LUNs (MDisks) at 2185GB each   IBM XIV System TB used   IBM XIV System TB capacity available
6      38      83        84.1
9      60      131.1     132.8
10     70      152.9     154.9
11     77      168.2     168.3
12     86      187.9     190.0
13     93      203.2     203.6
14     103     225.0     225.3
15     111     242.5     243.3
If the XIV is initially not configured with its full capacity, the SVC rebalancing script can be used to optimize volume placement when additional capacity is added to the XIV.
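The "TB used" column in the tables above is simply the LUN count multiplied by the LUN size in decimal terabytes, which can be verified with a quick calculation (an illustrative helper, not part of any XIV or SVC tooling):

```python
LUN_GB = 2185  # Gen3 LUN size with 3 TB disks (Table 10-19)

def usable_tb(luns, lun_gb=LUN_GB):
    """Capacity presented to the SVC: LUN count x LUN size, in decimal TB."""
    return round(luns * lun_gb / 1000, 1)

# Values from Table 10-19: 6 modules -> 38 LUNs, 15 modules -> 111 LUNs.
print(usable_tb(38))              # 83.0
print(usable_tb(111))             # 242.5
# Values from Table 10-18 (2 TB disks, 1669 GB LUNs): 9 modules -> 52 LUNs.
print(usable_tb(52, lun_gb=1669)) # 86.8
```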
10.9.2 IO ports
The XIV supports from 8 to 24 FC ports, depending on the number of modules installed. Each interface module has two dual-port FC cards. It is recommended to use one port per card for SVC use. With this setup, the number of ports available for SVC use ranges from 4 to 12, as shown in Table 10-20.
Table 10-20 XIV FC ports for SVC

Number of XIV modules installed   XIV modules with FC ports   Total available FC ports   Ports used per FC card   Ports available for the SVC
6      2      8       1      4
9      4      16      1      8
10     4      16      1      8
11     5      20      1      10
12     5      20      1      10
13     6      24      1      12
14     6      24      1      12
15     6      24      1      12
As we can see, the SVC limit of 16 ports per storage subsystem is not reached. Ports available for SVC use should be connected to dual fabrics to provide redundancy, and each module should be connected to separate fabrics. An example of best practice SAN connectivity is shown in Figure 10-23.
It is possible to use the cluster definition with each SVC node as a host, but then it is important that the mapped LUNs have their LUN IDs preserved when mapped to the SVC.
with bigger array sizes, for example 10+1 or 11+1. An example of a V7000 configuration with optimal smaller arrays and non-optimal bigger arrays is shown in Figure 10-24.
As we can see in this example, one hot spare disk was used per enclosure. This is not a requirement, but it is a good practice because it gives symmetrical use of the enclosures. At minimum, it is recommended to use one hot spare (HS) disk per SAS chain for each type of disk in the V7000, and if more than two enclosures are present, to have at least two HS disks per SAS chain per disk type if those disks occupy more than two enclosures. An example of multiple disk types in the V7000 is shown in Figure 10-25.
When defining a volume at the V7000 level, the default values should be used. These defaults define a 256 KB strip size (the size of the RAID chunk on a particular disk). This is in line with the SVC back-end I/O size, which is 256 KB in versions above 6.1. For example, a 256 KB strip size gives a 2 MB stride size (the whole RAID chunk size) in an 8+1 array.
The V7000 also supports big NL-SAS drives (2 TB and 3 TB). Using those drives in RAID5 arrays can produce significant RAID rebuild times, even multiple hours. It is therefore recommended to use RAID6 to avoid a double failure during the rebuild period. An example of such a setup is shown in Figure 10-26.
Tip: Make sure that volumes defined on V7000 are evenly distributed across all nodes.
10.10.2 IO ports
Each V7000 node canister has four FC ports for host access. These ports are used by the SVC to access the volumes on the V7000. The minimum configuration is to connect each V7000 node canister to two independent fabrics, as shown in Figure 10-27.
In this setup, the SVC accesses a two-node V7000 configuration over four ports. Such connectivity is sufficient for V7000 environments that are not fully loaded. If the V7000 hosts capacity that requires more than two connections per node, four connections per node should be used, as shown in Figure 10-28.
With a two-node V7000 setup, this gives eight target connections from the SVC perspective, which is well below the current SVC limit of 16 target ports per back-end storage subsystem. The current V7000 limit is a four-node cluster. In that configuration, with four connections to the SAN per node, the 16 target ports are reached, which is still a supported configuration. Such an example is shown in Figure 10-29.
Important: At minimum, two ports per node must be connected to the SAN, with connections to two redundant fabrics.
The example above has a hot spare disk in every enclosure, which is not a requirement. To avoid two pools for the same disk type, it is recommended to create the array configuration based on the following rules:
Number of disks in the array: 6+1, 7+1, or 8+1
Number of hot spare disks: minimum 2
Based on the array size, the following symmetrical array configurations are possible for a five enclosure V7000:
6+1 - 17 arrays (119 disks) + 1 hot spare disk
7+1 - 15 arrays (120 disks) + 0 hot spare disks
8+1 - 13 arrays (117 disks) + 3 hot spare disks
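The configurations above can be derived with a short calculation (a sketch assuming 24-drive enclosures, so five enclosures hold 120 drives; the function name is invented):

```python
ENCLOSURE_SLOTS = 24   # drives per V7000 enclosure (assumed 2.5-inch model)

def symmetric_config(enclosures, data_disks, parity=1):
    """Arrays and leftover hot spares for a given RAID5 width (e.g. 8+1)."""
    total = enclosures * ENCLOSURE_SLOTS
    width = data_disks + parity          # e.g. 8 data + 1 parity = 9 drives
    arrays, spares = divmod(total, width)
    return arrays, spares

# Five enclosures (120 drives), the widths listed above:
print(symmetric_config(5, 6))  # (17, 1)  6+1: 17 arrays, 1 spare left
print(symmetric_config(5, 7))  # (15, 0)  7+1: no spare left
print(symmetric_config(5, 8))  # (13, 3)  8+1: 3 spares left
```

Only the 8+1 width leaves enough drives for the recommended minimum of two hot spares.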
The 7+1 arrays do not provide any hot spare disks in the symmetrical array configuration, as shown in Figure 10-31.
The 6+1 arrays provide a single hot spare disk in the symmetrical array configuration, as shown in Figure 10-32; this is below the recommended number of hot spare disks.
The 8+1 arrays provide three hot spare disks in the symmetrical array configuration, as shown in Figure 10-33. This is within the recommended range for the number of hot spare disks (minimum two).
As you can see, the best configuration for a single storage pool with the same type of disks in a five enclosure V7000 is the 8+1 array configuration. Note: A symmetrical array configuration for the same disk type provides the least possible complexity in the storage pool configuration.
A larger number of disks in an array increases the rebuild time for disk failures, which can have a negative effect on performance. Additionally, more disks in an array increase the probability of having a second drive fail within the same array prior to the rebuild completion of an initial drive failure, which is an inherent exposure to the RAID 5 architecture. Best practice: For the DS5000, we recommend array widths of 4+p and 8+p.
Segment size
With direct-attached hosts, considerations are often made to align device data partitions to physical drive boundaries within the storage controller. For the SVC, this alignment is less critical, because of the caching that the SVC provides and because there is less variation in the I/O profile used to access back-end disks. Because the maximum destage size for the SVC is 256 KB, it is impossible to achieve full stride writes for random workloads. For the SVC, the only opportunity for full stride writes occurs with large sequential workloads, and in that case, the larger the segment size, the better. Larger segment sizes can adversely affect random I/O, however. The SVC and controller cache do a good job of hiding the RAID 5 write penalty for random I/O, and therefore, larger segment sizes can be accommodated. The primary consideration for selecting segment size is to ensure that a single host I/O fits within a single segment, to prevent accessing multiple physical drives. Testing has shown that the best compromise for handling all workloads is a segment size of 256 KB. Best practice: We recommend a segment size of 256 KB as the best compromise for all workloads.
7521Easytier.fm
Chapter 11. Easy Tier
In this chapter we describe the function provided by the Easy Tier disk performance optimization feature of the SAN Volume Controller. We also explain how to activate the Easy Tier process, both for evaluation purposes and for automatic extent migration.
MDisks that are used in a single tier storage pool should have the same hardware characteristics, for example, the same RAID type, RAID array size, disk type, and disk revolutions per minute (RPMs) and controller performance characteristics.
Figure 11-2 shows a scenario in which a storage pool is populated with two different MDisk types: one belonging to an SSD array, and one belonging to an HDD array. Although this example shows RAID5 arrays, other RAID types can be used.
Adding SSD to the pool means additional space is also now available for new volumes, or volume expansion.
3. Data Migration Planner
Using the extents previously identified, the Data Migration Planner step builds the extent migration plan for the storage pool.
4. Data Migrator
The Data Migrator step involves the actual movement, or migration, of the volume's extents up to, or down from, the high disk tier. The extent migration rate is capped at a maximum of 30 MBps, which equates to around 3 TB a day migrated between disk tiers.
When relocating volume extents, Easy Tier performs these actions:
It attempts to migrate the most active volume extents up to SSD first.
To ensure a free extent is available, a less frequently accessed extent may first need to be migrated back to HDD.
A previous migration plan and any queued extents that are not yet relocated are abandoned.
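As a quick sanity check on the migration cap mentioned above, a one-line calculation (decimal units assumed) confirms the "around 3 TB a day" figure:

```python
# Easy Tier extent migration is capped at 30 MBps; over a full day
# (86,400 seconds) that is roughly 2.6 decimal TB, i.e. "around 3 TB".
CAP_MBPS = 30
tb_per_day = CAP_MBPS * 86_400 / 1_000_000
print(round(tb_per_day, 2))  # 2.59
```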
Examples of the use of these parameters are shown in 11.5, Using Easy Tier with the SVC CLI on page 287 and 11.6, Using Easy Tier with the SVC GUI on page 293.
11.3.1 Prerequisites
There is no Easy Tier license required for the SVC; it comes as part of the V6.1 code. For Easy Tier to migrate extents, you need disk storage that provides different tiers, for example a mix of SSD and HDD.
Automatic data placement and extent I/O activity monitors are supported on each copy of a mirrored volume. Easy Tier works with each copy independently of the other copy. Note: Volume mirroring can have different workload characteristics on each copy of the data because reads are normally directed to the primary copy and writes occur to both. Thus, the number of extents that Easy Tier will migrate to SSD tier will probably be different for each copy. If possible, the SAN Volume Controller creates new volumes or volume expansions using extents from MDisks from the HDD tier. However, it will use extents from MDisks from the SSD tier if necessary. When a volume is migrated out of a storage pool that is managed with Easy Tier, then Easy Tier automatic data placement mode is no longer active on that volume. Automatic data placement is also turned off while a volume is being migrated even if it is between pools that both have Easy Tier automatic data placement enabled. Automatic data placement for the volume is re-enabled when the migration is complete.
11.3.3 Limitations
Limitations exist when using IBM System Storage Easy Tier on the SAN Volume Controller.

Limitations when removing an MDisk by using the -force parameter
When an MDisk is deleted from a storage pool with the -force parameter, extents in use are migrated to MDisks in the same tier as the MDisk being removed, if possible. If insufficient extents exist in that tier, extents from the other tier are used.

Limitations when migrating extents
When Easy Tier automatic data placement is enabled for a volume, the svctask migrateexts command-line interface (CLI) command cannot be used on that volume.

Limitations when migrating a volume to another storage pool
When the SAN Volume Controller migrates a volume to a new storage pool, Easy Tier automatic data placement between the two tiers is temporarily suspended. After the volume is migrated to its new storage pool, Easy Tier automatic data placement between the generic SSD tier and the generic HDD tier resumes for the moved volume, if appropriate. When the SAN Volume Controller migrates a volume from one storage pool to another, it attempts to migrate each extent to an extent in the new storage pool from the same tier as the original extent. In several cases, such as a target tier being unavailable, the other tier is used. For example, the generic SSD tier might be unavailable in the new storage pool.

Limitations when migrating a volume to image mode
Easy Tier automatic data placement does not support image mode. When a volume with Easy Tier automatic data placement mode active is migrated to image mode, Easy Tier automatic data placement mode is no longer active on that volume. Image mode and sequential volumes cannot be candidates for automatic data placement. Easy Tier does support evaluation mode for image mode volumes.
Best practices
Always set the storage pool -easytier value to on rather than to the default value auto. This setting makes it easier to turn on evaluation mode for existing single-tier pools, and no further changes are needed when you move to multitier pools. See Easy Tier activation on page 284 for more information about the mix of pool and volume settings. Using Easy Tier can also make it more appropriate to use smaller storage pool extent sizes.
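Checking that no pool is still on the default setting can be scripted. The following is a minimal sketch in Python, assuming output captured from svcinfo lsmdiskgrp with -delim : (the pool names and column order here mirror the examples later in this chapter, but the helper itself is hypothetical):

```python
# Flag storage pools whose easy_tier setting is not the recommended "on".
# Input is assumed to be captured from "svcinfo lsmdiskgrp -delim :".

def pools_not_forced_on(lsmdiskgrp_output: str) -> list:
    lines = lsmdiskgrp_output.strip().splitlines()
    header = lines[0].split(":")
    name_i = header.index("name")
    et_i = header.index("easy_tier")
    flagged = []
    for line in lines[1:]:
        cols = line.split(":")
        if cols[et_i] != "on":          # "off" or the default "auto"
            flagged.append(cols[name_i])
    return flagged

sample = """id:name:status:mdisk_count:vdisk_count:easy_tier:easy_tier_status
27:Single_Tier_Storage_Pool:online:3:1:off:inactive
28:Multi_Tier_Storage_Pool:online:5:1:auto:inactive"""

print(pools_not_forced_on(sample))
```

Running a check like this periodically catches pools created later that silently inherited the auto default.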
Offloading statistics
To extract the summary performance data, use one of these methods.
The distribution of hot data and cold data for each volume is shown in the volume heat distribution report. The report displays the portion of each volume's capacity that is on SSD (red) and on HDD (blue), as shown in Figure 11-5.
Deleted lines: Many lines that are not related to Easy Tier have been deleted from the command output or responses in the examples in the following sections, to enable you to focus on the Easy Tier-related information only.
IBM_2145:ITSO-CLS5:admin>svcinfo lsmdiskgrp -filtervalue "name=Single*"
id name                     status mdisk_count vdisk_count easy_tier easy_tier_status
27 Single_Tier_Storage_Pool online 3           1           off       inactive

IBM_2145:ITSO-CLS5:admin>svcinfo lsmdiskgrp Single_Tier_Storage_Pool
id 27
name Single_Tier_Storage_Pool
status online
mdisk_count 3
vdisk_count 1
.
easy_tier off
easy_tier_status inactive
.
tier generic_ssd
SAN Volume Controller Best Practices and Performance Guidelines
tier_mdisk_count 0
.
tier generic_hdd
tier_mdisk_count 3
tier_capacity 200.25GB

IBM_2145:ITSO-CLS5:admin>svctask chmdiskgrp -easytier on Single_Tier_Storage_Pool

IBM_2145:ITSO-CLS5:admin>svcinfo lsmdiskgrp Single_Tier_Storage_Pool
id 27
name Single_Tier_Storage_Pool
status online
mdisk_count 3
vdisk_count 1
.
easy_tier on
easy_tier_status active
.
tier generic_ssd
tier_mdisk_count 0
.
tier generic_hdd
tier_mdisk_count 3
tier_capacity 200.25GB
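For scripting, the detail output of svcinfo lsmdiskgrp can be turned into a lookup structure. The following is a hedged sketch: it handles only the simple "key value" lines shown in these examples, and it collects the repeated tier stanzas into a list (the parser itself is an illustration, not an IBM-supplied tool):

```python
# Parse "svcinfo lsmdiskgrp <pool>" detail output (one "key value" pair
# per line). Repeated "tier" stanzas become a list of dicts; everything
# else goes into a flat dict. The sample output is abridged, as in the text.

def parse_detail(output: str):
    info, tiers = {}, []
    for line in output.strip().splitlines():
        key, _, value = line.partition(" ")
        if key == "tier":
            tiers.append({"tier": value})
        elif key.startswith("tier_") and tiers:
            tiers[-1][key] = value
        else:
            info[key] = value
    return info, tiers

sample = """id 27
name Single_Tier_Storage_Pool
easy_tier on
easy_tier_status active
tier generic_ssd
tier_mdisk_count 0
tier generic_hdd
tier_mdisk_count 3
tier_capacity 200.25GB"""

info, tiers = parse_detail(sample)
print(info["easy_tier_status"])       # active
print(tiers[1]["tier_mdisk_count"])   # 3
```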
------------ Now repeat for the volume ------------

IBM_2145:ITSO-CLS5:admin>svcinfo lsvdisk -filtervalue "mdisk_grp_name=Single*"
id name          status mdisk_grp_id mdisk_grp_name           capacity type
27 ITSO_Volume_1 online 27           Single_Tier_Storage_Pool 10.00GB  striped

IBM_2145:ITSO-CLS5:admin>svcinfo lsvdisk ITSO_Volume_1
id 27
name ITSO_Volume_1
.
easy_tier off
easy_tier_status inactive
.
tier generic_ssd
tier_capacity 0.00MB
.
tier generic_hdd
tier_capacity 10.00GB
IBM_2145:ITSO-CLS5:admin>svctask chvdisk -easytier on ITSO_Volume_1

IBM_2145:ITSO-CLS5:admin>svcinfo lsvdisk ITSO_Volume_1
id 27
name ITSO_Volume_1
.
easy_tier on
easy_tier_status measured
.
tier generic_ssd
tier_capacity 0.00MB
IBM_2145:ITSO-CLS5:admin>svcinfo lsmdisk
mdisk_id mdisk_name        status mdisk_grp_name          capacity raid_level tier
299      SSD_Array_RAID5_1 online Multi_Tier_Storage_Pool 203.6GB  raid5      generic_hdd
300      SSD_Array_RAID5_2 online Multi_Tier_Storage_Pool 203.6GB  raid5      generic_hdd

IBM_2145:ITSO-CLS5:admin>svcinfo lsmdisk SSD_Array_RAID5_2
mdisk_id 300
mdisk_name SSD_Array_RAID5_2
status online
mdisk_grp_id 28
mdisk_grp_name Multi_Tier_Storage_Pool
capacity 203.6GB
IBM_2145:ITSO-CLS5:admin>svcinfo lsmdiskgrp -filtervalue "name=Multi*"
id name                    mdisk_count vdisk_count capacity easy_tier easy_tier_status
28 Multi_Tier_Storage_Pool 5           1           606.00GB auto      inactive

IBM_2145:ITSO-CLS5:admin>svcinfo lsmdiskgrp Multi_Tier_Storage_Pool
id 28
name Multi_Tier_Storage_Pool
status online
mdisk_count 5
vdisk_count 1
.
easy_tier auto
easy_tier_status inactive
.
tier generic_ssd
tier_mdisk_count 0
.
tier generic_hdd
tier_mdisk_count 5
IBM_2145:ITSO-CLS5:admin>svcinfo lsmdisk SSD_Array_RAID5_1
id 299
name SSD_Array_RAID5_1
status online
.
tier generic_hdd

IBM_2145:ITSO-CLS5:admin>svctask chmdisk -tier generic_ssd SSD_Array_RAID5_1
IBM_2145:ITSO-CLS5:admin>svctask chmdisk -tier generic_ssd SSD_Array_RAID5_2
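When many SSD arrays must be retagged, generating the chmdisk commands avoids typing errors. A small sketch (the MDisk names are the ones from the example above; the helper function itself is hypothetical, not part of the SVC CLI):

```python
# Build "svctask chmdisk -tier generic_ssd <mdisk>" commands for a list
# of MDisks that are known to be SSD arrays.

def chmdisk_commands(mdisks, tier="generic_ssd"):
    return [f"svctask chmdisk -tier {tier} {name}" for name in mdisks]

for cmd in chmdisk_commands(["SSD_Array_RAID5_1", "SSD_Array_RAID5_2"]):
    print(cmd)
```

The generated lines can be reviewed and then pasted into an SVC CLI session.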
IBM_2145:ITSO-CLS5:admin>svcinfo lsmdisk SSD_Array_RAID5_1
id 299
name SSD_Array_RAID5_1
status online
.
tier generic_ssd

IBM_2145:ITSO-CLS5:admin>svcinfo lsmdiskgrp Multi_Tier_Storage_Pool
id 28
name Multi_Tier_Storage_Pool
status online
mdisk_count 5
vdisk_count 1
.
easy_tier auto
Chapter 11. Easy Tier
easy_tier_status active
.
tier generic_ssd
tier_mdisk_count 2
tier_capacity 407.00GB
.
tier generic_hdd
tier_mdisk_count 3
IBM_2145:ITSO-CLS5:admin>svcinfo lsvdisk ITSO_Volume_10
id 28
name ITSO_Volume_10
mdisk_grp_name Multi_Tier_Storage_Pool
capacity 10.00GB
type striped
.
easy_tier on
easy_tier_status active
.
tier generic_ssd
tier_capacity 0.00MB
tier generic_hdd
tier_capacity 10.00GB

The volume in the example will be measured by Easy Tier, and hot extents will be migrated from the generic_hdd tier MDisks to the generic_ssd tier MDisks. Also note that the generic_hdd tier still holds the entire capacity of the volume, because the generic_ssd tier_capacity value is 0.00MB. The capacity that is allocated on the generic_hdd tier will gradually change as Easy Tier optimizes performance by moving extents into the generic_ssd tier.
tier generic_ssd
tier_capacity 407.00GB
tier_free_capacity 100.00GB
tier generic_hdd
tier_capacity 18.85TB
tier_free_capacity 10.40TB

As you can see, two different tiers are available in our SVC cluster: generic_ssd and generic_hdd. Extents are also in use on both the generic_ssd tier and the generic_hdd tier; compare the tier_capacity and tier_free_capacity values. However, this command does not tell you whether the SSD storage is being used by the Easy Tier process. To determine whether Easy Tier is actively measuring or migrating extents within the cluster, you need to view the volume status, as shown previously in Example 11-5 on page 292.
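The in-use capacity per tier follows directly from the tier_capacity and tier_free_capacity values above. A small sketch of the arithmetic, assuming the unit suffixes (MB/GB/TB) printed by the CLI are decimal multiples:

```python
# Compute how much of each tier is in use from tier_capacity and
# tier_free_capacity strings as printed by the SVC CLI. Treating the
# suffixes as decimal multiples is a simplifying assumption.

UNITS = {"MB": 1e-3, "GB": 1.0, "TB": 1e3}  # normalize everything to GB

def to_gb(value: str) -> float:
    return float(value[:-2]) * UNITS[value[-2:]]

def used_gb(capacity: str, free: str) -> float:
    return to_gb(capacity) - to_gb(free)

print(round(used_gb("407.00GB", "100.00GB"), 2))  # GB in use on the SSD tier
print(round(used_gb("18.85TB", "10.40TB"), 2))    # GB in use on the HDD tier
```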
This is because, by default, all MDisks are initially discovered as Hard Disk Drives (HDDs); see the MDisk properties panel in Figure 11-7 on page 294.
Therefore, for Easy Tier to take effect, you need to change the disk tier. Right-click the selected MDisk and choose Select Tier, as shown in Figure 11-8.
Now set the MDisk Tier to Solid-State Drive, as shown in Figure 11-9 on page 295.
The MDisk now has the correct tier, so the properties value is correct for a multitier pool, as shown in Figure 11-10.
Chapter 12. Applications
In this chapter, we provide information about laying out storage for the best performance for general applications, IBM AIX Virtual I/O (VIO) servers, and IBM DB2 databases specifically. While most of the specific information is directed to hosts running the IBM AIX operating system, the information is also relevant to other host types.
Generally, fewer physical drives are needed to reach adequate I/O performance than with transaction-based workloads. For instance, 20 - 28 physical drives are normally enough to reach maximum I/O throughput rates with the IBM System Storage DS4000 series of storage subsystems. In a throughput-based environment, read operations make use of the storage subsystem cache to stage larger chunks of data at a time, improving overall performance. Throughput rates are heavily dependent on the storage subsystem's internal bandwidth. Newer storage subsystems with broader bandwidths are able to reach higher throughput rates.
mixed, and use SVC striped volumes over several MDisks in a storage pool in order to have the best performance and eliminate trouble spots or hot spots.
most SVC back-end storage configurations and removes a significant data layout burden from the storage administrators.

Consider where the failure boundaries are in the back-end storage, and take them into consideration when locating application data. A failure boundary is defined as what will be affected if we lose a RAID array (an SVC MDisk): all the volumes and servers striped on that MDisk will be affected, together with all other volumes in that storage pool. Consider also that spreading the I/Os evenly across the back-end storage has both a performance benefit and a management benefit.

We recommend that an entire set of back-end storage is managed together, considering the failure boundary. If a company has several lines of business (LOBs), it might decide to manage the storage along each LOB so that each LOB has a unique set of back-end storage. So, for each set of back-end storage (a group of storage pools, or perhaps better, just one storage pool), we create only striped volumes across all the back-end storage arrays. This approach is beneficial, because the failure boundary is limited to a LOB, and performance and storage management are handled as a unit for the LOB independently.

What we do not recommend is to create striped volumes that are striped across different sets of back-end storage, because using different sets of back-end storage makes the failure boundaries difficult to determine, unbalances the I/O, and might limit the performance of those striped volumes to the slowest back-end device.

For SVC configurations where SVC image mode volumes must be used, we recommend that the back-end storage configuration for the database consists of one LUN (and therefore one image mode volume) per array, or an equal number of LUNs per array, so that the Database Administrator (DBA) can guarantee that the I/O workload is distributed evenly across the underlying physical disks of the arrays. Refer to Figure 12-2 on page 304.
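The per-LOB layout described above, one pool per LOB containing only striped volumes, can be sketched as command generation. The pool and volume names and sizes here are hypothetical, and the helper is an illustration rather than an IBM tool; the mkvdisk parameters follow the standard SVC CLI form:

```python
# Generate mkvdisk commands that create only striped volumes inside a
# single per-LOB storage pool, keeping the failure boundary within the
# LOB. Pool and volume names and sizes are invented for illustration.

def mkvdisk_striped(pool, volumes):
    return [
        f"svctask mkvdisk -mdiskgrp {pool} -iogrp 0 -size {size} "
        f"-unit gb -vtype striped -name {name}"
        for name, size in volumes
    ]

cmds = mkvdisk_striped("LOB_Finance_Pool",
                       [("fin_db_01", 100), ("fin_log_01", 20)])
for c in cmds:
    print(c)
```

Because every volume names the same -mdiskgrp, nothing in this LOB can end up striped across another LOB's back-end storage.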
Use striped mode volumes for applications that do not already stripe their data across physical disks. Striped volumes are the all-purpose volumes for most applications. Use striped mode volumes if you need to manage a diversity of growing applications and balance the I/O performance based on probability. If you understand your application storage requirements, you might take an approach that explicitly balances the I/O rather than a probabilistic approach. However, explicitly balancing the I/O requires either testing or a good understanding of the application and of the storage mapping and striping, to know which approach works better. Examples of applications that stripe their data across the underlying disks are DB2, IBM GPFS, and Oracle ASM. These types of applications might require additional data layout considerations, as described on page 306.
How the partitions are selected for use and laid out can vary from system to system. In all cases, you need to ensure that the partitions are spread in a manner that achieves the maximum I/O available to the logical drives in the group. Generally, large volumes are built across a number of different logical drives to bring more resources to bear. When you do this, be careful when selecting logical drives so that you do not use logical drives that compete for resources and degrade performance.
12.5 Data layout with the AIX virtual I/O (VIO) server
The purpose of this section is to describe strategies to get the best I/O performance by evenly balancing I/Os across physical disks when using the VIO Server.
12.5.1 Overview
In setting up storage at a VIO server (VIOS), a broad range of possibilities exists for creating volumes and serving them up to VIO clients (VIOCs). The obvious consideration is to create sufficient storage for each VIOC. Less obvious, but equally important, is getting the best use of the storage: performance and availability are of paramount importance. There are typically internal Small Computer System Interface (SCSI) disks (typically used for the VIOS operating system) and SAN disks. Availability for disk is usually handled by RAID on the SAN or by SCSI RAID adapters on the VIOS. We assume here that any internal SCSI disks are used for the VIOS operating system and possibly for the VIOCs' operating systems. Furthermore, we assume that the applications are configured so that only limited I/O will occur to the internal SCSI disks on the VIOS and to the VIOC rootvgs. If you expect that your rootvg will have a significant IOPS rate, you can configure it in the same fashion as we recommend for other application VGs later.
VIOS restrictions
There are two types of volumes that you can create on a VIOS: physical volume (PV) VSCSI hdisks and logical volume (LV) VSCSI hdisks. PV VSCSI hdisks are entire LUNs from the VIOS point of view, and if you are concerned about failure of a VIOS and have configured redundant VIOSs for that reason, you must use PV VSCSI hdisks. So, PV VSCSI hdisks are entire LUNs that appear as volumes from the VIOC point of view. An LV VSCSI hdisk cannot be served up from multiple VIOSs. LV VSCSI hdisks reside in LVM VGs on the VIOS; they cannot span PVs in that VG, nor can they be striped LVs.
have a group of applications where, if one application fails, none of the applications can perform any productive work. When implementing the SVC, limiting the spread can be accounted for through the storage pool layout. Refer to Chapter 5, Storage pools and Managed Disks on page 71 for more information about failure boundaries in the back-end storage configuration.
Part 2
Chapter 13. Monitoring
In this chapter, we discuss Tivoli Storage Productivity Center reports and how to use them to monitor your SVC and Storwize V7000 and identify performance problems. We then show several examples of misconfiguration and failures, and how they can be identified in Tivoli Storage Productivity Center using the Topology Viewer and performance reports. We also show how to collect and view performance data directly from the SVC. You must always use the latest version of Tivoli Storage Productivity Center that is supported by your SVC code; Tivoli Storage Productivity Center is often updated to support new SVC features. If you have an earlier version of Tivoli Storage Productivity Center installed, you might still be able to reproduce the reports described here, but certain data might not be available.
Figure 13-1 Asset Report: Manage Disk Group (SVC Storage Pool) Detail
Managed Disks: Figure 13-2 shows the Managed Disks for the selected SVC. No additional information that you need for performance problem determination is provided here. The report was enhanced in 4.2.1 to also reflect whether the MDisk is a Solid State Drive (SSD). SVC does not automatically detect SSD MDisks. To mark them as SSD candidates for Easy Tier, the managed disk tier attribute must be manually changed from generic_hdd to generic_ssd.
Figure 13-2 Tivoli Storage Productivity Center Asset Report: Managed Disk Detail
Virtual Disks: Figure 13-3 shows virtual disks for the selected SVC, or in this case a virtual disk (volume) from a Storwize V7000. Note: Virtual disks for the Storwize V7000 and the SVC appear identically in this Tivoli Storage Productivity Center report, so only Storwize V7000 screen captures were selected; they show the impact of SVC version 6.2 with Tivoli Storage Productivity Center V4.2.1.
Figure 13-3 Tivoli Storage Productivity Center Asset Report: Virtual Disk Detail
The virtual disks are referred to as volumes in other performance reports. For the volumes, you see the managed disk (MDisk) on which the virtual disks are allocated, but you do not see the correct RAID level. From an SVC perspective, you often stripe the data across the MDisks within a storage pool, so Tivoli Storage Productivity Center displays RAID 0 as the RAID level.
As with many other reports, this one was also enhanced to report on Easy Tier and Space Efficient usage. In this example screen capture, you see that Easy Tier is enabled for this volume, but still in inactive status. In addition, this report was enhanced to show the amount of storage assigned to this volume from the different tiers (ssd and hdd). There is another report that can help you see the actual configuration of the volume, including the MDG or storage pool, backend controller, and MDisks, among other details; unfortunately, this information is not available in the asset reports on the MDisks. Volume to Backend Volume Assignment: Figure 13-4 shows the location of the Volume to Backend Volume Assignment report within the Navigation Tree.
Figure 13-5 shows the report. Notice that the virtual disks are referred to as volumes in the report.
This report provides many details about the volume. While specifics of the RAID configuration of the actual MDisks are not presented, the report is quite useful, because all aspects from the host perspective to the backend storage are placed together in one report. The following details are available and are quite useful:

Storage Subsystem: the subsystem containing the disk in view; for this report, this is the SVC.
Storage Subsystem type: for this report, this is the SVC.
User-Defined Volume Name
Volume Name
Volume Space: the total usable capacity of the volume.
Tip: For space-efficient volumes, this value is the amount of storage space requested for these volumes, not the actual allocated amount. This can result in discrepancies in the overall storage space reported for a storage subsystem using space-efficient volumes. This also applies to other space calculations, such as the calculations for the Storage Subsystem's Consumable Volume Space and FlashCopy Target Volume Space.
Storage Pool: the storage pool associated with this volume.
Disk: the MDisk that the volume is placed upon.
Note: For SVC or Storwize V7000 volumes spanning multiple MDisks, this report has multiple entries for that volume to reflect the actual MDisks the volume is using.
Disk Space: the total disk space available on the MDisk.
Available Disk Space: the remaining space available on the MDisk.
Backend Storage Subsystem: the name of the storage subsystem that this MDisk is from.
Backend Storage Subsystem type: the type of storage subsystem.
Backend Volume Name: the volume name for this MDisk as known by the backend storage subsystem (a big time saver).
Backend Volume Space
Copy ID
Copy Type: the type of copy this volume is being used for, such as Primary or Copy, for SVC versions 4.3 and newer. Primary is the source volume, and Copy is the target volume.
Backend Volume Real Space: for full backend volumes, this is the actual space. For space-efficient backend volumes, this is the real capacity being allocated.
Easy Tier: indicates whether Easy Tier is enabled on the volume.
Easy Tier status: active or inactive.
Tiers
Tier Capacity
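The Tip about space-efficient volumes can be made concrete: the report sums the requested (virtual) capacity, not the real allocated capacity, so the two totals can disagree. A small illustration with invented numbers:

```python
# Illustrate the space-efficient discrepancy described in the Tip:
# summing the requested (virtual) size of each volume overstates what
# is really allocated on the back end. All numbers here are invented.

volumes = [
    {"name": "vol1", "virtual_gb": 100, "real_gb": 15},
    {"name": "vol2", "virtual_gb": 200, "real_gb": 40},
]

reported = sum(v["virtual_gb"] for v in volumes)   # what the report shows
allocated = sum(v["real_gb"] for v in volumes)     # actually allocated
print(reported, allocated, reported - allocated)
```

The gap between the two sums is exactly the discrepancy that can appear in the subsystem-level space calculations.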
There are situations where multiple MDisk groups are desirable:
Workload isolation
Short-stroking a production MDisk group
Managing different workloads in different groups
Expand IBM Tivoli Storage Productivity Center → Reporting → System Reports → Disk to view the system reports that are relevant to SVC and Storwize V7000. I/O Group Performance and Managed Disk Group Performance are specific reports for SVC and Storwize V7000, while Module/Node Cache Performance is also available for IBM XIV. In Figure 13-7, those reports are highlighted:
Figure 13-8 shows a sample structure; use it to review basic SVC structural concepts and then proceed with performance analysis at the different component levels.
13.4.1 Top 10 for SVC and Storwize V7000 #1: I/O Group Performance reports
Note: For SVCs with multiple I/O groups, a separate row is generated for every I/O group within each SVC.
In our lab environment, data was collected for an SVC with a single I/O group. The scroll bar at the bottom of the table indicates that additional metrics can be viewed, as shown in Figure 13-9.
Important: The data displayed in a performance report is the last value collected at the time the report is generated. It is not an average of the last hours or days; it simply shows the last data collected.

Click the icon next to the SVC io_grp0 entry to drill down and view the statistics by node within the selected I/O group. Notice that a new tab, Drill down from io_grp0, is created, containing the report for the nodes within the SVC. See Figure 13-10.
To view a historical chart of one or more specific metrics for the resources, click the icon. A list of metrics is displayed, as shown in Figure 13-11. You can select one or more metrics that use the same measurement unit. If you select metrics that use different measurement units, you will receive an error message.
You can change the reporting time range and click the Generate Chart button to regenerate the graph, as shown in Figure 13-12 on page 322. A continually high Node CPU Utilization rate indicates a busy I/O group; in our environment, CPU utilization does not rise above 24%, which is a more than acceptable value.
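Checking collected samples against a utilization threshold is straightforward to script. A hedged sketch (the sample series and the 70% threshold are invented; 24% was simply the peak observed in our environment):

```python
# Flag sample intervals where node CPU utilization exceeds a threshold.
# The timestamps, percentages, and 70% default are invented for
# illustration, not taken from the lab data in this chapter.

def busy_intervals(samples, threshold=70.0):
    return [t for t, pct in samples if pct > threshold]

samples = [("10:00", 18.0), ("10:05", 24.0), ("10:10", 81.5), ("10:15", 22.0)]
print(busy_intervals(samples))   # only the 81.5% interval is flagged
```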
Notice that the I/Os are only present on Node 2. So, in Figure 13-15 on page 324, you can see a configuration problem, where the workload is not well balanced, at least during this time frame (this is the reason for the red traffic light shown in that figure).
Recommendations
To interpret your performance results, the first recommendation is always to go back to your baseline. For information on creating a baseline, refer to SAN Storage Performance Management Using Tivoli Storage Productivity Center, SG24-7364.

Moreover, some industry benchmarks for the SVC and Storwize V7000 are available. SVC 4.2 and the 8G4 node brought a dramatic increase in performance, as demonstrated by the results in the Storage Performance Council (SPC) benchmarks, SPC-1 and SPC-2. The benchmark number, 272,505.19 SPC-1 IOPS, is the industry-leading OLTP result, and the PDF is available at the following URL:

http://www.storageperformance.org/results/b00024_IBM-SVC4.2_SPC2_executive-summary.pdf

An SPC Benchmark 2 was also performed for the Storwize V7000; the Executive Summary PDF is available at the following URL:

http://www.storageperformance.org/benchmark_results_files/SPC-2/IBM_SPC-2/B00052_IBM_Storwize-V7000/b00052_IBM_Storwize-V7000_SPC2_executive-summary.pdf

Figure 13-14 on page 324 shows numbers for the maximum I/Os and MB/s per I/O group. Realize that the SVC performance you obtain is based upon multiple factors, including:

The specific SVC nodes in your configuration
The type of Managed Disks (volumes) in the Managed Disk Group (MDG)
The application I/O workloads using the MDG
The paths to the backend storage

These factors ultimately determine the final performance realized. In reviewing the SPC benchmark (see Figure 13-14), the results obtained for the I/O and data rates differ considerably depending on the transfer block size used. Looking at the two-node I/O group used, you might see 122,000 I/Os if all of the transfer blocks were 4K; in typical environments, they rarely are. If you jump to 64K or bigger, with anything over about 32K you might realize a result closer to the 29,000 seen in the SPC benchmark.
Max I/Os and MB/s per I/O group (70/30 R/W miss)

Node model   4K transfer size    64K transfer size
2145-8G4     122K   500MB/s      29K   1.8GB/s
2145-8F4     72K    300MB/s      23K   1.4GB/s
2145-4F2     38K    156MB/s      11K   700MB/s
2145-8F2     72K    300MB/s      15K   1GB/s
Figure 13-14 SPC SVC benchmark Max I/Os and MB/s per I/O group
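The IOPS and data-rate columns in Figure 13-14 are internally consistent: IOPS multiplied by the transfer size approximates the data rate. A quick check of that arithmetic, assuming binary KiB transfer sizes and decimal MB/s:

```python
# Sanity-check Figure 13-14: IOPS x transfer size ~= data rate.
# For the 2145-8G4 at 4K: 122,000 IOPS x 4 KiB is roughly 500 MB/s,
# and at 64K: 29,000 IOPS x 64 KiB is roughly 1.9 GB/s.

def mb_per_s(iops: int, kib: int) -> float:
    return iops * kib * 1024 / 1e6

print(round(mb_per_s(122_000, 4)))   # close to the 500 MB/s in the table
print(round(mb_per_s(29_000, 64)))   # close to the 1.8 GB/s in the table
```

The same arithmetic, run in reverse, lets you estimate an IOPS ceiling from a bandwidth figure for a given average transfer size.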
As mentioned before, in the I/O rate graph shown in Figure 13-15, you can see a configuration problem indicated by the red traffic light in the lower right corner.
Response time
To view the read and write response time at Node level, click the Drill down from io_grp0 tab to return to the performance statistics for the nodes within the SVC. Click the icon and select the Backend Read Response Time and Backend Write Response Time metrics, as shown in Figure 13-16.
Click Ok to generate the report, as shown in Figure 13-17 on page 326. We see acceptable backend response time values for both read and write operations, and these are consistent for both of our I/O groups.
Recommendations
For random read I/O, the backend rank (disk) read response times should seldom exceed 25 msec, unless the read hit ratio is near 99%. Backend write response times will be higher because of the RAID 5 (or RAID 10) algorithms, but they should seldom exceed 80 msec. There will be some time intervals when response times exceed these guidelines. In case of poor response time, investigate using all available information from the SVC and the backend storage controller. Possible causes for a large change in response times from the backend storage that might be visible using the storage controller management tool include:

Physical array drive failure leading to an array rebuild. This drives additional backend storage subsystem internal read/write workload while the rebuild is in progress. If this is causing poor latency, it might be desirable to adjust the array rebuild priority to lessen the load. However, this must be balanced with the increased risk of a second drive failure during the rebuild, which would cause data loss in a RAID 5 array.

Cache battery failure leading to cache being disabled by the controller. This can usually be resolved simply by replacing the failed battery.
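The rules of thumb above (reads seldom above 25 msec, writes seldom above 80 msec) can be applied mechanically to collected samples. A sketch with invented data:

```python
# Flag intervals whose backend response times break the rules of thumb:
# read > 25 ms or write > 80 ms. The sample values below are invented.

READ_MS, WRITE_MS = 25.0, 80.0

def slow_intervals(samples):
    """samples: (timestamp, read_ms, write_ms) tuples."""
    return [t for t, r, w in samples if r > READ_MS or w > WRITE_MS]

samples = [("09:00", 8.2, 3.1), ("09:05", 31.0, 12.0), ("09:10", 9.9, 95.0)]
print(slow_intervals(samples))
```

Intervals flagged this way are candidates for the drive-rebuild and cache-battery checks listed above.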
For further details about Rules of Thumb and how to interpret these values, consult the SAN Storage Performance Management Using Tivoli Storage Productivity Center, SG24-7364-02: http://www.redbooks.ibm.com/redbooks/pdfs/sg247364.pdf
Data Rate
To look at the data rates, click the Drill down from io_grp0 tab to return to the performance statistics for the nodes within the SVC. Click the icon and select the Read Data Rate metric. Press and hold the Shift key and select Write Data Rate and Total Data Rate. Then click Ok to generate the chart, shown in Figure 13-18 on page 327.
To interpret your performance results, the first recommendation is always to go back to your baseline. For information on creating a baseline, refer to SAN Storage Performance Management Using Tivoli Storage Productivity Center, SG24-7364. Moreover, a benchmark is available. The throughput benchmark, 7,084.44 SPC-2 MBPS, is the industry-leading throughput result, and the PDF is available here:

http://www.storageperformance.org/results/b00024_IBM-SVC4.2_SPC2_executive-summary.pdf
13.4.2 Top 10 for SVC and Storwize V7000 #2: Node Cache Performance reports
Efficient use of cache can help enhance virtual disk I/O response time. The Node Cache Performance report displays a list of cache-related metrics, such as the Read and Write Cache Hits percentages and the Read Ahead percentage of cache hits. The cache memory resource reports provide an understanding of the utilization of the SVC or Storwize V7000 cache. These reports give you an indication of whether the cache is able to service and buffer the current workload. Expand IBM Tivoli Storage Productivity Center → Reporting → System Reports → Disk. Select the Module/Node Cache performance report. Notice that this report is generated at the SVC and Storwize V7000 node level (there is also an entry that refers to an IBM XIV storage device), as shown in Figure 13-19 on page 328.
Figure 13-19 SVC and Storwize V7000 Node cache performance report
Important: The flat line for node1 does not mean that read requests for that node cannot be handled by the cache; it means that there is no traffic at all on that node, as illustrated in Figure 13-21 on page 329 and Figure 13-22 on page 329, where the Read Cache Hit Percentage and Read I/O Rates are compared over the same time interval.
This might not be a good configuration, because the two nodes are not balanced. In our lab environment, the volumes defined on the Storwize V7000 were all defined with node2 as the preferred node. After moving the preferred node for volume tpcblade3-7-ko from node2 to node1, we obtained the graphs shown in Figure 13-23 and Figure 13-24 on page 331 for the Read Cache Hit percentage and Read I/O Rates:
Recommendations
Read hit percentages can vary from near 0% to near 100%. Anything below 50% is considered low, but many database applications show hit ratios below 30%. For very low hit ratios, you need many ranks providing good backend response time. It is difficult to predict whether more cache will improve the hit ratio for a particular application. Hit ratios are more dependent on the application design and the amount of data than on the size of the cache (especially for Open System workloads), but larger caches are always better than smaller ones. For high hit ratios, the backend ranks can be driven a little harder, to higher utilizations.

If you need to analyze cache performance further and determine whether the cache is sufficient for your workload, you can run multiple-metrics charts. Select the metrics named percentage, because you can chart multiple metrics with the same unit type in one chart. In the Selection panel, move the percentage metrics that you want to include from the Available Column to the Included Column; then, with the Selection button, check only the Storwize V7000 entries. Figure 13-25 on page 333 shows an example where several percentage metrics are chosen for the Storwize V7000. The complete list of metrics is as follows:

CPU utilization percentage: The average utilization of the node controllers in this I/O group during the sample interval.

Dirty Write percentage of Cache Hits: The percentage of write cache hits that modified only data that was already marked dirty in the cache (re-written data). This is an obscure measurement of how effectively writes are coalesced before destaging.
Read/Write/Total Cache Hits percentage (overall): The percentage of reads/writes/total during the sample interval that are found in cache. This is an important metric. The write cache hot percentage should be very nearly 100%. Readahead percentage of Cache Hits: An obscure measurement of cache hits involving data that has been prestaged for one reason or another. Write Cache Flush-through percentage: For SVC and Storwize V7000, the percentage of write operations that were processed in Flush-through write mode during the sample interval. Write Cache Overflow percentage: For SVC and Storwize V7000 the percentage of write operations that were delayed due to lack of write-cache space during the sample interval. Write Cache Write-through percentage: For SVC and Storwize V7000 the percentage of write operations that were processed in Write-through write mode during the sample interval. Write Cache Delay percentage: The percentage of all I/O operations that were delayed due to write-cache space constraints or other conditions during the sample interval. Only writes can be delayed, but the percentage is of all I/O. Small Transfers I/O percentage: Percentage of I/O operations over a specified interval. Applies to data transfer sizes that are <= 8 KB. Small Transfers Data percentage Percentage of data that was transferred over a specified interval. Applies to I/O operations with data transfer sizes that are <= 8 KB. Medium Transfers I/O percentage Percentage of I/O operations over a specified interval. Applies to data transfer sizes that are > 8 KB and <= 64 KB. Medium Transfers Data percentage Percentage of data that was transferred over a specified interval. Applies to I/O operations with data transfer sizes that are > 8 KB and <= 64 KB. Large Transfers I/O percentage Percentage of I/O operations over a specified interval. Applies to data transfer sizes that are > 64 KB and <= 512 KB. 
- Large Transfers Data percentage: Percentage of data transferred over a specified interval, for I/O operations with data transfer sizes > 64 KB and <= 512 KB.
- Very Large Transfers I/O percentage: Percentage of I/O operations over a specified interval, for data transfer sizes > 512 KB.
- Very Large Transfers Data percentage: Percentage of data transferred over a specified interval, for I/O operations with data transfer sizes > 512 KB.
- Overall Host Attributed Response Time Percentage:
The percentage of the average response time (both read response time and write response time) that can be attributed to delays from host systems. This metric is provided to help diagnose slow hosts and poorly performing fabrics. The value is based on the time taken for hosts to respond to transfer-ready notifications from the SVC nodes (for reads), and the time taken for hosts to send the write data after the node has responded to a transfer-ready notification (for writes).

The following metric is only applicable in a Global Mirror session:

- Global Mirror Overlapping Write Percentage: The average percentage of write operations issued by the Global Mirror primary site that were serialized overlapping writes, for a component over a specified time interval. For SVC 4.3.1 and later, some overlapping writes are processed in parallel (not serialized) and are excluded. For earlier SVC versions, all overlapping writes were serialized.
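As an illustration, the hit-percentage metrics listed above are simple ratios of counters sampled over an interval. The sketch below shows the arithmetic; the counter names are hypothetical illustrations, not the actual Tivoli Storage Productivity Center schema.

```python
# Sketch: computing cache-hit percentages from interval counters.
# Counter names are hypothetical, not the actual TPC schema.

def hit_percentage(hits: int, total_ops: int) -> float:
    """Percentage of operations served from cache in the interval."""
    return 100.0 * hits / total_ops if total_ops else 0.0

# Example interval sample for one I/O group
sample = {"read_ops": 5000, "read_hits": 1500,
          "write_ops": 2000, "write_hits": 1990}

read_pct = hit_percentage(sample["read_hits"], sample["read_ops"])     # 30.0
write_pct = hit_percentage(sample["write_hits"], sample["write_ops"])  # 99.5
total_pct = hit_percentage(sample["read_hits"] + sample["write_hits"],
                           sample["read_ops"] + sample["write_ops"])
print(read_pct, write_pct, round(total_pct, 1))
```

The read hit ratio of 30% here would be low but typical of a database workload, while the near-100% write hit ratio is what the text says you should expect.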
Then select all the metrics in the Select charting option pop-up window and click Ok to generate the chart. In our test, we noticed in Figure 13-26 on page 334 a drop in the Cache Hits percentage. Even though the drop is not dramatic, it can be treated as a trigger for further investigation of an arising problem. Changes in these performance metrics, together with an increase in backend response time (see Figure 13-27 on page 334), show that the storage controller is heavily burdened with I/O, and the Storwize V7000 cache could become full of outstanding write I/Os. Host I/O activity will be impacted by the backlog of data in the Storwize V7000 cache, as will any other Storwize V7000 workload going to the same MDisks.
Note: For SVC only, if cache utilization is a problem, you can add more cache to the cluster by adding an I/O Group and moving VDisks to the new I/O Group. Note, however, that moving a VDisk from one I/O Group to another is still a disruptive action, so proper planning to manage this disruption is required. You cannot add an I/O Group to a Storwize V7000.
For further details about these rules of thumb and how to interpret these values, consult SAN Storage Performance Management Using Tivoli Storage Productivity Center, SG24-7364-02.
13.4.3 Top 10 for SVC #3: Managed Disk Group Performance reports
The Managed Disk Group performance report provides disk performance information at the managed disk group level. It summarizes read and write transfer sizes and the backend read, write, and total I/O rates. From this report you can easily drill up to see the statistics of the virtual disks supported by a managed disk group, or drill down to view the data for the individual MDisks that make up the managed disk group. Expand IBM Tivoli Storage Productivity Center → Reporting → System Reports → Disk and select Managed Disk Group Performance. A table is displayed listing all the known managed disk groups and their last collected statistics, based on the latest performance data collection. See Figure 13-28.
One of the Managed Disk Groups is CET_DS8K1901mdg. Click the drill down icon on the entry CET_DS8K1901mdg to drill down. A new tab is created, containing the Managed Disks in the Managed Disk Group. See Figure 13-29.
Figure 13-29 Drill down from Managed Disk Group Performance report
Click the drill down icon on the entry mdisk61 to drill down. A new tab is created, containing the Volumes in the Managed Disk. See Figure 13-30.
I/O rate
We recommend that you analyze how the I/O workload is split between Managed Disk Groups, to determine whether it is well balanced. Click the Managed Disk Groups tab, select all Managed Disk Groups, click the chart icon, and select Total Backend I/O Rate, as shown in Figure 13-31.
Figure 13-31 Top 10 SVC - Managed Disk Group I/O rate selection
Click Ok to generate the next chart, as shown in Figure 13-32 on page 337. When reviewing this general chart, you must understand that it reflects all I/O to the backend storage from the MDisks included within this MDG. The key to this report is a general understanding of backend I/O rate usage, not whether there is outright balance.
While the SVC and Storwize V7000 by default stripe write and read I/Os across all MDisks, the striping is not a RAID 0 type of stripe. Rather, because the VDisk is a concatenated volume, the striping injected by the SVC and Storwize V7000 lies only in how extents are identified for use when a VDisk is created. Until host write activity fills the first extent, the remaining extents in the block VDisk provided by the SVC are not used. It is therefore very likely, when you look at the MDG backend I/O report, that you will not see balanced write activity, even for a single MDG. In the report shown in Figure 13-32, for the time frame specified, we see at one point a maximum of nearly 8200 IOPS.
Figure 13-32 Top 10 SVC - Managed Disk Group I/O rate report
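The extent-level striping described above can be pictured as a round-robin assignment of VDisk extents across the MDisks in the group. The following toy sketch illustrates that allocation order only; the names and the extent count are illustrative, not SVC internals.

```python
# Toy sketch of striped-VDisk extent allocation: extents are assigned
# round-robin across the MDisks in the Managed Disk Group. I/O only
# touches an extent once host writes reach it, which is why backend
# activity is rarely perfectly balanced in practice.
def allocate_extents(vdisk_extents: int, mdisks: list[str]) -> list[tuple[int, str]]:
    """Return (extent_index, mdisk) pairs in round-robin order."""
    return [(i, mdisks[i % len(mdisks)]) for i in range(vdisk_extents)]

layout = allocate_extents(6, ["mdisk0", "mdisk1", "mdisk2"])
print(layout)
# [(0, 'mdisk0'), (1, 'mdisk1'), (2, 'mdisk2'),
#  (3, 'mdisk0'), (4, 'mdisk1'), (5, 'mdisk2')]
```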
Response Time
Now go back to the list of MDisks by moving to the Drill down from CET_DS8K1901mdg tab (see Figure 13-29 on page 335). Select all the Managed Disk entries, click the chart icon, and select the Backend Read Response Time metric, as shown in Figure 13-33 on page 338.
Then click Ok to generate the chart, as shown in Figure 13-34 on page 339.
Recommendations
For random read I/O, the backend rank (disk) read response time should seldom exceed 25 msec, unless the read hit ratio is near 99%. Backend write response time will be higher because of the RAID 5 (or RAID 10) algorithms, but should seldom exceed 80 msec. There will be some time intervals when response times exceed these guidelines.
Click Ok to generate the report shown in Figure 13-36. Here the workload is not balanced across the MDisks.
13.4.4 Top 10 for SVC and Storwize V7000 #5-9: Top Volume Performance reports
Tivoli Storage Productivity Center provides five reports on top volume performance:

- Top Volume Cache Performance: Prioritized by the Total Cache Hits percentage (overall) metric.
- Top Volume Data Rate Performance: Prioritized by the Total Data Rate metric.
- Top Volume Disk Performance: Prioritized by the Disk to Cache Transfer Rate metric.
- Top Volume I/O Rate Performance: Prioritized by the Total I/O Rate (overall) metric.
- Top Volume Response Performance: Prioritized by the Overall Response Time metric.

Volumes referred to in these reports correspond to the VDisks in SVC.

Important: The last collected performance data on volumes is used for these reports.

Each report creates a ranked list of volumes based on the metric used to prioritize the performance data. You can customize these reports according to the needs of your environment. To limit these system reports to just SVC subsystems, you have to specify a filter, as shown in Figure 13-37. Click the Selection tab, then click Filter. Click Add to specify another condition to be met. This has to be done for all five reports.
Recommendations
Read hit percentages can vary from near 0% to near 100%. Anything below 50% is considered low, but many database applications show hit ratios below 30%. For very low hit ratios, you need many ranks providing good backend response time. It is difficult to predict whether more cache will improve the hit ratio for a particular application. Hit ratios depend more on the application design and the amount of data than on the size of the cache (especially for Open Systems workloads), but larger caches are always better than smaller ones. For high hit ratios, the backend ranks can be driven a little harder, to higher utilizations.
Click Generate Report on the Selection panel to regenerate the report, shown next in Figure 13-40. If this report is generated during the runtime period, the volumes with the highest total data rate are listed on the report.
Recommendations
The throughput for storage volumes can range from fairly small numbers (1 to 10 I/Os per second) to very large values (more than 1000 I/Os per second), depending greatly on the nature of the application. When the I/O rate (throughput) approaches 1000 IOPS per volume, it is because the volume is getting very good performance, usually from very good cache behavior; otherwise it is not possible to drive so many IOPS to a volume.
Recommendations
Typical response time ranges are only slightly more predictable. In the absence of additional information, we often assume (and our performance models assume) that 10 milliseconds is pretty high. But for a particular application, 10 msec might be too low or too high. Many OLTP (On-Line Transaction Processing) environments require response times closer to 5 msec, while batch applications with large sequential transfers might be fine with 20 msec response time. The appropriate value might also change between shifts or on the weekend. A response time of 5 msec might be required from 8 until 5, while 50 msec is perfectly acceptable near midnight. It is all customer and application dependent. The value of 10 msec is somewhat arbitrary, but related to the nominal service time of current generation disk products. In crude terms, the service time of a disk is composed of a seek, a latency, and a data transfer. Nominal seek times these days can range from 4 to 8 msec, though in practice, many workloads do better than nominal. It is not uncommon for applications to experience from 1/3 to 1/2 the nominal seek time. Latency is assumed to be 1/2 the rotation time for the disk, and transfer time for typical applications is less than a msec. So it is not unreasonable to expect 5-7 msec service time for a simple disk access. Under ordinary queueing assumptions, a disk operating at 50% utilization would have a wait time roughly equal to the service time. So 10-14 msec response time for a disk is not unusual, and represents a reasonable goal for many applications.
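The arithmetic in the paragraph above can be checked directly. Under the simple queueing assumption the text uses, wait time roughly equals service time at 50% utilization (in M/M/1 terms, wait = service × u/(1−u)). This sketch plugs in the nominal figures from the text; the exact seek, latency, and transfer values are illustrative.

```python
# Rough disk response-time model from the text: response = service + wait,
# where under simple queueing assumptions wait = service * u / (1 - u)
# at utilization u. Inputs are the nominal figures cited in the text.
def disk_response_ms(seek_ms: float, latency_ms: float,
                     xfer_ms: float, utilization: float) -> float:
    service = seek_ms + latency_ms + xfer_ms
    wait = service * utilization / (1.0 - utilization)
    return service + wait

# Effective seek ~3 ms (workloads often beat nominal), latency ~half a
# rotation (~3 ms), transfer < 1 ms, disk at 50% utilization:
r = disk_response_ms(seek_ms=3.0, latency_ms=3.0, xfer_ms=0.5, utilization=0.5)
print(round(r, 1))  # 13.0 -> inside the 10-14 ms range the text cites
```

At 50% utilization the wait term equals the 6.5 ms service time, which is exactly the "wait time roughly equal to the service time" statement above.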
For cached storage subsystems, we certainly expect to do as well as or better than uncached disks, though that might be harder than you think. If there are a lot of cache hits, the subsystem response time might be well below 5 msec, but poor read hit ratios and busy disk arrays behind the cache will drive the average response time up. A high cache hit ratio allows us to run the backend storage ranks at higher utilizations than we might otherwise be satisfied with. Rather than 50% utilization of disks, we might push the disks in the ranks to 70% utilization, which produces high rank response times that are averaged with the cache hits to produce acceptable average response times. Conversely, poor cache hit ratios require quite good response times from the backend disk ranks in order to produce an acceptable overall average response time. To simplify, we can assume that (front end) response times probably need to be in the 5-15 msec range. The rank (backend) response times can usually operate in the 20-25 msec range unless the hit ratio is really poor. Backend write response times can be even higher, generally up to 80 msec. Important: None of the above considerations apply to SSDs, where seek time and rotational latency do not exist. Expect much better performance from these disks, and therefore very short response times (less than 4 ms).
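The effect of the hit ratio on average front-end response time described above is a simple weighted average. A quick check of the reasoning, with illustrative hit and miss times:

```python
# Average front-end response time as a hit-ratio-weighted mix of the
# cache-hit time and the backend (miss) time. Values are illustrative.
def avg_response_ms(hit_ratio: float, hit_ms: float, miss_ms: float) -> float:
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms

# A high hit ratio tolerates a slow, highly utilized backend:
print(round(avg_response_ms(0.90, 1.0, 25.0), 1))  # 3.4 ms average
# A poor hit ratio with the same 25 ms backend gives a poor average:
print(round(avg_response_ms(0.30, 1.0, 25.0), 1))  # 17.8 ms average
```

This is why the text allows 20-25 msec backend ranks when the hit ratio is high, but demands much faster backend response when the hit ratio is poor.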
Refer to 13.8, Case study: Top volumes response time and I/O rate performance report on page 368 to create a tailored report for your environment.
13.4.5 Top 10 for SVC and Storwize V7000 #10: Port Performance reports
The SVC and Storwize V7000 port performance reports help you understand the SVC and Storwize V7000 impact on the fabric, and give you an indication of the traffic between:

- the SVC (or Storwize V7000) and the hosts that receive storage
- the SVC (or Storwize V7000) and the backend storage
- the nodes in the SVC (or Storwize V7000) cluster
These reports can help you understand whether the fabric might be a performance bottleneck and whether upgrading the fabric can lead to a performance improvement. The Port Performance report summarizes the various send, receive, and total port I/O rates and data rates. Expand IBM Tivoli Storage Productivity Center → My Reports → System Reports → Disk and click Port Performance. In order to display only SVC and Storwize V7000 ports, click Filter to produce a report for all the ports belonging to SVC or Storwize V7000 subsystems, as shown in Figure 13-44.
A separate row is generated for each subsystem's ports. The information displayed in each row reflects the data last collected for that port. Notice that the Time column displays the last collection time, which might differ between subsystem ports. Not all the metrics in the Port Performance report are applicable to all ports. For example, the Port Send Utilization percentage, Port Receive Utilization percentage, and Overall Port Utilization percentage data are not available on SVC or Storwize V7000 ports; N/A is displayed where data is not available, as shown in Figure 13-45. By clicking Total Port I/O Rate you get a list prioritized by I/O rate.
At this point you can verify whether the data rates seen at the backend ports are beyond what is normally expected for the speed of your fibre links, as shown in Figure 13-46 on page 348. This report is typically generated to support problem determination, capacity management, or SLA reviews. Based on the 8 Gb per second fabric, these rates are well below the throughput capability of this fabric, so the fabric is not a bottleneck here.
Figure 13-46 SVC and Storwize V7000 Port I/O rate report
Then generate another historical chart with the Port Send Data Rate and Port Receive Data Rate metric, as shown in Figure 13-47, which confirms the unbalanced workload for one port.
Recommendations
Based on the nominal speed of each of the FC ports, which could be 4 Gbit, 8 Gbit, or more, we recommend not exceeding 50-60% of that value as a data rate. For example, an 8 Gbit port can reach a maximum theoretical data rate of around 800 MB/sec, so you should generate an alert when it exceeds 400 MB/sec.
Figure 13-47 SVC and Storwize V7000 Port Data Rate report
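The 50-60% rule of thumb for port data rates can be expressed as a simple check. This is a sketch only: the rough Gbit/sec-to-MB/sec conversion (×100) follows the book's round numbers, and the function name is illustrative.

```python
# Alert when a port's data rate exceeds ~50% of its nominal throughput,
# per the rule of thumb in the text. The rough conversion
# Gbit/sec -> MB/sec (* 100) matches the book's round figures.
def port_rate_alert(link_gbit: int, data_rate_mb_s: float,
                    threshold: float = 0.5) -> bool:
    nominal_mb_s = link_gbit * 100  # e.g. 8 Gbit -> ~800 MB/sec
    return data_rate_mb_s > threshold * nominal_mb_s

print(port_rate_alert(8, 450.0))  # True: above the 400 MB/sec threshold
print(port_rate_alert(8, 300.0))  # False: within the comfort zone
```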
To investigate further using the Port Performance report, go back to the I/O Group Performance report. Expand IBM Tivoli Storage Productivity Center → My Reports → System Reports → Disk. Click I/O Group Performance and drill down to the node level. In the example in Figure 13-48 we choose node 1 of the SVC subsystem:
Then click the icon and select Port to Local Node Send Queue Time, Port to Local Node Receive Queue Time, Port to Local Node Receive Response Time and Port to Local Node Send Response Time, as shown in Figure 13-49:
Look at port rates between SVC nodes, hosts, and disk storage controllers. Figure 13-50 shows low queue and response times, indicating that the nodes do not have a problem communicating with each other.
If this report shows high queue and response times, write activity is affected (because each node communicates with each other node over the fabric). Unusually high numbers in this report indicate:

- an SVC (or Storwize V7000) node or port problem (unlikely)
- fabric switch congestion (more likely)
- faulty fabric ports or cables (most likely)
After you have the I/O rate review chart, you also need to generate a data rate chart for the same time frame. This supports a review of your HA ports for this application. Generate another historical chart with the Total Port Data Rate metric, as shown in Figure 13-52 on page 352, which confirms the unbalanced workload for the one port shown in the foregoing report.
Recommendations
Chapter 13. Monitoring
According to the nominal speed of each FC port, which could be 4 Gbit, 8 Gbit, or more, we recommend not exceeding 50-60% of that value as a data rate. For example, an 8 Gbit port can reach a maximum theoretical data rate of around 800 MB/sec, so you should generate an alert when it exceeds 400 MB/sec.
Tip: Rather than using a specific report to monitor Switch Port Errors, we recommend that you use the Constraint Violation report. By setting an Alert for the number of errors at the switch port level, the Constraint Violation report becomes a direct tool to monitor the errors in your fabric. For more information on Constraint Violation reports refer to SAN Storage Performance Management Using Tivoli Storage Productivity Center, SG24-7364.
Click Ok to generate the chart shown next in Figure 13-55 on page 355. In this case, knowing that the FC port speed is 8 Gbits/sec, the port data rates do not reach a warning level.
Recommendations
Use this report to monitor whether some switch ports are overloaded. According to the FC port nominal speed (2 Gbit, 4 Gbit, or more), as shown in Table 13-1, you have to establish the maximum workload a switch port can reach. We recommend not exceeding 50-70% of the nominal data rate.
Table 13-1 Switch port data rates

  FC Port speed (Gbit/sec)   FC Port speed (MB/sec)   Recommended Port Data Rate threshold
  1 Gbit/sec                 100 MB/sec               50 MB/sec
  2 Gbit/sec                 200 MB/sec               100 MB/sec
  4 Gbit/sec                 400 MB/sec               200 MB/sec
  8 Gbit/sec                 800 MB/sec               400 MB/sec
  10 Gbit/sec                1000 MB/sec              500 MB/sec
Click Generate Report to get the output shown in Figure 13-57. Scrolling to the right of the table, more information is available: the volume names, volume capacity, and allocated and unallocated volume space are listed.
Data on the report can be exported by selecting File → Export Data, to a comma-delimited file, a comma-delimited file with headers, a formatted report file, or an HTML file. You can start from this volume list to analyze performance data and workload I/O rate. Tivoli Storage Productivity Center provides a report that shows volume to backend volume assignments. To display the report, expand Disk Manager → Reporting → Storage Subsystem → Volume to Backend Volume Assignment → By Volume. Click Filter to limit the list of volumes to the ones belonging to server tpcblade3-7, as shown in Figure 13-58.
Click Generate Report to get the list shown in Figure 13-59 on page 358.
Scroll to the right to see the SVC managed disks and backend volumes on the DS8000, as shown in Figure 13-60.

Note: The highlighted lines with N/A values relate to a backend storage subsystem that is not defined in our Tivoli Storage Productivity Center environment. To obtain the information on a backend storage subsystem, it must be added to the Tivoli Storage Productivity Center environment, together with the corresponding probe job (see the first line in the report in Figure 13-60, where the backend storage subsystem is part of our Tivoli Storage Productivity Center environment and the volume is therefore correctly shown in all its details).
With this information and the list of volumes mapped to this computer, you can start to run a Performance Report to understand where the problem for this server could be.
Recommendations
When looking at disk performance problems, you need to check the overall response time as well as the overall I/O rate. If both are high, there might be a problem. If the overall response time is high but the I/O rate is trivial, the impact of the high overall response time might be inconsequential. Expand Disk Manager → Reporting → Storage Subsystem Performance → By Volume. Then click Filter to produce a report for all the volumes belonging to Storwize V7000 subsystems, as shown in Figure 13-61.
Click the volume you need to investigate, click the icon and select Total I/O Rate (overall). Then click Ok to produce the graph, as shown in Figure 13-62 on page 360.
The chart in Figure 13-63 shows that the I/O rate had been around 900 operations per second, suddenly declined to around 400 operations per second, and then went back to 900 operations per second. In this case study we limited the days to the time frame in which the customer reported noticing the problem.
Select again the Volumes tab, click the volume you need to investigate, click the icon and scroll down to select Overall Response Time. Then click Ok to produce the chart, as shown in Figure 13-64.
The chart in Figure 13-65 indicates the increase in response time from a few milliseconds to around 30 milliseconds. This information, combined with the high I/O rate, indicates there is a significant problem and further investigation is appropriate.
The next step is to look at the performance of MDisks in the MDisk group. To identify to which MDisk the VDisk tpcblade3-7-ko2 belongs, go back to Volumes tab and click the drill up icon, as shown in Figure 13-66.
Figure 13-67 shows the Managed Disks where tpcblade3-7-ko2 extents reside:
Select all the MDisks. Click the icon and select Overall Backend Response Time. Click Ok as shown in Figure 13-68 on page 363.
Keep the charts generated relevant to this scenario by using the charting time range. You can see from the chart in Figure 13-69 that something happened around May 26 at 6:00 pm that probably caused the backend response time for all MDisks to dramatically increase.
If you take a look at the chart for the Total Backend I/O Rate for these two MDisks during the same time period, you will see that their I/O rates all remained in a similar overlapping pattern, even after the introduction of the problem. This is as expected and would be because
tpcblade3-7-ko2 is evenly striped across the two MDisks. The I/O rate for these MDisks is only as high as the slowest MDisk, as shown in Figure 13-70.
At this point, we have identified that the response time for all MDisks dramatically increased. The next step is to generate a report showing the volumes that have an overall I/O rate equal to or greater than 1000 ops/sec, and then generate a chart to show which of those volumes' I/O rates changed around 6:00 pm on May 26. Expand Disk Manager → Reporting → Storage Subsystem Performance → By Volume. Click Display historic performance data using absolute time and limit the time period to 1 hour before and 1 hour after the event reported in Figure 13-69 on page 363. Click Filter to limit to the Storwize V7000 subsystem, and Add a second filter to select Total I/O Rate (overall) greater than 1000 (that is, a high I/O rate). Click Ok, as shown in Figure 13-71 on page 365.
The report in Figure 13-72 shows all the performance records of the volumes filtered above. In the Volume column, only three volumes meet these criteria: tpcblade3-7-ko2, tpcblade3-7-ko3, and tpcblade3-7-ko4. There are multiple rows for each, because there is a row for each performance data record. Look for the volumes whose I/O rate changed around 6:00 pm on May 26. You can click the Time column to sort.
Now we have to compare the Total I/O Rate (overall) metric for the above volumes and the volume that is the subject of the case study, tpcblade3-7-ko2. To do so, remove the filtering condition on the Total I/O Rate defined in Figure 13-71 and generate the report again. Then select one row for each of these volumes and select Total I/O Rate (overall). Then click Ok to generate the chart, as shown in Figure 13-73 on page 366.
For Limit days From, insert the time frame we are investigating. Results: Figure 13-74 on page 367 shows the root cause. Volume tpcblade3-7-ko2 (the blue line in the screen capture) started around 5:00 pm with a Total I/O Rate of around 1000 IOPS. When the new workloads (generated by tpcblade3-7-ko3 and tpcblade3-7-ko4) started together, the Total I/O Rate for volume tpcblade3-7-ko2 fell from around 1000 IOPS to less than 500, and then rose again to about 1000 when one of the two loads decreased. The hardware has physical limitations on the number of IOPS that it can handle, and this limit was reached at 6:00 pm.
To confirm this behavior, you can generate a chart by selecting Response Time. The chart shown in Figure 13-75 confirms that as soon as the new workload started, the response time for tpcblade3-7-ko2 got worse.
The easy solution is to split this workload, moving one VDisk to another Managed Disk Group.
13.8 Case study: Top volumes response time and I/O rate performance report
The default Top Volumes Response Performance report can be useful for identifying problem performance areas. A long response time is not necessarily indicative of a problem: it is possible to have volumes with long response times but very low (trivial) I/O rates, and such situations might not pose a real performance problem. In this section we tailor the Top Volumes Response Performance report to identify volumes with both long response times and high I/O rates. The report can be tailored for your environment; it is also possible to update your filters to exclude volumes or subsystems you no longer want in this report. Expand Disk Manager → Reporting → Storage Subsystem Performance → By Volume, as shown in Figure 13-76, and keep only the desired metrics as Included Columns, moving all the others to Available Columns. You can save this report to be referenced in the future from IBM Tivoli Storage Productivity Center → My Reports → your user's Reports.
You have to specify the filters to limit the report, as shown in Figure 13-77 on page 369. Click Filter and then Add the conditions. In our example we are limiting the report to Subsystems SVC* and DS8* and to the volumes that have an I/O Rate greater than 100 Ops/sec and a Response Time greater than 5 msec.
Prior to generating the report, you should specify the date and time of the period for which you want to make the inquiry. Important: Specifying large intervals might require intensive processing and a long time to complete. As shown in Figure 13-78, click Generate Report.
Figure 13-79 on page 370 shows the resulting Volume list. Sorting by response time or by I/O Rate columns (by clicking the column header), you can easily identify which entries have both interesting total I/O Rates and Overall Response Times.
Recommendations
We suggest that in a production environment, you initially specify a Total I/O Rate (overall) somewhere between 1 and 100 ops/sec and an Overall Response Time greater than or equal to 15 msec, then adjust those numbers to suit your needs as you gain more experience.
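The tailored report amounts to a conjunction of two thresholds over the volume records. A minimal sketch of that filter logic; the row fields and values are hypothetical, not the actual TPC export format.

```python
# Sketch of the tailored report filter: keep volumes with BOTH a
# non-trivial I/O rate and a long overall response time. Row fields
# and sample values are hypothetical.
rows = [
    {"volume": "vol01", "io_rate": 850.0, "resp_ms": 22.0},
    {"volume": "vol02", "io_rate": 3.0,   "resp_ms": 140.0},  # trivial rate
    {"volume": "vol03", "io_rate": 400.0, "resp_ms": 4.0},    # fast enough
]

def interesting(row, min_io_rate=100.0, min_resp_ms=15.0):
    """True only when the volume is both busy and slow."""
    return row["io_rate"] >= min_io_rate and row["resp_ms"] >= min_resp_ms

print([r["volume"] for r in rows if interesting(r)])  # ['vol01']
```

Note how vol02 is excluded despite its 140 ms response time: at 3 ops/sec its impact is inconsequential, which is exactly the point of combining the two filters.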
13.9 Case study: SVC and Storwize V7000 performance constraint alerts
Along with reporting on SVC and Storwize V7000 performance, Tivoli Storage Productivity Center can generate alerts when performance has not met, or has exceeded a defined threshold. Like most Tivoli Storage Productivity Center tasks, the alerting can report to: SNMP:
Enables you to send an SNMP trap to an upstream systems management application. The trap can then be correlated with other events occurring within the environment to help determine the root cause of the condition that generated it. For example, if the SVC or Storwize V7000 reported to Tivoli Storage Productivity Center that a fibre port went offline, it might in fact be because a switch has failed. This port-failed trap, together with the switch-offline trap, could be analyzed by a systems management tool and identified as a switch problem, not an SVC (or Storwize V7000) problem, so that the switch technicians are called.

- Tivoli Omnibus Event: Select to send a Tivoli Omnibus event.
- Login Notification: Select to send the alert to a Tivoli Storage Productivity Center user. The user receives the alert upon logging in to Tivoli Storage Productivity Center. In the Login ID field, type the user ID.
- UNIX or Windows NT system event logger.
- Script: The script option enables you to run a predefined set of commands that can help address this event, for example, opening a trouble ticket in your helpdesk ticket system.
- Email: Tivoli Storage Productivity Center sends an e-mail to each person listed.

Tip: Remember that for Tivoli Storage Productivity Center to be able to send e-mail, an e-mail relay must be identified in Administrative Services → Configuration → Alert Disposition, under the Email settings.

These are some useful alert events that you should set:

- CPU utilization threshold: The CPU utilization alert tells you when your SVC or Storwize V7000 nodes become too busy. If this alert is generated too often, it might be time to upgrade your cluster with additional resources. Development recommends setting this to 75% as a warning or 90% as critical; these are the defaults that come with Tivoli Storage Productivity Center 4.2.1. To enable this function, just create an alert selecting CPU Utilization.
Then define the alert actions to be performed. Next, using the Storage Subsystem tab, select the SVC or Storwize V7000 cluster to set this alert for.

- Overall port response time threshold: The port response time alert can let you know when the SAN fabric is becoming a bottleneck. If the response times are consistently bad, you should perform additional analysis of your SAN fabric.
- Overall backend response time threshold: An increase in backend response time might indicate that you are overloading your backend storage. Because backend response times can vary depending on what I/O workloads are in place, capture 1 to 4 weeks of data to baseline your environment before setting this value. Then set the response time values.
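The CPU utilization thresholds mentioned above (75% warning, 90% critical, the Tivoli Storage Productivity Center 4.2.1 defaults) reduce to a simple classification. A sketch, with a hypothetical function name:

```python
# Classifying CPU-utilization samples against the default TPC 4.2.1
# thresholds cited in the text: 75% warning, 90% critical.
def cpu_alert_level(utilization_pct: float,
                    warning: float = 75.0, critical: float = 90.0) -> str:
    if utilization_pct >= critical:
        return "critical"
    if utilization_pct >= warning:
        return "warning"
    return "ok"

print(cpu_alert_level(82.0))  # warning
print(cpu_alert_level(93.5))  # critical
```

If the "warning" level fires persistently across samples, that matches the text's advice that it might be time to add resources to the cluster.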
Because you can select the storage subsystem for this alert, you are able to set different alerts based upon the baselines you have captured. Our recommendation is to start with your mission-critical Tier 1 storage subsystems. To create an alert, as shown in Figure 13-80, expand Disk Manager → Alerting → Storage Subsystem Alerts and right-click to create a Storage Subsystem Alert. On the right you get a pull-down menu where you can choose which alert you would like to set.
Tip: The best place to verify which thresholds are currently enabled, and at what values, is at the beginning of a Performance Collection job. Expand Tivoli Storage Productivity Center → Job Management and select in the Schedule table the latest performance collection job running, or that has run, for your subsystem. In the Jobs for Selected Schedule part of the panel (the lower part), expand the corresponding job and select the instance, as shown in Figure 13-81 on page 373.
Figure 13-81 Job management panel - SVC performance job log selection
By clicking the View Log File(s) button you access the corresponding log file, where you can see the thresholds defined, as shown in Figure 13-82 on page 373. Tip: To go to the beginning of the log file, click the Top button.
Expand IBM Tivoli Storage Productivity Center → Alerting → Alert Log → Storage Subsystem to list all the alerts that occurred. Look for your SVC subsystem, as shown in Figure 13-83 on page 374.
By clicking the icon next to the alert you would like to inquire about, you get detailed information, as shown in Figure 13-84.
For more information on defining alerts refer to SAN Storage Performance Management Using Tivoli Storage Productivity Center, SG24-7364.
After generating this report, use the Topology Viewer to identify which device is being impacted and to identify a possible solution. Figure 13-86 shows the result we obtained in our lab.
Figure 13-86 Ports exceeding filters set for switch performance report
Click the icon and, holding the Ctrl key, select Port Send Data Rate, Port Receive Data Rate, and Total Port Data Rate. Click OK to generate the chart shown in Figure 13-87 on page 376.

Tip: This chart gives you an indication of how persistent the high utilization of this port is, which is an important consideration in establishing the importance and impact of this bottleneck.

Important: To get all the values in the selected interval, you must remove the filters defined in Figure 13-85.

The chart shows consistent throughput higher than 300 MBps in the selected time period. You can change the dates by extending the Limit days value.
To identify which device is connected to port 7 on this switch, expand IBM Tivoli Storage Productivity Center → Topology → Switches. Right-click, select Expand all Groups, and look for your switch, as shown in Figure 13-88 on page 377.
Tip: To navigate in the Topology Viewer, press and hold the Alt key and press and hold the left mouse button to anchor your cursor. With these held down, you can drag the screen with the mouse to quickly move to the information you need.

Find and click port 7. The line shows that it is connected to computer tpcblade3-7, as shown in Figure 13-89 on page 378. Note that in the tabular view at the bottom, you can see port details. If you scroll right, you can check the port speed, too.
Double-click this computer to highlight it. Click Datapath Explorer (see the DataPath Explorer shortcut highlighted in the minimap in Figure 13-89) to get a view of the paths between servers and storage subsystems, or between storage subsystems (for example, SVC to backend storage, or server to storage subsystem). The view consists of three panels (host information, fabric information, and subsystem information) that show the path through a fabric or set of fabrics for the endpoint devices, as shown in Figure 13-90 on page 379.

Tip: A possible scenario for Data Path Explorer is an application on a host that is running slow. The system administrator wants to determine the health status of all associated I/O path components for this application. Are all components along that path healthy? Are there any component-level performance problems that might be causing the slow application response?

Looking at the data paths for computer tpcblade3-7, we see that it has a single-port HBA connection to the SAN. A possible solution to improve SAN performance for computer tpcblade3-7 is to upgrade it to a dual-port HBA.
13.11 Case study: Using Topology Viewer to verify SVC and Fabric configuration
After Tivoli Storage Productivity Center has probed the SAN environment, it takes the information from all the SAN components (switches, storage controllers, and hosts) and automatically builds a graphical display of the SAN environment. This graphical display is available via the Topology Viewer option in the Tivoli Storage Productivity Center navigation tree. The information on the Topology Viewer panel is current as of the last successful probe. By default, Tivoli Storage Productivity Center probes the environment daily; however, you can execute an unplanned or immediate probe at any time.

Tip: If you are analyzing the environment for problem determination, we recommend that you execute an ad hoc probe to ensure that you have the latest information about the SAN environment. Make sure that the probe completes successfully.
Figure 13-91 on page 381 shows the SVC ports connected and the switch ports.
Important: Figure 13-91 shows an incorrect configuration for the SVC connections; it was implemented for lab purposes only. In real environments, it is important that each SVC (or Storwize V7000) node's ports are connected to two separate fabrics. If any SVC (or Storwize V7000) node port is not connected, each node in the cluster displays an error on its LCD display. Tivoli Storage Productivity Center also shows the health of the cluster as a warning in the Topology Viewer, as shown in Figure 13-91. It is also important that:
You have at least one port from each node in each fabric.
You have an equal number of ports in each fabric from each node; that is, do not have three ports in Fabric 1 and only one port in Fabric 2 for an SVC (or Storwize V7000) node.
Note: In our example, the connected SVC ports are both online. When an SVC port is not healthy, a black line is drawn between the switch and the SVC node. Because Tivoli Storage Productivity Center knew from a previous probe where the unhealthy ports were connected (and they were therefore previously shown with a green line), the probe discovered that these ports were no longer connected, which resulted in the green line becoming a black line. If these ports had never been connected to the switch, no lines would be shown for them.
connectivity to its logical unit number (LUN) rad (ID:009f), as shown in Figure 13-93 on page 385. What Figure 13-93 does not show is that you can hover over the MDisk, LUN, and switch ports to get both health and performance information about these components. This enables you to verify the status of each component and see how well it is performing.
The Data Path Viewer in Tivoli Storage Productivity Center can also be used to check and confirm path connectivity between a disk that an operating system sees and the VDisk that the Storwize V7000 provides. Figure 13-95 on page 387 shows the path information relating to host tpcblade3-11 and its VDisks.
What Figure 13-95 does not show is that you can hover over each component to get health and performance information, which can also be useful when you perform problem determination and analysis.
The performance monitor panel shown in Figure 13-97 on page 389 presents the graphs in four quadrants:
The top left quadrant is CPU utilization as a percentage.
The top right quadrant is volume throughput in MBps, as well as current volume latency and current IOPS.
The bottom left quadrant is interface throughput (FC, SAS, and iSCSI).
The bottom right quadrant is MDisk throughput in MBps, as well as current MDisk latency and current IOPS.
Each graph represents five minutes of collected statistics and provides a means of assessing the overall performance of your system. For example, CPU utilization shows the current percentage of CPU usage as well as specific data points on the graph, showing peaks in utilization. With this real-time performance monitor, you can quickly view the bandwidth of volumes, interfaces, and MDisks. Each of these graphs displays the current bandwidth in megabytes per second, as well as a view of bandwidth over time. Each data point can be accessed to determine its individual bandwidth utilization and to evaluate whether a specific data point might represent a performance impact. For example, you can monitor the interfaces, such as Fibre Channel or SAS interfaces, to determine whether the host data-transfer rate differs from the expected rate. The volumes and MDisks graphs also show the IOPS and latency values. From the pop-up menu, you can switch from system statistics to statistics by node, selecting a specific node to get its real-time performance graphs. Figure 13-98 shows the CPU usage and the volume, interface, and MDisk bandwidth for a specific node.
With this panel, you can easily find unbalanced usage of your system nodes. You can also run the real-time performance monitor while performing other GUI operations by selecting the Run in Background option.
good source for creating an additional spreadsheet tab in order to relate, for instance, vdisks with their I/O group and preferred node. Figure 13-99 shows a spreadsheet chart generated from the <system_name>__vdisk.csv file, filtered for I/O group 2. The vdisks for this I/O group were selected using a secondary spreadsheet tab populated with the vdisk section of the config backup HTML file.

Figure 13-99 Total ops per vdisk for I/O group 2. Vdisk37 is by far the busiest volume
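The same I/O group filtering can be done outside the spreadsheet. A minimal sketch, assuming a simplified two-column CSV layout (vdisk name, I/O group); verify the actual column order in your <system_name>__vdisk.csv before relying on it:

```shell
# Print the vdisks that belong to one I/O group from a vdisk CSV on stdin.
# The two-column layout (name, I/O group) is an assumption for illustration.
filter_iogrp() {  # usage: ... | filter_iogrp <iogrp_name>
  awk -F, -v grp="$1" '$2 == grp { print $1 }'
}

# Example with hypothetical rows:
printf 'vdisk37,io_grp2\nvdisk12,io_grp0\n' | filter_iogrp io_grp2
```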
By default, the svcreport.pl script generates GIF charts and CSV files covering one hour of data. While the CSV files aggregate a large amount of data, the GIF charts are presented per vdisk, MDisk, and node, as described in Table 13-3. To generate a 24-hour chart, specify the --for 1440 option. The --for option specifies the time range, in minutes, for which to generate the SVC/Storwize V7000 performance report files (CSV and GIF); the default value is 60 minutes.
Table 13-3 Spreadsheets and GIF chart types produced by svcreport

Spreadsheets (CSV): cache_node, cache_vdisk, cpu, drive, mdisk, node, vdisk
Charts per vdisk: cache.hits, cache.stage, cache.throughput, cache.usage, vdisk.response.tx, vdisk.response.wr, vdisk.throughput, vdisk.transaction
Charts per mdisk: mdisk.response.worst.resp, mdisk.response, mdisk.throughput, mdisk.transaction
Charts per node: cache.usage.node, cpu.usage.node
Figure 13-100 is an example of a chart automatically generated by the svcperf.pl script for vdisk37. We chose to present the chart for vdisk37 because Figure 13-99 shows that it is the vdisk that reaches the highest IOPS values.
svcmon is not intended to replace TPC; however, it helps a lot when TPC is not available, by allowing easy interpretation of the SVC performance XML data. This set of Perl scripts was designed and programmed by Yoshimichi Kosuge personally. It is not an IBM product and is provided without any warranty; you can use svcmon, but at your own risk.
Figure 13-101 Read and Write throughput for vdisk37 in bytes per second
7521Maintaining.fm
14
Chapter 14.
Maintenance
Among the many benefits that SVC provides is greatly simplifying the storage management tasks that system administrators need to perform. However, as the IT environment grows and is renewed, so must the storage infrastructure. In this chapter, we discuss some of the best practices in the day-to-day activities of storage administration using SVC that can help you keep your storage infrastructure at the levels of availability, reliability, and resiliency demanded by today's applications, while keeping up with their storage growth needs. You will find in this chapter tips and recommendations that might have already been made in this and other Redbooks, in some cases in more detail; do not hesitate to refer back to them. The idea here is to put the most important topics to consider in SVC administration in one place, so that you can use this chapter as a checklist. You will also find practical examples of the procedures described here in Chapter 16, SVC scenarios on page 453.

Note: The practices described here have proven effective in many SVC installations worldwide, for organizations in several different areas with one thing in common: a need to easily, effectively, and reliably manage their SAN disk storage environment. Nevertheless, whenever you have a choice between two possible implementations or configurations, if you look deeply enough, one will always have both advantages and disadvantages over the other. We expect that you do not take these practices as absolute truth, but rather use them as a guide. The choice of which approach to use is ultimately yours.
Typical SAN and SVC component names have a limit on the number and type of characters you can use; SVC names, for instance, are limited to 15 characters. This is typically what makes creating a naming convention a bit tricky. Most (if not all) names in SAN storage and SVC can be modified online, so you don't need to worry about planning outages to implement your new naming convention. Server names are the exception, and we discuss them later in this chapter. Keep in mind that the examples below are just suggestions that have proved effective in most cases, but might not be fully adequate for your particular environment or needs. The naming convention to use is your choice, but once you choose it, implement it across the whole environment.
Storage Controllers
SVC names the storage controllers simply controllerX, with X being just a sequential decimal number. If you have multiple controllers attached to your SVC, change the names so that each includes, for instance, the vendor name, the model, or simply its serial number. That way, if you receive an error message pointing to controllerX, you don't need to log in to the SVC to know which storage controller to check.
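Renaming can be scripted. The sketch below generates rename commands from simplified lscontroller output; the column positions and the DS8K_<serial> naming pattern are our assumptions for illustration, so review the generated commands before running them on a real cluster:

```shell
# Generate "svctask chcontroller -name" commands from lscontroller-style
# output (id, default controllerX name, serial). Column positions and the
# DS8K_<serial> pattern are assumptions for illustration only.
gen_renames() {
  awk '$2 ~ /^controller/ && $3 != "" {
    printf "svctask chcontroller -name DS8K_%s %s\n", $3, $2
  }'
}

# Example with a hypothetical lscontroller line:
printf '1 controller1 75L3001FFFF\n' | gen_renames
```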
Hosts
However cool it was in the past to name servers after cartoon or movie characters, today we are faced with large networks, the Internet, and cloud computing. A good server naming convention allows you, even in a very large network, to quickly spot a server and tell: where it is (so you know how to access it), what kind it is (so you can tell the vendor and support group in charge), what it does (so you can engage the proper application support and notify its owner), and its importance (so you know the severity of a problem should one occur). Changing a server's name might have implications for application configurations and require a server reboot, so you might want to prepare a detailed plan if you decide to rename several servers in your network.

Server naming convention example: LLAATRFFNN
LL - Location: might designate city, data center, building floor or room, and so on
AA - Major application: examples are billing, ERP, data warehouse
T - Type: UNIX (which variant), Windows, VMware
R - Role: production, test, Q&A, development
FF - Function: DB server, application server, web server, file server
NN - Numeric sequence
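A name built to this convention can be decoded mechanically. A minimal sketch, assuming the field positions above (the example name itself is hypothetical):

```shell
# Decode a LLAATRFFNN server name into its fields. Field positions follow
# the convention described above; the example name is hypothetical.
decode_name() {
  n="$1"
  printf 'location=%s app=%s type=%s role=%s function=%s seq=%s\n' \
    "$(printf %s "$n" | cut -c1-2)" "$(printf %s "$n" | cut -c3-4)" \
    "$(printf %s "$n" | cut -c5)"  "$(printf %s "$n" | cut -c6)" \
    "$(printf %s "$n" | cut -c7-8)" "$(printf %s "$n" | cut -c9-10)"
}

decode_name NYBIUPDB01
```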
SVC02_IO2_A: SVC cluster SVC02, port group A for iogrp 2 (aliases SVC02_N3P1, SVC02_N3P3, SVC02_N4P1, and SVC02_N4P4)
D8KXYZ1_I0301: DS8000 serial number 75VXYZ1, port I0301 (WWPN)
TL01_TD06: Tape library 01, tape drive 06 (WWPN)

If your SAN does not support aliases, as in heterogeneous fabrics with switches in some interop modes, use WWPNs in your zones all across; just don't forget to update every zone that uses a given WWPN if you ever change it. A SAN zone name should reflect the devices in the SAN it includes, normally in a one-to-one relationship, like:
servername_svcclustername (from a server to the SVC)
svcclustername_storagename (from the SVC cluster to its backend storage)
svccluster1_svccluster2 (for remote copy services)
Figure 14-2 on page 400 depicts one of the SAN Health Options screens, where you can choose the format of SAN diagram that best suits your needs. Depending on the topology and size of your SAN fabrics, you might want to experiment with the options in the Diagram Format or Report Format tabs.
SAN Health supports switches from manufacturers other than Brocade, such as McData and Cisco. Both the data collection tool download and the processing of files are free, and you can download Microsoft Visio and Excel viewers for free from the Microsoft website. There is also an additional free tool available for download, called SAN Health Professional, that enables you to audit the reports in detail using advanced search functionality and inventory tracking. It is possible to configure the SAN Health data collection tool as a Windows scheduled task.

Note: Whatever method you choose, we recommend that you generate a fresh report at least once a month, and keep previous versions so that you can track the evolution of your SAN.
TPC Reporting
If you have TPC running in your environment, you can use it to generate reports on your SAN. Details on how to configure and schedule TPC reports can be found in the TPC documentation. Just make sure that the reports you generate include all the information you need, and schedule them with a periodicity that allows you to track back the changes you make.
14.1.3 SVC
For SVC, you should periodically collect at least the output of the following commands and import it into a spreadsheet, preferably each command's output into a separate sheet:
svcinfo lsfabric
svcinfo lsvdisk
svcinfo lshost
svcinfo lshostvdiskmap X, with X ranging over all defined host numbers in your SVC cluster

Of course, you might want to store the output of additional commands, for instance if you have SVC Copy Services configured, or MDGs dedicated to specific applications or servers. One way to automate this task is to create a batch file (Windows) or shell script (UNIX or Linux) that runs these commands, stores their output in temporary files, and then uses spreadsheet macros to import these temporary files into your SVC documentation spreadsheet. With Microsoft Windows, use PuTTY's PLINK utility to create a batch session that runs these commands and stores their output. With UNIX or Linux, you can use the standard SSH utility. Create an SVC user with the Monitor privilege just to run these batches; don't grant it the Administrator privilege. Create and configure an SSH key specifically for it. Use the -delim option of these commands to make their output delimited by a character other than Tab, such as a comma or colon. Using a comma even allows you to initially import the temporary files into your spreadsheet in CSV format. To make your spreadsheet macros simpler, you might want to pre-process the temporary output files and remove any garbage or undesired lines or columns. With UNIX or Linux, you can use text-editing commands such as grep, sed, and awk. There is freeware for Windows with the same commands, or you can use any batch text-editing utility you like. Remember that the objective is to fully automate this procedure so that you can schedule it to run automatically from time to time; the resulting spreadsheet should be easy to consult and contain only the relevant information you use most frequently. We discuss the automated collection and storage of configuration and support data (which is typically much more extensive and difficult to use) later in this chapter.
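A possible skeleton for the UNIX or Linux variant of this collection script is shown below. The cluster address, user name, key path, and output directory are placeholders, and the tidy() step is just a trivial example of the grep/sed/awk pre-processing described above:

```shell
# Skeleton for scheduled SVC configuration collection over SSH.
# CLUSTER, SVC_USER, and the key path are placeholders for illustration.
CLUSTER=svccluster1            # placeholder cluster address
SVC_USER=monitoruser           # Monitor-role user, NOT an administrator
OUTDIR=/var/svc-doc            # placeholder output directory

# Trivial pre-processing example: drop blank lines (extend with grep/awk).
tidy() { sed '/^$/d'; }

collect() {
  for cmd in lsfabric lsvdisk lshost; do
    ssh -i ~/.ssh/svc_monitor_key "$SVC_USER@$CLUSTER" \
      "svcinfo $cmd -delim ," | tidy > "$OUTDIR/$cmd.csv"
  done
}
# collect   # uncomment when scheduling this script from cron

# tidy() can be exercised standalone with sample data:
printf 'a,b\n\nc,d\n' | tidy
```

The same structure translates to a Windows batch file by replacing the ssh invocation with plink and the sed step with an equivalent text-editing utility.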
14.1.4 Storage
We recommend that you fully allocate all the space available in whatever storage controllers you use as SVC backends to the SVC itself, so that you can perform all your disk storage management tasks using just the SVC. If that is the case, you only need to generate (by hand) documentation of your backend storage controllers once, after you configure them, with updates whenever these controllers receive hardware or code upgrades. As such, there really isn't much point in automating this backend storage controller documentation. However, if you're using split controllers, you might want to reconsider, because the portion of your storage controllers being used outside the SVC might have its configuration changed frequently. In this case, consult your backend storage controller documentation on how to gather and store the documentation you might need.
Physical location:
Data center complete street address and phone number
Equipment physical location: room number, floor, tile location, rack number
Vendor's security access information or procedure, if applicable
On-site contact name and phone or pager number

Support contract information:
Vendor contact phone numbers and website (keep both)
Customer's contact name and phone or pager number
User ID for the support website, if applicable (do NOT store the password in the spreadsheet unless the spreadsheet itself is password-protected)
Support contract number and expiration date

With this, everything you need to fill in a web support request form, or to inform a vendor's call center support representative, is already there. Typically, you will first be asked for a brief description of the problem, and later for a detailed description and support data collection.
to submit, schedule, and approve a non-emergency Change ticket. One exception to this rule is if you need to interrupt additional servers or applications in order to replace the part, in which case you need to schedule the activity and coordinate support groups. Use your good judgment and avoid unnecessary exposure and delays.

5. Keep handy the procedures to generate reports of the latest incidents and implemented changes in your SAN storage environment. Typically there's no need to generate these reports periodically, because your organization probably already has a Problem and Change Management group doing that for trend analysis purposes.

Again, you can create procedures that automatically create and store this data on scheduled dates, delete old data, or even transfer it to tape.
If you're running SVC release 5.1 or earlier, you'll need to check the SVC Console version. You can see this on the SVC Console Welcome screen, in the upper right corner, or in the Windows Control Panel Add or Remove Software screen. As for the SVC target code level, we recommend that you set it to the latest generally available (GA) release, unless you have a specific reason not to. Examples of such reasons are: a known problem with the particular version of some application or other component of your SAN storage environment; the latest SVC GA release not yet being cross-certified as compatible with another key component of your SAN storage environment; or internal policies in your organization, such as using the latest-minus-one release, or requiring some seasoning in the field before implementation. You'll need to check the compatibility of your target SVC code level with all components of your SAN storage environment (SAN switches, storage controllers, server HBAs) and its attached servers (operating systems and, eventually, applications). Typically, applications certify only the operating system they run under, and leave to the OS provider the task of certifying its compatibility with attached components (such as SAN storage). Some applications, however, might make use of special hardware features or raw devices and certify the attached SAN storage as well; if this is your case, consult your application's compatibility matrix to verify that your SVC target code level is compatible. Review the SAN Volume Controller and SVC Console GUI Compatibility web page, and the SAN Volume Controller Concurrent Compatibility and Code Cross-Reference web page:
http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1002888
http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1001707
Figure 14-4 SVC Upgrade Test Utility installation using the GUI
While you can use either the GUI or the CLI to upload and install the SVC Upgrade Test Utility, you can run it only from the CLI. Example 14-1 shows an example.
Example 14-1
IBM_2145:svccf8:admin>svcupgradetest -v 6.2.0.2 -d svcupgradetest version 6.6 Please wait while the tool tests for issues that may prevent a software upgrade from completing successfully. The test may take several minutes to complete.
Checking 32 mdisks: Results of running svcupgradetest: ================================== The tool has found 0 errors and 0 warnings The test has not found any problems with the cluster. Please proceed with the software upgrade. IBM_2145:svccf8:admin>
IBM's Support page on SVC Flashes and Alerts (Troubleshooting):
http://www-947.ibm.com/support/entry/portal/Troubleshooting/Hardware/System_Storage/Storage_software/Storage_virtualization/SAN_Volume_Controller_(2145)

Fix every single problem or suspicion you find with the disk path failover capability. Because a typical SVC environment has from many dozens to a few hundred servers attached to it, a spreadsheet might help you track the attached-hosts preparation process. If you have some kind of host virtualization in your environment, such as VMware ESX, AIX LPARs and VIOS, or Solaris containers, you need to verify the redundancy and failover capability in these virtualization layers as well.
Upgrade sequence
The ultimate guide to the order in which your SVC SAN storage environment components should be upgraded is the SVC Supported Hardware List. Below is the link for version 6.2:
https://www-304.ibm.com/support/docview.wss?uid=ssg1S1003797
By cross-checking which version of SVC is compatible with, say, which versions of your SAN directors, you can tell which should be upgraded first. By checking each individual component's upgrade path, you can tell whether that particular component requires a multi-step upgrade. Typically, if you're not going to make a major-version or multi-step upgrade to any component, the order that has shown itself to be least prone to problems is:
1. SAN switches or directors
2. Storage controllers
3. Server HBA microcode and multipath software
4. SVC cluster
Note: UNDER NO CIRCUMSTANCES should you upgrade two components of your SVC SAN storage environment simultaneously, such as the SVC and one storage controller, even if you intend to do it with your system offline. Doing so might lead to very unpredictable results, and an unexpected problem will be much more difficult to debug.
SVC Console
SVC 6.1 no longer requires separate hardware with the specific function of its console. The SVC Console software was incorporated into the nodes, so to access the SVC Management GUI you simply use the cluster IP address. If you purchased your SVC with a console or SSPC server, and you no longer have any SVC clusters running SVC release 5.1 or earlier, you can remove the SVC Console software from that server. In fact, SVC Console versions 6.1 and 6.2 are utilities that remove the previous SVC Console GUI software and create desktop shortcuts to the new console GUI. Check the following URL for details and download:
https://www-304.ibm.com/support/docview.wss?uid=ssg1S4000918
capacity 136.7GB
1 mdisk1 online managed 2 0000000000000000 controller3 5000a72030032820000000000000000000000000000000000000000000000000 IBM_2145:svccf8:admin> IBM_2145:svccf8:admin>svcinfo lscontroller id controller_name ctrl_s/n product_id_high 0 controller0 1 controller1 75L3001FFFF 2 controller2 75L3331FFFF 3 controller3 IBM_2145:svccf8:admin> IBM_2145:svccf8:admin>svcupgradetest -v 6.2.0.2 -d svcupgradetest version 6.6 Please wait while the tool tests for issues that may prevent a software upgrade from completing successfully. The test may take several minutes to complete. Checking 34 mdisks: ******************** Error found ******************** The requested upgrade from 5.1.0.10 to 6.2.0.2 cannot be completed as there are internal SSDs are in use. Please refer to the following flash: http://www.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003707
MDG3SVCCF8SSD
136.7GB
Internal
Results of running svcupgradetest: ================================== The tool has found errors which will prevent a software upgrade from completing successfully. For each error above, follow the instructions given. The tool has found 1 errors and 0 warnings IBM_2145:svccf8:admin>
After you upgrade your SVC cluster from release 5.1 to 6.2, your internal SSD drives no longer appear as MDisks presented by storage controllers (which were in fact the SVC nodes), but rather as drives that you need to configure into arrays that can be used in storage pools (formerly MDisk groups). Example 14-3 shows this change.
Example 14-3
### Previous configuration in SVC version 5.1: IBM_2145:svccf8:admin>svcinfo lscontroller id controller_name ctrl_s/n 0 controller0 1 controller1 75L3001FFFF 2 controller2 75L3331FFFF 3 controller3 IBM_2145:svccf8:admin> ### After upgrade SVC to version 6.2: IBM_2145:svccf8:admin>lscontroller id controller_name ctrl_s/n vendor_id product_id_low product_id_high 1 DS8K75L3001 75L3001FFFF IBM 2107900 2 DS8K75L3331 75L3331FFFF IBM 2107900 IBM_2145:svccf8:admin> IBM_2145:svccf8:admin>lsdrive id status error_sequence_number use tech_type capacity mdisk_id mdisk_name member_id enclosure_id slot_id node_id node_name 0 online unused sas_ssd 136.2GB 0 2 node2 1 online unused sas_ssd 136.2GB 0 1 node1 IBM_2145:svccf8:admin>
You will need to decide which RAID level to configure in the new arrays of SSD drives, depending on the purpose you want to give them and the level of redundancy necessary to protect your data in case of hardware failure. Table 14-2 provides the factors to consider in each case. Again, we recommend that you use your internal SSD drives for Easy Tier; this is how you will, in most cases, get the best overall performance gain.
Table 14-2 RAID levels for internal SSDs

RAID-0 (Striped): requires 1-4 drives, all in a single node. Use it when VDisk Mirror is on external MDisks. For best performance, the pool should only contain arrays from a single I/O group.

RAID-1 (Easy Tier): requires 2 drives, one in each node of the I/O group. Use it with Easy Tier and/or when both mirrors are on SSDs. For best performance, an Easy Tier pool should only contain arrays from a single I/O group, and the external MDisks in this pool should only be used by the same I/O group.

RAID-10 (Mirrored): for best performance, the pool should only contain arrays from a single I/O group. Recommended over VDisk Mirroring.
state active
IBM_2145:svccf8:admin>

3. If necessary, cross-reference with your SAN switches' information. On Brocade switches, use nodefind <WWPN>:

blg32sw1_B64:admin> nodefind 10:00:00:00:C9:25:F5:B0
Local:
 Type Pid    COS     PortName                NodeName                 SCR
 N    401000;    2,3;10:00:00:00:C9:25:F5:B0;20:00:00:00:C9:25:F5:B0; 3
 Fabric Port Name: 20:10:00:05:1e:04:16:a9
 Permanent Port Name: 10:00:00:00:C9:25:F5:B0
 Device type: Physical Unknown(initiator/target)
 Port Index: 16
 Share Area: No
 Device Shared in Other AD: No
 Redirect: No
 Partial: No
 Aliases: nybixtdb02_fcs0
b32sw1_B64:admin>

Best practices require that storage allocation requests submitted by the server support or application support teams to the storage administration team always include the server HBA WWPNs to which the new LUNs or volumes are to be mapped. For instance, a server might use separate HBAs for disk and tape access, or distribute its mapped LUNs across different HBAs for performance, so you cannot assume that any given new volume is supposed to be mapped to every WWPN that the server has logged in to the SAN. If your organization uses a Change Management tracking tool, perform all your SAN storage allocations under approved Change tickets, with the server's WWPNs listed in the Description and Implementation sections.
root@nybixtdb03::/> pcmpath query device

Total Dual Active and Active/Asymmetric Devices : 1

DEV#: 4 DEVICE NAME: hdisk4 TYPE: 2145 ALGORITHM: Load Balance
SERIAL: 60050768018205E12000000000000000
==========================================================================
Path#      Adapter/Path Name   State     Mode     Select  Errors
 0*        fscsi0/path0        OPEN      NORMAL        7       0
 1         fscsi0/path1        OPEN      NORMAL     5597       0
 2*        fscsi2/path2        OPEN      NORMAL        8       0
 3         fscsi2/path3        OPEN      NORMAL     5890       0
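Output like this pcmpath listing can also be checked automatically. A minimal sketch that counts paths not in the OPEN NORMAL state; the parsing assumes the column layout shown above:

```shell
# Count pcmpath path lines whose state/mode is not OPEN NORMAL.
# The column layout (path#, adapter/path, state, mode, ...) is assumed
# to match the sample output above.
bad_paths() {
  awk '$1 ~ /^[0-9]+\*?$/ && !($3 == "OPEN" && $4 == "NORMAL") { n++ }
       END { print n + 0 }'
}

# Example with hypothetical path lines (one degraded path):
printf '0* fscsi0/path0 OPEN NORMAL 7 0\n1 fscsi0/path1 CLOSE NORMAL 5 0\n' | bad_paths
```

A nonzero count would be a reason to investigate that host's paths before making any storage changes.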
If your organization uses a Change Management tracking tool, include the LUN ID information in every Change ticket that performs SAN storage allocation or reclamation.
4. If possible, try not to set up a server to use volumes from I/O groups with very different node types, at least not as a permanent situation. If you do, then as this server's storage capacity grows, you might experience a performance difference between volumes from different I/O groups, making it very tricky to spot and solve an eventual performance problem.
14.7 Wrap up
There are, of course, many more practices that can be applied to SAN storage environment management and that would benefit its administrators and users. You can see the practices that we just reviewed, and several others, being applied in Chapter 16, SVC scenarios on page 453.
7521Troubleshooting.fm
15
Chapter 15.
problem occurs in one of the SVC nodes. The fast node reset function means that SVC software problems can be recovered without the host experiencing an I/O error and without requiring the multipathing driver to fail over to an alternative path. The fast node reset is performed automatically by the SVC node, which informs the other members of the cluster that it is resetting.

Apart from SVC node hardware and software problems, failures in the SAN zoning configuration are a frequent cause of problems. A misconfiguration of the SAN zoning might lead to the SVC cluster not working, because the SVC cluster nodes communicate with each other by using the Fibre Channel SAN fabrics.

You must check the following areas from the SVC perspective:
- The attached hosts. Refer to 15.1.1, Host problems on page 420.
- The SAN. Refer to 15.1.3, SAN problems on page 422.
- The attached storage subsystems. Refer to 15.1.4, Storage subsystem problems on page 422.

There are several SVC command-line interface (CLI) commands with which you can check the current status of the SVC and the attached storage subsystems. Before starting the complete data collection or the problem isolation on the SAN or subsystem level, we recommend that you use the following commands first and check the status from the SVC perspective:

svcinfo lscontroller controllerid
Check that multiple worldwide port names (WWPNs) matching the back-end storage subsystem controller ports are available. Check that the path_counts are evenly distributed across each storage subsystem controller, or that they are distributed correctly based on the preferred controller. Use the path_count calculation found in 15.3.4, Solving back-end storage problems on page 443. The total of all path_counts must add up to the number of managed disks (MDisks) multiplied by the number of SVC nodes.
svcinfo lsmdisk
Check that all MDisks are online (not degraded or offline).

svcinfo lsmdisk mdiskid
Check several of the MDisks from each storage subsystem controller. Are they online? Do they all have path_count equal to the number of nodes?

svcinfo lsvdisk
Check that all virtual disks (volumes) are online (not degraded or offline). If volumes are degraded, are there stopped FlashCopy jobs? Restart these stopped FlashCopy jobs or delete the mappings.

svcinfo lshostvdiskmap
Check that all volumes are mapped to the correct hosts. If a volume is not mapped correctly, create the necessary host mapping.
svcinfo lsfabric
Use this command with its various options, such as -controller, to check different parts of the SVC configuration and to ensure that multiple paths are available from each SVC node port to an attached host or controller. Confirm that all SVC node port WWPNs are connected to the back-end storage consistently.
Example 15-1 shows how to obtain this information using the commands svcinfo lscontroller controllerid and svcinfo lsnode.
Example 15-1 The svcinfo lscontroller 0 command
IBM_2145:itsosvccl1:admin>svcinfo lscontroller 0
id 0
controller_name controller0
WWNN 200400A0B8174431
mdisk_link_count 2
max_mdisk_link_count 4
degraded no
vendor_id IBM
product_id_low 1742-900
product_id_high
product_revision 0520
ctrl_s/n
WWPN 200400A0B8174433
path_count 4
max_path_count 12
WWPN 200500A0B8174433
path_count 4
max_path_count 8
IBM_2145:itsosvccl1:admin>svcinfo lsnode
id name  UPS_serial_number WWNN             status IO_group_id IO_group_name config_node UPS_unique_id    hardware
6  Node1 1000739007        50050768010037E5 online 0           io_grp0       no          20400001C3240007 8G4
5  Node2 1000739004        50050768010037DC online 0           io_grp0       yes         20400001C3240004 8G4
4  Node3 100068A006        5005076801001D21 online 1           io_grp1       no          2040000188440006 8F4
8  Node4 100068A008        5005076801021D22 online 1           io_grp1       no          2040000188440008 8F4
Example 15-1 shows that two MDisks are present for the storage subsystem controller with ID 0 and that there are four SVC nodes in the SVC cluster, which means that in this example the total path_count is: 2 x 4 = 8. If possible, spread the paths across all storage subsystem controller ports, which is the case in Example 15-1 (four for each WWPN).
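This sanity check can be automated against saved command output. The following sketch is illustrative only: the controller0.txt file name, its field layout, and the node count are assumptions based on Example 15-1, so adjust them to your own lscontroller and lsnode output.

```shell
# Sketch: confirm that the sum of per-WWPN path_count values equals
# mdisk_link_count x the number of SVC nodes.
# Sample data in the Example 15-1 layout (hypothetical file name).
cat > controller0.txt <<'EOF'
mdisk_link_count 2
WWPN 200400A0B8174433
path_count 4
max_path_count 12
WWPN 200500A0B8174433
path_count 4
max_path_count 8
EOF

nodes=4    # number of SVC nodes, from svcinfo lsnode
mdisks=$(awk '/^mdisk_link_count/ {print $2}' controller0.txt)
total=$(awk '/^path_count/ {sum += $2} END {print sum}' controller0.txt)
expected=$((mdisks * nodes))

if [ "$total" -eq "$expected" ]; then
    echo "OK: $total paths ($mdisks MDisks x $nodes nodes)"
else
    echo "MISMATCH: counted $total paths, expected $expected"
fi
```

With the Example 15-1 numbers, this prints "OK: 8 paths (2 MDisks x 4 nodes)".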
paths to both the preferred and non-preferred SVC nodes. For more information, refer to Chapter 8, Hosts on page 191. Check that paths are open for both the preferred paths (with high select counts) and the non-preferred paths (marked with an asterisk (*) and showing zero or nearly zero select counts). In Example 15-2, path 0 and path 2 are the preferred paths with high select counts. Path 1 and path 3 are the non-preferred paths, which show an asterisk (*) and 0 select counts.
Example 15-2 Checking paths
C:\Program Files\IBM\Subsystem Device Driver>datapath query device -l
Total Devices : 1

DEV#: 0 DEVICE NAME: Disk1 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF2800000000000037
LUN IDENTIFIER: 60050768018101BF2800000000000037
============================================================================
Path# Adapter/Hard Disk State Mode Select Errors
0 Scsi Port2 Bus0/Disk1 Part0 OPEN NORMAL 1752399 0
1 * Scsi Port3 Bus0/Disk1 Part0 OPEN NORMAL 0 0
2 Scsi Port3 Bus0/Disk1 Part0 OPEN NORMAL 1752371 0
3 * Scsi Port2 Bus0/Disk1 Part0 OPEN NORMAL 0 0
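When many hosts must be checked, the preferred and non-preferred path review shown in Example 15-2 can be scripted against saved output. This is only a sketch: the datapath.txt file is a hypothetical capture in the Example 15-2 layout, and the field positions are counted from the end of each line so that the asterisk on non-preferred paths does not shift them.

```shell
# Sketch: flag paths that are not OPEN/NORMAL and count paths carrying I/O.
# Sample path lines in the Example 15-2 layout (hypothetical capture file).
cat > datapath.txt <<'EOF'
0       Scsi Port2 Bus0/Disk1 Part0   OPEN   NORMAL     1752399          0
1 *     Scsi Port3 Bus0/Disk1 Part0   OPEN   NORMAL           0          0
2       Scsi Port3 Bus0/Disk1 Part0   OPEN   NORMAL     1752371          0
3 *     Scsi Port2 Bus0/Disk1 Part0   CLOSE  OFFLINE          0          0
EOF

# State and Mode are the 4th- and 3rd-from-last fields; Select is 2nd-from-last.
bad=$(awk '$(NF-3) != "OPEN" || $(NF-2) != "NORMAL" {n++} END {print n+0}' datapath.txt)
active=$(awk '$(NF-1) > 0 {n++} END {print n+0}' datapath.txt)
echo "unhealthy paths: $bad, paths carrying I/O: $active"
```

With this sample data, one path is unhealthy (CLOSE/OFFLINE) and two paths carry I/O.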
SDDPCM
SDDPCM has been enhanced to collect SDDPCM trace data periodically and to write the trace data to the system's local hard drive. SDDPCM maintains four files for its trace data:
- pcm.log
- pcm_bak.log
- pcmsrv.log
- pcmsrv_bak.log
Starting with SDDPCM 2.1.0.8, the relevant data for debugging problems is collected by running sddpcmgetdata. The sddpcmgetdata script collects information that is used for problem determination and then creates a tar file in the current directory, with the current date and time as a part of the file name, for example: sddpcmdata_hostname_yyyymmdd_hhmmss.tar. When you report an SDDPCM problem, it is essential that you run this script and send the tar file to IBM Support for problem determination. Refer to Example 15-3.
Example 15-3 Use of the sddpcmgetdata script (output shortened for clarity)
If the sddpcmgetdata command is not found, collect the following files:
- pcm.log
- pcm_bak.log
- pcmsrv.log
- pcmsrv_bak.log
- The output of the pcmpath query adapter command
- The output of the pcmpath query device command
You can find these files in the /var/adm/ras directory.
SDDDSM
SDDDSM also provides the sddgetdata script to collect information for problem determination. SDDGETDATA.BAT is the batch file that generates the following files:
- The sddgetdata_%host%_%date%_%time%.cab file
- SDD\SDDSrv logs
- Datapath output
- Event logs
- Cluster log
- SDD-specific registry entries
- HBA information
Example 15-4 shows an example of this script.
Example 15-4 Use of the sddgetdata script for SDDDSM (output shortened for clarity)
C:\Program Files\IBM\SDDDSM>sddgetdata.bat
Collecting SDD trace Data
Collecting datapath command outputs
Collecting SDD and SDDSrv logs
Collecting Most current driver trace
Generating a CAB file for all the Logs
sdddata_DIOMEDE_20080814_42211.cab file generated

C:\Program Files\IBM\SDDDSM>dir
 Volume in drive C has no label.
 Volume Serial Number is 0445-53F4

 Directory of C:\Program Files\IBM\SDDDSM

06/29/2008 04:22 AM 574,130 sdddata_DIOMEDE_20080814_42211.cab
#!/bin/ksh
export PATH=/bin:/usr/bin:/sbin
echo "y" | snap -r          # Clean up old snaps
snap -gGfkLN                # Collect new; don't package yet
cd /tmp/ibmsupt/other       # Add supporting data
cp /var/adm/ras/sdd* .
cp /var/adm/ras/pcm* .
cp /etc/vpexclude .
datapath query device > sddpath_query_device.out
datapath query essmap > sddpath_query_essmap.out
pcmpath query device > pcmpath_query_device.out
pcmpath query essmap > pcmpath_query_essmap.out
sddgetdata
sddpcmgetdata
snap -c                     # Package snap and other data
echo "Please rename /tmp/ibmsupt/snap.pax.Z after the"
echo "PMR number and ftp to IBM."
exit 0
In the following sections, we describe how to collect SVC data by using the SVC Console GUI and the SVC CLI, as well as how to generate an SVC livedump.
To download the support package, perform the following steps: 1. Click Download Support Package (Figure 15-2)
2. A Download Support Package window opens (Figure 15-3 on page 429). From there, select the kind of logs that you want to download:
- Standard logs: These contain the most recent logs that have been collected for the cluster. These logs are the ones most commonly used by support to diagnose and solve problems.
- Standard logs plus one existing statesave: These contain the standard logs for the cluster and the most recent statesave from any of the nodes in the cluster. Statesaves are also known as dumps or livedumps.
- Standard logs plus most recent statesave from each node: These contain the standard logs for the cluster and the most recent statesaves from each node in the cluster.
- Standard logs plus new statesaves:
These generate new statesaves (livedumps) for all of the nodes in the cluster and package them with the most recent logs.
3. Click Download to confirm your choice (Figure 15-3).
Note: Depending on your choice, this action can take several minutes to complete.
4. Finally, select where you want to save these logs (Figure 15-4).
Note: Any of the options used in the GUI (1-4), as well as the CLI method, collects the performance statistics files from all of the nodes in the cluster.
Data collection for SVC using the SVC CLI 4.x or later
Because the config node is always the SVC node with which you communicate, it is essential that you copy all the data from the other nodes to the config node. In order to copy the files, first run the command svcinfo lsnode to determine the non-config nodes. The output of this command is shown in Example 15-6.
Example 15-6 Determine the non-config nodes (output shortened for clarity)
IO_group_id 0
config_node no
node2
50050768010037DC
online
yes
The output that is shown in Example 15-6 shows that the node with ID 2 is the config node. So, for all nodes except the config node, you must run the svctask cpdumps command. No feedback is given for this command. Example 15-7 shows the command for the node with ID 1.
Example 15-7 Copy the dump files from the other nodes
IBM_2145:itsosvccl1:admin>svctask cpdumps -prefix /dumps 1

To collect all of the files, including the config.backup file, trace file, errorlog file, and more, run the svc_snap dumpall command. This command collects all of the data, including the dump files. To ensure that there is a current backup of the SVC cluster configuration, run svcconfig backup before issuing the svc_snap dumpall command. Refer to Example 15-8 for an example run. It is sometimes better to use svc_snap and request the dumps individually, which you do by omitting the dumpall parameter; this captures the data collection apart from the dump files.

Note: Dump files are extremely large. Collect them only if they are really needed.
Example 15-8 The svc_snap dumpall command
IBM_2145:itsosvccl1:admin>svc_snap dumpall
Collecting system information...
Copying files, please wait...
Copying files, please wait...
Dumping error log...
Waiting for file copying to complete...
Waiting for file copying to complete...
Waiting for file copying to complete...
Waiting for file copying to complete...
Creating snap package...
Snap data collected in /dumps/snap.104603.080815.160321.tgz

After the data collection with the svc_snap dumpall command is complete, you can verify that the new snap file appears in your 2145 dumps directory by using the svcinfo ls2145dumps command. Refer to Example 15-9 on page 430.
Example 15-9 The ls2145dumps command (shortened for clarity)
IBM_2145:itsosvccl1:admin>svcinfo ls2145dumps
id 2145_filename
0 dump.104603.080801.161333
1 svc.config.cron.bak_node2
.
.
23 104603.trc
24 snap.104603.080815.160321.tgz

To copy the file from the SVC cluster, use secure copy (SCP). The PuTTY SCP function is described in more detail in Implementing the IBM System Storage SAN Volume Controller V6.1, SG24-7933-00.
Livedump
SVC livedump is a procedure that IBM Support might ask clients to run for problem investigation. It can be generated for all nodes from the GUI, as shown in Data collection for SVC using the SVC Console GUI on page 428, or triggered from the CLI, for instance, on just one node of the cluster.

Note: Invoke the SVC livedump procedure only under the direction of IBM Support.

Sometimes, investigations require a livedump from the configuration node in the SVC cluster. A livedump is a lightweight dump from a node, which can be taken without impacting host I/O. The only impact is a slight reduction in system performance (due to reduced memory being available for the I/O cache) until the dump is finished. The instructions for a livedump are:

1. Prepare the node for taking a livedump:
svctask preplivedump <node id/name>
This command reserves the necessary system resources to take a livedump. The operation can take some time, because the node might have to flush data from the cache. System performance might be slightly affected after running this command, because part of the memory that is normally available to the cache is not available while the node is prepared for a livedump. After the command has completed, the livedump is ready to be triggered, which you can see by looking at the output from svcinfo lslivedump <node id/name>. The status must be reported as prepared.

2. Trigger the livedump:
svctask triggerlivedump <node id/name>
This command completes as soon as the data capture is complete, but before the dump file has been written to disk.

3. Query the status and copy the dump off when complete:
svcinfo lslivedump <node id/name>
The status shows dumping while the file is being written to disk and inactive after it is completed. After the status returns to inactive, you can find the livedump file in /dumps on the node with a file name of the format:
livedump.<panel_id>.<date>.<time>
You can then copy this file off the node, just as you copy a normal dump, by using the GUI or SCP.
The dump must then be uploaded to IBM Support for analysis.
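The three livedump steps can be wrapped in a small script. In this sketch, svc is a hypothetical stand-in for running the SVC CLI over ssh (for example, svc() { ssh admin@cluster "$@"; }); here it is stubbed with canned status answers so that the polling logic can be read end to end. The commands passed to it are the ones described above.

```shell
# Sketch of the livedump sequence. "svc" stands in for the SVC CLI over ssh;
# this stub simulates the dumping -> inactive status transition.
node=node1
rm -f .polls
svc() {
    case "$*" in
        *lslivedump*) echo x >> .polls      # count polls; report "inactive" on the third
                      [ "$(wc -l < .polls)" -lt 3 ] && echo dumping || echo inactive ;;
        *)            echo "would run: $*" ;;
    esac
}

svc svctask preplivedump "$node"        # reserve resources; may flush cache
svc svctask triggerlivedump "$node"     # capture data; file is written afterwards
status=dumping
while [ "$status" = "dumping" ]; do     # poll until the dump file is on disk
    status=$(svc svcinfo lslivedump "$node")
done
echo "livedump status: $status"         # file is now in /dumps on the node
```

With a real cluster, replace the stub with the ssh wrapper and add a short sleep between polls.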
Note: The switch must be running Fabric OS 5.2.x or later to collect technical support data.
1. Select Monitor > Technical Support > Product/Host SupportSave (Figure 15-5).
The Technical SupportSave dialog box displays. 2. Select the switches you want to collect data for in the Available SAN Products table and click the right arrow to move them to the Selected Products and Hosts table, as shown in Figure 15-6.
3. Click OK on the Technical SupportSave dialog box. 4. You will see the Technical SupportSave Status box, as shown in
Data collection can take 20 - 30 minutes for each selected switch. This estimate can increase depending on the number of switches selected.
5. To view and save the technical support information, select Monitor > Technical Support > View Repository, as shown in Figure 15-8.
7. Click Save to store the data on your system, as seen in Example 15-10.
8. When the download is successful, you find a User Action Event in the Master Log, as shown in Figure 15-10.
Note: You can gather technical data for M-EOS (McDATA SAN switch) devices by using the device's Element Manager.
Example 15-10 The supportSave output from IBM SAN32B-3 switch (output shortened for clarity)
IBM_2005_B5K_1:admin> supportSave
This command will collect RASLOG, TRACE, supportShow, core file, FFDC data
and other support information and then transfer them to a FTP/SCP server
or a USB device. This operation can take several minutes.
NOTE: supportSave will transfer existing trace dump file first, then
automatically generate and transfer latest one. There will be two trace dump
files transfered after this command.
OK to proceed? (yes, y, no, n): [no] y
Host IP or Host Name: 9.43.86.133
User Name: fos
Password:
Protocol (ftp or scp): ftp
Remote Directory: /
Saving support information for switch:IBM_2005_B5K_1,
 ..._files/IBM_2005_B5K_1-S0-200808132042-CONSOLE0.gz:
Saving support information for switch:IBM_2005_B5K_1,
 ...files/IBM_2005_B5K_1-S0-200808132042-RASLOG.ss.gz:
Saving support information for switch:IBM_2005_B5K_1,
 ...M_2005_B5K_1-S0-200808132042-old-tracedump.dmp.gz:
Saving support information for switch:IBM_2005_B5K_1,
 ...M_2005_B5K_1-S0-200808132042-new-tracedump.dmp.gz:
Saving support information for switch:IBM_2005_B5K_1,
 ...les/IBM_2005_B5K_1-S0-200808132042-ZONE_LOG.ss.gz:
Saving support information for switch:IBM_2005_B5K_1,
 ..._files/IBM_2005_B5K_1-S0-200808132044-CONSOLE1.gz:
Saving support information for switch:IBM_2005_B5K_1,
 ..._files/IBM_2005_B5K_1-S0-200808132044-sslog.ss.gz:
SupportSave completed
IBM_2005_B5K_1:admin>
module:CONSOLE0...  5.77 kB    156.68 kB/s
module:RASLOG...    38.79 kB   0.99 MB/s
module:TRACE_OLD... 239.58 kB  3.66 MB/s
module:TRACE_NEW... 1.04 MB    1.81 MB/s
module:ZONE_LOG...  51.84 kB   1.65 MB/s
module:RCS_LOG...   5.77 kB    175.18 kB/s
module:SSAVELOG...  1.87 kB    55.14 kB/s
Always follow the instructions that are given by the support team to determine whether to collect the package by using the management GUI or the service assistant. Instructions are also given about which package content option is required. Using the management GUI to collect the support data is similar to collecting the information on an SVC, as shown in Data collection for SVC using the SVC Console GUI on page 428. If you choose the statesave option for the Support Package, you also get enclosure dumps for all of the enclosures in the system.
The Collect Support Logs dialog box displays and you can collect the data using the Collect button (Figure 15-12).
When the collection is done, the file shows up under System Log File Name (Figure 15-13).
Save the file on your system using the Get button as shown in Figure 15-13.
lsfbvol
lsioport -l
lshostconnect

The complete data collection is normally performed by the IBM service support representative (IBM SSR) or the IBM Support center. The IBM product engineering (PE) package includes all current configuration data as well as diagnostic data.
C:\Program Files\IBM\Subsystem Device Driver>datapath query device -l
Total Devices : 1

DEV#: 3 DEVICE NAME: Disk4 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018381BF2800000000000027
LUN IDENTIFIER: 60050768018381BF2800000000000027
============================================================================
Path# Adapter/Hard Disk State Mode Select Errors
0 Scsi Port2 Bus0/Disk4 Part0 CLOSE OFFLINE 218297 0
1 * Scsi Port2 Bus0/Disk4 Part0 CLOSE OFFLINE 0 0
2 Scsi Port3 Bus0/Disk4 Part0 OPEN NORMAL 222394 0
3 * Scsi Port3 Bus0/Disk4 Part0 OPEN NORMAL 0 0

Based on our field experience, we recommend that you check the hardware first:
- Check whether any connection error indicators are lit on the host or SAN switch.
- Check whether all of the parts are seated correctly (cables securely plugged in to the SFPs, and the SFPs plugged all the way into the switch port sockets).
- Ensure that there are no broken fiber optic cables (if possible, swap the cables for cables that are known to work).

After the hardware check, continue to check the software setup:
- Check that the HBA driver level and firmware level are at the recommended and supported levels.
- Check the multipathing driver level, and make sure that it is at the recommended and supported level.
- Check for link-layer errors reported by the host or the SAN switch, which can indicate a cabling or SFP failure.
- Verify your SAN zoning configuration.
- Check the general SAN switch status and health for all switches in the fabric.

In Example 15-12, we discovered that one of the HBAs was experiencing a link failure due to a fiber optic cable that had been bent too far. After we changed the cable, the missing paths reappeared.
Example 15-12 Output from datapath query device command after fiber optic cable change
C:\Program Files\IBM\Subsystem Device Driver>datapath query device -l
Total Devices : 1

DEV#: 3 DEVICE NAME: Disk4 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018381BF2800000000000027
LUN IDENTIFIER: 60050768018381BF2800000000000027
============================================================================
Path# Adapter/Hard Disk State Mode Select Errors
0 Scsi Port3 Bus0/Disk4 Part0 OPEN NORMAL 218457 1
1 * Scsi Port3 Bus0/Disk4 Part0 OPEN NORMAL 0 0
2 Scsi Port2 Bus0/Disk4 Part0 OPEN NORMAL 222394 0
3 * Scsi Port2 Bus0/Disk4 Part0 OPEN NORMAL 0 0
The Recommended Actions panel displays event conditions that require actions, and procedures to diagnose and fix them. The highest-priority event is indicated, along with information about how long ago the event occurred. It is important to note that if an event is reported, you must select the event and run a fix procedure.
To retrieve properties and sense about a specific event, perform the following steps: 1. Select an event in the table. 2. Click Properties in the Actions menu (Figure 15-16) Tip: You can also obtain access to the Properties by right-clicking an event. 3. The Properties and Sense Data for Event sequence_number window (where sequence_number is the sequence number of the event that you selected in the previous step) opens, as shown in Figure 15-17.
Tip: From the Properties and Sense Data for Event window, you can use the Previous and Next buttons to navigate between events.

4. Click Close to return to the Recommended Actions panel.

Another common practice is to use the SVC CLI to find problems. The following commands provide you with information about the status of your environment:
- svctask detectmdisk: discovers any changes in the back-end storage configuration
- svcinfo lscluster clustername: checks the SVC cluster status
- svcinfo lsnode nodeid: checks the SVC node and port status
- svcinfo lscontroller controllerid: checks the back-end storage status
- svcinfo lsmdisk: provides the status of all MDisks
- svcinfo lsmdisk mdiskid: checks the status of a single MDisk
- svcinfo lsmdiskgrp: provides the status of all storage pools
- svcinfo lsmdiskgrp mdiskgrpid: checks the status of a single storage pool
- svcinfo lsvdisk: checks whether volumes are online

Important: Although the SVC raises error messages, most problems are not caused by the SVC. Most problems are introduced by the storage subsystems or the SAN. If the problem is caused by the SVC and you are unable to fix it with either the Recommended Actions panel or the event log, collect the SVC debug data as explained in 15.2.2, SVC data collection on page 427.
If the problem is related to anything outside of the SVC, refer to the appropriate section in this chapter to try to find and fix the problem.
zone:
The correct zoning must look like the zoning shown in Example 15-14.
Example 15-14 Correct WWPN zoning
zone:
The following SVC error codes are related to the SAN environment:
- Error 1060: Fibre Channel ports are not operational.
- Error 1220: A remote port is excluded.
If you are unable to fix the problem with these actions, collect the SAN switch debugging data as described in 15.2.3, SAN data collection on page 431, and then contact IBM Support.
Typical problems for storage subsystem controllers include incorrect configuration, which results in a 1625 error code. Other problems related to the storage subsystem are failures pointing to the managed disk I/O (error code 1310), disk media (error code 1320), and error recovery procedures (error code 1370). However, not all messages have just one explicit reason for being issued. Therefore, you have to check multiple areas, not just the storage subsystem. Next, we explain how to determine the root cause of the problem and in which order to check:
1. Check the Recommended Actions panel under SVC.
2. Check the attached storage subsystem for misconfigurations or failures.
3. Check the SAN for switch problems or zoning failures.
4. Collect all support data and involve IBM Support.

Now, we look at these steps sequentially:
1. Check the Recommended Actions panel under Troubleshooting. Select Troubleshooting > Recommended Actions, as shown in Figure 15-15 on page 440. For more information about how to use the Recommended Actions panel, refer to Implementing the IBM System Storage SAN Volume Controller V6.1, SG24-7933-00, or check the IBM System Storage SAN Volume Controller V6.2.0 Information Center and Guides at:
http://publib.boulder.ibm.com/infocenter/svc/ic/index.jsp
2. Check the attached storage subsystem for misconfigurations or failures:
a. Independent of the type of storage subsystem, first check whether there are any open problems on the system. Use the service or maintenance features provided with the storage subsystem to fix these problems.
b. Then, check whether the LUN masking is correct. When attached to the SVC, you must make sure that the LUN masking maps to the active zone set on the switch. Create a similar LUN mask for each storage subsystem controller port that is zoned to the SVC.
Also, observe the SVC restrictions for back-end storage subsystems, which can be found at:
https://www-304.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003799
c. Next, we show an example of a misconfigured storage subsystem and how it appears from the SVC's point of view. Furthermore, we explain how to fix the problem. By running the svcinfo lscontroller ID command, you get output similar to the output that is shown in Example 15-15 on page 444. As highlighted in the example, the MDisks, and therefore the LUNs, are not equally allocated. In our example, the LUNs provided by the storage subsystem are visible through only one path, which is one storage subsystem WWPN.
Example 15-15 The svcinfo lscontroller command
vendor_id IBM
product_id_low 1742-900
product_id_high
product_revision 0520
ctrl_s/n
WWPN 200400A0B8174433
path_count 8
max_path_count 12
WWPN 200500A0B8174433
path_count 0
max_path_count 8

This imbalance has two possible causes:
- If the back-end storage subsystem implements a preferred controller design, the LUNs might all be allocated to the same controller. This situation is likely with the IBM System Storage DS4000 series, and you can fix it by redistributing the LUNs evenly across the DS4000 controllers and then rediscovering the LUNs on the SVC. Because we used a DS4500 storage subsystem (type 1742) in Example 15-15, we need to check for this situation.
- Another possible cause is that the WWPN with the zero path_count is not visible to all of the SVC nodes, because of the SAN zoning or the LUN masking on the storage subsystem. Use the SVC CLI command svcinfo lsfabric 0 to confirm.
If you are unsure which of the attached MDisks has which corresponding LUN ID, use the SVC CLI command svcinfo lsmdisk (refer to Example 15-16). This command also shows to which storage subsystem a specific MDisk belongs (the controller ID).
Example 15-16 Determine the ID for the MDisk
IBM_2145:itsosvccl1:admin>svcinfo lsmdisk
id name   status mode    mdisk_grp_id mdisk_grp_name capacity ctrl_LUN_#       controller_name UID
0  mdisk0 online managed 0            MDG-1          600.0GB  0000000000000000 controller0     600a0b800017423300000059469cf84500000000000000000000000000000000
2  mdisk2 online managed 0            MDG-1          70.9GB   0000000000000002 controller0     600a0b800017443100000096469cf0e800000000000000000000000000000000

The problem turned out to be with the LUN allocation across the DS4500 controllers. After fixing this allocation on the DS4500, an SVC MDisk rediscovery fixed the problem from the SVC's point of view. Example 15-17 on page 445 shows an equally distributed MDisk.
Example 15-17 Equally distributed MDisk on all available paths
max_mdisk_link_count 4
degraded no
vendor_id IBM
product_id_low 1742-900
product_id_high
product_revision 0520
ctrl_s/n
WWPN 200400A0B8174433
path_count 4
max_path_count 12
WWPN 200500A0B8174433
path_count 4
max_path_count 8

d. In our example, the problem was solved by changing the LUN allocation. If step 2 did not solve the problem, continue with step 3.

3. Check the SAN for switch problems or zoning failures. Many situations can cause problems in the SAN. Refer to 15.2.3, SAN data collection on page 431 for more information.

4. Collect all support data and involve IBM Support. Collect the support data for the involved SAN, SVC, or storage systems as described in 15.2, Collecting data and isolating the problem on page 424.
IBM_2145:itsosvccl1:admin>svcinfo lsvdisklba -mdisk 6 -lba 0x00172001
vdisk_id vdisk_name copy_id type      LBA        vdisk_start vdisk_end  mdisk_start mdisk_end
0        diomede0   0       allocated 0x00102001 0x00100000  0x0010FFFF 0x00170000  0x0017FFFF

This output shows:
- This LBA maps to LBA 0x00102001 of volume 0.
- The LBA is within the extent that runs from 0x00100000 to 0x0010FFFF on the volume and from 0x00170000 to 0x0017FFFF on the MDisk (so, the extent size of this storage pool is 32 MB).
- So, if the host performs I/O to this LBA, the MDisk goes offline.
copy_id mdisk_id mdisk_name type        LBA        mdisk_start mdisk_end  vdisk_start vdisk_end
0       6        mdisk6     allocated   0x00050000 0x00050000  0x0005FFFF 0x00000000  0x0000FFFF

IBM_2145:itsosvccl1:admin>svcinfo lsmdisklba -vdisk 14 -lba 0x0
copy_id mdisk_id mdisk_name type        LBA        mdisk_start mdisk_end  vdisk_start vdisk_end
0                           unallocated                                   0x00000000  0x0000003F
Volume 0 is a fully allocated volume, so the MDisk LBA information is displayed as in Example 15-18 on page 447. Volume 14 is a thin-provisioned volume to which the host has not yet performed any I/O; all of its extents are unallocated. Therefore, the only information shown by lsmdisklba is that it is unallocated and that this thin-provisioned grain starts at LBA 0x00 and ends at 0x3F (the grain size is 32 KB).
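Both sizes quoted above follow directly from the reported LBA ranges, because each SCSI block is 512 bytes. A quick arithmetic check, using the hexadecimal boundaries from the examples above:

```shell
# Extent size: the volume extent runs from 0x00100000 to 0x0010FFFF.
blocks=$(( 0x0010FFFF - 0x00100000 + 1 ))       # 0x10000 = 65536 blocks
extent_mb=$(( blocks * 512 / 1024 / 1024 ))
echo "extent size: $extent_mb MB"               # 32 MB

# Grain size: the thin-provisioned grain runs from LBA 0x00 to 0x3F.
grain_kb=$(( (0x3F + 1) * 512 / 1024 ))         # 64 blocks
echo "grain size: $grain_kb KB"                 # 32 KB
```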
LABEL: SC_DISK_ERR2
IDENTIFIER: B6267342

Date/Time: Thu Aug 5 10:49:35 2008
Sequence Number: 4334
Machine Id: 00C91D3B4C00
Node Id: testnode
Class: H
Type: PERM
Resource Name: hdisk34
Resource Class: disk
Resource Type: 2145
Location: U7879.001.DQDFLVP-P1-C1-T1-W5005076801401FEF-L4000000000000

VPD:
 Manufacturer................IBM
 Machine Type and Model......2145
 ROS Level and ID............0000
 Device Specific.(Z0)........0000043268101002
 Device Specific.(Z1)........0200604
 Serial Number...............60050768018100FF78000000000000F6

SENSE DATA
0A00 2800 001C ED00 0000 0104 0000 0000
0000 0000 0000 0000 0102 0000 F000 0300
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
From the sense byte decode:
- Byte 2 = SCSI op code (28 = 10-byte read)
- Bytes 4 - 7 = LBA (logical block address for the volume)
- Byte 30 = Key
- Byte 40 = Code
- Byte 41 = Qualifier
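Given that byte map, the interesting fields can be pulled out of the hex dump with a short script. This is a sketch only: it assumes the sense bytes are pasted in as the space-separated 16-bit groups that errpt prints, and the byte() helper is a hypothetical convenience, not an AIX tool.

```shell
# Sketch: extract the SCSI op code (byte 2) and the volume LBA (bytes 4-7)
# from the first words of the SENSE DATA shown above. Bytes are numbered from 0.
sense="0A00 2800 001C ED00 0000 0104 0000 0000"
bytes=$(echo "$sense" | tr -d ' ')    # one continuous hex string, 2 chars per byte

byte() { echo "$bytes" | cut -c $(( $1 * 2 + 1 ))-$(( $1 * 2 + 2 )); }

opcode=$(byte 2)
lba="$(byte 4)$(byte 5)$(byte 6)$(byte 7)"
echo "SCSI op code: 0x$opcode (28 = 10-byte read)"
echo "volume LBA:   0x$lba"
```

For the sense data above, this reports op code 0x28 and volume LBA 0x001CED00.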
Error Log Entry 1965
Node Identifier       : Node7
Object Type           : mdisk
Object ID             : 48
Sequence Number       : 7073
Root Sequence Number  : 7073
First Error Timestamp : Thu Jul 24 17:44:13 2008
                      : Epoch + 1219599853
Last Error Timestamp  : Thu Jul 24 17:44:13 2008
                      : Epoch + 1219599853
Error Count           : 21
Error ID              : 10025 : A media error has occurred during I/O to a Managed Disk
Error Code            : 1320 : Disk I/O medium error
Status Flag           : FIXED
Type Flag             : TRANSIENT ERROR

11 80 02 03 00 00 00 40 00 00 11 00 00 00 02 00
02 0B 00 00 00 00 40 00 80 00 00 00 00 00 00 6D
00 00 00 00 00 00 59 00 00 00 00 00 00 58 00 00
00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00
00 0A 00 00 00 00 02 00 00 00 00 00 00 28 00 00
08 00 00 00 00 00 80 00 00 00 00 58 80 00 C0 00
00 00 59 00 00 AA 00 00 00 40 6D 04 02 00 00 00
449
7521Troubleshooting.fm
00 00 00 00 0B 00 00 00 04 00 00 00 10 00 02 01 Where the sense byte decodes as: Byte 12 = SCSI Op Code (28 = 10-Byte Read) Bytes 14 - 17 = LBA (Logical Block Address for MDisk) Bytes 49 - 51 = Key/Code/Qualifier Important: Attempting to locate medium errors on MDisks by scanning volumes with host applications, such as dd, or using SVC background functions, such as volume migrations and FlashCopy, can cause the storage pool to go offline as a result of error handling behavior in current levels of SVC microcode. This behavior will change in future levels of SVC microcode. Check with support prior to attempting to locate medium errors by any of these means.
Notes: Medium errors encountered on volumes log error code 1320 Disk I/O Medium Error. If more than 32 medium errors are found while data is being copied from one volume to another volume, the copy operation terminates and logs error code 1610 Too many medium errors on Managed Disk.
7521p02_Practical examples.fm
Part 3
Practical examples
In this part, we show practical examples of typical procedures that use the best practices discussed in this book. Some of these examples were taken from actual production environments, and others were run in IBM laboratories.
7521PracticalExamplesKG.fm
Chapter 16.
SVC scenarios
In this chapter, we provide working scenarios to reinforce and demonstrate the best practices and performance information in this book.
1. Take the preparation steps discussed in 14.4, SVC Code upgrade on page 406. Check the attached servers, SAN switches, and storage controllers for errors. Define the current and target SVC code levels; in this case, 5.1.0.8 and 6.2.0.2.
2. Download the following software from the IBM Storage Support website:
- SVC Console Software version 6.1
- SVC Upgrade Test Utility version 6.6 (the latest)
- SVC Code release 5.1.0.10 (the latest fix for the current version)
- SVC Code release 6.2.0.2 (the latest release)
3. Upload and install the SVC Upgrade Test Utility. Run it with the target version set to 5.1.0.10. Fix any errors it finds before proceeding.
Note: Do not proceed further until you have made sure that all servers attached to this SVC have compatible multipath software versions and that each one has its redundant disk paths working free of errors. A clean exit from the SVC Upgrade Test Utility alone is not enough to verify this.

4. Install SVC Code release 5.1.0.10 on the cluster. Monitor the upgrade progress.
Figure 16-2 SVC Code upgrade status monitor using the GUI

Example 16-2 SVC Code upgrade status monitor using the CLI
IBM_2145:svccf8:admin>svcinfo lssoftwareupgradestatus
status
upgrading
IBM_2145:svccf8:admin>
5. After the upgrade to 5.1.0.10 completes, recheck the SVC cluster for any errors, as a precaution.
6. Migrate the existing VDisks out of the existing SSD MDG. Here we show one example that uses migratevdisk and another that adds and then removes a VDisk mirror copy. While the first is simpler, the latter allows you to perform the task even if the source and target MDGs have different extent sizes. In the latter example:
- Because this cluster did not use VDisk mirror copies before, we first had to configure some memory for the VDisk mirror bitmaps (chiogrp).
- Be careful with the -syncrate parameter to avoid any performance impact during the VDisk mirror copy synchronization. Changing this parameter from the default value of 50 to 55, for example, actually doubles the synchronization rate.
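The reason a change of only five points doubles the rate is that syncrate selects a band of ten values, and each band doubles the copy rate. A small model, assuming the band table from the SVC documentation (1-10 copies at 128 KB/s, up to 64 MB/s for 91-100):

```python
import math

# Assumed band table from the SVC documentation: syncrate 1-10 copies at
# 128 KB/s, and each higher band of 10 doubles the rate, up to 64 MB/s
# for syncrate 91-100.
def sync_rate_mb_per_s(syncrate: int) -> float:
    """Mirror-copy synchronization rate in MB/s for a given -syncrate value."""
    if not 1 <= syncrate <= 100:
        raise ValueError("syncrate must be between 1 and 100")
    band = math.ceil(syncrate / 10)         # band 1..10
    return 128 * 2 ** (band - 1) / 1024     # KB/s converted to MB/s

print(sync_rate_mb_per_s(50))  # 2.0  (the default)
print(sync_rate_mb_per_s(55))  # 4.0  (doubled)
print(sync_rate_mb_per_s(75))  # 16.0 (the value used in Example 16-3)
```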
Example 16-3 SVC VDisk migration using migratevdisk
IBM_2145:svccf8:admin>svctask migratevdisk -mdiskgrp MDG4DS8KL3331 -vdisk NYBIXTDB02_T03 -threads 2
IBM_2145:svccf8:admin>svcinfo lsmigrate
migrate_type MDisk_Group_Migration
progress 5
migrate_source_vdisk_index 0
migrate_target_mdisk_grp 3
max_thread_count 2
migrate_source_vdisk_copy_id 0
IBM_2145:svccf8:admin>
status online
sync no
primary no
mdisk_grp_id 3
mdisk_grp_name MDG4DS8KL3331
type striped
mdisk_id
mdisk_name
fast_write_state empty
used_capacity 20.00GB
real_capacity 20.00GB
free_capacity 0.00MB
overallocation 100
autoexpand
warning
grainsize
IBM_2145:svccf8:admin>
IBM_2145:svccf8:admin>svctask addvdiskcopy -mdiskgrp MDG4DS8KL3331 -syncrate 75 NYBIXTDB02_T03
Vdisk [0] copy [1] successfully created
IBM_2145:svccf8:admin>svcinfo lsvdiskcopy
vdisk_id vdisk_name     copy_id status sync primary mdisk_grp_id mdisk_grp_name capacity type
0        NYBIXTDB02_T03 0       online yes  yes     2            MDG3SVCCF8SSD  20.00GB  striped
0        NYBIXTDB02_T03 1       online no   no      3            MDG4DS8KL3331  20.00GB  striped
IBM_2145:svccf8:admin>
IBM_2145:svccf8:admin>svcinfo lsvdiskcopy
vdisk_id vdisk_name     copy_id status sync primary mdisk_grp_id mdisk_grp_name capacity type
0        NYBIXTDB02_T03 0       online yes  yes     2            MDG3SVCCF8SSD  20.00GB  striped
0        NYBIXTDB02_T03 1       online yes  no      3            MDG4DS8KL3331  20.00GB  striped
IBM_2145:svccf8:admin>
IBM_2145:svccf8:admin>svctask rmvdiskcopy -copy 0 NYBIXTDB02_T03
IBM_2145:svccf8:admin>svcinfo lsvdisk NYBIXTDB02_T03
id 0
name NYBIXTDB02_T03
IO_group_id 0
IO_group_name io_grp0
status online
mdisk_grp_id 3
mdisk_grp_name MDG4DS8KL3331
capacity 20.00GB
type striped
formatted no
mdisk_id
mdisk_name
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018205E12000000000000000
throttling 0
preferred_node_id 2
fast_write_state empty
cache readwrite
udid 0
fc_map_count 0
sync_rate 75
copy_count 1
copy_id 1
status online
sync yes
primary yes
mdisk_grp_id 3
mdisk_grp_name MDG4DS8KL3331
type striped
mdisk_id
mdisk_name
fast_write_state empty
used_capacity 20.00GB
real_capacity 20.00GB
free_capacity 0.00MB
overallocation 100
autoexpand
warning
grainsize
IBM_2145:svccf8:admin>
7. Remove the SSDs from their MDG. If you run svcupgradetest before doing this, it still returns errors, as shown. Because we planned to no longer use it, the MDG itself was also removed.
Example 16-5 SVC internal SSDs put into unmanaged state.
IBM_2145:svccf8:admin>svcupgradetest -v 6.2.0.2 -d
svcupgradetest version 6.6
Please wait while the tool tests for issues that may prevent a software upgrade from completing successfully. The test may take several minutes to complete.
Checking 34 mdisks:
******************** Error found ********************
The requested upgrade from 5.1.0.10 to 6.2.0.2 cannot be completed as there are internal SSDs in use. Please refer to the following flash:
http://www.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003707
Results of running svcupgradetest:
==================================
The tool has found errors which will prevent a software upgrade from completing successfully. For each error above, follow the instructions given.
The tool has found 1 errors and 0 warnings
IBM_2145:svccf8:admin>
IBM_2145:svccf8:admin>svcinfo lsmdisk -filtervalue mdisk_grp_name=MDG3SVCCF8SSD
id name   status mode    mdisk_grp_id mdisk_grp_name capacity ctrl_LUN_#       controller_name UID
0  mdisk0 online managed 2            MDG3SVCCF8SSD  136.7GB  0000000000000000 controller0     5000a7203003190c000000000000000000000000000000000000000000000000
1  mdisk1 online managed 2            MDG3SVCCF8SSD  136.7GB  0000000000000000 controller3     5000a72030032820000000000000000000000000000000000000000000000000
IBM_2145:svccf8:admin>svcinfo lscontroller
id controller_name ctrl_s/n    vendor_id product_id_low product_id_hi
0  controller0                 IBM       2145           Internal
1  controller1     75L3001FFFF IBM       2107900
2  controller2     75L3331FFFF IBM       2107900
3  controller3                 IBM       2145           Internal
IBM_2145:svccf8:admin>
IBM_2145:svccf8:admin>svctask rmmdisk -mdisk mdisk0:mdisk1 -force MDG3SVCCF8SSD
IBM_2145:svccf8:admin>
IBM_2145:svccf8:admin>svctask rmmdiskgrp MDG3SVCCF8SSD
IBM_2145:svccf8:admin>
IBM_2145:svccf8:admin>svcupgradetest -v 6.2.0.2 -d
svcupgradetest version 6.6
Please wait while the tool tests for issues that may prevent a software upgrade from completing successfully. The test may take several minutes to complete.
Checking 32 mdisks:
Results of running svcupgradetest:
==================================
The tool has found 0 errors and 0 warnings
The test has not found any problems with the cluster. Please proceed with the software upgrade.
IBM_2145:svccf8:admin>
8. Upload and install version 6.2.0.2. As you monitor the installation progress, you will notice the GUI change its appearance. At the end of the process, you are asked to relaunch the Management GUI, which now runs on one of the SVC nodes instead of on the SVC Console.
Figure 16-5 SVC Cluster already running code version 6.2.0.2, asking to restart GUI
9. Again, check the SVC for errors, as a precaution. Then configure the internal SSDs to be used by the MDG that received the VDisks migrated in Step 6, this time using Easy Tier. From the GUI Home panel, go to Physical Storage -> Internal, and then click the Configure Storage button in the top left corner.
Select a RAID preset for the SSD drives - see Table 14-2 on page 412 for details.
Confirm the number of SSD drives and the RAID preset to use in the Configuration Wizard, and click Next.
Figure 16-11 Configuration Wizard confirmation

Select the Storage Pool (formerly MDG) into which the SSD drives should be included, and click Finish.
At this point, SVC continues the SSD array initialization process but already puts this pool's Easy Tier function in the Active state; in effect, it is collecting I/O data to decide which VDisk extents to migrate to the SSDs later on. You can monitor the array initialization progress in the Tasks panel in the bottom right corner of the GUI.
The upgrade is finished. If you have not done so yet, you might want to plan your next steps for fine-tuning Easy Tier. Also, if you do not have any other SVC clusters running SVC code version 5.1 or earlier, you can install SVC Console code version 6.
Example 16-6
###
### Verify that both the old and new HBA WWPNs are logged in to both fabrics.
### Here is an example in one fabric:
###
b32sw1_B64:admin> nodefind 10:00:00:00:C9:59:9F:6C
Local:
 Type Pid    COS     PortName                NodeName                 SCR
 N    401000;      2,3;10:00:00:00:c9:59:9f:6c;20:00:00:00:c9:59:9f:6c; 3
    Fabric Port Name: 20:10:00:05:1e:04:16:a9
    Permanent Port Name: 10:00:00:00:c9:59:9f:6c
    Device type: Physical Unknown(initiator/target)
    Port Index: 16
    Share Area: No
    Device Shared in Other AD: No
    Redirect: No
    Partial: No
    Aliases: nybixpdb01_fcs0
b32sw1_B64:admin> nodefind 10:00:00:00:C9:99:56:DA
Remote:
 Type Pid    COS     PortName                NodeName
 N    4d2a00;      2,3;10:00:00:00:c9:99:56:da;20:00:00:00:c9:99:56:da;
    Fabric Port Name: 20:2a:00:05:1e:06:d0:82
    Permanent Port Name: 10:00:00:00:c9:99:56:da
    Device type: Physical Unknown(initiator/target)
    Port Index: 42
    Share Area: No
    Device Shared in Other AD: No
    Redirect: No
    Partial: No
    Aliases:
b32sw1_B64:admin>
###
###
### Cross-check the SVC for the HBA WWPNs and LUN IDs
###
IBM_2145:VIGSVC1:admin>
IBM_2145:VIGSVC1:admin>svcinfo lshost nybixpdb01
id 20
name nybixpdb01
port_count 2
type generic
mask 1111
iogrp_count 1
WWPN 10000000C9599F6C
node_logged_in_count 2
state active
WWPN 10000000C9594026
node_logged_in_count 2
state active
IBM_2145:VIGSVC1:admin>svcinfo lshostvdiskmap nybixpdb01
id name       SCSI_id vdisk_id wwpn             vdisk_UID
20 nybixpdb01 0       47       10000000C9599F6C 60050768019001277000000000000030
20 nybixpdb01 1       48       10000000C9599F6C 60050768019001277000000000000031
20 nybixpdb01 2       119      10000000C9599F6C 60050768019001277000000000000146
20 nybixpdb01 3       118      10000000C9599F6C 60050768019001277000000000000147
20 nybixpdb01 4       243      10000000C9599F6C 60050768019001277000000000000148
20 nybixpdb01 5       244      10000000C9599F6C 60050768019001277000000000000149
20 nybixpdb01 6       245      10000000C9599F6C 6005076801900127700000000000014A
20 nybixpdb01 7       246      10000000C9599F6C 6005076801900127700000000000014B
IBM_2145:VIGSVC1:admin>
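Note that the SVC CLI shows WWPNs without colons and in uppercase (10000000C9599F6C), while the Brocade fabric shows them colon-separated in lowercase (10:00:00:00:c9:59:9f:6c). A small helper for converting between the two forms when cross-checking (illustrative only, not part of any IBM tool):

```python
def to_fabric(wwpn: str) -> str:
    """SVC form (10000000C9599F6C) -> fabric form (10:00:00:00:c9:59:9f:6c)."""
    w = wwpn.replace(":", "").lower()
    if len(w) != 16:
        raise ValueError("a WWPN is 8 bytes (16 hex digits)")
    return ":".join(w[i:i + 2] for i in range(0, 16, 2))

def to_svc(wwpn: str) -> str:
    """Fabric form -> SVC form (no colons, uppercase)."""
    return wwpn.replace(":", "").upper()

print(to_fabric("10000000C9599F6C"))      # 10:00:00:00:c9:59:9f:6c
print(to_svc("10:00:00:00:c9:99:56:da"))  # 10000000C99956DA
```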
###
### At this point, both the old and new servers were brought down.
### As such, the HBAs would not be logged in to the SAN fabrics, hence the use of the -force parameter.
### For the same reason, it makes no difference which update is made first: the SAN zones or the SVC host definitions.
###
svctask addhostport -hbawwpn 10000000C99956DA -force nybixpdb01
svctask addhostport -hbawwpn 10000000C9994E98 -force nybixpdb01
svctask rmhostport -hbawwpn 10000000C9599F6C -force nybixpdb01
svctask rmhostport -hbawwpn 10000000C9594026 -force nybixpdb01
### Alias WWPN update in the first SAN fabric
aliadd "nybixpdb01_fcs0", "10:00:00:00:C9:99:56:DA"
aliremove "nybixpdb01_fcs0", "10:00:00:00:C9:59:9F:6C"
alishow nybixpdb01_fcs0
cfgsave
cfgenable "cr_BlueZone_FA"
### Alias WWPN update in the second SAN fabric
aliadd "nybixpdb01_fcs2", "10:00:00:00:C9:99:4E:98"
aliremove "nybixpdb01_fcs2", "10:00:00:00:c9:59:40:26"
alishow nybixpdb01_fcs2
cfgsave
cfgenable "cr_BlueZone_FB"
### Back to the SVC to monitor as the server is brought back up
IBM_2145:VIGSVC1:admin>svcinfo lshostvdiskmap nybixpdb01
id name       SCSI_id vdisk_id vdisk_UID
20 nybixpdb01 0       47       60050768019001277000000000000030
20 nybixpdb01 1       48       60050768019001277000000000000031
20 nybixpdb01 2       119      60050768019001277000000000000146
20 nybixpdb01 3       118      60050768019001277000000000000147
20 nybixpdb01 4       243      60050768019001277000000000000148
20 nybixpdb01 5       244      60050768019001277000000000000149
20 nybixpdb01 6       245      6005076801900127700000000000014A
20 nybixpdb01 7       246      6005076801900127700000000000014B
IBM_2145:VIGSVC1:admin>svcinfo lshost nybixpdb01
id 20
name nybixpdb01
port_count 2
type generic
mask 1111
iogrp_count 1
WWPN 10000000C9994E98
node_logged_in_count 2
state inactive
WWPN 10000000C99956DA
node_logged_in_count 2
state inactive
IBM_2145:VIGSVC1:admin>
IBM_2145:VIGSVC1:admin>svcinfo lshost nybixpdb01
id 20
name nybixpdb01
port_count 2
type generic
mask 1111
iogrp_count 1
WWPN 10000000C9994E98
node_logged_in_count 2
state active
WWPN 10000000C99956DA
node_logged_in_count 2
state active
IBM_2145:VIGSVC1:admin>
After the new LPAR showed both of its HBAs as active, we confirmed that it recognized all of the previously assigned SAN disks and that they all had healthy disk paths.
The new infrastructure is installed and configured, with the new SAN switches attached to the existing SAN fabrics (preferably using trunks, for bandwidth) and the new SVC ready to use. The necessary SAN zoning configuration is also made between the initial and the new SVC clusters, and a Remote Copy (RC) partnership is established between them (note the -bandwidth parameter). After that, for each VDisk in use by the production server, we created a target VDisk of the same size in the new environment, created an RC relationship between these VDisks, and included this relationship in a consistency group. The initial VDisk synchronization was then started; it took a while for the copies to become synchronized, given the large amount of data and the bandwidth being kept at its default value as a precaution.
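The duration of the initial synchronization is roughly the total data to copy divided by the effective background-copy rate. A back-of-the-envelope estimate (the figures below are hypothetical, not taken from this case):

```python
def initial_sync_hours(total_data_gb: float, copy_rate_mb_per_s: float) -> float:
    """Rough initial-sync duration, ignoring host writes that arrive meanwhile."""
    seconds = total_data_gb * 1024 / copy_rate_mb_per_s
    return seconds / 3600

# Hypothetical figures: 1 TB of VDisks copied at a 50 MB/s background rate
print(round(initial_sync_hours(1024, 50), 1))  # 5.8 (hours)
```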
Figure 16-16 New SVC and SAN are installed

Example 16-7 SVC commands to set up the RC relationship
SVC commands used in this phase:
# lscluster
# mkpartnership -bandwidth <bw> <svcpartnercluster>
# mkvdisk -mdiskgrp <mdg> -size <sz> -unit gb -iogrp <iogrp> -vtype striped -node <node> -name <targetvdisk> -easytier off
# mkrcconsistgrp -name <cgname> -cluster <svcpartnercluster>
# mkrcrelationship -master <sourcevdisk> -aux <targetvdisk> -name <rlname> -consistgrp <cgname> -cluster <svcpartnercluster>
# startrcconsistgrp -primary master <cgname>
# chpartnership -bandwidth <newbw> <svcpartnercluster>
A planned outage was scheduled for after the initial synchronization finished, during which we reconfigured the server to use the new SVC infrastructure.
Figure 16-21 Remove RC relationships and reclaim old space (backup copy)
The private key for authentication (for example, icat.ppk). This key is the private key that you have already created. This parameter is set under the Connection > SSH > Auth category as shown in Figure 16-23 on page 473.
The IP address of the SVC cluster. This parameter is set under the Session category as shown in Figure 16-24.
A session name. Our example uses redbook_CF8. Our PuTTY version is 0.61.
To use this predefined PuTTY session, use the following syntax:
plink redbook_CF8
If a predefined PuTTY session is not used, use this syntax:
plink admin@<your cluster ip address> -i C:\DirectoryPath\KeyName.PPK
Example 16-8 shows a script to restart GM relationships and groups.
Example 16-8 Restart GM relationships and groups
svcinfo lsrcconsistgrp -filtervalue state=consistent_stopped -nohdr -delim : |
while IFS=: read id name mci mcn aci acn p state junk; do
  echo "Restarting group: $name ($id)"
  svctask startrcconsistgrp -force $name
  echo "Clearing errors..."
  svcinfo lserrlogbyrcconsistgrp -unfixed $name | while read id type fixed snmp err_type node seq_num junk; do
    if [ "$id" != "id" ]; then
      echo "Marking $seq_num as fixed"
      svctask cherrstate -sequencenumber $seq_num
    fi
  done
done

svcinfo lsrcrelationship -filtervalue state=consistent_stopped -nohdr -delim : |
while IFS=: read id name mci mcn mvi mvn aci acn avi avn p cg_id cg_name state junk; do
  if [ "$cg_id" == "" ]; then
    echo "Restarting relationship: $name ($id)"
    svctask startrcrelationship -force $name
    echo "Clearing errors..."
    svcinfo lserrlogbyrcrelationship -unfixed $name | while read id type fixed snmp err_type node seq_num junk; do
      if [ "$id" != "id" ]; then
        echo "Marking $seq_num as fixed"
        svctask cherrstate -sequencenumber $seq_num
      fi
    done
  fi
done

Various limited scripts can be run directly in the SVC shell, as shown in the following examples.
Example 16-9 Create 50 Volumes
IBM_2145:svccf8:admin>for ((num=0;num<50;num++)); do svctask mkvdisk -mdiskgrp 2 -size 20 -unit gb -iogrp 0 -vtype striped -name Test_$num; echo Volumename Test_$num created; done
Example 16-10 Delete 50 Volumes
IBM_2145:svccf8:admin>for ((num=0;num<50;num++)); do svctask rmvdisk $num; done

Additionally, IBM engineers have developed a scripting toolkit that is designed to help automate SVC operations. It is Perl-based and available at no charge from:
http://www.alphaworks.ibm.com/tech/svctools
The scripting toolkit includes a sample script that you can use to redistribute extents across existing MDisks in the pool. Refer to 5.7, Restriping (balancing) extents across a Storage Pool on page 77 for an example use of the redistribute extents script from the scripting toolkit.
Note: The scripting toolkit is made available to users through IBM's alphaWorks website. As with all software available on alphaWorks, it is not extensively tested and is provided on an as-is basis. It is not supported in any formal way by IBM Product Support. Use it at your own risk.
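The idea behind the redistribute-extents script can be sketched simply: count the extents on each MDisk, compute the even share, and move the surplus from over-full MDisks to under-full ones. A toy model of that planning step (this is not the toolkit's actual code; the MDisk names and counts are made up):

```python
def rebalance_plan(extents: dict[str, int]) -> list[tuple[str, str, int]]:
    """Return (source_mdisk, target_mdisk, n_extents) moves that even out
    extent counts across the MDisks of a pool (toy model of the toolkit)."""
    total = sum(extents.values())
    target = total // len(extents)          # even share (any remainder stays put)
    surplus = {m: n - target for m, n in extents.items()}
    donors = [(m, s) for m, s in surplus.items() if s > 0]
    takers = [(m, -s) for m, s in surplus.items() if s < 0]
    moves = []
    for dm, ds in donors:
        for i, (tm, need) in enumerate(takers):
            if ds == 0 or need == 0:
                continue
            n = min(ds, need)               # move as much as both sides allow
            moves.append((dm, tm, n))
            ds -= n
            takers[i] = (tm, need - n)
    return moves

print(rebalance_plan({"mdisk0": 90, "mdisk1": 10, "mdisk2": 20}))
# [('mdisk0', 'mdisk1', 30), ('mdisk0', 'mdisk2', 20)]
```

In the real toolkit, each planned move would correspond to a migrateexts operation against the cluster; the sketch only computes the plan.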
7521bibl.fm
Related publications
The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this book.
Other resources
These publications are also relevant as further information sources:
IBM System Storage Open Software Family SAN Volume Controller: Planning Guide, GA22-1052
IBM System Storage Master Console: Installation and User's Guide, GC30-4090
IBM System Storage Open Software Family SAN Volume Controller: Installation Guide, SC26-7541
IBM System Storage Open Software Family SAN Volume Controller: Service Guide, SC26-7542
IBM System Storage Open Software Family SAN Volume Controller: Configuration Guide, SC26-7543
IBM System Storage Open Software Family SAN Volume Controller: Command-Line Interface User's Guide, SC26-7544
IBM System Storage Open Software Family SAN Volume Controller: CIM Agent Developer's Reference, SC26-7545
IBM TotalStorage Multipath Subsystem Device Driver User's Guide, SC30-4096
IBM System Storage Open Software Family SAN Volume Controller: Host Attachment Guide, SC26-7563
Considerations and Comparisons between IBM SDD for Linux and DM-MPIO, which is available at:
http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&q1=linux&uid=ssg1S7001664&loc=en_US&cs=utf-8&lang=en
IBM System Storage SAN Volume Controller V6.2.0 - Software Installation and Configuration Guide, GC27-2286
7521IX.fm
Index
traffic 317 application testing 124 applications 45, 112, 150, 193, 234, 297 architecture 54, 207, 228 architectures 215 area 205, 424 areas 191, 299, 420 array 12, 44, 7475, 126, 163, 218, 245, 275, 300 midrange storage controllers 245 array considerations storage pool 245 array layout 244 array parameters 56 array provisioning 245 array site spare 59 array sizes mixing in Storage Pool 278 arrays 12, 44, 56, 72, 74, 109, 126, 275, 301 per storage pool 246 ASIC 22 asynchronous 114 asynchronous mirroring 171 asynchronous remote 132, 142 asynchronous remote copy 133, 141142, 144 asynchronously 162 attached 13, 104, 191, 420 attention 23 attributes 71 Auto Logical Drive Transfer 56 Auto-Expand 103 automatically discover 203 automation 114 auxiliary 177, 184 Auxiliary Cluster 133 auxiliary VDisk 143 Auxiliary Volume 133 Auxiliary Volumes 141 availability 23, 72, 199
Numerics
10 Gb Ethernet adapter 7 1862 error 97 1920 error 160 bad period count 182 troubleshooting 183 1920 error code 159 1920 error message 182 1920 errors 160 2145-4F2 node support 5 2145-CG8 7, 45 2-way write-back cached 101 500 247
A
access 12, 44, 54, 106, 194 Access LUN 58 -access option 165 access pattern 112 active 54, 218, 444 adapters 193, 276, 306 DS8000 258 address 100 adds 300 Admin 302 administrator 44, 67, 420 administrators 223, 301 aggregate 54 aggregate load 73, 246 AIX 193, 297, 448 AIX host 202, 209, 427 AIX LVM admin roles 302 alert 19 alert events CPU utilization threshold 371 Overall backend response time threshold 371 overall port response time threshold 371 alerts 13, 148 algorithms 111 Alias 34 alias 34 aliases 28 storage subsystem 35 alignment 305 amount of I/O 112, 163 analysis 68, 431 application availability 306 performance 72, 108, 298 Application Specific Integrated Circuit 22
B
backend storage 233 backend storage controller 118 back-end storage controllers 163 backend transfer size 248 background copy 132 background copy bandwidth 158 backplane 22 backup 13, 126, 214, 299, 430 backup node 33 backup sessions 305 bad period count 182 balance 33, 56, 104, 199, 301 balance the workload 111 balanced 33, 56, 194 balancing 36, 95, 194, 303, 305
Bandwidth 146, 191 bandwidth 12, 45, 112, 126, 148, 194, 299 Bandwidth parameter 139 bandwidth parameter 139 bandwidth requirements 38 Basic 14 basic 12, 192, 399, 422 batch workloads 234 best practices xv, 11, 72, 104, 191 LUN selection 75 between 13, 54, 74, 105, 193, 300 BIOS 52, 216 blade 28 BladeCenter 39 blades 28 block 57, 105, 119, 298 block size 300 blocking 12 blocks 111 BM System Storage SAN Volume Controller Host Attachment Users Guide Version 4.2.0 216217 boot 195 boot device 212 bottleneck 24 bottleneck detection feature 24 bottlenecks 298 boundary crossing 305 bridge 16 Brocade 47, 434 Brocade Webtools GUI 27 buffer 119 Buffer Credit 153 buffers 127, 193, 307 bus 205
C
cache 12, 74, 107, 193, 245, 298300, 431 Cache battery failure 325 cache block size 276 cache disabled 114, 121 cache friendly workload 235 cache influence 237 Cache management 46 cache mode 116 cache partitioning 252 cache track size 249 cache usage 235 cache-disabled image mode 172 cache-disabled state 173 cache-disabled VDisk 115 Cache-disabled VDisks 114 cache-enabled 128 caching 111, 114 caching algorithm 250 capacity 18, 44, 75, 105, 304, 445 cards 216 Case study fabric performance 374 Server performance 355 SVC performance alerts 370
Top Volumes Response Performance 368 cfgportip command 40 changes 13, 148, 173, 192, 307, 420 changes of state 177 channel 210 chcluster command 156 chdev 209 choice 112, 200 chpartnership command 156 chquorum command 22, 77 Cisco 12, 47, 403, 435 CLI 120, 204, 421 commands 256, 441 client 213, 305 cluster 12, 43, 72, 104, 193, 247, 421 cluster affinity 54 clustered systems advantage 48 clustering 207 clustering software 207 clusters 37, 45, 150, 207 Colliding writes 144, 154 command 56, 120, 195, 304, 425 command prefix removal 7 commands CreateRelationship 164 commit 121 compatibility 51 complexity 25 conception 26 concurrent 205, 442 concurrent code update 51 configuration 11, 72, 192, 298, 421 configuration changes 203 configuration data 203, 438 configuration node 431 configuration parameters 185, 205 configure 79, 211, 306 congested 19 congestion 12 control 13, 148 connected 12, 54, 191, 422 connected state 180181 connection 208 connections 18, 54, 213 connectivity 212, 420 consistency 221 consistency freeze 181 consistent 167, 221 Consistent relationship 133 Consistent Stopped state 178 Consistent Synchronized state 178 ConsistentStopped 181 ConsistentSynchronized 181 consolidation 72 container 305 containers 305 control 44, 59, 95, 114, 194, 302 controller port 247 controller ports 258
DS4000 278 copy 221, 396, 429 copy rate 121 copy services 45, 50 core switch 14, 18 core switches 23 core/edge ASIC 22 core-edge 15 corrupted 221 corruption 61 cost 149 counters 223 CPU utilization 46 create a FlashCopy 121 CreateRelationship command 164 critical 85, 298 cross-bar architecture 22 current 204, 421, 450 CWDM 37, 149
D
daisy-chain topology 170 data 13, 44, 162, 193, 298, 419 consistency 122 data corruption zone considerations 37 data formats 214 data integrity 108, 120 data layout 105, 301 Data layout strategies 307 data migration 126, 214 data mining 124 Data Path View 383 data pattern 298 data rate 189, 234 data redundancy 72 data traffic 19 database 13, 112, 122, 202, 299, 420 log 300 Database Administrator 303 Datapath Explorer 378 date 426 DBA 303 dd command 166 debug 424 decibel definition 152 dedicate bandwidth 38, 150 dedicated ISLs 19 default 196 default values DS5000 276 defined 118119, 300 degraded 421 degraded performance 162 delay 122 delete a VDisk 107 deleted 121 design 11, 44, 201, 305 destage 250
destage size 276 Detect MDisks 56 device 12, 195, 303, 424 device adapter loading 58 device adapters 255 device data partitions 57, 276 device driver 173, 207 diagnose 32 diagnostic 208, 438 direct-attached host 247 director 23 directors 23 directory I/O 101 disaster 221 disaster recovery 48, 179 disaster recovery solutions 167 Disconnected state 179 discovery 56, 95, 202 disk 12, 44, 72, 105, 119, 202, 298, 448 latency 298 disk access profile 112 disk groups 48 Disk Magic 234 DiskMagic 74 disruptive 105 distance 37, 149150 limitations 37, 149 distance extension 38, 150 distances 37, 150 DMP 200 documentation 11, 396 domain 60 Domain ID 37 domain ID 37 Domain IDs 37 domains 72 download 224, 442 downtime 122 drive loops 55 driver 207, 420 drops 74 DS4000 54, 75, 80, 224, 245, 437 controller ports 278 DS4000 Storage Server 299 DS4100 247 DS4800 35, 247, 276 DS5000 array and cache parameters 275 availability 275 Storage Manager 277 throughput parameters 275 DS5000 considerations 275 DS6000 54, 211, 245246, 437 DS8000 54, 74, 211, 245, 437 adapters 258 alias considerations 35 controller ports 258 extent size 264 DS8000 architecture 72
DS8000 bandwidth 256 dual fabrics 28 dual-redundant switch controllers 22 DWDM 37, 149 DWDM components 152
E
Easy Tier 5 manual operation 255 edge 12, 148 edge switch 13 edge switches 1415, 23 efficiency 111 egress 22 e-mail 38, 150, 223 EMC Symmetrix 66 enable 25, 44, 109, 122, 211, 299 enforce 18 error 192, 420 Error Code 449 error code 1625 60 error handling 450 error log 424, 449 error logging 67, 448 errors 192, 420 Ethernet 12 Ethernet ports 5 evenly balancing I/Os 306 event 13, 54, 111, 211 exchange 122 execution throttle 216 expand workload 50 expansion 13 explicit sequential detect 250 extended-unique identifier 40 extenders 150 extension 37, 149 extent 58, 105, 256, 300, 305 size 105, 305 extent balancing script 82 extent pool 58 extent pool affinity 253 extent pool striping 59, 253 extent pools 58 storage pool striping 255 extent size 105, 250, 255, 304 DS8000 264 extent size 8GB 6 extent sizes 105, 304 extents 105, 305
F
fabric 3, 11, 126, 192, 420 hop count limit 152 isolation 199 login 201 fabric outage 13, 148 Fabric Watch 23 fabrics 15, 193
failover 112, 192, 421 logical drive 56 failover scenario 144 failure boundaries 73, 303 failure boundary 303 FAStT 28, 216 storage 28 FAStT200 247 fastwrite cache 101, 243 fault tolerant LUNs 75 FC 12, 148, 201 FC adapters 276 FC port speed 352 fcs 36, 210 fcs device 210 features 45, 212, 420 Fibre Channel 12, 150, 191, 420 ports 38, 54 routers 150 traffic 13, 148 Fibre Channel (FC) 193 Fibre Channel IP conversion 150 Fibre Channel ports 54, 194, 443 Fibre Channel routing 24 file system 119, 217 file system level 221 filesets 213 firmware 184 flag 108, 167 FlashCopy 47, 67, 240, 249, 421, 448 applications 450 I/O operations 240 incremental 242 mapping 68 prepare 121 rules 127 source 449 storage pool 241 target 115 thin provisioning 244 FlashCopy mapping 120 FlashCopy mappings 107 FlashCopy relations thin provisioned volumes 243 flashcopy relationship target as Remote Copy source 136 FlashCopy target Remote Copy source 7 flexibility 112, 207 flow 13, 148 flush the cache 204 -fmtdisk security delete feature 177 force flag 85, 108 foreground I/O latency 158 format 214, 431 frames 12 free extents 111 full bandwidth 23 full stride writes 74, 250
full synchronization 164 fully allocated copy 111 fully allocated VDisk 111 fully connected mesh 170 functions 67, 211, 422, 448
G
Gb 276 General Public License (GNU) 224 Global Mirror 132133 1920 errors 182 bandwidth 156 change to Metro Mirror 167 Featuresby release 138 parameters 155 planning 161 planning rules 160 restart script 187 switching direction 166 upgrade scenarios 174 writes 143 Global Mirror parameters 139 Global Mirror Partnership 140 Global Mirror relationship 147, 166 GM Bandwidth 156 GM Partnership Bandwidth 134 gm_inter_cluster_delay_simulation 155 gm_intra_cluster_delay_simulation 155 gm_link_tolerance 155 gm_max_host_delay 155 gm_max_hostdelay 139 gm_max_hostdelay parameter 140 gmlinktolerance Bad Periods 159 gmlinktolerance feature 161 gmlinktolerance parameter 139140, 158 GNU 224 governing throttle 112 grain size 249 granularity 105, 221 graphs 206 group 18, 72, 194, 300 groups 20, 48, 58, 196, 231, 257, 302, 442 GUI 27, 51, 200 GUI option DetectMDisks 56
H
HACMP 211 hardware 54, 215, 421 HBA 38, 45, 199, 210, 216, 420 HBAs 26, 193–194, 216, 299 zoning 32 head-of-line blocking 148 health 212, 439 healthy 185 heartbeat 151 heartbeat messages 135 heartbeat signal 78 help 18, 126, 209, 301, 420 heterogeneous 44, 422 high-bandwidth 23 high-bandwidth hosts 14 hop count 152 hops 13 host 12, 54, 104, 191, 247, 297, 420 configuration 32, 106, 128, 301, 422 creating 34 definitions 106, 202, 299 HBAs 32 information 52, 200, 425 systems 50, 191, 299, 420 zone 31, 104, 193, 422 host bus adapter 153, 216 Host IO capacity 239 host level 195 host mapping 194, 421 host type 56 host zones 34
I
I/O capacity rule of thumb 243 I/O governing 112 I/O governing rate 114 I/O Group 45 I/O group 18, 47, 104, 199 I/O Groups 111 I/O groups 33, 104, 160, 203 I/O performance 210 I/O rate setting 113 I/O resources 234 I/O workload 303 IBM Subsystem Device Driver 54, 80, 106, 108, 173, 211 IBM TotalStorage Productivity Center 161, 424, 477 identical data 177 identification 196 identify 73, 211 idling state 181 IdlingDisconnected 182 IEEE 214 image 47, 105, 119, 195, 301 Image mode 109, 171 image mode 50, 201 image mode virtual disk 115 Image Mode volumes 172 image type VDisk 109 implement 13, 48, 215 implementing xv, 11, 207 import failed 97 improvements 51, 212, 227, 246 Improves 45 Inconsistent Copying state 178 Inconsistent Stopped state 178 InconsistentCopying 180 InconsistentStopped 180 InconsistentStopped state 179 incremental FC 242 in-flight write limit 156
7521IX.fm
information 11, 56, 110, 202, 297, 420 infrastructure 114 tiering 235 ingress 22 initial configuration 198 initiators 207 install 15, 127, 215 installation 11, 79 insufficient bandwidth 13, 148 Integrated Routing 25 integrity 108, 120 Inter Switch Link 12 Inter Switch Link congestion 148 Inter-cluster communication 135 intercluster Global Mirror 154 Intercluster Link 133–134 distance extensions 149 intercluster link 147 Intercluster Link bandwidth 146 intercluster Metro Mirror 141 interface 44, 120, 191 Interlink bandwidth 135 internal SSD 7 Internet Protocol 38, 150 interoperability 8, 39 Intracluster copying 163 intracluster Global Mirror 154 intracluster Metro Mirror 141 IO collision 245 IO operations FlashCopy 240 IO service times 74 IO size 256 KB 248 iogrp 107, 194 IOPS 193, 228, 298 IOstats 183 IP 37–38, 149–150 IP traffic 38, 150 iSCSI driver 40 iSCSI initiators 40 iSCSI protocol 4 iSCSI protocol limitations 41 iSCSI qualified name 40 iSCSI support 40 iSCSI target 40 ISL 12, 148 ISL capacity 22 ISL hop count 141 ISL links 15 ISL oversubscription 13 ISL trunk 148 ISL trunks 23 ISLs 13, 148 isolated 60, 199 isolated SAN networks 258 isolation 56, 199, 421
J
journal 217, 300
K
kernel 217 key 202, 305 keys 208
L
last extent 105 Latency 145 latency 22, 121, 298 LBA 449 LDAP directory 4 lease expiry event 149 level 27, 195, 420 storage 221, 421 levels 75, 207, 450 lg_term_dma 210 library 218 licensing 8 light 298, 439 limitation 205 limitations 11, 172, 300, 438 limits 45, 205, 307 lines of business 303 link 12, 48, 148, 162, 214 bandwidth 38, 150–151 latency 151 Link Bandwidth 134 Link Latency 134 Link latency 151 Link speed 145 links 150 Linux 216 list 26, 45, 218, 396, 420 livedump 431 load balance 112, 199 load balances traffic 17 Load balancing 212 load balancing 105, 216 loading 246, 256 LOBs 303 Local Cluster 133 Local Hosts 133 location 298 locking 207 log 424, 449 logged 60 Logical Block Address 449 logical drive 94, 209, 300, 305 failover 56 logical drive mapping 58 logical unit number 172 logical units 48 logical volumes 305 login 194 logins 194 logs 122, 300, 426 long distance 151 loops 277 lower-performance 104
LPAR 214 lquerypr utility 80 lsarray command 256 lscontroller command 91 lshostconnect command 61 lsmdisklba command 67 lsmigrate command 83 lsportip command 41 lsquorum command 77 lsrank command 256 lsvdisklba command 67 LU 194 LUN 50, 74, 115, 172, 192, 245, 300 access 207 LUN mapping 85, 195 LUN masking 37, 60, 422 LUN maximum 76 LUN Number 85 LUN per 303 LUN size XIV 265 LUNs 54, 194, 301, 303 LVM 212, 302 LVM volume groups 305
M
maintenance 52, 201, 420 maintenance procedures 444 manage 44, 192, 303 Managed Disk 312 managed disk 307, 444 Managed Disk Group 312 managed disk group 109, 307 Managed Disk Group performance 335 Managed Mode 57, 109, 276 management 16, 192, 301, 422 capability 194 port 194 software 195 managing 44, 50, 192, 305, 420 map 127, 196 map a VDisk 199 mapping 85, 107, 120, 192, 303, 421 rank to extent pools 253 mappings 107, 208, 421 maps 307, 444 mask 19, 194, 444 masking 50, 60, 127, 194, 422 master 52, 120 Master Cluster 133 Master volume 133 max_xfer_size 210 maxhostdelay parameter 159 maximum IOs 306 maximum transmission unit 40 MB 38, 57, 105, 150, 210, 276 Mb 38, 150 McDATA 47, 435 MDGs 71, 303 MDisk 71, 105, 199, 300
removing 209 selecting 73 MDisk group 300 MDisk transfer size 248 MDisks checking access 80 MDisks performance 362 media 185, 444 media error 449 medium errors 448 member 33 members 55, 421 memory 119, 192, 300, 431 messages 199 metadata 101 metadata corruption 97 MetaSANs 24 Metro 47 Metro Mirror 132133, 162 planning rules 160 Metro Mirror relationship 121 change to Global Mirror 167 microcode 450 migrate 39, 126, 195 migrate data 109, 214 migrate VDisks 106 migration 13, 50, 67, 108, 126, 201, 448 migration scenarios 18 Mirror Copy activity 46 mirrored 151, 221 mirrored copy 141 mirrored VDisk 103 mirroring 37, 149, 162, 212 mirroring relationship 37 misalignment 305 mkpartnership command 139, 141 mkrcrelationship 167 mkrcrelationship command 141 Mode 57, 196, 276, 425 mode 47, 171, 193, 301, 445 settings 128 monitor 39, 422 monitored 186, 222, 420 monitoring 161, 191, 443 mount 107, 122 MPIO 212 multi-cluster installations 15 Multi-Cluster-Mirroring 167 multipath drivers 80 multipath software 207 multipathing 54, 192, 419 Multipathing software 201 multipathing software 199 Multiple Cluster Mirroring topologies 168 multiple cluster mirroring 136 multiple paths 112, 199, 422 multiple vendors 39 Multitiered storage pool 79
N
name server 201 names 34, 104, 215 nameserver 201 naming 27, 79, 104 naming convention 27 native copy services 172 nest aliases 33 new disks 202 new MDisk 94 no synchronization 103 NOCOPY 121 node 13, 105, 192, 234, 420 adding 49 failure 111, 201 port 28, 111, 186, 193, 422 Node Cache performance report 327 Node level reports 325 node port 28 nodes 13, 60, 105, 148, 194, 227, 247, 421 maximum 45 non 25, 44, 61, 106, 199, 303, 425 non-disruptive 108 non-preferred path 111 num_cmd_elem 209–210
O
offline 108, 200, 421, 450 OLTP 300 Online 300 online 421 online transaction processing (OLTP) 300 operating system (OS) 298 operating systems 199, 305, 424 optical distance extension 37 optical multiplexing 38 Optical multiplexors 150 optical multiplexors 38 optical transceivers 38 Oracle 212, 303 OS 192, 307 overlap 28 overloading 119, 189 over-subscribed 23 oversubscribed 14 over-subscription 22 oversubscription 13 overview 71, 301, 419
P
parameters 112, 185, 194, 299 partition 213 partitions 213, 305 path 13, 54, 192, 234, 307, 420 selection 211 path count connection 62 paths 17, 52, 54, 111, 192, 422 peak 13, 148 per cluster 105
performance xv, 13, 44, 72, 148, 191, 227, 297, 420 backend storage 233 degradation 56, 74, 162, 245 performance advantage 74 striping 73 performance characteristics 105, 224, 307 LUNs 75 tiering 235 performance degradation number of extent pools 256 performance improvement 109 performance monitoring 189, 194 performance reports Managed Disk Group 335 SVC port performance 346 performance requirements 51 performance statistics 7 Perl packages 81 permit 13 persistent 207 persistent reserve 80 PFE xvi physical 45, 75, 118–119, 191 physical link error 38 physical volume 214, 307 Plain Old Documentation 84 planning 32, 299 PLOGI 201 Point In Time consistency 143 point-in-time copy 172 policies 212 policy 207 pool 44 port 186, 192, 247, 422 types 66 port bandwidth 22 Port Channels 25 port layout 23 port naming convention XIV 64 port zoning 26 port-density 22 ports 12, 47, 54, 192, 421 XIV 266 power 204, 443 preferred 50, 54, 105, 156, 193, 304, 421 preferred node 33, 105, 156, 199 preferred owner node 111 preferred path 54, 111, 199 preferred paths 112, 199, 425 prefetch logic 250 prepare a FlashCopy 173 prepare FlashCopy mapping 138 prepared state 186 Pre-zoning tips 27 primary 72, 115, 302 primary considerations LUN attributes 75 primary environment 48 problems 12, 52, 66, 127, 209, 298, 419
profile 57, 94, 112, 276, 442 properties 218 protecting 55 provisioning 79 LUNs 75 pSeries 36, 223 PVID 214 PVIDs 215
Q
queue depth 205, 210, 216, 247, 307 quick synchronization 164 quickly 12, 121, 199 quiesce 107, 122, 204 quorum disk considerations 78 placement 21 quorum disks 76
R
RAID 58, 75, 109, 163, 277, 300 RAID 5 storage pool 237 RAID array 185, 301, 303 RAID arrays 302 RAID types 301 RAID5 algorithms 338 random IO performance 236 random writes 236 rank to extent pool mapping additional ranks 255 ranks to extent pool mapping considerations 253 RAS capabilities 4 RC management 162 RDAC 54, 80 Read cache 298 Read Data rate 326 read miss performance 111 Read Stability 143 real capacity 447 rebalancing script XIV 266 reboot 106, 204 rebooted 213 receive 96 reconstruction 144 recovery 94, 110, 122, 192, 444 recovery point 161 Redbooks Web site 479 Contact us xvii redundancy 2223, 54, 151, 193, 422 redundant 151, 193, 307, 420 redundant paths 193 redundant SAN 60, 260 redundant SAN network 60 registry 202, 426 relationship 54, 213 relationship_bandwidth_limit 139 relationship_bandwidth_limit parameter 139
reliability 32 reliability characteristics 80 Remote Cluster 133 remote cluster 151 upgrade considerations 52 Remote Copy 133 remote copy 114 Remote copy functions 4 Remote Copy parameters 139 Remote Copy Relationship 132 remote copy relationships increased number 136 Remote Copy service 132 Remote Copy services 132 remote mirroring 37, 149 removed 106, 202 rename 127, 427 repairsevdisk 97 replicate 172 reporting 424 reports 202, 311 reset 201, 420 resources 44, 95, 114, 192, 306, 431 restart 127 restarting 161 restarts 201 restore 166, 214 restricting access 207 re-synchronization support 154 Reverse FlashCopy 4 reverse FlashCopy 44 risk assessment 68 rmmdisk command 85 role 300 roles 302 root 208, 413 round 151 round-robin 96 router technologies 150 routers 151 routes 25 routing 54 RPQ 13, 217218 RSCN 201 rules 119, 127, 192, 422 Rules of Thumb SVC response 345
S
SameWWN.script 55 SAN xv, 11, 43, 45, 126, 191, 306, 419–420 availability 199 fabric 11, 126, 199 SAN bridge 16 SAN configuration 11, 147 SAN fabric 11, 126, 194, 422 SAN switch models 22 SAN switches 22 SAN Volume Controller 11, 13, 26, 32, 44, 148, 477–478 multipathing 216
SAN zoning 111, 383 save capacity 102 scalability 12, 43 scalable 11 scaling 50 scan 202 scripts 114, 205 SCSI 111, 201, 442, 449 commands 207, 442 SCSI disk 214 SCSI-3 207 SDD 31, 54, 80, 106, 108, 173, 193, 211, 424 SDD for Linux 217, 478 SDDDSM 195, 424 SE VDisks 100 secondary 115, 172, 300 secondary site 48 secondary SVC 48 security 26, 212 security delete feature 177 segment size 57, 276 separate zone example 35 sequential 105, 193, 298 serial number 195 serial numbers 196 Server 212–213, 306, 425 server 13, 122, 201, 275 Servers 215 servers 13, 45, 212, 297 service 52, 307, 420 service assistant 5 setquorum command 77 settings 185, 209, 298, 422 setup 209, 304, 422 SEV 125 SFPs 38 share 118, 193 shared 150, 163, 208 sharing 25, 207, 299 shortcuts 27 showvolgrp command 60 shutdown 106, 127, 201 single initiator zones 32 single storage device 199 single-member aliases 33 Single-tiered storage pool 79 site 50, 115, 449 slice 305 slot number 36 slots 55, 277 slotted design 22 snapshot 126 Software 11, 13, 26, 32, 148, 203, 420, 477–478 software 192, 421 Solaris 217, 425 Solid State Drives 4, 6, 44 solution 11, 189, 298, 396 source 26, 449 Source volume 133
space 105, 300 Space Efficient 103 space efficient copy 111 Space Efficient VDisk 125 Space Efficient VDisk Performance 101 Space-efficient function 243 space-efficient VDisk 447 spare 13, 258 speed 23, 163 split 15, 43, 257 split cluster quorum disk 78 split clustered system 20 configure 20 split clustered system configuration 21 split SVC I/O group 4 splitting 103 SPS 59, 253 SSD drives quorum disk 77 SSD managed disks quorum disks 21 SSD mirror 7 SSPC 81 standards 39 star topology 169 start 118, 194, 305, 420 state 109, 192, 431 ConsistentStopped 181 ConsistentSynchronized 181 idling 181 IdlingDisconnected 182 InconsistentCopying 180 InconsistentStopped 180 overview 177 state definitions 180 statistics 222 status 208, 421, 448 storage 11, 44, 105, 191, 297, 420 Storage Bandwidth 134 storage controller 28, 45, 72, 114, 172 storage controllers 28, 45, 56, 79, 118, 245 LUN attributes 74 Storage Manager 55, 437 storage pool array considerations 245 IO capacity 237 Storage Pool Striping 59, 253 storage pool striping extent pools 255 storage subsystem aliases 35 storage traffic 13 Storwize V7000 30, 65, 246, 268 Storwize V7000 system configure 66 Storwize performance 317 streaming 112, 299 stride writes 236, 276
strip 305 Strip Size Considerations 305 strip sizes 305 stripe 72, 303 across disk arrays 73 striped 95, 105, 201, 300 striped mode 121, 301 striped mode VDisks 304 striping 56, 302, 305 DS5000 275 performance advantage 245 striping policy 101 striping workload 73, 246 subsystem cache influence 237 Subsystem Device Driver 31, 54, 80, 106, 108, 173, 196, 211, 425 support 54, 300 SVC xv, 3, 11, 43, 105, 132, 191, 227, 300, 419 Backend Read Response time 337 Managed Disk Group information 312 Managed Disk information 312 performance benchmarks 323 V7000 considerations 268 Virtual Disks 313 XIV considerations 63 XIV port connections 267 SVC cache utilization 334 SVC caching 57, 276 SVC cluster 13, 43, 72, 199, 430 SVC clustered system growing 47 splitting 49 SVC configuration 194 SVC Console code 6 SVC Entry Edition 5 SVC error log 97 SVC extent size 250 SVC for XIV 5 SVC health 380 SVC installations 14, 234 SVC master console 123 SVC node 31, 193, 421 SVC nodes 17, 50, 60, 194, 443 redundant 45 SVC performance 316 Top Volume I/O Rate 344 Top Volumes Data Rate 342 SVC ports 382 SVC rebalancing script 266 SVC reports Cache performance 341 cache utilization 327 CPU Utilization 320 CPU utilization by node 320 CPU utilization percentage 331 Dirty Write percentage of Cache Hits 331 Managed Disk Group 335 MDisk performance 362 Node Cache performance 327 Node CPU Utilization rate 321
node statistics 320 over utilized ports 351 overall IO rate 322 Read Cache Hit percentage 328 Read Cache Hits percentage 332 Read Data rate 326 Read Hit Percentages 331 Readahead percentage of Cache Hits 332 report metrics 320 response time 325 Top Volume Cache performance 341 Top Volume Data Rate performances 341 Top Volume Disk 344 Top Volume Disk performances 341 Top Volume I/O Rate performances 341 Top Volume Response performances 341 Total Cache Hit percentage 328 Total Data Rate 326 Write Cache Flush-through percentage 332 Write Cache Hits percentage 332 Write Cache Overflow percentage 332 Write Cache Write-through percentage 332 Write Data Rate 326 Write-cache Delay Percentage 332 SVC restrictions 48 SVC software 195, 420 SVC traffic 316 SVC V5.1 enhancements 4 SVC zoning 26, 33 svcinfo 81, 107, 195, 421 svcinfo lscluster 155 svcinfo lsmigrate 81 svcmon tool 46 svctask 56, 81, 126, 218, 430 svctask chcluster command 155 SVCTools package 81 switch 191 fabric 13 failure 13, 148, 223 interoperability 39 switch fabric 12, 194 switch ports 18, 380 switch splitting 18 switches 420 -sync option 165 Synchronized relationship 133 synchronization 151 Synchronized 167 synchronized 103, 177 synchronized state 162 Synchronous remote copy 133 system 119, 191, 299, 424 system performance 105, 217, 431
T
tablespace 300 tape 13, 193 tape media 165 Index
target 67, 193, 443 target ports 60, 194 Target Volume 133 targets 203 test 12, 191 tested 193 thin provisioning 242, 249 FlashCopy considerations 244 This 11, 43, 54, 71, 191, 297, 311, 420 thread 205, 305 threads 305 three-way copy service functions 171 threshold 13, 148, 162 throttle 112, 216 throttle setting 113 throttles 112 throttling 113 throughput 122, 200, 210, 234, 298, 300 RAID arrays 74 throughput based 298–299 throughput requirements 56 tier 79, 234 tiering 235 time 12, 54, 94, 104, 192, 234, 298, 419 tips 27 Tivoli Storage Manager 173 Tivoli Storage Manager (TSM) 299 tools 191, 422 Top 10 reports Fabric and Switches 352 SVC 318 Top 10 SVC reports I/O Group Performance reports 319 Managed Disk Group Performance reports 335 Node Cache Performance reports 327 Port Performance reports 346 Top Volume Performance reports 341 Topology 424 topology 12, 424 topology issues 17 topology problems 17 Topology Viewer Data Path Explorer 378 Data Path View 383 navigation 377 SVC and Fabric 380 SVC health 380 zone configuration 383 Total Cache Hit percentage 328 traditional 26 traffic 13, 17, 199 congestion 13, 148 Fibre Channel 38 Traffic Isolation 18 traffic threshold 19 transaction 56, 122, 209, 298 transaction based 298–299 transaction based workloads 275 Transaction log 300 transceivers 150
transfer 112, 193, 298 transit 13 triangle topology 169 troubleshooting 26, 191 TSM 305 tuning 191
U
UID 445 UID field 85 unique identifier 195 UNIX 122, 223 unmanaged MDisk 110, 172 unsupported topology 171 unused space 105 upgrade 184, 201, 442 upgrade scenarios 174 upgrades 201, 406, 442 upgrading 52, 207 upstream 12 user data 101 user interface 5 users 23, 44, 201 using SDD 211 utility 224
V
V7000 SVC considerations 268 V7000 solution 92 VDisk 31, 194, 300, 421 creating 124 migrating 109 VDisk migration 449 VDisk Mirroring 103 VDisk size maximum 256TB 4 VIO clients 306 VIO server 214, 306 VIOC 214, 306 VIOS 212213, 306 virtual address space 100 virtual disk 111, 214 virtual disks 313 Virtual Fabrics 24 Virtual SAN 25 virtualization 43, 301, 419 virtualization layer 67 virtualization policy 103 virtualizing 201 VMware vStorage API 7 volume abstraction 301 volume group 60, 211 Volume Mirroring 75 volume mirroring feature 44 Volume to Backend Volume Assignment 314 VSAN 25 VSAN trunking 25
W
Windows 2003 215 workload 54, 95, 209, 298 throughput based 298 transaction based 298 workload type 299 workloads 13, 56, 74, 114, 148, 193, 234, 298 World Wide Node Name setting 54 worldwide node name 25 write cache destage 236 Write Ordering 143 write ordering 144 write penalty 236 write performance 104 writes 193, 245, 250, 300 WWNN 25–26, 66, 203 WWNN zoning 26 WWNs 27 WWPN 26, 50, 247 WWPN debug 62 WWPN zoning 26 WWPNs 27, 61, 193, 421
X
XFPs 38 XIV LUN size 265 port naming conventions 64 storage pool layout 268 XIV SVC considerations 63 XIV ports 29, 266 XIV Storage System 75, 246 XIV zoning 29
Z
zone 127, 194, 422 zone configuration 383 zone name 36 zone share 37 zone SVC 17 zoned 12, 193, 443 zones 26, 126, 422 zoneset 35, 444 Zoning 37, 400 zoning 16, 26, 61, 111, 194 HBAs 32 single host 32 Storwize V7000 30 XIV 29 zoning configuration 26 zoning recommendation 149 zoning requirement 148 zoning requirements 136
Back cover