Power Systems for AIX III: Advanced Administration and Problem Determination
(Course code AN15)
Student Notebook
ERC 1.1
Trademarks
The reader should recognize that the following terms, which appear in the content of this training document, are official trademarks of IBM or other companies:
IBM is a registered trademark of International Business Machines Corporation.
The following are trademarks of International Business Machines Corporation in the United States, or other countries, or both:
AIX, AIX 5L, DB2, HACMP, MWAVE, POWER, POWER4, POWER5, POWER5+, POWER6, POWER Gt1, POWER Gt3, Power Systems, PowerVM, pSeries, Redbooks, RS/6000, SP, System i, System p, System p5, Tivoli, WebSphere, Workload Partitions Manager
Adobe is either a registered trademark or a trademark of Adobe Systems Incorporated in the United States, and/or other countries. Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Windows is a trademark of Microsoft Corporation in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries. Other company, product, or service names may be trademarks or service marks of others.
Copyright International Business Machines Corporation 2009. All rights reserved. This document may not be reproduced in whole or in part without the prior written permission of IBM. Note to U.S. Government Users: Documentation related to restricted rights. Use, duplication or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp.
V5.3
Contents
Trademarks . . . xi
Course description . . . xiii
Agenda . . . xvii
Unit 1. Advanced AIX administration overview . . . 1-1
  Unit objectives . . . 1-2
  Application outages . . . 1-3
  Live Partition Mobility versus Live Application Mobility . . . 1-5
  Maintenance window tasks . . . 1-7
  Effective problem management . . . 1-10
  Before problems occur . . . 1-12
  Before problems occur: A few good commands . . . 1-14
  Steps in problem resolution . . . 1-15
  Progress and reference codes . . . 1-18
  Working with AIX Support . . . 1-21
  AIX Support test case data (1 of 2) . . . 1-23
  AIX Support test case data (2 of 2) . . . 1-25
  AIX software update hierarchy . . . 1-26
  Relevant documentation . . . 1-28
  Checkpoint . . . 1-29
  Exercise 1: Advanced AIX administration overview . . . 1-30
  Unit summary . . . 1-31
Unit 2. The Object Data Manager . . . 2-1
  Unit objectives . . . 2-2
  2.1. Introduction to the ODM . . . 2-3
  What is the ODM? . . . 2-4
  Data managed by the ODM . . . 2-5
  ODM components . . . 2-7
  ODM database files . . . 2-8
  Device configuration summary . . . 2-10
  Configuration manager . . . 2-11
  Location and contents of ODM repositories . . . 2-12
  How ODM classes act together . . . 2-14
  Data not managed by the ODM . . . 2-15
  Let's review: Device configuration and the ODM . . . 2-16
  ODM commands . . . 2-17
  Changing attribute values . . . 2-19
  Using odmchange to change attribute values . . . 2-21
  2.2. ODM database files . . . 2-23
  Software vital product data . . . 2-24
  Software states you should know about . . . 2-26
  Predefined devices (PdDv) . . . 2-28
  Predefined attributes (PdAt) . . . 2-32
  Customized devices (CuDv) . . . 2-34
  Customized attributes (CuAt) . . . 2-37
  Additional device object classes . . . 2-38
  Checkpoint . . . 2-40
  Exercise 3: The Object Data Manager (ODM) . . . 2-41
  Unit summary . . . 2-42
Unit 3. Error monitoring . . . 3-1
  Unit objectives . . . 3-2
  3.1. Working with the error log . . . 3-3
  Error logging components . . . 3-4
  Generating an error report using SMIT . . . 3-6
  The errpt command . . . 3-9
  A summary report (errpt) . . . 3-11
  A detailed error report (errpt -a) . . . 3-12
  Types of disk errors . . . 3-14
  LVM error log entries . . . 3-16
  Maintaining the error log . . . 3-17
  Exercise 2: Error monitoring (part 1) . . . 3-19
  3.2. Error notification and syslogd . . . 3-21
  Error notification methods . . . 3-22
  Self-made error notification . . . 3-24
  ODM-based error notification: errnotify . . . 3-26
  syslogd daemon . . . 3-29
  syslogd configuration examples . . . 3-31
  Redirecting syslog messages to error log . . . 3-34
  Directing error log messages to syslogd . . . 3-35
  System hang detection . . . 3-36
  Configuring shdaemon . . . 3-38
  Exercise 2: Error monitoring (part 2) . . . 3-40
  3.3. Resource monitoring and control . . . 3-41
  Resource monitoring and control (RMC) . . . 3-42
  RMC conditions property screen: General tab . . . 3-44
  RMC conditions property screen: Monitored Resources tab . . . 3-45
  RMC actions property screen: General tab . . . 3-46
  RMC actions property screen: When in Effect tab . . . 3-47
  RMC management . . . 3-48
  Exercise 2: Error monitoring (part 3) . . . 3-50
  Checkpoint . . . 3-51
  Unit summary . . . 3-52
Unit 4. Network Installation Manager basics . . . 4-1
  Unit objectives . . . 4-2
  NIM overview . . . 4-3
  Machine roles . . . 4-5
  Boot process for AIX installation (tape or CD) . . . 4-7
  Boot process for AIX installation (network) . . . 4-9
  NIM objects . . . 4-11
  Listing NIM objects and their attributes . . . 4-13
  NIM configuration . . . 4-14
  resources objects . . . 4-16
  resources objects: lpp_source . . . 4-18
  resources objects: spot . . . 4-21
  resources objects: mksysb . . . 4-24
  networks objects . . . 4-26
  machines objects . . . 4-28
  Defining a machine object . . . 4-30
  Define a client using SMIT . . . 4-32
  NIM operations . . . 4-34
  bos_inst operation . . . 4-38
  More information about NIM . . . 4-40
  Additional topics in NIM course . . . 4-45
  Exercise 4 overview . . . 4-46
  Checkpoint . . . 4-47
  Unit summary . . . 4-48
Unit 5. System initialization: Part I . . . 5-1
  Unit objectives . . . 5-3
  5.1. System startup process . . . 5-5
  How does a System p server or LPAR boot? . . . 5-6
  Loading of a boot image . . . 5-8
  Contents of the boot logical volume (hd5) . . . 5-10
  5.2. Unable to find boot image . . . 5-13
  Working with bootlists . . . 5-14
  Starting System Management Services . . . 5-16
  Working with bootlists in SMS (1 of 2) . . . 5-18
  Working with bootlists in SMS (2 of 2) . . . 5-20
  5.3. Corrupted boot logical volume . . . 5-21
  Boot device alternatives (1 of 2) . . . 5-22
  Boot device alternatives (2 of 2) . . . 5-24
  Accessing a system that will not boot . . . 5-25
  Booting in maintenance mode . . . 5-28
  Working in maintenance mode . . . 5-29
  How to fix a corrupted BLV . . . 5-31
  Checkpoint (1 of 2) . . . 5-33
  Checkpoint (2 of 2) . . . 5-34
  Exercise 3: System initialization: Part 1 . . . 5-35
  Unit summary . . . 5-36
Unit 6. System initialization: Part II
  Unit objectives
  6.1. AIX initialization part 1
  System software initialization overview
  rc.boot 1
  rc.boot 2 (part 1) . . . 6-8
  rc.boot 2 (part 2) . . . 6-10
  rc.boot 3 (part 1) . . . 6-12
  rc.boot 3 (part 2) . . . 6-14
  rc.boot summary . . . 6-16
  Fixing corrupted file systems and logs . . . 6-17
  Let's review: rc.boot (1 of 3) . . . 6-19
  Let's review: rc.boot (2 of 3) . . . 6-20
  Let's review: rc.boot (3 of 3) . . . 6-21
  6.2. AIX initialization part 2 . . . 6-23
  Configuration manager . . . 6-24
  Config_Rules object class . . . 6-26
  cfgmgr output in the boot log using alog . . . 6-28
  /etc/inittab file . . . 6-29
  Boot problem management . . . 6-31
  Let's review: /etc/inittab file . . . 6-34
  Checkpoint . . . 6-36
  Exercise 4: System initialization part 2 . . . 6-37
  Unit summary . . . 6-38
Unit 7. Disk management theory . . . 7-1
  Unit objectives . . . 7-2
  7.1. LVM data representation . . . 7-3
  LVM terms . . . 7-4
  LVM identifiers . . . 7-6
  LVM data on disk control blocks . . . 7-8
  LVM data in the operating system . . . 7-10
  Contents of the VGDA . . . 7-11
  VGDA example . . . 7-13
  The logical volume control block (LVCB) . . . 7-16
  How LVM interacts with ODM and VGDA . . . 7-18
  ODM entries for physical volumes (1 of 3) . . . 7-20
  ODM entries for physical volumes (2 of 3) . . . 7-22
  ODM entries for physical volumes (3 of 3) . . . 7-23
  ODM entries for volume groups (1 of 2) . . . 7-24
  ODM entries for volume groups (2 of 2) . . . 7-25
  ODM entries for logical volumes (1 of 2) . . . 7-26
  ODM entries for logical volumes (2 of 2) . . . 7-27
  ODM-related LVM problems . . . 7-28
  Fixing ODM problems (1 of 2) . . . 7-30
  Fixing ODM problems (2 of 2) . . . 7-32
  Intermediate level ODM commands . . . 7-35
  Exercise 7: LVM metadata and problems (parts 1 and 2) . . . 7-37
  7.2. Failed disks: Mirroring and quorum issues . . . 7-39
  Mirroring . . . 7-40
  Stale partitions . . . 7-42
  Mirroring rootvg . . . 7-44
  VGDA count . . . 7-46
  Quorum not available
  Nonquorum volume groups
  Forced vary on (varyonvg -f)
  Physical volume states
  Checkpoint
  Exercise 7: LVM Metadata and problems (parts 4 and 5)
  Unit summary
Unit 8. Disk management procedures . . . 8-1
  Unit objectives . . . 8-3
  8.1. Disk replacement techniques . . . 8-5
  Disk replacement: Starting point . . . 8-6
  Procedure 1: Disk mirrored . . . 8-8
  Procedure 2: Disk still working . . . 8-10
  Procedure 2: Special steps for rootvg . . . 8-12
  Procedure 3: Disk in missing or removed state . . . 8-14
  Procedure 4: Total rootvg failure . . . 8-16
  Procedure 5: Total non-rootvg failure . . . 8-18
  Frequent disk replacement errors (1 of 4) . . . 8-20
  Frequent disk replacement errors (2 of 4) . . . 8-21
  Frequent disk replacement errors (3 of 4) . . . 8-22
  Frequent disk replacement errors (4 of 4) . . . 8-23
  8.2. Export and import . . . 8-25
  Exporting a volume group . . . 8-26
  Importing a volume group . . . 8-28
  importvg and existing logical volumes . . . 8-30
  importvg and existing file systems (1 of 2) . . . 8-31
  importvg and existing file systems (2 of 2) . . . 8-33
  Checkpoint . . . 8-35
  Exercise 8: Exporting and importing volume groups . . . 8-36
  Unit summary . . . 8-37
Unit 9. Install and backup techniques . . . 9-1
  Unit objectives . . . 9-2
  9.1. Alternate disk installation . . . 9-3
  Topic 1 objectives . . . 9-4
  Alternate disk installation . . . 9-5
  Alternate mksysb disk installation (1 of 2) . . . 9-8
  Alternate mksysb disk installation (2 of 2) . . . 9-10
  Alternate disk rootvg cloning (1 of 2) . . . 9-11
  Alternate disk rootvg cloning (2 of 2) . . . 9-12
  Removing an alternate disk installation . . . 9-13
  NIM alternate disk migration (nimadm) . . . 9-15
  Exercise 9, topic 1: Alternate disk install . . . 9-17
  9.2. Using multibos . . . 9-19
  Topic 2 objectives . . . 9-20
  multibos overview . . . 9-21
  Active and standby BOS logical volumes . . . 9-23
  Setting up a standby BOS . . . 9-24
  Other multibos operations . . . 9-26
  Exercise 9, topic 2: multibos . . . 9-29
  9.3. JFS2 snapshot . . . 9-31
  Topic 3 objectives . . . 9-32
  JFS2 snapshot (1 of 2) . . . 9-33
  JFS2 snapshot (2 of 2) . . . 9-35
  JFS2 snapshot mechanism (1 of 2) . . . 9-37
  JFS2 snapshot mechanism (2 of 2) . . . 9-38
  JFS2 snapshot SMIT menu . . . 9-39
  Creating snapshots (external) . . . 9-40
  Creating snapshots (internal) . . . 9-43
  Listing snapshots . . . 9-44
  Using a JFS2 snapshot to recover . . . 9-45
  Using a JFS2 snapshot to back up . . . 9-47
  JFS2 snapshot space management . . . 9-48
  Exercise 9, topic 3: JFS2 snapshot . . . 9-49
  Checkpoint (1 of 4) . . . 9-50
  Checkpoint (2 of 4) . . . 9-51
  Checkpoint (3 of 4) . . . 9-52
  Checkpoint (4 of 4) . . . 9-53
  Unit summary . . . 9-54
Unit 10. Workload partitions . . . 10-1
  Unit objectives . . . 10-2
  10.1. Workload partitions review . . . 10-3
  Topic 1 objectives . . . 10-4
  AIX workload partitions (WPAR) review . . . 10-5
  System WPAR and application WPAR . . . 10-8
  System WPAR file systems space . . . 10-10
  10.2. WPAR Manager . . . 10-13
  Topic 2 objectives . . . 10-14
  Workload Partition Manager overview . . . 10-15
  Workload Partition Manager main GUI . . . 10-17
  WPAR Manager topology: Default configuration . . . 10-19
  Installation and configuration: WPAR Manager . . . 10-21
  Installation and configuration: WPAR agent . . . 10-24
  Authentication and WPAR Manager . . . 10-26
  WPAR Manager functional view . . . 10-28
  Basic management . . . 10-30
  Creating a WPAR . . . 10-31
  WPAR monitoring and reporting . . . 10-32
  Resources view . . . 10-33
  Manual relocation or mobility . . . 10-34
  Tasks activity and logging . . . 10-35
  WPAR 1.2 log locations . . . 10-37
  10.3. Application mobility . . . 10-39
  Topic 3 objectives . . . 10-40
Application mobility
WPAR Manager relocation support
Compatibility issues
Live partition mobility versus live application mobility
WPAR enhanced live mobility
Steps for WPAR enhanced live mobility (WPAR Mgr GUI)
Enhanced relocation workflow (1 of 2)
Enhanced relocation workflow (2 of 2)
Enhanced relocation error (1 of 2)
Enhanced relocation error (2 of 2)
Steps for WPAR enhanced live mobility (command line)
Enhanced live relocation: CLI (1 of 4)
Enhanced live relocation: CLI (2 of 4)
Enhanced live relocation: CLI (3 of 4)
Enhanced live relocation: CLI (4 of 4)
Steps for WPAR static relocation (WPAR Mgr GUI)
Steps for checkpoint and restart relocation: CLI
Checkpoint and restart relocation: CLI (1 of 3)
Checkpoint and restart relocation: CLI (2 of 3)
Checkpoint and restart relocation: CLI (3 of 3)
Checkpoint (1 of 2)
Checkpoint (2 of 2)
Unit summary
Unit 11. The AIX system dump facility
Unit objectives
System dumps
Types of dumps
How a system dump is invoked
LED 888 code
When a dump occurs
The sysdumpdev command
Dedicated dump device (1 of 2)
Dedicated dump device (2 of 2)
Estimating dump size
dumpcheck utility
Methods of starting a dump
Start a dump from a TTY
Generating dumps with SMIT
Dump-related LED codes
Copying system dump
Automatically reboot after a crash
Sending a dump to IBM
Use kdb to analyze a dump
Checkpoint
Exercise 11: System dump
Unit summary
Appendix A. Checkpoint solutions
Appendix B. Command summary
Appendix C. AIX dump code and progress codes
Appendix D. Auditing security related events
Appendix E. Diagnostics
Course description
Power Systems for AIX III: Advanced Administration and Problem Determination
Duration: 5 days
Purpose
This course provides advanced AIX system administrator skills with a focus on availability and problem determination. It provides detailed knowledge of the ODM database, where AIX maintains much of its configuration information. It shows how to monitor for and deal with AIX problems. There is a special focus on dealing with Logical Volume Manager problems, including procedures for replacing disks. Several techniques for minimizing the system maintenance window are covered. It also covers how to migrate AIX Workload Partitions to another system with minimal disruption. While the course includes some AIX 6.1 enhancements, most of the material is applicable to prior releases of AIX.
Audience
This is an advanced course for AIX system administrators, system support, and contract support individuals with at least six months of experience in AIX.
Prerequisites
You should have basic AIX System Administration skills. These skills include:
- Use of the Hardware Management Console (HMC) to activate a logical partition running AIX and to access the AIX system console
- Install an AIX operating system from an already configured NIM server
- Implementation of AIX backup and recovery
- Manage additional software and base operating system updates
- Familiarity with management tools such as SMIT
- Understand how to manage file systems, logical volumes, and volume groups
- Understand basic Workload Partition (WPAR) concepts and commands (recommended for the WPAR Manager content)
- Mastery of the UNIX user interface including use of the vi editor, command execution, input and output redirection, and the use of utilities such as grep
These skills could be developed through experience or by formal training. Recommended training courses to obtain these prerequisite skills are either of the following:
- Power Systems for AIX III: Advanced Administration and Problem Determination (AN12) and its prerequisites
- AIX System Administration I: Implementation (AU14) and its prerequisites. (Note that AU14 does not cover WPARs)
If the student has AIX system administration skills, but is not familiar with the LPAR environment, those skills may be obtained by attending either of the following:
- AU73/Q1373 System p Virtualization I: Planning and Configuration
- AN11 Power Systems Administration I: LPAR Configuration
Objectives
On completion of this course, students should be able to:
- Perform system problem determination and reporting procedures including analyzing error logs, creating dumps of the system, and providing needed data to the AIX Support personnel
- Examine and manipulate Object Data Manager databases
- Identify and resolve conflicts between the Logical Volume Manager (LVM) disk structures and the Object Data Manager (ODM)
- Complete a very basic configuration of Network Installation Manager to provide network boot support for either system installation or booting to maintenance mode
- Identify various types of boot and disk failures and perform the matching recovery procedures
- Implement advanced methods such as alternate disk install, multibos, and JFS2 snapshots to use a smaller maintenance window
- Install and configure Workload Partition Manager to support WPAR management and to implement Live Application Mobility (LAM)
Contents
- Overview of advanced administration techniques
- Error monitoring
- The Object Data Manager (ODM)
- Basic Network Installation Manager (NIM) configuration
- System initialization problem determination
- Disk management theory and procedures
- Advanced techniques for installation and backup
- Workload Partition (WPAR) Manager and Live Application Mobility
- The AIX system dump facility
Agenda
Day 1
Welcome
Unit 1 - Advanced AIX administration overview
Exercise 1 - Problem diagnostic information
Unit 2 - The Object Data Manager
Exercise 2 - The Object Data Manager
Unit 3 - Error monitoring
Exercise 3 - Error monitoring
Day 2
Unit 4 - Network Installation Manager basics
Exercise 4 - Basic NIM configuration
Unit 5 - System initialization: Part I
Exercise 5 - System initialization: Part I
(optional) Exercise 3 Part 3 - Using RMC to monitor resources on a system
Day 3
Unit 6 - System initialization: Part II
Exercise 6 - System initialization: Part II
Unit 7 - Disk management theory
Exercise 7 - LVM metadata and problems
Unit 8 - Disk management procedures
Exercise 8 parts 1 and 2: Disk replacement techniques
(optional) Exercise 7 part 5 - Manually fixing an LVM ODM problem
Day 4
Unit 8, Part 2 - Export and import (to fix VGDA/ODM conflict)
Exercise 8 parts 3 and 4 - Disk management procedures
Unit 9 - Install and backup techniques
Exercise 9, part 1 - Alternate disk copy (pre-clone)
Unit 9, topic 2 - multibos
Exercise 9, part 1 - Wait for clone completion (30 min clone)
Exercise 9, part 1 - Alternate disk copy (post-clone)
Exercise 9, part 2 - multibos (pre-clone)
Unit 9, topic 3 - JFS2 snapshot
Exercise 9, part 2: wait for clone completion (37 min clone)
Exercise 9, part 2: multibos (post-clone)
Exercise 9, part 3: JFS2 snapshot
Unit 10, topic 1 - Workload partitions review
Unit 10, topic 2 - WPAR Manager
Exercise 10 part 1 - Installing WPAR Manager
(optional) Exercise 7 part 3 - Using intermediate LVM commands
Day 5
Exercise 10 part 2 - Create and activate a WPAR
Unit 10, topic 3 - Application mobility
Exercise 10 part 3 - Enhanced Live Application Mobility
Exercise 10 part 4 - Working with static relocation
Unit 11 - The AIX system dump facility
Exercise 11 - System dump facility
(optional) Exercise 10 part 4 - Working with static relocation
Wrap up / Evaluations
References
SG24-5496    Problem Solving and Troubleshooting in AIX 5L (Redbook)
SG24-5766    AIX 5L Differences Guide Version 5.3 Edition (Redbook)
SG24-7559    IBM AIX Version 6.1 Differences Guide (Redbook)
Unit objectives
IBM Power Systems
After completing this unit, you should be able to:
- List the steps of a basic methodology for problem determination
- List AIX features that assist in minimizing planned downtime or shortening the maintenance window
- Explain how to find documentation and other key resources needed for problem resolution
AN151.0
Application outages
Notes:
Introduction
Providing system availability is a major responsibility of any system administrator. An outage may be caused by a functional problem (such as an application or system crash) or a server performance problem (business is seriously impacted due to poor response times or late jobs). There are many approaches to dealing with this.
Unplanned outages
When most of us think of availability, we think of unplanned outages. Regular hardware and software maintenance can often avoid these outages. Designing the computing facility to have redundant components (power, network adapters, network switches, storage, and more) can make the overall system resilient to the failure of individual components. Performance problems are often the result of failing to do proper capacity planning, resulting in not enough resources (memory, processors, network bandwidth, or disk I/O bandwidth) to handle the increased workload. If there is no change control to manage what
work is placed on a system, capacity planning is even more challenging. Furthermore, uncontrolled changes to a system result in uncontrolled exposure to possible outages created by those changes, and thus unplanned outages.
Computer viruses and other malicious attacks by computer hackers can also reduce system availability (in addition to the exposure of losing proprietary information). Good data security policies are essential.
Even when implementing good policies in these areas, some unplanned outages will still happen. In these situations, the system administrator needs to have a plan for minimizing the impact and recovering as quickly as possible. One common approach is to have an alternate system that can take over the work of the failed system. High Availability Cluster Multi-Processing (HACMP) provides a system for either concurrent processing by multiple systems, or an automated fall-over to a backup system, thus minimizing the impact of a server failure. Such server redundancy can be designed to work within a single facility or be divided between different geographical locations.
Obviously, rapid notification of a problem, effective and prompt diagnosis of the cause, and being able to quickly implement an effective solution will all contribute to a smaller mean time to recovery.
Planned outages
By using change control, the risk associated with certain categories of potential unplanned outages can be managed by implementing the changes during planned windows of time when the impact of any unexpected problem (resulting from the change) is minimized. In addition, there are certain types of changes for which an outage is unavoidable.
Some facilities will implement multiple types of maintenance windows. One type would be frequent short maintenance windows for any administrative work that will compete with applications for resources (performance impact) or have a small chance of causing a functional disruption. Another type would be a less frequent window in which any reboot of the system or any major change to the level of the operating system or major subsystems, such as database software, would be allowed.
Sometimes, the amount of time in a maintenance window is relatively small and the work has to be carefully planned. You also need to allow time to recover if anything goes wrong due to the maintenance. Any needed resources that can be pre-staged will help expedite the work. Any approach that can speed recovery after a problem occurs is also useful.
For systems which need to be up 24 hours a day, seven days a week, and every day in the year (24x7x365), even a short outage cannot be tolerated. In those situations, a method to non-disruptively move the applications to another system can be invaluable. If an HACMP cluster solution is already in place to handle unplanned outages, then this can be used to manually fall-over the services to another system while maintenance is being done. Other solutions are to use Live Partition Mobility or Live Application Mobility.
Live Partition Mobility versus Live Application Mobility
Live Partition Mobility allows the migration of a running logical partition to another physical server.
- Operating system, applications, and services are not stopped during the process
- Requires POWER6, AIX 5.3, HMC, and VIO server
[Figure: Multiple systems managed by a single HMC. Server 1 and Server 2 with partitions P1, P2, P3, and P5, a VIOS, and a network.]
Live Application Mobility allows moving a workload partition from one server to another.
- Without requiring the workload running in the WPAR to be restarted
- Provides outage avoidance and multi-system workload balancing
- Requires AIX 6.1
[Figure: Systems AIX #1, AIX #2, and AIX #3 hosting workload partitions (Billing, Data Mining, Test, EMail, App Srv, Web, Training, Dev), with a VIOS and a policy-driven Workload Partitions Manager.]
Notes:
As the number of hosted partitions and applications increases, finding a maintenance window acceptable to all becomes increasingly difficult. Live Partition Mobility and Live Application Mobility allow you to move your partitions around so that you can perform disruptive operations on the machine when it best suits you, rather than when it causes the least inconvenience to the users.
Live Application Mobility (LAM) is a new capability that allows a client to relocate a running WPAR from one system to another, without requiring the workload running in the WPAR to be restarted. LAM is intended for use within a data center and requires the use of the new Licensed Program Product, the IBM AIX Workload Partitions Manager. Live Application Mobility differs significantly from Live Partition Mobility in that Live Partition Mobility is a feature of POWER6 processors. As such, it can be used on operating systems other than AIX 6, such as Linux or earlier AIX versions. On the other hand, WPAR is specifically a feature of AIX 6, but it can run on various hardware platforms (for example: POWER6, POWER5 or POWER5+, or POWER4 systems).
Maintenance window tasks
System backups
- Minimizing rootvg size
- Snapshot techniques for user file systems
An important technique that we will cover is the use of alternate storage as the target of the software update. What we mean is that the updates are not made to the rootvg, but rather to a copy of the rootvg. This has two advantages.
First, there is no change being made to the active rootvg. For locations that make a distinction between changing the level of the operating system and simply doing work that has a performance impact, the actual time-consuming update activity can be done in a more frequently available window. Then, when a major maintenance window arrives, you only need to reboot to make it effective.
The second advantage, and to some the more important advantage, is the ease of recovery. If you find that there are serious problems with running under the new level of code, you only need to reboot back to the earlier code level, rather than recover from a mksysb or reject the entire update.
Of course, the downside is that you will need to reboot to make the update effective, but this is something a major maintenance window should expect.
There are two techniques that we will cover. One technique is creating an alternate set of logical volumes that are copies of the rootvg BOS logical volumes. This is called multibos. The other technique is creating an alternate volume group which is a clone of the rootvg. In each case, you would apply the maintenance to the copy and then later reboot to make it effective.
Expediting backups
Another common maintenance activity is backing up the system. Unless you have an application that is designed to manage a recovery process using fuzzy backups, you will need to quiesce the application activity long enough to be sure that there are no inconsistencies in the backup.
The term fuzzy backup refers to a backup in which the application was making changes during the backup. For a given transaction, multiple data changes are made. Some of these transaction-related changes are made before that data is backed up, while other changes are made after that data is backed up. Thus the backup has one piece of data which reflects the transaction and another piece of data that does not reflect the transaction. The two pieces of data are inconsistent and such a backup is referred to as fuzzy.
For the rootvg itself, the size of the rootvg should be minimized. It should only contain what is needed for the OS. All user data and other non-essential files should be backed up and restored separately. An example would be the standard location of a software repository: /usr/sys/inst.images. The software repository can be very large and yet this common path resides in the /usr file system, which is in the rootvg. Placing the software repository in a separate file system with its own recovery plan (which could use the original media as the backup) can help reduce backup and recovery time. Another common example is the /home file system. If users have vast amounts of data stored there, then over-mounting it with a separate file system can again speed up working with the rootvg. There are other file systems, such as /tmp, whose contents could be left out of the system backup. The trick is that these would need to be excluded (not mounted or identified in /etc/exclude.rootvg) from the backup during mksysb execution, and then
separately recovered from their own backup. Other user data will be in separate user volume groups.
With the emphasis on separate backups for non-BOS data, there comes a need to minimize how long the applications need to be quiesced and still have data consistency. One technique that AIX provides is JFS2 snapshots, which allow us to quiesce the application only very briefly and still have a consistent picture of the data at a single point in time. Then we can either use that snapshot of the data as its own backup, or base an actual backup upon that snapshot (in order to have off-site storage of the backup).
There are other facilities for doing snapshot captures of data. Some are part of the storage subsystems and some are part of total storage solutions such as Tivoli Storage Manager. Our focus will be on the facility that is provided with AIX: JFS2 snapshot.
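The difference between a fuzzy backup and a backup based on a point-in-time snapshot can be illustrated with a small Python model. This is only a sketch under stated assumptions: the two-entry ledger, the function names, and the in-memory copy are hypothetical stand-ins chosen for illustration, and they say nothing about how JFS2 snapshots are actually implemented.

```python
# Toy model only: a "file system" is a dict, a "transaction" updates two entries.
# All names here are hypothetical and exist only for this illustration.

def transfer(accounts, src, dst, amount):
    """One transaction that changes two pieces of data."""
    accounts[src] -= amount
    accounts[dst] += amount


def fuzzy_backup(accounts, order, mid_backup_action):
    """Copy entries one at a time while the application keeps running.

    mid_backup_action() is called after the first entry has been copied and
    stands in for a transaction that lands in the middle of the backup.
    """
    backup = {}
    for position, name in enumerate(order):
        backup[name] = accounts[name]
        if position == 0:
            mid_backup_action()
    return backup


def snapshot_then_backup(accounts):
    """Capture a point-in-time image (brief quiesce), then back up the image."""
    return dict(accounts)


# Fuzzy backup: "checking" is copied before the transfer, "savings" after it.
live = {"checking": 100, "savings": 100}
fuzzy = fuzzy_backup(live, ["checking", "savings"],
                     lambda: transfer(live, "checking", "savings", 40))

# Snapshot: the image is taken first, so later transactions do not affect it.
live2 = {"checking": 100, "savings": 100}
image = snapshot_then_backup(live2)
transfer(live2, "checking", "savings", 40)

print("fuzzy backup total:", sum(fuzzy.values()))     # 240: inconsistent
print("snapshot backup total:", sum(image.values()))  # 200: consistent
```

In the fuzzy case, one entry reflects the transaction and the other does not, so the backed-up total no longer matches any state the application was ever in. In the snapshot case, the application only needs to be quiet for the instant the image is captured.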
Effective problem management
- Keep system documentation current.
- Keep maintenance up to date.
- Use a problem determination methodology.
- If an AIX bug:
  - Collect problem information.
  - Open problem report with AIX Support.
  - Provide snap with information.
System maintenance
Sometimes code works well under normal testing or production circumstances, but poor logic is exposed when the code faces an unanticipated situation. Alternatively, the defect could be in some non-central aspect of the code that is not normally noticed. The number of facilities using this code is large enough that there is a good chance that one of the facilities will detect and report the problem not long after release of the new code level.
The fix for the code defect will usually come out in the next released fix pack. On the other hand, many facilities may not be affected by or be concerned about the code defect for months, until the circumstances arise in which it represents a problem. By installing newer service packs, a facility can benefit from the experience of others and avoid being impacted by known problems. Obviously, there is always the possible exposure that a new fix pack will introduce new problems while solving many old problems. This course will cover some techniques to use in applying fix packs.
Problem determination
Once you find yourself impacted by what you believe to be a product defect, you will need to obtain prompt resolution. While there is no substitute for experience (the ability to recognize a situation and remember the details of how you dealt with it the last time a similar problem occurred), many problems will be most effectively solved by following a well developed problem determination methodology. This course will cover a basic problem determination methodology.
Problem reporting
When you find yourself impacted by what you believe to be a product defect, you will need to contact AIX Support. Before contacting AIX Support, you should write up a description of the problem and the surrounding circumstances. When you open a new Problem Management Report (PMR) with AIX Support, you will be expected to provide them with a wealth of information to assist them in determining the cause of the problem. The snap command is a common tool to assist in collecting a vast amount of information about the environment surrounding the problem. The course materials will cover these problem reporting procedures.
Before problems occur
Effective problem determination starts with a good understanding of the system and its components. The more information you have about the normal operation of a system, the better.
- System configuration
- Operating system level
- Applications installed
- Baseline performance
- Installation, configuration, and service manuals
- Volume groups (names, just a bunch of disks (JBOD) or redundant array of independent disks (RAID))
- Logical volumes (mirrored or not, which VG, type)
- Filesystems (which VG, what applications)
- Memory (size) and paging spaces (how many, location)
Before problems occur: A few good commands
lspv        Lists physical volumes, PVID, VG membership
lscfg       Provides information regarding system components
prtconf     Displays system configuration information
lsvg        Lists the volume groups
lsps        Displays information about paging spaces
lsfs        Gives file system information
lsdev       Provides device information
getconf     Displays values of system configuration variables
bootinfo    Displays system configuration information (unsupported)
snap        Collects system data
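As a rough illustration of keeping baseline documentation current, the following Python sketch runs a handful of the commands listed above, stores their output, and reports which outputs differ between two collections. The selection of commands and options (for example lsps -a and lsdev -C), the function names, and the idea of comparing raw output are assumptions made for this example, not a procedure taken from the course.

```python
import subprocess

# Assumed selection of commands for a baseline; adjust to local needs.
BASELINE_COMMANDS = [
    ["lspv"],
    ["lsvg"],
    ["lsps", "-a"],
    ["lsfs"],
    ["lsdev", "-C"],
    ["prtconf"],
]


def run_command(argv):
    """Run one command and return its standard output as text."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
    except FileNotFoundError:
        # The command does not exist on this system (for example, not AIX).
        return "<command not found>"
    return result.stdout


def collect_baseline(commands, runner=run_command):
    """Return {command line: output} for every command in the list.

    The runner parameter makes it possible to substitute a fake for testing.
    """
    return {" ".join(argv): runner(argv) for argv in commands}


def changed_commands(old, new):
    """Return the sorted command lines whose output differs between baselines."""
    names = set(old) | set(new)
    return sorted(name for name in names if old.get(name) != new.get(name))
```

A baseline collected while the system is healthy, for instance with collect_baseline(BASELINE_COMMANDS), can be saved and later compared with a fresh collection through changed_commands to see which areas of the configuration have changed.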
Steps in problem resolution
1. Identify the problem
2. Talk to users to define the problem
3. Collect system data
4. Resolve the problem
Suggested questions
- What is the problem?
- What is the system doing (or not doing)?
- How did you first notice the problem?
- When did it happen?
- Have any changes been made recently?
Keep them talking until the picture is clear. Ask as many questions as you need to in order to get the entire history of the problem.
Progress and reference codes
- Progress codes
- System reference codes (SRCs)
- Service request numbers (SRNs)
- Obtained from:
  - Front panel of system enclosure
  - HMC or IVM (for logically partitioned systems)
  - Operator console message or diagnostics (diag utility)
Notes:
Introduction
AIX provides progress and error indicators (display codes) during the boot process. These display codes can be very useful in resolving startup problems. Depending on the hardware platform, the codes are displayed on the console and the operator panel.
Operator panel
For non-LPAR systems, the operator panel is an LED display on the front panel. POWER4, POWER5, and POWER6-based systems can be divided into multiple Logical Partitions (LPARs). In this case, a system-wide LED display still exists on the front panel. However, the operator panel for each LPAR is displayed on the screen of the Hardware Management Console (HMC). The HMC is a separate system which is required when running multiple LPARs. Regardless of where the codes are displayed, they are often referred to as LED display codes.
Documentation
Note: all information on Web sites and their design is based upon what is available at the time of this course revision. Web site URLs and the design of the related Web pages often change.
Online hardware documentation and AIX message codes are available at: http://publib.boulder.ibm.com/infocenter/systems
- Many of the codes you will deal with are actually hardware or firmware related. For those codes, you need to navigate to the infocenter that specializes in system hardware. The content area has popular links for accessing code information, or you can use search strings such as: system reference codes, service request numbers, or service support troubleshooting.
- For AIX codes and messages, you will need to navigate to the Operating System infocenter for AIX.
From the AIX infocenter, you can use the search string AIX message center to obtain information on various codes (including the seven-digit message codes). One very useful reference that you can find at the AIX infocenter is RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems (SA38-0509). Chapter 30 has AIX diagnostic numbers and location codes. It provides descriptions for the numbers and characters that display on the operator panel and descriptions of the location codes used to identify a particular item.
- 1-800-IBM-SERV (1-800-426-7378)
- Level 1 will collect information and assign PMR number
- Route to level 2 responsible for the product
- You may be asked to collect additional information to upload
- They may ask you to update to a specific TL or SP
  - APAR for your problem already addressed
  - Need to have a standard environment for them to investigate
Notes:
If you believe that your problem is the result of a system defect, you can call AIX Support to request assistance. Before you call 1-800-IBM-SERV, it is a good idea to have certain information ready. They will want to verify your name against a list of names associated with your customer number, and validate that your customer number has support for the product in question. They will also need to know some details about the hardware and software environment in which the problem is occurring, such as your MTMS (machine type, model, serial), your AIX OS level, and the level of any other relevant software. Of course, you need to explain your problem, providing as much detail as possible, especially any error messages or codes.
The level 1 personnel will ask you for the priority of your problem. Severity level 1 (critical) indicates that the function does not work, your business is severely impacted, there is no workaround, and that there needs to be an immediate solution. Be aware that, for severity level 1, you will be expected to be available 24x7 until the problem is resolved.
Severity level 2 (significant impact) indicates that the function is usable but is limited in a way that your business is severely impacted. Severity level 3 (some impact) indicates that the program is usable with less significant features (not critical to operations) unavailable. Severity level 4 (minimal impact) indicates that the problem causes little impact on operations, or a reasonable circumvention to the problem has been implemented.
Level 1 will assign you a PMR number (actually a PMR and branch number combination) for tracking purposes. Each time you call about this problem in the future, you should have the PMR and branch numbers at hand.
Once the basic information has been collected, you are passed to level 2 personnel for the product area in which you are having a problem. They will work with you in investigating the nature and cause of your problem. They will search the support database to see if it is a known problem that is either already being worked on or has a solution already developed. In many cases, they will request that you update to a specific technology level and service pack that already includes the fix. If they do not have a fix, they may still ask you to update your system and determine if the problem still exists. If the problem still exists, they now have a known software environment to work with. At this point they will often ask for a complete set of information from your system to be collected and uploaded to their server, to support their investigation. The basic tool for collecting your system information is the snap command.
Run the following (or very similar) commands to gather snap information:
# snap -a
<Copy any extra data to the /tmp/ibmsupt/testcase or the /tmp/ibmsupt/other directory.>
# snap -c
# mv /tmp/ibmsupt/snap.pax.Z \
     PMR#.b<branch#>.c<country#>.snap.pax.Z
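The renaming step can be sketched as a small shell helper. This is only an illustration: the function name snap_upload_name and the sample PMR, branch, and country numbers are hypothetical and are not values from the course.

```shell
# Build the upload name for a snap archive using the
# PMR#.b<branch#>.c<country#>.snap.pax.Z pattern.
# snap_upload_name is a hypothetical helper, not an AIX command.
snap_upload_name() {
    pmr=$1
    branch=$2
    country=$3
    printf '%s.b%s.c%s.snap.pax.Z\n' "$pmr" "$branch" "$country"
}

# Example with made-up numbers:
snap_upload_name 12345 678 000

# On a real system the rename could then look like this (not run here):
#   mv /tmp/ibmsupt/snap.pax.Z "$(snap_upload_name 12345 678 000)"
```

Building the name in one place makes it less likely that the PMR, branch, and country fields are mistyped when several files are uploaded for the same problem.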
Fix bundles
- Collections of fileset updates
Interim fixes
- Special situation code replacements
- Delay for normal PTF packaging is too slow
- Managed with efix tool
Fix bundles
It is useful to collect many accumulated PTFs together and test them together. This can then be used as a baseline for a new cycle of enhancements and corrections. By testing them together, it is often possible to catch unexpected interactions between them.
There are two types of AIX fix bundles. One type of fix bundle is a Technology Level (TL) update (formerly known as Maintenance Level or ML). This is a major fix bundle which not only includes many fixes for code problems, but also includes minor functional enhancements. You can identify the current AIX technology level by running the oslevel -r command.
Another type of bundling is a Service Pack (SP). A Service Pack is released more frequently than a Technology Level (between TL releases) and usually only contains needed fixes. You can identify the current AIX technology level and service pack by running the oslevel -s command.
For the oslevel command to reflect a new TL or SP, all related fileset fixes must be installed. If a single fileset update in the fix bundle is not installed, the TL or SP level will not change.
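As a rough sketch, the string printed by oslevel -s can be split into its parts in the shell. The sample value below is made up for illustration, the helper name parse_oslevel is hypothetical, and the sketch assumes the usual four-field layout (base level, technology level, service pack, build date) separated by hyphens.

```shell
# Split an "oslevel -s" style string into its four fields.
# parse_oslevel is a hypothetical helper; the sample string is invented.
parse_oslevel() {
    # $1 is a string such as 6100-02-03-0846
    echo "$1" | awk -F- '{ printf "base=%s tl=%s sp=%s build=%s\n", $1, $2, $3, $4 }'
}

parse_oslevel "6100-02-03-0846"

# On AIX the live value could be used instead (not run here):
#   parse_oslevel "$(oslevel -s)"
```

Comparing the tl and sp fields before and after an update is one simple way to confirm that every fileset in the bundle was installed.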
Interim fixes
On rare occasions, a customer has an urgent situation and needs a fix so quickly that they cannot wait for the formal PTF to be released. In those situations, a developer may place one or more individual file replacements on an FTP server and allow the system administrator to download and install them. Originally, this would simply involve manually copying the new files over the old files. But this created problems, especially in identifying the state of a system which later experienced other (possibly related) problems, or in backing out the changes. Today, there is a better methodology for managing these interim fixes using the efix command.
Security alerts will often provide interim fixes for the identified security exposure. Depending upon your own risk analysis, you might immediately use the interim fix, or wait for the next service pack (which will include these security fixes). The syntax and use of the efix command were covered in the prerequisite course.
Relevant documentation
The System p and AIX Information Center and links for both:
- AIX 5L Version 5.3
- AIX Version 6.1
IBM Redbooks
Redbooks can be viewed, downloaded, or ordered from the IBM Redbooks Web site: http://www.redbooks.ibm.com
Checkpoint
1. What are the four major problem determination steps?
   _________________________________________
   _________________________________________
   _________________________________________
   _________________________________________
2. Who should provide information about system problems?
   _________________________________________
   _________________________________________
3. True or False: If there is a problem with the software, it is necessary to get the next release of the product to resolve the problem.
4. True or False: Documentation can be viewed or downloaded from the IBM Web site.
- Recording system information
- Finding reference code documentation
- Creating a snap file
Unit summary
Having completed this unit, you should be able to:
- List the steps of a basic methodology for problem determination
- List AIX features that assist in minimizing planned downtime or shortening the maintenance window
- Explain how to find documentation and other key resources needed for problem resolution
References
Online   AIX Version 6.1 Command Reference volumes 1-6
Online   AIX Version 6.1 General Programming Concepts: Writing and Debugging Programs
Online   AIX Version 6.1 Technical Reference: Kernel and Subsystems
Note: References listed as online above are available through the IBM Systems Information Center at the following address: http://publib.boulder.ibm.com/infocenter/systems
Unit objectives
After completing this unit, you should be able to:
- Describe the structure of the ODM
- Use the ODM command line interface
- Explain the role of the ODM in device configuration
- Describe the function of the most important ODM files
The Object Data Manager (ODM) is a database intended for storing system information. Physical and logical device information is stored and maintained through the use of objects with associated characteristics.
[Figure: the ODM and the data it holds: Devices, Software, SMIT menus, TCP/IP configuration, NIM]
ODM components
uniquetype         attribute    deflt      values
tape/scsi/scsd     block_size   none       0-2147483648,1
disk/scsi/osdisk   pvid         none
tty/rs232/tty      login        disable
Predefined device information             PdDv, PdAt, PdCn
Customized device information             CuDv, CuAt, CuDep, CuDvDr, CuVPD, Config_Rules
Software vital product data               history, inventory, lpp, product
SMIT menus                                sm_menu_opt, sm_name_hdr, sm_cmd_hdr, sm_cmd_opt
Error log, alog, and dump information     SWservAt
System resource controller                SRCsubsys, SRCsubsvr, ...
Network Installation Manager (NIM)        nim_attr, nim_object, nim_pdattr
Figure 2-5. ODM database files
Current focus
In this unit, we will concentrate on ODM classes that are used to store device information and software product data. At this point, we will narrow our focus even further and confine our discussion to ODM classes that store device information.
Predefined databases: PdDv, PdAt, PdCn
Config_Rules
Customized databases: CuDv, CuAt, CuDep, CuDvDr, CuVPD
Configuration manager
Predefined: PdDv, PdAt, PdCn
Config_Rules
cfgmgr
Customized: CuDv, CuAt, CuDep, CuDvDr, CuVPD
Methods: Define, Configure, Change, Unconfigure, Undefine
Device Driver: Load, Unload
[Figure: the three ODM repositories (/etc/objrepos, /usr/lib/objrepos, /usr/share/lib/objrepos) and the network. Object classes listed: CuDv, CuAt, CuDep, CuDvDr, CuVPD, Config_Rules, history, inventory, lpp, product, nim_*, SWservAt, SRC*]
Notes: Introduction
To support diskless, dataless and other workstations, the ODM object classes are held in three repositories. Each of these repositories is described in the material that follows.
/etc/objrepos
This repository contains the customized devices object classes and the four object classes used by the Software Vital Product Database (SWVPD) for the / (root) part of the installable software product. The root part of the software contains files that must be installed on the target system. To access information in the other directories, this directory contains symbolic links to the predefined devices object classes. The links are needed because the ODMDIR variable points to only /etc/objrepos. It contains the part of the product that cannot be shared among machines. Each client must have its own copy. Most of this software requiring a separate copy for each machine is associated with the configuration of the machine or product.
/usr/lib/objrepos
This repository contains the predefined devices object classes, SMIT menu object classes, and the four object classes used by the SWVPD for the /usr part of the installable software product. The object classes in this repository can be shared across the network by /usr clients, dataless and diskless workstations. Software installed in the /usr part can be shared among several machines with compatible hardware architectures.
/usr/share/lib/objrepos
This repository contains the four object classes used by the SWVPD for the /usr/share part of the installable software product. The /usr/share part of a software product contains files that are not hardware dependent. They can be shared among several machines, even if the machines have a different hardware architecture. An example is the terminfo files, which describe terminal capabilities. Because terminfo is used on many UNIX systems, the terminfo files belong to the /usr/share part of a system product.
lslpp options
The lslpp command can list the software recorded in the ODM. When run with the -l (lower case L) flag, it lists each of the locations (/, /usr/lib, /usr/share/lib) where it finds the fileset recorded. This can be distracting if you are not concerned with these distinctions. Alternately, you can run lslpp -L which only reports each fileset once, without making distinctions between the root, usr, and share portions.
PdDv:
        type = "14106902"
        class = "adapter"
        subclass = "pci"
        prefix = "ent"
        DvDr = "pci/goentdd"
        Define = "/usr/lib/methods/define_rspc"
        Configure = "/usr/lib/methods/cfggoent"
        uniquetype = "adapter/pci/14106902"

cfgmgr

CuDv:
        name = "ent1"
        status = 1
        chgstatus = 2
        ddins = "pci/goentdd"
        location = "02-08"
        parent = "pci2"
        connwhere = "8"
        PdDvLn = "adapter/pci/14106902"
[Figure: Filesystem information, User/security information, and items marked "?" for the student to fill in]
[Figure: review diagram with five numbered blanks: (1) _______; device states Undefined, Defined, Available; (2) and (3); AIX kernel and Applications; (4) D____ D____; (5) /____/_____]
Notes: Instructions
Please answer the following questions by writing the answers on the picture above. If you are unsure about a question, leave it out.
1. Which command configures devices in an AIX system? (Note: This is not an ODM command.)
2. Which ODM class contains all devices that your system supports?
3. Which ODM class contains all devices that are configured in your system?
4. Which programs are loaded into the AIX kernel to control access to the devices?
5. If you have a configured tape drive rmt1, which special file do applications access to work with this device?
ODM commands
uniquetype         attribute    deflt      values
tape/scsi/scsd     block_size   none       0-2147483648,1
disk/scsi/osdisk   pvid         none
tty/rs232/tty      login        disable
Notes: Introduction
Different commands are available for working with each of the ODM components: object classes, descriptors, and objects.
2. To delete an entire ODM class, use the odmdrop command. The odmdrop command has the following syntax:
odmdrop -o object_class_name
The name object_class_name is the name of the ODM class you want to remove. Be very careful with this command. It removes the complete class immediately.
PdAt:
        uniquetype = "tape/scsi/scsd"
        attribute = "block_size"
        deflt = "512"
        values = "0-2147483648,1"
        width = ""
        type = "R"
        generic = "DU"
        rep = "nr"
        nls_index = 6
Possible queries
As with any database, you can perform queries for records matching certain criteria. The tests are on the values of the descriptors of the objects. A number of tests can be performed:
=      equal
!=     not equal
>      greater
>=     greater than or equal to
<      less than
<=     less than or equal to
like   similar to; finds patterns in character string data
For example, to search for records where the value of the lpp_name attribute begins with bosext1., you would use the syntax:
lpp_name like bosext1.*
Tests can be linked together using normal boolean operations, as shown in the following example:
uniquetype=tape/scsi/scsd and attribute=block_size
In addition to the * wildcard, a ? can be used as a wildcard character.
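A query string with several tests can be assembled in the shell before it is handed to an ODM command. The sketch below is illustrative only: odm_and_query is a hypothetical helper name, and the odmget call is shown as a comment because it only exists on AIX.

```shell
# Join descriptor tests with "and" to form an ODM query string.
# odm_and_query is a hypothetical helper, not an AIX command.
odm_and_query() {
    query=""
    for pair in "$@"; do
        if [ -z "$query" ]; then
            query=$pair
        else
            query="$query and $pair"
        fi
    done
    printf '%s\n' "$query"
}

odm_and_query "uniquetype=tape/scsi/scsd" "attribute=block_size"

# On AIX the result could be passed to odmget with the -q flag (not run here):
#   odmget -q "$(odm_and_query uniquetype=tape/scsi/scsd attribute=block_size)" PdAt
```

Keeping each test as a separate argument avoids quoting mistakes when a query grows beyond two conditions.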
PdAt:
        uniquetype = "tape/scsi/scsd"
        attribute = "block_size"
        deflt = "512"
        values = "0-2147483648,1"
        width = ""
        type = "R"
        generic = "DU"
        rep = "nr"
        nls_index = 6
lpp:
        name = "bos.rte.printers"
        size = 0
        state = 5
        ver = 6
        rel = 1
        mod = 0
        fix = 0
        description = "Front End Printer Support"
        lpp_id = 38

inventory:
        lpp_id = 38
        private = 0
        file_type = 0
        format = 1
        loc0 = "/etc/qconfig"
        loc1 = ""
        loc2 = ""
        size = 0
        checksum = 0

product:
        lpp_name = "bos.rte.printers"
        comp_id = "5765-C3403"
        state = 5
        ver = 6
        rel = 1
        mod = 0
        fix = 0
        ptf = ""
        prereq = "*coreq bos.rte 5.1.0.0"
        description = ""
        supersedes = ""

history:
        lpp_id = 38
        ver = 6
        rel = 1
        mod = 0
        fix = 0
        ptf = ""
        state = 1
        time = 1187714064
        comment = ""

Figure 2-15. Software vital product data
Contents of SWVPD
The following information is part of the SWVPD:
- The name of the software product (for example, bos.rte.printers)
- The version, release, modification, and fix level of the software product (for example, 5.3.0.10 or 6.1.0.0)
- The fix level, which contains a summary of fixes implemented in a product
- Any program temporary fix (PTF) that has been installed on the system
- The state of the software product:
  - Available (state = 1)
  - Applying (state = 2)
  - Applied (state = 3)
  - Committing (state = 4)
  - Committed (state = 5)
  - Rejecting (state = 6)
  - Broken (state = 7)
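When you look at raw SWVPD objects, the state appears only as a number. A minimal sketch of a lookup, using only the state values listed in this unit, might look like the following. The function name swvpd_state_name and the Unknown fallback are assumptions added for illustration.

```shell
# Translate a numeric SWVPD state into its name.
# swvpd_state_name is a hypothetical helper; "Unknown" is an added fallback
# for values that are not in the list given in this unit.
swvpd_state_name() {
    case "$1" in
        1) echo "Available" ;;
        2) echo "Applying" ;;
        3) echo "Applied" ;;
        4) echo "Committing" ;;
        5) echo "Committed" ;;
        6) echo "Rejecting" ;;
        7) echo "Broken" ;;
        *) echo "Unknown" ;;
    esac
}

# The lpp object shown for bos.rte.printers has state = 5:
swvpd_state_name 5
```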
SWVPD classes
The Software Vital Product Data is stored in the following ODM classes:
lpp         The lpp object class contains information about the installed software products, including the current software product state and description.
inventory   The inventory object class contains information about the files associated with a software product.
product     The product object class contains product information about the installation and updates of software products and their prerequisites.
history     The history object class contains historical information about the installation and updates of software products.
- Only possible for PTFs or Updates
- Previous version stored in /usr/lpp/Package_Name
- Rejecting update recovers to saved version
- Committing update deletes previous version
- Removing committed software is possible
- No return to previous version
- If installation was not successful:
  a) installp -C
  b) smit maintain_software
Notes: Introduction
The AIX software vital product database uses software states that describe the status of an install or update package.
Once a product is committed, if you would like to return to the old version, you must remove the current version and reinstall the old version.
PdDv:
        type = "scsd"
        class = "tape"
        subclass = "scsi"
        prefix = "rmt"
        ...
        base = 0
        ...
        detectable = 1
        ...
        led = 2418
        setno = 54
        msgno = 0
        catalog = "devices.cat"
        DvDr = "tape"
        Define = "/etc/methods/define"
        Configure = "/etc/methods/cfgsctape"
        Change = "/etc/methods/chggen"
        Unconfigure = "/etc/methods/ucfgdevice"
        Undefine = "/etc/methods/undefine"
        Start = ""
        Stop = ""
        ...
        uniquetype = "tape/scsi/scsd"
type
This specifies the product name or model number, for example, 8 mm (tape).
class
Specifies the functional class name. A functional class is a group of device instances sharing the same high-level function. For example, tape is a functional class name representing all tape devices.
subclass
Device classes are grouped into subclasses. The subclass scsi specifies all tape devices that may be attached to a SCSI interface.
prefix
This specifies the Assigned Prefix in the customized database, which is used to derive the device instance name and /dev name. For example, rmt is the prefix name assigned to tape devices. Names of tape devices would then look like rmt0, rmt1, or rmt2.
base
This descriptor specifies whether a device is a base device or not. A base device is any device that forms part of a minimal base system. During system boot, a minimal base system is configured to permit access to the root volume group (rootvg) and hence to the root file system. This minimal base system can include, for example, the standard I/O diskette adapter and a SCSI hard drive. The device shown on the visual is not a base device. This flag is also used by the bosboot and savebase commands, which are introduced later in this course.
detectable
This specifies whether the device instance is detectable or undetectable. A device whose presence and type can be determined by the cfgmgr, once it is actually powered on and attached to the system, is said to be detectable. A value of 1 means that the device is detectable, and a value of 0 that it is not (for example, a printer or tty).
led
This indicates the value displayed on the LEDs when the configure method begins to run. The value stored is decimal, but the value shown on the LEDs is hexadecimal (2418 is 972 in hex).
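The decimal-to-hexadecimal conversion can be checked with printf. This is a small sketch; led_to_hex is a hypothetical helper name.

```shell
# Convert the decimal led value stored in PdDv to the hexadecimal
# form shown on the LED display. led_to_hex is a hypothetical helper.
led_to_hex() {
    printf '%X\n' "$1"
}

# The tape device on the visual has led = 2418:
led_to_hex 2418
```

This is handy when you see a code on the operator panel and want to compare it with the led descriptor of a PdDv object.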
setno, msgno
Each device has a specific description (for example, SCSI Tape Drive) that is shown when the device attributes are listed by the lsdev command. These two descriptors are used to look up the description in a message catalog.
catalog
This identifies the filename of the national language support (NLS) catalog. The LANG variable on a system controls which catalog file is used to show a message. For example, if LANG is set to en_US, the catalog file /usr/lib/nls/msg/en_US/devices.cat is used. If LANG is de_DE, catalog /usr/lib/nls/msg/de_DE/devices.cat is used.
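The way the catalog location follows from LANG can be sketched as below. The helper name nls_catalog_path is hypothetical, and the sketch only builds the path described above; it does not check that the file exists.

```shell
# Build the message catalog path from a language setting and the
# catalog descriptor. nls_catalog_path is a hypothetical helper.
nls_catalog_path() {
    lang=$1
    catalog=$2
    printf '/usr/lib/nls/msg/%s/%s\n' "$lang" "$catalog"
}

nls_catalog_path en_US devices.cat
nls_catalog_path de_DE devices.cat
```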
DvDr
This identifies the name of the device driver associated with the device (for example, tape). Usually, device drivers are stored in directory /usr/lib/drivers. Device drivers are loaded into the AIX kernel when a device is made available.
Define
This names the define method associated with the device type. This program is called when a device is brought into the defined state.
Configure
This names the configure method associated with the device type. This program is called when a device is brought into the available state.
Change
This names the change method associated with the device type. This program is called when a device attribute is changed through the chdev command.
Unconfigure
This names the unconfigure method associated with the device type. This program is called when a device is unconfigured by rmdev -l.
Undefine
This names the undefine method associated with the device type. This program is called when a device is undefined by rmdev -l -d.
Start, stop
Few devices support a stopped state (only logical devices). A stopped state means that the device driver is loaded, but no application can access the device. These two attributes name the methods to start or stop a device.
uniquetype
This is a key that is referenced by other object classes. Objects use this descriptor as a pointer back to the device description in PdDv. The key is a concatenation of the class, subclass, and type values.
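The concatenation can be illustrated in a few lines of shell. The helper name make_uniquetype is hypothetical; the sample values come from the PdDv objects shown in this unit.

```shell
# Build the uniquetype key from class, subclass, and type.
# make_uniquetype is a hypothetical helper, not an AIX command.
make_uniquetype() {
    class=$1
    subclass=$2
    type=$3
    printf '%s/%s/%s\n' "$class" "$subclass" "$type"
}

make_uniquetype tape scsi scsd
make_uniquetype adapter pci 14106902
```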
PdAt:
        uniquetype = "tape/scsi/scsd"
        attribute = "block_size"
        deflt = ""
        values = "0-2147483648,1"
        ...

PdAt:
        uniquetype = "disk/scsi/osdisk"
        attribute = "pvid"
        deflt = "none"
        values = ""
        ...

PdAt:
        uniquetype = "tty/rs232/tty"
        attribute = "term"
        deflt = "dumb"
        values = ""
        ...
uniquetype
This descriptor is used as a pointer back to the device defined in the PdDv object class.
attribute
This identifies the name of the attribute. This is the name that can be passed to the mkdev or chdev command. For example, to change the terminal type of tty0 from the default of dumb to ibm3151, you can issue the following command:
# chdev -l tty0 -a term=ibm3151
deflt
This identifies the default value for an attribute. Nondefault values are stored in CuAt.
values
This identifies the possible values that can be associated with the attribute name. For example, allowed values for the block_size attribute range from 0 to 2147483648, with an increment of 1.
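A range specification of this form can be read as low-high,increment. The following sketch checks a candidate value against such a specification. The function name attr_value_allowed is hypothetical, and the sketch assumes a single numeric range with a positive increment; other forms of the values descriptor are not handled.

```shell
# Check a value against a "low-high,increment" specification such as
# the one stored in the values descriptor for block_size.
# attr_value_allowed is a hypothetical helper and handles only this form.
attr_value_allowed() {
    value=$1
    spec=$2
    range=${spec%,*}      # part before the comma, for example 0-2147483648
    step=${spec#*,}       # part after the comma, for example 1
    low=${range%-*}
    high=${range#*-}
    [ "$value" -ge "$low" ] || return 1
    [ "$value" -le "$high" ] || return 1
    [ $(( (value - low) % step )) -eq 0 ]
}

if attr_value_allowed 512 "0-2147483648,1"; then
    echo "512 is an allowed block_size"
fi
```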
CuDv:
        name = "ent1"
        status = 1
        chgstatus = 2
        ddins = "pci/goentdd"
        location = "02-08"
        parent = "pci2"
        connwhere = "8"
        PdDvLn = "adapter/pci/14106902"

CuDv:
        name = "hdisk2"
        status = 1
        chgstatus = 2
        ddins = "scdisk"
        location = "01-08-01-8,0"
        parent = "scsi1"
        connwhere = "8,0"
        PdDvLn = "disk/scsi/scsd"
name
A customized device object for a device instance is assigned a unique logical name to distinguish the device from other devices. The visual shows two devices, an Ethernet adapter ent1 and a disk drive hdisk2.
status
This identifies the current status of the device instance. Possible values are:
- status = 0 - Defined
- status = 1 - Available
- status = 2 - Stopped
chgstatus
This flag tells whether the device instance has been altered since the last system boot. The diagnostics facility uses this flag to validate system configuration. The flag can take these values:
- chgstatus = 0 - New device
- chgstatus = 1 - Don't care
- chgstatus = 2 - Same
- chgstatus = 3 - Device is missing
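The numeric status and chgstatus descriptors can be turned into readable text with two small lookups. This is a sketch built only from the values listed in these notes; the function names and the Unknown fallback are assumptions added for illustration.

```shell
# Decode the status and chgstatus descriptors of a CuDv object.
# Both helpers are hypothetical; "Unknown" is an added fallback.
cudv_status_name() {
    case "$1" in
        0) echo "Defined" ;;
        1) echo "Available" ;;
        2) echo "Stopped" ;;
        *) echo "Unknown" ;;
    esac
}

cudv_chgstatus_name() {
    case "$1" in
        0) echo "New device" ;;
        1) echo "Don't care" ;;
        2) echo "Same" ;;
        3) echo "Device is missing" ;;
        *) echo "Unknown" ;;
    esac
}

# ent1 on the visual has status = 1 and chgstatus = 2:
echo "ent1: $(cudv_status_name 1), $(cudv_chgstatus_name 2)"
```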
ddins
This descriptor typically contains the same value as the Device Driver Name descriptor in the Predefined Devices (PdDv) object class. It specifies the name of the device driver that is loaded into the AIX kernel.
location
Identifies the AIX location of a device. The location code is a path from the system unit through the adapter to the device. In case of a hardware problem, the location code is used by technical support to identify a failing device.
parent
Identifies the logical name of the parent device. For example, the parent device of hdisk2 is scsi1.
connwhere
Identifies the specific location on the parent device where the device is connected. For example, the device hdisk2 uses the SCSI address 8,0.
PdDvLn
Provides a link to the device instance's predefined information through the uniquetype descriptor in the PdDv object class.
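To see how the link is followed, the sketch below pulls the PdDvLn value out of a CuDv stanza and turns it into a query on uniquetype. The stanza text is a shortened copy of the hdisk2 object from the visual, and stanza_field is a hypothetical helper that only understands simple name = value lines.

```shell
# Sample CuDv stanza (shortened copy of the hdisk2 object on the visual).
cudv_stanza='CuDv:
        name = "hdisk2"
        status = 1
        PdDvLn = "disk/scsi/scsd"'

# Print the value of one descriptor from stanza text on standard input.
# stanza_field is a hypothetical helper for simple "name = value" lines.
stanza_field() {
    awk -v key="$1" '$1 == key && $2 == "=" {
        sub(/^[^=]*=[ \t]*/, "")   # drop everything up to the value
        gsub(/"/, "")              # drop the surrounding quotes
        print
    }'
}

link=$(printf '%s\n' "$cudv_stanza" | stanza_field PdDvLn)
echo "uniquetype=$link"

# On AIX this string could be used to fetch the predefined object (not run here):
#   odmget -q "uniquetype=$link" PdDv
```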
CuAt:
        name = "ent1"
        attribute = "jumbo_frames"
        value = "yes"
        ...

CuAt:
        name = "hdisk2"
        attribute = "pvid"
        value = "00c35ba0816eafe50000000000000000"
        ...
PdCn:
        uniquetype = "adapter/pci/sym875"
        connkey = "scsi"
        connwhere = "1,0"

PdCn:
        uniquetype = "adapter/pci/sym875"
        connkey = "scsi"
        connwhere = "2,0"

CuDvDr:
        resource =
        value1 =
        value2 =
        value3 =

CuDvDr:
        resource =
        value1 =
        value2 =
        value3 =

CuDep:
        name = "rootvg"
        dependency = "hd6"

CuDep:
        name = "datavg"
        dependency = "lv01"

CuVPD:
        name = "hdisk2"
        vpd_type = 0
        vpd = "*MFIBM *TM\n\
HUS151473VL3800 *F03N5280 *RL53343341*SN009DAFDF*ECH17 923D *P26K5531 *Z0\n\
000004029F00013A*ZVMPSS43A *Z20068*Z307220"
Notes: PdCn
The Predefined Connection (PdCn) object class contains connection information for adapters (sometimes called intermediate devices). This object class also includes predefined dependency information. For each connection location, there are one or more objects describing the subclasses of devices that can be connected. The sample PdCn objects on the visual indicate that, at the given locations, all devices belonging to subclass scsi could be attached.
CuDep
The Customized Dependency (CuDep) object class describes device instances that depend on other device instances. This object class describes the dependence links between logical devices and physical devices as well as dependence links between logical devices, exclusively. Physical dependencies of one device on another device are recorded in the Customized Devices (CuDv) object class. The sample CuDep objects on the visual show the dependencies between logical volumes and the volume groups they belong to.
CuDvDr
The Customized Device Driver (CuDvDr) object class is used to create the entries in the /dev directory. These special files are used by applications to access a device driver that is part of the AIX kernel. The attribute value1 is called the major number and is a unique key for a device driver. The attribute value2 specifies a certain operating mode of a device driver. The sample CuDvDr objects on the visual reflect the device driver for disk drives hdisk2 and hdisk3. The major number 36 specifies the driver in the kernel. In our example, the minor numbers 0 and 1 specify two different instances of disk drives, both using the same device driver. For other devices, the minor number may represent different modes in which the device can be used. For example, if we were looking at a tape drive, the operating mode 0 would specify rewind on close for the tape drive, and the operating mode 1 would specify no rewind on close.
CuVPD
The Customized Vital Product Data (CuVPD) object class contains vital product data (manufacturer of device, engineering level, part number, and so forth) that is useful for technical support. When an error occurs with a specific device, the vital product data is shown in the error log.
Checkpoint
1. In which ODM class do you find the physical volume IDs of your disks?
________________________________________________
- Review of device configuration ODM classes
- Modifying a device default attribute
- Creating self-defined ODM classes (Optional)
Unit summary
Having completed this unit, you should be able to:
- Describe the structure of the ODM
- Use the ODM command line interface
- Explain the role of the ODM in device configuration
- Describe the function of the most important ODM files
Notes:
The ODM is made from object classes, which are broken into individual objects and descriptors. AIX offers a command line interface to work with the ODM files. The device information is held in the customized and the predefined databases (Cu*, Pd*).
References
Online   AIX Version 6.1 General Programming Concepts: Writing and Debugging Programs (Chapter 5. Error-Logging Overview)
Online   AIX Version 6.1 Command Reference volumes 1-6
Note: References listed as online above are available at the following address: http://publib.boulder.ibm.com/infocenter/systems
Unit objectives
After completing this unit, you should be able to:
- Analyze error log entries
- Identify and maintain the error logging components
- Describe different error notification methods
- Log system messages using the syslogd daemon
- Monitor and take actions for threshold conditions using RMC
- Monitor and take actions for hang conditions using shdaemon
[Figure: error logging components: console, errnotify, CuDv, CuAt, CuVPD, error record template (/var/adm/ras/errtmplt), errstop, application, errlog(), /dev/error (timestamp), error daemon (/usr/lib/errdemon), error log (/var/adm/ras/errlog), errclear, errlogger; divided into User and Kernel levels]
To create an entry in the error log, the errdemon daemon retrieves the appropriate template from the repository, the resource name of the unit that caused the error, and the detail data. Also, if the error signifies a hardware-related problem and hardware vital product data (VPD) exists, the daemon retrieves the VPD from the ODM. When you access the error log, either through SMIT or with the errpt command, the error log is formatted according to the error template in the error template repository and presented in either a summary or detailed report. Most entries in the error log are attributable to hardware and software problems, but informational messages can also be logged, for example, by the system administrator.
# smit errpt
Generate an Error Report
...
CONCURRENT error reporting?
Type of Report
Error CLASSES (default is all)
Error TYPES (default is all)
Error LABELS (default is all)
Error ID's (default is all)
Resource CLASSES (default is all)
Resource TYPES (default is all)
Resource NAMES (default is all)
SEQUENCE numbers (default is all)
STARTING time interval
ENDING time interval
Show only Duplicated Errors
Consolidate Duplicated Errors
LOGFILE
TEMPLATE file
MESSAGE file
FILENAME to send report to (default is stdout)
...
Notes: Overview
The SMIT fastpath smit errpt takes you to the screen used to generate an error report. Any user can use this screen. As shown on the visual, the screen includes a number of fields that can be used for report specifications. Some of these fields are described in more detail below.
Type of report
Summary, intermediate, and detailed reports are available. Detailed reports give comprehensive information. Intermediate reports display most of the error information. Summary reports contain concise descriptions of errors.
Error classes
Values are H (hardware), S (software), and O (operator messages created with errlogger). You can specify more than one error class.
Error types
Valid error types include the following:
- PEND - The loss of availability of a device or component is imminent.
- PERF - The performance of the device or component has degraded to below an acceptable level.
- TEMP - Recovered from condition after several attempts.
- PERM - Unable to recover from error condition. Error types with this value are usually the most severe errors and imply that you have a hardware or software defect. Error types other than PERM usually do not indicate a defect, but they are recorded so that they can be analyzed by the diagnostic programs.
- UNKN - Severity of the error cannot be determined.
- INFO - The error type is used to record informational entries.
Error labels
An error label is the mnemonic name used for an error ID.
Error IDs
An error ID is a 32-bit hexadecimal code used to identify a particular failure.
Resource classes
The device class for hardware errors (for example, disk).
Resource types
The device type for hardware errors (for example, 355 MB).
Resource names
The common device name (for example, hdisk0).
Summary report:
# errpt
Intermediate report:
# errpt -A
Detailed report:
# errpt -a
The -d option
The -d option (flag) can be used to limit the report to a particular class of errors. Two examples illustrating use of this flag are shown on the visual:
- The command errpt -d H specifies a summary report of all hardware (-d H) errors.
- The command errpt -a -d S specifies a detailed report (-a) of all software (-d S) errors.
The -c option
If you want to display the error entries concurrently, that is, at the time they are logged, you must execute errpt -c. In the example on the visual, we direct the output to the system console.
The -D flag
Duplicate errors can be consolidated using errpt -D. When used with the -a option, errpt -D reports only the number of duplicate errors and the timestamp for the first and last occurrence of the identical error.
The -P flag
Shows only errors which are duplicates of the previous error. The -P flag applies only to duplicate errors generated by the error log device driver.
Additional information
The errpt command has many options. Refer to your AIX Commands Reference (or the man page for errpt) for a complete description.
# errpt
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
192AC071   1010130907 T O errdemon       ERROR LOGGING TURNED OFF
C6ACA566   1010130807 U S syslog         MESSAGE REDIRECTED FROM SYSLOG
A6DF45AA   1010130707 I O RMCdaemon      The daemon is started.
2BFA76F6   1010130707 T S SYSPROC        SYSTEM SHUTDOWN BY USER
9DBCFDEE   1010130707 T O errdemon       ERROR LOGGING TURNED ON
192AC071   1010123907 T O errdemon       ERROR LOGGING TURNED OFF
AA8AB241   1010120407 T O OPERATOR       OPERATOR NOTIFICATION
C6ACA566   1010120007 U S syslog         MESSAGE REDIRECTED FROM SYSLOG
2BFA76F6   1010094907 T S SYSPROC        SYSTEM SHUTDOWN BY USER
EAA3D429   1010094207 U S LVDD           PHYSICAL PARTITION MARKED STALE
EAA3D429   1010094207 U S LVDD           PHYSICAL PARTITION MARKED STALE
F7DDA124   1010094207 U H LVDD           PHYSICAL VOLUME DECLARED MISSING
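The TIMESTAMP column of the summary report uses the form MMDDhhmmYY. As a small aid for reading it, the following sketch splits a timestamp into its parts. The function name decode_errpt_ts is made up for this illustration and is not an AIX command.

```shell
# decode_errpt_ts: hypothetical helper that turns an errpt summary
# timestamp (MMDDhhmmYY) into "MM/DD/YY hh:mm".
decode_errpt_ts() {
    ts=$1
    # Accept exactly ten digits; reject everything else.
    case $ts in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "invalid timestamp: $ts" >&2; return 1 ;;
    esac
    mm=$(echo "$ts" | cut -c1-2)    # month
    dd=$(echo "$ts" | cut -c3-4)    # day
    hh=$(echo "$ts" | cut -c5-6)    # hour
    mi=$(echo "$ts" | cut -c7-8)    # minute
    yy=$(echo "$ts" | cut -c9-10)   # two-digit year
    echo "${mm}/${dd}/${yy} ${hh}:${mi}"
}

# Timestamp of the PHYSICAL VOLUME DECLARED MISSING entry.
decode_errpt_ts 1010094207
```

The same format is what errpt expects for start and end times, so the helper is also handy for checking a value before passing it to the command.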
LABEL:           LVM_SA_PVMISS
IDENTIFIER:      F7DDA124

Date/Time:       Wed Oct 10 09:42:20 CDT 2007
Sequence Number: 113
Machine Id:      00C35BA04C00
Node Id:         rt1s3vlp2
Class:           H
Type:            UNKN
WPAR:            Global
Resource Name:   LVDD
Resource Class:  NONE
Resource Type:   NONE
Location:

Description
PHYSICAL VOLUME DECLARED MISSING

Probable Causes
POWER, DRIVE, ADAPTER, OR CABLE FAILURE

Detail Data
MAJOR/MINOR DEVICE NUMBER
8000 0011 0000 0001
SENSE DATA
00C3 5BA0 0000 4C00 0000 0115 7F54 BF78 00C3 5BA0 7FCF 6B93 0000 0000 0000 0000
Type  Recommendations
P     Failure of physical volume media
      Action: Replace device as soon as possible
P     Device does not respond
      Action: Check power supply
T     Error caused by bad block or occurrence of a recovered error
      Rule of thumb: If disk produces more than one DISK_ERR4 per week, replace the disk

P = Permanent
T = Temporary
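The DISK_ERR4 rule of thumb can be checked mechanically. The following is a minimal sketch, not a supported tool: count_entries and disk_err4_verdict are illustrative names, hdisk0 is only an example disk, and START is assumed to hold a MMDDhhmmYY timestamp from one week ago supplied by the caller.

```shell
# count_entries: count the data lines of an errpt summary read from
# stdin, skipping the IDENTIFIER header line and blank lines.
count_entries() {
    awk '$1 != "IDENTIFIER" && NF > 0 { n++ } END { print n + 0 }'
}

# disk_err4_verdict: apply the "more than one per week" rule to the
# summary lines on stdin.
disk_err4_verdict() {
    n=$(count_entries)
    if [ "$n" -gt 1 ]; then
        echo "replace disk ($n DISK_ERR4 entries)"
    else
        echo "ok ($n DISK_ERR4 entries)"
    fi
}

# Runs only where errpt exists and START has been set by the caller.
if command -v errpt >/dev/null 2>&1 && [ -n "${START:-}" ]; then
    errpt -J DISK_ERR4 -N hdisk0 -s "$START" | disk_err4_verdict
fi
```

Keeping the counting separate from the errpt call makes it possible to try the rule against a saved report before using it on a live system.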
4. Sometimes SCSI errors are logged, mostly with the LABEL SCSI_ERR10. They indicate that the SCSI controller is not able to communicate with an attached device. In this case, check the cable (and the cable length), the SCSI addresses, and the terminator.
DISK_ERR5 errors
A very infrequent error is DISK_ERR5. It is the catch-all (that is, the problem does not match any of the above DISK_ERRx symptoms). You need to investigate further by running the diagnostic programs which can detect and produce more information about the problem.
Recommendations
                  S,P  No more bad block relocation
                       Action: Replace disk as soon as possible.
LVM_SA_QUORCLOSE  H,P  Quorum lost, volume group closing
                       Action: Check disk, consider working without quorum.
# smit errdemon

Change / Show Characteristics of the Error Log

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

  LOGFILE                [/var/adm/ras/errlog]
* Maximum LOGSIZE        [1048576]               #
  Memory Buffer Size     [32768]                 #
  ...

# smit errclear

Clean the Error Log

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

  Remove entries older than this number of days   [30]   #
  Error CLASSES                                   [ ]    +
  Error TYPES                                     [ ]    +
  ...
  Resource CLASSES                                [ ]    +
  ...
Entries in /var/spool/cron/crontabs/root use errclear to remove software and hardware errors. Software and operator errors are purged after 30 days, hardware errors are purged after 90 days.
ODM-Based: /etc/objrepos/errnotify
Error notification
3. ODM-based error notification: The errdemon program uses the ODM class errnotify for error notification. How to work with errnotify is discussed later in this topic.
# Compare the two files.
# If no difference, let's sleep again
cmp -s /tmp/errlog.1 /tmp/errlog.2 && continue

# Files are different: Let's inform the operator:
print "Operator: Check error log " > /dev/console
errpt > /tmp/errlog.1
done
- The two files are compared using the command cmp -s (silent compare, that means no output will be reported). If the files are not different, we jump back to the beginning of the loop (continue), and the process will sleep again.
- If there is a difference, a new error entry has been posted to the error log. In this case, we inform the operator that a new entry is in the error log. Instead of print you could use the mail command to inform another person.
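The procedure described above can be sketched as a complete polling loop. This is a minimal sketch under stated assumptions, not the course's own script: the variable names OLD, NEW, and INTERVAL, the function names, and the 60-second interval are all chosen for illustration, and echo is used in place of the Korn shell print so the functions also load in other shells.

```shell
# Illustrative settings; none of these names come from AIX.
OLD=${OLD:-/tmp/errlog.1}
NEW=${NEW:-/tmp/errlog.2}
INTERVAL=${INTERVAL:-60}     # seconds between checks (assumed value)

# snapshots_differ FILE1 FILE2
# Return 0 if the files differ (or one is missing), 1 if identical.
snapshots_differ() {
    if cmp -s "$1" "$2"; then
        return 1
    fi
    return 0
}

# monitor_errlog: poll the error log forever. It is only defined here,
# not started, and is meaningful only on a system that provides errpt.
monitor_errlog() {
    errpt > "$OLD"
    while true
    do
        sleep "$INTERVAL"
        errpt > "$NEW"
        # No difference: sleep again.
        snapshots_differ "$OLD" "$NEW" || continue
        # A new entry was posted: inform the operator.
        echo "Operator: Check error log" > /dev/console
        # Remember the current state for the next comparison.
        errpt > "$OLD"
    done
}
```

To use it, start monitor_errlog in the background from a root shell, for example with monitor_errlog &.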
errnotify:
        en_pid = 0
        en_name = "sample"
        en_persistenceflg = 1
        en_label = ""
        en_crcid = 0
        en_class = "H"
        en_type = "PERM"
        en_alertflg = ""
        en_resource = ""
        en_rtype = ""
        en_rclass = "disk"
        en_method = "errpt -a -l $1 | mail -s DiskError root"
Example on visual
The example on the visual shows an object that creates a mail message to root whenever a disk error is posted to the log.
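One way to register such an object is to write the stanza to a file and load it with odmadd. The sketch below assumes a scratch file path (STANZA) and an opt-in switch (APPLY); both are illustrative names, and the ODM is touched only when APPLY=yes is set on a system that has the ODM commands.

```shell
# Write the stanza to a file. STANZA is an illustrative path.
STANZA=${STANZA:-/tmp/en_sample.add}
cat > "$STANZA" <<'EOF'
errnotify:
        en_pid = 0
        en_name = "sample"
        en_persistenceflg = 1
        en_label = ""
        en_crcid = 0
        en_class = "H"
        en_type = "PERM"
        en_alertflg = ""
        en_resource = ""
        en_rtype = ""
        en_rclass = "disk"
        en_method = "errpt -a -l $1 | mail -s DiskError root"
EOF

# Load it into the ODM only when explicitly requested (as root, on AIX).
if [ "${APPLY:-no}" = yes ] && command -v odmadd >/dev/null 2>&1; then
    odmadd "$STANZA"                          # add the object
    odmget -q "en_name=sample" errnotify      # display it to verify
fi
# To remove the object again:
#   odmdelete -q "en_name=sample" -o errnotify
```

The quoted here-document keeps $1 literal, so the sequence number is substituted when the method runs, not when the file is written.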
List of descriptors
Here is a list of all descriptors for the errnotify object class:

en_alertflg
Identifies whether the error is alertable. This descriptor is provided for use by alert agents with network management applications. The values are TRUE (matches alertable errors) or FALSE (matches non-alertable errors).

en_class
Identifies the class of error log entries to match. Valid values are H (hardware errors), S (software errors), O (operator messages), and U (undetermined).

en_crcid
Specifies the error identifier associated with a particular error.

en_label
Specifies the label associated with a particular error identifier as defined in the output of errpt -t (show templates).

en_method
Specifies a user-programmable action, such as a shell script or a command string, to be run when an error matching the selection criteria of this Error Notification object is logged. The error notification daemon uses the sh -c command to execute the notify method. The following keywords are passed to the method as arguments:
$1 Sequence number from the error log entry
$2 Error ID from the error log entry
$3 Class from the error log entry
$4 Type from the error log entry
$5 Alert flags from the error log entry
$6 Resource name from the error log entry
$7 Resource type from the error log entry
$8 Resource class from the error log entry
$9 Error label from the error log entry

en_name
Uniquely identifies the object.

en_persistenceflg
Designates whether the Error Notification object should be removed when the system is restarted. 0 means the object is removed at system restart; a non-zero value (for example, 1) means the object is retained at system restart.

en_pid
Specifies a process ID for use in identifying the Error Notification object. Objects that have a PID specified should have the en_persistenceflg descriptor set to 0.

en_rclass
Identifies the class of the failing resource. For hardware errors, the resource class is the device class (see PdDv). Not used for software errors.

en_resource
Identifies the name of the failing resource. For hardware errors, the resource name is the device name. Not used for software errors.

en_rtype
Identifies the type of the failing resource. For hardware errors, the resource type is the device type (see PdDv). Not used for software errors.

en_symptom
Enables notification of an error accompanied by a symptom string when set to TRUE.

en_type
Identifies the severity of error log entries to match. Valid values are:
INFO: Informational
PEND: Impending loss of availability
PERM: Permanent
PERF: Unacceptable performance degradation
TEMP: Temporary
UNKN: Unknown

en_err64
Identifies the environment of the error. TRUE indicates that the error is from a 64-bit environment.

en_dup
Identifies whether the kernel identified the error as a duplicate. TRUE indicates that it is a duplicate error.
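To make the nine arguments concrete, here is a minimal sketch of a notify method that labels each one. The function name format_notification is made up for this illustration, and the sample call reuses values from the LVM_SA_PVMISS entry shown earlier; the alert flag value FALSE in that call is an assumption.

```shell
# format_notification: hypothetical notify method body. It labels the
# nine positional arguments that are passed to en_method, in the
# documented order ($1 = sequence number ... $9 = error label).
format_notification() {
    printf 'seq=%s id=%s class=%s type=%s alert=%s resource=%s rtype=%s rclass=%s label=%s\n' \
        "$1" "$2" "$3" "$4" "$5" "$6" "$7" "$8" "$9"
}

# Sample call; the alert flag value is assumed for illustration.
format_notification 113 F7DDA124 H UNKN FALSE LVDD NONE NONE LVM_SA_PVMISS
```

Saved as a script, it would be referenced from en_method with the arguments spelled out, for example en_method = "/usr/local/bin/notify.sh $1 $2 $3 $4 $5 $6 $7 $8 $9", where the script path is hypothetical.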
syslogd daemon
IBM Power Systems
syslogd

/tmp/syslog.debug:
inetd[16634]: A connection requires tn service
inetd[16634]: Child process 17212 has ended

# stopsrc -s
# startsrc -s
mail.debug             /tmp/mail.debug
daemon.debug           /tmp/daemon.debug
*.debug;mail.none      @server
- The following line specifies that all mail messages are to be collected in the file /tmp/mail.debug:
  mail.debug /tmp/mail.debug
- The following line specifies that all messages produced from daemon processes are to be collected in the file /tmp/daemon.debug:
  daemon.debug /tmp/daemon.debug
- The following line specifies that all messages, except messages from the mail subsystem, are to be sent to the syslogd daemon on the host server:
  *.debug;mail.none @server
Note that, if this example and the preceding example appear in the same /etc/syslog.conf file, messages sent to /tmp/daemon.debug will also be sent to the host server.
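Adding such a rule by hand involves two steps that are easy to forget: avoiding duplicate lines and creating the log file, which syslogd on AIX generally expects to exist before it writes to it. The sketch below wraps both in a helper. The function name add_syslog_rule and the scratch file /tmp/syslog.conf.test are illustrative; the example deliberately works on a scratch file and not on the live /etc/syslog.conf.

```shell
# add_syslog_rule CONF SELECTOR LOGFILE
# Append "SELECTOR<TAB>LOGFILE" to CONF unless that exact line is
# already present, and make sure LOGFILE exists.
add_syslog_rule() {
    conf=$1
    selector=$2
    logfile=$3
    rule=$(printf '%s\t%s' "$selector" "$logfile")
    grep -qxF "$rule" "$conf" 2>/dev/null || printf '%s\n' "$rule" >> "$conf"
    touch "$logfile"
}

# Try it against a scratch file first.
CONF=${CONF:-/tmp/syslog.conf.test}
add_syslog_rule "$CONF" daemon.debug /tmp/daemon.debug

# After changing the real /etc/syslog.conf on AIX, have syslogd
# reread its configuration:
#   refresh -s syslogd
```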
Facilities
Use the following system facility names in the selector field:
kern      Kernel
user      User level
mail      Mail subsystem
daemon    System daemons
auth      Security or authorization
syslog    syslogd messages
lpr       Line-printer subsystem
news      News subsystem
uucp      uucp subsystem
*         All facilities
Priority levels
Use the following levels in the selector field. Messages of the specified level and all levels above it are sent as directed.
emerg     Specifies emergency messages. These messages are not distributed to all users.
alert     Specifies important messages such as serious hardware errors. These messages are distributed to all users.
crit      Specifies critical messages, not classified as errors, such as improper login attempts. These messages are sent to the system console.
err       Specifies messages that represent error conditions.
warning   Specifies messages for abnormal, but recoverable conditions.
notice    Specifies important informational messages.
info      Specifies information messages that are useful in analyzing the system.
debug     Specifies debugging messages. If you are interested in all messages of a certain facility, use this level.
none      Excludes the selected facility.
# errpt
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
...
C6ACA566   0505071399 U S syslog         MESSAGE REDIRECTED FROM SYSLOG
...
errnotify:
        en_name = "syslog1"
        en_persistenceflg = 1
        en_method = "logger Error Log: `errpt -l $1 | grep -v TIMESTAMP`"

errnotify:
        en_name = "syslog1"
        en_persistenceflg = 1
        en_method = "logger Error Log: $(errpt -l $1 | grep -v TIMESTAMP)"

Direct the last error entry (-l $1) to syslogd. Do not show the error log header (use grep -v or tail -1).

errnotify:
        en_name = "syslog1"
        en_persistenceflg = 1
        en_method = "errpt -l $1 | tail -1 | logger -t errpt -p daemon.notice"
Command substitution
You will need to use command substitution (or pipes) before calling the logger command. The first two examples on the visual illustrate the two ways to do command substitution in a Korn shell environment:
- Using the UNIX command syntax (with backquotes) - shown in the first example on the visual
- Using the newer $(UNIX command) syntax - shown in the second example on the visual
System hangs:
- High priority process
- Other

Actions:
- Log error in the Error log
- Display a warning message on the console
- Launch recovery login on a console
- Launch a command
- Automatically REBOOT system
Actions
If lower priority processes are not being scheduled, shdaemon will perform the specified action. Each action can be individually enabled and has its own configurable priority and time-out values. There are five actions available:
- Log error in the Error log.
- Display a warning message on a console.
- Launch a recovery login on a console.
- Launch a command.
- Automatically REBOOT the system.
Configuring shdaemon

# shconf -E -l prio
sh_pp       disable        Enable Process Priority Problem
pp_errlog   disable        Log Error in the Error Logging
pp_eto      2              Detection Time-out
pp_eprio    60             Process Priority
pp_warning  enable         Display a warning message on a console
pp_wto      2              Detection Time-out
pp_wprio    60             Process Priority
pp_wterm    /dev/console   Terminal Device
pp_login    enable         Launch a recovering login on a console
pp_lto      2              Detection Time-out
pp_lprio    100            Process Priority
pp_lterm    /dev/console   Terminal Device
pp_cmd      disable        Launch a command
pp_cto      2              Detection Time-out
pp_cprio    60             Process Priority
pp_cpath    /home/unhang   Script
pp_reboot   disable        Automatically REBOOT system
pp_rto      5              Detection Time-out
pp_rprio    39             Process Priority
Notes: Introduction
shdaemon configuration information is stored as attributes in the SWservAt ODM object class. Configuration changes take effect immediately and survive across reboots. Use shconf (or smit shd) to configure or display the current configuration of shdaemon. The values shown in the visual are the default values.
Enabling shdaemon
At least two parameters must be modified to enable shdaemon:
- Enable priority monitoring (sh_pp)
- Enable one or more actions (pp_errlog, pp_warning, and so forth)
When enabling shdaemon, shconf performs the following steps:
- Modifies the SWservAt parameters
- Starts shdaemon
- Modifies /etc/inittab so that shdaemon will be started on each system boot
Action attributes
Each action has its own attributes, which set the priority and timeout thresholds and define the action to be taken. The timeout attribute unit of measure is in minutes.
Example
By changing the shconf attributes, we can enable, disable, and modify the behavior of the facility. In the following example, shdaemon is enabled to monitor process priority (sh_pp=enable), and several actions are enabled:
- Enable process priority monitoring:
  # shconf -l prio -a sh_pp=enable
- Log error in the Error Logging:
  # shconf -l prio -a pp_errlog=enable
  Every two minutes (pp_eto=2), shdaemon will check to see if any process has been run with a process priority number greater than 60 (pp_eprio=60). If not, shdaemon logs an error to the error log.
- Display a warning message on a console (enabled by default):
  # shconf -l prio -a pp_warning=enable
  Every two minutes (pp_wto=2), shdaemon will check to see if any process has been run with a process priority number greater than 60 (pp_wprio=60). If not, shdaemon sends a warning message to the console specified by pp_wterm.
- Launch a command:
  # shconf -l prio -a pp_cmd=enable -a pp_cto=5
  Every five minutes (pp_cto=5), shdaemon will check to see if any process has been run with a process priority number greater than 60 (pp_cprio=60). If not, shdaemon runs the command specified by pp_cpath (in this case, /home/unhang).
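Because shconf changes take effect immediately, it can be useful to review the commands before running them. The following sketch only prints the shconf commands for a chosen set of actions. The function name shd_enable_cmds is made up for this illustration; the attribute names are the ones shown in the shconf -E -l prio listing.

```shell
# shd_enable_cmds ACTION...
# Print (do not run) the shconf commands that enable priority
# monitoring plus the named actions. Valid action names are
# errlog, warning, login, cmd, and reboot.
shd_enable_cmds() {
    echo "shconf -l prio -a sh_pp=enable"
    for action in "$@"; do
        case $action in
            errlog|warning|login|cmd|reboot)
                echo "shconf -l prio -a pp_${action}=enable" ;;
            *)
                echo "unknown action: $action" >&2
                return 1 ;;
        esac
    done
}

# Show what would be run to log errors and launch the command.
shd_enable_cmds errlog cmd
```

After reviewing the output on an AIX system, the commands can be executed as root, for example with shd_enable_cmds errlog cmd | sh.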
Part 2, section 1: Working with syslogd
Part 2, section 2: Error notification with errnotify
Associates predefined responses with predefined conditions for monitoring system resources Example: Broadcast a message to the system administrator when the /tmp file system becomes 90% full
Set up
The following steps are provided to assist you in setting up an efficient monitoring system:
1. Review the predefined conditions of interest to you. Use them as they are, customize them to fit your configurations, or use them as templates to create your own.
2. Review the predefined responses. Customize them to suit your environment and your working schedule. For example, the response Critical notifications is predefined with three actions:
   a) Log events to /tmp/criticalEvents.
   b) E-mail to root.
   c) Broadcast a message to all logged-in users anytime when an event or a rearm event occurs.
   You may modify the response, such as to log events to a different file anytime when events occur, e-mail to you during non-working hours, and add a new action to page you only during working hours. With such a setup, different notification mechanisms can be automatically switched, based on your working schedule.
3. Reuse the responses for conditions. For example, you can customize the three severity responses, Critical notifications, Warning notifications, and Informational notifications to take actions in response to events of different severities, and associate the responses to the conditions of respective severities. With only three notification responses, you can be notified of all the events with respective notification mechanisms based on their urgencies.
4. Once the monitoring is set up, your system continues being monitored whether your Web-based System Manager session is running or not. To know the system status, you may bring up a Web-based System Manager session and view the Events plug-in, or simply use the lsaudrec command from the command line interface to view the audit log.
More information
A very good Redbook describing this topic is: A Practical Guide for Resource Monitoring and Control (SG24-6615). This redbook can be found at http://www.redbooks.ibm.com/redbooks/pdfs/sg246615.pdf.
Notes: Conditions
A condition monitors a specific property, such as total percentage used, in a specific resource class, such as JFS. Each condition contains an event expression to define an event and an optional rearm event.
RMC management
Checkpoint
1. Which command generates error reports? Which flag of this command is used to generate a detailed error report?
__________________________________________________ __________________________________________________
Unit summary
Having completed this unit, you should be able to:
- Analyze error log entries
- Identify and maintain the error logging components
- Describe different error notification methods
- Log system messages using the syslogd daemon
- Monitor and take actions for threshold conditions using RMC
- Monitor and take actions for hang conditions using shdaemon

Notes:
- Use the errpt (smit errpt) command to generate error reports.
- Different error notification methods are available.
- Use smit errdemon and smit errclear to maintain the error log.
- Some components use syslogd for error logging.
- The syslogd configuration file is /etc/syslog.conf.
- You can redirect syslogd and error log messages.
- You can monitor resource conditions and take automated action, such as sending mail to root.
References
SC23-6616   AIX Version 6.1 Installation and migration
SG24-7296   NIM from A to Z in AIX 5L (Redbook)
Unit objectives
After completing this unit, you should be able to:
- Configure an AIX partition for use as a NIM master
- Set up NIM to support the installation of AIX onto a client
NIM overview
- Eliminate tape/CD at each system
- Distribute installation load
- Support for push or pull installations
- NIM administrative tools:
  - Command line interface
  - SMIT
  - WebSM
Advantages
NIM provides several advantages:
- Provides one central point for AIX software administration for all the NIM clients
- Eliminates need to walk a CDROM or tape to each system and the need for a tape drive or CDROM drive at every system
- Installations can be initiated from the master machine (push) or from the client (pull)
- The installation load can be distributed. Most simply, the NIM master machine is configured as the server for all the filesets to be installed. However, you can also configure one or more client machines to act as servers to distribute the load if you have many clients.
SMIT
- As you become familiar with the NIM environment, you may find that you use a combination of methods. For example, you may use the command line to list NIM status and perform simple NIM operations, while using SMIT or WebSM for more complex operations or for operations that you do not perform frequently.
Machine roles
Master
- File sets: bos.sysmgt.nim.master, bos.sysmgt.nim.client
- Stores NIM database
- NIM administration
- Can initiate push installations to NIM clients
- AIX version >= all other NIM machines

Client
- File sets: bos.sysmgt.nim.client

Server
- Any machine, master or client
- Serves NIM resources to clients, thus requires adequate disk space and throughput
Notes:
There are three basic roles that a machine can assume in the NIM environment: master, client, and resource server. There can be only one master machine in a NIM environment; all other machines are clients. Any machine, master or client, can be a resource server.
NIM software
All machines in the NIM environment must install bos.sysmgt.nim.client. The master machine must also install bos.sysmgt.nim.master and bos.sysmgt.nim.spot.
Master
The NIM master manages all other machines that participate in the NIM environment. The NIM database is stored on the NIM master. The NIM master is fundamental for all of the operations in the NIM environment and must be set up and operational before performing any NIM operations. The master can initiate a software installation to a client, which is called a push installation. Also, the NIM master is the only machine that is given the permissions and ability to execute NIM operations on other machines within the NIM environment. The rsh command is used to remotely execute commands on clients, which allows the NIM master to install to a number of clients with one NIM operation. With AIX 5.3 or AIX 6.1, nimsh can be used as an alternative to rsh.
Client
All other machines in a NIM environment are clients. Clients can request a software installation from a server machine (pull installation).
Server
Any machine, the master or a client, can be configured by the master as a server for a particular software resource. Most often, the master is also the server. However, if your environment has many nodes or consists of a complex network environment, you may want to configure some nodes to act as servers to improve installation performance. Servers must have adequate disk space for the resources they will be providing. They also need network connections to the client machines they serve and sufficient bandwidth to respond to the expected volume.