
Veritas Volume Manager 3.X/4.X Troubleshooting

This TOI addresses Veritas Volume Manager 3.X/4.X troubleshooting

TOI Objectives

• Objective 1 Booting/Veritas Not Starting Issues


• Objective 2 Troubleshooting Exercise #1
• Objective 3 Troubleshooting Exercise #2
• Objective 4 Troubleshooting Exercise #3
• Objective 5 Troubleshooting Exercise #4
• Objective 6 Troubleshooting Exercise #5
• Objective 7 Troubleshooting Exercise #6
• Objective 8 Troubleshooting Exercise #7
• Objective 9 Split Brain
Objective 1 Booting/Veritas Not Starting Issues

VxVM = Veritas Volume Manager

#1 OS/Boot disk and mirror are under Veritas Volume Manager 3.x (4.0 is different) control and
system is not booting

1. Gather Information
- history, what led up to this
- any hardware replaced
- error messages
- what version of VxVM
- tried booting off the mirror or clone disks yet

What issues can we identify just from asking the right questions?

2. Isolate issue (OS, Hardware or VxVM) by bypassing VxVM (Basic Unencapsulation)

- at ok prompt
ok> printenv use-nvramrc?
use-nvramrc? = false
ok> setenv use-nvramrc? true # if false
ok> setenv auto-boot? false # keep system from booting to Solaris

- reset the system


ok> reset-all

- boot cdrom -s

- mount the root file system to /a

From which disk?


How do I know which is primary boot disk and/or the mirror?
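
A minimal sketch, assuming the primary boot disk is c0t0d0 with root on slice 0 (verify the actual device first, e.g. with format and prtvtoc):

# fsck -y /dev/rdsk/c0t0d0s0
# mount /dev/dsk/c0t0d0s0 /a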

- make backup copies of following files


cp /a/etc/system /a/etc/system.vx-rootdev
cp /a/etc/vfstab /a/etc/vfstab.vx
cp /a/etc/vx/volboot /a/etc/vx/volboot.org

- edit system file


cd /a/etc/
grep -v rootdev system > system.no-rootdev
cp system.no-rootdev system

- edit vfstab file converting it from volume to partition based

How can we verify vfstab.prevm is accurate? What c#t#d# should be used?
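
As an illustration of the conversion (assuming root on c0t0d0s0; the real devices must be confirmed against /etc/vfstab.prevm and the disk's VTOC), a line such as:

/dev/vx/dsk/rootvol /dev/vx/rdsk/rootvol / ufs 1 no -

becomes:

/dev/dsk/c0t0d0s0 /dev/rdsk/c0t0d0s0 / ufs 1 no -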

- Create a file called /a/etc/vx/reconfig.d/state.d/install-db.

What other file/directory can keep VxVM from starting?
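
The file can be created with touch:

# touch /a/etc/vx/reconfig.d/state.d/install-db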

- cd; umount /a; fsck the OS file systems on this disk


- init 0; then boot -s first
(make sure you boot off the disk you edited)

If you are unsure which disk is the primary boot disk, you can use
'luxadm set_boot_dev -y /dev/dsk/c?t?d?s?' to set the boot device in the OBP.

- If the system boots ok, then ctrl-d to continue booting to run level 3.
(this helps verify whether the underlying hardware and OS are in a good state)

3. If the system successfully boots then we need to see if we can manually start VxVM.

- vxiod set 10; vxconfigd -d; vxdctl init; vxdctl enable

- cd /etc/vx; diff volboot volboot.org


(to check for differences that may be a factor in the boot issue)

What are some of the things in the volboot file that can cause boot/VxVM starting issues?

- vxdctl mode to verify if VxVM started
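
For example, a successful start reports:

# vxdctl mode
mode: enabled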

4. If we can manually start VxVM, then check to see if there are OS mirrors, and whether the mirror disk is
available.

IMPORTANT: If the boot disk contains mirrored volumes, one must take all the mirrors offline for
those volumes, except for the one on the boot disk. Offlining a mirror prevents VxVM from ever
performing a recovery on that plex. This step is critical in preventing data corruption.

What's the difference between offlining, disassociating, and detaching mirrors?

- vxdisk -o alldgs -e list # to check disk status

What issues can be seen in vxdisk list output that can cause boot/starting issues?

- vxprint -htrLg rootdg # rootdg disk group should be imported and all volumes should be in
a DISABLED state.

For example, if the boot disk is c0t0d0 with vxprint output as follows:


# vxprint -htg rootdg
...
v rootvol root DISABLED ACTIVE 1026000 PREFER rootvol-01
pl rootvol-01 rootvol DISABLED ACTIVE 1026000 CONCAT - RW
sd rootdisk-B0 rootvol-01 rootdisk 8378639 1 0 c0t0d0 ENA
sd rootdisk-02 rootvol-01 rootdisk 0 1025999 1 c0t0d0 ENA
pl rootvol-02 rootvol DISABLED ACTIVE 1027026 CONCAT - RW
sd rootmir-06 rootvol-02 rootmir 0 1027026 0 c0t1d0 ENA
...

- In this case the rootvol-02 plex should be offlined as it resides on c0t1d0:


vxmend -g rootdg off rootvol-02
- Start all volumes:
vxrecover -ns
- Start any recovery operations on volumes if needed:
vxrecover -bs
5. At this point the OS is booted, VxVM has been started successfully, and the mirrors are offlined.

- next step is to re-enable VxVM and reboot to test if VxVM will start automatically

What can cause VxVM to start manually but not automatically on boot up?

6. Once any debugging actions and/or other operations are completed, VxVM can be re-enabled
with the following steps.

- cp /etc/system.vx-rootdev /etc/system
- cp /etc/vfstab.vx /etc/vfstab
- rm /etc/vx/reconfig.d/state.d/install-db
- cd /; ls -l | grep -i vx # if you see a /VXVM_.... type directory, rename it to old.VXVM
- sync; sync; init 0
- at ok prompt boot off of same disk

7a. If system boots up:

- check for status of volumes and plexes and address any issues
vxinfo -g rootdg -p | egrep -iv "started|active"

- verify all file systems mounted and system is operating correctly

- online mirrors and start recovery operations on the mirrors that were just onlined.

- Example:
vxmend -g rootdg on rootvol-02
vxrecover -bs

7b. If the system does NOT boot up, use the following document to enable debug logging of the boot.

Document ID: 17461  Title: Veritas Volume Manager: How to log error messages

Also send the customer the following link or document so they can generate a vxexplorer and send it to us and Veritas:

Document ID: 243150


http://support.veritas.com/docs/243150

vxexplorer: How to download, execute, and send it to VERITAS Technical Support


#2 System boots to run level 3 but Veritas Volume Manager does not start, causing
loss of access to volumes containing file systems/data

1. Delete/rename the following files if present and either reboot or manually start VxVM (see procedure
above)

- /etc/vx/reconfig.d/state.d/install-db

- cd /; ls -l | grep -i vx # if you see a /VXVM_.... type directory, rename it to old.VXVM

2. If these files are not there check for:

- disks with private region

If there are a lot of disks, you can try using this script:

#! /usr/bin/ksh
# Dump the VTOC of every disk seen by the OS to one file for review.
file="/tmp/diskvtoc.out"
y=0
rm -f $file
for x in `/bin/ls /dev/rdsk/c*s2`
do
echo "" >> $file
echo "" >> $file
let y="$y"+1
echo "number:" $y >> $file; echo $x >> $file
echo "#########################" >> $file
/bin/ls -l $x >> $file 2>&1
echo "" >> $file
/usr/sbin/prtvtoc $x >> $file 2>&1
echo "*************************" >> $file
echo "" >> $file
echo "" >> $file
done

- Can the private regions be read? Are there disks in the rootdg disk group? Does the host id
match the /etc/vx/volboot file? Are the group ids of all the rootdg disks the same?

You can try this script as well.

#! /usr/bin/ksh
# Dump the VxVM private region contents of every disk (via vxprivutil) to one file for review.
file="/tmp/vx.out"
y=0
rm -f $file
for x in `/bin/ls /dev/rdsk/c*s2`
do
echo "" >> $file
echo "" >> $file
let y="$y"+1
echo "number:" $y >> $file; echo $x >> $file
echo "#########################" >> $file
/bin/ls -l $x >> $file 2>&1
echo "" >> $file
/usr/lib/vxvm/diag.d/vxprivutil list $x >> $file 2>&1
echo "*************************" >> $file
echo "" >> $file
echo "" >> $file
done
3. Use vxdctl init <hostid> to change the volboot file. DO NOT EDIT IT MANUALLY.

Note: You can edit the volboot file if it becomes necessary. Just keep in mind that vxconfigd will not
recognize it if the format is changed and/or the file size is no longer 512 bytes.
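
One quick sanity check after any change, based on the 512-byte requirement above:

# ls -l /etc/vx/volboot   # size should still be 512 bytes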

4. otherwise:

Document ID: 17461  Title: Veritas Volume Manager: How to log error messages

Also send the customer the following link or document so they can generate a vxexplorer and send it to us and Veritas:

Document ID: 243150


http://support.veritas.com/docs/243150

vxexplorer: How to download, execute, and send it to VERITAS Technical Support


Objective 2 Troubleshooting Exercise #1 Part A

c1t0d0 (rootdisk) was proactively removed from Veritas and was replaced, but vxdiskadm option 5
completes with no errors (or at least none that are noticed) without bringing the disk out of the removed
state.

Why?

What would you do to fix this?

What would you do to keep this from happening again?

# vxdisk list
DEVICE TYPE DISK GROUP STATUS
c1t0d0s2 sliced - - error
c1t0d0s2 sliced - - error
c1t0d0s2 sliced - - offline
c1t1d0s2 sliced disk01 rootdg online
c3t0d0s2 sliced - - offline
c3t0d89s2 sliced datadg01 datadg online
c3t0d90s2 sliced datadg02 datadg online
c3t0d146s2 sliced - - online
c3t0d217s2 sliced datadg03 datadg online
c3t0d218s2 sliced datadg04 datadg online
c4t0d0s2 sliced - - offline
c4t0d89s2 sliced - - offline
c4t0d90s2 sliced - - offline
c4t0d146s2 sliced - - offline
c4t0d217s2 sliced - - online
c4t0d218s2 sliced - - online
- - rootdisk rootdg removed was:c1t0d0s2
Objective 2 Troubleshooting Exercise #1 Part B

Same customer reboots but system fails to come up and gets following message.

"The file just loaded does not appear to be executable"

Why?

What would you do to get around this?

What would you do to keep this from happening again?

NOTE: see following page for needed outputs

Searching for disks...done


AVAILABLE DISK SELECTIONS:
0. c1t0d0 <SUN36G cyl 24620 alt 2 hd 27 sec 107>
/pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100000c50ac76dc,0
1. c1t1d0 <SUN36G cyl 24620 alt 2 hd 27 sec 107>
/pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100002037e3d51b,0
2. c3t0d0 <EMC-SYMMETRIX-5267 cyl 14 alt 2 hd 15 sec 64>
/pci@8,700000/fibre-channel@1/sd@0,0
...

Specify disk (enter its number):

# eeprom

auto-boot?=true
boot-command=boot
boot-file: data not available.
boot-device=/pci@8,600000/SUNW,qlc@4/fp@0,0/disk@w2100002037e3e93e,0:a disk
nvramrc=devalias net /pci@8,700000/network@3
devalias vx-disk01 /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100002037e3d51b,0:a
devalias vx-rootdisk /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100002037e3e93e,0:a
error-reset-recovery=boot
Objective 2 Troubleshooting Exercise #1 Part C

The same customer's system is now booted; vxdiskadm option 5 completes (no errors) without
bringing the disk out of the removed state.

# vxdisk list
DEVICE TYPE DISK GROUP STATUS
c1t0d0s2 sliced - - online
c1t1d0s2 sliced disk01 rootdg online
c3t0d0s2 sliced - - error
c3t0d89s2 sliced datadg01 datadg online
c3t0d90s2 sliced datadg02 datadg online
c3t0d146s2 sliced - - online
c3t0d217s2 sliced datadg03 datadg online
c3t0d218s2 sliced datadg04 datadg online
c4t0d0s2 sliced - - error
c4t0d89s2 sliced - - error
c4t0d90s2 sliced - - error
c4t0d146s2 sliced - - error
c4t0d217s2 sliced - - online
c4t0d218s2 sliced - - online
- - rootdisk rootdg removed was:c1t0d0s2

When we try to manually bring back c1t0d0 (rootdisk) we get the following error.

# vxdg -g rootdg -k adddisk rootdisk=c1t0d0s2


vxvm:vxdg: ERROR: associating disk-media rootdisk with c1t0d0s2:
Disk public region is too small

Why?

How would you fix this?

NOTE: see following pages for needed outputs


# vxprint -htg rootdg
DG NAME NCONFIG NLOG MINORS GROUP-ID
DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
RV NAME RLINK_CNT KSTATE STATE PRIMARY DATAVOLS SRL
RL NAME RVG KSTATE STATE REM_HOST REM_DG REM_RLNK
V NAME RVG KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE

dg rootdg default default 0 992973065.1025.ralph

dm disk01 c1t1d0s2 sliced 2888 71124291 -


dm rootdisk - - - - REMOVED

sd c1t0d0s2Priv - rootdisk 4194828 2888 PRIVATE - RMOV

v crash - ENABLED ACTIVE 4194828 ROUND - fsgen


pl crash-01 crash DISABLED REMOVED 4194828 CONCAT - WO
sd c1t0d0s2-04 crash-01 rootdisk 22875101 4194828 0 - RMOV
pl crash-02 crash ENABLED ACTIVE 4194828 CONCAT - RW
sd disk01-01 crash-02 disk01 0 4194828 0 c1t1d0 ENA

v home - ENABLED ACTIVE 44057250 ROUND - fsgen


pl home-01 home DISABLED REMOVED 44057250 CONCAT - WO
sd c1t0d0s2-03 home-01 rootdisk 27069929 44057250 0 - RLOC
pl home-02 home ENABLED ACTIVE 44057250 CONCAT - RW
sd disk01-02 home-02 disk01 4194828 44057250 0 c1t1d0 ENA

v rootvol - ENABLED ACTIVE 4194828 ROUND - root


pl rootvol-01 rootvol DISABLED REMOVED 4194828 CONCAT - RW
sd c1t0d0s2-B0 rootvol-01 rootdisk 4194827 1 0 - RMOV
sd c1t0d0s2-02 rootvol-01 rootdisk 0 4194827 1 - RLOC
pl rootvol-02 rootvol ENABLED ACTIVE 4194828 CONCAT - RW
sd disk01-03 rootvol-02 disk01 48252078 4194828 0 c1t1d0 ENA

v swapvol - ENABLED ACTIVE 16579971 ROUND - swap


pl swapvol-01 swapvol DISABLED REMOVED 16579971 CONCAT - WO
sd c1t0d0s2-01 swapvol-01 rootdisk 4197716 16579971 0 - RMOV
pl swapvol-02 swapvol ENABLED ACTIVE 16579971 CONCAT - RW
sd disk01-04 swapvol-02 disk01 52446906 16579971 0 c1t1d0 ENA

v var - ENABLED ACTIVE 2097414 ROUND - fsgen


pl var-01 var DISABLED REMOVED 2097414 CONCAT - WO
sd c1t0d0s2-05 var-01 rootdisk 20777687 2097414 0 - RLOC
pl var-02 var ENABLED ACTIVE 2097414 CONCAT - RW
sd disk01-05 var-02 disk01 69026877 2097414 0 c1t1d0 ENA
#Root disk partition
* /dev/rdsk/c1t0d0s2 partition map
*
* Dimensions:
* 512 bytes/sector
* 107 sectors/track
* 27 tracks/cylinder
* 2889 sectors/cylinder
* 24622 cylinders
* 24620 accessible cylinders
*
* Flags:
* 1: unmountable
* 10: read-only
*
* First Sector Last
* Partition Tag Flags Sector Count Sector Mount Directory
2 5 01 0 71127180 71127179
3 15 01 0 2889 2888
4 14 01 2889 71124291 71127179
<root@ralph:/tmp>
#

#Mirror disk partition

* /dev/rdsk/c1t1d0s2 partition map


*
* Dimensions:
* 512 bytes/sector
* 107 sectors/track
* 27 tracks/cylinder
* 2889 sectors/cylinder
* 24622 cylinders
* 24620 accessible cylinders
*
* Flags:
* 1: unmountable
* 10: read-only
*
* Unallocated space:
* First Sector Last
* Sector Count Sector
* 71127180 4272095083 48254966
*
* First Sector Last
* Partition Tag Flags Sector Count Sector Mount Directory
0 2 00 48254967 4194828 52449794
1 3 01 52449795 16579971 69029765
2 5 01 0 71127180 71127179
3 15 01 0 2889 2888
4 14 01 2889 71124291 71127179
6 7 00 69029766 2097414 71127179
# vxdisk list c1t0d0s2
Device: c1t0d0s2
devicetag: c1t0d0
type: sliced
hostid:
disk: name= id=1108863756.2353.ralph
group: name= id=
info: privoffset=1
flags: online ready private autoconfig autoimport
pubpaths: block=/dev/vx/dmp/c1t0d0s4 char=/dev/vx/rdmp/c1t0d0s4
privpaths: block=/dev/vx/dmp/c1t0d0s3 char=/dev/vx/rdmp/c1t0d0s3
version: 2.1
iosize: min=512 (bytes) max=2048 (blocks)
public: slice=4 offset=0 len=71124291
private: slice=3 offset=1 len=2888
update: time=1108863756 seqno=0.1
headers: 0 248
configs: count=1 len=2112
logs: count=1 len=320
Defined regions:
config priv 000017-000247[000231]: copy=01 offset=000000 disabled
config priv 000249-002129[001881]: copy=01 offset=000231 disabled
log priv 002130-002449[000320]: copy=01 offset=000000 disabled
<root@ralph:/>
#
#
#
# vxdisk list c1t1d0s2
Device: c1t1d0s2
devicetag: c1t1d0
type: sliced
hostid: ralph
disk: name=disk01 id=992973519.1091.ralph
group: name=rootdg id=992973065.1025.ralph
flags: online ready private autoconfig autoimport imported
pubpaths: block=/dev/vx/dmp/c1t1d0s4 char=/dev/vx/rdmp/c1t1d0s4
privpaths: block=/dev/vx/dmp/c1t1d0s3 char=/dev/vx/rdmp/c1t1d0s3
version: 2.1
iosize: min=512 (bytes) max=2048 (blocks)
public: slice=4 offset=0 len=71124291
private: slice=3 offset=1 len=2888
update: time=1108862981 seqno=0.153
headers: 0 248
configs: count=1 len=2112
logs: count=1 len=320
Defined regions:
config priv 000017-000247[000231]: copy=01 offset=000000 enabled
config priv 000249-002129[001881]: copy=01 offset=000231 enabled
log priv 002130-002449[000320]: copy=01 offset=000000 enabled
Objective 2 Troubleshooting Exercise #1 Answer Sheet
Part A

c1t0d0 (rootdisk) was proactively removed from Veritas and was replaced, but vxdiskadm option 5
completes (no errors) without bringing the disk out of the removed state.

Why? Duplicate entries in vxdisk list.

What would you do to fix this? Reboot, or follow infodoc 70929 (luxadm offline).

What would you do to keep this from happening again? Install patch 113201-05.
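
A rough sketch of the non-reboot approach (the exact steps are in infodoc 70929; the stale path is assumed here to be c1t0d0):

# luxadm -e offline /dev/dsk/c1t0d0s2   # offline the stale device path
# devfsadm -C                           # clean up dangling /dev links
# vxdctl enable                         # have VxVM rescan its device list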

Part B

The same customer reboots but the system fails to come up.

Why? boot-device is set to the WWN of the replaced disk.

What would you do to get around this? Boot off the mirror.

What would you do to keep this from happening again? Use the luxadm command to set boot-device to the
new disk's WWN:

# luxadm -v set_boot_dev /devices/pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100002037e3d51b,0:a

Current boot-device = /pci@8,600000/SUNW,qlc@4/fp@0,0/disk@w2100002037e3e93e,0:a disk


New boot-device = /pci@8,600000/SUNW,qlc@4/fp@0,0/disk@w2100002037e3d51b,0:a
Do you want to change boot-device to the new setting? (y/n) y
<root@ralph:/tmp>

# eeprom boot-device
boot-device=/pci@8,600000/SUNW,qlc@4/fp@0,0/disk@w2100002037e3d51b,0:a
<root@ralph:/tmp>
#

Part C

The same customer's system is now booted; vxdiskadm option 5 completes (no errors) without bringing the
disk out of the removed state.

When we try to manually bring back c1t0d0 (rootdisk) we get the following error.

# vxdg -g rootdg -k adddisk rootdisk=c1t0d0s2

vxvm:vxdg: ERROR: associating disk-media rootdisk with c1t0d0s2:

Disk public region is too small

Why?

From what I could gather from the Veritas Knowledge Base: for an encapsulated primary boot disk where the
special subdisk protecting the private region is created, if something other than "rootdisk" is used for the boot
disk name, or something other than "rootvol" for the root file system volume, vxdiskadm option 5 will not
work, since Veritas will not ignore the special subdisk, in this case called "c1t0d0s2Priv".

We can see from the vxprint output that the dm is listed as rootdisk, but the subdisks in rootvol are named after c1t0d0.

How would you fix this? 2 options to recover

#1 Document ID: 73801

Title: VERITAS Volume Manager 3.2 or higher: During replacement of a disk, the error "Disk public region
is too small" is displayed

# vxdg -g <dg> -p -k adddisk rootdisk=c#t#d#

Using this command can be helpful, especially on a root disk, because the subdisk offsets get skewed,
disallowing the same subdisks to be created on the new disk in the same locations as on the previous disk.
The '-p' option disregards the offsets of the subdisks' starting points, allowing the subdisks to be created on
the replacement disk.

#2 This will clear up the naming issue and resolve the problem:

Disassociate the original boot disk plexes and recursively remove them; then remove the subdisk "c1t0d0s2Priv",
then the rootdisk; then initialize the rootdisk and remirror.

# vxplex -f -g rootdg dis rootvol-01

# vxedit -rf rm rootvol-01

# vxplex -f -g rootdg dis crash-01

# vxedit -rf rm crash-01

# vxplex -f -g rootdg dis home-01

# vxedit -rf rm home-01

# vxplex -f -g rootdg dis swapvol-01

# vxedit -rf rm swapvol-01

# vxplex -f -g rootdg dis var-01

# vxedit -rf rm var-01

# vxedit -sf rm c1t0d0s2Priv

# vxedit -rf rm rootdisk

# vxdctl enable
# vxprint -htg rootdg

DG NAME NCONFIG NLOG MINORS GROUP-ID


DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
RV NAME RLINK_CNT KSTATE STATE PRIMARY DATAVOLS SRL
RL NAME RVG KSTATE STATE REM_HOST REM_DG REM_RLNK
V NAME RVG KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE

dg rootdg default default 0 992973065.1025.ralph

dm disk01 c1t1d0s2 sliced 2888 71124291 -

v crash - ENABLED ACTIVE 4194828 ROUND - fsgen


pl crash-02 crash ENABLED ACTIVE 4194828 CONCAT - RW
sd disk01-01 crash-02 disk01 0 4194828 0 c1t1d0 ENA

v home - ENABLED ACTIVE 44057250 ROUND - fsgen


pl home-02 home ENABLED ACTIVE 44057250 CONCAT - RW
sd disk01-02 home-02 disk01 4194828 44057250 0 c1t1d0 ENA

v rootvol - ENABLED ACTIVE 4194828 ROUND - root


pl rootvol-02 rootvol ENABLED ACTIVE 4194828 CONCAT - RW
sd disk01-03 rootvol-02 disk01 48252078 4194828 0 c1t1d0 ENA

v swapvol - ENABLED ACTIVE 16579971 ROUND - swap


pl swapvol-02 swapvol ENABLED ACTIVE 16579971 CONCAT - RW
sd disk01-04 swapvol-02 disk01 52446906 16579971 0 c1t1d0 ENA

v var - ENABLED ACTIVE 2097414 ROUND - fsgen


pl var-02 var ENABLED ACTIVE 2097414 CONCAT - RW
sd disk01-05 var-02 disk01 69026877 2097414 0 c1t1d0 ENA
<root@ralph:/>
#>

Then initialize the disk back in and mirror using vxdiskadm option 6.

What would you do to keep this from happening again?

Use the default (reserved) names for the boot disk & OS volumes, and no hot-sparing on the rootdisk.
Objective 3 Troubleshooting Exercise #2 Part A

The customer attempted to grow a file system and volume from 2 GB to 12 GB since it was almost full.

Command typed in by the customer...

vxresize -bx -F vxfs -g iodg ccs 25161728 iodg03 iodg04

df -k shows it succeeded, but the volume is still 2 GB in size, and large files cannot be created in the file system.

They can create small files, though.

df -k
Filesystem kbytes used avail capacity Mounted on
/dev/vx/dsk/iodg/ccs 12580864 1071444 10790204 10% /ccs

12,580,864 KB (12 GB)

Disk group: iodg


...
v ccs - ENABLED ACTIVE 4194304 SELECT - fsgen
pl ccs-01 ccs ENABLED ACTIVE 4195072 CONCAT - RW
sd iodg01-02 ccs-01 iodg01 4195072 4195072 0 Disk_10 ENA
pl ccs-02 ccs ENABLED ACTIVE 4195072 CONCAT - RW
sd iodg02-02 ccs-02 iodg02 16778496 4195072 0 Disk_44 ENA

4194304 sectors (vxprint's default unit) x 512 bytes per sector = 2,147,483,648 bytes (2 GB)

Why?

What would you do to fix this?


Objective 3 Troubleshooting Exercise #2 Answer Sheet

Actions taken:

tried vxdctl enable and vxconfigd -k -x cleartempdir - no change

checked the file system's original creation options using "mkfs -F vxfs -m"

unix> mkfs -F vxfs -m /dev/vx/rdsk/iodg/ccs

mkfs -F vxfs -o ninode=unlimited,bsize=1024,version=5,inosize=256,logsize=16384,nolargefiles /dev/vx/rdsk/iodg/ccs 25161728

unix>

checked SunSolve and checked patches

Veritas recommended first reducing the file system size back to the original size:

root@ndwest # cd /
root@ndwest # fsadm -F vxfs -b 4194304 /ccs
UX:vxfs fsadm: INFO: /dev/vx/rdsk/iodg/ccs is currently 25161728 sectors - size will be reduced

root@ndwest # df -k /ccs
Filesystem kbytes used avail capacity Mounted on
/dev/vx/dsk/iodg/ccs 2097152 1072970 960293 53% /ccs

root@ndwest # cd /ccs
root@ndwest # touch john
root@ndwest # rm john
root@ndwest # cd /

Next, try to grow the volume by itself, then the file system separately:

root@ndwest # vxassist -g iodg growto ccs 25161728 iodg03 iodg04


vxvm:vxassist: WARNING: dm:iodg03: No disk space matches specification
vxvm:vxassist: WARNING: dm:iodg04: No disk space matches specification

The command completed despite the warning messages, and vxprint shows the correct new volume size:

vxprint -ht
v ccs - ENABLED ACTIVE 25161728 SELECT - fsgen
pl ccs-01 ccs ENABLED ACTIVE 25162112 CONCAT - RW
sd 603e_503-03 ccs-01 603e_503 4195072 4195072 0 Disk_10 ENA
sd 603W_1299-01 ccs-01 603W_1299 0 20967040 4195072 Disk_52 ENA
pl ccs-02 ccs ENABLED ACTIVE 25162112 CONCAT - RW
sd 603W_1022-03 ccs-02 603W_1022 16778496 4195072 0 Disk_44 ENA
sd 603E_299-01 ccs-02 603E_299 0 20967040 4195072 Disk_53 ENA

Next, the SSE successfully grew the file system as well.
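
A sketch of that file system grow, using the same fsadm syntax as the shrink above (the new size in sectors matches the grown volume):

root@ndwest # fsadm -F vxfs -b 25161728 /ccs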

Issue resolved.
Objective 4 Troubleshooting Exercise #3

Customer temporarily lost access to several fibre disks, and is now unable to access several file systems even
though df -k shows them mounted.

gemini:#vxdisk -o alldgs list


DEVICE TYPE DISK GROUP STATUS
c0t0d0s2 sliced rootdisk rootdg online
c0t8d0s2 sliced rootmirror rootdg online nohotuse
c1t0d0s2 sliced disk02 rootdg online
c1t1d0s2 sliced disk03 rootdg online
c1t2d0s2 sliced disk04 rootdg online
c2t8d0s2 sliced disk05 rootdg online
c2t9d0s2 sliced disk07 rootdg online
c2t10d0s2 sliced disk06 rootdg online
c6t0d0s2 sliced - - error
c6t0d1s2 sliced whsdthordg01 whsdthordg online
c6t0d2s2 sliced whsdthordg02 whsdthordg online
c6t0d3s2 sliced - - error
c6t0d4s2 sliced - (prodthordg) online
c6t0d5s2 sliced - - error
c6t0d6s2 sliced - (prodthordg) online
c6t0d7s2 sliced - - error
c6t0d8s2 sliced - - error
c6t0d10s2 sliced prodthordg03 prodthordg online dgdisabled
c6t0d11s2 sliced prodthordg00 prodthordg online dgdisabled
c6t0d20s2 sliced tmp-new-oraapplokidg00 oraapplokidg online
c6t0d21s2 sliced tmp-new-ncamplokidg00 ncamplokidg online
c6t0d22s2 sliced tmp-new-data1lokidg00 data1lokidg online
c6t0d55s2 sliced - - error
c6t0d56s2 sliced - - error
c6t0d60s2 sliced data1thordg00 data1thordg online
gemini:#vxdg list
NAME STATE ID
rootdg enabled 1067639375.1025.gemini
data1lokidg enabled 1092461553.2008.loki.recycled-greetings.com
data1thordg enabled 1084465351.1614.thor-new
ncamplokidg enabled 1092461211.1995.loki.recycled-greetings.com
oraapplokidg enabled 1092461176.1986.loki.recycled-greetings.com
prodthordg disabled 1091985480.1979.thor-new
whsdthordg enabled 1091985481.1982.thor-new

gemini:#vxprint -htg prodthordg


vxvm:vxprint: ERROR: Disk group prodthordg: No such disk group

Why?
What would you do to fix this?
Objective 4 Troubleshooting Exercise #3 Answer Sheet

- grep prodthordg /etc/vfstab

- lockfs -fhv /mountpoint

- fuser -kc /mountpoint

- umount /mountpoint

- vxdg deport prodthordg

- vxdctl enable

- vxdg import prodthordg
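
After the import succeeds, the volumes still need to be started and the file systems remounted; a minimal sketch, assuming a hypothetical mount point /prodthor:

- vxrecover -g prodthordg -sb

- fsck -y -F vxfs /dev/vx/rdsk/prodthordg/<volume>

- mount -F vxfs /dev/vx/dsk/prodthordg/<volume> /prodthor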


Objective 5 Troubleshooting Exercise #4

The customer upgraded from Solaris 8 to Solaris 9 using Live Upgrade, then tried to boot off the disk with
Solaris 9 and got the following message:

...
SunOS Release 5.9 Version Generic_117171-02 64-bit

Copyright 1983-2003 Sun Microsystems, Inc. All rights reserved.

Use is subject to license terms.

WARNING: vxio: incompatible kernel version (5.9), expecting 5.8 <<<<<<<<<<<<

WARNING: forceload of misc/md_trans failed

WARNING: forceload of misc/md_raid failed


...

Why?

What would you do to fix this?

Objective 5 Troubleshooting Exercise #4 Answer Sheet

Why? Veritas is still using the Solaris 8 driver.

What would you do to fix this? Had the customer pkgrm & pkgadd VRTSvxvm so Veritas would use the
correct driver.
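
A rough sketch of that reinstall (assuming the VRTSvxvm package for the installed VxVM version is available in the current directory; protect/unencapsulate the boot disk first as described above):

# pkgrm VRTSvxvm
# pkgadd -d . VRTSvxvm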
Objective 6 Troubleshooting Exercise #5

An A5200 hung, and the customer had to power-cycle the array to regain access to the disks.
RAID5 volumes are in the "DISABLED ACTIVE" state and show several subdisks in the "RCOV"
state.

Review of the /var/adm/messages file and root emails from Veritas confirms loss of access
to several disks at the same time.

How would you address this issue so as to have the best chance of recovering data?

Volume "standby"

v standby - DISABLED ACTIVE 640092800 RAID - raid5


pl findata-01 standby DISABLED ACTIVE 640115520 RAID 11/32 RW
sd bcpsdg08-01 findata-01 bcpsdg08 0 64011573 0/0 c4t6d0 ENA
sd bcpsdg09-01 findata-01 bcpsdg09 0 64011573 1/0 c4t7d0 ENA
sd bcpsdg10-01 findata-01 bcpsdg10 0 64011573 2/0 c4t8d0 ENA
sd bcpsdg11-01 findata-01 bcpsdg11 0 64011573 3/0 c4t9d0 ENA
sd bcpsdg02-01 findata-01 bcpsdg02 0 64011573 4/0 c4t10d0 ENA
sd bcpsdg18-01 findata-01 bcpsdg18 0 64011573 5/0 c6t22d0 RCOV
sd bcpsdg19-01 findata-01 bcpsdg19 0 64011573 6/0 c6t23d0 RCOV
sd bcpsdg20-01 findata-01 bcpsdg20 0 64011573 7/0 c6t24d0 RCOV
sd bcpsdg21-01 findata-01 bcpsdg21 0 64011573 8/0 c6t25d0 RCOV
sd bcpsdg22-01 findata-01 bcpsdg22 0 64011573 9/0 c6t26d0 RCOV
sd bcpsdg17-02 findata-01 bcpsdg17 11556 64011573 10/0 c6t21d0 RCOV

Volume "swrdata"

v swrdata - DETACHED REPLAY 640092800 RAID - raid5


pl swrdata-01 swrdata ENABLED ACTIVE 640115520 RAID 11/32 RW
sd bcpsdg01-01 swrdata-01 bcpsdg01 0 64011573 0/0 c4t0d0 ENA
sd bcpsdg03-01 swrdata-01 bcpsdg03 0 64011573 1/0 c4t1d0 ENA
sd bcpsdg04-01 swrdata-01 bcpsdg04 0 64011573 2/0 c4t2d0 ENA
sd bcpsdg05-01 swrdata-01 bcpsdg05 0 64011573 3/0 c4t3d0 ENA
sd bcpsdg06-01 swrdata-01 bcpsdg06 0 64011573 4/0 c4t4d0 ENA
sd bcpsdg07-01 swrdata-01 bcpsdg07 0 64011573 5/0 c4t5d0 ENA
sd bcpsdg12-01 swrdata-01 bcpsdg12 0 64011573 6/0 c6t16d0 RCOV
sd bcpsdg13-01 swrdata-01 bcpsdg13 0 64011573 7/0 c6t17d0 RCOV
sd bcpsdg14-01 swrdata-01 bcpsdg14 0 64011573 8/0 c6t18d0 RCOV
sd bcpsdg15-02 swrdata-01 bcpsdg15 0 56901744 9/0 c6t19d0 RCOV
sd bcpsdg15-01 swrdata-01 bcpsdg15 64011573 7109829 9/56901744 c6t19d0 RCOV
sd bcpsdg16-01 swrdata-01 bcpsdg16 0 64011573 10/0 c6t20d0 RCOV
pl swrdata-02 swrdata ENABLED LOG 11556 CONCAT - RW <<<< log plex
sd bcpsdg08-02 swrdata-02 bcpsdg08 64011573 11556 0 c4t6d0 ENA
Objective 6 Troubleshooting Exercise #5 Answer Sheet

NOTE: This is from actual Sun customer case 64395069, where Veritas support was involved. It also references the
Veritas doc http://seer.support.veritas.com/docs/251793.htm and a Sun internal document on RAID5 volume
recovery procedures.

See Document ID: 79162  Title: VERITAS Volume Manager: Recovering a RAID5 Volume State After a
Disk Channel Failure

How would you address this issue so as to have the best chance of recovering data?

Actions taken to TRY to recover the RAID5 volumes:


- verify the connectivity issue is resolved and the OS (i.e. format) can see all disks in a good state

#### NOTE: do not run format > analyze > refresh on a RAID5 ####

- run vxconfigd -k -x cleartempdir and vxdctl enable to verify we have the most up-to-date info


- verify vxdisk list shows all relevant disks online now
- check vxprint -htrLg bcpsdg output to see if still have volumes with several subdisks in RCOV state
- run vxtask -l -g <dg> list to verify no current recovery processes running on it
- first try /etc/vx/bin/vxreattach -br <<< give it a few minutes to complete
- run vxtask -l -g <dg> list to see if any resyncs started
- if no change

# vxvol -g <diskgroup> -o delayrecover start <volume>

If this does not work, then use the force option. The document states not to force start the volume, but this is a
situation where VxVM lost access to several disks at the same time, which is not covered in the document.

# vxvol -g <diskgroup> -f -o delayrecover start <volume>

delayrecover

Does not perform any plex revive operations when starting a volume. Instead, the volume and any plexes
are enabled. This may leave some stale plexes, and may leave a mirrored volume in a special read-
writeback (NEEDSYNC) recover state that performs limited plex recovery for each read to the volume.

- check file system integrity

# fsck -y -F <file sys type> /dev/vx/rdsk/<diskgroup>/<volume>

- mount and check data in file system

- if the data is good, then a full backup is recommended immediately

NOTE: For RAID 0 volumes and RAID5 volumes without LOG plexes,
the use of the "-f" option of the vxvol command will be necessary
to start the volume; otherwise the log plex will have to be disassociated and then reattached
after starting the volume.

NOTE: For RAID 1+0 (known as Striped Pro and Concat Pro) volume types,
you will need to run "vxrecover -s -E <volume>" AFTER all underlying layered
volumes have been repaired.
$LOGNAME@sunbcpsunx001$PWD# vxvol -g bcpsdg -o delayrecover start standby
vxvm:vxvol: ERROR: Volume standby is not startable; some subdisks are unusable and the parity is stale
vxvm:vxvol: ERROR: Volume standby is invalid
$LOGNAME@sunbcpsunx001$PWD#

Veritas advised the customer to force start the volume by running:

vxvol -g <dg> -f -o delayrecover start standby

v standby - ENABLED NEEDSYNC 640092800 RAID - raid5


pl findata-01 standby ENABLED ACTIVE 640115520 RAID 11/32 RW
sd bcpsdg08-01 findata-01 bcpsdg08 0 64011573 0/0 c4t6d0 ENA
sd bcpsdg09-01 findata-01 bcpsdg09 0 64011573 1/0 c4t7d0 ENA
sd bcpsdg10-01 findata-01 bcpsdg10 0 64011573 2/0 c4t8d0 ENA
sd bcpsdg11-01 findata-01 bcpsdg11 0 64011573 3/0 c4t9d0 ENA
sd bcpsdg02-01 findata-01 bcpsdg02 0 64011573 4/0 c4t10d0 ENA
sd bcpsdg18-01 findata-01 bcpsdg18 0 64011573 5/0 c6t22d0 ENA
sd bcpsdg19-01 findata-01 bcpsdg19 0 64011573 6/0 c6t23d0 ENA
sd bcpsdg20-01 findata-01 bcpsdg20 0 64011573 7/0 c6t24d0 ENA
sd bcpsdg21-01 findata-01 bcpsdg21 0 64011573 8/0 c6t25d0 ENA
sd bcpsdg22-01 findata-01 bcpsdg22 0 64011573 9/0 c6t26d0 ENA
sd bcpsdg17-02 findata-01 bcpsdg17 11556 64011573 10/0 c6t21d0 ENA

The customer ran fsck, then cleared the flag, and was able to mount the vxfs file system in this volume.
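
A minimal sketch of that check-and-mount, assuming a hypothetical mount point /standby:

# fsck -y -F vxfs /dev/vx/rdsk/bcpsdg/standby
# mount -F vxfs /dev/vx/dsk/bcpsdg/standby /standby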

root@sunbcpsunx001/standby#vxvol -g bcpsdg -o delayrecover start swrdata


vxvm:vxvol: ERROR: Volume swrdata is not startable; Raid5 plex does not map the entire volume length

disassociated the log plex

then ran vxvol -g <dg> -f -o delayrecover start swrdata

the volume is now ENABLED NEEDSYNC

attached log plex
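
A sketch of those dissociate/start/reattach steps, assuming the plex names from the vxprint output above:

# vxplex -g bcpsdg dis swrdata-02
# vxvol -g bcpsdg -f -o delayrecover start swrdata
# vxplex -g bcpsdg att swrdata swrdata-02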

v swrdata - ENABLED NEEDSYNC 640092800 RAID - raid5


pl swrdata-01 swrdata ENABLED ACTIVE 640115520 RAID 11/32 RW
sd bcpsdg01-01 swrdata-01 bcpsdg01 0 64011573 0/0 c4t0d0 ENA
sd bcpsdg03-01 swrdata-01 bcpsdg03 0 64011573 1/0 c4t1d0 ENA
sd bcpsdg04-01 swrdata-01 bcpsdg04 0 64011573 2/0 c4t2d0 ENA
sd bcpsdg05-01 swrdata-01 bcpsdg05 0 64011573 3/0 c4t3d0 ENA
sd bcpsdg06-01 swrdata-01 bcpsdg06 0 64011573 4/0 c4t4d0 ENA
sd bcpsdg07-01 swrdata-01 bcpsdg07 0 64011573 5/0 c4t5d0 ENA
sd bcpsdg12-01 swrdata-01 bcpsdg12 0 64011573 6/0 c6t16d0 ENA
sd bcpsdg13-01 swrdata-01 bcpsdg13 0 64011573 7/0 c6t17d0 ENA
sd bcpsdg14-01 swrdata-01 bcpsdg14 0 64011573 8/0 c6t18d0 ENA
sd bcpsdg15-02 swrdata-01 bcpsdg15 0 56901744 9/0 c6t19d0 ENA
sd bcpsdg15-01 swrdata-01 bcpsdg15 64011573 7109829 9/56901744 c6t19d0 ENA
sd bcpsdg16-01 swrdata-01 bcpsdg16 0 64011573 10/0 c6t20d0 ENA
pl swrdata-02 swrdata ENABLED LOG 11556 CONCAT - RW
sd bcpsdg08-02 swrdata-02 bcpsdg08 64011573 11556 0 c4t6d0 ENA

fsck'd the file system and mounted it


Objective 7 Troubleshooting Exercise #6

vxvm:vxvol: ERROR: Volume <vol_name> has no CLEAN or non-volatile ACTIVE plexes

Starting a volume reports the error above. The vxprint output shows that the plexes for the volume are in
"DISABLED RECOVER" state.

What would you do to fix this?

Here is an example:
The disk group dg01 has 2 volumes, apps and home. Trying to start all the volumes reported the following
error:

# vxvol -g dg01 startall


vxvm:vxvol: ERROR: Volume home has no CLEAN or non-volatile ACTIVE plexes

# vxprint -g dg01 -th <== Showed the following


...
dg dg01 2 2 123000 1021305687.1295.obp1

dm appsdisk c0t1d0s2 sliced 11555 71112735 -


dm appsmirror c1t1d0s2 sliced 11555 71112735 -
dm homedisk c2t0d0s2 sliced 14135 35349424 -
dm homemirror c3t0d0s2 sliced 14135 35349424 -

v apps - ENABLED ACTIVE 70840320 SELECT - fsgen


pl apps-01 apps ENABLED ACTIVE 70841169 CONCAT - RW
sd appsdisk-01 apps-01 appsdisk 0 70841169 0 c0t1d0s2 ENA
pl apps-02 apps ENABLED ACTIVE 70841169 CONCAT - RW
sd appsmirror-01 apps-02 appsmirror 0 70841169 0 c1t1d0s2 ENA

v home - DISABLED ACTIVE 16896000 SELECT - fsgen


pl home-01 home DISABLED RECOVER 16897232 CONCAT - RW
sd homedisk-01 home-01 homedisk 0 16897232 0 c2t0d0 ENA
pl home-02 home DISABLED RECOVER 16897232 CONCAT - RW
sd homemirror-01 home-02 homemirror 0 16897232 0 c3t0d0 ENA
Objective 7 Troubleshooting Exercise #6 Answer Sheet

- If both of the plexes in a volume are in RECOVER state, it is recommended to stop one plex, force start
the volume, and check the file system data. Similarly, stop the volume, stop the previously started plex, start
the previously stopped plex, force start the volume, and check the file system data. Compare which data is
more recent and change the state of the plexes accordingly to synchronize.

# vxmend -g dg01 -o force off home-02

# vxmend -g dg01 -o force off home-01

# vxmend -g dg01 on home-01

# vxmend -g dg01 fix clean home-01

The volume will then start successfully using the cleaned plex:

# vxrecover -s -g dg01 home

# fsck -F <fs type> /dev/vx/rdsk/<diskgroup>/<volume>

# mount -F <fs type> /dev/vx/dsk/<diskgroup>/<volume> /mountpoint

check data

check other side by

# vxvol -g dg01 stop home

# vxmend -g dg01 -o force off home-01

# vxmend -g dg01 on home-02

# vxmend -g dg01 fix clean home-02

The volume will then start successfully using the cleaned plex:

# vxrecover -s -g dg01 home

# fsck -F <fs type> /dev/vx/rdsk/<diskgroup>/<volume>

# mount -F <fs type> /dev/vx/dsk/<diskgroup>/<volume> /mountpoint

check data

Then decide which side to resync from.
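
Once the good side is chosen (say home-01), a sketch of the final resync, following the same vxmend/vxrecover pattern used above:

# vxvol -g dg01 stop home
# vxmend -g dg01 on home-01          <== if home-01 is currently offline
# vxmend -g dg01 fix clean home-01
# vxmend -g dg01 fix stale home-02   <== mark the out-of-date plex for resync
# vxrecover -s -g dg01 home          <== starts the volume and resyncs home-02 from home-01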


Objective 8 Troubleshooting Exercise #7

Issue:
- getting the "...Disk public region is too small..." error when bringing a new/replacement
disk back under Veritas control

...
Select a removed or failed disk [<disk>,list,q,?] disk01

Select disk device to initialize [<address>,list,q,?] c0t11d0

The requested operation is to initialize disk device c0t11d0 and


to then use that device to replace the removed or failed disk
disk01 in disk group crackle_bkup.

Continue with operation? [y,n,q,?] (default: y)

Use fastresync for plex synchronization? [y,n,q,?] (default: n)

Use a default private region length for the disk?


[y,n,q,?] (default: y)

Replacement of disk disk01 in group crackle_bkup with device c0t11d0


failed.
vxvm:vxdg: ERROR: associating disk-media disk01 with c0t11d0s2:
Disk public region is too small

Replace a different disk? [y,n,q,?] (default: n)

Why?

What would you do to fix this?


# vxprint -htrLg crackle_bkup

dg crackle_bkup default default 126000 1101067439.1662.crackle

dm disk01 - - - - REMOVED
dm disk02 c0t13d0s2 sliced 3590 17674902 -

v swapvol2 - DISABLED ACTIVE 35352576 SELECT - fsgen


pl swapvol2-01 swapvol2 DISABLED REMOVED 35353395 CONCAT - RW
sd disk01-01 swapvol2-01 disk01 3590 17678493 0 - RMOV
sd disk02-01 swapvol2-01 disk02 0 17674902 17678493 c0t13d0 ENA

# prtvtoc /dev/rdsk/c0t11d0s2
* /dev/rdsk/c0t11d0s2 partition map
*
...
* First Sector Last
* Partition Tag Flags Sector Count Sector Mount Directory
2 5 00 0 17682084 17682083
3 15 01 3591 3591 7181
4 14 01 7182 17674902 17682083

# prtvtoc /dev/rdsk/c0t13d0s2
* /dev/rdsk/c0t13d0s2 partition map
...
* First Sector Last
* Partition Tag Flags Sector Count Sector Mount Directory
2 5 00 0 17682084 17682083
3 15 01 3591 3591 7181
4 14 01 7182 17674902 17682083

# vxconfigd -k -x cleartempdir

# /usr/sbin/vxdctl enable

# vxdisk -o alldgs list


DEVICE TYPE DISK GROUP STATUS
c0t10d0s2 sliced rootdisk rootdg online
c0t11d0s2 sliced - - online
c0t12d0s2 sliced rootmirr rootdg online
c0t13d0s2 sliced disk02 crackle_bkup online
c10t0d1s2 sliced CRACKLE01 CRACKLEdg online
c10t0d2s2 sliced CRACKLE02 CRACKLEdg online
...
c10t0d42s2 sliced CRACKLE42 CRACKLEdg online
- - disk01 crackle_bkup removed was:c0t11d0s2
Objective 8 Troubleshooting Exercise #7 Answer Sheet

2 options to recover

#1 Document ID: 73801

Title: VERITAS Volume Manager 3.2 or higher: During replacement of a disk, the error "Disk public region
is too small" is displayed

# vxdg -g <dg> -p -k adddisk rootdisk=c#t#d#

Using this command can be helpful, especially on a root disk, because the subdisk offsets get skewed,
disallowing the same subdisks to be created on the new disk in the same locations as on the previous disk.
The '-p' option disregards the offsets of the subdisks' starting points, allowing the subdisks to be created on
the replacement disk.

#2 Since the data was already gone, we were able to get around the issue by:

- deleting volume swapvol2, then disk01

- then reinitializing disk01 back into the disk group

- then recreating the volume

# vxedit -g crackle_bkup -fr rm swapvol2

# vxedit -g crackle_bkup rm disk01

# vxdisk list
DEVICE TYPE DISK GROUP STATUS
c0t10d0s2 sliced rootdisk rootdg online
c0t11d0s2 sliced - - error
c0t12d0s2 sliced rootmirr rootdg online
c0t13d0s2 sliced disk02 crackle_bkup online
c10t0d1s2 sliced CRACKLE01 CRACKLEdg online
...

# vxprint -htg crackle_bkup


dg crackle_bkup default default 126000 1101067439.1662.crackle
dm disk02 c0t13d0s2 sliced 3590 17674902 -

# ./vxdisksetup -i c0t11d0
# prtvtoc /dev/rdsk/c0t11d0s2
* Partition Tag Flags Sector Count Sector Mount Directory
2 5 00 0 17682084 17682083
3 15 01 3591 3591 7181
4 14 01 7182 17674902 17682083

# vxdg -g crackle_bkup adddisk disk01=c0t11d0

# vxassist -g crackle_bkup maxsize


Maximum volume size: 35348480 (17260Mb)

# vxassist -g crackle_bkup make swapvol2 35348480

# vxprint -htg crackle_bkup


dg crackle_bkup default default 126000 1101067439.1662.crackle
dm disk01 c0t11d0s2 sliced 3334 17674902 -
dm disk02 c0t13d0s2 sliced 3590 17674902 -
v swapvol2 - ENABLED ACTIVE 35348480 SELECT - fsgen
pl swapvol2-01 swapvol2 ENABLED ACTIVE 35349804 CONCAT - RW
sd disk01-01 swapvol2-01 disk01 0 17674902 0 c0t11d0 ENA
sd disk02-01 swapvol2-01 disk02 0 17674902 17674902 c0t13d0 ENA

# vxdisk list
DEVICE TYPE DISK GROUP STATUS
c0t10d0s2 sliced rootdisk rootdg online
c0t11d0s2 sliced disk01 crackle_bkup online
c0t12d0s2 sliced rootmirr rootdg online
...
Objective 9 Information on Split Brain
Cannot get disk(s) to re-attach (VxVM 4.X); vxreattach -r c#t#d# gives the following message:

VxVM vxdg ERROR V-5-1-10127 associating disk-media vistadg101 with c#t#d#s2:


Serial Split Brain detected. Run vxsplitlines
OR
VxVM vxconfigd NOTICE V-5-0-33 Split Brain. da id is 0.1, while dm id is 0.0 for DM <dg name>
VxVM vxdg ERROR V-5-1-587 Disk group <dg name>: import failed: Serial Split Brain detected. Run
vxsplitlines

Veritas Document ID: 269233 http://support.veritas.com/docs/269233


How to recover from a serial split brain

Details:
Background:
The Serial Split Brain condition arises because VERITAS Volume Manager (tm) increments the serial ID in
the disk media record of each imported disk in all the disk group configurations on those disks. A new serial
(SSB) ID has been included as part of the new disk group version=110 in Volume Manager 4 to assist with
recovery of the disk group from this condition. The value that is stored in the configuration database
represents the serial ID that the disk group expects a disk to have. The serial ID that is stored in a disk's
private region is considered to be its actual value.

If some disks went missing from the disk group (due to physical disconnection or power failure) and those
disks were imported by another host, the serial IDs for the disks in their copies of the configuration
database, and also in each disk's private region, are updated separately on that host. When the disks are
subsequently reimported into the original shared disk group, the actual serial IDs on the disks do not agree
with the expected values from the configuration copies on other disks in the disk group.

The disk group cannot be reimported because the databases do not agree on the actual and expected serial
IDs. You must choose which configuration database to use. This is a true serial split brain condition, which
Volume Manager cannot correct automatically. In this case, the disk group import fails, and the vxdg utility
outputs error messages similar to the following before exiting:

VxVM vxconfigd NOTICE V-5-0-33 Split Brain. da id is 0.1, while dm id is 0.0 for DM <dg name>
VxVM vxdg ERROR V-5-1-587 Disk group <dg name>: import failed: Serial Split Brain detected. Run
vxsplitlines

The import does not succeed even if you specify the -f flag to vxdg.

Although it is usually possible to resolve this conflict by choosing the version of the configuration database
with the highest valued configuration ID (shown as config_tid in the output from the vxprivutil dumpconfig
<device>), this may not be the correct thing to do in all circumstances.

To resolve conflicting configuration information, you must decide which disk contains the correct version
of the disk group configuration database. To assist you in doing this, you can run the vxsplitlines command
to show the actual serial ID on each disk in the disk group and the serial ID that was expected from the
configuration database. For each disk, the command also shows the vxdg command that you must run to
select the configuration database copy on that disk as being the definitive copy to use for importing the disk
group.
The following example shows the result of a JBOD losing access to one of the four disks in the disk group:
# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
c2t1d0s2 auto:cdsdisk - (dgD280silo1) online
c2t2d0s2 auto:cdsdisk d2 dgD280silo1 online
c2t3d0s2 auto:cdsdisk d3 dgD280silo1 online
c2t9d0s2 auto:cdsdisk d4 dgD280silo1 online
- - d1 dgD280silo1 failed was:c2t1d0s2

# vxreattach -c c2t1d0s2
dgD280silo1 d1

# vxreattach -br c2t1d0s2


VxVM vxdg ERROR V-5-1-10127 associating disk-media d1 with c2t1d0s2:
Serial Split Brain detected. Run vxsplitlines

# vxsplitlines -g dgD280silo1
VxVM vxsplitlines NOTICE V-5-2-2708 There are 1 pools.
The Following are the disks in each pool. Each disk in the same pool
has config copies that are similar.
VxVM vxsplitlines INFO V-5-2-2707 Pool 0.
c2t1d0s2 d1

To see the configuration copy from this disk, issue /etc/vx/diag.d/vxprivutil dumpconfig /dev/vx/dmp/c2t1d0s2

To import the disk group with the config copy from this disk, use the following command:

/usr/sbin/vxdg -o selectcp=1092974296.21.gopal import dgD280silo1

The following are the disks whose serial split brain (SSB) IDs don't match in this configuration copy:
d2

At this stage, you need to gain confidence prior to running the recommended command by generating the
following outputs:

In this example, the disk group split so that one disk (d1) appears to be on one side of the split. You can
specify the -c option to vxsplitlines to print detailed information about each of the disk IDs from the
configuration copy on a disk specified by its disk access name:

# vxsplitlines -g dgD280silo1 -c c2t3d0s2

VxVM vxsplitlines INFO V-5-2-2701 DANAME(DMNAME) || Actual SSB || Expected SSB


VxVM vxsplitlines INFO V-5-2-2700 c2t1d0s2( d1 ) || 0.0 || 0.0 ssb ids match
VxVM vxsplitlines INFO V-5-2-2700 c2t2d0s2( d2 ) || 0.1 || 0.0 ssb ids don't match
VxVM vxsplitlines INFO V-5-2-2700 c2t3d0s2( d3 ) || 0.1 || 0.0 ssb ids don't match
VxVM vxsplitlines INFO V-5-2-2700 c2t9d0s2( d4 ) || 0.1 || 0.0 ssb ids don't match
VxVM vxsplitlines INFO V-5-2-2706

This output can be verified by using vxdisk list on each disk. A summary is shown below:

# vxdisk list c2t1d0s2 # vxdisk list c2t3d0s2


Device: c2t1d0s2 Device: c2t3d0s2
disk: name= id=1092974296.21.gopal disk: name=d3 id=1092974311.23.gopal
group: name=dgD280silo1 id=1095738111.20.gopal group: name=dgD280silo1 id=1095738111.20.gopal
ssb: actual_seqno=0.0 ssb: actual_seqno=0.1
# vxdisk list c2t2d0s2 # vxdisk list c2t9d0s2
Device: c2t2d0s2 Device: c2t9d0s2
disk: name=d2 id=1092974302.22.gopal disk: name=d4 id=1092974318.24.gopal
group: name=dgD280silo1 id=1095738111.20.gopal group: name=dgD280silo1 id=1095738111.20.gopal
ssb: actual_seqno=0.1 ssb: actual_seqno=0.1

Note that even though some disks' SSB IDs might match, that does not necessarily mean those disks' config
copies have all the changes. In some other configuration copies, those disks' SSB IDs might not match.

To see the configuration from this disk, run


/etc/vx/diag.d/vxprivutil dumpconfig /dev/rdsk/c2t3d0s2 > dumpconfig_c2t3d0s2

If the other disks in the disk group were not imported on another host, Volume Manager resolves the
conflicting values of the serial IDs by using the version of the configuration database from the disk with the
greatest value for the updated ID (shown as update_tid in the output from /etc/vx/diag.d/vxprivutil
dumpconfig /dev/rdsk/<device>).

In this example, looking through the dumpconfig, there are the following update_tid and ssbid values:

dumpconfig c2t3d0s2 dumpconfig c2t9d0s2


config:tid=0.1058 config:tid=0.1059
dm d1 dm d1
update_tid=0.1038 update_tid=0.1059
ssbid=0.0 ssbid=0.0
dm d2 dm d2
update_tid=0.1038 update_tid=0.1038
ssbid=0.0 ssbid=0.0
dm d3 dm d3
update_tid=0.1053 update_tid=0.1053
ssbid=0.0 ssbid=0.0
dm d4 dm d4
update_tid=0.1053 update_tid=0.1059
ssbid=0.0 ssbid=0.1

Use the output from dumpconfig for each disk to determine which configuration to use; the config can be
viewed in vxprint format by running:

# cat dumpconfig_c2t3d0s2 | vxprint -D - -ht

Before deciding on which option to use for import, ensure the disk group is currently in a valid deport
state:

# vxdisk -o alldgs list


DEVICE TYPE DISK GROUP STATUS
c2t1d0s2 auto:cdsdisk - (dgD280silo1) online
c2t2d0s2 auto:cdsdisk - (dgD280silo1) online
c2t3d0s2 auto:cdsdisk - (dgD280silo1) online
c2t9d0s2 auto:cdsdisk - (dgD280silo1) online
At this stage, your knowledge of how the serial split brain condition came about may be a little clearer and
you should have chosen a configuration from one disk to be used to import the disk group. In this example,
the following command imports the disk group using the configuration copy from d2:
# /usr/sbin/vxdg -o selectcp=1092974302.22.gopal import dgD280silo1
Once the disk group has been imported, Volume Manager resets the serial IDs to 0 for the imported disks.
The actual and expected serial IDs for any disks in the disk group that are not imported at this time remain
unchanged.

# vxprint -htg dgD280silo1


dg dgD280silo1 default default 26000 1095738111.20.gopal
dm d1 c2t1d0s2 auto 2048 35838448 -
dm d2 c2t2d0s2 auto 2048 35838448 -
dm d3 c2t3d0s2 auto 2048 35838448 -
dm d4 c2t9d0s2 auto 2048 35838448 -

v SNAP-vol_db2silo1.1 - DISABLED ACTIVE 1024000 SELECT - fsgen


pl SNAP-vol_db2silo1.1-01 SNAP-vol_db2silo1.1 DISABLED ACTIVE 1024000 STRIPE 2/1024 RW
sd d3-01 SNAP-vol_db2silo1.1-01 d3 0 512000 0/0 c2t3d0 ENA
sd d4-01 SNAP-vol_db2silo1.1-01 d4 0 512000 1/0 c2t9d0 ENA
dc SNAP-vol_db2silo1.1_dco SNAP-vol_db2silo1.1 SNAP-vol_db2silo1.1_dcl
v SNAP-vol_db2silo1.1_dcl - DISABLED ACTIVE 544 SELECT - gen
pl SNAP-vol_db2silo1.1_dcl-01 SNAP-vol_db2silo1.1_dcl DISABLED ACTIVE 544 CONCAT - RW
sd d3-02 SNAP-vol_db2silo1.1_dcl-01 d3 512000 544 0 c2t3d0 ENA

v orgvol - DISABLED ACTIVE 1024000 SELECT - fsgen


pl orgvol-01 orgvol DISABLED ACTIVE 1024000 STRIPE 2/128 RW
sd d1-01 orgvol-01 d1 0 512000 0/0 c2t1d0 ENA
sd d2-01 orgvol-01 d2 0 512000 1/0 c2t2d0 ENA

# vxrecover -g dgD280silo1 -sb


# mount -F vxfs /dev/vx/dsk/dgD280silo1/orgvol /orgvol
UX:vxfs mount: ERROR: V-3-21268: /dev/vx/dsk/dgD280silo1/orgvol is corrupted. needs checking
# fsck -F vxfs /dev/vx/rdsk/dgD280silo1/orgvol
log replay in progress
replay complete - marking super-block as CLEAN
# mount -F vxfs /dev/vx/dsk/dgD280silo1/orgvol /orgvol
# df /orgvol
/orgvol (/dev/vx/dsk/dgD280silo1/orgvol): 1019102 blocks 127386 files
# vxdisk -o alldgs list

DEVICE TYPE DISK GROUP STATUS


c2t1d0s2 auto:cdsdisk d1 dgD280silo1 online
c2t2d0s2 auto:cdsdisk d2 dgD280silo1 online
c2t3d0s2 auto:cdsdisk d3 dgD280silo1 online
c2t9d0s2 auto:cdsdisk d4 dgD280silo1 online
# vxprint -htg dgD280silo1

dg dgD280silo1 default default 26000 1095738111.20.gopal

dm d1 c2t1d0s2 auto 2048 35838448 -


dm d2 c2t2d0s2 auto 2048 35838448 -
dm d3 c2t3d0s2 auto 2048 35838448 -
dm d4 c2t9d0s2 auto 2048 35838448 -

v SNAP-vol_db2silo1.1 - ENABLED ACTIVE 1024000 SELECT SNAP-vol_db2silo1.1-01 fsgen


pl SNAP-vol_db2silo1.1-01 SNAP-vol_db2silo1.1 ENABLED ACTIVE 1024000 STRIPE 2/1024 RW
sd d3-01 SNAP-vol_db2silo1.1-01 d3 0 512000 0/0 c2t3d0 ENA
sd d4-01 SNAP-vol_db2silo1.1-01 d4 0 512000 1/0 c2t9d0 ENA
dc SNAP-vol_db2silo1.1_dco SNAP-vol_db2silo1.1 SNAP-vol_db2silo1.1_dcl
v SNAP-vol_db2silo1.1_dcl - ENABLED ACTIVE 544 SELECT - gen
pl SNAP-vol_db2silo1.1_dcl-01 SNAP-vol_db2silo1.1_dcl ENABLED ACTIVE 544 CONCAT - RW
sd d3-02 SNAP-vol_db2silo1.1_dcl-01 d3 512000 544 0 c2t3d0 ENA

v orgvol - ENABLED ACTIVE 1024000 SELECT orgvol-01 fsgen


pl orgvol-01 orgvol ENABLED ACTIVE 1024000 STRIPE 2/128 RW
sd d1-01 orgvol-01 d1 0 512000 0/0 c2t1d0 ENA
sd d2-01 orgvol-01 d2 0 512000 1/0 c2t2d0 ENA
Supported Features in Disk Group Version 110

There is limited documentation on SSB. The preceding tech note is meant to clarify
some of the features.

The only reference to this subject is included in the VERITAS Volume Manager™ 4.0
Administrator’s Guide for Solaris.

The table below is from the Disk group support section.

Disk Group Version: 110

New Features Supported:
- Cross-platform Data Sharing (CDS)
- Device Discovery Layer (DDL) 2.0
- Disk Group Configuration Backup and Restore
- Elimination of rootdg as a Special Disk Group
- Full-Sized and Space-Optimized Instant Snapshots
- Intelligent Storage Provisioning (ISP)
- Serial Split Brain Detection
- Volume Sets (Multiple Device Support for VxFS)

Previous Version Features Supported: 20, 30, 40, 50, 60, 70, 80, 90

For more detail you can download the

VERITAS Volume Manager (tm) 4.0 Administrator's Guide for Solaris

From the following URL:

Public Link: http://support.veritas.com/docs/265469

When dealing with issues where the SSB id is referenced, you need to look at
the following 2 outputs:
# vxdg list sharedg

Group: sharedg
dgid: 1105340390.50.jacaranda
import-id: 33792.53
flags: shared
version: 110
alignment: 8192 (bytes)
local-activation: shared-write
cluster-actv-modes: jacaranda=sw sundar=sw
ssb: on
detach-policy: global
copies: nconfig=default nlog=default
config: seqno=0.1089 permlen=0 free=0 templen=0 loglen=0

# vxdisk list sdc3t12d0s5


Device: sdc3t12d0s5
devicetag: sdc3t12d0s5
type: simple
clusterid: sunjac
disk: name=sharedg125 id=1104366468.5014.jacaranda
group: name=sharedg id=1105340390.50.jacaranda
flags: online ready private foreign shared autoimport imported
pubpaths: block=/dev/simple/sdc3t12d0s5 char=/dev/rsimple/sdc3t12d0s5
version: 2.2
iosize: min=512 (bytes) max=2048 (blocks)
public: slice=0 offset=1025 len=4194551 disk_offset=4199692
private: slice=0 offset=1 len=1024 disk_offset=4199692
update: time=1108883968 seqno=0.48
ssb: actual_seqno=0.0
headers: 0 248
configs: count=1 len=727
logs: count=1 len=110
Defined regions:
config priv 000017-000247[000231]: copy=01 offset=000000 enabled
config priv 000249-000744[000496]: copy=01 offset=000231 enabled
log priv 000745-000854[000110]: copy=01 offset=000000 enabled
Case Example
new_quest# ./vxreattach -r c2t0d0
VxVM vxdg ERROR V-5-1-10127 associating disk-media vistadg101 with c2t0d0s2:
Serial Split Brain detected. Run vxsplitlines

vxdiskadm option 5 does not list the disk, thus it cannot be re-attached.

The customer is using VxVM 4.0 and the disks were initialized using the CDS format; see below.

Device: c2t0d0s2
devicetag: c2t0d0
type: auto
hostid: quest
disk: name= id=1086208967.1288.quest
group: name=vistadg1 id=1086209872.2043.quest
info: format=cdsdisk,privoffset=256,pubslice=2,privslice=2 <---cds format
flags: online ready private autoconfig autoimport
pubpaths: block=/dev/vx/dmp/c2t0d0s2 char=/dev/vx/rdmp/c2t0d0s2
version: 3.1
iosize: min=512 (bytes) max=256 (blocks)
public: slice=2 offset=2304 len=8391936 disk_offset=0
private: slice=2 offset=256 len=2048 disk_offset=0
update: time=1096861924 seqno=0.39
ssb: actual_seqno=0.0
headers: 0 240
configs: count=1 len=1280
logs: count=1 len=192
Defined regions:
config priv 000048-000239[000192]: copy=01 offset=000000 enabled
config priv 000256-001343[001088]: copy=01 offset=000192 enabled
log priv 001344-001535[000192]: copy=01 offset=000000 enabled
lockrgn priv 001536-001679[000144]: part=00 offset=000000
Multipathing information:
numpaths: 1
c2t0d0s2 state=enabled

Document Audience: INTERNAL


Document ID: 64313344
Title: NCN/G/vxvm4.0/ cannot re-attach disks/Serial Split Brain detected : Sun Fire 880
Synopsis: NCN/G/vxvm4.0/ cannot re-attach disks/Serial Split Brain detected; Solution: walked cu
thru procedure/see sol notes
Update Date: Wed Oct 20 21:40:29 MDT 2004

Case Number: 64313344


Geo Code (AMER, APAC, EMEA): AMER
Synopsis: NCN/G/vxvm4.0/ cannot re-attach disks/Serial Split Brain detected
Product: Sun Fire 880
HW Platform: Sun Fire 880
OS Version: Solaris 9 (S9)
Engineer: JOE KAUFMANN
Engineer Phone: 781-442-8635
Open Date: 20-Oct-2004 15:57:51

Close Date: 20-Oct-2004 21:40:08


Resolution: walked cu thru procedure/see sol notes
Disk group: kintanadg

dg kintanadg default default 28000 1087515421.304.quest


dm kintanadg01 c3t0d100s2 auto 2048 8391936 -
dm kintana06 c2t0d147s2 auto 2048 8391936 -
dm kintana07 c2t0d148s2 auto 2048 8391936 -
dm kintana08 - - - - NODEVICE
dm kintana12 - - - - NODEVICE
dm kintana13 - - - - NODEVICE
dm kintana14 c2t0d132s2 auto 2048 8391936 -
dm kintana15 c2t0d133s2 auto 2048 8391936 -
.
v vol01 - ENABLED ACTIVE 17825792 SELECT - fsgen
pl vol01-01 vol01 ENABLED ACTIVE 17825792 CONCAT - RW
sd kintanadg01-01 vol01-01 kintanadg01 0 8391936 0 c3t0d100 ENA
sd kintanadg02-01 vol01-01 kintanadg02 0 8391936 8391936 c3t0d101 ENA
sd kintanadg03-01 vol01-01 kintanadg03 0 1041920 16783872 c3t0d102 ENA

v vol02 - DISABLED ACTIVE 54525952 SELECT - fsgen


pl vol02-01 vol02 DISABLED NODEVICE 54525952 CONCAT - RW
sd kintana08-01 vol02-01 kintana08 0 8391936 0 - NDEV
sd kintana09-01 vol02-01 kintana09 0 8391936 8391936 c2t0d127 ENA
sd kintana10-01 vol02-01 kintana10 0 8391936 16783872 c2t0d128 ENA
sd kintana11-01 vol02-01 kintana11 0 8391936 25175808 c2t0d129 ENA
sd kintana12-01 vol02-01 kintana12 0 4171264 33567744 - NDEV
sd kintana02-01 vol02-01 kintana02 0 8393472 37739008 c2t0d149 ENA
sd kintana01-01 vol02-01 kintana01 0 8393472 46132480 c2t0d150 ENA

v vol03 - DISABLED ACTIVE 11534336 SELECT - fsgen


pl vol03-01 vol03 DISABLED NODEVICE 11534336 CONCAT - RW
sd kintana13-01 vol03-01 kintana13 0 8391936 0 - NDEV
sd kintana14-01 vol03-01 kintana14 0 3142400 8391936 c2t0d132 ENA

v vol04 - DISABLED ACTIVE 151091200 SELECT - fsgen


pl vol04-01 vol04 DISABLED NODEVICE 151091200 CONCAT - RW
sd kintana12-02 vol04-01 kintana12 4171264 4220672 0 - NDEV
sd kintana14-02 vol04-01 kintana14 3142400 5249536 4220672 c2t0d132 ENA
sd kintana15-01 vol04-01 kintana15 0 8391936 9470208 c2t0d133 ENA
sd kintana16-01 vol04-01 kintana16 0 8391936 17862144 c2t0d134 ENA
sd kintana17-01 vol04-01 kintana17 0 8391936 26254080 c2t0d135 ENA
sd kintana18-01 vol04-01 kintana18 0 8391936 34646016 c2t0d136 ENA
sd kintana19-01 vol04-01 kintana19 0 8391936 43037952 c2t0d137 ENA
sd kintana20-01 vol04-01 kintana20 0 8391936 51429888 c2t0d138 ENA
sd kintana21-01 vol04-01 kintana21 0 8391936 59821824 c2t0d139 ENA
sd kintana22-01 vol04-01 kintana22 0 8391936 68213760 c2t0d140 ENA
sd kintana23-01 vol04-01 kintana23 0 8391936 76605696 c2t0d141 ENA
sd kintana24-01 vol04-01 kintana24 0 8391936 84997632 c2t0d142 ENA
sd kintana25-01 vol04-01 kintana25 0 8391936 93389568 c2t0d143 ENA
sd kintana03-01 vol04-01 kintana03 0 8391936 101781504 c2t0d144 ENA
sd kintana04-01 vol04-01 kintana04 0 8391936 110173440 c2t0d145 ENA
sd kintana05-01 vol04-01 kintana05 0 8391936 118565376 c2t0d146 ENA
sd kintana06-01 vol04-01 kintana06 0 8391936 126957312 c2t0d147 ENA
sd kintana07-01 vol04-01 kintana07 0 8391936 135349248 c2t0d148 ENA
sd kintanadg03-02 vol04-01 kintanadg03 1041920 7350016 143741184 c3t0d102 ENA
Solution

Procedure to get the customer's volumes back:

1. vxdg list vistadg1

config disk c2t0d12s2 copy 1 len=1280 state=clean online <--- located a clean
online disk

2. vxdisk list c2t0d12s2 <---- do a vxdisk list on clean online disk and get
disk id#

e.g. disk id=1086209041.1375.quest

3. vxdg deport vistadg1 (deport entire disk group, make sure all volumes are
unmounted first)

4. vxdg -o selectcp=<disk id string from above> import <dgname> <--- import the
disk group using the clean disk id#

e.g. vxdg -o selectcp=1086209041.1375.quest import vistadg1

5. force start each individual volume of the disk group

e.g. vxvol -f -g vistadg1 start vol01

6. done
Many thanks to all who helped (reviewed & contributed), including:
- Dave Graham
- Mike Young
- Spencer Borck
- Larry Tyburczy
- Joel Garrett
- Jeff Huff
