Overview
Server Configuration
Disk Belonging to a Volume Group Removed
Corrupted LVM Meta Data
Disk Permanently Removed
Conclusion
Overview
Logical Volume Management (LVM) provides a high-level, flexible view of a server's disk
storage. Though robust, problems can occur. The purpose of this document is to review the
recovery process when a disk is missing or damaged, and then apply that process to plausible
examples. When a disk is accidentally removed or damaged in some way that adversely affects
the logical volume, the general recovery process is:

1. Replace the failed or missing disk.
2. Restore the missing disk's UUID.
3. Restore the LVM meta data.
4. Repair the file system.
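In terms of LVM commands, the four steps map roughly onto the following sketch. The
angle-bracket names are placeholders; the concrete commands for each scenario are shown in
the examples below.

pvcreate --uuid <missing-uuid> /dev/<replacement-disk>
vgcfgrestore <volume-group>
vgchange -ay <volume-group>
fsck /dev/<volume-group>/<logical-volume>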
If you are planning on removing a disk from the server that belongs to a volume group, you
should refer to the LVM HOWTO before doing so.
Server Configuration
In all three examples, a server running SUSE Linux Enterprise Server 10 with Service Pack 1
(SLES10 SP1) and LVM version 2 will be used. The examples use a volume group called
"sales" with a linear logical volume called "reports". The logical volume and its mount point are
shown below. Substitute your own mount points and volume names as needed to match your
specific environment.
ls-lvm:~ # cat /proc/partitions
major minor  #blocks  name
   8     0   4194304  sda
   8     1    514048  sda1
   8     2   1052257  sda2
   8     3         1  sda3
   8     5    248976  sda5
   8    16    524288  sdb
   8    32    524288  sdc
   8    48    524288  sdd

ls-lvm:~ # pvcreate /dev/sda5 /dev/sdb /dev/sdc /dev/sdd
  Physical volume "/dev/sda5" successfully created
  Physical volume "/dev/sdb" successfully created
  Physical volume "/dev/sdc" successfully created
  Physical volume "/dev/sdd" successfully created
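For reference, a volume group and logical volume like the ones used in these examples could
be created as follows. This is only a sketch of the assumed setup; the ext3 file system type and
the mkdir/mount steps are assumptions, not part of the original configuration.

ls-lvm:~ # vgcreate sales /dev/sda5 /dev/sdb /dev/sdc /dev/sdd
ls-lvm:~ # lvcreate -L 1G -n reports sales
ls-lvm:~ # mkfs.ext3 /dev/sales/reports
ls-lvm:~ # mkdir -p /sales/reports
ls-lvm:~ # mount /dev/sales/reports /sales/reports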
ls-lvm:~ # vgs
  VG    #PV #LV #SN Attr   VSize VFree
  sales   4   1   0 wz--n- 1.72G 740.00M
ls-lvm:~ # lvs
  LV      VG    Attr   LSize Origin Snap%
  reports sales -wi-ao 1.00G
ls-lvm:~ # df -h /sales/reports
Filesystem                 Size  Avail Use% Mounted on
/dev/mapper/sales-reports  1008M  925M   4% /sales/reports
Disk Belonging to a Volume Group Removed

If you are automatically mounting /dev/sales/reports, then the server will fail to boot and will
prompt you to log in as root to fix the problem. Log in as root, comment the /dev/sales/reports
entry out of /etc/fstab, and reboot.
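The /etc/fstab entry being commented out would look something like the following line; the
file system type and mount options shown here are illustrative assumptions:

/dev/sales/reports   /sales/reports   ext3   defaults   1 2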
The LVM symptom is a missing sales volume group. Typing cat /proc/partitions
confirms that the server is missing one of its disks.
ls-lvm:~ # cat /proc/partitions
major minor  #blocks  name
   8     0   4194304  sda
   8     1    514048  sda1
   8     2   1052257  sda2
   8     3         1  sda3
   8     5    248976  sda5
   8    16    524288  sdb
   8    32    524288  sdc
ls-lvm:~ # pvscan
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  PV /dev/sda5        VG sales   lvm2 [240.00 MB / 240.00 MB free]
  PV /dev/sdb         VG sales   lvm2 [508.00 MB / 0    free]
  PV unknown device   VG sales   lvm2 [508.00 MB / 0    free]
  PV /dev/sdc         VG sales   lvm2 [508.00 MB / 500.00 MB free]
  Total: 4 [1.72 GB] / in use: 4 [1.72 GB] / in no VG: 0 [0   ]
Solution:

1. Fortunately, the meta data and file system on the disk that was /dev/sdc are intact, so the
recovery is to just put the disk back.
2. Reboot the server.
3. The /etc/init.d/boot.lvm start script will scan and activate the volume group at boot time.
4. Don't forget to uncomment the /dev/sales/reports device in the /etc/fstab file.
5. If this procedure does not work, then you may have corrupt LVM meta data.
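Assuming the disk went back in cleanly, a quick check after the reboot confirms that the
volume group and file system are available again (a sketch; output omitted):

ls-lvm:~ # vgs sales
ls-lvm:~ # df -h /sales/reports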
Corrupted LVM Meta Data

Symptom 1:

This symptom is the result of a minor change in the meta data; in fact, only three bytes were
overwritten. Since only a portion of the meta data was damaged, LVM can compare its internal
checksum against the meta data on the device and know it is wrong. There is enough meta data
for LVM to know that the "sales" volume group and its devices exist, but the meta data is
unreadable.
ls-lvm:~ # pvscan
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  PV /dev/sda5  VG sales   lvm2 [240.00 MB / 240.00 MB free]
  PV /dev/sdb   VG sales   lvm2 [508.00 MB / 0    free]
  PV /dev/sdc   VG sales   lvm2 [508.00 MB / 0    free]
  PV /dev/sdd   VG sales   lvm2 [508.00 MB / 500.00 MB free]
  Total: 4 [1.72 GB] / in use: 4 [1.72 GB] / in no VG: 0 [0   ]
Notice that pvscan shows all devices present and associated with the sales volume group. It is
not a device UUID that cannot be found, but the volume group UUID.
Solution 1:
1. Since the disk was never removed, leave it as is.
2. There were no device UUID errors, so don't attempt to restore the UUIDs.
3. This is a good candidate to just try restoring the LVM meta data.
ls-lvm:~ # vgcfgrestore sales
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  Restored volume group sales
ls-lvm:~ # vgchange -ay sales
  1 logical volume(s) in volume group "sales" now active
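At this point the logical volume is active again and can be remounted; a short verification
sketch, using the mount point from the server configuration:

ls-lvm:~ # mount /dev/sales/reports /sales/reports
ls-lvm:~ # df -h /sales/reports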
Symptom 2:
Minor damage to the LVM meta data is easily fixed with vgcfgrestore. If the meta data is gone,
or severely damaged, then LVM will consider that disk an "unknown device." If the volume
group contains only one disk, then the volume group and its logical volumes will simply be
gone. In this case the symptom is the same as if the disk had been accidentally removed, with
the exception of the device name: since /dev/sdc was not actually removed from the server, the
devices are still labeled a through d.
ls-lvm:~ # pvscan
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  PV /dev/sda5        VG sales   lvm2 [240.00 MB / 240.00 MB free]
  PV /dev/sdb         VG sales   lvm2 [508.00 MB / 0    free]
  PV unknown device   VG sales   lvm2 [508.00 MB / 0    free]
  PV /dev/sdd         VG sales   lvm2 [508.00 MB / 500.00 MB free]
  Total: 4 [1.72 GB] / in use: 4 [1.72 GB] / in no VG: 0 [0   ]
Solution 2:

1. First, replace the disk. Most likely the disk is already there, just damaged.
2. Since the UUID on /dev/sdc is not there, a vgcfgrestore will not work.

ls-lvm:~ # vgcfgrestore sales
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.

3. Comparing the output of cat /proc/partitions and pvscan shows the missing
device is /dev/sdc, and pvscan shows which UUID it needs for that device. So, copy
and paste the UUID that pvscan shows for /dev/sdc.

ls-lvm:~ # pvcreate --uuid 56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu /dev/sdc
  Physical volume "/dev/sdc" successfully created
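With the UUID recreated on /dev/sdc, vgcfgrestore will now succeed and the volume group
can be reactivated; the commands are the same as in the next example:

ls-lvm:~ # vgcfgrestore sales
  Restored volume group sales
ls-lvm:~ # vgchange -ay sales
  1 logical volume(s) in volume group "sales" now active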
Disk Permanently Removed

If the disk is permanently gone and cannot be put back, the volume group can still be
reconstructed on a replacement disk using the backed up meta data:

1. Add a replacement disk to the server. Make sure the disk is empty.
2. Create the LVM meta data on the new disk using the old disk's UUID that pvscan
displays.
ls-lvm:~ # pvcreate --uuid 56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu /dev/sdc
  Physical volume "/dev/sdc" successfully created
3. Restore the backup copy of the LVM meta data for the sales volume group.
ls-lvm:~ # vgcfgrestore sales
Restored volume group sales
ls-lvm:~ # vgscan
Reading all physical volumes. This may take a while...
Found volume group "sales" using metadata type lvm2
ls-lvm:~ # vgchange -ay sales
1 logical volume(s) in volume group "sales" now active
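With the volume group active again, the last step of the general recovery process is to repair
and remount the file system. A minimal sketch; the ext3 file system type is an assumption, and
any data that resided on the replaced disk will of course be missing:

ls-lvm:~ # e2fsck -y /dev/sales/reports
ls-lvm:~ # mount /dev/sales/reports /sales/reports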
Conclusion

LVM by default keeps backup copies of its meta data for all LVM devices. These backup files
are stored in /etc/lvm/backup and /etc/lvm/archive. If a disk is removed or the meta data gets
damaged in some way, it can easily be restored, provided you have backups of the meta data.
This is why it is highly recommended never to turn off LVM's auto backup feature. Even if a
disk is permanently removed from the volume group, it can be reconstructed, and often the
remaining data on the file system can be recovered.
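A backup of the meta data can also be taken manually at any time with vgcfgbackup; a quick
way to verify that backups exist (a sketch; output abbreviated):

ls-lvm:~ # vgcfgbackup sales
  Volume group "sales" successfully backed up.
ls-lvm:~ # ls /etc/lvm/backup /etc/lvm/archive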
Device Mapper Multipath

Initial Configuration
Set user_friendly_names so that the devices are created as /dev/mapper/mpath[n], and comment
out the default blacklist that excludes all devices.
# vim /etc/multipath.conf
#blacklist {
#    devnode "*"
#}

defaults {
    user_friendly_names yes
    path_grouping_policy multibus
}
Load the needed module and enable and start the multipathd service.
# modprobe dm-multipath
# /etc/init.d/multipathd start
# chkconfig multipathd on
Print out the multipathed devices. The -v2 and -v3 options control the verbosity of the output.

# multipath -v2

or

# multipath -v3
Configuration
Configure the device type in the config file.

# cat /sys/block/sda/device/vendor
HP
# cat /sys/block/sda/device/model
HSV200

# vim /etc/multipath.conf
devices {
    device {
        vendor               "HP"
        product              "HSV200"
        path_grouping_policy multibus
        no_path_retry        "5"
    }
}
# cat /var/lib/multipath/bindings
# Format:
# alias wwid
#
mpath0 3600508b400070aac0000900000080000
# vim /etc/multipath.conf
multipaths {
    multipath {
        wwid                 3600508b400070aac0000900000080000
        alias                mpath0
        path_grouping_policy multibus
        path_checker         readsector0
        path_selector        "round-robin 0"
        failback             "5"
        rr_weight            priorities
        no_path_retry        "5"
    }
}
Add devices that should not be multipathed to the blacklist (e.g. local RAID devices or LVM
volume groups).
# vim /etc/multipath.conf
devnode_blacklist {
    devnode "^cciss!c[0-9]d[0-9]*"
    devnode "^vg*"
}
Show the configured multipaths.
# dmsetup ls --target=multipath
mpath0  (253, 1)

# multipath -ll
mpath0 (3600508b400070aac0000900000080000) dm-1 HP,HSV200
[size=10G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=4][active]
 \_ 0:0:0:1 sda 8:0  [active][ready]
 \_ 0:0:1:1 sdb 8:16 [active][ready]
 \_ 1:0:0:1 sdc 8:32 [active][ready]
 \_ 1:0:1:1 sdd 8:48 [active][ready]
Create a partition on the multipathed device, map it with kpartx, then create and mount a file
system on it.

# fdisk /dev/sda
# kpartx -a /dev/mapper/mpath0
# ls /dev/mapper/*
mpath0  mpath0p1
# mkfs.ext3 /dev/mapper/mpath0p1
# mount /dev/mapper/mpath0p1 /mnt/san

After that, /dev/mapper/mpath0p1 is the first partition on the multipathed device.
The md multipathing solution is only a failover solution, which means that only one path is
used at a time and no load balancing is performed.
Start the MD Multipathing Service
# chkconfig mdmpd on
# /etc/init.d/mdmpd start
On the first Node (if it is a shared device)
Partition the disk and set the partition type to Linux raid autodetect (fd).
# fdisk /dev/sdt

Disk /dev/sdt: 42.9 GB, 42949672960 bytes
64 heads, 32 sectors/track, 40960 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdt1               1       40960    41943024   fd  Linux raid autodetect
# partprobe
Bind multiple paths together
# vim /etc/mdadm.conf
# Multiple Paths to RAC SAN
DEVICE /dev/sd[qrst]1
ARRAY /dev/md4 uuid=b13031b5:64c5868f:1e68b273:cb36724e
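If the array does not exist yet, it must first be created from the individual path devices. A
sketch, assuming the four paths from the DEVICE line above:

# mdadm --create /dev/md4 --level=multipath --raid-devices=4 /dev/sdq1 /dev/sdr1 /dev/sds1 /dev/sdt1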
# cat /proc/mdstat
On the second Node (Copy the /etc/mdadm.conf from the first node)
# mdadm -As
# cat /proc/mdstat