
Recovering a Lost LVM Volume Disk

Novell Cool Solutions: AppNote


By Jason Record
Posted: 27 Jul 2007
Contents:

Overview
Server Configuration
Disk Belonging to a Volume Group Removed
Corrupted LVM Meta Data
Disk Permanently Removed
Conclusion

Overview
Logical Volume Management (LVM) provides a high level, flexible view of a server's disk
storage. Though robust, problems can occur. The purpose of this document is to review the
recovery process when a disk is missing or damaged, and then apply that process to plausible
examples. When a disk is accidentally removed or damaged in some way that adversely affects
the logical volume, the general recovery process is:
1. Replace the failed or missing disk
2. Restore the missing disk's UUID
3. Restore the LVM meta data
4. Repair the file system on the LVM device

The recovery process will be demonstrated in three specific cases:


1. A disk belonging to a logical volume group is removed from the server
2. The LVM meta data is damaged or corrupted
3. One disk in a multi-disk volume group has been permanently removed
This article discusses how to restore the LVM meta data. This is a risky proposition. If you
restore invalid information, you can lose all the data on the LVM device. An important part of
LVM recovery is having backups of the meta data to begin with, and knowing how it is supposed
to look when everything is running smoothly. LVM keeps backup and archive copies of its meta
data in /etc/lvm/backup and /etc/lvm/archive. Back up these directories regularly, and be familiar
with their contents. You should also manually back up the LVM meta data with vgcfgbackup
before starting any maintenance projects on your LVM volumes.
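
For example, a manual backup before maintenance could look like this (a minimal sketch; the exact confirmation message may differ between LVM2 releases):

ls-lvm:~ # vgcfgbackup sales
  Volume group "sales" successfully backed up.
ls-lvm:~ # ls /etc/lvm/backup /etc/lvm/archive

The file /etc/lvm/backup/sales is the copy that vgcfgrestore will fall back on later in this article.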

If you are planning to remove a disk that belongs to a volume group from the server, you
should refer to the LVM HOWTO before doing so; the usual sequence is sketched below.
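
The usual sequence, sketched here against the example volume group described below (treat it as an outline of the HOWTO procedure, not a tested recipe), is to migrate the data off the disk, drop it from the volume group, and then wipe its LVM label:

ls-lvm:~ # pvmove /dev/sdd          # move allocated extents to the remaining physical volumes
ls-lvm:~ # vgreduce sales /dev/sdd  # remove the disk from the volume group
ls-lvm:~ # pvremove /dev/sdd        # wipe the LVM label so the disk is no longer a physical volume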

Server Configuration
In all three examples, a server with SUSE Linux Enterprise Server 10 with Service Pack 1
(SLES10 SP1) will be used with LVM version 2. The examples will use a volume group called
"sales" with a linear logical volume called "reports". The logical volume and it's mount point are
shown below. You will need to substitute your mount points and volume names as needed to
match your specific environment.
ls-lvm:~ # cat /proc/partitions
major minor  #blocks  name
   8     0   4194304  sda
   8     1    514048  sda1
   8     2   1052257  sda2
   8     3         1  sda3
   8     5    248976  sda5
   8    16    524288  sdb
   8    32    524288  sdc
   8    48    524288  sdd

ls-lvm:~ # pvcreate /dev/sda5 /dev/sd[b-d]
  Physical volume "/dev/sda5" successfully created
  Physical volume "/dev/sdb" successfully created
  Physical volume "/dev/sdc" successfully created
  Physical volume "/dev/sdd" successfully created
ls-lvm:~ # vgcreate sales /dev/sda5 /dev/sd[b-d]
  Volume group "sales" successfully created
ls-lvm:~ # lvcreate -n reports -L +1G sales
  Logical volume "reports" created
ls-lvm:~ # pvscan
  PV /dev/sda5   VG sales   lvm2 [240.00 MB / 240.00 MB free]
  PV /dev/sdb    VG sales   lvm2 [508.00 MB / 0    free]
  PV /dev/sdc    VG sales   lvm2 [508.00 MB / 0    free]
  PV /dev/sdd    VG sales   lvm2 [508.00 MB / 500.00 MB free]
  Total: 4 [1.72 GB] / in use: 4 [1.72 GB] / in no VG: 0 [0   ]

ls-lvm:~ # vgs
  VG    #PV #LV #SN Attr   VSize VFree
  sales   4   1   0 wz--n- 1.72G 740.00M
ls-lvm:~ # lvs
  LV      VG    Attr   LSize Origin Snap%  Move Log Copy%
  reports sales -wi-ao 1.00G

ls-lvm:~ # mount | grep sales
/dev/mapper/sales-reports on /sales/reports type ext3 (rw)
ls-lvm:~ # df -h /sales/reports
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/sales-reports 1008M   33M  925M   4% /sales/reports
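
The file system on the logical volume was created and mounted before the listing above was captured; those commands are not part of the original output, but they would have looked roughly like this (ext3 matches the mount output, the rest is assumed):

ls-lvm:~ # mkfs.ext3 /dev/sales/reports
ls-lvm:~ # mkdir -p /sales/reports
ls-lvm:~ # mount /dev/sales/reports /sales/reports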

Disk Belonging to a Volume Group Removed


Removing a disk that belongs to a volume group from the server may sound a bit strange,
but with Storage Area Networks (SANs) or fast-paced schedules, it happens.
Symptom:
The first thing you may notice when the server boots is a series of messages like:
"Couldn't find all physical volumes for volume group sales."
"Couldn't find device with uuid '56pgEk-0zLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'."
'Volume group "sales" not found'

If /dev/sales/reports is mounted automatically from /etc/fstab, the server will fail to boot and
prompt you to log in as root to fix the problem.

1. Type root's password.
2. Edit the /etc/fstab file.
3. Comment out the line with /dev/sales/reports (see the example entry below).
4. Reboot.
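
The entry to comment out would look something like the following; the mount options shown here are assumptions, not values taken from the original server:

# /etc/fstab
#/dev/sales/reports   /sales/reports   ext3   defaults   1 2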
The LVM symptom is a missing sales volume group. Typing cat /proc/partitions
confirms the server is missing one of its disks.
ls-lvm:~ # cat /proc/partitions
major minor  #blocks  name
   8     0   4194304  sda
   8     1    514048  sda1
   8     2   1052257  sda2
   8     3         1  sda3
   8     5    248976  sda5
   8    16    524288  sdb
   8    32    524288  sdc
ls-lvm:~ # pvscan
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  PV /dev/sda5        VG sales   lvm2 [240.00 MB / 240.00 MB free]
  PV /dev/sdb         VG sales   lvm2 [508.00 MB / 0    free]
  PV unknown device   VG sales   lvm2 [508.00 MB / 0    free]
  PV /dev/sdc         VG sales   lvm2 [508.00 MB / 500.00 MB free]
  Total: 4 [1.72 GB] / in use: 4 [1.72 GB] / in no VG: 0 [0   ]

Solution:
1. Fortunately, the meta data and file system on the disk that was /dev/sdc are intact.
2. So the recovery is simply to put the disk back.
3. Reboot the server (or rescan by hand, as sketched below).
4. The /etc/init.d/boot.lvm start script will scan and activate the volume group at boot time.
5. Don't forget to uncomment the /dev/sales/reports device in the /etc/fstab file.
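
If rebooting is not convenient, the same result can usually be reached by rescanning manually once the disk has been reattached; a minimal sketch, assuming the disk comes back under its old device name:

ls-lvm:~ # pvscan                    # confirm all physical volumes are visible again
ls-lvm:~ # vgchange -ay sales        # activate the volume group
ls-lvm:~ # mount /dev/sales/reports /sales/reports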

If this procedure does not work, then you may have corrupt LVM meta data.

Corrupted LVM Meta Data


The LVM meta data does not get corrupted very often; but when it does, the file system on the
LVM logical volume should also be considered unstable. The goal is to recover the LVM
volume, and then check file system integrity.
Symptom 1:
Attempting to activate the volume group gives the following:
ls-lvm:~ # vgchange -ay sales
/dev/sdc: Checksum error
/dev/sdc: Checksum error
/dev/sdc: Checksum error
/dev/sdc: Checksum error
/dev/sdc: Checksum error
/dev/sdc: Checksum error
/dev/sdc: Checksum error
/dev/sdc: Checksum error
/dev/sdc: Checksum error
/dev/sdc: Checksum error
/dev/sdc: Checksum error
/dev/sdc: Checksum error
/dev/sdc: Checksum error
Couldn't read volume group metadata.
Volume group sales metadata is inconsistent
Volume group for uuid not found: m4Cg2vkBVSGe1qSMNDf63v3fDHqN4uEkmWoTq5TpHpRQwmnAGD18r44OshLdHj05
0 logical volume(s) in volume group "sales" now active

This symptom is the result of a minor change in the meta data; in fact, only three bytes were
overwritten. Since only a portion of the meta data was damaged, LVM can compare its internal
checksum against the meta data on the device and know it is wrong. There is enough meta data
for LVM to know that the "sales" volume group and its devices exist, but are unreadable.
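
Because LVM2 stores its meta data as plain text near the start of each physical volume, you can look at the damaged area before touching anything; a sketch, assuming the default meta data location at the beginning of the device:

ls-lvm:~ # dd if=/dev/sdc bs=1k count=256 2>/dev/null | strings | less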
ls-lvm:~ # pvscan
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  /dev/sdc: Checksum error
  PV /dev/sda5   VG sales   lvm2 [240.00 MB / 240.00 MB free]
  PV /dev/sdb    VG sales   lvm2 [508.00 MB / 0    free]
  PV /dev/sdc    VG sales   lvm2 [508.00 MB / 0    free]
  PV /dev/sdd    VG sales   lvm2 [508.00 MB / 500.00 MB free]
  Total: 4 [1.72 GB] / in use: 4 [1.72 GB] / in no VG: 0 [0   ]

Notice that pvscan shows all devices present and associated with the sales volume group. It is not
a device UUID that cannot be found, but the volume group UUID.
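
Before restoring anything, it is worth checking which meta data backups LVM has on hand; a quick sketch:

ls-lvm:~ # vgcfgrestore --list sales   # list the archived meta data versions for this volume group
ls-lvm:~ # ls -l /etc/lvm/backup/sales /etc/lvm/archive/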
Solution 1:
1. Since the disk was never removed, leave it as is.
2. There were no device UUID errors, so don't attempt to restore the UUIDs.
3. This is a good candidate to just try restoring the LVM meta data.
ls-lvm:~ # vgcfgrestore sales
/dev/sdc: Checksum error
/dev/sdc: Checksum error
Restored volume group sales
ls-lvm:~ # vgchange -ay sales
1 logical volume(s) in volume group "sales" now active


ls-lvm:~ # pvscan
  PV /dev/sda5   VG sales   lvm2 [240.00 MB / 240.00 MB free]
  PV /dev/sdb    VG sales   lvm2 [508.00 MB / 0    free]
  PV /dev/sdc    VG sales   lvm2 [508.00 MB / 0    free]
  PV /dev/sdd    VG sales   lvm2 [508.00 MB / 500.00 MB free]
  Total: 4 [1.72 GB] / in use: 4 [1.72 GB] / in no VG: 0 [0   ]

4. Run a file system check on /dev/sales/reports.

ls-lvm:~ # e2fsck /dev/sales/reports
e2fsck 1.38 (30-Jun-2005)
/dev/sales/reports: clean, 961/131072 files, 257431/262144 blocks
ls-lvm:~ # mount /dev/sales/reports /sales/reports/
ls-lvm:~ # df -h /sales/reports/
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/sales-reports 1008M  990M     0 100% /sales/reports

Symptom 2:
Minor damage to the LVM meta data is easily fixed with vgcfgrestore. If the meta data is gone,
or severely damaged, then LVM will consider that disk an "unknown device." If the volume
group contains only one disk, then the volume group and its logical volumes will simply be
gone. In this case the symptom is the same as if the disk had been accidentally removed, with the
exception of the device names. Since /dev/sdc was not actually removed from the server, the
devices are still labeled sda through sdd.
ls-lvm:~ # pvscan
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
  PV /dev/sda5        VG sales   lvm2 [240.00 MB / 240.00 MB free]
  PV /dev/sdb         VG sales   lvm2 [508.00 MB / 0    free]
  PV unknown device   VG sales   lvm2 [508.00 MB / 0    free]
  PV /dev/sdd         VG sales   lvm2 [508.00 MB / 500.00 MB free]
  Total: 4 [1.72 GB] / in use: 4 [1.72 GB] / in no VG: 0 [0   ]

Solution 2:
1. First, replace the disk. Most likely the disk is already there, just damaged.
2. Since the UUID on /dev/sdc is not there, a vgcfgrestore will not work.
ls-lvm:~ # vgcfgrestore sales
Couldn't find device with uuid '56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.
Couldn't find all physical volumes for volume group sales.
Restore failed.

3. Comparing the output of cat /proc/partitions and pvscan shows the missing
device is /dev/sdc, and pvscan shows which UUID it needs for that device. So, copy
and paste the UUID that pvscan shows for /dev/sdc.
ls-lvm:~ # pvcreate --uuid 56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu /dev/sdc
  Physical volume "/dev/sdc" successfully created
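
Note that more recent LVM2 releases may refuse to reuse a UUID unless they are also pointed at the archived meta data; if that happens, a hedged variant of the same step is:

ls-lvm:~ # pvcreate --uuid 56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu \
             --restorefile /etc/lvm/backup/sales /dev/sdc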

4. Restore the LVM meta data.

ls-lvm:~ # vgcfgrestore sales
Restored volume group sales
ls-lvm:~ # vgscan
Reading all physical volumes. This may take a while...
Found volume group "sales" using metadata type lvm2
ls-lvm:~ # vgchange -ay sales
1 logical volume(s) in volume group "sales" now active

5. Run a file system check on /dev/sales/reports.

ls-lvm:~ # e2fsck /dev/sales/reports
e2fsck 1.38 (30-Jun-2005)
/dev/sales/reports: clean, 961/131072 files, 257431/262144 blocks
ls-lvm:~ # mount /dev/sales/reports /sales/reports/
ls-lvm:~ # df -h /sales/reports
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/sales-reports 1008M  990M     0 100% /sales/reports

Disk Permanently Removed


This is the most severe case. Obviously if the disk is gone and unrecoverable, the data on that
disk is likewise unrecoverable. This is a great time to feel good knowing you have a solid backup
to rely on. However, if the good feelings are gone, and there is no backup, how do you recover as
much data as possible from the remaining disks in the volume group? No attempt will be made to
address the data on the unrecoverable disk; this topic will be left to the data recovery experts.
Symptom:
The symptom will be the same as Symptom 2 in the Corrupted LVM Meta Data section above.
You will see errors about an "unknown device" and missing device with UUID.
Solution:

1. Add a replacement disk to the server. Make sure the disk is empty.
2. Create the LVM meta data on the new disk using the old disk's UUID that pvscan
displays.
ls-lvm:~ # pvcreate --uuid 56ogEk-OzLS-cKBc-z9vJ-kP65-DUBI-hwZPSu /dev/sdc
  Physical volume "/dev/sdc" successfully created

3. Restore the backup copy of the LVM meta data for the sales volume group.
ls-lvm:~ # vgcfgrestore sales
Restored volume group sales
ls-lvm:~ # vgscan
Reading all physical volumes. This may take a while...
Found volume group "sales" using metadata type lvm2
ls-lvm:~ # vgchange -ay sales
1 logical volume(s) in volume group "sales" now active

4. Run a file system check to rebuild the file system.


ls-lvm:~ # e2fsck -y /dev/sales/reports
e2fsck 1.38 (30-Jun-2005)
--snip--
Free inodes count wrong for group #5 (16258, counted=16384).
Fix? yes
Free inodes count wrong (130111, counted=130237).
Fix? yes
/dev/sales/reports: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sales/reports: 835/131072 files (5.7% non-contiguous), 137213/262144 blocks

5. Mount the file system and recover as much data as possible.


NOTE: If the missing disk contained the beginning of the file system, then the file system's
superblock will be missing, and you will need to rebuild it or use an alternate superblock (a
sketch of that approach for ext3 follows below). Restoring a file system superblock is otherwise
outside the scope of this article; please refer to your file system's documentation.
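
For ext2/ext3, one hedged way to locate and use a backup superblock looks like this; the block number is only an example, so use one reported for your file system:

ls-lvm:~ # mke2fs -n /dev/sales/reports       # -n prints the backup superblock locations without creating a file system (use the same mkfs options as the original)
ls-lvm:~ # e2fsck -b 32768 /dev/sales/reports # retry the check against one of the listed backup superblocks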

Conclusion
LVM by default keeps backup copies of its meta data for all LVM devices. These backup files
are stored in /etc/lvm/backup and /etc/lvm/archive. If a disk is removed or the meta data gets
damaged in some way, it can be easily restored, provided you have backups of the meta data. This is
why it is highly recommended never to turn off LVM's auto backup feature. Even if a disk is
permanently removed from the volume group, the volume group can be reconstructed, and often the
remaining data on the file system recovered.

Linux SAN Multipathing Using Device Mapper

Posted by Parthiban Ponnusamy
There are a lot of SAN multipathing solutions on Linux at the moment. Two of them are discussed in this blog. The
first one is device mapper multipathing, which is a failover and load balancing solution with a lot of configuration
options. The second one (mdadm multipathing) is just a failover solution that requires manually re-enabling a failed
path. The advantage of mdadm multipathing is that it is very easy to configure.
Before using a multipathing solution in a production environment on Linux, it is also important to determine whether
the solution is supported with the hardware in use. For example, HP doesn't support the Device Mapper Multipathing
solution on their servers yet.

Device Mapper Multipathing


Procedure for configuring the system with DM-Multipath:
1. Install the device-mapper-multipath rpm
2. Edit the multipath.conf configuration file:
   o comment out the default blacklist
   o change any of the existing defaults as needed
3. Start the multipath daemons
4. Create the multipath device with the multipath command

Install Device Mapper Multipath


# rpm -ivh device-mapper-multipath-0.4.7-8.el5.i386.rpm
warning: device-mapper-multipath-0.4.7-8.el5.i386.rpm: Header V3 DSA signature:
Preparing...              ########################################### [100%]
   1:device-mapper-multipath########################################### [100%]

Initial Configuration
Set user_friendly_names. The devices will be created as /dev/mapper/mpath[n]. Comment out the default blacklist.

# vim /etc/multipath.conf
#blacklist {
#        devnode "*"
#}

defaults {
        user_friendly_names yes
        path_grouping_policy multibus
}
Load the needed module and enable the startup service.

# modprobe dm-multipath
# /etc/init.d/multipathd start
# chkconfig multipathd on
Print out the multipathed device.

# multipath -v2
or
# multipath -v3

Configuration
Configure device type in config file.

# cat /sys/block/sda/device/vendor
HP
# cat /sys/block/sda/device/model
HSV200
# vim /etc/multipath.conf
devices {
        device {
                vendor                  "HP"
                product                 "HSV200"
                path_grouping_policy    multibus
                no_path_retry           "5"
        }
}

Configure multipath device in config file.

# cat /var/lib/multipath/bindings
# Format:
# alias wwid
#
mpath0 3600508b400070aac0000900000080000
# vim /etc/multipath.conf
multipaths {
        multipath {
                wwid                    3600508b400070aac0000900000080000
                alias                   mpath0
                path_grouping_policy    multibus
                path_checker            readsector0
                path_selector           "round-robin 0"
                failback                "5"
                rr_weight               priorities
                no_path_retry           "5"
        }
}

Put devices that should not be multipathed on the blacklist (e.g. local RAID devices, volume groups).

# vim /etc/multipath.conf
devnode_blacklist {
devnode "^cciss!c[0-9]d[0-9]*"
devnode "^vg*"
}
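
Newer multipath-tools releases use a blacklist section in place of devnode_blacklist and can also exclude devices by WWID; a sketch, with the WWID as a placeholder you would replace with the local disk's actual WWID:

blacklist {
        wwid <wwid-of-local-disk>
        devnode "^cciss!c[0-9]d[0-9]*"
}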
Show Configured Multipaths.

# dmsetup ls --target=multipath
mpath0 (253, 1)
# multipath -ll
mpath0 (3600508b400070aac0000900000080000) dm-1 HP,HSV200
[size=10G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=4][active]
 \_ 0:0:0:1 sda 8:0   [active][ready]
 \_ 0:0:1:1 sdb 8:16  [active][ready]
 \_ 1:0:0:1 sdc 8:32  [active][ready]
 \_ 1:0:1:1 sdd 8:48  [active][ready]

Format and mount Device


fdisk cannot be used directly on /dev/mapper/[dev_name] devices. Instead, use fdisk on the underlying disk, then
execute the following command so that device-mapper multipath creates a /dev/mapper/mpath[n] device for
the partition.

# fdisk /dev/sda
# kpartx -a /dev/mapper/mpath0
# ls /dev/mapper/*
mpath0 mpath0p1
# mkfs.ext3 /dev/mapper/mpath0p1
# mount /dev/mapper/mpath0p1 /mnt/san
After that /dev/mapper/mpath0p1 is the first partition on the multipathed device.
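
To have the file system come back after a reboot, the multipath partition can be added to /etc/fstab like any other block device; a sketch (mount point as used above, the remaining fstab fields are assumptions):

# vim /etc/fstab
/dev/mapper/mpath0p1   /mnt/san   ext3   defaults   0 0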

Multipathing with mdadm on Linux

The md multipathing solution is only a failover solution, which means that only one path is used at a time and no
load balancing is performed.
Start the MD Multipathing Service

# chkconfig mdmpd on
# /etc/init.d/mdmpd start
On the first Node (if it is a shared device)
Make Label on Disk

# fdisk /dev/sdt
Disk /dev/sdt: 42.9 GB, 42949672960 bytes
64 heads, 32 sectors/track, 40960 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdt1               1       40960    41943024   fd  Linux raid autodetect

# partprobe
Bind multiple paths together

# mdadm --create /dev/md4 --level=multipath --raid-devices=4 /dev/sdq1 /dev/sdr1 /dev/sds1 /dev/sdt1
Get UUID

# mdadm --detail /dev/md4
   UUID : b13031b5:64c5868f:1e68b273:cb36724e
Set md configuration in config file

# vim /etc/mdadm.conf
# Multiple Paths to RAC SAN
DEVICE /dev/sd[qrst]1
ARRAY /dev/md4 uuid=b13031b5:64c5868f:1e68b273:cb36724e
# cat /proc/mdstat
On the second Node (Copy the /etc/mdadm.conf from the first node)

# mdadm -As
# cat /proc/mdstat

Restore a failed path

# mdadm /dev/md1 -f /dev/sdt1 -r /dev/sdt1 -a /dev/sdt1
This marks the path as faulty (-f), removes it from the array (-r), and re-adds it (-a), after which md starts using it again.
