Lacks a feature ZFS developers know as "block pointer rewrite functionality" (planned
to be developed); without it, ZFS currently suffers from not being able to do:
o Pool defragmentation (the COW techniques used in ZFS mitigate the problem)
o Pool resizing
o Data compression (re-applying)
o Adding an additional device to a RAID-Z/Z2/Z3 pool to increase its size (however,
it is possible to replace in sequence each one of the disks composing a RAID-Z/Z2/Z3)
NOT A CLUSTERED FILESYSTEM like Lustre, GFS or OCFS2
No data healing if used on a single device (corruption can still be detected); the workaround
is to force data duplication on the drive
No support for TRIM (SSD devices)
Solaris/OpenIndiana
Oracle Solaris: remains the de facto reference platform for the ZFS implementation: ZFS on
this platform is now considered mature and usable on production systems. Solaris 11
uses ZFS even for its "system" pool (aka rpool). A great advantage of this: it is now quite
easy to revert the effect of a patch, on the condition that a snapshot has been taken just before
applying it. In the "good old" times of Solaris 10 and before, reverting a patch was
possible but could be tricky and complex when possible at all. ZFS is far from being new in
Solaris, as it takes its roots in 2005 to be, then, integrated in Solaris 10 6/06, introduced in
June 2006.
OpenIndiana: used to share the same code base as Oracle Solaris; however, they now follow a different path since Oracle announced
the discontinuation of OpenSolaris (August 13th, 2010). Like Oracle Solaris, OpenIndiana
uses ZFS for its system pool. The illumos kernel ZFS support lags a bit behind Oracle: it
supports zpool version 28 whereas Oracle Solaris 11 has zpool version 31 support, data
encryption being supported at zpool version 30.
*BSD
FreeBSD: ZFS has been present in FreeBSD since FreeBSD 7 (zpool version 6) and FreeBSD
can boot on a ZFS volume (zfsboot). ZFS support has been vastly enhanced in FreeBSD
8.x (8.2 supports zpool version 15, 8.3 supports version 28), FreeBSD 9 and
FreeBSD 10 (both support zpool version 28). ZFS in FreeBSD is now considered
fully functional and mature. FreeBSD derivatives such as the popular FreeNAS take
benefit of ZFS and integrate it in their tools. In the case of the latter, it has, for
example, support for zvols through its Web management interface (FreeNAS >= 8.0.1).
NetBSD: the ZFS port was started as a GSoC project in 2007 and has been present in
the NetBSD mainstream since 2009 (zpool version 13).
OpenBSD: No ZFS support yet and none planned until Oracle changes some policies,
according to the project FAQ.
ZFS alternatives
WAFL seems to have severe limitations [1] (the document is not dated); an interesting
article also lies here.
BTRFS is advancing every week but it still lacks features like the capability of
emulating a virtual block device over a storage pool (zvol), and built-in support for RAID-5/6 is not complete yet (cf. the Btrfs mailing list). At the date of writing it is still experimental,
whereas ZFS is used on big production servers.
VxFS has also been targeted by comparisons like this one (a bit controversial). VxFS has
been known in the industry since 1993 and is known for its legendary flexibility.
Symantec acquired VxFS and proposes a basic version of it (no clustering, for example)
under the name Veritas Storage Foundation Basic.
An interesting discussion about modern filesystems can be found on OSNews.com
(A ZFS vs. BTRFS feature comparison table appeared here; only part of it is recoverable. Among the
features compared were data blocks reoptimization, built-in data redundancy support, management by
attributes and production quality code. One remark survives: while ZFS knows where and how to roll back
the data (online), BTRFS requires a bit more work from the system administrator (off-line).)
The name used to designate a ZFS pool has no particular restriction; besides alphanumeric characters, the following characters can be used:
Underscore (_)
Hyphen (-)
Colon (:)
Period (.)
zpool
What it is: A group of one or many physical storage media (hard
drive partition, file...). A zpool has to be divided in at least
one ZFS dataset or at least one zvol to hold any data.
Several zpools can coexist in a system on the condition that
they each hold a unique name. Also note that zpools can
never be mounted; the only things that can be are the ZFS
datasets they hold.
Counterparts examples: volume management layers like LVM.
(The remaining rows of this terminology table, covering ZFS datasets, snapshots, clones and zvols,
listed counterparts such as logical subvolumes (LV) in LVM formatted with a filesystem like ext3,
BTRFS subvolumes, BTRFS read-only snapshots, LVM snapshots and BTRFS snapshots, or "no direct
equivalent in LVM" for some entries.)
Preparing
Once you have emerged sys-fs/zfs and sys-fs/zfs-kmod you have two options to start using ZFS
at this point:
Either you start /etc/init.d/zfs (it will load all of the ZFS kernel modules for you plus a couple
of other things)
Or you load the ZFS kernel modules by hand (this will load all of the ZFS kernel modules
for you)
So :
# rc-service zfs start
Or:
# modprobe zfs
# lsmod | grep zfs
zfs                   874072  0
zunicode              328120  1 zfs
zavl                   12997  1 zfs
zcommon                35739  1 zfs
znvpair                48570  2 zfs,zcommon
spl                    58011  5 zfs,zavl,zunicode,zcommon,znvpair
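The commands that created the four backing files are not shown above; based on the dd invocation used later in this tutorial for a fifth disk image, they would have been something like:

# for i in 0 1 2 3; do dd if=/dev/zero of=/tmp/zfs-test-disk0${i}.img bs=2G count=1; done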
Then let's see what loopback devices are in use and which is the first free:
# losetup -a
# losetup -f
/dev/loop0
In the above example nothing is used and the first available loopback device is /dev/loop0. Now
associate all of the disks with a loopback device (/tmp/zfs-test-disk00.img -> /dev/loop0,
/tmp/zfs-test-disk01.img -> /dev/loop1 and so on):
# for i in 0 1 2 3; do losetup /dev/loop${i} /tmp/zfs-test-disk0${i}.img; done
# losetup -a
/dev/loop0: [000c]:781455 (/tmp/zfs-test-disk00.img)
/dev/loop1: [000c]:806903 (/tmp/zfs-test-disk01.img)
/dev/loop2: [000c]:807274 (/tmp/zfs-test-disk02.img)
/dev/loop3: [000c]:781298 (/tmp/zfs-test-disk03.img)
Note:
ZFS literature often names zpools "tank"; this is not a requirement, you can use whatever
name of your choice (as we did here...)
Every story in ZFS starts with the very first ZFS-related command you will be in touch
with: zpool. zpool, as you might have guessed, manages all ZFS aspects in connection with the
physical devices underlying your ZFS storage spaces, and the very first task is to use this
command to make what is called a pool (if you have used LVM before, volume groups can be
seen as a counterpart). Basically, what you will do here is tell ZFS to take a collection of
physical storage stuff, which can take several forms like a hard drive partition, a USB key
partition or even a file, and consider all of them as a single pool of storage (we will subdivide it in
the following paragraphs). No black magic here: ZFS will write some metadata on them behind the
scenes to be able to track which physical device belongs to what pool of storage.
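The pool creation command itself does not appear above; given the pool name and the loopback devices used throughout the rest of this tutorial, it would have been along these lines:

# zpool create myfirstpool /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3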
And... nothing! Nada! The command silently returned, but it did something; the next section will
explain what.
# zpool list
NAME          SIZE   ALLOC   FREE   CAP   DEDUP   HEALTH   ALTROOT
myfirstpool   7.94G  130K    7.94G  0%    1.00x   ONLINE   -
What does this mean? Several things: first, your zpool is here and has a size of, roughly, 8 GB
minus some space eaten by metadata. Second, it is actually usable because the column
HEALTH says ONLINE. The other columns are not meaningful for us for the moment, just ignore
them. If you want more details you can use the zpool command like this:
# zpool status
  pool: myfirstpool
 state: ONLINE
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        myfirstpool   ONLINE       0     0     0
          loop0       ONLINE       0     0     0
          loop1       ONLINE       0     0     0
          loop2       ONLINE       0     0     0
          loop3       ONLINE       0     0     0
Information is quite intuitive: your pool is seen as being usable (state is similar to HEALTH) and
is composed of several devices, each one listed as being in a healthy state... at least for now,
because they will be deliberately damaged for demonstration purposes in a later section. For your information,
the columns READ, WRITE and CKSUM list the number of operation failures on each of the
devices, respectively:
READ for reading failures. Having a non-zero value is not a good sign... the device is
clunky and will soon fail.
WRITE for writing failures. Having a non-zero value is not a good sign... the device is
clunky and will soon fail.
CKSUM for mismatches between the checksum of the data at the time it had been written
and how it has been recomputed when read again (yes, ZFS uses checksums in an
aggressive manner). Having a non-zero value is not a good sign... corruption happened;
ZFS will do its best to recover the data on its own but this is definitely not a sign of a
healthy system.
Cool! So far so good: you have a brand new 8 GB storage space usable on your system. Has it
been mounted somewhere?
# mount | grep myfirstpool
/myfirstpool on /myfirstpool type zfs (rw,xattr)
Remember the tables in the section above? A zpool in itself can never be mounted, never ever.
It is just a container where ZFS datasets are created and then mounted. So what happened here?
Obscure black magic? No, of course not! Indeed, a ZFS dataset named after the zpool's name
has been created automatically for us and then mounted. Is it true? We will check this
shortly. For the moment you will be introduced to the second command you will deal with
when using ZFS: zfs. While the zpool command is used for anything related to zpools, the zfs
command is used for anything related to ZFS datasets (a ZFS dataset always resides in a zpool, always, no
exception to that).
Note:
zfs and zpool are the only two commands you will need to remember when dealing
with ZFS.
So how can we check what ZFS datasets are currently known to the system? As you might
have already guessed, like this:
# zfs list
NAME          USED   AVAIL   REFER   MOUNTPOINT
myfirstpool   114K   7.81G   30K     /myfirstpool
Ta-da! The mystery is solved! The zfs command tells us that not only has a ZFS dataset named
myfirstpool been created, but it has also been mounted in the system's VFS for us. If you
check with the df command, you should also see something like this:
# df -h
Filesystem    Size  Used Avail Use% Mounted on
(...)
myfirstpool   7.9G     0  7.9G   0% /myfirstpool
The $100 question: "What to do with this brand new ZFS /myfirstpool dataset?" Copy some files
onto it, of course! We used a Linux kernel source tree, but you can of course use whatever you want:
# cp -a /usr/src/linux-3.13.5-gentoo /myfirstpool
# ln -s /myfirstpool/linux-3.13.5-gentoo /myfirstpool/linux
# ls -lR /myfirstpool
/myfirstpool:
total 3
lrwxrwxrwx 1 root root (...) linux -> /myfirstpool/linux-3.13.5-gentoo
(...)

/myfirstpool/linux-3.13.5-gentoo:
total 31689
-rw-r--r-- 1 root root (...)
-rw-r--r-- 1 root root (...)
(...)
A ZFS dataset behaves like any other filesystem: you can create regular files, symbolic links,
pipes, special devices nodes, etc. Nothing mystic here.
Now that we have some data in the ZFS dataset, let's see what various commands report:
# df -h
Filesystem    Size  Used Avail Use% Mounted on
(...)
myfirstpool   7.9G  850M  7.0G  11% /myfirstpool
# zfs list
NAME          USED   AVAIL   REFER   MOUNTPOINT
myfirstpool   850M   6.98G   850M    /myfirstpool

# zpool list
NAME          SIZE   ALLOC   FREE    CAP   DEDUP   HEALTH   ALTROOT
myfirstpool   7.94G  850M    7.11G   10%   1.00x   ONLINE   -
Note:
Notice the various sizes reported by the zpool and zfs commands. In this case they are the same,
however they can differ; this is especially true with zpools using RAID-Z.
Important:
Only ZFS datasets can be mounted inside your host's VFS, no exception to
that! Zpools cannot be mounted, never, never, never... please pay attention to the
terminology and keep things clear by not mixing up the terms. We will
introduce ZFS snapshots and ZFS clones, but those are ZFS datasets at their basis,
so they can also be mounted and unmounted.
If a ZFS dataset behaves just like any other filesystem, can we unmount it?
# umount /myfirstpool
# mount | grep myfirstpool
No more /myfirstpool in the line of sight! So yes, it is possible to unmount a ZFS dataset just like
you would do with any other filesystem. Is the ZFS dataset still present on the system even if it is
unmounted? Let's check:
# zfs list
NAME          USED   AVAIL   REFER   MOUNTPOINT
myfirstpool   850M   6.98G   850M    /myfirstpool
Fortunately and obviously it is, else ZFS would not be very useful. Your next concern would
certainly be: "How can we remount it then?" Simple! Like this:
# zfs mount myfirstpool
# mount | grep myfirstpool
myfirstpool on /myfirstpool type zfs (rw,xattr)
But how did ZFS know where to mount the dataset? Is there an entry in /etc/fstab?
Doh!!!... Obviously nothing there. Another mystery? Sure not! The answer lies in an extremely
powerful feature of ZFS: the attributes. Simply speaking: an attribute is a named property of a ZFS
dataset that holds a value. Attributes govern various aspects of how the datasets are managed,
like: "Does the data have to be compressed?", "Does the data have to be encrypted?", "Does the data have
to be exposed to the rest of the world by NFS or SMB/Samba?" and of course... "Where does the dataset
have to be mounted?". The answer to that latter question can be told by the following command:
# zfs get mountpoint myfirstpool
NAME         PROPERTY    VALUE         SOURCE
myfirstpool  mountpoint  /myfirstpool  default
Bingo! When you remounted the dataset just a few paragraphs ago, ZFS automatically inspected
the mountpoint attribute and saw that this dataset has to be mounted in the directory /myfirstpool.
Creating datasets
Obviously it is possible to have more than one ZFS dataset within a single zpool. Quiz: what
command would you use to subdivide a zpool in datasets? zfs or zpool? Stop reading for two
seconds and try to figure out this little question.
Answer is... zfs! Although you want to operate on the zpool to logically subdivide it in several
datasets, you manage datasets in the end, thus you will use the zfs command. It is not always easy
at the beginning; do not worry too much, you will soon get the habit of when to use one or the other.
Creating a dataset in a zpool is easy: just give the zfs command the name of the pool you want
to divide and the name of the dataset you want to create in it. So let's create three datasets named
myfirstDS, mysecondDS and mythirdDS in myfirstpool (observe how we use the zpool's and
datasets' names):
# zfs create myfirstpool/myfirstDS
# zfs create myfirstpool/mysecondDS
# zfs create myfirstpool/mythirdDS
# zfs list
NAME                     USED   AVAIL   REFER   MOUNTPOINT
myfirstpool              850M   6.98G   850M    /myfirstpool
myfirstpool/myfirstDS    30K    6.98G   30K     /myfirstpool/myfirstDS
myfirstpool/mysecondDS   30K    6.98G   30K     /myfirstpool/mysecondDS
myfirstpool/mythirdDS    30K    6.98G   30K     /myfirstpool/mythirdDS
Obviously we have there what we asked for. Moreover, if we inspect the contents of /myfirstpool we
can notice three new directories having the same names as the datasets just created:
# ls -l /myfirstpool
total 8
drwxr-xr-x 2 root root 2 Mar  2 15:26 myfirstDS
drwxr-xr-x 2 root root 2 Mar  2 15:26 mysecondDS
drwxr-xr-x 2 root root 2 Mar  2 15:26 mythirdDS
(...)
No surprise here! As you might have guessed, those three new directories serve as mountpoints:
# mount | grep myfirstpool
myfirstpool on /myfirstpool type zfs (rw,xattr)
myfirstpool/myfirstDS on /myfirstpool/myfirstDS type zfs (rw,xattr)
myfirstpool/mysecondDS on /myfirstpool/mysecondDS type zfs (rw,xattr)
myfirstpool/mythirdDS on /myfirstpool/mythirdDS type zfs (rw,xattr)
As we did before, we can copy some files in the newly created datasets just like they were
regular directories:
# cp -a /usr/portage /myfirstpool/mythirdDS
# ls -l /myfirstpool/mythirdDS/*
total 697
drwxr-xr-x 48 root root 49 Aug 18  2013 app-accessibility
drwxr-xr-x 34 root root 35 Aug 18  2013 app-benchmarks
(...)
Nothing really exciting here, we now have files in mythirdDS. A bit more interesting output:
# zfs list
NAME                     USED    AVAIL   REFER   MOUNTPOINT
myfirstpool              1.81G   6.00G   850M    /myfirstpool
myfirstpool/myfirstDS    30K     6.00G   30K     /myfirstpool/myfirstDS
myfirstpool/mysecondDS   30K     6.00G   30K     /myfirstpool/mysecondDS
myfirstpool/mythirdDS    1002M   6.00G   1002M   /myfirstpool/mythirdDS

# df -h
Filesystem               Size   Used   Avail  Use%  Mounted on
(...)
myfirstpool              6.9G   850M   6.1G   13%   /myfirstpool
myfirstpool/myfirstDS    6.1G   0      6.1G   0%    /myfirstpool/myfirstDS
myfirstpool/mysecondDS   6.1G   0      6.1G   0%    /myfirstpool/mysecondDS
myfirstpool/mythirdDS    7.0G   1002M  6.1G   15%   /myfirstpool/mythirdDS
Did you notice the size given in the 'AVAIL' column? At the very beginning of this tutorial we had
slightly less than 8 GB of available space; it is now roughly 6 GB. The datasets are
just a subdivision of the zpool: they compete with each other for the available storage
within the zpool, no miracle here. Up to what limit? The pool itself, as we never imposed a quota on the
datasets. Fortunately, df and zfs list give a coherent result.
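The command that set the quota is not shown here; judging by the result described below, it would have been:

# zfs set quota=2G myfirstpool/mythirdDS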
Et voila! The zfs command is a bit silent, however if we check we can see that
myfirstpool/mythirdDS is now capped at 2 GB (forget about 'REFER' for the moment): around 1
GB of data has been copied in this dataset, thus leaving roughly 1 GB of available space.
# zfs list
NAME                     USED    AVAIL   REFER   MOUNTPOINT
myfirstpool              1.81G   6.00G   850M    /myfirstpool
myfirstpool/myfirstDS    30K     6.00G   30K     /myfirstpool/myfirstDS
myfirstpool/mysecondDS   30K     6.00G   30K     /myfirstpool/mysecondDS
myfirstpool/mythirdDS    1002M   1.02G   1002M   /myfirstpool/mythirdDS

# df -h
Filesystem               Size   Used   Avail  Use%  Mounted on
(...)
myfirstpool              6.9G   850M   6.1G   13%   /myfirstpool
myfirstpool/myfirstDS    6.1G   0      6.1G   0%    /myfirstpool/myfirstDS
myfirstpool/mysecondDS   6.1G   0      6.1G   0%    /myfirstpool/mysecondDS
myfirstpool/mythirdDS    2.0G   1002M  1.1G   49%   /myfirstpool/mythirdDS
Of course you can use this technique for the home directories of your users under /home, this also
having the advantage of being much less forgiving than a soft/hard user quota: when the limit
is reached, it is reached, period, and no more data can be written in the dataset. The user must do
some cleanup and cannot procrastinate anymore :-)
To remove the quota:
# zfs set quota=none myfirstpool/mythirdDS
none is simply the original value of the quota attribute (we did not demonstrate it, but you can
check by doing a zfs get quota myfirstpool/mysecondDS for example).
Destroying datasets
Important:
There is no way to resurrect a destroyed ZFS dataset and the data it contained!
Once you destroy a dataset the corresponding metadata is cleared and gone
forever, so be careful when using zfs destroy, notably with the -r option...
We have three datasets, but the third is pretty useless and contains a lot of garbage. Is it possible
to remove it with a simple rm -rf? Let's try:
# rm -rf /myfirstpool/mythirdDS
rm: cannot remove `/myfirstpool/mythirdDS': Device or resource busy
This is perfectly normal, remember that datasets are indeed something mounted in your VFS.
ZFS may be ZFS and do a lot for you, but it cannot change the nature of a mounted filesystem
under Linux/Unix. The "ZFS way" to remove a dataset is to use the zfs command like this, provided
that no process owns open files on it (once again, ZFS can do miracles for you but not that
kind of miracle, as it has to unmount the dataset before deleting it):
# zfs destroy myfirstpool/mythirdDS
# zfs list
NAME                     USED   AVAIL   REFER   MOUNTPOINT
myfirstpool              444M   7.38G   444M    /myfirstpool
myfirstpool/myfirstDS    21K    7.38G   21K     /myfirstpool/myfirstDS
myfirstpool/mysecondDS   21K    7.38G   21K     /myfirstpool/mysecondDS
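Before the next listing, mythirdDS has evidently been recreated with a nested dataset inside it; going by the names visible in that listing, the commands would have been something like:

# zfs create myfirstpool/mythirdDS
# zfs create myfirstpool/mythirdDS/nestedDS1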
# zfs list
NAME                              USED   AVAIL   REFER   MOUNTPOINT
myfirstpool                       851M   6.98G   850M    /myfirstpool
myfirstpool/myfirstDS             30K    6.98G   30K     /myfirstpool/myfirstDS
myfirstpool/mysecondDS            30K    6.98G   30K     /myfirstpool/mysecondDS
myfirstpool/mythirdDS             124K   6.98G   34K     /myfirstpool/mythirdDS
myfirstpool/mythirdDS/nestedDS1   30K    6.98G   30K     /myfirstpool/mythirdDS/nestedDS1
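The deletion attempt itself is not shown here; trying to destroy a dataset that has children makes zfs refuse with a message along these lines:

# zfs destroy myfirstpool/mythirdDS
cannot destroy 'myfirstpool/mythirdDS': filesystem has children
use '-r' to destroy the following datasets:
myfirstpool/mythirdDS/nestedDS1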
The zfs command detected the situation and refused to proceed with the deletion without your
consent to perform a recursive destruction (-r parameter). Before going any step further, let's create
some more nested datasets plus a couple of directories inside myfirstpool/mythirdDS, as sketched below:
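The creation commands are not shown here; based on the names appearing in the next listing, they would have been along these lines (the plain directory names are purely illustrative):

# zfs create myfirstpool/mythirdDS/nestedDS2
# zfs create myfirstpool/mythirdDS/nestedDS3
# zfs create myfirstpool/mythirdDS/nestedDS3/nestednestedDS
# mkdir /myfirstpool/mythirdDS/dir1 /myfirstpool/mythirdDS/dir2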
# zfs list
NAME                                             USED   AVAIL   REFER   MOUNTPOINT
myfirstpool                                      851M   6.98G   850M    /myfirstpool
myfirstpool/myfirstDS                            30K    6.98G   30K     /myfirstpool/myfirstDS
myfirstpool/mysecondDS                           30K    6.98G   30K     /myfirstpool/mysecondDS
myfirstpool/mythirdDS                            157K   6.98G   37K     /myfirstpool/mythirdDS
myfirstpool/mythirdDS/nestedDS1                  30K    6.98G   30K     /myfirstpool/mythirdDS/nestedDS1
myfirstpool/mythirdDS/nestedDS2                  30K    6.98G   30K     /myfirstpool/mythirdDS/nestedDS2
myfirstpool/mythirdDS/nestedDS3                  60K    6.98G   30K     /myfirstpool/mythirdDS/nestedDS3
myfirstpool/mythirdDS/nestedDS3/nestednestedDS   30K    6.98G   30K     /myfirstpool/mythirdDS/nestedDS3/nestednestedDS
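The recursive destruction itself is again not shown; given the listing that follows, it would simply have been:

# zfs destroy -r myfirstpool/mythirdDS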
# zfs list
NAME                     USED   AVAIL   REFER   MOUNTPOINT
myfirstpool              851M   6.98G   850M    /myfirstpool
myfirstpool/myfirstDS    30K    6.98G   30K     /myfirstpool/myfirstDS
myfirstpool/mysecondDS   30K    6.98G   30K     /myfirstpool/mysecondDS
This is, by far, one of the coolest features of ZFS. You can:
1. take a photo of a dataset (this photo is called a snapshot)
2. do whatever you want with the data contained in the dataset
3. restore (roll back) the dataset to the exact same state it was in before you did your
changes, just as if nothing had ever happened in the middle.
Single snapshot
Important: Only ZFS datasets can be snapshotted and rolled back, not the zpool.
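The step that populated mysecondDS is not shown here; given the Portage tree that appears in it below, it would have been something like:

# cp -a /usr/portage /myfirstpool/mysecondDS
# ls -l /myfirstpool/mysecondDS/portage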
drwxr-xr-x 48 root root 49 Aug 18  2013 app-accessibility
drwxr-xr-x 34 root root 35 Aug 18  2013 app-benchmarks
(...)
drwxr-xr-x 64 root root 65 Aug 18  2013 xfce-base
Now, let's take a snapshot of mysecondDS. What command would be used, zpool or zfs? In that
case it is zfs because we manipulate a ZFS dataset (this time you probably got it right!):
# zfs snapshot myfirstpool/mysecondDS@Charlie
# ls -la /myfirstpool/mysecondDS
total 9
drwxr-xr-x   3 root root   3 Mar  2 18:22 .
drwxr-xr-x   5 root root   6 Mar  2 17:58 ..
drwxr-xr-x 164 root root 169 Mar  2 18:36 portage
Nothing really new: the portage directory is here, nothing more a priori. If you have used BTRFS
before reading this tutorial you probably expected to see a @Charlie lying in
/myfirstpool/mysecondDS? So where the heck is Charlie? In ZFS a dataset snapshot is not
visible from within the VFS tree (if you are not convinced you can search for it with the find
command, but it will never find it). Let's check with the zfs command:
# zfs list
NAME                     USED    AVAIL   REFER   MOUNTPOINT
myfirstpool              1.81G   6.00G   850M    /myfirstpool
myfirstpool/myfirstDS    30K     6.00G   30K     /myfirstpool/myfirstDS
myfirstpool/mysecondDS   1001M   6.00G   1001M   /myfirstpool/mysecondDS
Wow... No sign of the snapshot. What you must know is that zfs list shows only datasets
by default and omits snapshots. If the command is invoked with the parameter -t set to all it will
list everything:
# zfs list -t all
NAME                             USED    AVAIL   REFER   MOUNTPOINT
myfirstpool                      1.81G   6.00G   850M    /myfirstpool
myfirstpool/myfirstDS            30K     6.00G   30K     /myfirstpool/myfirstDS
myfirstpool/mysecondDS           1001M   6.00G   1001M   /myfirstpool/mysecondDS
myfirstpool/mysecondDS@Charlie   0       -       1001M   -
So yes, @Charlie is here! Also notice here the power of copy-on-write filesystems: @Charlie
takes only a couple of kilobytes (some ZFS metadata), just like any ZFS snapshot at the time it
is taken. The reason snapshots occupy very little space is that their data and
metadata blocks are the same as the live dataset's and no physical copy of them is made. As time goes on and
more and more changes happen in the original dataset (myfirstpool/mysecondDS here), ZFS will
allocate new data and metadata blocks to accommodate the changes but will leave the blocks
used by the snapshot untouched, and the snapshot will tend to eat more and more pool space. It
seems odd at first glance because a snapshot is a frozen-in-time copy of a ZFS dataset, but this is the
way ZFS manages them. So caveat emptor: remove any unused snapshots to avoid filling your zpool...
# echo "Hello, world" > /myfirstpool/mysecondDS/hello.txt
# cp /lib/firmware/radeon/* /myfirstpool/mysecondDS
# ls -l /myfirstpool/mysecondDS
/myfirstpool/mysecondDS:
total 3043
-rw-r--r--   1 root root  8704 Mar  2 19:29 ARUBA_me.bin
-rw-r--r--   1 root root  8704 Mar  2 19:29 ARUBA_pfp.bin
-rw-r--r--   1 root root  6144 Mar  2 19:29 ARUBA_rlc.bin
-rw-r--r--   1 root root 24096 Mar  2 19:29 BARTS_mc.bin
-rw-r--r--   1 root root  5504 Mar  2 19:29 BARTS_me.bin
(...)
-rw-r--r--   1 root root 60388 Mar  2 19:29 VERDE_smc.bin
-rw-r--r--   1 root root    13 Mar  2 19:28 hello.txt
drwxr-xr-x 164 root root    95 Mar  2 19:28 portage

/myfirstpool/mysecondDS/portage:
total 324
(...)
drwxr-xr-x 20 root root 21 Jan  7 06:56 lxde-base
(...)
# zfs list -t all
NAME                             USED    AVAIL   REFER   MOUNTPOINT
myfirstpool                      1.82G   6.00G   850M    /myfirstpool
myfirstpool/myfirstDS            30K     6.00G   30K     /myfirstpool/myfirstDS
myfirstpool/mysecondDS           1005M   6.00G   903M    /myfirstpool/mysecondDS
myfirstpool/mysecondDS@Charlie   102M    -       1001M   -
Did you notice the size increase of myfirstpool/mysecondDS@Charlie? This is mainly due to the new files
copied into the dataset: ZFS had to retain the original blocks of data referenced by the snapshot. Now it is time to roll this ZFS
dataset back to its original state (if some processes had open files in the dataset to be
rolled back, you would have to terminate them first):
# zfs rollback myfirstpool/mysecondDS@Charlie
# ls -l /myfirstpool/mysecondDS
total 6
drwxr-xr-x 164 root root 169 Aug 18 18:25 portage
Again, ZFS handled everything for you and you now have the contents of mysecondDS exactly
as they were at the time the snapshot Charlie was taken. No more complicated than that. Not
illustrated here, but if you look at the output given by zfs list -t all at this point, you will notice
that the Charlie snapshot again eats only very little space. This is normal: the modified blocks have
been dropped, so myfirstpool/mysecondDS and its myfirstpool/mysecondDS@Charlie snapshot
are the same modulo some metadata (hence the few kilobytes of space taken).
Every ZFS dataset contains a hidden pseudo-directory named .zfs at its root; let's have a look at it:
# cd /myfirstpool/mysecondDS
# cd .zfs
# pwd
/myfirstpool/mysecondDS/.zfs
# ls -la
total 4
dr-xr-xr-x 1 root root 0 Mar  2 15:26 .
drwxr-xr-x 3 root root 3 Mar  2 19:29 ..
dr-xr-xr-x 2 root root 2 Mar  2 19:47 shares
dr-xr-xr-x 2 root root 2 Mar  2 18:46 snapshot
We will focus on the snapshot directory, and since we have not dropped the Charlie snapshot (yet),
let's see what lies there:
# cd snapshot
# ls -l
total 0
dr-xr-xr-x 1 root root 0 Mar  2 20:16 Charlie
Yes, we found Charlie here (also!). The snapshot is seen as a regular directory, but pay attention to
its permissions:
Did you notice? Not a single write permission on this directory; the only actions any user can do
are to enter the directory and list its contents. This is not a bug but the nature of ZFS snapshots:
they are read-only by nature. The next question is naturally: can we change something in it?
For that we have to enter the Charlie directory:
# cd Charlie
# ls -la
total 7
drwxr-xr-x 3 root root 3 Mar  2 18:22 .
dr-xr-xr-x 3 root root 3 Mar  2 18:46 ..
drwxr-xr-x (...)         Mar  2 18:36 portage
No surprise here: at the time we took the snapshot, myfirstpool/mysecondDS held a copy of the
Portage tree stored in a directory named portage. At first glance this one seems to be writable for
the root user, so let's try to create a file in it:
# cd portage
# touch test
touch: cannot touch test: Read-only file system
Things are a bit tricky here: indeed nothing has been mounted (check with the mount command!),
we are walking through a pseudo-directory exposed by ZFS that holds the Charlie snapshot.
Pseudo-directory because, in fact, .zfs has no physical existence in the ZFS metadata as it
exists in the zpool. It is just a convenient way provided by the ZFS kernel modules to walk inside
the various snapshots' contents. You can look but you cannot touch :-)
Is it possible to know what the difference is between a live dataset and its snapshot? The answer to
this question is yes, and the zfs command will help us in this task. Now that we have rolled the
myfirstpool/mysecondDS ZFS dataset back to its original state, we have to botch it again:
# cp -a /lib/firmware/radeon/C* /myfirstpool/mysecondDS
Now inspect the difference between the live ZFS dataset myfirstpool/mysecondDS and its
snapshot Charlie; this is done via zfs diff by giving only the snapshot's name (you can also
inspect the difference between two snapshots with that command, with a slight change in
parameters):
# zfs diff myfirstpool/mysecondDS@Charlie
M       /myfirstpool/mysecondDS/
+       /myfirstpool/mysecondDS/CAICOS_mc.bin
+       /myfirstpool/mysecondDS/CAICOS_me.bin
+       /myfirstpool/mysecondDS/CAICOS_pfp.bin
+       /myfirstpool/mysecondDS/CAICOS_smc.bin
+       /myfirstpool/mysecondDS/CAYMAN_mc.bin
+       /myfirstpool/mysecondDS/CAYMAN_me.bin
(...)
No need to explain that digital mayhem! What happens if, in addition, we change the contents of
the file /myfirstpool/mysecondDS/portage/sys-devel/autoconf/Manifest?
# zfs diff myfirstpool/mysecondDS@Charlie
M       /myfirstpool/mysecondDS/
M       /myfirstpool/mysecondDS/portage/sys-devel
M       /myfirstpool/mysecondDS/portage/sys-devel/autoconf/Manifest
+       /myfirstpool/mysecondDS/CAICOS_mc.bin
+       /myfirstpool/mysecondDS/CAICOS_me.bin
(...)
Dropping a snapshot
A snapshot is no more than a dataset frozen in time and thus can be destroyed in the exact same
way as seen in the paragraphs before. As we do not need the Charlie snapshot anymore, we can remove it.
Simple:
# zfs list -t all
NAME                     USED    AVAIL   REFER   MOUNTPOINT
myfirstpool              1.71G   6.10G   850M    /myfirstpool
myfirstpool/myfirstDS    30K     6.10G   30K     /myfirstpool/myfirstDS
myfirstpool/mysecondDS   903M    6.10G   903M    /myfirstpool/mysecondDS
# ls -la /myfirstpool/myfirstDS
total 3
drwxr-xr-x 2 root root 2 Mar 2 21:14 .
drwxr-xr-x 5 root root 6 Mar 2 17:58 ..
Now let's generate some contents, take a snapshot (snapshot-1), add more content, take a
snapshot again (snapshot-2), do some modifications again and take a third snapshot (snapshot-3):
# echo "Hello, world" > /myfirstpool/myfirstDS/hello.txt
# cp -R /lib/firmware/radeon /myfirstpool/myfirstDS
# ls -l /myfirstpool/myfirstDS
total 5
-rw-r--r-- 1 root root 13 Mar 3 06:41 hello.txt
drwxr-xr-x 2 root root 143 Mar 3 06:42 radeon
# zfs snapshot myfirstpool/myfirstDS@snapshot-1
# echo "Goodbye, world" > /myfirstpool/myfirstDS/goodbye.txt
# echo "Are you there?" >> /myfirstpool/myfirstDS/hello.txt
# cp /proc/config.gz /myfirstpool/myfirstDS
# rm /myfirstpool/myfirstDS/radeon/CAYMAN_me.bin
# zfs snapshot myfirstpool/myfirstDS@snapshot-2
# echo "Still there?" >> /myfirstpool/myfirstDS/goodbye.txt
# mv /myfirstpool/myfirstDS/hello.txt /myfirstpool/myfirstDS/hello_new.txt
# cat /proc/version > /myfirstpool/myfirstDS/version.txt
# zfs snapshot myfirstpool/myfirstDS@snapshot-3
# zfs list -t all
NAME                               USED    AVAIL   REFER   MOUNTPOINT
myfirstpool                        1.81G   6.00G   850M    /myfirstpool
myfirstpool/myfirstDS              3.04M   6.00G   2.97M   /myfirstpool/myfirstDS
myfirstpool/myfirstDS@snapshot-1   47K     -       2.96M   -
myfirstpool/myfirstDS@snapshot-2   30K     -       2.97M   -
myfirstpool/myfirstDS@snapshot-3   0       -       2.97M   -
myfirstpool/mysecondDS             1003M   6.00G   1003M   /myfirstpool/mysecondDS
You saw how to use zfs diff to compare the difference between a snapshot and its corresponding
"live" dataset in the above paragraphs. Doing the same exercise with two snapshots is not that
much different, as you just have to explicitly tell the command which snapshots are to be compared
against each other, and the command will output the result in the exact same manner. So what are the
differences between the first and the second snapshot?
# zfs diff myfirstpool/myfirstDS@snapshot-1 myfirstpool/myfirstDS@snapshot-2
M       /myfirstpool/myfirstDS/
M       /myfirstpool/myfirstDS/hello.txt
M       /myfirstpool/myfirstDS/radeon
-       /myfirstpool/myfirstDS/radeon/CAYMAN_me.bin
+       /myfirstpool/myfirstDS/goodbye.txt
+       /myfirstpool/myfirstDS/config.gz
Before digging further, let's think about what we did between the time we created the first
snapshot and the second snapshot:
We modified the file /myfirstpool/myfirstDS/hello.txt, hence the 'M' shown on the left of the
second line (thus we changed something under /myfirstpool/myfirstDS, hence an 'M' is also
shown on the left of the first line)
We deleted the file /myfirstpool/myfirstDS/radeon/CAYMAN_me.bin, hence the minus
sign ('-') shown on the left of the fourth line (and the 'M' shown on the left of the third line)
We added two files, /myfirstpool/myfirstDS/goodbye.txt and
/myfirstpool/myfirstDS/config.gz, hence the plus sign ('+') shown on the left of the fifth
and sixth lines (this is also a change happening in /myfirstpool/myfirstDS, hence another
reason to show an 'M' on the left of the first line)
Now let's compare the second and the third snapshots:
# zfs diff myfirstpool/myfirstDS@snapshot-2 myfirstpool/myfirstDS@snapshot-3
M       /myfirstpool/myfirstDS/
R       /myfirstpool/myfirstDS/hello.txt -> /myfirstpool/myfirstDS/hello_new.txt
M       /myfirstpool/myfirstDS/goodbye.txt
+       /myfirstpool/myfirstDS/version.txt
Try to interpret what you see; nothing new except for the second line, where an 'R' (standing for "Rename") is
shown. ZFS is smart enough to also show both the old and the new names!
Why not push the limit and try a few fancy things? First things first: what happens if we ask it to
compare two snapshots but in reverse order?
# zfs diff myfirstpool/myfirstDS@snapshot-3 myfirstpool/myfirstDS@snapshot-2
Unable to obtain diffs:
   Not an earlier snapshot from the same fs
Would ZFS be a bit happier if we ask for the difference between two snapshots, this time with
a gap in between (so snapshot 1 with snapshot 3)?
# zfs diff myfirstpool/myfirstDS@snapshot-1 myfirstpool/myfirstDS@snapshot-3
M       /myfirstpool/myfirstDS/
M       /myfirstpool/myfirstDS/radeon
-       /myfirstpool/myfirstDS/radeon/CAYMAN_me.bin
+       /myfirstpool/myfirstDS/goodbye.txt
+       /myfirstpool/myfirstDS/config.gz
+       /myfirstpool/myfirstDS/version.txt
Amazing! Here again, take a couple of minutes to think about all the operations you did on the
dataset between the time you took the first snapshot and the time you took the last snapshot: this
summary is the exact reflection of all your previous operations.
Just to put a conclusion on this subject, let's see the differences between the
myfirstpool/myfirstDS dataset and its various snapshots:
# zfs diff myfirstpool/myfirstDS@snapshot-1
M       /myfirstpool/myfirstDS/
M       /myfirstpool/myfirstDS/radeon
-       /myfirstpool/myfirstDS/radeon/CAYMAN_me.bin
+       /myfirstpool/myfirstDS/goodbye.txt
+       /myfirstpool/myfirstDS/config.gz
+       /myfirstpool/myfirstDS/version.txt
# zfs diff myfirstpool/myfirstDS@snapshot-2
M       /myfirstpool/myfirstDS/
M       /myfirstpool/myfirstDS/goodbye.txt
+       /myfirstpool/myfirstDS/version.txt
# zfs diff myfirstpool/myfirstDS@snapshot-3
Having nothing reported for the last zfs diff is normal, as nothing changed in the dataset since that
snapshot was taken.
The time travelling machine part 2: rolling back with multiple snapshots
Examining the differences between the various snapshots of a dataset or the dataset itself would
be quite useless if we were not able to roll the dataset back to one of its previous states. Now that
we have mangled myfirstpool/myfirstDS a bit, it is time to restore it to the state it was in when the
first snapshot had been taken:
# zfs rollback myfirstpool/myfirstDS@snapshot-1
cannot rollback to 'myfirstpool/myfirstDS@snapshot-1': more recent snapshots exist
use '-r' to force deletion of the following snapshots:
myfirstpool/myfirstDS@snapshot-3
myfirstpool/myfirstDS@snapshot-2
Err... Well, ZFS just tells us that several more recent snapshots exist and it refuses to proceed
without dropping them. Unfortunately for us there is no way to circumvent that: once you
jump backward you have no way to move forward again. We could demonstrate the rollback to
myfirstpool/myfirstDS@snapshot-3, then myfirstpool/myfirstDS@snapshot-2, then
myfirstpool/myfirstDS@snapshot-1, but it would be of very little interest as previous sections of this
tutorial did that already, so second attempt:
# zfs rollback -r myfirstpool/myfirstDS@snapshot-1
# zfs list -t all
NAME                               USED    AVAIL   REFER   MOUNTPOINT
myfirstpool                        1.81G   6.00G   850M    /myfirstpool
myfirstpool/myfirstDS              2.96M   6.00G   2.96M   /myfirstpool/myfirstDS
myfirstpool/myfirstDS@snapshot-1   1K      -       2.96M   -
myfirstpool/mysecondDS             1003M   6.00G   1003M   /myfirstpool/mysecondDS
No differences at all!
Cloning snapshots
A clone differs from a snapshot in two ways:
A clone appears as a mounted dataset (i.e. you can read and write data in it) while a
snapshot stays apart and is always read-only
A clone is always spawned from a snapshot
So it is absolutely true to say that a clone is indeed just a writable snapshot. The copy-on-write
feature of ZFS plays its role even there: the data blocks held by the snapshot are only duplicated
upon modification. So cloning a 20 GB snapshot of data does not lead to an additional 20 GB of
data being eaten from the pool.
How to make a clone? Simple, once again with the zfs command used like this:
# zfs clone myfirstpool/myfirstDS@snapshot-1 myfirstpool/myfirstDS_clone1
# zfs list -t all
NAME                               USED    AVAIL   REFER   MOUNTPOINT
myfirstpool                        1.81G   6.00G   850M    /myfirstpool
myfirstpool/myfirstDS              2.96M   6.00G   2.96M   /myfirstpool/myfirstDS
myfirstpool/myfirstDS@snapshot-1   1K      -       2.96M   -
myfirstpool/myfirstDS_clone1       1K      6.00G   2.96M   /myfirstpool/myfirstDS_clone1
myfirstpool/mysecondDS             1003M   6.00G   1003M   /myfirstpool/mysecondDS
In theory we can change or write additional data in the clone as it is mounted writable
(rw). Let it be!
# ls /myfirstpool/myfirstDS_clone1
hello.txt
radeon
# cp /proc/config.gz /myfirstpool/myfirstDS_clone1
# echo 'This is a clone!' >> /myfirstpool/myfirstDS_clone1/hello.txt
# ls /myfirstpool/myfirstDS_clone1
config.gz
hello.txt
radeon
# cat /myfirstpool/myfirstDS_clone1/hello.txt
Hello, world
This is a clone!
Unfortunately it is not possible to ask for the difference between a clone and a snapshot: zfs diff
expects to see either one snapshot name or two snapshot names. Once spawned, a clone starts
its own existence, and the snapshot that served as a seed for it remains attached to its original
dataset.
Because clones are nothing more than a ZFS dataset they can be destroyed just like any ZFS
dataset:
# zfs destroy myfirstpool/myfirstDS_clone1
# zfs list -t all
NAME                               USED    AVAIL   REFER   MOUNTPOINT
myfirstpool                        1.81G   6.00G   850M    /myfirstpool
myfirstpool/myfirstDS              2.96M   6.00G   2.96M   /myfirstpool/myfirstDS
myfirstpool/myfirstDS@snapshot-1   1K      -       2.96M   -
myfirstpool/mysecondDS             1003M   6.00G   1003M   /myfirstpool/mysecondDS
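The zfs send step, and the failed zfs receive that provokes the complaint below, are not shown here; judging by the file and dataset names used afterwards, they would have been along these lines:

# zfs send myfirstpool/myfirstDS@snapshot-1 > /tmp/myfirstpool-myfirstDS@snapshot-snap1
# cat /tmp/myfirstpool-myfirstDS@snapshot-snap1 | zfs receive myfirstpool/myfirstDS@snapshot-1

The second command fails because the destination dataset already exists and its data would be overwritten.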
Ouch... ZFS refuses to go any step further because some data would be overwritten. We do not
own any critical data on the dataset, so we could destroy it and try again, or use a different name;
nevertheless, just for the sake of the demonstration, let's create another zpool prior to restoring the
dataset there:
# dd if=/dev/zero of=/tmp/zfs-test-disk04.img bs=2G count=1
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB) copied, 6.35547 s, 338 MB/s
# losetup -f
/dev/loop4
# losetup /dev/loop4 /tmp/zfs-test-disk04.img
# zpool create testpool /dev/loop4
# zpool list
NAME          SIZE   ALLOC   FREE   CAP   DEDUP   HEALTH   ALTROOT
myfirstpool   7.94G  1.81G   6.12G  22%   1.00x   ONLINE   -
testpool      1.98G  89.5K   1.98G  0%    1.00x   ONLINE   -
Take two:
# cat /tmp/myfirstpool-myfirstDS@snapshot-snap1 | zfs receive testpool/myfirstDS@testrecv
# zfs list -t all
NAME                               USED    AVAIL   REFER   MOUNTPOINT
myfirstpool                        1.81G   6.00G   850M    /myfirstpool
myfirstpool/myfirstDS              2.96M   6.00G   2.96M   /myfirstpool/myfirstDS
myfirstpool/myfirstDS@snapshot-1   1K      -       2.96M   -
myfirstpool/mysecondDS             1003M   6.00G   1003M   /myfirstpool/mysecondDS
testpool                           3.08M   1.95G   31K     /testpool
testpool/myfirstDS                 2.96M   1.95G   2.96M   /testpool/myfirstDS
testpool/myfirstDS@testrecv        0       -       2.96M   -
Very interesting things happened there! First, the data previously stored in the file
/tmp/myfirstpool-myfirstDS@snapshot-snap1 has been copied as a snapshot in the destination zpool
(testpool here), exactly under the name given on the command line.
Second, a clone of this snapshot has been created for you by ZFS, and the contents of the snapshot
myfirstpool/myfirstDS@snapshot-1 now appear as a live ZFS dataset (testpool/myfirstDS) where data can be read
and written! Think two seconds about the error message we got just above: the reason ZFS
protested becomes clear now.
An alternative would have been to use the original zpool but this time with a different name for
the dataset:
# cat /tmp/myfirstpool-myfirstDS@snapshot-snap1 | zfs receive myfirstpool/myfirstDS_copy@testrecv
# zfs list -t all
NAME                                  USED    AVAIL   REFER   MOUNTPOINT
myfirstpool                           1.82G   6.00G   850M    /myfirstpool
myfirstpool/myfirstDS                 2.96M   6.00G   2.96M   /myfirstpool/myfirstDS
myfirstpool/myfirstDS@snapshot-1      1K      -       2.96M   -
myfirstpool/myfirstDS_copy            2.96M   6.00G   2.96M   /myfirstpool/myfirstDS_copy
myfirstpool/myfirstDS_copy@testrecv   0       -       2.96M   -
myfirstpool/mysecondDS                1003M   6.00G   1003M   /myfirstpool/mysecondDS
Now something a bit more interesting: instead of using a local file, we will stream the dataset to
a Solaris 11 machine (OpenIndiana can also be used) over the network, using the GNU flavour of
netcat (net-analyzer/gnu-netcat) over the port TCP/7000. In this case the Solaris host is an x86
machine, but a SPARC machine would have given the exact same result, as ZFS, contrary to UFS,
is platform agnostic.
On the Solaris machine:
# nc -l -p 7000 | zfs receive nas/zfs-stream-test@s1
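The sender side is not shown here; on the Linux machine it would have been something like (the Solaris host name is a placeholder):

# zfs send myfirstpool/myfirstDS@snapshot-1 | nc solaris-host 7000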
# zfs list -t all
NAME                     USED    AVAIL   REFER   MOUNTPOINT
(...)
nas/zfs-stream-test      3.02M   6.17T   3.02M   /nas/zfs-stream-test
nas/zfs-stream-test@s1   0       -       3.02M   -
A quick look in the /nas/zfs-stream-test directory on the same Solaris machine gives:
# ls -lR /nas/zfs-stream-test
/nas/zfs-stream-test/:
total 12
-rw-r--r-- 1 root root  13 Mar  3 18:59 hello.txt
drwxr-xr-x 2 root root 143 Mar  3 18:59 radeon

/nas/zfs-stream-test/radeon:
total 6144
-rw-r--r-- 1 root root  8704 Mar  3 18:59 ARUBA_me.bin
-rw-r--r-- 1 root root  8704 Mar  3 18:59 ARUBA_pfp.bin
-rw-r--r-- 1 root root  6144 Mar  3 18:59 ARUBA_rlc.bin
-rw-r--r-- 1 root root 24096 Mar  3 18:59 BARTS_mc.bin
-rw-r--r-- 1 root root  5504 Mar  3 18:59 BARTS_me.bin
-rw-r--r-- 1 root root  4480 Mar  3 18:59 BARTS_pfp.bin
(...)
Note:
We took only a simple case here: ZFS is able to handle snapshots in a very flexible
way. You can, for example, combine several consecutive snapshots and send
them as a single snapshot, or you can choose to proceed in incremental steps. man zfs
will tell you the art of streaming your snapshots.
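As a quick illustration of the incremental approach mentioned above, here is a sketch using the snapshots created earlier (the file names are purely illustrative; the -i flag sends only the changes between two snapshots):

# zfs send myfirstpool/myfirstDS@snapshot-1 > /tmp/full-snap1
# zfs send -i myfirstpool/myfirstDS@snapshot-1 myfirstpool/myfirstDS@snapshot-2 > /tmp/incr-snap1-snap2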
Not all of a dataset's properties are settable: some of them are set and managed by the operating
system in the background for you and thus cannot be modified. Like any other action concerning
datasets, properties are set and unset via the zfs command. Let's start by checking the value of
all supported attributes of the dataset myfirstpool/myfirstDS:
# zfs get all myfirstpool/myfirstDS
NAME                   PROPERTY              VALUE                   SOURCE
myfirstpool/myfirstDS  type                  filesystem              -
myfirstpool/myfirstDS  creation              Sun Mar  2 15:26 2014   -
myfirstpool/myfirstDS  used                  2.96M                   -
myfirstpool/myfirstDS  available             6.00G                   -
myfirstpool/myfirstDS  referenced            2.96M                   -
myfirstpool/myfirstDS  compressratio         1.00x                   -
myfirstpool/myfirstDS  mounted               yes                     -
myfirstpool/myfirstDS  quota                 none                    default
myfirstpool/myfirstDS  reservation           none                    default
myfirstpool/myfirstDS  recordsize            128K                    default
myfirstpool/myfirstDS  mountpoint            /myfirstpool/myfirstDS  default
myfirstpool/myfirstDS  sharenfs              off                     default
myfirstpool/myfirstDS  checksum              on                      default
myfirstpool/myfirstDS  compression           off                     default
myfirstpool/myfirstDS  atime                 on                      default
myfirstpool/myfirstDS  devices               on                      default
myfirstpool/myfirstDS  exec                  on                      default
myfirstpool/myfirstDS  setuid                on                      default
myfirstpool/myfirstDS  readonly              off                     default
myfirstpool/myfirstDS  zoned                 off                     default
myfirstpool/myfirstDS  snapdir               hidden                  default
myfirstpool/myfirstDS  aclinherit            restricted              default
myfirstpool/myfirstDS  canmount              on                      default
myfirstpool/myfirstDS  xattr                 on                      default
myfirstpool/myfirstDS  copies                1                       default
myfirstpool/myfirstDS  version               5                       -
myfirstpool/myfirstDS  utf8only              off                     -
myfirstpool/myfirstDS  normalization         none                    -
myfirstpool/myfirstDS  casesensitivity       sensitive               -
myfirstpool/myfirstDS  vscan                 off                     default
myfirstpool/myfirstDS  nbmand                off                     default
myfirstpool/myfirstDS  sharesmb              off                     default
myfirstpool/myfirstDS  refquota              none                    default
myfirstpool/myfirstDS  refreservation        none                    default
myfirstpool/myfirstDS  primarycache          all                     default
myfirstpool/myfirstDS  secondarycache        all                     default
myfirstpool/myfirstDS  usedbysnapshots       1K                      -
myfirstpool/myfirstDS  usedbydataset         2.96M                   -
myfirstpool/myfirstDS  usedbychildren        0                       -
myfirstpool/myfirstDS  usedbyrefreservation  0                       -
myfirstpool/myfirstDS  logbias               latency                 default
myfirstpool/myfirstDS  dedup                 off                     default
myfirstpool/myfirstDS  mlslabel              none                    default
myfirstpool/myfirstDS  sync                  standard                default
myfirstpool/myfirstDS  refcompressratio      1.00x                   -
myfirstpool/myfirstDS  written               1K                      -
myfirstpool/myfirstDS  snapdev               hidden                  default
Note:
The manual page of the zfs command gives a list and description of every attribute
supported by a dataset.
Maybe something piqued your curiosity: "What does SOURCE mean?" SOURCE describes how the
property has been determined for the dataset and can have several values:
local: the property has been explicitly set for this dataset
default: a default value has been assigned by the operating system if not explicitly set
by the system administrator
dash (-): immutable property (e.g. dataset creation time, whether the dataset is currently
mounted or not...)
Of course, you can get the value of a single attribute if you know its name instead of asking
for all properties.
Compressing data
# zfs get compression myfirstpool/myfirstDS
NAME                   PROPERTY     VALUE  SOURCE
myfirstpool/myfirstDS  compression  off    default
Let's activate compression on the dataset (notice the change in the SOURCE column). That
is achieved through an attribute simply named compression, which can be changed by
running the zfs command with the set sub-command followed by the attribute's name
(compression here) and value (on here), like this:
# zfs set compression=on myfirstpool/myfirstDS
# zfs get compression myfirstpool/myfirstDS
NAME                   PROPERTY     VALUE  SOURCE
myfirstpool/myfirstDS  compression  on     local
The attribute's new value becomes effective immediately, no need to unmount and remount
anything. compression set to on will only affect new data, not what already exists on the
dataset. For your information, the LZJB compression algorithm is used when compression is set
to on; you can override this and use another compression algorithm by explicitly stating your choice. For
example, if you want to activate LZ4 compression on the dataset:
# zfs get compression myfirstpool/myfirstDS
NAME                   PROPERTY     VALUE  SOURCE
myfirstpool/myfirstDS  compression  off    default
# zfs set compression=lz4 myfirstpool/myfirstDS
# zfs get compression myfirstpool/myfirstDS
NAME                   PROPERTY     VALUE  SOURCE
myfirstpool/myfirstDS  compression  lz4    local
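Between the step above and the output below, compression was evidently switched back to on (LZJB) and some compressible data (source code files, as the text explains afterwards) was copied into the dataset; a hypothetical command sequence reproducing that state would be:

# zfs set compression=on myfirstpool/myfirstDS
# cp -a /usr/src/linux-3.13.5-gentoo /myfirstpool/myfirstDS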
# zfs get all myfirstpool/myfirstDS
NAME                   PROPERTY              VALUE                   SOURCE
myfirstpool/myfirstDS  type                  filesystem              -
myfirstpool/myfirstDS  creation              Sun Mar  2 15:26 2014   -
myfirstpool/myfirstDS  used                  584M                    -
myfirstpool/myfirstDS  available             5.43G                   -
myfirstpool/myfirstDS  referenced            584M                    -
myfirstpool/myfirstDS  compressratio         1.96x                   -        <<<< Compression ratio
myfirstpool/myfirstDS  mounted               yes                     -
myfirstpool/myfirstDS  quota                 none                    default
myfirstpool/myfirstDS  reservation           none                    default
myfirstpool/myfirstDS  recordsize            128K                    default
myfirstpool/myfirstDS  mountpoint            /myfirstpool/myfirstDS  default
myfirstpool/myfirstDS  sharenfs              off                     default
myfirstpool/myfirstDS  checksum              on                      default
myfirstpool/myfirstDS  compression           on                      local    <<<< LZJB compression active
myfirstpool/myfirstDS  atime                 on                      default
myfirstpool/myfirstDS  devices               on                      default
myfirstpool/myfirstDS  exec                  on                      default
myfirstpool/myfirstDS  setuid                on                      default
myfirstpool/myfirstDS  readonly              off                     default
myfirstpool/myfirstDS  zoned                 off                     default
myfirstpool/myfirstDS  snapdir               hidden                  default
myfirstpool/myfirstDS  aclinherit            restricted              default
myfirstpool/myfirstDS  canmount              on                      default
myfirstpool/myfirstDS  xattr                 on                      default
myfirstpool/myfirstDS  copies                1                       default
myfirstpool/myfirstDS  version               5                       -
myfirstpool/myfirstDS  utf8only              off                     -
myfirstpool/myfirstDS  normalization         none                    -
myfirstpool/myfirstDS  casesensitivity       sensitive               -
myfirstpool/myfirstDS  vscan                 off                     default
myfirstpool/myfirstDS  nbmand                off                     default
myfirstpool/myfirstDS  sharesmb              off                     default
myfirstpool/myfirstDS  refquota              none                    default
myfirstpool/myfirstDS  refreservation        none                    default
myfirstpool/myfirstDS  primarycache          all                     default
myfirstpool/myfirstDS  secondarycache        all                     default
myfirstpool/myfirstDS  usedbysnapshots       0                       -
myfirstpool/myfirstDS  usedbydataset         584M                    -
myfirstpool/myfirstDS  usedbychildren        0                       -
myfirstpool/myfirstDS  usedbyrefreservation  0                       -
myfirstpool/myfirstDS  logbias               latency                 default
myfirstpool/myfirstDS  dedup                 off                     default
myfirstpool/myfirstDS  mlslabel              none                    default
myfirstpool/myfirstDS  sync                  standard                default
myfirstpool/myfirstDS  refcompressratio      1.96x                   -
myfirstpool/myfirstDS  written               584M                    -
myfirstpool/myfirstDS  snapdev               hidden                  default
Notice the value of compressratio: it no longer shows 1.00x but a shiny 1.96x here (a 1.96:1
ratio). We get a high compression ratio here because we copied a lot of source code files, but if
we had put in a lot of already compressed data (images in JPEG or PNG format for example) the ratio
would have been much lower.
Changing a dataset's mountpoint is just a matter of setting its mountpoint attribute:
# zfs get mountpoint myfirstpool/myfirstDS
NAME                   PROPERTY    VALUE                   SOURCE
myfirstpool/myfirstDS  mountpoint  /myfirstpool/myfirstDS  default
# zfs set mountpoint=/mnt/floppy myfirstpool/myfirstDS
# zfs list
NAME                     USED    AVAIL   REFER   MOUNTPOINT
myfirstpool              2.38G   5.43G   850M    /myfirstpool
myfirstpool/myfirstDS    584M    5.43G   584M    /mnt/floppy
myfirstpool/mysecondDS   1003M   5.43G   1003M   /myfirstpool/mysecondDS
Notice that the dataset has been automatically unmounted and remounted at the new location for you,
and once again the change is effective immediately. If the indicated mountpoint is not
empty, ZFS is smart enough to warn you and not remount it.
Sharing a dataset over NFS is, again, just a matter of attributes:
# zfs set sharenfs="rw=@192.168.1.0/24" myfirstpool/myfirstDS
# zfs get sharenfs myfirstpool/myfirstDS
NAME                   PROPERTY  VALUE               SOURCE
myfirstpool/myfirstDS  sharenfs  rw=@192.168.1.0/24  local
# cat /etc/dfs/sharetab
/myfirstpool/myfirstDS  nfs  rw=@192.168.1.0/24
Important:
The syntax and behaviour are similar to what is found under Solaris 11: zfs share
reads and updates entries coming from the file /etc/dfs/sharetab (not
/etc/exports). This is a Solaris touch: under Solaris 11 the zfs and share
commands now act on /etc/dfs/sharetab, /etc/dfs/dfstab being no longer
supported.
By checking with the showmount command:
# showmount -e
Export list for .... :
/myfirstpool/myfirstDS 192.168.1.0/24
At this point it should be possible to mount the dataset from another host on the network (here a
Solaris 11 machine) and write some data to it:
# mkdir -p /mnt/myfirstDS
# mount 192.168.1.19:/myfirstpool/myfirstDS /mnt/myfirstDS
# mount | grep myfirstDS
/mnt/myfirstDS on 192.168.1.19:/myfirstpool/myfirstDS remote/read/write/setuid/devices/rstchown/xattr/dev=89c0002 on Sun Mar  9 14:28:55 2014
# cp /kernel/amd64/genunix /mnt/myfirstDS
Et voila! No sign of protest, so the file has been copied. If we check what the ZFS dataset looks
like on the Linux host where the ZFS dataset resides, the copied file (a Solaris kernel image here)
is present:
# ls -l /myfirstpool/myfirstDS/genunix
-rwxr-xr-x 1 root root 5769456 Mar  9 14:32 /myfirstpool/myfirstDS/genunix
The $100 question: how to "unshare" the dataset? Simple: just set sharenfs to off! Be aware that the
NFS server will cease to share the dataset no matter whether it is still in use by client machines.
Any NFS client still having the dataset mounted at this point will encounter RPC errors
whenever an I/O operation is attempted on the share (Solaris NFS client here):
# ls /mnt/myfirstDS
NFS compound failed for server 192.168.1.19: error 7 (RPC: Authentication error)
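For reference, the unsharing itself on the Linux host would simply be:

# zfs set sharenfs=off myfirstpool/myfirstDS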
Quoting the zfs command's manual page, your Samba server must also be configured like this:
Samba will need to listen to 'localhost' (127.0.0.1) for the zfs utilities to communicate
with samba. This is the default behaviour for most Linux distributions.
Samba must be able to authenticate a user. This can be done in a number of ways,
depending on if using the system password file, LDAP or the Samba specific smbpasswd
file. How to do this is outside the scope of this manual. Please refer to the smb.conf(5)
manpage for more information.
See the USERSHARE section of the smb.conf(5) man page for all configuration options
in case you need to modify any options to the share afterwards. Do note that any changes
done with the 'net' command will be undone if the share is every unshared (such as at a
reboot etc). In the future, ZoL will be able to set specific options directly using
sharesmb=<option>.
What you have to know at this point is that, once emerged on your Funtoo box, Samba has no
configuration file and thus will refuse to start. You can use the provided example file
/etc/samba/smb.conf.example as a starting point for /etc/samba/smb.conf, just copy it:
# cd /etc/samba
# cp smb.conf.example smb.conf
Now create the directory /var/lib/samba/usershares (it will host the definitions of all usershares);
leaving the default permissions (0755) and owner (root:root) untouched is acceptable for the context of this
tutorial, unless you use ZFS delegation.
# mkdir /var/lib/samba/usershares
Several important things to know unless you have hours to waste with your friend Google (a sketch of the corresponding smb.conf settings follows this list):
When you set the sharesmb property to on, the zfs command invokes Samba's net
command behind the scenes to create a usershare (comment and ACL values are both
specified). E.g. zfs set sharesmb=on myfirstpool/myfirstDS => net usershare add
myfirstpool_myfirstDS /myfirstpool/myfirstDS "Comment:/myfirstpool/myfirstDS"
"Everyone:F" guest_ok=n
Under which user will the net usershare command be invoked? Unless ZFS delegation is
used, root will invoke it and thus own the created usershare, which is described in a
text file (named after the usershare's name) located in the directory
/var/lib/samba/usershares. There are, per Samba requirements, three very important details
about the directory /var/lib/samba/usershares:
o Its owner must be root; the group is of secondary importance and left to your
discretion
o Its permissions must be 1775 (so owner = rwx, group = rwx, others = r-x, with the
sticky bit armed).
o If the directory is not set up as above, Samba will simply ignore any usershares you
define, so if you get errors like BAD_NETWORK_NAME when connecting to a
usershare created by ZFS, double check the owner and permissions set for
/var/lib/samba/usershares or the directory you use on your Funtoo box to hold
usershare definitions...
Unless explicitly overridden in /etc/samba/smb.conf:
o usershare max shares has a default value of zero, so no usershare can be created. If
you forget to set a value greater than zero for usershare max shares, any zfs set
sharesmb=on command will complain with the message cannot share (...) smb
add share failed (also, any net usershare add command will show the error
message net usershare: usershares are currently disabled).
o usershare path = /var/lib/samba/usershares
o usershare owner only is set to true by default, so Samba will refuse the share to
any remote user not opening a session as root on the share
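Putting these options together, a minimal [global] sketch of /etc/samba/smb.conf enabling usershares might look like this (the values are illustrative, not a recommended production setup):

[global]
   # allow user-defined shares; the default of 0 disables them entirely
   usershare max shares = 100
   # directory holding the usershare definition files created by 'net usershare add'
   usershare path = /var/lib/samba/usershares
   # demo only: do not require the share's path to be owned by the connecting user
   usershare owner only = false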
Warning:
This configuration is obviously for the sake of demonstration purposes within the
scope of this tutorial, do not use it in the real world!
At this point, reload or restart Samba if you have altered /etc/samba/smb.conf. Now that
usershares are possible, let's share a ZFS dataset over Samba:
# zfs set sharesmb=on myfirstpool/myfirstDS
# zfs get sharesmb myfirstpool/myfirstDS
NAME                   PROPERTY  VALUE  SOURCE
myfirstpool/myfirstDS  sharesmb  on     local
The command must return without any error message; if you get something like "cannot
share myfirstpool/myfirstDS smb add share failed" then usershares are not functional on
your machine (see the notes just above). Now a Samba usershare named after the zpool and the
dataset names should exist:
# net usershare list
myfirstpool_myfirstDS
# net usershare info myfirstpool_myfirstDS
[myfirstpool_myfirstDS]
path=/myfirstpool/myfirstDS
comment=Comment: /myfirstpool/myfirstDS
usershare_acl=Everyone:F,
guest_ok=n
Some statistics
It is no secret that a general trend in the IT industry is the exponential growth of data
quantities. Just think about the amount of data YouTube, Google or Facebook generate every
day; taking the case of the first, some statistics give:
24 hours of video uploaded every minute in March 2010 (May 2009 - 20h / October
2008 - 15h / May 2008 - 13h)
More than 2 billion views a day
More video is produced on YouTube every 60 days than the 3 major US broadcasting
networks did in the last 60 years
And for Facebook:
over 900 million objects that people interact with (pages, groups, events and community
pages)
The average user creates 90 pieces of content each month (750 million active users)
More than 2.5 million websites have integrated with Facebook
What is true for Facebook and YouTube is also true in many other cases (think one minute
about the amount of data stored in iTunes), especially with the growing popularity of cloud
computing infrastructures. Despite the progress of the technology, a "bottleneck" still exists: storage
reliability has stayed nearly the same over the years. If only one organization in the world
generates huge quantities of data, it would be CERN (Conseil Européen pour la Recherche
Nucléaire, now officially known as the European Organization for Nuclear Research), as their
experiments can generate spikes of many terabytes of data within a few seconds. A study done in
2007 and quoted by a ZDNet article reveals that:
Overall this means: 22 corrupted files (1 in every 1500 files) for a grand total of 33700 files
holding 8.7 TB of data. And this study is 5 years old....
Cheap controllers or buggy drivers that do not report errors/pre-failure conditions to the
operating system;
"Bit-leaking": a hard drive consists of many concentric magnetic tracks. When the hard
drive magnetic head writes bits on the magnetic surface it generates a very weak
magnetic field, which is however sufficient to "leak" onto the next track and change some bits.
Drives can generally compensate for those situations because they also record some error
correction data on the magnetic surface;
Magnetic surface defects (weak sectors);
Hard drive firmware bugs;
Cosmic rays hitting your RAM chips or hard drive cache memory/electronics.
| Disk_1  | Disk_2  | Disk_3  | Disk_4
| [D0_S0] | [D0_S1] | [D0_S2] | [D0_P]
| [D1_S0] | [D1_S1] | [D1_P]  | [D1_S2]
| [D2_S0] | [D2_P]  | [D2_S1] | [D2_S2]
The parity is simply computed by XORing the stripes of the same "row", thus giving the general
equation: D0_P = D0_S0 XOR D0_S1 XOR D0_S2. Now suppose one stripe is missing:
D0_S0 = 1011
D0_S1 = 0010
D0_S2 = <missing>
D0_P = 0110
D0_S0 XOR D0_S1 XOR D0_S2 XOR D0_P = 0000, which can be rewritten as:
D0_S2 = D0_S0 XOR D0_S1 XOR D0_P
Applying boolean algebra it gives: D0_S2 = 1011 XOR 0010 XOR 0110 = 1111. Proof: 1011
XOR 0010 XOR 1111 = 0110, which is the same as D0_P.
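To convince yourself, the same XOR arithmetic can be checked directly in a shell (bash prints the results in decimal: 6 is 0110 in binary, 15 is 1111):

# echo $(( 2#1011 ^ 2#0010 ^ 2#1111 ))    # D0_S0 ^ D0_S1 ^ D0_S2 gives the parity D0_P
6
# echo $(( 2#1011 ^ 2#0010 ^ 2#0110 ))    # D0_S0 ^ D0_S1 ^ D0_P rebuilds the missing D0_S2
15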
"So what's the deal?" Okay, now the funny part: forget the above hypothesis and imagine we have
this:
D0_S0 = 1011
D0_S1 = 0010
D0_S2 = 1101
D0_P = 0110
Applying boolean algebra magic gives 1011 XOR 0010 XOR 1101 => 0100. Problem: this is
different from D0_P (0110). Can you tell which one (or which ONES) of the four terms lies? If you
find a mathematically acceptable solution, found your own company, because you have just solved a
big computer science problem. If humans can't solve the question, imagine how hard it is for the
poor little RAID-5 controller to determine which stripe is right and which one lies, and the
resulting "datageddon" (i.e. massive data corruption on the RAID-5 array) when the RAID-5
controller detects the error and starts to rebuild the array.
This is not science fiction, this is pure reality, and the weakness lies in RAID-5's simplicity.
Here is how it can happen: an urban legend about RAID-5 arrays is that they update stripes in an
atomic transaction (all of the stripes plus parity are written, or none of them). Too bad, this is just not
true: the data is written on the fly, and if for one reason or another the machine holding the RAID-5
array has a power outage or crashes, the RAID-5 controller will simply have no idea what it
was doing and which stripes are up to date and which ones are not. Of course, RAID
controllers in servers do have a replaceable on-board battery, and most of the time the server they
reside in is connected to an auxiliary power source like a battery-based UPS or a diesel/gas electricity
generator. However, Murphy's law or unpredictable hazards can, sometimes, strike....
Another funny scenario: imagine a machine with a RAID-5 array (on a UPS this time) but with
non-ECC memory. The RAID-5 controller splits the data buffer in stripes, computes a parity stripe
and starts to write them on the different disks of the array. But... for some odd reason,
only one bit in one of the stripes flips (cosmic rays, RFI...) after the parity calculation. Too bad,
too sad: one of the written stripes contains corrupted data and it is silently written to the array.
Datageddon in sight!
Not to make you freak out: storage units have sophisticated error correction capabilities (a
magnetic surface or an optical recording surface is not perfect and reading/writing errors occur),
masking most of the cases. However, some established statistics estimate that even with error
correction mechanisms, one bit in every 10^16 bits transferred is incorrect. 10^16 is really huge, but
unfortunately, at this beginning of the XXIst century, with datacenters brewing massive amounts
of data on several hundreds not to say thousands of servers, this number starts to give
headaches: a big datacenter can face silent data corruption every 15 minutes (Wikipedia).
No typo here: a potential disaster may silently appear 4 times an hour, every single day of the
year. Detection techniques exist, but traditional RAID-5 arrays in themselves can be a problem.
Ironic for such a popular and widely used solution :)
If RAID-5 was an acceptable trade-off in past decades, it has simply had its day. RAID-5 is
dead? *Hooray!*