I have been testing ZFS deduplication on FreeBSD.
In computing, data deduplication is a specialized data compression
technique for eliminating duplicate copies of repeating data. Related and
somewhat synonymous terms are intelligent (data) compression and
single-instance (data) storage. The technique is used to improve storage
utilization and can also be applied to network data transfers to reduce
the number of bytes that must be sent. In the deduplication process,
unique chunks of data, or byte patterns, are identified and stored during
a process of analysis. As the analysis continues, other chunks are
compared to the stored copy and whenever a match occurs, the redundant
chunk is replaced with a small reference that points to the stored chunk.
Given that the same byte pattern may occur dozens, hundreds, or even
thousands of times (the match frequency is dependent on the chunk size),
the amount of data that must be stored or transferred can be greatly
reduced.
http://en.wikipedia.org/wiki/Data_deduplication
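To make the idea of chunks and references more concrete, here is a minimal sketch in shell. It is not how ZFS implements dedup. dedup_stats is a hypothetical helper of mine, the chunk size argument is an assumption (it defaults to 128K to match the default recordsize), and cksum (a CRC) only stands in for the much stronger checksum a real deduplicator would use.

```shell
#!/bin/sh
# dedup_stats FILE [CHUNK_BYTES]
# Hypothetical helper, not part of ZFS: splits FILE into fixed-size chunks,
# checksums every chunk and prints "TOTAL UNIQUE" chunk counts.
# Assumption: cksum (CRC + length) is good enough to spot identical chunks
# in a demonstration; it is not collision resistant.
dedup_stats() {
    file=$1
    chunk=${2:-131072}
    tmp=$(mktemp -d) || return 1

    # -a 6 allows enough suffixes for large files split into small chunks.
    split -a 6 -b "$chunk" "$file" "$tmp/chunk."

    # One line per chunk: "<crc> <length>".
    for c in "$tmp"/chunk.*; do
        if [ -f "$c" ]; then
            cksum < "$c"
        fi
    done > "$tmp/sums"

    total=$(wc -l < "$tmp/sums" | tr -d ' ')
    unique=$(sort -u "$tmp/sums" | wc -l | tr -d ' ')
    rm -rf "$tmp"
    echo "$total $unique"
}

# Usage (the file name is only an example):
#   dedup_stats some-file.iso 131072
```

If TOTAL is much larger than UNIQUE, a block-level deduplicator would only need to store the UNIQUE chunks plus one small reference per duplicate.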
My box is running FreeBSD 9.1-RELEASE.
root@skyline:/root # uname -v
FreeBSD 9.1-RELEASE #0 r243825: Tue Dec 4 09:23:10 UTC 2012 root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
root@skyline:/root # zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
zroot 1.81T 1.49G 1.81T 0% 1.00x ONLINE -
root@skyline:/root #
root@skyline:/root # zfs get all zroot
NAME PROPERTY VALUE SOURCE
zroot type filesystem -
zroot creation Fri Jan 11 16:52 2013
zroot used 3.55G -
zroot available 1.78T -
zroot referenced 373M -
zroot compressratio 1.50x -
zroot mounted yes -
zroot quota none default
zroot reservation none default
zroot recordsize 128K default
zroot mountpoint legacy local
zroot sharenfs off default
zroot checksum fletcher4 local
zroot compression off default
zroot atime on default
zroot devices on default
zroot exec on default
zroot setuid on default
zroot readonly off default
zroot jailed off default
zroot snapdir hidden default
zroot aclmode discard default
zroot aclinherit restricted default
zroot canmount on default
zroot xattr off temporary
zroot copies 1 default
zroot version 5 -
zroot utf8only off -
zroot normalization none -
zroot casesensitivity sensitive -
zroot vscan off default
zroot nbmand off default
zroot sharesmb off default
zroot refquota none default
zroot refreservation none default
zroot primarycache all default
zroot secondarycache all default
zroot usedbysnapshots 0 -
zroot usedbydataset 373M -
zroot usedbychildren 3.19G -
zroot usedbyrefreservation 0 -
zroot logbias latency default
zroot dedup off default
zroot mlslabel -
zroot sync standard default
zroot refcompressratio 1.00x -
zroot written 373M -
root@skyline:/root #
Dedup is off by default, so I have to turn it on.
root@skyline:/root # zfs set dedup=on zroot
root@skyline:/root # zfs get compression,dedup zroot
NAME PROPERTY VALUE SOURCE
zroot compression off local
zroot dedup on local
root@skyline:/root #
According to that output, dedup is now on. Dedup only applies to data written after the property is set, so I run a simple test with fresh copies to confirm that it works.
root@skyline:/ # cd /home/
root@skyline:/home # ls
root@skyline:/home # mkdir Testdedup1
root@skyline:/home # mkdir Testdedup2
root@skyline:/home # mkdir Testdedup3
root@skyline:/home #
root@skyline:/home # du -hs Testdedup*
1.5k Testdedup1
1.5k Testdedup2
1.5k Testdedup3
root@skyline:/home # df -h
Filesystem Size Used Avail Capacity Mounted on
zroot 1.8T 372M 1.8T 0% /
devfs 1.0k 1.0k 0B 100% /dev
zroot/tmp 1.8T 35k 1.8T 0% /tmp
zroot/usr 1.8T 377M 1.8T 0% /usr
zroot/usr/ports 1.8T 406M 1.8T 0% /usr/ports
zroot/usr/src 1.8T 358M 1.8T 0% /usr/src
zroot/var 1.8T 6.3M 1.8T 0% /var
zroot/var/empty 1.8T 31k 1.8T 0% /var/empty
zroot/var/run 1.8T 59k 1.8T 0% /var/run
zroot/var/tmp 1.8T 32k 1.8T 0% /var/tmp
root@skyline:/home # du -hs /home
6.0k /home
root@skyline:/home #
root@skyline:/root # zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
zroot 1.81T 1.49G 1.81T 0% 1.00x ONLINE -
root@skyline:/root #
As we can see, ALLOC is 1.49G, so almost all of the 1.81T pool is still free. I will copy the 2.1G FreeBSD 8.2 DVD ISO into Testdedup1 on zroot, and then copy it from there into the other Testdedup folders.
root@skyline:/root # zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
zroot 1.81T 3.61G 1.81T 0% 1.00x ONLINE -
root@skyline:/root #
root@skyline:/home # du -hs *
2.1G Testdedup1
1.5k Testdedup2
1.5k Testdedup3
root@skyline:/home #
root@skyline:/home # cp Testdedup1/FreeBSD-8.2-RELEASE-i386-dvd1.iso Testdedup2/
root@skyline:/home # du -hs *
2.1G Testdedup1
2.1G Testdedup2
1.5k Testdedup3
root@skyline:/home # zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
zroot 1.81T 3.62G 1.81T 0% 2.00x ONLINE -
root@skyline:/home #
Then I copy it to Testdedup3 as well.
root@skyline:/home # cp Testdedup1/FreeBSD-8.2-RELEASE-i386-dvd1.iso Testdedup3/
root@skyline:/home # du -hs *
2.1G Testdedup1
2.1G Testdedup2
2.1G Testdedup3
root@skyline:/home #
What happens if I rename the source file to FreeBSD-8.2-RELEASE-i386-dvd1.iso.renamefile and copy it into a new folder, TestRenameDedup1?
root@skyline:/home # mv Testdedup1/FreeBSD-8.2-RELEASE-i386-dvd1.iso Testdedup1/FreeBSD-8.2-RELEASE-i386-dvd1.iso.renamefile
root@skyline:/home # ls
TestRenameDedup1 Testdedup1 Testdedup2 Testdedup3
root@skyline:/home # du -hs *
1.5k TestRenameDedup1
2.1G Testdedup1
2.1G Testdedup2
2.1G Testdedup3
root@skyline:/home # cp Testdedup1/FreeBSD-8.2-RELEASE-i386-dvd1.iso.renamefile TestRenameDedup1/
root@skyline:/home # du -hs *
2.1G TestRenameDedup1
2.1G Testdedup1
2.1G Testdedup2
2.1G Testdedup3
root@skyline:/home #
root@skyline:/home # ls -al TestRenameDedup1/
total 2225265
drwxr-xr-x 2 root wheel 3 Jan 14 11:43 .
drwxr-xr-x 6 root wheel 6 Jan 14 11:42 ..
-rw-r--r-- 1 root wheel 2276931584 Jan 14 11:44 FreeBSD-8.2-RELEASE-i386-dvd1.iso.renamefile
root@skyline:/home #
root@skyline:/home # zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
zroot 1.81T 3.62G 1.81T 0% 4.00x ONLINE -
root@skyline:/home #
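As we can see, ALLOC is still 3.62G even though four copies of the file now exist, and the DEDUP ratio has gone up to 4.00x. Renaming the file makes no difference, because dedup compares the contents of blocks, not file names.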
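The numbers can be checked with a little arithmetic. According to ls -al the ISO is 2276931584 bytes, and there are four copies of it. The sketch below assumes that every copy deduplicates completely and ignores metadata and the dedup table itself, so treat the result as an estimate. dedup_savings is a hypothetical helper, not a ZFS tool.

```shell
#!/bin/sh
# dedup_savings COPIES BYTES_PER_COPY
# Prints the logical bytes written, the bytes expected on disk when all
# copies share one set of blocks, and the difference between the two.
# Assumption: perfect deduplication, no metadata overhead.
dedup_savings() {
    copies=$1
    size=$2
    logical=$((copies * size))
    physical=$size
    saved=$((logical - physical))
    echo "logical=$logical physical=$physical saved=$saved"
}

# Four copies of the 2276931584-byte ISO from the test above.
dedup_savings 4 2276931584
```

Four copies would take about 8.5G without dedup, while only one copy (about 2.1G) is actually allocated. That matches the jump in ALLOC from 1.49G to roughly 3.6G and the 4.00x ratio reported by zpool list.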
Note:
Here are some of the relevant explanations from man zpool.
alloc       Amount of storage space within the pool that has been
            physically allocated.
capacity    Percentage of pool space used. This property can also be
            referred to by its shortened column name, "cap".
dedupratio  The deduplication ratio specified for a pool, expressed as a
            multiplier. For example, a value of 1.76 indicates that 1.76
            units of data were stored but only 1 unit of disk space was
            actually consumed. See zfs(8) for a description of the
            deduplication feature.
free        Number of blocks within the pool that are not allocated.
size        Total size of the storage pool.
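For scripts it is easier to read dedupratio without the table header. The sketch below assumes input lines of the form "NAME RATIO" with the ratio written like 4.00x, which is what I would expect from zpool list -H -o name,dedupratio. dedup_report is a hypothetical helper that restates each ratio in the wording of the man page.

```shell
#!/bin/sh
# dedup_report
# Reads "NAME RATIO" lines on stdin (for example "zroot 4.00x") and prints
# one sentence per pool. Assumption: RATIO is a number followed by "x".
dedup_report() {
    awk '{
        ratio = $2
        sub(/x$/, "", ratio)    # strip the trailing multiplier sign
        printf "%s stores %.2f units of data per unit of disk\n", $1, ratio
    }'
}

# Usage on a machine with ZFS:
#   zpool list -H -o name,dedupratio | dedup_report
```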