License

Copyright (C) 2008-2021 Oliver Bohlen.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.

A copy of the license is included in the section entitled "GNU Free Documentation License".

Introduction

This documentation comes with ABSOLUTELY NO WARRANTY, to the extent permitted by applicable law.

Howto: File deduplication for Gentoo Linux

On a large file server, or any other system with many users, the same file is often stored several times in different locations, which wastes disk space.
The following script finds identical files and automatically replaces the duplicates with hard links to save disk space. Be very careful with this: once files are hard linked, changing one of them changes all of the others too, because to the filesystem they are the same file (same inode).
I use this for my complete system backups.
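
The shared-inode behaviour described above can be tried out safely with a minimal Perl sketch. The scratch directory and the file names a.txt and b.txt are made up for illustration; only core Perl modules are used, and it assumes the temporary directory lives on a filesystem that supports hard links.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory that is removed again when the script exits.
my $dir = tempdir(CLEANUP => 1);
my ($first, $second) = ("$dir/a.txt", "$dir/b.txt");

# Create a file and hard link a second name to it.
open(my $out, '>', $first) or die "Cannot write $first: $!\n";
print $out "original\n";
close($out);
link($first, $second) or die "Cannot link $second: $!\n";

# Both names refer to the same inode; stat field 1 is the inode number,
# field 3 is the number of hard links.
my ($inode_a, $nlink) = (stat($first))[1, 3];
my $inode_b = (stat($second))[1];
print "same inode: ", ($inode_a == $inode_b ? "yes" : "no"), ", links: $nlink\n";

# Appending through the first name changes what the second name shows.
open($out, '>>', $first) or die "Cannot append to $first: $!\n";
print $out "changed\n";
close($out);

open(my $in, '<', $second) or die "Cannot read $second: $!\n";
my @lines = <$in>;
close($in);
print "b.txt now has ", scalar(@lines), " lines\n";
```

The second line was written through a.txt only, yet it is visible through b.txt as well. This is exactly the effect to keep in mind before deduplicating files that are still being edited.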

Required software

The required software has to be installed with the following command(s):
emerge app-misc/fdupes

Changes in /usr/local/sbin/deduplicate.pl

File permissions:
Owner: root
Group: root
Permissions: -rwx------

Changed on 29.04.10
Issued by olli
Beginning line 2

This script finds duplicate files and replaces them with hard links (file deduplication). Be very careful with this!
Keep in mind that if you change one file, every file linked to it changes too.

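#!/usr/bin/perl
use strict;
use warnings;

# Usage: deduplicate.pl <Dir1> [dir2] [...]

# ToDo: Add a DryRun (Print only the files which will be linked and not link them)

#foreach $a (@ARGV) {
# @dirlist=`find $a -type d`;
# foreach $b (@dirlist) {
#  chomp($b);
#  push(@list,$b);
# }
#}

die "Usage: $0 <Dir1> [dir2] [...]\n" unless @ARGV;

# fdupes prints each set of duplicates as a block of paths, one per line,
# with a blank line between sets. The list form of open bypasses the shell,
# so directory names with spaces or shell metacharacters are passed unchanged.
open(my $fdupes, '-|', 'fdupes', '-q', '-r', @ARGV)
 or die "Cannot run fdupes: $!\n";

my $sourcefile;
while (my $file = <$fdupes>) {
 chomp($file);
 if ($file eq '') {
  # Blank line: the current set of duplicates is complete.
  undef $sourcefile;
  next;
 }
 unless (defined $sourcefile) {
  # First file of a set: all other files of the set are linked to it.
  $sourcefile = $file;
  next;
 }
 print "ln -f $sourcefile $file\n";
 # List form of system: no shell, so file names are not parsed again.
 system('ln', '-f', '--', $sourcefile, $file) == 0
  or warn "Linking $file failed\n";
}
close($fdupes);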
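
The ToDo in the script asks for a dry run. One possible way to do it is sketched below; this is not the author's implementation. The helper name plan_links and the "would run:" output format are assumptions made for this example. The helper only turns the fdupes output into source/target pairs, so nothing is linked.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# plan_links is a hypothetical helper: it takes the lines printed by
# "fdupes -q -r" and returns [source, target] pairs without touching any file.
sub plan_links {
 my @lines = @_;
 my (@pairs, $source);
 foreach my $file (@lines) {
  chomp($file);
  if ($file eq '') {
   # Blank line: the current set of duplicates is complete.
   undef $source;
   next;
  }
  unless (defined $source) {
   # First file of a set is the link source for the rest of the set.
   $source = $file;
   next;
  }
  push(@pairs, [$source, $file]);
 }
 return @pairs;
}

# Dry run: print what would be linked, but do not link anything.
if (@ARGV) {
 open(my $fdupes, '-|', 'fdupes', '-q', '-r', @ARGV)
  or die "Cannot run fdupes: $!\n";
 my @pairs = plan_links(<$fdupes>);
 close($fdupes);
 print "would run: ln -f $_->[0] $_->[1]\n" foreach @pairs;
}
```

Called with the same directory arguments as deduplicate.pl, this prints one line per file that would be replaced by a hard link. Reviewing that list first is a cheap safeguard before running the real script on a backup tree.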

Please send feedback to: doc<at>gabosh.net