Gentoo Forums
merging directories replacing duplicates with links
barrymac
Tux's lil' helper


Joined: 15 Jul 2004
Posts: 87

Posted: Mon May 29, 2006 1:33 am    Post subject: merging directories replacing duplicates with links

Hello all,

I would like to find a quick way to scan for duplicate files using md5sum and replace each duplicate with a symlink, so that every path still resolves to the same content without that content being stored twice. Perhaps rsync's hashing algorithm would be more efficient and still adequate?
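For readers who land here with the same question, here is a minimal Python sketch of that idea, not a tested tool. It assumes a POSIX system where symlinks are available. The function names (`file_md5`, `find_duplicates`, `replace_with_symlinks`) are made up for illustration. Files are grouped by size first so that only plausible candidates get hashed, and a byte-for-byte comparison runs before anything is deleted.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents hash identically."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip existing symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_md5(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups


def replace_with_symlinks(groups):
    """Keep the first path of each group; turn the rest into relative symlinks."""
    for keep, *extras in groups:
        for extra in extras:
            # Guard against hash collisions before deleting anything.
            if not filecmp.cmp(keep, extra, shallow=False):
                continue
            target = os.path.relpath(keep, os.path.dirname(extra))
            os.remove(extra)
            os.symlink(target, extra)
```

Calling `replace_with_symlinks(find_duplicates("/some/merged/tree"))` would rewrite the tree in place (the path is a placeholder), so try it on a copy first. Relative link targets keep working if the whole tree is moved, but they break if only one side of a link is moved.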

I'm sure this situation has come up for many people before. I just got a new file server with enough capacity to put everything in one place. So I'd like to merge directories from several machines into the new server but I have lots of duplication across them.

I was wondering if anyone had a quick strategy or even knows of a package that does the job. I'm thinking that using the output of fdupes would be one approach, but my scripting is basic. Is this a good excuse to learn some Python?

Somehow I think a filesystem plugin for reiser4 might be a nice way to achieve this. I imagine it would keep a database of hashes, so if you ever tried saving the same file somewhere else in the system it would only create a link rather than saving it again. This would be useful on a system with many users who may each have their own copies, where you want to minimise storage. I would imagine Google might use something like this.
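To make the "database of hashes" idea concrete, here is a toy userspace sketch. It is not a reiser4 plugin and not a description of any existing filesystem feature. The `DedupStore` class and its `save` method are hypothetical, the index lives only in memory, and SHA-256 is my own choice of hash.

```python
import hashlib
import os


class DedupStore:
    """Hypothetical stand-in for the imagined plugin: link instead of re-saving."""

    def __init__(self):
        # Maps SHA-256 hex digest -> path of the first copy that was written.
        self.index = {}

    def save(self, path, data):
        """Write data to path, or hard-link path to an identical earlier file.

        Returns True if new bytes were written, False if a link was made.
        Raises FileExistsError if path exists and a link is attempted.
        """
        digest = hashlib.sha256(data).hexdigest()
        existing = self.index.get(digest)
        if existing is not None and os.path.exists(existing):
            os.link(existing, path)  # same content already stored: just link
            return False
        with open(path, "wb") as handle:
            handle.write(data)
        self.index[digest] = path
        return True
```

One design catch: linked files share a single copy of the data, so an in-place edit by one user would change everyone's "copy". A real implementation would need copy-on-write semantics on top of this.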

Thanks in advance for any help.
barrymac
Tux's lil' helper


Joined: 15 Jul 2004
Posts: 87

Posted: Mon May 29, 2006 3:22 am    Post subject: Solved

I was looking in the wrong places.

The tool to use is called fslint, a very handy program for removing what its authors call 'lint' from a file system. It will replace duplicate files with hardlinks.

I think that'll solve my problem!

http://www.pixelbeat.org/fslint/
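For anyone curious what replacing a duplicate with a hardlink involves, here is a minimal Python sketch. It is not fslint's code, and the helper name `replace_with_hardlink` and the `.hardlink-tmp` suffix are made up for illustration. It assumes you have already verified that the two files have identical contents. The link is created under a temporary name and then renamed over the duplicate, so the duplicate's path never disappears part-way through.

```python
import os


def replace_with_hardlink(keep, duplicate):
    """Make duplicate a hard link to keep. Returns False if already linked."""
    if os.path.samefile(keep, duplicate):
        return False  # both names already point at the same inode
    # Hard links only work within one filesystem, so check the device IDs.
    if os.stat(keep).st_dev != os.stat(duplicate).st_dev:
        raise OSError("hard links cannot span filesystems")
    temp = duplicate + ".hardlink-tmp"  # hypothetical temporary name
    os.link(keep, temp)          # new name for keep's inode
    os.replace(temp, duplicate)  # atomically swap it over the duplicate
    return True
```

Unlike symlinks, hardlinks do not break when the "original" is renamed or deleted, but they cannot cross filesystem boundaries, and editing one name in place changes the data seen through every other name.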
All times are GMT