RAID vs rsync. Your preferences, experiences?


Post by pjp » Fri Aug 21, 2020 4:19 pm

For home use, RAID seems like a layer of complexity that I'm not sure is worth the benefit. Although I do use LVM sometimes for the convenience of more flexible management of volumes. RAID and LVM is even more complexity, and I'm unsure about LVM's built-in RAID abilities. Ultimately it is still two layers of complexity.

For a bulk data repository using HDD (not OS, other than backups), I was planning to do a two disk mirror, possibly extending it to a third disk to have two mirrored copies.

But now I'm leaning toward rsync. The main disadvantage I see with rsync would be the third copy and more reads causing extra wear on the source disk. I haven't used batch mode, so it isn't immediately clear that would address that concern.

Any thoughts or other solutions?
Quis separabit? Quo animo?

Post by alamahant » Fri Aug 21, 2020 4:32 pm

Rsync is perfect.
I have an rsync invocation in my daily update script, before the emerge part.
So in case something goes wrong with the state of my machine post-update, I just revert.
I use this formula:

Code: Select all

rsync -aAXv --delete --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found","/home/<user>/shared/*","/home/<user>/ssd/*","/boot/efi/*"} / /mnt/
I mount an LVM partition at /mnt.
I love it.
:D

Post by Anon-E-moose » Fri Aug 21, 2020 4:34 pm

My NAS/workstation (small) has two USB3 RAID boxes (4 TB, dual drive, mirrored): one for backups of all Linux machines, portage, distfiles, packages, etc.; the other for Windows backups and media (music and movies). I use rsync to transfer the backups. I just retired two older RAID boxes (drive max of 3 TB) and merged them into one of the new 4 TB ones. All four of the retired drives (WD Red 2 TB) have no problems after several years of daily backups, and I'll repurpose them for some type of storage. Given the lifetime of drives now (HDD and SSD), I wouldn't worry about any problems related to rsync and longevity.
UM780 xtx, 6.18 zen kernel, gcc 15, openrc, wayland
minixforum m1-s1 max -- same software as above but used for ai learning


Zealots are gonna be zealots, just like haters are gonna be haters

Post by Banana » Fri Aug 21, 2020 5:47 pm

Have a look at http://moo.nac.uci.edu/~hjm/parsync/
Forum Guidelines

PFL - Portage file list - find which package a file or command belongs to.
My delta-labs.org snippets do expire

Post by steve_v » Fri Aug 21, 2020 7:11 pm

pjp wrote:The main disadvantage I see with rsync would be the third copy and more reads causing extra wear on the source disk.
The main disadvantage I would see with rsync is that when a disk fails, anything accessing it falls on its face until redirected to one of the rsync mirrors somehow. With RAID that would be entirely transparent to applications.
pjp wrote:Any thoughts
Immediate thought: Rsync is for backups and replication, RAID is for keeping things running until you can swap out failed hardware. Maybe I'm misconstruing your intent though.
If what you want is [network] replication and/or snapshots, rather than uptime and redundancy, rsync is certainly the more flexible solution.
If you want all of the above in one solution there's always ZFS, which I hereby shamelessly plug yet again because it's awesome.
Once is happenstance. Twice is coincidence. Three times is enemy action. Four times is Official GNOME Policy.

Post by pjp » Fri Aug 21, 2020 10:52 pm

alamahant wrote:Rsync is perfect.
I have an rsync invocation in my daily update script, before the emerge part.
So in case something goes wrong with the state of my machine post-update, I just revert.
I use this formula:

Code: Select all

rsync -aAXv --delete --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found","/home/<user>/shared/*","/home/<user>/ssd/*","/boot/efi/*"} / /mnt/
I mount an LVM partition at /mnt.
I love it.
:D
Have you tried using LVM snapshots and restoring from those? I tested it briefly in a VM once. It seemed to work, but I needed to refine the process for repeatability. More specifically, I think there was a minor issue with not reverting to the previous kernel.

For the issue specifically relating to this thread, rsync would be used for "whole disk" syncing within a single host. Although eventually I'll be updating from at least one other host to parts of the primary target disk, which would then do the whole disk sync.

Anon-E-moose wrote:I wouldn't worry about any problems related to rsync and longevity.
Other than human error, I'm not. But /path/dir/ vs /path/dir is really annoying. I'll have to come up with some means of testing any updates to avoid blowing away the entire disk. That seems more fragile than RAID.


@Banana:

Interesting, thanks. They seem to slightly discourage copying within the same host, but they do suggest fpsync, which is part of fpart, so that may be useful.

steve_v wrote:The main disadvantage I would see with rsync is that when a disk fails, anything accessing it falls on its face until redirected to one of the rsync mirrors somehow. With RAID that would be entirely transparent to applications.
At least initially, I don't think that will be a big issue. Other than the rsync mirroring within the host, and possibly some automated syncs from clients, I don't think that is going to be a big problem. In some ways, that may be better. If monitoring isn't working well enough, the disk failure could be missed for a longer period of time. Or at least I'd like to think I'd notice the lack of response. But that would only be the case if the primary disk failed. Hmm. I'll have to think about that some more as it relates to monitoring. Good reminder.
steve_v wrote:Immediate thought: Rsync is for backups and replication, RAID is for keeping things running until you can swap out failed hardware. Maybe I'm misconstruing your intent though.
If what you want is [network]replication and/or snapshots, rather than uptime and redundancy, rsync is certainly the more flexible solution.
If you want all of the above in one solution there's always ZFS, which I hereby shamelessly plug yet again because it's awesome.
Others disagree, but I consider RAID to be the first backup. A type of "hot" backup. If you have a single disk and it fails, well, enterprise backup solutions often have holes in them. In that situation, RAID doesn't protect against accidental deletions or any other live activity such as a virus. That's where other backups come into play. As in any situation, "what is the risk you are protecting against"?

In my case, service availability isn't the top priority. If a disk fails, then I need to fix that, or at least have the disk on the way (which is why I'm thinking of having the 2 mirror disks).

At least with my initial expectations, I have no plans of snapshots using rsync. I don't think rsync alone is the correct tool for the job, and I don't know that I care to try customizing a tool above it.

Eventually my plan is to test a system with ZFS (I prefer it), but I'm hesitant to deal with the kernel patching. I had originally wanted to do that using the disks I'm talking about in this thread, but various roadblocks keep getting in the way, so I'm just "getting it done" and will figure out what the next iteration looks like. I'm even considering one of the "minis" from iXsystems.


Thanks for the feedback!

Post by Anon-E-moose » Fri Aug 21, 2020 11:18 pm

pjp wrote:
Anon-E-moose wrote:I wouldn't worry about any problems related to rsync and longevity.
Other than human error, I'm not. But /path/dir/ vs /path/dir is really annoying. I'll have to come up with some means of testing any updates to avoid blowing away the entire disk. That seems more fragile than RAID.
Yeah, I agree about the whole [dir|dir/] thing, although I suppose it's the way it is because you can make it do one of two things depending on the trailing /


So what I do is whenever I'm not sure about what will happen, I use the "-n" dry-run flag, it will show you what rsync would do.

Post by pjp » Sat Aug 22, 2020 1:26 am

That it can is fine, but it reminds me of being able to rm -rf / without intentionally meaning to do so. I believe that's been updated to not allow that. I use -n as well, but its output isn't as blatantly obvious as I'd like. I just need to memorize the difference in some explicit manner, then pause to consider before each run. Also, I've rarely used rsync, so using it as a common tool is still pretty new to me and why I don't consider it a suitable replacement for scp. Unfortunately sftp isn't either.

Post by Anon-E-moose » Sat Aug 22, 2020 9:46 am

I find rsync less confusing (on what it's doing) if I use the -i flag

Code: Select all

$ head root-link.log 
.d..t...... ./
*deleting   etc/portage/patches/media-video/makemkv/notify_linux.patch.14.2
*deleting   etc/portage/patches/media-video/makemkv/configure.patch.14.6
*deleting   etc/portage/patches/media-video/mkvtoolnix/qt5-m4.patch.old
*deleting   etc/portage/patches/media-video/mkvtoolnix/qt5-configure.patch.old
*deleting   etc/portage/patches/media-video/mkvtoolnix/qt-disable-dbus.patch
*deleting   etc/portage/patches/media-video/mkvtoolnix/configure.patch.old
.d..t...... bin/
.d..t...... dev/
--itemize-changes, -i output a change-summary for all updates

Edit to add: trailing slash vs. none
A trailing slash says: copy everything under the directory the 1st arg points to.
No trailing slash says: copy the directory the 1st arg points to, including its name.

DirA
  File1
  File2

rsync -aix DirA/ nas::tmp/test

this would copy File1 and File2 to test but not DirA

rsync -aix DirA nas::tmp/test

this would copy DirA w/Files to test

and using -naix it would clearly show the above.
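
For anyone who wants to see the rule in action without touching real data, here is a minimal sketch. It assumes rsync is installed and runs entirely inside a temporary directory; with_slash and no_slash are made-up target names, and it copies locally rather than to the nas::tmp module used above.

```shell
#!/usr/bin/env bash
# Demo of rsync's trailing-slash rule in a throwaway temp directory.
# All paths here are invented for the demo.
set -eu
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed; skipping"; exit 0; }

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/DirA" "$work/with_slash" "$work/no_slash"
touch "$work/DirA/File1" "$work/DirA/File2"

# Trailing slash: the *contents* of DirA land directly in the target.
rsync -a "$work/DirA/" "$work/with_slash"

# No trailing slash: DirA itself is recreated inside the target.
rsync -a "$work/DirA" "$work/no_slash"

# Show what ended up where.
(cd "$work" && find with_slash no_slash | sort)
```

If it behaves as described, with_slash ends up holding File1 and File2 directly, while no_slash holds a DirA directory containing them.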
Last edited by Anon-E-moose on Sat Aug 22, 2020 10:51 am, edited 1 time in total.

Post by krinn » Sat Aug 22, 2020 9:47 am

I think you are mistaking RAID for backup.
Mirroring is not meant as a backup, but as protection against failure.
The basic idea is to have disk1, disk2 or disk3 available to repair the array so that it matches the other disks' content. This "might" be seen as a backup solution, but it's not a real backup solution.
If you overwrite file1, file1 will be overwritten on all disks, and there's no way you can recover file1; that's a backup task, not a RAID task.
However, unlike a backup, if you were working on file1 and someone shoots your computer, a backup will only help you recover the latest saved version of file1, while RAID will help restore it to the freshest version possible.

Or are you saying you use RAID on the backup source as a safeguard against backup errors?

Post by Naib » Sat Aug 22, 2020 10:56 am

krinn wrote:I think you are mistaking RAID for backup.
Mirroring is not meant as a backup, but as protection against failure.
The basic idea is to have disk1, disk2 or disk3 available to repair the array so that it matches the other disks' content. This "might" be seen as a backup solution, but it's not a real backup solution.
If you overwrite file1, file1 will be overwritten on all disks, and there's no way you can recover file1; that's a backup task, not a RAID task.
However, unlike a backup, if you were working on file1 and someone shoots your computer, a backup will only help you recover the latest saved version of file1, while RAID will help restore it to the freshest version possible.

Or are you saying you use RAID on the backup source as a safeguard against backup errors?
Bingo! I was about to post this. RAID is not a backup. RAID is a layer of data protection between backup cycles (or for speed).

So the question is ... what do you want: redundancy or backup? Rsync is great for backup, especially over ssh to back up a headless machine, for instance. For redundancy... sure, RAID or one of the fancier filesystems (ZFS, btrfs).
#define HelloWorld int
#define Int main()
#define Return printf
#define Print return
#include <stdio>
HelloWorld Int {
Return("Hello, world!\n");
Print 0;

Post by sitquietly » Sat Aug 22, 2020 5:50 pm

steve_v wrote:The main disadvantage I would see with rsync is that when a disk fails, anything accessing it falls on its face until redirected to one of the rsync mirrors somehow. With RAID that would be entirely transparent to applications ..... Rsync is for backups and replication, RAID is for keeping things running until you can swap out failed hardware ..... there's always ZFS, ... it's awesome.
I had once set up a system as the OP suggested, with the redundant disks being updated via rsync rather than being kept in sync via RAID. It was very flexible. But I consider an on-machine backup to be no backup at all and always back everything up to other hosts (i.e. backup servers). So given that a network backup must always be kept (everything on my work computers must be backed up elsewhere), the rsync'ed drive didn't take advantage of its potential.

ZFS was the perfect use for multiple data disks! Now I keep all of my working stuff on a zfs mirror and the backup on another computer via rsync. I've been through several OS changes (FreeBSD -> Debian -> Gentoo/Calculate) and my files have been always available and always safe. I also always keep the OS on its own ssd in a hot swap bay. Operating Systems change all the time but my data lives forever! A workstation may not need raid for the OS -- if that disk dies a complete re-install and restore from backup would take very little time. I've never actually had an ssd die. I've got an Intel X25-E 32 gb SLC ssd from 2009 still in service. It has seen a lot of throughput in the past decade. And a SanDisk Extreme 120 gb ssd from 2012. And Samsung and Crucial ssd's from 2014 to 2019. I try to kill them compiling software for Gentoo, FreeBSD, and OpenBSD but they have kept working.

Post by pjp » Sat Aug 22, 2020 10:49 pm

@Anon-E-moose:

Thanks for the mention of itemize. I think I'd seen that before but had forgotten about it.

I think the major risk is with tab completion, which adds the trailing /.

So /data/adir to /newplace/bdir/ would work as expected (bdir/adir), but an accidental adir/ could create a mess, especially if using --delete.

I just need to be careful and specific about what I intend to do. The time I mentioned wiping out a bunch of data was before I realized that behavior. Fortunately nothing significant was lost.
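
One possible way to guard against that kind of accident is to let a dry run count the deletions first and refuse to continue past a limit. This is only a sketch: count_deletions and guarded_sync are hypothetical helper names, the default limit of 10 is arbitrary, and it relies on the "*deleting" lines that -i prints. rsync also has a built-in --max-delete=NUM option, which caps the number of deletions rather than refusing the whole run.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: refuse a --delete sync if the dry run wants to remove too much.
set -eu

# count_deletions SRC DEST -> prints how many entries a real run would delete.
count_deletions() {
    # -n = dry run, -i = itemize; deletions are listed as "*deleting   path".
    # grep -c exits non-zero on a count of 0, hence the "|| true".
    rsync -ain --delete "$1" "$2" | grep -c '^\*deleting' || true
}

# guarded_sync SRC DEST [MAX_DELETIONS]
guarded_sync() {
    local src="$1" dest="$2" max="${3:-10}" n
    n=$(count_deletions "$src" "$dest")
    if [ "$n" -gt "$max" ]; then
        echo "refusing: dry run would delete $n entries (limit $max)" >&2
        return 1
    fi
    rsync -a --delete "$src" "$dest"
}
```

Usage would look something like `guarded_sync /data/adir /newplace/bdir/ 25`. A mistyped trailing slash that suddenly wants to delete hundreds of entries then stops before anything is touched.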


I completed temporary copies to two different drives within the same system. The downside is that it was very slow. The second copy summary is as follows:

Code: Select all

sent 624.67G bytes  received 30.10M bytes  19.64M bytes/sec
total size is 707.03G  speedup is 1.13
Unfortunately I don't think I have any crossover cables, so I'm not looking forward to seeing how slow it goes over the network.

Post by pjp » Sun Aug 23, 2020 12:00 am

krinn wrote:I think you are mistaking RAID for backup.
Mirroring is not meant as a backup, but as protection against failure.
The basic idea is to have disk1, disk2 or disk3 available to repair the array so that it matches the other disks' content. This "might" be seen as a backup solution, but it's not a real backup solution.
If you overwrite file1, file1 will be overwritten on all disks, and there's no way you can recover file1; that's a backup task, not a RAID task.
However, unlike a backup, if you were working on file1 and someone shoots your computer, a backup will only help you recover the latest saved version of file1, while RAID will help restore it to the freshest version possible.

Or are you saying you use RAID on the backup source as a safeguard against backup errors?
Naib wrote:Bingo! I was about to post this. RAID is not a backup. RAID is a layer of data protection between backup cycles (or for speed).

So the question is ... what do you want: redundancy or backup? Rsync is great for backup, especially over ssh to back up a headless machine, for instance. For redundancy... sure, RAID or one of the fancier filesystems (ZFS, btrfs).


As it relates to the thread title, the distinction was between redundant copies across two drives. One using RAID, the other using rsync. From an earlier reply:
pjp wrote:Others disagree, but I consider RAID to be the first backup. A type of "hot" backup. If you have a single disk and it fails, well, enterprise backup solutions often have holes in them. In that situation, RAID doesn't protect against accidental deletions or any other live activity such as a virus. That's where other backups come into play. As in any situation, "what is the risk you are protecting against"?
To expand on that, some people don't consider a backup to be sufficient if it is: a) in the same machine, b) at the same site or c) within the same region.

For points b) and c), the issue is of risk and cost. That's a value choice. However, while important, offsite redundant copies of (some) data are stale, likely by at least hours if not a day or more.

A backup in the same machine absolutely is a valid backup. The most obvious example would be snapshots. These are very much useful in the case of people accidentally intentionally deleting a file. It is typically recoverable in the fastest possible time. Some people may not consider that a backup, but it demonstrably is. In fact, given the possibility of open files, offsite backups are often incomplete.

I think the "RAID is not a backup" perspective is one of human timescale (nanosecond demonstration). Also related, but not specifically about backups: "What about the cost of that information? The cost of collecting data and information at the time of an event is very low. But the further you get away from it in time, the more it's costing you to store it and maintain it."

If you lose your only physical copy of those snapshots, well, your offsite copy of data on tape from last night isn't much help. So maybe instead of a backup, that should be called an incomplete offsite copy of data that might not contain what you wanted.

Post by Goverp » Sun Aug 23, 2020 8:51 am

pjp wrote:...
Unfortunately I don't think I have any crossover cables, so I'm not looking forward to seeing how slow it goes over the network.
AFAIR you probably don't need crossover cables; if you have modern ethernet cards at either end, they'll sort the cable out themselves.
Greybeard

Post by Naib » Sun Aug 23, 2020 9:45 am

pjp wrote:
krinn wrote:I think you are mistaking RAID for backup.
Mirroring is not meant as a backup, but as protection against failure.
The basic idea is to have disk1, disk2 or disk3 available to repair the array so that it matches the other disks' content. This "might" be seen as a backup solution, but it's not a real backup solution.
If you overwrite file1, file1 will be overwritten on all disks, and there's no way you can recover file1; that's a backup task, not a RAID task.
However, unlike a backup, if you were working on file1 and someone shoots your computer, a backup will only help you recover the latest saved version of file1, while RAID will help restore it to the freshest version possible.

Or are you saying you use RAID on the backup source as a safeguard against backup errors?
Naib wrote:Bingo! I was about to post this. RAID is not a backup. RAID is a layer of data protection between backup cycles (or for speed).

So the question is ... what do you want: redundancy or backup? Rsync is great for backup, especially over ssh to back up a headless machine, for instance. For redundancy... sure, RAID or one of the fancier filesystems (ZFS, btrfs).


As it relates to the thread title, the distinction was between redundant copies across two drives. One using RAID, the other using rsync. From an earlier reply:
pjp wrote:Others disagree, but I consider RAID to be the first backup. A type of "hot" backup. If you have a single disk and it fails, well, enterprise backup solutions often have holes in them. In that situation, RAID doesn't protect against accidental deletions or any other live activity such as a virus. That's where other backups come into play. As in any situation, "what is the risk you are protecting against"?
To expand on that, some people don't consider a backup to be sufficient if it is: a) in the same machine, b) at the same site or c) within the same region.

For points b) and c), the issue is of risk and cost. That's a value choice. However, while important, offsite redundant copies of (some) data are stale, likely by at least hours if not a day or more.
I totally agree (and I expected you to get it ;) ). It's just the usual google-fu: someone comes across something that implies RAID is synonymous with backup :)
Offsite backup is typically for insurance reasons (we have a separate "brick building" requirement for weekly tapes).

I also use a 2nd drive as a local copy/archive drive, and yes, it is valid. So the real question is: do you want the expense (space, "complexity") of RAID, or do you want the delay of rsync?

1) RAID-1 (mirroring). Simple to set up, but it is a 1:1 copy of the entire drive. Do you want that, considering an OS can be rebuilt but data can't? It is almost instantaneous, and reads can be spread across both disks (up to double the read throughput, depending on workload and implementation).
2) rsync. Selectively targets the data, but it must be executed and could take some time, thus delaying shutdown.
3) lsyncd. A daemon that uses inotify to sync one or more target directories from one place to another.

Post by krinn » Sun Aug 23, 2020 11:41 am

Naib wrote:3) lsyncd. A daemon that uses inotify to sync a target/s directory from one place to another
Which looks like a good idea, but in the end it isn't.
inotify reports every modification, so a deleted, broken or borked file gets synced to the backup too (ouch), just like mirroring, but without mirroring's speed.

To me, the real solution is both: mirror the data with RAID to protect it from local damage, and rsync that data to another place.
The RAID part needs nothing; it just works.
The rsync will be slow only the first time; after that it only copies the changes, which is fast (well, relatively, of course).
And the shutdown delay can be managed easily through a script, i.e. touch a file that the script checks to decide whether it should shut down after the rsync or not.
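
A rough sketch of that flag-file idea, assuming a bash-like shell. Everything named here is a placeholder: the flag path /run/shutdown-after-sync, the SHUTDOWN_CMD variable, and the helper sync_then_maybe_shutdown are invented for the example and would need adapting.

```shell
#!/usr/bin/env bash
# Sketch: sync first, then power off only if a flag file asks for it.
# FLAG and SHUTDOWN_CMD are placeholder settings for this example.
set -eu

FLAG=${FLAG:-/run/shutdown-after-sync}
SHUTDOWN_CMD=${SHUTDOWN_CMD:-"shutdown -h now"}

# sync_then_maybe_shutdown SRC DEST
sync_then_maybe_shutdown() {
    local src="$1" dest="$2"
    rsync -a --delete "$src" "$dest"
    if [ -e "$FLAG" ]; then
        # Remove the flag so the next boot does not shut down again.
        rm -f "$FLAG"
        # Intentionally unquoted so the command and its arguments are split.
        $SHUTDOWN_CMD
    fi
}
```

With this in place, `touch /run/shutdown-after-sync` before walking away would mean "power off when the sync finishes"; without the flag, the machine stays up.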

Post by szatox » Sun Aug 23, 2020 11:52 am

Goverp wrote:AFAIR you probably don't need crossover cables; if you have modern ethernet cards at either end, they'll sort the cable out themselves.
AFAIR modern ethernet devices (as in 1 Gbps and faster) use all 4 pairs, in both directions, at the same time.
Crossover cables belong in the past.
(Some 100 Mbps devices, notably from Intel, would randomly switch the port's mode between straight/crossed until they could make sense of the noise on the wire.)


The main box I set up almost a decade ago uses LVM on top of RAID1 as the main storage, and copies the bits of data I'm particularly interested in to another disk with rsync.
Rsync has a really cool feature: it can reference another directory. In this mode you can copy modified files and hard-link unmodified files, effectively providing an incremental backup (it's just file-level, but it's still an amazing space-saver).
Scripted with a weekly rotation, I can always have 2 independent sets (sometimes called "cylinders") of backups (weekly full + daily incremental), so I wouldn't lose the only copy in case of a bad block in a rarely modified file. (By doing a weekly full, I always have at least 2 copies of those.)
And I'm quite happy with this setup.

Also, I can remove any of the backups at any time without breaking all the other backups - because the hardlinked files remain accessible via the other paths, so the whole thing is very easy to manage. Just delete what you don't need anymore, and the space will be reclaimed once nothing references it anymore.
And since rsync can transparently run over network, it's very easy to deploy something like that across multiple machines... Including a dedicated backup server.
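
The "reference another directory" mode described above matches rsync's --link-dest option. A minimal sketch in a throwaway temp directory, assuming rsync is installed and the filesystem supports hard links; the day1/day2 snapshot names and the file names are made up for the demo.

```shell
#!/usr/bin/env bash
# Hard-linked snapshots with rsync --link-dest, in a throwaway temp directory.
set -eu
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed; skipping"; exit 0; }

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/data" "$work/backups"
echo "rarely changes" > "$work/data/static.txt"
echo "version 1"      > "$work/data/notes.txt"

# Day 1: a full copy.
rsync -a "$work/data/" "$work/backups/day1/"

# Day 2: one file changes (different size, so rsync's quick check notices).
echo "version 2, edited" > "$work/data/notes.txt"

# Unchanged files become hard links into day1; changed files are copied fresh.
# An absolute path is used because a relative one is taken relative to the destination.
rsync -a --link-dest="$work/backups/day1" "$work/data/" "$work/backups/day2/"

# -ef is true when both names refer to the same inode.
[ "$work/backups/day1/static.txt" -ef "$work/backups/day2/static.txt" ] && echo "static.txt: shared"
[ "$work/backups/day1/notes.txt"  -ef "$work/backups/day2/notes.txt"  ] || echo "notes.txt: separate copy"

# Deleting the older snapshot leaves day2 fully intact.
rm -rf "$work/backups/day1"
cat "$work/backups/day2/static.txt"
```

The last two commands illustrate the point about removing any backup at any time: the data stays reachable through day2 until nothing links to it anymore.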

Post by Naib » Sun Aug 23, 2020 1:01 pm

krinn wrote:
Naib wrote:3) lsyncd. A daemon that uses inotify to sync a target/s directory from one place to another
Which looks like a good idea, but in the end it isn't.
inotify reports every modification, so a deleted, broken or borked file gets synced to the backup too (ouch), just like mirroring, but without mirroring's speed.

To me, the real solution is both: mirror the data with RAID to protect it from local damage, and rsync that data to another place.
The RAID part needs nothing; it just works.
The rsync will be slow only the first time; after that it only copies the changes, which is fast (well, relatively, of course).
And the shutdown delay can be managed easily through a script, i.e. touch a file that the script checks to decide whether it should shut down after the rsync or not.
True.
The rsync could also be a cron job every 6 hours, so it is little and often, with a local shutdown service for a final sync.

Post by steve_v » Sun Aug 23, 2020 2:35 pm

sitquietly wrote:ZFS was the perfect use for multiple data disks! Now I keep all of my working stuff on a zfs mirror and the backup on another computer via rsync.
Not ZFS send/receive? :P

TBF, I too use rsync extensively, though in the other direction - rsync backups from several machines to a ZFS (RAIDZ6) fileserver, which snapshots and sends the filesystem (off-site) nightly.
Important live data on the fileserver gets snapshotted every 15 minutes locally for those "whoops" moments, and snapshots are presented via "previous versions" to windoze boxen via samba.

Yes, those frequent snaps waste considerable space. But they also save considerable bacon. :D
If the building burns down, a day-old backup is just fine.

Post by sitquietly » Sun Aug 23, 2020 8:49 pm

steve_v wrote:
sitquietly wrote:ZFS was the perfect use for multiple data disks! Now I keep all of my working stuff on a zfs mirror and the backup on another computer via rsync.
Not ZFS send/receive? ..... I too use rsync extensively, though in the other direction - rsync backups from several machines to a ZFS (RAIDZ6) fileserver...
You do it the right way. The backup server here needs to be upgraded to a zfs mirror ... soon. :)

Post by pjp » Sun Aug 23, 2020 10:38 pm

Goverp wrote:
pjp wrote:...
Unfortunately I don't think I have any crossover cables, so I'm not looking forward to seeing how slow it goes over the network.
AFAIR you probably don't need crossover cables; if you have modern ethernet cards at either end, they'll sort the cable out themselves.
I had thought that, but couldn't recall the specifics and if it included consumer devices. I also couldn't recall if there were any power / damage concerns. I don't think there are, but the most recent occasion when I was wrong, I lost the original of the 3 4TB drives related to this thread. But, other than PoE, I can't recall any power issues between devices.

Naib wrote:just the usual google-foo and someone comes across something that implied RAID is synonymous with backup.
No more or no less than someone who doesn't know what they are doing being mistaken about other "backup" solutions that aren't what they expected. To be more explicit, yes, I do consider RAID an actual form of backup. And as with every other solution I have encountered, they all have areas they do not protect against. Don't forget the duct tape and baling wire to assemble a solution for your needs.

Naib wrote:So the real question is do you want the expense (space, "complexity") of RAID or do you want the delay of rsync
Not having considered the potential read performance you mentioned from RAID mirroring, I had mostly chosen the rsync option, short of someone prompting an "oh, yeah, I should go with RAID mirroring as opposed to rsync mirroring" moment. The $ cost is not the factor, as I had already decided to dedicate 1 or 2 disks to mirroring.

Cost of complexity was the deciding factor (barring the "oh, yeah" moment). Dealing with RAID failures in an enterprise environment has its own concerns, but I don't have that hardware environment at home, so I'm much less comfortable relying on the "consumer" equivalent.

So at least until I get a ZFS server going, I'll be relying on the rsync equivalent of RAID1. I may use the 3rd drive for rsync snapshots, though I'm not sure if you can package only the difference between points A and B onto C. Worst case ought to be obtaining the difference from an rsync dry run.
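
rsync's --compare-dest option may cover the "only the difference" case: source files that are identical to the copy in the compare directory are skipped, so only new or changed files land in the destination. A minimal sketch in a temp directory, assuming rsync is installed; here A stands for the current data, B for the older mirror and C for the drive collecting the difference (all made-up names). Note that this does not record files that were deleted from A.

```shell
#!/usr/bin/env bash
# Collect only what differs between A and B into C, using --compare-dest.
set -eu
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed; skipping"; exit 0; }

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/A" "$work/B" "$work/C"

echo "same" > "$work/A/unchanged.txt"
echo "old"  > "$work/A/changed.txt"
rsync -a "$work/A/" "$work/B/"              # B starts as a mirror of A

echo "newer text" > "$work/A/changed.txt"   # modify one file (size differs)
echo "brand new"  > "$work/A/added.txt"     # and add another

# Files in A that already match B are skipped; everything else lands in C.
# An absolute path is used because a relative one is taken relative to the destination.
rsync -a --compare-dest="$work/B" "$work/A/" "$work/C/"

ls "$work/C"
```

In this demo C should end up with changed.txt and added.txt only, which is roughly the "difference between two points" that could then be kept on the third drive.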

Now I need to go NIC shopping.

Naib wrote:The rsync could also be a 6h cronjob so it is little and often with a shutdown local service for a final sync
This gets into the finer points of implementation. Overnight makes the most sense for the daily sync. But I think I'm going to aim for some degree of hourly and maybe a 10-15 minute interval, somewhat like an application's autosave feature. I'll probably start with the hourly/nMinute snapshots from a workstation onto a dedicated disk within the workstation. Then have the nightly transfer to the system hosting the disks which are the subject of this thread. That's the starting point goal anyway. I suspect it will need to be tweaked.
Quis separabit? Quo animo?
pjp
Administrator
Posts: 20668
Joined: Tue Apr 16, 2002 10:35 pm

Re: RAID vs rsync. Your preferences, experiences?

Post by pjp » Sun Aug 23, 2020 10:45 pm

sitquietly wrote:
steve_v wrote:
sitquietly wrote:ZFS was the perfect use for multiple data disks! Now I keep all of my working stuff on a zfs mirror and the backup on another computer via rsync.
Not ZFS send/receive? ..... I too use rsync extensively, though in the other direction - rsync backups from several machines to a ZFS (RAIDZ6) fileserver...
You do it the right way. The backup server here needs to be upgraded to a zfs mirror ... soon. :)
Well, yeah. Maybe.

Once I get it stabilized, then I need to get back to making generic binaries for everything (laptop, workstation, backup "server" ). I'm only doing that for my laptop now, and it is only at an 80% solution with the remaining 20% resulting in me putting off upgrades.
Quis separabit? Quo animo?
krinn
Watchman
Posts: 7476
Joined: Fri May 02, 2003 6:14 am

Post by krinn » Mon Aug 24, 2020 4:38 am

pjp wrote:aim for some degree of hourly and maybe 10 - 15 minute interval, somewhat like an applications autosave feature.
Keep in mind that what matters most isn't how often you can save the current state of a file, but whether you can recover it!
If your backup interval is too short and a file gets damaged, the time you have to NOTICE the damage and recover the file is only the gap between backups!
For incremental backups (which cost a lot of space) that would work; for mirroring, it means you only have those 10-15 minutes to notice that the file has been damaged... past that delay, the damaged file gets synced and you are dead.
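
One possible middle ground between a plain mirror and full incrementals, sketched here under assumptions: rsync's `--backup` and `--backup-dir` options move whatever a sync would overwrite or delete into a separate directory instead of discarding it. The function name and the three directory arguments are hypothetical.

```shell
#!/bin/sh
# mirror_keep_old SRC DST OLD
# Mirror SRC to DST, but move any file that would be overwritten or deleted
# into OLD/<timestamp>/ first. OLD should be an absolute path outside DST,
# so that --delete never touches it.
mirror_keep_old() {
    src=$1; dst=$2; old=$3
    stamp=$(date +%Y%m%d-%H%M%S)
    rsync -a --delete --backup --backup-dir="$old/$stamp" "$src/" "$dst/"
}
```

With this, a damaged file that gets synced still replaces the good copy in the mirror, but the previous version survives under the dated directory until it is cleaned up.
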
pjp
Administrator
Posts: 20668
Joined: Tue Apr 16, 2002 10:35 pm

Post by pjp » Mon Aug 24, 2020 5:39 am

That's true of every backup solution, and I've yet to see any solution verify that a file isn't damaged. If it is damaged at the source and backed up damaged, well it isn't going to get fixed. I've seen that happen when users request a file, and when we finally find an undamaged version, it is older than the version they had hoped to retrieve.

My goal is to reasonably minimize the gap between daily backups. Some days I might not change much, other days I'd prefer to not lose some of the work. Anything will be an improvement, since it isn't currently being done.

The only way I'll do 15 minute snapshots is if I can skip performing them if there are no changes. And I don't know if that is possible with rsync. If I change a file early in the day and it is caught by an "rsync snapshot", I'm thinking that same change will be caught every time a snapshot is attempted after that.
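
A rough sketch of how skipping might work, assuming timestamp-named snapshot directories and rsync's `--link-dest` option; the function name and layout are made up for illustration. A dry run against the newest snapshot decides whether anything changed, and `--link-dest` hard-links unchanged files, so a change already captured by the previous snapshot is not copied again.

```shell
#!/bin/sh
# snapshot_if_changed SRC SNAPROOT
# Creates SNAPROOT/<timestamp>/ only when SRC differs from the newest snapshot.
snapshot_if_changed() {
    src=$1
    root=$(cd "$2" && pwd) || return 1

    # Globs expand in sorted order, so the last match is the newest snapshot.
    last=
    for d in "$root"/*/; do
        [ -d "$d" ] && last=$d
    done

    if [ -n "$last" ]; then
        # Dry run (-n) with itemized output (-i): empty output means no changes.
        changes=$(rsync -a -n -i --delete "$src/" "$last") || return 1
        if [ -z "$changes" ]; then
            return 0
        fi
    fi

    new="$root/$(date +%Y%m%d-%H%M%S)"
    if [ -n "$last" ]; then
        # Unchanged files become hard links into the previous snapshot.
        rsync -a --delete --link-dest="$last" "$src/" "$new/"
    else
        rsync -a "$src/" "$new/"
    fi
}
```

Something like `snapshot_if_changed "$HOME/work" /mnt/snapdisk/interim` could then be run from cron every 15 minutes. rsync's default comparison looks at size and modification time only, so this detects ordinary edits, not silent corruption.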

As for space, those would all be temporary, and I'd probably skip certain files or directories. When the daily snapshot is completed, the 15min/hourly (or whatever increment) snapshots would be cleaned up, depending on available space.
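
The cleanup step could be as small as the following sketch, which assumes the interim snapshots are directories whose names sort chronologically (for example timestamps). The function name and the "keep the newest N" policy are assumptions for illustration.

```shell
#!/bin/sh
# prune_interim SNAPROOT KEEP
# Deletes all but the newest KEEP snapshot directories under SNAPROOT.
prune_interim() {
    root=$1; keep=$2
    # Globs expand in sorted order, so the oldest snapshot comes first.
    set -- "$root"/*/
    [ -d "$1" ] || return 0          # no snapshots at all
    while [ $# -gt "$keep" ]; do
        rm -rf -- "$1"
        shift
    done
}
```

Calling `prune_interim /mnt/snapdisk/interim 0` after a successful daily sync would remove every interim snapshot; a larger KEEP value leaves the most recent ones in place.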

As far as progress goes, I have the two local copies of the data as previously mentioned, and have tested ~10% to the backup host. I'm about to kick off the full sync to that host and hopefully it will be done in the morning.
Quis separabit? Quo animo?