Gentoo Forums
Best FS for data integrity?

steveL
Watchman

Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

Posted: Wed Mar 18, 2015 5:15 pm

The_Great_Sephiroth wrote:
Steve, this is a personal backup of my laptop.
..I simply want the most secure FS for a single disk.

This recent debian-devel discussion seems on point to me, especially when you read the upstream kernel bug which indicates it applies to all the new modern FS.

Given the latter I'm sure people will change their application-code (or more likely the libraries in use will.) I just find it somewhat disturbing, and it makes me glad I'm still using ext3 (via the ext4 module so it's relatively-current code.)

ISTR reiser was pretty good at journalling, back in the day, before true bit-rot meant ext3 was much more robust, so I find it odd that things are less reliable all these years later.
Quote:
I am having a huge problem figuring out how to do incremental backups in Linux.

I'd ask in #bash on IRC: chat.freenode.net if I were you; that's where all the serious sysadmins are, ime.

(I don't think it's off-topic, since this is your thread, asking for support on setting up backups for your machine.)

I know there are plenty of options, but I'm not an admin, and not current with what's going on. I tend to just use rsync (over ssh if to/from another box), and I don't usually do incremental at all, just snapshot effectively, or mirroring of git repos which are also "backed up" elsewhere by virtue of being shared via a lovely VPS hosted by Patrick.

The VPS is itself ofc backed up offsite, thanks to infra (Griz and Monkeh). I'm fairly happy about the backup of important things; in this context things we collaborate on. Like I said, every git clone is a mirror, and every single one has had the objects verified as a matter of routine. (Note this means the server's object store is verified on every client pull, by the client. Sweet :)
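
To illustrate why a clone can verify the objects it receives: git names each object by a hash of its content, so recomputing the hash detects any change. The following is a minimal sketch, assuming the classic SHA-1 object format and covering blobs only; the function names are hypothetical and this is not how git itself is implemented.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID that git assigns to a blob with this content."""
    # git hashes a short header ("blob <size>" plus a NUL byte) followed by the data.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def verify_blob(content: bytes, expected_id: str) -> bool:
    """True if the content still hashes to the ID it was stored under."""
    return git_blob_id(content) == expected_id


if __name__ == "__main__":
    original = b"hello\n"
    object_id = git_blob_id(original)
    print(object_id)
    print(verify_blob(original, object_id))    # unchanged content
    print(verify_blob(b"hellp\n", object_id))  # one corrupted byte
```

The same idea extends to trees and commits, which refer to other objects by ID; that is what makes the history a hash chain.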
When it comes to a system, it cannot be disputed that reproducible builds from upstream sources are achievable; that's how we installed the machines we're all using, from stages painstakingly bootstrapped from sources by releng. This is exactly the same as the UNIX tradition of the 1970s.

There is an rsnapshot tool that, istr, is built on top of rsync (a separate package rather than part of rsync itself), as well as all sorts of differential things, eg for binaries.
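
One common way to get incremental snapshots out of plain rsync is its --link-dest option, which hard-links unchanged files against a previous snapshot so that only changed files take new space. The sketch below only builds the command line; the paths, the dated directory layout and the function name are assumptions for illustration, not anything described in this thread.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import List, Optional


def snapshot_command(source: str, backup_root: str, today: date,
                     previous: Optional[str]) -> List[str]:
    """Build an rsync command that writes a dated snapshot directory.

    `previous` is the name of an earlier snapshot directory under backup_root
    (hypothetical layout: one directory per day, named YYYY-MM-DD).
    """
    root = PurePosixPath(backup_root)
    cmd = ["rsync", "-a", "--delete"]
    if previous is not None:
        # Unchanged files become hard links into the previous snapshot.
        cmd.append(f"--link-dest={root / previous}")
    # Trailing slash on the source copies its contents, not the directory itself.
    cmd += [source.rstrip("/") + "/", str(root / today.isoformat()) + "/"]
    return cmd


if __name__ == "__main__":
    print(" ".join(snapshot_command("/home/user", "/mnt/backup",
                                    date(2015, 3, 26), "2015-03-25")))
```

Passing the resulting list to subprocess.run would perform the backup; building the list first keeps it easy to inspect or log from a cron job.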
Let's see what others say.

John R. Graham
Administrator

Joined: 08 Mar 2005
Posts: 10589
Location: Somewhere over Atlanta, Georgia

Posted: Wed Mar 18, 2015 5:52 pm

steveL wrote:
...
You seem to have missed these parts:
...
Dismissed, not missed. ;) Although I agree with your comparisons with tar and rsync, it doesn't really affect the validity of my argument: the overwhelming majority of data that is read from a typical Linux volume doesn't come from tarballs or rsync servers either. Fact is, I also agree with you that Git is a more comprehensive variant of what we've already been doing for the last 40 years; but for the exact same reasons, ZFS is, too, and it covers more data than Git: by a reasonable definition of the term, it appears to be more general.

Also, I find it curious that you think that if such protections are pushed down a layer, they're the best thing since sliced bread, but if they're pushed down two layers (which is where the ZFS protections currently reside), they're superfluous. I understand that there could be value to eventually pulling them up a layer as they would then automatically benefit all filesystems, but I would contend that they're in exactly the right place for now so that they can mature without upsetting the delicate sensibilities of the kernel developers.

Regarding all of the mathematical contortions, parity is about the simplest error correction syndrome there is and it calculates really fast. Are you contending that modern (or even semi-modern) multi-core CPUs wouldn't be able to keep the I/O channels saturated because of the error correction syndrome calculation or perhaps that the calculation overhead would unduly slow down other things? If so, then my intuition is that you're incorrect, but let me do some benchmarking. That overhead is only incurred on write, though. For completeness, I'll note that the 2nd syndrome used on RAID 6 (and, I think, the analogous RAID-Z2) is more complex.
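
As a rough illustration of how simple single parity is, here is a sketch of XOR parity over equal-sized blocks, including rebuilding one lost block from the survivors. The block contents and the helper name are made up for the example; a real RAID layer works on stripes of disk sectors and uses wide machine words or SIMD rather than Python bytes.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]  # four data blocks
    parity = xor_blocks(data)                    # the fifth, parity block

    # Pretend the third data block is lost: XOR of everything else restores it.
    survivors = data[:2] + data[3:] + [parity]
    rebuilt = xor_blocks(survivors)
    print(rebuilt == data[2])
```

The second syndrome used by RAID 6 is not a plain XOR and is not covered by this sketch.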

- John
_________________
I can confirm that I have received between 0 and 499 National Security Letters.

py-ro
Veteran

Joined: 24 Sep 2002
Posts: 1734
Location: Velbert

Posted: Wed Mar 18, 2015 6:13 pm

Just some Numbers thrown in.

I ran a btrfs scrub yesterday.

Processor was a Phenom II X6 1090T.

The array delivered around 850 MB/s, which is all the PCIe x4 controller could deliver.

This occupied half of the cores, and this processor has no useful extension for calculating the checksums. With a more modern processor and a normal workload there shouldn't be any problem.
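
For readers unfamiliar with what a scrub does: it rereads every block, recomputes its checksum and compares it with the stored one. The sketch below shows that loop in miniature. It is only an illustration: btrfs defaults to CRC32C, whereas zlib.crc32 used here is a different CRC-32 polynomial standing in for it, and the 4096-byte block size and function names are assumptions.

```python
import zlib
from typing import List

BLOCK_SIZE = 4096  # assumed block size for the illustration


def checksum_blocks(data: bytes) -> List[int]:
    """Compute one CRC per fixed-size block (what gets stored at write time)."""
    return [zlib.crc32(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)]


def scrub(data: bytes, stored: List[int]) -> List[int]:
    """Return the indices of blocks whose checksum no longer matches."""
    current = checksum_blocks(data)
    return [index for index, (now, then) in enumerate(zip(current, stored))
            if now != then]


if __name__ == "__main__":
    volume = bytes(3 * BLOCK_SIZE)      # three zero-filled blocks
    stored = checksum_blocks(volume)

    damaged = bytearray(volume)
    damaged[5000] ^= 0x01               # flip one bit inside the second block
    print(scrub(bytes(damaged), stored))
```

The per-block work is a single pass over the data, which is why the cost depends so much on whether the CPU has an instruction that accelerates the checksum.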

Bye
Py

steveL
Watchman

Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

Posted: Sat Mar 21, 2015 12:09 pm

John R. Graham wrote:
Dismissed, not missed. ;)

Ah and now we're back to your weirdness.

FTR "Ignored" != "Dismissed"; it just means you're being smarmy now, instead of actually dealing with the substantive as you should have done at the beginning. I note in passing that you haven't addressed the substantive points you said you would in the other thread, either, which were ofc in other threads altogether.

The fact that you're deliberately twisting "down" to mean "up" just confirms it for me.

Good luck with that.

John R. Graham
Administrator

Joined: 08 Mar 2005
Posts: 10589
Location: Somewhere over Atlanta, Georgia

Posted: Sat Mar 21, 2015 5:16 pm

Edit: Upon further reflection, I've reworded my response somewhat, regarding the applicability of the word, "dismissed".

First of all, I don't really see how that little play on words is categorically different from your suggestion that I might be one of those fellows that insists on wandering around utilizing three different methods of holding up my pants. Friendly banter, both of them, in my opinion. Especially since I did attempt to address your points: you were presenting three different cases that all share a common attribute and I chose one exemplar for comparison purposes, just as I have said "ZFS" when I meant "ZFS or BTRFS". This is pretty standard. But, since they shared this common attribute, they were answered by a common argument. Thus, to say that I dismissed the other two is, I think, accurate.

Second, what I thought you meant by saying that the solution I was espousing wasn't generic enough was that its proper implementation would be at the VFS layer (a higher layer of the OS), where it could universally benefit all filesystems, as opposed to the filesystem layer (a lower layer of the OS) where it is currently implemented, only benefiting users of ZFS (or BTRFS). In that context, I'm sure you realize that my use of "up" and "down" was correct (deliberate, too, but that's not important right now). As I've apparently misunderstood what you meant, I hope you'll explain.

Regarding the parity calculation benchmark, I've finished conceptualizing it, although implementation will take a little longer as I haven't done a serious amount of x86 assembler since the mid-90's. Would it surprise you to know that, using a 5-volume RAID 5 array as an example, including loop overhead, calculating the error correction syndrome uses less than 1.17 machine instructions per 64 bits of I/O channel traffic? The loop where the syndrome is calculated will only contain 6 machine instructions and each iteration of the loop will process 320 bits of I/O channel traffic. Additionally, the SSE and AVX instruction set extensions appear to offer even higher performance than that, but I haven't finished learning about them.
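
The arithmetic behind those figures can be written out as follows. This is a sketch under stated assumptions: the 320 bits are taken to be five 64-bit words per loop iteration (four data words plus one parity word), and the helper name is hypothetical; the actual benchmark was planned in x86 assembler and is not shown in the thread.

```python
from fractions import Fraction

WORD_BITS = 64          # one machine word per volume per iteration
VOLUMES = 5             # 5-volume RAID 5: assumed 4 data words + 1 parity word
LOOP_INSTRUCTIONS = 6   # instructions in the parity loop, as quoted in the post

bits_per_iteration = VOLUMES * WORD_BITS
words_per_iteration = bits_per_iteration // WORD_BITS
instructions_per_word = Fraction(LOOP_INSTRUCTIONS, words_per_iteration)


def parity_word(data_words):
    """XOR the data words of one stripe row into a single 64-bit parity word."""
    parity = 0
    for word in data_words:
        parity ^= word
    return parity & (2 ** WORD_BITS - 1)


if __name__ == "__main__":
    print(bits_per_iteration)            # bits of I/O traffic per loop iteration
    print(float(instructions_per_word))  # instructions per 64 bits of traffic
    print(hex(parity_word([0x1, 0x2, 0x4, 0x8])))
```

Taken at face value, 6 instructions per 320 bits comes to 1.2 instructions per 64 bits, a little above the 1.17 mentioned in the post; either way it is on the order of one instruction per word of traffic.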

- John

steveL
Watchman

Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

Posted: Mon Mar 23, 2015 3:50 pm

If you had dismissed my statements earlier, you would have had to refute them first; you made no attempt til the previous post.

As to the substantive, I've lost any sense of motivation to expound further; I don't feel I need to in any case, since krinn got the point immediately. Thus afaic any fellow practitioner has enough info to see what I mean.

In the spirit of the forums, I'll simply add: intrusion detection, wrt "git only saves text" (which is ofc untrue.)

John R. Graham
Administrator

Joined: 08 Mar 2005
Posts: 10589
Location: Somewhere over Atlanta, Georgia

Posted: Mon Mar 23, 2015 4:50 pm

Regarding "dismissed", I don't think that word means what you think it means.

Regarding Git, of course it's untrue, but you were arguing that the human element was important, and one of the reasons why the extra robustness was superfluous at the filesystem level:
steveL wrote:
You are checking in source-code, which you have just built or used, to verify your changes are reasonable.
git has already scanned the entire tree for mtime changes, and run checksums on everything that has (unless you ask it to do more, ofc.)

The git history is a block-chain. (yes, that has nice implications ;)
Quote:
So, it's not really belts and braces, it's protection in another arena.

Yes it is; it already was before we used vcs.
Belt and braces meaning human and machine, as well as medium + checksum, both verified continuously, since this isn't random data.
I was arguing that, since most of the data our machines process is not human readable and thus cannot be passed through the full human + machine check, the extra robustness at the filesystem level might not be superfluous.

- John

The_Great_Sephiroth
Veteran

Joined: 03 Oct 2014
Posts: 1602
Location: Fayetteville, NC, USA

Posted: Thu Mar 26, 2015 1:12 pm

Wow, lots of good info in this thread now! I actually got incremental working through a home-brew bash script. I would love something like Toucan for Linux, but shell is OK also, since it means I can set it up as a cron job on a server if I had to.

Now I am curious about this ext3 versus ext4 argument brought up. I have not read the entire thread you linked yet, but you are saying ext3 is faster than ext4? Part of me believes that would be normal, since ext4 can handle larger volumes and such, but I am unclear here. I have been using ext4 for five or six years now and have not had any issues with it, even when I have had power issues and a system dies instead of being shut down correctly.

So this has brought up another question in the back of my mind. What file-system would YOU (any of you) use for a Linux workstation? How about a server? I have been using primarily ext4, but am open to opinion and advice. Now I am going to go read the remainder of the thread you linked.
_________________
Ever picture systemd as what runs "The Borg"?

steveL
Watchman

Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

Posted: Thu Mar 26, 2015 2:13 pm

The_Great_Sephiroth wrote:
Wow, lots of good info in this thread now! I actually got incremental working through a home-brew bash script. I would love something like Toucan for Linux, but shell is OK also, since it means I can set it up as a cron job on a server if I had to.

Good one. Like I said I'd ask in #bash if you want something more prebuilt; either way you'll call it from shell, and they can help you with that too. Firstly if it is sh, tell them that upfront, and secondly cf: /msg greybot faq cron
Quote:
Now I am curious about this ext3 versus ext4 argument brought up. I have not read the entire thread you linked yet, but you are saying ext3 is faster than ext4?

Well this fsync/fdatasync issue seems like it's going to slow things down, since many apps, including toolkits like KDE/Qt will datasync automatically (there was a bug about that a few months ago wrt tempfiles in #kate.) Again, I'd imagine things will change over time, but I don't much like the choice (sync and be very slow, or don't sync and hope.) Especially given that POSIX recommends we fdatasync at least, if it matters (which means programmers coding for more than one OS, are going to use it more often than not.)
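
For context, the "sync and be slow, or don't sync and hope" choice looks roughly like this in application code. The sketch writes to a temporary file, flushes it to disk, then renames it over the target, so a crash leaves either the old or the new contents. It assumes a POSIX system; the function name is hypothetical, and os.fdatasync is used where available with os.fsync as the fallback.

```python
import os
import tempfile


def durable_write(path: str, payload: bytes) -> None:
    """Replace `path` with `payload`, syncing before the rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()  # push Python's buffer to the kernel
            sync = getattr(os, "fdatasync", os.fsync)
            sync(tmp.fileno())  # ask the kernel to push the data to the disk
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Sync the directory too, so the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "settings.conf")
        durable_write(target, b"key=value\n")
        with open(target, "rb") as handle:
            print(handle.read())
```

Dropping the sync calls makes the write much faster but leaves the outcome after a power cut up to the filesystem, which is the trade-off being discussed.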
Quote:
Part of me believes that would be normal, since ext4 can handle larger volumes and such, but I am unclear here. I have been using ext4 for five or six years now and have not had any issues with it, even when I have had power issues and a system dies instead of being shut down correctly.

So this has brought up another question in the back of my mind. What file-system would YOU (any of you) use for a Linux workstation? How about a server? I have been using primarily ext4, but am open to opinion and advice. Now I am going to go read the remainder of the thread you linked.

I'm sticking with ext3 via the ext4 module for now (since I am similarly happy with it: it's never messed me around); if you're happy with ext4, by all means stay with it.

I don't think it's got anything to do with larger volumes, but I'm no FS expert.

The_Great_Sephiroth
Veteran

Joined: 03 Oct 2014
Posts: 1602
Location: Fayetteville, NC, USA

Posted: Thu Mar 26, 2015 3:57 pm

After reading that thread and doing more searching and reading, I came across this performance comparison which showed me that in all of the tests ext4 is fairly good. Overall I'd say it is the best. It even beats ext3 except in one test. I believe I will stick to ext4 except on RAID or Flash.