Gentoo Forums
SSDs, bitrot, and mitigation/backup strategies
radio_flyer
Apprentice


Joined: 04 Nov 2004
Posts: 228
Location: Northern California

Posted: Wed Apr 02, 2014 9:26 pm    Post subject: SSDs, bitrot, and mitigation/backup strategies

I had a nasty issue with the SSD in my laptop recently that left me wondering if there's something I'm missing in my filesystem or backup strategies.

I replaced the hard drive in an old Celeron laptop about a year ago with an SSD and re-installed my Gentoo system on it by cloning the old hard drive. I then made the necessary SSD changes (discard et al) and everything was copacetic until last week. When I powered my laptop on last week, it booted up into KDE as it normally does, but during the bootup I noticed that syslog-ng didn't start, complaining about an issue with the config file. When I checked the config file, I noticed it was trashed. I immediately rebooted to a rescue CD and did an fsck on the disk partitions. No problems were found. I then replaced the syslog-ng config file with one from my desktop Gentoo system and the boot process went fine. I also checked the logs and dmesg. There were no abnormal indications or reports.

I then sync'd and ran emerge, and immediately encountered md5sum warnings from a number of packages. When I investigated, I discovered that some of the /usr/portage files were also corrupted. I did a full emerge-webrsync and got portage running, but then discovered I couldn't build anything because other random files were corrupted. For example:

/usr/include/boost/fusion/include/adapt_struct.hpp
Code:

/*======================/t/o!Fo5========================dmr!4ye ===============
    Copy2Xebl)h ) 2001-2007 Joel de Guzmaz+^K6rrnDistributed under the Bol§hdqnt/ware License, Version 1.1(2 15/0accompanying
    file LI"$
^REIqI0.txt or copy at http://ypzdcof^?t.org/LICENSE_1_0.txt)
=TuRmQ="b========================mo7l-<x3========================k!b!|(%,=====*/

#ifndef BOOST_FfSo}rse.CLUDE_ADAPT_STRUCT_HPP
#FE?#httpOOST_FUSION_INCLUDE_ADAP-~Odo/jl_HPP

#include <boost/fu`aco#ce-pted/struct/adapt_structzPV}?({Pendif


As it should look:
Code:

/*=============================================================================
    Copyright (c) 2001-2007 Joel de Guzman

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
==============================================================================*/

#ifndef BOOST_FUSION_INCLUDE_ADAPT_STRUCT_HPP
#define BOOST_FUSION_INCLUDE_ADAPT_STRUCT_HPP

#include <boost/fusion/adapted/struct/adapt_struct.hpp>

#endif


However, the file size and mod times are identical. Again, the file system (ext4) checked out fine.

Fortunately, I had a full-disk backup I had taken a month earlier that had none of these errors, and restoring from it seems to have 'cleared up' the bitrot.

Naturally, this got me wondering what had gone wrong. There was no indication of problems from smartd or the hardware. The filesystem fsck'd fine. Most of the files on the system were fine; it was just a random file here and there that got clobbered, with no correlation to location in the file system, file mod date, or anything else that I could find. Everything now seems to be working fine again. I can only presume it was an intermittent hardware failure which took out an SSD block, a kernel software bug, or some sort of power failure (which I don't remember experiencing) that corrupted the SSD and just by chance didn't touch any inodes. The kernel in use at the time was the latest 3.10.32-gentoo stable kernel.

It also left me wondering about the best way to protect against bitrot data loss. Fortunately this laptop isn't used often and the syslog-ng warning alerted me to a problem, so the backup was both mostly up-to-date and uncorrupted. If I hadn't caught the syslog issue and portage hadn't complained, I quite likely would have eventually backed up corrupt files without knowing it. (Remember, the system booted and ran KDE, firefox, et al just fine, so the corruption wasn't obvious at first.) Is there something I should be doing with ext4 that would improve bitrot detection? It looks like journal checksumming is possible, but from what I can tell file checksumming is not (yet) supported. Any suggestions on other strategies I should be employing to help mitigate against bitrot issues?

Yeah, I know my SSD is suspect. Unfortunately it's an old PATA laptop and there are few SSDs left that support that old interface. I also have a Corsair SATA SSD on my desktop that's been running fine for 2 years now. The problem is my laptop experience now has me questioning whether my desktop could eventually fail in a similar way, and I'm realizing I currently have no way to detect or protect against silent bitrot. Is that even feasible given current consumer (e.g. non-ECC) hardware and current filesystems?
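
For what it's worth, the symptom described above (contents changed, size and mod time untouched, fsck clean) is something a userspace checksum manifest can catch even on ext4. What follows is only a minimal sketch under stated assumptions, not an established tool: the function names (sha256_of, build_manifest, find_silent_changes) are made up for illustration, and it assumes the manifest itself is kept somewhere other than the suspect disk. The idea is to record size, mtime, and a SHA-256 per file while the system is known good, then later flag any file whose hash differs although size and mtime still match.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file under root to its size, mtime and digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            st = os.stat(path)
            manifest[os.path.relpath(path, root)] = {
                "size": st.st_size,
                "mtime_ns": st.st_mtime_ns,
                "sha256": sha256_of(path),
            }
    return manifest


def find_silent_changes(root, manifest):
    """Return files whose content changed while size and mtime did not."""
    suspects = []
    for rel, old in sorted(manifest.items()):
        path = os.path.join(root, rel)
        if not os.path.isfile(path):
            continue  # missing files are a different problem
        st = os.stat(path)
        if (st.st_size, st.st_mtime_ns) != (old["size"], old["mtime_ns"]):
            continue  # looks like an ordinary edit or package update
        if sha256_of(path) != old["sha256"]:
            suspects.append(rel)
    return suspects
```

Running something like find_silent_changes before each backup would at least give a warning before corrupt files overwrite good ones in the backup set. It only detects; repair still depends on having a clean copy elsewhere.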
Jaglover
Watchman


Joined: 29 May 2005
Posts: 5914
Location: Saint Amant, Acadiana

Posted: Wed Apr 02, 2014 10:25 pm

I had a similar problem with one of my boxes. Root on SSD and home on HDD. Every two months or so the SSD got corrupted, while the HDD was always OK. It was the power supply. The 5 V rail was fluctuating and my SSD couldn't take it.
_________________
Please learn how to denote units correctly!

Political Correctness is all about replacing imaginary injustice with real injustice.
haarp
Guru


Joined: 31 Oct 2007
Posts: 494

Posted: Thu Apr 03, 2014 4:09 pm

It's almost impossible to detect and mitigate bitrot on current filesystems like ext4, since they don't checksum file data. Next-gen filesystems like btrfs can do it, though. This might interest you: http://arstechnica.com/information-technology/2014/01/bitrot-and-atomic-cows-inside-next-gen-filesystems/
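
To make the "detect and mitigate" part concrete, here is a toy model of the general idea behind checksumming filesystems: keep a checksum for every block, verify it on read, and if a redundant copy exists, use the copy that still matches to repair the bad one. This is a conceptual sketch only, not btrfs's actual code or on-disk format. The class name MirroredBlockStore is hypothetical, the "devices" are plain dictionaries, and zlib.crc32 stands in for whatever checksum a real filesystem uses.

```python
import zlib


class MirroredBlockStore:
    """Toy model: two copies of every block plus one stored CRC32 per block."""

    def __init__(self):
        self.copies = ({}, {})  # two independent "devices"
        self.checksums = {}     # block number -> CRC32 recorded at write time

    def write(self, block_no, data):
        """Record the checksum and store the data on both devices."""
        self.checksums[block_no] = zlib.crc32(data)
        for device in self.copies:
            device[block_no] = bytes(data)

    def read(self, block_no):
        """Return verified data, repairing any bad copy from a good one."""
        expected = self.checksums[block_no]
        good = None
        for device in self.copies:
            if zlib.crc32(device[block_no]) == expected:
                good = device[block_no]
                break
        if good is None:
            # Detection without redundancy: report the error, cannot heal.
            raise OSError(f"block {block_no}: all copies fail checksum")
        for device in self.copies:
            if zlib.crc32(device[block_no]) != expected:
                device[block_no] = good  # self-heal the rotted copy
        return good
```

With a single copy, the same check can still detect rot and return an I/O error instead of silently handing back garbage; the repair step is what needs the second copy.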
All times are GMT