gtbX Tux's lil' helper
Joined: 11 Oct 2003 Posts: 126
Posted: Tue Aug 06, 2013 4:25 pm Post subject: Ext4 fs corruption
|
I'm having an odd issue with one of the machines I remotely administer. Recently, its /home partition started developing filesystem errors that prevent it from being mounted at boot. Instead, the boot drops to a login prompt, and I have to walk someone through logging in and running fsck -y on the partition. It now seems to need this every time it reboots. I tried reformatting the partition with
Code: | mke2fs -t ext4 -c -c /dev/sda7 |
to scan for bad blocks, but it didn't find any. I suppose the superblock might be bad(?), but I would think that would have been detected too.
So I have two questions:
1. What could be causing this, and how can I prevent it?
2. Can the init scripts be configured to keep booting even if /home fails to mount, so that I can at least ssh into the box?
|
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9646 Location: almost Mile High in the USA
Posted: Tue Aug 06, 2013 5:40 pm
|
Remember that corruption doesn't necessarily come from the disk. As with any computer, it's garbage in, garbage out: your CPU could be emitting garbage for the disk to write, or perhaps your RAM has amnesia, causing your CPU to write bad data to the disk.
I would expect the init scripts to keep booting without /home, but since ~/.ssh lives on /home for many users, it would still be hard to ssh in (especially if root login is disabled).
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
|
|
gtbX Tux's lil' helper
Joined: 11 Oct 2003 Posts: 126
Posted: Wed Aug 07, 2013 5:15 am
|
I think if it were a kernel or RAM issue, I'd see more problems than just this. Then again, I first saw this problem shortly after upgrading to gentoo-sources-3.8.13. I haven't had any issues with the root fs (also ext4), but it gets less I/O.
The init scripts fail while running fsck on /home and drop to an emergency login ("Welcome to (none).(none)" or something similar); the hostname isn't even set yet. Conceivably, the network and sshd could be started at that point, and I could log in as root (via pubkey, of course).
/etc/fstab:
Code: |
/dev/disk/by-label/ROOT  /      ext4  noatime  0 1
/dev/disk/by-label/HOME  /home  ext4  noatime  0 2
|
I'll double-check my kernel config and see if dropping back to gentoo-sources-3.6.11 helps.
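For reference, the last field of each fstab line above (fs_passno) is what schedules the boot-time check: fstab(5) says 1 is for the root filesystem, 2 for other filesystems, and 0 (or a missing field) means fsck assumes the filesystem does not need to be checked. Below is a minimal POSIX sh sketch of a helper, hypothetically named fsck_pass, that pulls that field out of fstab-format text; it only reads the file and changes nothing.

```shell
# fsck_pass MOUNTPOINT
# Hypothetical helper: reads fstab-format lines on stdin and prints the
# sixth field (fs_passno) of the entry whose mount point is MOUNTPOINT.
# Per fstab(5), a missing sixth field counts as 0. Returns non-zero if
# no entry matches.
fsck_pass() {
    awk -v mp="$1" '
        # skip blank lines and comment lines
        NF == 0 || $1 ~ /^#/ { next }
        # field 2 is the mount point, field 6 the fsck pass number
        $2 == mp { print ($6 == "" ? 0 : $6); found = 1; exit }
        END { if (!found) exit 1 }
    '
}
```

For example, `fsck_pass /home < /etc/fstab` would print 2 for the fstab shown above, meaning /home is checked at boot after the root filesystem.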
|
|
gtbX Tux's lil' helper
Joined: 11 Oct 2003 Posts: 126
Posted: Mon Aug 26, 2013 6:01 am
|
It does seem to be kernel-related: I just ran into the same problem on a different machine with the same kernel version. I rolled back the kernel on the original box to 3.6.11, and the problem seems to have gone away (I'll upgrade the kernel when I get to it in person). The second box has been upgraded to 3.10.7; I'll have to see if that helps.
|
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9646 Location: almost Mile High in the USA
Posted: Mon Aug 26, 2013 4:37 pm
|
Though I had other issues with gentoo-sources-3.8.13, I have not seen the corruption issue on my ext4 machines.
|
|
gtbX Tux's lil' helper
Joined: 11 Oct 2003 Posts: 126
Posted: Mon Aug 26, 2013 7:49 pm
|
Crud, it happened again, this time on 3.6.11. It seems to start after an unclean shutdown. Running fsck manually has it remove some deleted inodes; nothing critical yet, but it's only a matter of time until valuable files get lost. I thought using a journalling fs was supposed to help with that? Maybe I'm just doing it wrong.
|
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9646 Location: almost Mile High in the USA
Posted: Mon Aug 26, 2013 8:43 pm
|
Uh... no. Even with a journalling filesystem, shutting down the machine abruptly (e.g. by cutting power) is not proper.
Journalling filesystems *help*, but they do not prevent corruption. A proper shutdown is still needed.
If you must have a system that can handle this, it can help more if cached writes are flushed to disk as quickly as possible. It will reduce performance but will help against corruption.
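One possible way to get cached writes out sooner on Linux, assuming the goal is to shorten how long dirty data sits in the page cache, is to lower the VM writeback sysctls. This is an illustrative sketch, not something eccerr0r specified, and the values are assumptions rather than tuned recommendations. Both settings are in hundredths of a second; the kernel defaults are 3000 (30 s) and 500 (5 s). They are separate from the ext4 journal commit interval.

```
# /etc/sysctl.conf (illustrative values only)
# Dirty data older than this becomes eligible for writeback by the
# kernel flusher threads (default 3000 = 30 s).
vm.dirty_expire_centisecs = 500
# Interval between flusher-thread wakeups (default 500 = 5 s).
vm.dirty_writeback_centisecs = 100
```

The file is read at boot, or it can be loaded immediately with `sysctl -p`. As noted above, the trade-off is more frequent disk writes and lower performance.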
|
|
trumee Guru
Joined: 02 Mar 2003 Posts: 551 Location: London,UK
Posted: Wed Aug 28, 2013 7:55 pm
|
eccerr0r wrote: |
If you must have a system that can handle this, it can help more if cached writes are flushed to disk as quickly as possible. It will reduce performance but will help against corruption. |
How can I do this? It would be useful in situations where power failures happen at random.
|
|
kernelOfTruth Watchman
Joined: 20 Dec 2005 Posts: 6111 Location: Vienna, Austria; Germany; hello world :)
Posted: Wed Aug 28, 2013 10:57 pm
|
trumee wrote: | eccerr0r wrote: |
If you must have a system that can handle this, it can help more if cached writes are flushed to disk as quickly as possible. It will reduce performance but will help against corruption. |
How can i do this? It will be useful in situations when power failure is random. |
Mount with commit=5 (shouldn't that be the default? Forcing it is safer nonetheless),
or commit=10 (you could also try commit=20 to sync every 20 seconds),
or add data=journal as a mount option to force (ext3-like) full data journalling mode. Is that deprecated yet, btw?
_________________
https://github.com/kernelOfTruth/ZFS-for-SystemRescueCD/tree/ZFS-for-SysRescCD-4.9.0
https://github.com/kernelOfTruth/pulseaudio-equalizer-ladspa
Hardcore Gentoo Linux user since 2004
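Applied to the /home line from gtbX's /etc/fstab earlier in the thread, the two suggestions might look like the sketch below (use one or the other). commit=N and data=journal are standard ext4 mount options; the ext4 kernel documentation gives 5 seconds as the default commit interval, so commit=5 only makes the default explicit.

```
# /etc/fstab sketch: explicit journal commit interval, in seconds
/dev/disk/by-label/HOME  /home  ext4  noatime,commit=5      0 2
# alternative: journal file data as well as metadata (slower, more writes)
#/dev/disk/by-label/HOME  /home  ext4  noatime,data=journal  0 2
```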
|
|
trumee Guru
Joined: 02 Mar 2003 Posts: 551 Location: London,UK
Posted: Thu Aug 29, 2013 10:07 pm
|
Is the commit option only for ext3? man mount lists it as an ext3 option.
At the moment I am running ext4, but I was wondering whether ext3 would be a safer choice for sudden power failures.
|
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9646 Location: almost Mile High in the USA
Posted: Thu Aug 29, 2013 10:41 pm
|
Since shorter commit times are just a hack to help limit the damage, I cannot condone this as a "solution". Journalling filesystems already help with the problem a bit as it is (unless you somehow disabled the journal), but it's still not right.
The question going through my head: why is the power going out so frequently that this is needed?
If it's due to laziness, people will need to figure out how to shut down normally.
If it's due to unstable power, a UPS (or perhaps a laptop) configured to do a clean shutdown is highly recommended; this is a "proper" solution.
How frequent is frequent? Also, what is the function of the machine? Is it writing to disk constantly? A disk that is mostly just read should not suffer as much corruption from unclean shutdowns.
Remember, even with these faster commit options, if power goes out while committing, you will suffer problems as well.
|
|
|