ReiserFS and 2TB disk

binro · Posted: Fri Dec 07, 2012 12:02 pm Post subject: ReiserFS and 2TB disk

Two weeks ago I replaced two ageing disks with a single 2TB Seagate (ST2000DM001-9YN164). I use LVM and formatted the LVs with ReiserFS; the /home LV in particular is 1TB. After restoring my system everything looked fine, but when I returned several hours later the KDE desktop would not wake up properly. Switching to a console, neither the sync nor the umount command would complete; they just hung. This happened a couple of times, so I thought the backup might be slightly corrupt and completely reinstalled @system and @world, and built the latest kernel, 3.6.8. Returning last night, the same thing had occurred; looking at htop from a console I could see lots of identical processes that had been started and then hung. In the syslog I could see kernel messages relating to hung tasks:

Eventually the system just hangs completely. Since this started with the new disk, I am wondering whether ReiserFS actually works with new, huge disks. If it does, what else could be causing this? I am getting a bit desperate.

TIA
_________________
"Ship me somewheres east of Suez, where the best is like the worst,
Where there ain't no Ten Commandments an' a man can raise a thirst"
from "Mandalay" by Rudyard Kipling

Merlin-TC · l33t Joined: 16 May 2003 Posts: 603 Location: Germany

Sawadee Binro,

ReiserFS doesn't have any problems with volumes up to 16 TB, so I doubt ReiserFS itself is the problem.

1. Is there any additional output of dmesg?
2. Can you reproduce it or does it feel "random"?
3. Is the system under heavy load when this is happening?

You could try another I/O scheduler just to narrow down the problem.
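
In case it helps, here is a rough Python sketch for checking which scheduler is active. The kernel lists the available schedulers in /sys/block/<device>/queue/scheduler and puts the active one in square brackets. The function names and the default device "sda" are only my assumptions for illustration, so adjust them to your setup.

```python
from pathlib import Path


def parse_schedulers(text):
    """Split a line such as 'noop deadline [cfq]' into (active, available)."""
    active = None
    available = []
    for name in text.split():
        # The kernel marks the scheduler currently in use with brackets.
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def read_schedulers(device="sda"):
    """Read the scheduler list for a block device from sysfs.

    'sda' is only a placeholder; pass the name of the disk that hangs.
    """
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_schedulers(path.read_text())
```

Calling read_schedulers("sda") returns the active scheduler plus everything the kernel offers. Switching is done as root by writing one of those names into the same file, and the change lasts until the next reboot.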

srs5694 · Posted: Fri Dec 07, 2012 3:52 pm Post subject:

You might also run a SMART utility like GSmartControl, the SMART functions of Palimpsest, or smartctl. These will tell you if you've got a new disk that's defective. (Sadly, it happens sometimes.) The output can be difficult to interpret, though, so post it here if you need help with it.

binro · Posted: Fri Dec 07, 2012 4:26 pm Post subject:

binro · Posted: Fri Dec 07, 2012 4:27 pm Post subject:

binro · Posted: Sat Dec 08, 2012 4:20 pm Post subject:

This gets stranger and stranger. I disabled the screen-saver and now the system is stable again! A screen-saver wouldn't interfere with process execution, would it?

srs5694 · Posted: Sat Dec 08, 2012 4:34 pm Post subject:

binro · Posted: Sat Dec 08, 2012 8:51 pm Post subject:

I was thinking along the same lines, except that before the restore onto the new disk this all worked perfectly. I can't help thinking that something in my system has been subtly corrupted.

srs5694 · Posted: Sun Dec 09, 2012 12:49 am Post subject:

How did you transfer your system to the new disk? (dd, tar, etc.?) It could be there's a malfunction in the video drivers that's related to a subtle permission problem introduced in the transfer; or maybe a bit or two got flipped during the copying. If you've still got the original disk, you could plug it in and write a script to compare every file between the two systems.
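
Something along these lines would do as a starting point. It is only a sketch in Python: it walks the old tree, hashes every regular file with SHA-256, and reports files that are missing on the new side, files whose contents differ, and files whose permission bits differ. The function names are mine, and it deliberately ignores symlinks, device nodes, ownership and timestamps.

```python
import hashlib
import os
import stat


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(old_root, new_root):
    """Compare regular files under old_root with the same paths in new_root.

    Returns three sorted lists of relative paths:
    (missing in new_root, contents differ, permission bits differ).
    """
    missing, different, mode_changed = [], [], []
    for dirpath, _dirnames, filenames in os.walk(old_root):
        for name in filenames:
            old_path = os.path.join(dirpath, name)
            old_stat = os.lstat(old_path)
            if not stat.S_ISREG(old_stat.st_mode):
                continue  # skip symlinks, sockets, device nodes, ...
            rel = os.path.relpath(old_path, old_root)
            new_path = os.path.join(new_root, rel)
            try:
                new_stat = os.lstat(new_path)
            except OSError:
                missing.append(rel)
                continue
            if not stat.S_ISREG(new_stat.st_mode):
                missing.append(rel)
                continue
            if stat.S_IMODE(old_stat.st_mode) != stat.S_IMODE(new_stat.st_mode):
                mode_changed.append(rel)
            if file_digest(old_path) != file_digest(new_path):
                different.append(rel)
    return sorted(missing), sorted(different), sorted(mode_changed)
```

With the old disk mounted somewhere like /mnt/old (a made-up mount point), you would run compare_trees("/mnt/old", "/") as root and look at the three lists. Expect plenty of legitimate differences in places like /var and /tmp, and anything you rebuilt since the restore will differ too.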

salahx · Guru Joined: 12 Mar 2005 Posts: 530

Actually, looking at the stack trace and the description of the symptoms, this could be a genuine bug. It sounds like there is a race condition in reiserfs that's causing a deadlock. The screen saver is innocent in this matter; it just happens to widen the window in which the race can occur.

It may be worth recompiling the kernel with CONFIG_PROVE_LOCKING=y.
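
To confirm the option actually made it into the running kernel, a small Python sketch like the one below can help. It assumes the kernel exposes its configuration at /proc/config.gz (which needs CONFIG_IKCONFIG_PROC); if yours doesn't, feed it the .config from your kernel source tree instead. The function names are just for illustration.

```python
import gzip


def config_value(config_text, option):
    """Return the value of a kernel config option ('y', 'm', ...).

    Returns None when the option is absent or appears only as
    '# CONFIG_FOO is not set'.
    """
    prefix = option + "="
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


def running_kernel_config(path="/proc/config.gz"):
    """Read the running kernel's config; assumes CONFIG_IKCONFIG_PROC is on."""
    with gzip.open(path, "rt") as f:
        return f.read()
```

For example, config_value(running_kernel_config(), "CONFIG_PROVE_LOCKING") should come back as "y" once the rebuilt kernel is booted.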

binro · Posted: Sat Jan 19, 2013 1:08 pm Post subject:

binro · Posted: Sat Jan 19, 2013 1:10 pm Post subject:

binro · Posted: Mon Feb 18, 2013 11:37 am Post subject:

I am back looking at this again. The lock proving idea did not work because the kernel disabled it when the evil NVidia binary module tainted the kernel! I am now seeing this in the logging:

This was during a nightly backup. Also...

Signs of a failing disk?

Merlin-TC · l33t Joined: 16 May 2003 Posts: 603 Location: Germany

I wouldn't call it a sign of a failing disk; the disk is failing right now.
If there is anything important on it, copy it off while you can.
It also seems as if your hard drive doesn't have any spare sectors, so you really should replace it.

This is a hardware error for sure.
It could of course be a faulty cable/sata port but I doubt it.

NeddySeagoon · Posted: Mon Feb 18, 2013 5:20 pm Post subject:

binro,

the output of smartctl -a for that drive would be good.
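
If you want to pull the interesting counters out of that output rather than read the whole thing, here is a rough Python sketch. It assumes smartctl's default attribute table layout (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE) and that the drive is /dev/sda. The WATCHED list is my own pick of attributes worth a look for bad sectors and is not any official set.

```python
import subprocess

# Attributes commonly checked when bad sectors are suspected (my selection).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_attributes(output):
    """Map attribute name -> raw value string from smartctl's attribute table."""
    attrs = {}
    for line in output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID, carry a hex FLAG such as
        # 0x0033 in the third column, and have the raw value from column 10 on.
        if len(fields) >= 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            attrs[fields[1]] = " ".join(fields[9:])
    return attrs


def smart_report(device="/dev/sda"):
    """Run 'smartctl -a' (needs root) and return the watched attributes."""
    result = subprocess.run(
        ["smartctl", "-a", device], capture_output=True, text=True
    )
    attrs = parse_attributes(result.stdout)
    return {name: attrs.get(name) for name in WATCHED}
```

Non-zero raw values for any of those three on an 83-day-old drive would back up the hardware theory, but do still post the full output.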

binro · Posted: Mon Feb 18, 2013 8:14 pm Post subject:

http://smartmontools.sourceforge.net

I live in Bangkok, so 60C is not so hot in the middle of the night when the aircon is off. Kit does tend to expire more quickly out here, but this unit has only been operating for 83 days! Well, Bangkok, as well as being hot, is also the hard disk capital of the world, so I should be able to get it replaced.


NeddySeagoon · Posted: Mon Feb 18, 2013 8:48 pm Post subject:

binro,