babudro (n00b, joined 30 Sep 2005, Canada)
Posted: Wed Nov 09, 2005 1:42 am    Post subject: XFS error 990
I've got a hardware RAID-5 subsystem in a box; the power went out one night and now I can't mount the drives. Trying to boot gives me an SB (superblock) failure, booting from another device and trying to mount gives me "mount: Unknown error 990", and if I do xfs_repair I get this:
Quote: |
Phase 1 - find and verify superblock...
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 128
resetting superblock root inode pointer to 128
sb realtime bitmap inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
resetting superblock realtime bitmap ino pointer to 129
sb realtime summary inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
resetting superblock realtime summary ino pointer to 130
Phase 2 - using internal log
- zero log...
* ERROR: mismatched uuid in log
* SB : 43a113e7-e5be-4464-9293-4fbe8d923e7b
* log: e5be4464-e5be-4464-9293-4fbe00008000
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
|
If I follow the advice and try to mount, I just get the error 990 again. The first time I ran xfs_repair it said it found a secondary superblock and could recover, but that is quite obviously false unless I'm overlooking something.
I've got two partitions on this subsystem, so I tried running xfs_repair with the "-L" option on the one which only contained the O/S (not my data). It just ran an endless string of error messages, "block (x,y) multiply claimed by bno space tree, state - 1", and it seemed to be reporting this error for every sector: the second number was counting up by ones, and it went on for about half an hour before I finally ^C'd it.
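For reference, the sequence I ran on that O/S partition was roughly this (from memory; the device name and mount point here are only illustrative, substitute your own):
Code: |
# attempt the mount first, as xfs_repair advises, to replay the log
mount -t xfs /dev/hda3 /mnt/root    # fails with "mount: Unknown error 990"
# the mount is impossible, so zero the log and force a repair
xfs_repair -L /dev/hda3             # floods with "multiply claimed by bno space tree" errors |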
Anybody got any suggestions? Any opinions about whether ReiserFS or EXT3 would've fared better? I've always used Reiser before except for boot partitions, but this is the first power loss I've had on a box with hardware RAID-5. The EXT3 boot partition had errors, but fsck managed to fix them. The server was only trying to shut down when the power went out, no other filesystem activity was happening, so the severity of this crash surprised me.
---
To warn anyone else who may be headed for this same disaster, I just read this bug report:
https://bugs.gentoo.org/show_bug.cgi?id=111078
Fixed 10 November (yesterday as I write this). I traced my problem to the fact that ethernet was shutting down prior to netmount, which caused the server to hang on shutdown. I just went looking to see if anyone else had the same issue and found this bug report. Fortunately the bug is fixed, albeit a few days too late to save my data. Update your baselayout in time to save yours, if you use remote filesystems.
Last edited by babudro on Fri Nov 11, 2005 11:52 pm; edited 1 time in total |
Will Scarlet (Apprentice, joined 19 Mar 2004)
Posted: Fri Nov 11, 2005 11:51 pm
Well, after some searching on Google I found these:
http://oss.sgi.com/projects/xfs/faq.html#error990
-- This explains what the error is.
http://jijo.free.net.ph/19
-- This explains how to repair when the xfs_repair tool won't repair the filesystem.
The XFS filesystem caches a lot of information while it's running, so I would suggest that you get a UPS for your server. That would help the situation during a power outage.
As far as which filesystem would fare better, I couldn't tell you for sure. But I have been running XFS on most of the systems I have managed since its first stable release and have not had any major issues with it.
Hope this helps... 
Last edited by Will Scarlet on Mon Nov 14, 2005 9:42 pm; edited 1 time in total |
babudro
Posted: Fri Nov 11, 2005 11:55 pm
Quote: | So, I would suggest that you get an UPS for your server. This would help the situation during a power outage. |
Thank you for your response, Will. You make a sensible recommendation, of course, but in my case that wasn't the problem. The UPS kept the server up for about 45 minutes, but the power was out for several hours, so it didn't help. See my addition above about NFS; there was a bug in baselayout that caused ethernet to shut down prior to netmount, which made the server hang until the UPS power ran out.
Thanks very much for sharing your experience with XFS. I figured it was probably a good filesystem and that SGI was careful about their work, but I have only been using it for a few months whereas I've used ReiserFS since, oh, I think 1999 and never had anything like this happen in spite of many power outages with it.
The SGI link I'd already seen, but not the other one. Thanks a lot for looking - my Google searches didn't turn up anything to try past xfs_repair (funny how the keywords you use can bring up such different search results).
Unfortunately I can't make much sense of what the guy is saying, and there are so many errors reported by xfs_repair that I really have no idea where to start with xfs_db. His reference to "dev/hdXX" is clear enough, but he mentions specific commands (like 'write core.size 0') without explaining what they do and they are not explained in the xfs_db man page. He also shows "inode XXX" but doesn't say what the "XXX" is. I see three inode errors in my xfs_repair report, so I tried inserting one of these numbers like so:
Code: | xfs_db -x -c 'inode 18446744073709551615' -c 'write core.nextents 0' -c 'write core.size 0' /dev/hda3 |
but xfs_db said:
Quote: | bad inode number -1
no current type
no current type |
Sounds like a signed-versus-unsigned integer issue: 18446744073709551615 is 2^64 - 1, which reads as -1 when treated as a signed value, matching both the "bad inode number -1" message and the NULLFSINO markers in the xfs_repair output. SGI's mailing list archives are filled with similar dialogues, but none of the suggestions I find there do me any good, nor do they explain how to use xfs_db without getting into crazy tech talk (I could rebuild the 150GB of data faster than I could do the kind of virtual sector-by-sector debugging some of those posts suggest).
Might you know enough about xfs_db to be able to give me guidance on what xfs_db command to give if I provide further xfs_repair output? |
Will Scarlet
Posted: Mon Nov 14, 2005 11:20 pm
Sorry to hear about your baselayout problem. It's never good when problems like that arise.
I really have never used xfs_db, but here goes nothing... (I'm also hoping that you have a current backup of the partition in question, in case none of this goes well.)
Anyway, as for xfs_db I found the following in the man page:
Code: | inode  Inodes are allocated in ``chunks'' of 64 inodes each. Usually a chunk is
multiple filesystem blocks, although there are cases with large filesystem
blocks where a chunk is less than one block. The inode Btree (see inobt
above) refers to the inode numbers per allocation group. The inode numbers
directly reflect the location of the inode block on disk. Use the inode
command to point xfs_db to a specific inode. Each inode contains four
regions: CORE, next_unlinked, u, and a. CORE contains the fixed information.
next_unlinked is separated from the core due to journaling considerations,
see type agi field unlinked. u is a union structure that is different in
size and format depending on the type and representation of the file data
(``data fork''). a is an optional union structure to describe attribute
data, that is different in size, format, and location depending on the
presence and representation of attribute data, and the size of the u data
(``attribute fork''). xfs_db automatically selects the proper union members
based on information in the inode.
The following are fields in the inode CORE:
magic: inode magic number, 0x494e ('IN')
mode: mode and type of file, as described in chmod(2), mknod(2), and stat(2)
version: inode version, 1 or 2
format: format of u union data (0: xfs_dev_t, 1: local file - in-inode
directory or symlink, 2: extent list, 3: Btree root, 4: unique id [unused])
nlinkv1: number of links to the file in a version 1 inode
nlinkv2: number of links to the file in a version 2 inode
projid: owner's project id (version 2 inode only)
uid: owner's user id
gid: owner's group id
atime: time last accessed (seconds and nanoseconds)
mtime: time last modified
ctime: time created or inode last modified
SIZE: number of bytes in the file
nblocks: total number of blocks in the file including indirect and attribute
extsize: basic/minimum extent size for the file, used only for realtime
NEXTENTS: number of extents in the data fork |
I had to capitalize some of the text from the man page (couldn't use bb code to bold for some reason). Anyway, it goes on describing more fields for core. So, what I gather from the command is that you're writing the value of 0 for the inode size and number of extents.
As for the inode number, maybe it's a 3-digit number since he used XXX for the value. So maybe you should try 128 through 130 for the inode value (from your original post). You might want to test my theory for the inode numbers using the xfs_ncheck tool (see the sketch below); do a man xfs_ncheck for more information.
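Something along these lines, I think (untested on my end, so double-check the options against the man page; the device name is just taken from your earlier post):
Code: |
# -i limits the report to the given inode numbers, so this should show
# whether inodes 128-130 map to real pathnames on the unmounted device
xfs_ncheck -i 128 -i 129 -i 130 /dev/hda3 |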
Questions:
1. Are there more inode errors from xfs_repair? or only the three that you originally listed?
If you have more than the three listed, then depending on how many more, it may be better to dump the partition, recreate it, and then restore from backup (if you have a backup to restore from); see the rough outline after these questions.
2. Do you have more partitions on this array? or only one? If you have other partitions, have you had any luck mounting them?
If you have other partitions on this array, at least you'll know if you have more problems. Better to find out now than later.
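For the recreate-and-restore route mentioned in question 1, I'd expect it to look roughly like this (only a sketch; it assumes you already have a backup somewhere else, since the filesystem itself can't be mounted to take a fresh dump, and the device and paths are just placeholders):
Code: |
# recreate the filesystem on the damaged partition (destroys whatever is left on it)
mkfs.xfs -f /dev/hda3
mount /dev/hda3 /mnt/data
# restore from an existing xfsdump-format backup; a tar or rsync backup
# would be restored with the matching tool instead
xfsrestore -f /backup/hda3.dump /mnt/data |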
Edit ---
You might also want to check out the news groups listed on the following SGI website:
http://oss.sgi.com/newsgroups.html
There might be someone listed with a similar issue.
End edit ---
Hope this helps...  |
babudro
Posted: Tue Dec 06, 2005 7:24 am
Hi Will. I was away for a couple of weeks. Thanks for your input on this problem.
Will Scarlet wrote: | As for the inode number, maybe it's a 3 digit number since he used XXX for the value. So maybe you should try 128 through 130 for the inode value (from your original post).
|
Good thinking. I tried that and (just FYI) it gave me a normal-looking response for all three inode values, but then when I ran xfs_repair again it spit out the same errors for inodes 128, 129, and 130.
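For the record, what I ran was along these lines (same form as the command from the page you found, just with the inode numbers from my xfs_repair output; the device is my root partition):
Code: |
xfs_db -x -c 'inode 128' -c 'write core.nextents 0' -c 'write core.size 0' /dev/hda3
xfs_db -x -c 'inode 129' -c 'write core.nextents 0' -c 'write core.size 0' /dev/hda3
xfs_db -x -c 'inode 130' -c 'write core.nextents 0' -c 'write core.size 0' /dev/hda3 |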
Will Scarlet wrote: |
Questions:
1. Are there more inode errors from xfs_repair? or only the three that you originally listed?
If you have more that the three listed, according to how many more, it may be better to dump the partition, recreate, and then restore from backup (if you have a backup to restore from).
2. Do you have more partitions on this array? or only one? If you have other partitions, have you had any luck mounting them?
|
1. Just the three inode errors for that partition.
2. There were two xfs partitions. I've been experimenting with the root partition which contained no important data, leaving the data partition for last (although I'm a hair's breadth away from giving up and starting the tedious process of rebuilding the data).
Although your suggestions didn't get me anywhere, I appreciate your time and response.  |