Gentoo Forums
WebCVS + SMP + Sandbox Problem -> CRASH + DEAD SYSTEM
Gentoo Forums Forum Index :: Kernel & Hardware
Ondrej
n00b


Joined: 28 Apr 2002
Posts: 15

Posted: Tue Jun 11, 2002 2:28 am    Post subject: WebCVS + SMP + Sandbox Problem -> CRASH + DEAD SYSTEM

Hi everyone,

We run Gentoo Linux on a dual-CPU (2x Pentium II 350) system. The kernel is 2.4.19-gentoo-r5 with SMP enabled and the low-latency and preemption patches disabled.

On a newly installed system, emerge webcvs crashes during the 'configure' step.

Code:

./conf.sh: configure has_map_fd, has_mmap, has_madvise, mmap_signal ... kernel BUG at filemap.c:130!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c012b35c>]    Not tainted
EFLAGS: 00010282
eax: 0000002a    ebx: c11289a8   ecx: c039c640   edx: c797bcb0
esi: c11289a8    edi: c797bd6c   ebp: 00000000   esp: c797bd08
ds: 0018   es: 0018    ss: 0018
Process a.out (pid: 22757, stackpage=c797b000)

... more ...


The system's dead.

After a restart, portage is broken. Emerge will still rsync, but no package will emerge; instead, emerge freezes after it calculates the dependencies. NOTE: this has already been discussed at https://forums.gentoo.org/viewtopic.php?t=2529&start=0&postdays=0&postorder=asc&highlight=

Removing sandbox from FEATURES in /etc/make.conf (i.e. setting it to -sandbox) makes emerge work again.
However, even after re-emerging portage, sandbox itself is still broken. This appears to be a kernel issue: refer to /usr/src/linux/mm/filemap.c:130 to see the problem for yourselves ...
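Presumably the -sandbox setting works because FEATURES is stacked incrementally: each configuration layer can enable a flag by name or drop one with a leading minus. Below is a simplified Python model of that stacking, assuming the usual incremental rules ("-flag" removes a flag, "-*" clears everything set so far). It is a sketch, not Portage's actual code, and the default flags in the example are hypothetical.

```python
def resolve_incremental(*layers: str) -> list[str]:
    """Simplified model of an incremental Portage variable such as FEATURES.

    Layers are applied in order (e.g. profile defaults first, then
    /etc/make.conf). "name" enables a flag, "-name" disables it and
    "-*" clears everything enabled so far. Not Portage's real implementation.
    """
    enabled: list[str] = []
    for layer in layers:
        for token in layer.split():
            if token == "-*":
                enabled.clear()              # wipe everything from earlier layers
            elif token.startswith("-"):
                name = token[1:]
                if name in enabled:
                    enabled.remove(name)     # drop a previously enabled flag
            elif token not in enabled:
                enabled.append(token)        # enable a flag once, keeping order
    return enabled


if __name__ == "__main__":
    # Hypothetical profile defaults, followed by the make.conf workaround from the post.
    profile_defaults = "sandbox ccache"
    make_conf = "-sandbox"
    print(resolve_incremental(profile_defaults, make_conf))  # prints ['ccache']
```

In this model the later layer (make.conf) wins, which is why a single "-sandbox" token is enough to switch the feature off without repeating the other defaults.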

PLEASE MAKE PORTAGE USABLE IN THE EVENT THAT SOMETHING LIKE THIS HAPPENS!!!
delta407
Bodhisattva


Joined: 23 Apr 2002
Posts: 2876
Location: Chicago, IL

Posted: Tue Jun 11, 2002 2:33 am

Alternatively, Gentoo could stop officially recommending a kernel prepatch, which (though more stable than 2.5) is not nearly as stable as the latest kernel release (vanilla-sources, 2.4.18).
Ondrej
n00b


Joined: 28 Apr 2002
Posts: 15

Posted: Tue Jun 11, 2002 3:08 am

We are aware of the risks of using a non-official kernel; however, the point of the message was to illustrate the need for portage to improve. How is it possible for a kernel bug to effectively and permanently disable part of portage's functionality (sandbox)?
delta407
Bodhisattva


Joined: 23 Apr 2002
Posts: 2876
Location: Chicago, IL

Posted: Tue Jun 11, 2002 3:12 am

Well, when the kernel gets toasted, you really can't expect user-mode apps to stand a chance. When your system went up in flames, there wasn't much Portage could have done to recover. So, if Portage got killed in a not so opportune spot (sandboxing), strange and horrible things can result.
Ondrej
n00b


Joined: 28 Apr 2002
Posts: 15

Posted: Tue Jun 11, 2002 4:26 am

Well, portage works for the most part... except sandbox. What is the danger of not using sandbox? It is not explained in depth what exactly it does.

Also, the interesting thing is that even after re-emerging portage, sandbox is still broken. (One guy from the original thread even re-emerged his entire system!) If you look again at the EXEC process trace, it just dies when it does an rm -rf ... I'm wondering whether XFS is the problem; it was the __remove_inode_page function, after all...
bcressey
n00b


Joined: 13 Jun 2002
Posts: 35

Posted: Thu Jun 13, 2002 1:25 pm

I encountered a very similar problem, I think. My system is a dual Athlon MP 1600+ running kernel-2.4.18-xfs; all of my partitions (except boot/swap) are XFS.

Last night while emerging sox, my system died with a kernel crash. Having never seen one before, I wasn't sure what to look for, although I believe it mentioned something about swap.

There is an error message in my kernel logs which seems related:

Jun 8 00:02:00 [kernel] swap_free: Bad swap file entry 20747562
Jun 8 06:15:01 [kernel] swap_free: Bad swap file entry 07200720

I am inclined to doubt that this is caused by a faulty hard disk, since I'm running a mirrored RAID setup and haven't seen any warnings from either drive.

Anyhow, after a reboot I couldn't emerge anything; sandbox would hang and begin devouring system resources. Following a suggestion in the other thread and in README.RESCUE, I unpacked the portage bz2 file, which had the nasty side effect of rendering my system unbootable.

At that point I booted off the rescue CD, xfs_repair'd each partition, and restored my backups of /usr, /bin, /sbin, /lib, /etc, and /var. They are only a few days old, and I know sandbox worked at the time I made the backups.

However, sandbox still fails. Anyone out there who can hazard a guess as to why? More importantly, how do I fix it, if restoring all the related files on top of the broken ones doesn't solve it?

Ben
Ondrej
n00b


Joined: 28 Apr 2002
Posts: 15

Posted: Thu Jun 13, 2002 8:53 pm    Post subject: Problem Solved

Hi,

I've been trying to figure out what happened, and finally found the problem.

Once executed, sandbox stores its PIDs in /tmp/sandboxpids.tmp. Apparently sandbox was still active when the kernel crash occurred, and the last PID it ran as had been stored. However, I assume the file was never properly closed by the kernel, so the last PID line (the first line in the file) was corrupted. In my case, it looked something like "@@@@1234".

So, if the kernel crashes while something is being emerged within sandbox and portage no longer works after the reboot, all that needs to be done to fix things is rm /tmp/sandboxpids.tmp.
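If you want to confirm that the PID file really is corrupted before deleting it, a small script can flag every line that is not a plain number. This is a hypothetical helper sketched in Python, not part of portage or sandbox; it assumes the one-PID-per-line format and the /tmp/sandboxpids.tmp path described in this post.

```python
import re
from pathlib import Path

# Path as given in the post; other sandbox versions may use a different file (assumption).
PID_FILE = Path("/tmp/sandboxpids.tmp")
PID_RE = re.compile(r"[0-9]+")  # ASCII digits only


def bad_pid_lines(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for each non-blank line that is not a plain PID.

    Assumes one decimal PID per line; an entry like "@@@@1234"
    (or one padded with NUL bytes) is reported as bad.
    """
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped and not PID_RE.fullmatch(stripped):
            bad.append((number, line))
    return bad


def main() -> None:
    if not PID_FILE.exists():
        print(f"{PID_FILE} not found, nothing to check")
        return
    # errors="replace" keeps crash garbage from raising UnicodeDecodeError.
    problems = bad_pid_lines(PID_FILE.read_text(errors="replace"))
    for number, line in problems:
        print(f"line {number}: {line!r}")
    if problems:
        print(f"{PID_FILE} looks corrupted; the fix described above is to remove it")
    else:
        print(f"{PID_FILE} looks intact")


if __name__ == "__main__":
    main()
```

The script only reports; it leaves the actual rm to you, so nothing is deleted if the file turns out to be fine.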

Why I didn't bother to search for all '*sandbox*' files and look at their contents earlier is beyond me. It only cost my friend and me a whole reinstall and countless reboots. I hate my life.

Hope this helps!

Ondrej

P.S. In my case the kernel crashed during the configure step of rcs (Revision Control System). Has anyone else had problems emerging this package? I run SMP-enabled 2.4.19-gentoo-r7 without the preemption and low-latency patches. Thanks!
bcressey
n00b


Joined: 13 Jun 2002
Posts: 35

Posted: Thu Jun 13, 2002 10:20 pm

Ahh, thanks! That's very useful to know.

As it happens, /tmp was the only partition I didn't restore. Go figure.

I also decided to reinstall. Whoops. At least if this ever happens again I'll know what to do.

Thanks again.

Ben
All times are GMT