Posted: Mon Mar 01, 2004 6:00 pm Post subject: SMP Sparc32 stability woes :(
I've been having stability issues with my dual-CPU SparcStation 20 ever since I started using it as a firewall/router. The machine locks up occasionally, especially when I am doing large compiles like XFree86 and glibc.
This problem has really come to a head this weekend... I have tried several times to get the latest revision of glibc compiled and installed, but I have had no success. The machine gets pretty far into the compilation (last night it even got to the installation phase), but the machine eventually locks up with hundreds of these messages in the syslog:
Mar 1 10:26:02 leech spin_lock(faaea644) CPU#1 stuck at f0064850, owner PC(f0165624):CPU(0)
Mar 1 10:26:09 leech spin_lock(faaea644) CPU#1 stuck at f0064850, owner PC(f0165624):CPU(0)
Mar 1 10:26:16 leech spin_lock(faaea644) CPU#1 stuck at f0064850, owner PC(f0165624):CPU(0)
Mar 1 10:26:24 leech spin_lock(faaea644) CPU#1 stuck at f0064850, owner PC(f0165624):CPU(0)
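To check whether every one of those messages really names the same lock and the same owner, the lines can be tallied with a short script. This is only a rough sketch in Python, assuming the syslog lines look exactly like the four above; the function name `tally_spin_locks` and the regular expression are my own invention, not part of any kernel tooling.

```python
import re
import sys
from collections import Counter

# Matches the message format shown in the log excerpt above.
# Assumption: all fields are lowercase hex or decimal, as in those lines.
SPIN_RE = re.compile(
    r"spin_lock\((?P<lock>[0-9a-f]+)\) CPU#(?P<cpu>\d+) "
    r"stuck at (?P<pc>[0-9a-f]+), "
    r"owner PC\((?P<owner_pc>[0-9a-f]+)\):CPU\((?P<owner_cpu>\d+)\)"
)


def tally_spin_locks(lines):
    """Count stuck-spinlock messages per (lock, waiter, owner) combination."""
    counts = Counter()
    for line in lines:
        m = SPIN_RE.search(line)
        if m is None:
            continue  # not a spin_lock message
        key = (
            m["lock"],
            int(m["cpu"]),
            m["pc"],
            m["owner_pc"],
            int(m["owner_cpu"]),
        )
        counts[key] += 1
    return counts


if __name__ == "__main__":
    # Reads syslog text from stdin and prints one summary line per combination.
    for key, n in tally_spin_locks(sys.stdin).most_common():
        lock, cpu, pc, owner_pc, owner_cpu = key
        print(f"{n:6d}  lock={lock}  waiter=CPU{cpu}@{pc}  owner=CPU{owner_cpu}@{owner_pc}")
```

Feeding the syslog file to the script on stdin gives one line per distinct lock/waiter/owner combination. A single line in the output would mean one CPU is spinning on one lock that the other CPU never releases.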
I have two SM81 chips in the SparcStation 20, a combination that earns a "very hot" rating in "The Rough Guide to MBus Modules".
So I tried two SM61 chips in the machine instead. No luck: the machine still locks up under heavy load, printing those "spin_lock" errors to the syslog. This can happen just a few minutes after the SparcStation is turned on, so I really don't think this is an overheating issue. I have never seen this system crash under Solaris or NetBSD, even when doing an entire system recompile under NetBSD with an SMP kernel.
I can make this lockup happen very quickly as well... I have a large firewall script that the SparcStation runs at boot time. It calls iptables about 5,000 times and takes about 20 minutes to run. If I try to do another intensive task while this script is running (such as an "emerge -UD system"), it's nearly certain that Linux will lock up with those "spin_lock" errors.
I am running 2.4.24-sparc-r2. Does anyone else have problems like this?