Gentoo Forums
2.6.35 hangs badly

Ormaaj
Guru

Joined: 28 Jan 2008
Posts: 319

Posted: Thu Aug 05, 2010 12:21 pm

The 2.6.35 kernel has been causing everything to freeze up after running certain heavy loads like compiling. All processes freeze except while I'm triggering interrupts by moving the mouse or switching between virtual desktops. For example, a video will freeze completely except while I'm moving the mouse, when everything returns to normal. The only indication is that my xmobar displays "cpu total not found" rather than the load %.

This has been going on since about rc1-rc2. It occurs in both unstable and stable zen kernel (2.6.34 with some backported things), all vanilla 2.6.35 rcs, but not vanilla 2.6.34.1. I really don't know where to start looking to diagnose this. A git bisect would take ages because it occurs unpredictably within an hour of booting. Arch is amd64. Not sure what kind of info would be useful in diagnosing.
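For a rough sense of why a bisect looked so expensive here: git bisect needs about log2(N) build-and-test rounds for N candidate commits. The sketch below assumes roughly 10,000 commits between the good and bad tags and a one-hour soak per kernel (the hang shows up "within an hour of booting"). Both numbers are illustrative assumptions, not measurements from this thread, and the function names are made up for the example.

```python
def bisect_rounds(n_commits: int) -> int:
    """Approximate worst-case number of kernels to build and test.

    Each round halves the candidate range, so this is ceil(log2(n_commits)).
    """
    if n_commits < 1:
        raise ValueError("need at least one candidate commit")
    return (n_commits - 1).bit_length()


def bisect_hours(n_commits: int, soak_hours: float, build_hours: float = 0.0) -> float:
    """Total hours if every round costs one build plus one soak test."""
    return bisect_rounds(n_commits) * (soak_hours + build_hours)


if __name__ == "__main__":
    commits = 10_000  # assumed size of the 2.6.34..2.6.35 range, not measured
    soak = 1.0        # assumed hours of waiting before calling a kernel "good"
    rounds = bisect_rounds(commits)
    hours = bisect_hours(commits, soak)
    print(f"{rounds} rounds, about {hours:.0f} hours of testing")
```

With those assumptions that is 14 rounds. Because the hang is intermittent, a kernel that survives one hour is not proven good, so the real cost is likely higher than this estimate.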

d2_racing
Bodhisattva

Joined: 25 Apr 2005
Posts: 13047
Location: Ste-Foy,Canada

Posted: Thu Aug 05, 2010 12:47 pm

Are you using the latest vanilla-sources or the gentoo-sources?

Ormaaj
Guru

Joined: 28 Jan 2008
Posts: 319

Posted: Thu Aug 05, 2010 1:12 pm

d2_racing wrote:
Are you using the latest vanilla-sources or the gentoo-sources?

sys-kernel/vanilla-sources is installed through portage to satisfy the virtuals, but I'm just checking out tags directly from git for both zen-sources and the vanilla kernel.org kernel. I tried switching back to vanilla from zen to see whether it was specific to one of their patches, but apparently it isn't.

Currently testing whether this might be related to the recent intel i7 cpuidle driver...


Last edited by Ormaaj on Thu Aug 05, 2010 4:22 pm; edited 1 time in total

dufeu
l33t

Joined: 30 Aug 2002
Posts: 924
Location: US-FL-EST

Posted: Thu Aug 05, 2010 4:15 pm

Ormaaj wrote:
Currently testing whether this might be related to the recent intel i7 cpuidle driver...

I don't have any answers to this. I had what may be the AMD equivalent of cpu idle issues {BIOS setting AMD C1E}. I 'resolved' the problem by disabling the function in the BIOS.

If you're indeed having cpuidle issues, it may be a more general problem with cpus that support it. So I'm just monitoring the thread.
_________________
People whom think M$ is mediocre, don't know the half of it.

Timbers2k
Apprentice

Joined: 03 Oct 2003
Posts: 215

Posted: Thu Aug 05, 2010 5:42 pm

I'm having the exact same problem. The worst case for me is playing Wow! After a few minutes the sound starts stuttering and then everything starts to crawl. Once I exit Wow my desktop is very slow, and gkrellm is showing cpu use staying high, but nothing shows in top.

I'm also using a Core i7, and I did set up the new cpuidle driver. I'll try disabling that and see if it makes a difference.

Timbers2k
Apprentice

Joined: 03 Oct 2003
Posts: 215

Posted: Thu Aug 05, 2010 11:02 pm

It works fine if you disable the "Cpuidle Driver for Intel Processors" option under "Power management and ACPI options". Seems that this option is not quite ready for prime time.

maj
Tux's lil' helper

Joined: 22 Nov 2002
Posts: 92

Posted: Fri Aug 06, 2010 4:22 pm

See, it's the first thing I disabled when I found the problem, and the problem still exists for me. mplayer will output its "your system is too slow" message, and given it's an i7 system, I think not! Besides, video playback is handed off to the GPU!

Edit - ok, removed "ACPI Processor P-States driver" as well as "Cpuidle Driver for Intel Processors", now the system is behaving itself!

Edit 2 - Or not, still have the issue - it's less pronounced, but it still happens :(

Cffeine
n00b

Joined: 09 Aug 2010
Posts: 1

Posted: Tue Aug 10, 2010 5:00 pm

Hey all,

Just thought I'd throw in my experience. I also have an i7 and was experiencing the same problem with 2.6.35 (gentoo-sources). Any heavy load would bring my computer to a crawl (with the exception of moving the mouse, clicking menus, etc. Similar to what the OP was experiencing). My system never seemed to recover, but I'm not sure if I gave it adequate time to do so, as the system would usually be so slow I would have difficulty getting the offending process to close. I would usually have to use SysRq+REISUB to get it to reboot. I tried disabling "ACPI Processor P-States driver" as well as "Cpuidle Driver for Intel Processors" as mentioned but with no luck.

I finally got it to run properly by also disabling all the "CPU Frequency scaling --->" options. Not sure which one was causing the problem as I didn't have time over the weekend to test every governor combination.
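To keep track of which of the three suspects named in this thread are actually enabled in a given build, a small script can read them out of the kernel `.config`. This is only a sketch: the mapping from menuconfig labels to Kconfig symbols below is my assumption (check it with the `/` search in menuconfig), and the `/usr/src/linux/.config` path is the usual Gentoo convention rather than anything stated in the thread.

```python
import re

# Menuconfig labels from this thread and the Kconfig symbols they are assumed
# to correspond to. Verify against your own kernel tree.
SUSPECTS = {
    "CONFIG_INTEL_IDLE": "Cpuidle Driver for Intel Processors",
    "CONFIG_X86_ACPI_CPUFREQ": "ACPI Processor P-States driver",
    "CONFIG_CPU_FREQ": "CPU Frequency scaling",
}

_SET_LINE = re.compile(r"^(CONFIG_\w+)=(.*)$")


def parse_config(text):
    """Return {symbol: value} for every option that is set in a .config."""
    options = {}
    for line in text.splitlines():
        match = _SET_LINE.match(line.strip())
        if match:
            options[match.group(1)] = match.group(2)
    return options


def report(text):
    """Return {symbol: value}, using "n" for options that are unset or absent."""
    options = parse_config(text)
    return {symbol: options.get(symbol, "n") for symbol in SUSPECTS}


if __name__ == "__main__":
    path = "/usr/src/linux/.config"  # assumed location
    try:
        with open(path) as handle:
            config_text = handle.read()
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
    else:
        for symbol, state in report(config_text).items():
            print(f"{symbol}={state}  ({SUSPECTS[symbol]})")
```

Recording this output next to each test run makes it easier to tell later which combination was actually booted.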

My system now remains responsive as it should, but I have a couple of other strange issues that are probably not related. Namely: if I try and manually mount one of my hard disks on the command line, mount just hangs and the system goes slightly unstable -- I can log out, but if I try and shut the computer down, it just hangs. Not sure if this is a 2.6.35 kernel specific problem or if it's because I disabled CONFIG_IDE as instructed by udev. I do know I didn't have this problem with 2.6.34 but I didn't take udev's advice for disabling CONFIG_IDE until I started messing with 2.6.35.

Not trying to wander off-topic, but here is the output of dmesg | tail after trying to do a mount operation:
Code:

[<c017aaf8>] ? do_kern_mount+0x2f/0xb8
 [<c018ce3c>] ? do_mount+0x657/0x6ba
 [<c018b557>] ? copy_mount_options+0x7d/0xe7
 [<c018cf05>] ? sys_mount+0x66/0x9d
 [<c01025d0>] ? sysenter_do_call+0x12/0x26
Code: d2 89 d0 5b c3 55 57 89 c7 56 53 e8 ec 36 15 00 8b 5f 04 8b 77 08 8b 2d c4 5f 47 c0 83 c9 ff eb 12 8b 14 8d 58 68 5d c0 8b 47 14 <8b> 04 10 99 01 c3 11 d6 41 ba 10 00 00 00 89 e8 e8 a2 32 ff ff
EIP: [<c031d405>] __percpu_counter_sum+0x26/0x55 SS:ESP 0068:f5f63d20
CR2: 00000000025d3000
---[ end trace 1c099b19386537c6 ]---
note: mount[19808] exited with preempt_count 1
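
When reporting a trace like this one, it helps to pull out the symbol names. A minimal sketch of a parser follows; the regular expression and field names are my own, written against the lines shown above rather than any official format description. A leading `?` marks a frame the kernel found on the stack but could not verify, and the `EIP:` line names the function that actually faulted (`__percpu_counter_sum` here).

```python
import re

# Matches entries such as "[<c018ce3c>] ? do_mount+0x657/0x6ba".
FRAME = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\?\s+)?([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_frames(log):
    """Return one dict per symbol+offset entry found in an oops excerpt."""
    frames = []
    for match in FRAME.finditer(log):
        frames.append(
            {
                "address": int(match.group(1), 16),
                "unreliable": match.group(2) is not None,  # "?" prefix present
                "symbol": match.group(3),
                "offset": int(match.group(4), 16),
                "size": int(match.group(5), 16),
            }
        )
    return frames


if __name__ == "__main__":
    excerpt = (
        " [<c018ce3c>] ? do_mount+0x657/0x6ba\n"
        " [<c018cf05>] ? sys_mount+0x66/0x9d\n"
        "EIP: [<c031d405>] __percpu_counter_sum+0x26/0x55 SS:ESP 0068:f5f63d20\n"
    )
    for frame in parse_frames(excerpt):
        marker = "?" if frame["unreliable"] else " "
        print(f"{marker} {frame['symbol']} +{frame['offset']:#x} of {frame['size']:#x}")
```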


Not sure if I should file a bug since I don't know if I'm experienced enough to know if my problems are truly a kernel problem or something stupid I did.

Anyway, just thought I'd throw my 2.6.35 experiences out there in case it helps anyone.

Later

dufeu
l33t

Joined: 30 Aug 2002
Posts: 924
Location: US-FL-EST

Posted: Tue Aug 10, 2010 11:54 pm

Cffeine wrote:
... Namely: if I try and manually mount one of my hard disks on the command line, mount just hangs and the system goes slightly unstable -- I can log out, but if I try and shut the computer down, it just hangs. ...

You're not stupid and you may or may not be having a kernel problem.

Given your existing experiences with cpu scaling, I'd try the following. Note, however, that these suggestions may or may not help and are at your own risk:
  • Confirm that you have the latest BIOS installed for your motherboard

  • Go through your BIOS and simplify your settings. This means turn off any overclocking etc. Be especially sensitive to anything which might have an impact on either power conservation {don't permit the BIOS to do power conservation} or on PCIe bus activity.

    Personal experience suggests possible timing issues across your PCIe bus(es), which is one of the things which could cause your disk to 'hang'.

  • Replace the SATA cable to that disk. There are tools you can run to check for HDD faults, but covering them is beyond scope for the moment. Replacing the cable with a known good one is easier.

  • In the kernel, disable any disk controller driver you're not using.

  • Like the BIOS, simplify your kernel settings with an eye towards not using any power saving features etc.

  • Consider using a "Pappy's kernel seed" as your base kernel settings.

Good Luck!

Ormaaj
Guru

Joined: 28 Jan 2008
Posts: 319

Posted: Wed Aug 11, 2010 4:28 am

I can also confirm that the new intel idle driver isn't the problem - but if the solution is disabling all power saving then that would really suck as I have quite an overclocked system. P/C-states along with frequency scaling do a very nice job of keeping the temps down and saving power when idle (which is 99% of the time), so I don't really want to disable them and keep my CPU buzzing away at 4 GHz at all times.

Maybe I'll just stick to 2.6.34.x until this gets fixed upstream. There should really be no reason to disable ACPI C-states.

EDIT: I haven't noticed the problem yet with the performance governor and ACPI P-state driver enabled along with intel cpuidle. Only been testing for about 20 minutes though so we'll see.
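
For anyone repeating the ondemand versus performance test, the active governor of each core can be read from sysfs. The sketch below assumes the usual cpufreq layout under `/sys/devices/system/cpu`; the function name is hypothetical, and it simply returns nothing when no cpufreq driver is loaded.

```python
import re
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def read_governors(cpu_root=CPU_ROOT):
    """Return {"cpu0": "ondemand", ...} for every CPU exposing a governor."""
    root = Path(cpu_root)
    if not root.is_dir():
        return {}
    cpu_dirs = [p for p in root.iterdir() if re.fullmatch(r"cpu\d+", p.name)]
    governors = {}
    for cpu_dir in sorted(cpu_dirs, key=lambda p: int(p.name[3:])):
        try:
            text = (cpu_dir / "cpufreq" / "scaling_governor").read_text()
        except OSError:
            continue  # no cpufreq driver bound to this CPU
        governors[cpu_dir.name] = text.strip()
    return governors


if __name__ == "__main__":
    found = read_governors()
    if not found:
        print("no cpufreq governors visible")
    for cpu, governor in found.items():
        print(f"{cpu}: {governor}")
```

Writing a governor name such as `performance` into the same `scaling_governor` file as root switches that core over, which is a quicker test than rebuilding the kernel.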

Ormaaj
Guru

Joined: 28 Jan 2008
Posts: 319

Posted: Wed Aug 11, 2010 9:04 am

Still seems fine after switching from ondemand to performance.

Also... just judging from the description relative to the symptoms, this one looks like a very likely culprit:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=1f85f87d4f81d1e5a2d502d48316a1bdc5acac0b
http://lwn.net/Articles/386990/

dufeu
l33t

Joined: 30 Aug 2002
Posts: 924
Location: US-FL-EST

Posted: Wed Aug 11, 2010 9:54 pm

Ormaaj wrote:
Still seems fine after switching from ondemand to performance.

Also... just judging from the description relative to the symptoms, this one looks like a very likely culprit:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=1f85f87d4f81d1e5a2d502d48316a1bdc5acac0b
http://lwn.net/Articles/386990/

Thank you for the links. I wish I better understood what they mean. {sigh}

My general guidance and approach to giving suggestions is to strip everything to the minimum and then add back items/features/settings one at a time. For most people, this path works well and is very understandable to them.

You, on the other hand, obviously know a lot better what you're doing! :D

What concerns me with cpu idle, cpu frequency {over or under} and sleep modes is their potential impact on PCI(e) bus characteristics. The problem is that anything plugged into the bus may behave unpredictably if PCI(e) characteristics change. I've had several modern mobos (still being manufactured) where PCI(e) attached devices did some very strange things, like Gigabit Ethernet cards being able to run only at Fast Ethernet speeds, upper memory (above 4 Gigs RAM) errors and graphic display artifacts. I managed to trace these back to AMD cpu idle issues. And I don't even overclock my systems!

BTW - My personal take on the links you provided is that cpu idle is still under "active" development. So none of us should be surprised if specific systems behave strangely.

Take care and thanks again for the links.

:)

Ormaaj
Guru

Joined: 28 Jan 2008
Posts: 319

Posted: Thu Aug 12, 2010 12:38 am

I don't think this should affect PCIe because none of this should have an effect on board frequencies or even the input frequency to the CPU. Current Intel frequencies on i7 are quite complex. There's sort of a hierarchy of things which affect actual core frequencies: FSB (BCLK) -> multiplier -> ACPI P-States -> Intel Turbo. The latter two happen within the chip and should be pretty transparent to everything else. You really don't have any control over Turbo, and I don't really understand why Turbo isn't just another P-state, because I would think that should have the same effect. Additionally, there are the C-states which control sleeping, and even more beyond that which is controlled by that intel idle driver... I guess ACPI wasn't good enough for Intel so they had to add their own proprietary extensions.

P-states are controlled in discrete increments by the frequency scaling driver and work independently for each core. It shouldn't cause side-effects in other subsystems. You can see the true frequencies and P-states and C-states in action if you install i7z (there's no ebuild afaik)

http://code.google.com/p/i7z/

and also powertop gives some C-states info

http://www.lesswatts.org/projects/powertop/

If you notice in i7z, P-states are still doing their thing even if you use the performance governor. Enabling the ondemand governor seems to make P-states more aggressive for some reason... when your CPU is idle it really gets down to lower frequencies than without it. Disabling the P-states driver entirely causes you to always run at the highest P-state.
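
Without installing i7z or powertop, a rough view of C-state usage is also available from sysfs. This is a sketch under the assumption that the kernel exposes the cpuidle files (`current_driver`, and per-state `name`, `usage`, `time`) as current kernels do; the function names are made up for the example.

```python
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def current_driver(cpu_root=CPU_ROOT):
    """Return the active cpuidle driver name, or None if it is not exposed."""
    try:
        text = (Path(cpu_root) / "cpuidle" / "current_driver").read_text()
    except OSError:
        return None
    return text.strip()


def idle_states(cpu_dir):
    """Return [(name, times_entered, total_microseconds)] for one CPU."""
    base = Path(cpu_dir) / "cpuidle"
    if not base.is_dir():
        return []
    state_dirs = [
        p for p in base.iterdir()
        if p.name.startswith("state") and p.name[5:].isdigit()
    ]
    states = []
    for state_dir in sorted(state_dirs, key=lambda p: int(p.name[5:])):
        try:
            name = (state_dir / "name").read_text().strip()
            usage = int((state_dir / "usage").read_text())
            time_us = int((state_dir / "time").read_text())
        except (OSError, ValueError):
            continue  # skip states with missing or malformed files
        states.append((name, usage, time_us))
    return states


if __name__ == "__main__":
    print("cpuidle driver:", current_driver() or "unknown")
    for name, usage, time_us in idle_states(CPU_ROOT / "cpu0"):
        print(f"{name:12s} entered {usage} times, {time_us / 1e6:.1f} s total")
```

Comparing the driver name and per-state counters between a good and a bad kernel shows whether the idle driver really changed when a config option was toggled.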

ferrarif5
Apprentice

Joined: 06 Sep 2003
Posts: 211
Location: Manchester, UK

Posted: Thu Aug 12, 2010 7:05 am

I've got the same issue as described in the thread: lagging response from keyboard, mouse and screen refresh, with my CPU hitting 100% load. I think I'll roll back to gentoo-sources-2.6.34 for now.
_________________
Asus P6X58D-E Mobo
Intel Core i7 920
18GB Corsair DDR3

drescherjm
Advocate

Joined: 05 Jun 2004
Posts: 2790
Location: Pittsburgh, PA, USA

Posted: Wed Aug 18, 2010 4:21 am

I may be having a different issue, but instead of booting in 20s or so on my i7 3.0GHz, my kernel takes over 90s, and typing or anything else in the terminal reminds me of the early 1990s and dialing into my university with a 1200 baud modem. Heck, 1200 baud was faster.
_________________
John

My gentoo overlay
Instructions for overlay

dufeu
l33t

Joined: 30 Aug 2002
Posts: 924
Location: US-FL-EST

Posted: Sat Aug 21, 2010 12:26 pm

drescherjm wrote:
I may be having a different issue, but instead of booting in 20s or so on my i7 3.0GHz, my kernel takes over 90s, and typing or anything else in the terminal reminds me of the early 1990s and dialing into my university with a 1200 baud modem. Heck, 1200 baud was faster.

Have you reviewed this thread? Delays during boot and shutdown [SOLVED]

I realize it's AMD focused rather than Intel, but I suspect that the whole 'power saving' thing is going to be problematical for some time to come. In addition to AMD and Intel trying to gain a competitive edge in 'power savings', there are all the mobo manufacturers with their own (mis)understandings of what 'power savings' means, related BIOS issues and yada-yada-yada. I'm also seeing that the language translation issue from the mobo manufacturers is obfuscating what's happening as well.

With the plethora of available CPU 'features', possible BIOS settings and coordinated kernel settings, nothing is as simple or easy as it used to be.

drescherjm
Advocate

Joined: 05 Jun 2004
Posts: 2790
Location: Pittsburgh, PA, USA

Posted: Sat Aug 21, 2010 1:08 pm

Thanks. I will try disabling C1E. However that is only a temporary solution since I find it unacceptable to have to disable this power saving feature.

[EDIT]Well at least on my machine power management is not the cause. I disabled CxE and SpeedStep in my BIOS and the problem continued. So back to 2.6.34. [/EDIT]

the_bard
n00b

Joined: 03 Dec 2002
Posts: 60
Location: Albany, NY

Posted: Mon Aug 23, 2010 4:47 pm

I'm running across symptoms as experienced by the above: Running gentoo-sources 2.6.35-r1, I'm getting odd "soft" hanging on boot. I've got a boot splash screen configured, so leaving it in silent mode simply displays the loading bar as normal, but it hangs. The system does respond to input... switching to the verbose boot splash causes the system to unhang, whereupon it appears to hang at the next event.

Any keyboard activity seems to temporarily resolve the issue, very briefly. I've babysat the boot process, hitting keys repeatedly, until my system tries to load KDM. I don't have the patience to continue at that point.

I upgraded to 2.6.35-r1 by copying the .config from 2.6.34, then performing a `make oldconfig`. Removing the new intel idle CPU and ACPI P-States drivers has done nothing. If I remember correctly, I even disabled ACPI completely within the kernel... it made no difference. I swapped back to 2.6.34.

I'm running a Core i7-860 on an Intel DP55WG board, for what it's worth.

frostschutz
Advocate

Joined: 22 Feb 2005
Posts: 2977
Location: Germany

Posted: Mon Aug 23, 2010 7:00 pm

2.6.35 with intel cpuidle caused really bad hangs and panics in KVM for me. 2.6.35.3 without anything new (answered N to everything during make oldconfig from 2.6.34.x) does not have bad hangs, but in less than 6 hours I already had a KVM machine crashing again. There is no problem whatsoever with 2.6.34.x. So something is definitely odd with 2.6.35 for me, on an Intel i7 920 machine. I'll try disabling anything power saving related next since disabling the cpu intel idle driver already improved things a lot. Other than that I'm out of ideas as to what could be the culprit.
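
To double-check that nothing new slipped in during `make oldconfig`, the old and new `.config` files can be compared for options that are set only in the new one. A minimal sketch, with hypothetical function names and a made-up two-line sample in place of real config files:

```python
import re

_SET_LINE = re.compile(r"^(CONFIG_\w+)=(.*)$")


def enabled_options(config_text):
    """Return {symbol: value} for every option that is set in a .config."""
    options = {}
    for line in config_text.splitlines():
        match = _SET_LINE.match(line.strip())
        if match:
            options[match.group(1)] = match.group(2)
    return options


def newly_enabled(old_text, new_text):
    """Return the sorted symbols set in the new config but not in the old."""
    old = enabled_options(old_text)
    new = enabled_options(new_text)
    return sorted(symbol for symbol in new if symbol not in old)


if __name__ == "__main__":
    # Illustrative fragments only, not real configs from this thread.
    old_cfg = "CONFIG_CPU_FREQ=y\n"
    new_cfg = "CONFIG_CPU_FREQ=y\nCONFIG_INTEL_IDLE=y\n"
    print(newly_enabled(old_cfg, new_cfg))
```

An empty result supports the claim that the two kernels were built from essentially the same config, which points the finger at a code change rather than a new option.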

dufeu
l33t

Joined: 30 Aug 2002
Posts: 924
Location: US-FL-EST

Posted: Tue Aug 24, 2010 2:12 am

frostschutz wrote:
2.6.35 with intel cpuidle caused really bad hangs and panics in KVM for me. 2.6.35.3 ...

For What It's Worth: Even though I use both vmware and virtualbox {not at the same time}, I currently leave off all virtualisation support. I don't make any production use of either so it's one of the areas where I've simplified my kernel setup and accept the performance hit.

Also FWIW: I've always felt there's a dichotomy involved when combining aggressive power saving techniques {such as select cpu core idling} with the running of virtual machines. You either do one or the other but not both. i.e. "Power Saving" implies your load only occasionally needs all the hardware resources you have available. "Virtualization" implies that you are interleaving multiple workloads in order to achieve more complete utilization of resources. In my opinion these are contradictory goals and I've always geared my kernel settings to one or the other but not both.

As always, YMMV and all that. :D

frostschutz
Advocate

Joined: 22 Feb 2005
Posts: 2977
Location: Germany

Posted: Tue Aug 24, 2010 9:49 am

The VMs are idle themselves most of the time. With the host running 2.6.35, they get kernel soft lockups when they are not.

And all that does not explain why it works fine in 2.6.34 (and older), but not in 2.6.35, using basically the same config.

Edit: Disabling the Power Management section entirely seems to have further improved the situation.
Edit2: Nah, still soft lockups in KVM. Unrelated to power management after all. Must be KVM doing something strange.

zx2c4
Developer

Joined: 09 Jun 2005
Posts: 177

Posted: Sun Sep 05, 2010 2:46 am

@the_bard

Your symptoms are exactly the same as mine. Did you find a solution eventually?

optiluca
Guru

Joined: 16 Jan 2006
Posts: 545
Location: Rivergaro, Italy

Posted: Sun Sep 05, 2010 9:03 am

zx2c4 wrote:
@the_bard

Your symptoms are exactly the same as mine. Did you find a solution eventually?


I am also suffering the same issues with the 2.6.35 line of the gentoo sources. I have not experienced it once using zen sources. The only significant differences in setup I can think of are the use of the BFS/BFQ schedulers and the SLQB allocator.

BTW I am also running an i7 system, dunno if that could be the source of the issue.

Anyway all I can say is that zen sources worked for me, even though I would still like to know what the hell is going on... :?
_________________
# "Hmm, sounds like your system froze up."
# "I don't know why. It's about 80 degrees in here!"

http://www.rinkworks.com/stupid/cs_mincing.shtml

Genewb
Apprentice

Joined: 09 Jan 2007
Posts: 165

Posted: Mon Sep 06, 2010 3:12 am

There's a tracker ticket--which hasn't been answered--and another report on the mailing list was sent yesterday, which also hasn't been answered. It seems that no kernel hackers care much.
_________________
I don't give a darn about "experience", just functional copyleft software.

drescherjm
Advocate

Joined: 05 Jun 2004
Posts: 2790
Location: Pittsburgh, PA, USA

Posted: Fri Sep 10, 2010 12:29 am

Genewb

Thanks. The second thread led me to the solution. Nix found by bisecting the kernel that this problem was with the clocksource. In his case it was hpet; I was using tsc. I switched to acpi_pm and all is well.

https://forums.gentoo.org/viewtopic-t-842775-highlight-.html
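
To see which clocksource a machine is using before trying this, the current and available sources can be read from sysfs. A small sketch, assuming the standard `clocksource0` directory; the function name is hypothetical.

```python
from pathlib import Path

CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(cs_dir=CLOCKSOURCE_DIR):
    """Return (current, [available...]), or (None, []) if sysfs lacks it."""
    cs_dir = Path(cs_dir)
    try:
        current = (cs_dir / "current_clocksource").read_text().strip()
        available = (cs_dir / "available_clocksource").read_text().split()
    except OSError:
        return None, []
    return current, available


if __name__ == "__main__":
    current, available = read_clocksource()
    if current is None:
        print("clocksource information not available")
    else:
        print(f"current: {current}")
        print(f"available: {', '.join(available)}")
```

Writing `acpi_pm` into `current_clocksource` as root switches at runtime, and booting with the `clocksource=acpi_pm` kernel parameter makes the choice apply from startup.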

All times are GMT
Page 1 of 2