Gentoo Forums
Kernel 3.7.9 crashes (nouveau?)

tomtom69
Apprentice

Joined: 09 Nov 2010
Posts: 245
Location: Bavaria

Posted: Sun Feb 24, 2013 5:31 pm

Hi,

Since the last kernel update from 3.6.11 to 3.7.9 I get frequent freezes which seem to be graphics related.
It happens, for example, when using googleearth, mplayer or other applications that push more graphics bandwidth. glxgears can also trigger the lockup.
When the kernel crashes I only get stripes on the screen. Sometimes the picture freezes for a few seconds before the stripes appear. No blinking LEDs, no chance to switch terminals, no ssh; only a reset is possible.
Unfortunately there is also no output in dmesg before the crash.
Chipset is Nvidia GeForce 7025 / nForce 630a.
Reverting to kernel 3.6.11 makes the crashes disappear.
Is this a known issue?

Kernel config is here:
http://pastebin.com/cv53m85H

tom

Maitreya
Guru

Joined: 11 Jan 2006
Posts: 441

Posted: Tue Apr 02, 2013 8:49 am

Stripes appearing after a crash would suggest a memory issue on the video card. This could be heat. Did any power-saving/fan options change during the kernel upgrade, or could you monitor temperatures?

Chris W
l33t

Joined: 25 Jun 2002
Posts: 972
Location: Brisbane, Australia

Posted: Wed Apr 03, 2013 7:54 am

I've been seeing hard locks on NVidia hardware with 3.7.10 Gentoo sources (a MythTV box). Have not found the cause yet: they're intermittent and do not seem to have a consistent trigger. I'm rolling back to 3.6 and I'll see how it goes.
_________________
Cheers,
Chris W
"Common sense: The collection of prejudices acquired by age 18." -- Einstein

tomtom69
Apprentice

Joined: 09 Nov 2010
Posts: 245
Location: Bavaria

Posted: Wed Apr 03, 2013 7:03 pm

Hi,

The chipset is onboard and has no specific cooling or dedicated memory. A manual temperature check of the chipset does not show any "hot spots" so I assume temperature should not be an issue. The crashes appear reproducible at the start of a graphics intensive application (kernel 3.7), where I would not assume any temperature rise, but never with kernel 3.6.11. 3D accel or such things are not used - no gaming etc.
At the moment I hope that kernel 3.8 cures this. Until then I will stay on 3.6.11, which has worked without any problems since it went stable.

tom

TomWij
Retired Dev

Joined: 04 Jul 2012
Posts: 1553

Posted: Wed Apr 03, 2013 8:02 pm

Nouveau was refactored in the 3.7 branch; you will definitely want to avoid it and use either an earlier kernel or the latest one.

ian.au
Guru

Joined: 07 Apr 2011
Posts: 591
Location: Australia

Posted: Sat Apr 06, 2013 4:24 am

Irritating, I keep away from nVidia cards in general for my Gentoo machines.

I have an old Toshiba Satellite P-20 laptop using NV34M [GeForce FX Go5200 64M] which is totally broken under 3.7.10 (loses graphics altogether as soon as the module comes up).

Another machine here runs GT218 [GeForce 210] graphics and seems to be fine so far under 3.7.10.

I.

TomWij
Retired Dev

Joined: 04 Jul 2012
Posts: 1553

Posted: Sat Apr 06, 2013 10:00 am

ian.au wrote:
Another machine here runs GT218 [GeForce 210] graphics and seems to be fine so far under 3.7.10.


Fixed that one on my side. What does dmesg say? (Well, you can look at /var/log/messages after a reboot)

ian.au
Guru

Joined: 07 Apr 2011
Posts: 591
Location: Australia

Posted: Sat Apr 06, 2013 10:14 am

Quote:
Fixed that one on my side. What does dmesg say? (Well, you can look at /var/log/messages after a reboot)


[* Edit - correct broken card]

On the broken [GeForce FX Go5200 64M] machine, I just updated to kernel 3.8.4 and get the following dmesg (same problem as 3.7.10 with this kernel: machine boots, no screen) :evil:

Seems to just die silently. Machine is running fine, I'm ssh'd into it atm.

Code:

ln1 ~ # dmesg |grep nouveau
[    9.134815] nouveau  [  DEVICE][0000:01:00.0] BOOT0  : 0x034400a2
[    9.134821] nouveau  [  DEVICE][0000:01:00.0] Chipset: NV34 (NV34)
[    9.134824] nouveau  [  DEVICE][0000:01:00.0] Family : NV30
[    9.136306] nouveau  [   VBIOS][0000:01:00.0] checking PRAMIN for image...
[    9.201109] nouveau  [   VBIOS][0000:01:00.0] ... checksum invalid
[    9.201116] nouveau  [   VBIOS][0000:01:00.0] checking PROM for image...
[    9.201142] nouveau  [   VBIOS][0000:01:00.0] ... signature not found
[    9.201145] nouveau  [   VBIOS][0000:01:00.0] checking ACPI for image...
[    9.201149] nouveau  [   VBIOS][0000:01:00.0] ... signature not found
[    9.201152] nouveau  [   VBIOS][0000:01:00.0] checking PCIROM for image...
[    9.202204] nouveau  [   VBIOS][0000:01:00.0] ... checksum invalid
[    9.202208] nouveau  [   VBIOS][0000:01:00.0] using image from PRAMIN
[    9.202212] nouveau  [   VBIOS][0000:01:00.0] BMP version 5.27
[    9.202314] nouveau  [   VBIOS][0000:01:00.0] version 04.34.20.25.00
[    9.202735] nouveau W[  PTIMER][0000:01:00.0] unknown input clock freq
[    9.202747] nouveau  [     PFB][0000:01:00.0] RAM type: DDR1
[    9.202752] nouveau  [     PFB][0000:01:00.0] RAM size: 64 MiB
[    9.202757] nouveau  [     PFB][0000:01:00.0]    ZCOMP: 0 tags
[    9.207606] nouveau  [     DRM] VRAM: 63 MiB
[    9.207616] nouveau  [     DRM] GART: 128 MiB
[    9.207804] nouveau  [     DRM] BMP BIOS found
[    9.207809] nouveau  [     DRM] BMP version 5.39
[    9.207817] nouveau  [     DRM] Bios version 04.34.20.25
[    9.207823] nouveau  [     DRM] DCB version 2.2
[    9.207832] nouveau  [     DRM] DCB outp 00: 030002f3 00000005
[    9.207836] nouveau  [     DRM] DCB outp 01: 01010100 00009c40
[    9.207840] nouveau  [     DRM] DCB outp 02: 02020321 00000003
[    9.208000] nouveau  [     DRM] Loading NV17 power sequencing microcode
[    9.208073] nouveau  [     DRM] BIOS FP mode: 1440x900 (96210kHz pixel clock)
[    9.208700] nouveau  [     DRM] Saving VGA fonts
[    9.293442] nouveau  [     DRM] 0 available performance level(s)
[    9.293447] nouveau  [     DRM] c: core 199MHz memory 405MHz
[    9.294374] nouveau  [     DRM] MM: using M2MF for buffer copies
[    9.294384] nouveau  [     DRM] Calling LVDS script 1:
[    9.294389] nouveau  [     DRM] Calling LVDS script 6:
[    9.294392] nouveau  [     DRM] 0xADAF: Parsing digital output script table
[    9.796236] nouveau  [     DRM] Setting dpms mode 3 on TV encoder (output 2)
[    9.843390] nouveau  [     DRM] allocated 1440x900 fb: 0x9000, bo f54aa200
[    9.843492] fbcon: nouveaufb (fb0) is primary device
[    9.855582] nouveau  [     DRM] Calling LVDS script 2:
[    9.855586] nouveau  [     DRM] 0xAEF7: Parsing digital output script table
[    9.903666] nouveau  [     DRM] Calling LVDS script 5:
[    9.903670] nouveau  [     DRM] 0xAD98: Parsing digital output script table
[    9.909290] nouveau 0000:01:00.0: fb0: nouveaufb frame buffer device
[    9.909294] nouveau 0000:01:00.0: registered panic notifier
[    9.909300] [drm] Initialized nouveau 1.1.0 20120801 for 0000:01:00.0 on minor 0
[  750.560023] nouveau  [     DRM] Calling LVDS script 6:
[  750.560026] nouveau  [     DRM] 0xADAF: Parsing digital output script table
[ 2034.748077] nouveau  [     DRM] Calling LVDS script 2:
[ 2034.748080] nouveau  [     DRM] 0xAEF7: Parsing digital output script table
[ 2034.796154] nouveau  [     DRM] Calling LVDS script 5:
[ 2034.796156] nouveau  [     DRM] 0xAD98: Parsing digital output script table


Paste of the entire dmesg here http://pastebin.com/ck0PfRsH
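
In a log this long, the few lines nouveau itself flags are easy to miss. A minimal sketch for pulling them out, assuming the `W[` marker seen in the PTIMER line above denotes a warning and that errors use an analogous `E[` marker (that second part is an assumption, since no such line appears in this log). `nouveau_flagged` is a hypothetical helper name, not an existing tool:

```shell
# Hypothetical helper: keep only nouveau lines that carry a severity flag.
# Reads kernel log text on stdin, so it works on live dmesg output or a saved paste.
nouveau_flagged() {
    # 'nouveau W[' / 'nouveau E[' is assumed to mark warnings / errors;
    # '|| true' keeps "no match" from being reported as a failure.
    grep -E 'nouveau [WE]\[' || true
}

# Usage on a running system:
#   dmesg | nouveau_flagged
```

On the log above this would leave only the "unknown input clock freq" line from PTIMER.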

Cheers,

Ian


Last edited by ian.au on Sun Apr 07, 2013 1:08 am; edited 2 times in total

TomWij
Retired Dev

Joined: 04 Jul 2012
Posts: 1553

Posted: Sat Apr 06, 2013 6:56 pm

ian.au wrote:
Seems to just die silently.


That's unfortunate, did you verify it is not your DE? (/var/log/Xorg.0.log, /var/log/messages, ...)

You may ask on IRC in #nouveau on FreeNode if there are additional debugging techniques for silent problems like these.

The other option is to do a http://wiki.gentoo.org/wiki/Kernel_git-bisect if you have some time, that seems like the only way to really find the issue I think.

(See the help information of git bisect itself: you can bisect a path and therefore limit commits to the nouveau directory and save some reboots.)
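
A minimal sketch of such a path-limited bisect, assuming a git checkout of the kernel in which the tags `v3.6` (good) and `v3.7` (bad) exist and the driver lives under `drivers/gpu/drm/nouveau`. `bisect_path` is a hypothetical wrapper name; the tags should be adjusted to whatever versions were actually tested:

```shell
# Hypothetical helper: start a bisect that only considers commits touching one path.
# Usage: bisect_path <good-rev> <bad-rev> <path>
bisect_path() {
    # git bisect start takes the bad revision first, then the good one,
    # then '--' followed by the paths to restrict the search to.
    git bisect start "$2" "$1" -- "$3"
}

# In a kernel git checkout (tags and path below are assumptions):
#   bisect_path v3.6 v3.7 drivers/gpu/drm/nouveau
# Then build, boot and test each commit git checks out, and report the result:
#   git bisect good    # machine stays up
#   git bisect bad     # machine locks up
# Once git names the first bad commit, finish with:
#   git bisect reset
```

Restricting the path only helps if the regression really is inside the nouveau directory; if the bisect ends on a commit that looks unrelated, it may need to be repeated without the path.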

ian.au
Guru

Joined: 07 Apr 2011
Posts: 591
Location: Australia

Posted: Sun Apr 07, 2013 1:04 am

TomWij wrote:
ian.au wrote:
Seems to just die silently.


That's unfortunate, did you verify it is not your DE? (/var/log/Xorg.0.log, /var/log/messages, ...)


Sorry, I put the wrong card descriptor in the above post for the broken machine, corrected that now in the above post.

Well the machine is booting into console and the screen dies on module loading, whilst processing uevents during boot and never comes back for console login; so I can't think it has anything to do with the DE.

I use metalog, so there is no /var/log/messages, but the relevant logs are clean; to all intents and purposes the system thinks it's operating normally.

I'm thinking this problem is rooted at a lower level, I note that on the broken machine I have an uncachable register:

[GeForce FX Go5200 64M] x86
Code:

ln1 log # cat /proc/mtrr
reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
reg01: base=0x07ff80000 ( 2047MB), size=  512KB, count=1: uncachable
reg02: base=0x0e0000000 ( 3584MB), size=  128MB, count=1: write-combining


Whilst on the machine that runs fine: [GeForce 210] amd64
Code:

lw1 ~ # cat /proc/mtrr
reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
reg01: base=0x0c0000000 ( 3072MB), size=  256MB, count=1: write-combining


Maybe that's tripping the later kernel up on the x86 machine. Anyway, it is running fine on kernel 3.5.7, so I've reverted to that on the x86 arch for the time being.
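
The difference between the two listings can also be picked out mechanically. A minimal sketch, assuming the `/proc/mtrr` line format shown above (register name first, memory type last); `mtrr_of_type` is a hypothetical helper name, and whether the uncachable entry is actually related to the lockup remains a guess:

```shell
# Hypothetical helper: list MTRR registers of a given type (default: uncachable).
# Reads /proc/mtrr-style text on stdin and prints one register name per line.
mtrr_of_type() {
    awk -v want="${1:-uncachable}" '
        $NF == want {          # the last field holds the memory type
            sub(/:$/, "", $1)  # strip the trailing colon from "reg01:"
            print $1
        }'
}

# Usage:
#   mtrr_of_type < /proc/mtrr                   # uncachable registers
#   mtrr_of_type write-combining < /proc/mtrr   # write-combining registers
```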

I may wait for the next stable release on x86 and see how that goes on the old laptop, I just don't have time to dig through this at the moment.

Thanks for taking an interest,

Ian

tomtom69
Apprentice

Joined: 09 Nov 2010
Posts: 245
Location: Bavaria

Posted: Fri May 17, 2013 6:19 pm

Update:
The problem persists with kernel 3.8.13.
Looks like I need to drop nouveau and use the nvidia blob again.

wcg
Guru

Joined: 06 Jan 2009
Posts: 588

Posted: Sun May 19, 2013 1:54 pm

Does enabling the MTRR sanitizer by default help with the nforce630a
chipset? I have an nforce430 machine with that enabled, and it works
on that machine (no dmesg complaints about chunk size, etc). The gpu
is Ge6150SE.

In your kernel .config, I noticed that you have
Code:

CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=0


(IIRC that disables it by default at boot.)

On a 990FX chipset machine with a gt218, the mtrr sanitizer code in the kernel
simply does not work, but the gpu works fine with 3.7.10 nouveau anyway,
meaning either the BIOS mtrr settings are usable or the kernel is using
a newer alternative to mtrr available with newer cpu models (I saw a vague
note about that while reading the help for various options in make menuconfig.)

So, nouveau in 3.7.10 is working with the Ge6150SE in the nforce430 chipset
with the mtrr sanitizer enabled. I have not stressed it particularly, and it
did hang once or twice, but the first hang went away when I re-emerged the
xorg nouveau driver, and the second one seemed like something not related
to video (flaky motherboard that complains about various weird power-related
things from time to time; not reproducible).

Anyway, try enabling the MTRR sanitizer with the nforce630a and see if that
helps nouveau on newer kernels.
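
Before rebuilding, it may help to see what the current config actually sets. A minimal sketch, assuming the kernel config lives at `/usr/src/linux/.config` (the usual Gentoo symlink, but still an assumption here); `mtrr_config` is a hypothetical helper name:

```shell
# Hypothetical helper: print the MTRR-related options from a kernel config file.
# Usage: mtrr_config [path-to-config]   (default path is an assumption)
mtrr_config() {
    # Matches set options ("CONFIG_MTRR...=") as well as
    # disabled ones ("# CONFIG_MTRR... is not set").
    grep -E '^(# )?CONFIG_MTRR(_[A-Z_]+)?[= ]' "${1:-/usr/src/linux/.config}"
}

# Usage:
#   mtrr_config
#   mtrr_config /path/to/other/.config
```

As far as I know, the sanitizer can also be switched on for a single boot with the `enable_mtrr_cleanup` kernel command-line parameter, which avoids a rebuild just to test this.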

edit:
Caveat: I compile xorg with "USE=-udev" and use the xorg mouse and keyboard
drivers rather than evdev for input drivers. So I am possibly not getting uevents.
(I do not know if that matters.)
_________________
TIA

wcg
Guru

Joined: 06 Jan 2009
Posts: 588

Posted: Thu May 23, 2013 12:54 am

Actually I just started seeing this on the nforce430-Ge6150SE board with
4gb of ram and scads of swap when firefox-17.0.{5,6} starts, on kernel
3.7.10 (I know I tested this after the kernel was installed, and it worked
after re-emerging the xorg nouveau driver; apparently that was only luck)
and kernel 3.8.13. No problem with kernel 3.5.7, and no problem
with 3.7.10 on another box that has a 990fx chipset and a gt218 gpu
in a PCIe slot.

I ran memtest86, guessing it might be a memory problem (that's what firefox
does differently than most other processes that run in xfce4 in X,
allocate lots of memory when it starts), but the dimms tested with
0 errors (actually I suspected a dimm socket rather than the dimms
themselves, but neither produced any errors). Does not happen when
I start evince and load a .pdf, gimp, emacs, etc.

Hangs the kernel (cannot ssh in and kill the process, because the kernel
is no longer running, so the network is not responding). No messages
in /var/log/*, no xorg messages, etc.

I have a spare nvidia PCIe x16 card I can try in the nforce430 box and
see if it happens with a 3.7.10 or 3.8.x kernel. (The Ge6150SE onboard
video will reserve some of main memory for a video memory buffer;
a PCIe card will not, yet they will use the same video driver.)

But it could be other things than the video driver (something to do with chipset
setup in the BIOS). The system hangs before firefox ever displays the page.
_________________
TIA

_______0
Guru

Joined: 15 Oct 2012
Posts: 521

Posted: Thu May 23, 2013 11:54 am

Upgrade the card's BIOS; many seem to have shipped with questionable stability.

Nouveau fixed an important bug in 3.9; I can't recall the details, but it might be worth trying. Bear in mind that nouveau lags behind in sophistication due to nvidia's attitude. On the other hand, radeons are rock solid.

I experienced similar issues; search my posts. What I did was purposely stress the card with many things until it triggered the darn bug. It's fun because it's predictable. Without stressing it, it's all right though.

Let me know about 3.9.

tomtom69
Apprentice

Joined: 09 Nov 2010
Posts: 245
Location: Bavaria

Posted: Thu May 23, 2013 7:05 pm

Hi,

I tried the following:
(1) CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=1
(2) Disabling MTRR support completely
(3) increase the shared memory size from 64MB to 256MB
but unfortunately none of it helped. The time to crash varies, but it was always less than a few minutes. Reminds me of Win9x, which caused me to move to linux a long time ago.

A hardware defect seems unlikely because I can see the fault on 2 systems with identical chipsets. And I have a system with a different chipset (GeForce 6150SE) but a nearly identical kernel .config which works without problems using nouveau.

I'll give kernel 3.9 a try as soon as it hits stable and report the results.

tom

wcg
Guru

Joined: 06 Jan 2009
Posts: 588

Posted: Fri May 24, 2013 2:36 am

Yes, I don't think it is really hardware, or at least not the Ge 6150SE video
hardware. That hardware works fine with nouveau on kernel 3.5.7. firefox-17.0.5
and firefox-17.0.6 load on that kernel and run with no problems. That gpu-driver
combination worked fine on 3.3.8, 3.2.12, 3.1.6, and so on.

It is a new bug (new for me with 3.7.x kernels, anyway), and it is intermittent.
It seems to depend on one's memory allocation pattern before the offending
process runs whether the kernel hangs or not (reference the time firefox
worked on 3.7.10 right after I re-emerged the xorg nouveau driver and
restarted xorg).

So, I guess one would have to step up kernel versions one version at a time
until seeing the hang, then git bisect it to find the actual kernel patch that
allows it to happen. (Unless one has more money than time and can just
replace the mb and/or gpu with products that don't trigger the hang.)

If it is fixed in kernel 3.9.x, that would be cool (someone else already found
it and fixed it).
_________________
TIA

tomtom69
Apprentice

Joined: 09 Nov 2010
Posts: 245
Location: Bavaria

Posted: Fri May 24, 2013 6:21 pm

GE6150SE works for me on one machine. What does not work is GE7025.
But I do not know how to get and apply all the patches from 3.6.11 to 3.7.10 in order to see when the problem first appeared.
For filing a bug I think more information would be necessary than just "hangs with striped screen". However, the crash is so "quick and heavy" that the sysrq key does nothing and the log files are all empty.

wcg
Guru

Joined: 06 Jan 2009
Posts: 588

Posted: Sat May 25, 2013 2:05 pm

I would have said kernel.org to get the source to any kernel version,
but connecting to http://www.kernel.org/ seems to be broken for
this. One gets a front page with a filtered list instead of a directory
listing that one can navigate to a directory with base 3.x kernel source
trees and then a long list of 3.x.x patches.

(I am *not* reconfiguring my router to let in remotely initiated
connect attempts so that ftp works.)

However, connecting to https://www.kernel.org/pub/linux/kernel/v3.x/
gets one to the traditional directory listing of kernel source trees
and version patches.

edit:
These would not be managed by portage, of course, and I don't know
if genkernel will work with kernel.org source trees. Old school.
_________________
TIA

TomWij
Retired Dev

Joined: 04 Jul 2012
Posts: 1553

Posted: Fri May 31, 2013 8:44 pm

Please file this at https://bugs.gentoo.org as we can't actively track the forums to fix things, thanks in advance.

tomtom69
Apprentice

Joined: 09 Nov 2010
Posts: 245
Location: Bavaria

Posted: Mon Jun 03, 2013 6:30 pm

OK. Bug 472200 Submitted. Hopefully with all information needed.

All times are GMT