Gentoo Forums
EQ Overflow from nvidia Crashing X
ExecutorElassus
l33t

Joined: 11 Mar 2004
Posts: 729
Location: Stuttgart, Germany

Posted: Mon Jun 11, 2012 1:09 pm    Post subject: EQ Overflow from nvidia Crashing X

So, I've been updating recently, and chrome, xorg-server, and adobe-flash have all been re-emerged. I now have a problem where my WM develops screen corruption (random rows of pixels, sections of the screen blacking out, etc.), which gradually worsens until the X server freezes, uses 100% of a core, and I have to kill it over an ssh connection.

Now I'm using Opera, to see if it's just chrome (no issues with Opera so far), or something more general.

There's nothing in the Xorg.0.log file, so I'm not sure how to track this down. I have hardware acceleration enabled, and VDPAU disabled (the former to get rid of the blue meanies, the latter to keep embedded videos from leaking all over the screen).

Any suggestions how I might pin down this bug?

Thanks,

EE

EDIT: um, it might maybe be that a loose DVI cable was causing all the trouble, since I was having all the same problems from the BIOS screen, and - now that I gave my monitor a reach-around and screwed the cable in tightly (*cough*) - I don't seem to have any problems. So, uh, whoops.

EDIT 2: I take that last one back. I tightened the cable, and things were fine for a while, but now they are back where they were: intermittently, with no log entries and no error messages, the screen starts flickering at random (a flash every 5-10 minutes or so). I suspect that, left unchecked, the flickers will increase in frequency and severity until X hits 100% CPU usage and the system freezes. Killing X over ssh from another box seems to fix it, and I seem to be able to avert any problems by killing both my browser (in this case, also Opera) and gkrellm.

But like I said: I'm getting no error messages anywhere, so I'm just flying blind right now. Any advice on how I can figure out what's going wrong?

EDIT 3: Well, I got smart and ssh'ed into the box while it was frozen, and now I have some log messages. Here's what I see in Xorg.0.log:

Code:
[ 27631.970] (**) NVIDIA(0):     has been enabled on all display devices.)
[ 29337.163] [mi] EQ overflowing.  Additional events will be discarded until existing events are processed.
[ 29337.163]
[ 29337.163] Backtrace:
[ 29337.163] 0: /usr/bin/X (xorg_backtrace+0x36) [0x5684c6]
[ 29337.163] 1: /usr/bin/X (mieqEnqueue+0x273) [0x549563]
[ 29337.163] 2: /usr/bin/X (0x400000+0x49a8d) [0x449a8d]
[ 29337.163] 3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f61925e7000+0x60f0) [0x7f61925ed0f0]
[ 29337.163] 4: /usr/bin/X (0x400000+0x711c7) [0x4711c7]
[ 29337.163] 5: /usr/bin/X (0x400000+0x9583a) [0x49583a]
[ 29337.163] 6: /lib64/libc.so.6 (0x7f6197fcd000+0x35620) [0x7f6198002620]
[ 29337.163] 7: linux-vdso.so.1 (__vdso_gettimeofday+0x59) [0x7fff79d93929]
[ 29337.163] 8: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7f6193132000+0x883a5) [0x7f61931ba3a5]
[ 29337.163] 9: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7f6193132000+0xff713) [0x7f6193231713]
[ 29337.163] 10: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7f6193132000+0xc26a2) [0x7f61931f46a2]
[ 29337.163] 11: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7f6193132000+0x52a90c) [0x7f619365c90c]
[ 29337.163] 12: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7f6193132000+0x4f98ac) [0x7f619362b8ac]
[ 29337.163] 13: /usr/bin/X (BlockHandler+0x4f) [0x439b4f]
[ 29337.163] 14: /usr/bin/X (WaitForSomething+0x12a) [0x565aea]
[ 29337.163] 15: /usr/bin/X (0x400000+0x35a32) [0x435a32]
[ 29337.163] 16: /usr/bin/X (0x400000+0x24f0a) [0x424f0a]
[ 29337.163] 17: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x7f6197fef4bd]
[ 29337.163] 18: /usr/bin/X (0x400000+0x24aa9) [0x424aa9]
[ 29337.163]
[ 29337.163] [mi] These backtraces from mieqEnqueue may point to a culprit higher up the stack.
[ 29337.163] [mi] mieq is *NOT* the cause.  It is a victim.
[ 29337.571] [mi] EQ overflow continuing.  100 events have been dropped.
This message repeats several times, with the following added a few more backtraces down:
Code:
[ 29348.195]
[ 29348.195] [mi] These backtraces from mieqEnqueue may point to a culprit higher up the stack.
[ 29348.195] [mi] mieq is *NOT* the cause.  It is a victim.
[ 29348.720] (WW) NVIDIA(0): WAIT (2, 6, 0x8000, 0x0000a9f0, 0x00004e54)
[ 29355.720] (WW) NVIDIA(0): WAIT (1, 6, 0x8000, 0x0000a9f0, 0x00004e54)
[ 29355.720] [mi] Increasing EQ size to 1024 to prevent dropped events.
[ 29355.721] [mi] EQ processing has resumed after 25 dropped events.
[ 29355.721] [mi] This may be caused my a misbehaving driver monopolizing the server's resources.
[ 29367.458] [mi] EQ overflowing.  Additional events will be discarded until existing events are processed.


So now that I have some errors logged, what *is* the actual culprit? Is this a legitimate problem in the nvidia-driver? A misconfigured kernel?
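For anyone wanting to check their own log for the same symptoms, here is a minimal sketch that counts the relevant lines. The default path /var/log/Xorg.0.log and the function name eq_summary are my assumptions, not something from the posts in this thread; pass a different file as the first argument if your log lives elsewhere.

```shell
#!/bin/sh
# eq_summary [LOGFILE]
# Count the lines in an Xorg log that mention an EQ overflow or an
# NVIDIA WAIT warning. LOGFILE defaults to /var/log/Xorg.0.log (assumed path).
eq_summary() {
    log="${1:-/var/log/Xorg.0.log}"
    # grep -c prints the match count; it exits non-zero when the count is 0,
    # hence the "|| true" so the function also works under "set -e".
    overflows=$(grep -cF 'EQ overflow' "$log" || true)
    waits=$(grep -cF 'NVIDIA(0): WAIT' "$log" || true)
    printf 'EQ overflow lines: %s\n' "$overflows"
    printf 'NVIDIA WAIT lines: %s\n' "$waits"
}
```

Since X is frozen when this matters, it is probably easiest to run it over ssh from another box, e.g. eq_summary /var/log/Xorg.0.log.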

Thanks for the help,

EE

PS: I changed the title to reflect more accurately the problem.

bjlockie
Veteran

Joined: 18 Oct 2002
Posts: 1182
Location: Canada

Posted: Sat Sep 21, 2013 3:49 pm

If you find a version of the xorg server that doesn't have this bug then that'd be great.

There is a bug report from 2013 but it sounds like the bug has existed for much longer.

https://bugs.freedesktop.org/show_bug.cgi?id=62912

I haven't found any post with a workaround.

I have 2 monitors and it seems to happen less since I disconnected the second monitor.
_________________
AMD FX6100 CPU, 16 GiB RAM, OCZ Vertex 3 SSD
ASRock 970 Extreme3 motherboard with S/PDIF audio
Galaxy-NVidia GeForce 8800GT video card, Cyber Power CP550HG USB UPS

PaulBredbury
Watchman

Joined: 14 Jul 2005
Posts: 7310

Posted: Thu Oct 10, 2013 1:27 am

I just encountered this problem. What seems to help is switching back (from acpi_pm) to a quicker clocksource. On the kernel command line in the bootloader:

Code:
hpet=disable clocksource=tsc processor.max_cstate=1


Setting max_cstate is needed for my TSC to be stable. Here's some good clocksource info.
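To see which clocksource the kernel is actually using before touching the boot parameters, the sysfs files can be read directly. A minimal sketch, assuming the usual sysfs location /sys/devices/system/clocksource/clocksource0; the function name show_clocksource is made up for illustration.

```shell
#!/bin/sh
# show_clocksource [SYSFS_DIR]
# Print the active and the available clocksources.
# SYSFS_DIR defaults to the usual sysfs location (an assumption; adjust if
# your kernel exposes it elsewhere).
show_clocksource() {
    dir="${1:-/sys/devices/system/clocksource/clocksource0}"
    printf 'current:   %s\n' "$(cat "$dir/current_clocksource")"
    printf 'available: %s\n' "$(cat "$dir/available_clocksource")"
}

# To switch at runtime without rebooting (as root), write one of the
# available names into current_clocksource, for example:
#   echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
```

A runtime switch does not survive a reboot, so the kernel command line is still needed to make the choice permanent.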

Edit: Bah, still not fixed completely. 0001-mieq-Bump-default-queue-size-to-512.patch should help.

Edit2: I've had no problems after changing the clocksource and adding Fedora's "512 queue size" patch 8)


Last edited by PaulBredbury on Sat Oct 19, 2013 4:31 pm; edited 1 time in total

Ivion
n00b

Joined: 23 Jan 2003
Posts: 45
Location: Amsterdam

Posted: Mon Oct 14, 2013 7:23 pm

Just adding my voice to those having this problem.

I've had this problem for quite a while, and I've searched and read reports from many other people with the same problem, yet no solution was to be found anywhere. It happens to me sporadically, usually when opening a video file with mplayer/mpv or when opening a new link in Firefox. It is infrequent enough not to be a major issue: maybe once every 1 to 2 months.

The most recent log of the crash can be found here. To summarize:
Code:
(EE) [mi] EQ overflowing.  Additional events will be discarded until existing events are processed.
(EE)
(EE) Backtrace:
(EE) 0: /usr/bin/X (xorg_backtrace+0x34) [0x592c54]
(EE) 1: /usr/bin/X (mieqEnqueue+0x263) [0x573a53]
(EE) 2: /usr/bin/X (0x400000+0x4edb4) [0x44edb4]
(EE) 3: /usr/bin/X (xf86PostMotionEvent+0xce) [0x489b6e]
(EE) 4: /usr/lib/xorg/modules/input/mouse_drv.so (0x7f9fd3058000+0x761f) [0x7f9fd305f61f]
(EE) 5: /usr/lib/xorg/modules/input/mouse_drv.so (0x7f9fd3058000+0x7c58) [0x7f9fd305fc58]
(EE) 6: /usr/lib/xorg/modules/input/mouse_drv.so (0x7f9fd3058000+0x4645) [0x7f9fd305c645]
(EE) 7: /usr/bin/X (0x400000+0x79477) [0x479477]
(EE) 8: /usr/bin/X (0x400000+0xa21f7) [0x4a21f7]
(EE) 9: /lib64/libpthread.so.0 (0x7f9fd8fe6000+0x10bf0) [0x7f9fd8ff6bf0]
(EE) 10: /usr/lib/xorg/modules/drivers/nvidia_drv.so (0x7f9fd3264000+0x6388d) [0x7f9fd32c788d]
(EE) 11: /usr/lib/xorg/modules/drivers/nvidia_drv.so (0x7f9fd3264000+0xdd42a) [0x7f9fd334142a]
(EE) 12: /usr/lib/xorg/modules/drivers/nvidia_drv.so (0x7f9fd3264000+0x93c12) [0x7f9fd32f7c12]
(EE) 13: /usr/lib/xorg/modules/drivers/nvidia_drv.so (0x7f9fd3264000+0x4c153c) [0x7f9fd372553c]
(EE) 14: /usr/lib/xorg/modules/drivers/nvidia_drv.so (0x7f9fd3264000+0x49d089) [0x7f9fd3701089]
(EE) 15: /usr/bin/X (BlockHandler+0x44) [0x43e364]
(EE) 16: /usr/bin/X (WaitForSomething+0x11d) [0x59016d]
(EE) 17: /usr/bin/X (0x400000+0x39f52) [0x439f52]
(EE) 18: /usr/bin/X (0x400000+0x28dc4) [0x428dc4]
(EE) 19: /lib64/libc.so.6 (__libc_start_main+0xed) [0x7f9fd7c8760d]
(EE) 20: /usr/bin/X (0x400000+0x2910d) [0x42910d]
(EE)
(EE) [mi] These backtraces from mieqEnqueue may point to a culprit higher up the stack.
(EE) [mi] mieq is *NOT* the cause.  It is a victim.
(EE) [mi] EQ overflow continuing.  100 events have been dropped.

This repeats a few times, with the dropped events increasing, until:
Code:
[953276.863] (WW) NVIDIA(0): WAIT (1, 8, 0x8000, 0x000047ac, 0x00007504)
[953276.863] [mi] Increasing EQ size to 512 to prevent dropped events.
[953276.864] [mi] EQ processing has resumed after 643 dropped events.
[953276.864] [mi] This may be caused my a misbehaving driver monopolizing the server's resources.
[953279.864] (WW) NVIDIA(0): WAIT (2, 8, 0x8000, 0x000047ac, 0x0000ca84)
[953286.864] (WW) NVIDIA(0): WAIT (1, 8, 0x8000, 0x000047ac, 0x0000ca84)
[953289.865] (WW) NVIDIA(0): WAIT (2, 8, 0x8000, 0x000047ac, 0x0000fe6c)
[953296.865] (WW) NVIDIA(0): WAIT (1, 8, 0x8000, 0x000047ac, 0x0000fe6c)
[953299.866] (WW) NVIDIA(0): WAIT (2, 8, 0x8000, 0x000047ac, 0x0000124c)
[953306.866] (WW) NVIDIA(0): WAIT (1, 8, 0x8000, 0x000047ac, 0x0000124c)
[953309.867] (WW) NVIDIA(0): WAIT (2, 8, 0x8000, 0x000047ac, 0x0000249c)
[953316.867] (WW) NVIDIA(0): WAIT (1, 8, 0x8000, 0x000047ac, 0x0000249c)
[953319.868] (WW) NVIDIA(0): WAIT (2, 8, 0x8000, 0x000047ac, 0x0000467c)
[953326.868] (WW) NVIDIA(0): WAIT (1, 8, 0x8000, 0x000047ac, 0x0000467c)


I have saved the relevant logs from the previous two times this crash happened (here and here). Those two crashes were on a different system (Core 2 Duo -> Core i5) with a different arch (x86 -> amd64), but with the same graphics card (GeForce GTX 550 Ti).
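One way to read these backtraces is to tally the frames by module. In the logs posted in this thread, the frames that call down from BlockHandler are in nvidia_drv.so, which suggests the server was inside the nvidia driver when the input events piled up. A rough sketch; the function name frames_by_module and the default log path are my assumptions.

```shell
#!/bin/sh
# frames_by_module [LOGFILE]
# Tally Xorg backtrace frames by the binary or module they belong to.
# LOGFILE defaults to /var/log/Xorg.0.log (assumed path).
frames_by_module() {
    log="${1:-/var/log/Xorg.0.log}"
    # A frame line looks like:
    #   (EE) 10: /path/to/nvidia_drv.so (0x7f9fd3264000+0x6388d) [0x7f9fd32c788d]
    # sed keeps only the path; sort/uniq count it; awk trims uniq's padding.
    sed -n 's|^.* [0-9][0-9]*: \([^ ]*\) (.*) \[0x[0-9a-f]*\].*$|\1|p' "$log" \
        | sort | uniq -c | sort -rn | awk '{print $1, $2}'
}
```

The counts only say where the frames live, not why the driver is stuck, so this narrows the search more than it answers it.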

The post by PaulBredbury made me check which clocksource my system is using. It happens to be TSC already, so that doesn't seem to be the problem. I might try out the mieq patch for Xorg, but I'm not sure whether that will have any effect - since the dropped events exceed 512 in any case. The log even says "EQ processing has resumed after X dropped events.", yet X stays frozen and there's nothing I can do besides a hard reset.

It's a really puzzling error, that's for sure.
_________________
This post was created by millions of tiny cows jumping around on my keyboard.

PaulBredbury
Watchman

Joined: 14 Jul 2005
Posts: 7310

Posted: Tue Oct 15, 2013 1:28 am

Ivion wrote:
the dropped events exceed 512 in any case

Yes, but that's *after* it has diverted effort into moaning (logging backtraces), once 256 is hit.

Try 0001-mieq-Bump-default-queue-size-to-512.patch, because it seems to have fixed the issue for me ;)

Edit: This patch has been committed to the xorg-server 1.15 branch.

bandurvp
n00b

Joined: 04 Dec 2010
Posts: 10

Posted: Sat Jan 04, 2014 3:21 pm

I have the same problem on a mid-2010 MacBook, have had it since I got the laptop, and have been unable to find a solution. I would try the clocksource approach, but although /proc/cpuinfo shows "tsc" and "constant_tsc", /sys/bus/clocksource/devices/clocksource0/available_clocksource only shows "hpet acpi_pm". I also asked a question about this issue here. I apologize, but I only came across this thread when I also discovered the "EQ" messages in the Xorg log.

PaulBredbury
Watchman

Joined: 14 Jul 2005
Posts: 7310

Posted: Sat Jan 04, 2014 3:39 pm

Try acpi_pm (it's probably quicker) instead of hpet.

Try the patch that, ahem, I keep mentioning.

Don't use nvidia 331.20 - it messes up apps.
_________________
Improve your font rendering and ALSA sound

bandurvp
n00b

Joined: 04 Dec 2010
Posts: 10

Posted: Sat Jan 04, 2014 4:04 pm

Thanks very much for the reply PaulBredbury, I have downgraded nvidia-drivers to 319.76 and changed the clocksource to acpi_pm, which seems to have at least improved the situation: I have not yet succeeded in reproducing the problem in the usual way. I'll wait on the patch until Xorg 1.15 makes it into Portage simply because I haven't worked with custom patches before. Thanks again!
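For those who have not used custom patches on Gentoo before: Portage can apply patches dropped under /etc/portage/patches/<category>/<package>/, provided the ebuild supports user patches (I have not checked this for every xorg-server version, so treat it as an assumption). A sketch; install_user_patch is a made-up helper name, and the second argument exists only so the target root can be changed.

```shell
#!/bin/sh
# install_user_patch PATCHFILE [PATCH_ROOT]
# Copy a patch into Portage's user-patch directory for x11-base/xorg-server.
# PATCH_ROOT defaults to /etc/portage/patches.
install_user_patch() {
    patchfile="$1"
    root="${2:-/etc/portage/patches}"
    dest="$root/x11-base/xorg-server"
    mkdir -p "$dest" && cp "$patchfile" "$dest/"
}

# Afterwards, rebuild the server so the patch is picked up (as root):
#   emerge --oneshot x11-base/xorg-server
```

It is worth watching the emerge output to confirm the patch was actually applied during the prepare phase.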

byebytoad
n00b

Joined: 10 Mar 2014
Posts: 2

Posted: Mon Mar 10, 2014 10:45 am

PaulBredbury wrote:


Edit: Bah, still not fixed completely. 0001-mieq-Bump-default-queue-size-to-512.patch should help.

Edit2: I've had no problems after changing the clocksource and adding Fedora's "512 queue size" patch 8)


Hi, I encountered the same issue and wanted to ask: how can I apply the patch?

I'm on a Debian testing installation.

krinn
Advocate

Joined: 02 May 2003
Posts: 3907

Posted: Mon Mar 10, 2014 1:38 pm

http://www.cyberciti.biz/faq/appy-patch-file-using-patch-command/

But considering how hard that patch is, you can just edit the file with your favourite text editor and replace 256 with 512
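To make that concrete: as far as I recall, the constant in question is QUEUE_INITIAL_SIZE in mi/mieq.c of the xorg-server source tree, but check the file header inside the patch to be sure, since that name and path are my assumption and are not stated in this thread. A sketch of the manual edit, with bump_eq_size as a hypothetical helper:

```shell
#!/bin/sh
# bump_eq_size FILE
# Change "#define QUEUE_INITIAL_SIZE 256" to 512 in FILE.
# Assumption: FILE is mi/mieq.c from the xorg-server sources and still
# contains the 256 default; all other lines are left untouched.
bump_eq_size() {
    src="$1"
    sed 's/^\(#define QUEUE_INITIAL_SIZE[[:space:]][[:space:]]*\)256[[:space:]]*$/\1512/' \
        "$src" > "$src.tmp" && mv "$src.tmp" "$src"
}

# The equivalent with the patch file, typically run from the top of the
# source tree:
#   patch -p1 < 0001-mieq-Bump-default-queue-size-to-512.patch
```

Either way the server has to be rebuilt and reinstalled afterwards; editing the source alone changes nothing on the running system.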

byebytoad
n00b

Joined: 10 Mar 2014
Posts: 2

Posted: Mon Mar 10, 2014 5:50 pm

krinn wrote:
http://www.cyberciti.biz/faq/appy-patch-file-using-patch-command/

But considering how hard that patch is, you can just edit the file with your favourite text editor and replace 256 with 512


Thanks for your reply.
My issue is figuring out which file should be edited.
Sorry for my dumbness, but I would appreciate it if anyone could tell me which file that is.

I looked at the patch hoping to find the answer, but I couldn't find it there, and it doesn't seem to be mentioned in the discussion either.
The only thing I understood is that it should be an Xorg file, and I doubt it's xorg.conf.

All times are GMT