Gentoo Forums
3.3 kernel bugs & quirks thread
kernelOfTruth
Watchman

Joined: 20 Dec 2005
Posts: 5610
Location: Vienna, Austria; Germany; hello world :)

Posted: Sat Feb 25, 2012 12:43 pm    Post subject: 3.3 kernel bugs & quirks thread

Hi guys,

let's collect bugs & issues with the 3.3-rc* and 3.3.* kernels here


ok, here's the first (severe) one I encountered:

eSATA hotplug only seems to work partially

meaning that after you've detached a hard drive and attached another one, it doesn't get rescanned and/or recognized

so the port is basically dead until you restart your box



Quote:
[ 2921.164423] ata8: exception Emask 0x10 SAct 0x0 SErr 0x990000 action 0xe frozen
[ 2921.164428] ata8: irq_stat 0x00400000, PHY RDY changed
[ 2921.164433] ata8: SError: { PHYRdyChg 10B8B Dispar LinkSeq }
[ 2921.164440] ata8: hard resetting link
[ 2921.885705] ata8: SATA link down (SStatus 0 SControl 300)
[ 2926.877081] ata8: hard resetting link
[ 2927.181561] ata8: SATA link down (SStatus 0 SControl 300)
[ 2927.181575] ata8: limiting SATA link speed to 1.5 Gbps
[ 2932.173000] ata8: hard resetting link
[ 2932.477411] ata8: SATA link down (SStatus 0 SControl 310)
[ 2932.477424] ata8.00: disabled
[ 2932.477440] ata8: EH complete
[ 2932.477446] ata8.00: detaching (SCSI 7:0:0:0)
[ 2932.477784] sd 7:0:0:0: [sde] Synchronizing SCSI cache
[ 2932.477814] sd 7:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 2932.477820] sd 7:0:0:0: [sde] Stopping disk
[ 2932.477828] sd 7:0:0:0: [sde] START_STOP FAILED
[ 2932.477831] sd 7:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK


usually after that it should detect the change in state and at least attempt to detect a new hard drive (which wasn't always successful in the past)

but now this is a major showstopper


how would I normally trigger a manual rescan, or disable / re-enable the port?

if that works, then only the automatic triggering is broken

otherwise it's really broken
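
for reference: a manual rescan of a SCSI host (libata ports show up as SCSI hosts) is normally requested through sysfs by writing "- - -" (wildcards for channel, target and LUN) to the host's scan file. A minimal sketch, assuming the eSATA port is host7 (taken from "SCSI 7:0:0:0" in the log above); the function name and its optional sysfs-root argument are made up for illustration, and whether a rescan actually revives a port in this state is untested:

```shell
#!/bin/sh
# Sketch: ask the kernel to rescan one SCSI host.
# rescan_host and its second argument are hypothetical helpers, not an existing tool.
rescan_host() {
    host="$1"                   # e.g. host7 for "SCSI 7:0:0:0"
    sysfs_root="${2:-/sys}"     # overridable so the function can be tried on a fake tree
    scan_file="$sysfs_root/class/scsi_host/$host/scan"

    if [ ! -w "$scan_file" ]; then
        echo "cannot write $scan_file (wrong host name, or not root?)" >&2
        return 1
    fi
    # "- - -" means: all channels, all targets, all LUNs
    printf '%s\n' '- - -' > "$scan_file"
}
```

as root this would be called as `rescan_host host7`, with dmesg open in another terminal to see whether the link comes back up.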


has anyone else encountered this?

if yes: with which eSATA controller?

for me it's a JMicron:

Quote:
03:00.0 SATA controller: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03)
Subsystem: Acer Incorporated [ALI] Device 036b
Kernel driver in use: ahci
03:00.1 IDE interface: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03)
Subsystem: Acer Incorporated [ALI] Device 036b
Kernel driver in use: pata_jmicron




many thanks in advance ! :)
_________________
Unofficial minimal livecd x86/amd64 w/reiser4+truecrypt (by Neo2)
2.6.37.2_plus_v1: BFS, CFS,THP,compaction, zcache or TOI
Hardcore Linux user since 2004 :D

genstorm
Advocate

Joined: 05 Apr 2007
Posts: 2349
Location: Austria

Posted: Sat Feb 25, 2012 12:49 pm

Probably related? https://lkml.org/lkml/2012/1/14/4

As always, I'm using 3.3-drm and just found this in dmesg:
Code:
[ 9998.432442] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 9998.433008] render error detected, EIR: 0x00000010
[ 9998.433008]   IPEIR: 0x00000000
[ 9998.433008]   IPEHR: 0x00000000
[ 9998.433008]   INSTDONE: 0xfffffffe
[ 9998.433008]   INSTPS: 0x0001e000
[ 9998.433008]   INSTDONE1: 0xffffffff
[ 9998.433008]   ACTHD: 0x0ec0ba58
[ 9998.433008] page table error
[ 9998.433008]   PGTBL_ER: 0x00000001
[ 9998.433008] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking

_________________
backend.cpp:92:2: warning: #warning TODO - this error message is about as useful as a cooling unit in the arctic

genstorm

Posted: Sun Mar 04, 2012 3:18 pm

Gave all of 3.3_rc6 a shot and it seems to be quite solid. I still need to patch it (as each version since 2.6.33) for correct external display resolution, but oh well. ;)

skunk
Guru

Joined: 28 May 2003
Posts: 571
Location: granada, spain

Posted: Sun Mar 04, 2012 6:16 pm

genstorm wrote:
Gave all of 3.3_rc6 a shot and it seems to be quite solid. I still need to patch it (as each version since 2.6.33) for correct external display resolution, but oh well. ;)

which patch do you apply?
i'm wondering because i just get a subset of resolution choices for my lcd tv when switching from the catalyst to the radeon driver...

genstorm

Posted: Sun Mar 04, 2012 6:27 pm

Mine is only Intel drm/i915 and lid detection related, so that won't help you...

SlashBeast
Moderator

Joined: 23 May 2006
Posts: 2799

Posted: Sun Mar 04, 2012 7:21 pm

I happen to have the eSATA issue with 'SATA controller: Intel Corporation 5 Series/3400 Series Chipset 6 port SATA AHCI Controller (rev 06)', but I saw some patches on linux-ide supposed to fix it. http://www.spinics.net/lists/linux-ide/msg42853.html fwiw I have this issue with regular sata dvdrw connected via sata<>esata cable.
_________________
BitBucket -- better-initramfs to address many usecases and linux's limitations.

kernelOfTruth

Posted: Mon Mar 05, 2012 10:04 am

genstorm wrote:
Probably related? https://lkml.org/lkml/2012/1/14/4

[snip]


I'll try it out - thanks ! :)

SlashBeast wrote:
I happen to have the eSATA issue with 'SATA controller: Intel Corporation 5 Series/3400 Series Chipset 6 port SATA AHCI Controller (rev 06)', but I saw some patches on linux-ide supposed to fix it. http://www.spinics.net/lists/linux-ide/msg42853.html fwiw I have this issue with regular sata dvdrw connected via sata<>esata cable.


that thread also seems to refer to the same patch that genstorm referenced

will try that :)


3.2 has bad latencies: chromium, firefox and other apps can't do anything while new data is being copied (via rsync, etc.)

afaik 3.3 didn't have that to such an extreme degree


thanks !

SlashBeast

Posted: Mon Mar 05, 2012 3:55 pm

That's because 3.3 got smart writeback code, so you can, for example, ^C a dd at any time and it will stop instantly.

genstorm

Posted: Mon Mar 12, 2012 7:09 am

So, what about your bug? The final release is almost there. ;)

kernelOfTruth

Posted: Mon Mar 12, 2012 9:54 am

SlashBeast wrote:
That's because 3.3 got smart writeback code, so you can, for example, ^C a dd at any time and it will stop instantly.


yeah, it's really more noticeable than before :)


genstorm wrote:
So, what about your bug? The final release is almost there. ;)


it's fortunately fixed :)


thanks for the interest ;)


have you tried asking Dave Airlie, Matthew Wilcox or one of the other developers whether the patch for the screen resolution can be included in the kernel?

I'm sure it's annoying having to patch it into the kernel each time

SlashBeast

Posted: Mon Mar 12, 2012 10:25 am

rc7 is working all right and s2disk (uswsusp) finally works; on all earlier rcs I had a freeze at 'doing snapshot'. I think it could be related to the SATA issue as well.

kernelOfTruth

Posted: Mon Mar 12, 2012 12:04 pm

SlashBeast wrote:
rc7 is working all right and s2disk (uswsusp) finally works; on all earlier rcs I had a freeze at 'doing snapshot'. I think it could be related to the SATA issue as well.


for me it's a whole different situation:

from time to time it turns off the hard drives but doesn't turn off the computer, and this happens only every 2nd to 10th time (from experience so far)

with pm-suspend it also doesn't switch to real suspend and instead keeps running with the hard drives turned off

seems there are some power-management issues with the 3.3 release ...

abulak
n00b

Joined: 12 Dec 2008
Posts: 29

Posted: Fri Jul 06, 2012 2:18 pm

Quote:
from time to time it turns off the hard drives but doesn't turn off the computer, and this happens only every 2nd to 10th time (from experience so far)

with pm-suspend it also doesn't switch to real suspend and instead keeps running with the hard drives turned off

seems there are some power-management issues with the 3.3 release ...


I'm experiencing the same problems with kernel 3.3-gentoo

Have You found a solution?

kernelOfTruth

Posted: Fri Jul 06, 2012 5:02 pm

abulak wrote:
Quote:
from time to time it turns off the hard drives but doesn't turn off the computer, and this happens only every 2nd to 10th time (from experience so far)

with pm-suspend it also doesn't switch to real suspend and instead keeps running with the hard drives turned off

seems there are some power-management issues with the 3.3 release ...


I'm experiencing the same problems with kernel 3.3-gentoo

Have You found a solution?


sorry, unfortunately not

I've moved on to 3.4

from what I saw there are some fairly big power-management changes in 3.3 and 3.4 (probably also 3.5), which might cause issues on hardware that hasn't seen enough testing


the kernel I'm currently running uses uksm and doesn't always come back up, so I'm not using suspend-to-ram at all right now; besides, I'm using the box quite heavily throughout the day, so suspend-to-ram isn't needed anyway



for troubleshooting:

- make sure you're using the latest release (3.3.8 based)
- check whether enabling/disabling opengl or the composited desktop makes a difference
- some apps also affect whether it succeeds or not
- try with and without pulseaudio


hope that helps

Ant P.
Advocate

Joined: 18 Apr 2009
Posts: 2204
Location: UK

Posted: Fri Jul 06, 2012 5:29 pm

I've got a bizarre one that started around 3.3.6 and still happens on 3.4.4: If I boot normally, there's a 60 second hang. If I boot using bootchart2's cmdline, it works fine.

dmesg goes like this, and lines further out don't tell me much either:
Code:
[    0.295404] TCP: reno registered
[    0.295407] UDP hash table entries: 4096 (order: 5, 131072 bytes)
[    0.295456] UDP-Lite hash table entries: 4096 (order: 5, 131072 bytes)
[    0.295546] NET: Registered protocol family 1
[   63.573870] INFO: rcu_preempt detected stalls on CPUs/tasks: { 2 3} (detected by 1, t=18989 jiffies)
[   63.573972] INFO: Stall ended before state dump start
[   63.959686] pci 0000:01:00.0: Boot video device
[   63.959699] PCI: CLS 64 bytes, default 64

Anyone know if there's a thing I can turn on in kconfig to get this to print out more information?

kernelOfTruth

Posted: Fri Jul 06, 2012 5:44 pm

Ant P. wrote:
I've got a bizarre one that started around 3.3.6 and still happens on 3.4.4: If I boot normally, there's a 60 second hang. If I boot using bootchart2's cmdline, it works fine.

dmesg goes like this, and lines further out don't tell me much either:
Code:
[    0.295404] TCP: reno registered
[    0.295407] UDP hash table entries: 4096 (order: 5, 131072 bytes)
[    0.295456] UDP-Lite hash table entries: 4096 (order: 5, 131072 bytes)
[    0.295546] NET: Registered protocol family 1
[   63.573870] INFO: rcu_preempt detected stalls on CPUs/tasks: { 2 3} (detected by 1, t=18989 jiffies)
[   63.573972] INFO: Stall ended before state dump start
[   63.959686] pci 0000:01:00.0: Boot video device
[   63.959699] PCI: CLS 64 bytes, default 64

Anyone know if there's a thing I can turn on in kconfig to get this to print out more information?


yes, you could enable all of the rcu debug functionality to make it more verbose (afaik in the kernel hacking section)

besides that enable frame-pointers and other stuff in the hacking section
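
to see which of these are already switched on, the config of the running tree can be grepped. A minimal sketch; the option names are what I'd expect on 3.3/3.4-era kernels (RCU_CPU_STALL_INFO only from 3.4 on, as far as I know), so treat the list as an assumption and check it against your own Kconfig. check_rcu_debug is a made-up helper name:

```shell
#!/bin/sh
# Sketch: report which RCU/debug options are enabled in a kernel config file.
# check_rcu_debug is a hypothetical helper; the option list is an assumption
# based on 3.3/3.4-era Kconfig names and may differ on other versions.
check_rcu_debug() {
    config="${1:-/usr/src/linux/.config}"
    if [ ! -r "$config" ]; then
        echo "cannot read $config" >&2
        return 1
    fi
    for opt in RCU_CPU_STALL_VERBOSE RCU_CPU_STALL_INFO RCU_TRACE PROVE_RCU FRAME_POINTER; do
        if grep -q "^CONFIG_${opt}=y" "$config"; then
            echo "CONFIG_${opt}: enabled"
        else
            echo "CONFIG_${opt}: not set"
        fi
    done
}
```

`check_rcu_debug` without arguments looks at /usr/src/linux/.config; pass another path to check a different tree.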


hope that helps

abulak

Posted: Fri Jul 06, 2012 5:57 pm

Quote:
I've moved on to 3.4


So did I, but I still can't suspend... the last kernel with working suspend was 3.2.12 (I haven't tried 3.2.21). On the other hand, hibernation (in-kernel and tuxonice) works as it should.

I checked
Code:
/var/log/pm-suspend.log
for both kernels and they are essentially the same
(only the last lines are different: instead of "%date: performing suspend" in 3.2 there is "%date: Finish" in 3.4)

I don't think this is an opengl/composite issue. I tried with X running and without, with the nvidia module loaded and unloaded -- the result is always the same.

suspend starts ok, but in the end, after spinning down the hard drive and switching off the sound card and wifi, the screen remains lit and the fan keeps going...

This looks like a deep kernel change in how it communicates with the BIOS on suspend... but I'm not very knowledgeable on that.

Where did you check the 3.2/3.3/3.4 kernel diffs?

kernelOfTruth

Posted: Fri Jul 06, 2012 6:14 pm

occasionally I browse lkml.org and look for changes that might interest or affect me (including power management)

for final releases:

kernelnewbies.org/

and

http://www.h-online.com/open/

(kernel log)

Ant P.

Posted: Sat Jul 07, 2012 12:45 pm

kernelOfTruth wrote:
yes, you could enable all of the rcu debug functionality to make it more verbose (afaik in the kernel hacking section)

besides that enable frame-pointers and other stuff in the hacking section

hope that helps

Did all of that, it doesn't tell me anything useful :/
Code:
[    0.295228] NET: Registered protocol family 1
[   65.971634] INFO: rcu_preempt detected stalls on CPUs/tasks:
[   65.971684]    1: (3 GPs behind) idle=1c6/0/0 drain=0 . timer=-1
[   65.971703]    2: (3 GPs behind) idle=09a/0/0 drain=0 . timer=-1
[   65.971723]    3: (3 GPs behind) idle=088/0/0 drain=0 . timer=-1
[   65.971741]    (detected by 0, t=19708 jiffies)
[   65.971772] INFO: Stall ended before state dump start
[   66.355707] pci 0000:01:00.0: Boot video device

Given that I've got a workaround I think I'm just going to ignore it until 3.5...

kernelOfTruth

Posted: Sat Jul 07, 2012 4:02 pm

Ant P. wrote:
kernelOfTruth wrote:
yes, you could enable all of the rcu debug functionality to make it more verbose (afaik in the kernel hacking section)

besides that enable frame-pointers and other stuff in the hacking section

hope that helps

Did all of that, it doesn't tell me anything useful :/
Code:
[    0.295228] NET: Registered protocol family 1
[   65.971634] INFO: rcu_preempt detected stalls on CPUs/tasks:
[   65.971684]    1: (3 GPs behind) idle=1c6/0/0 drain=0 . timer=-1
[   65.971703]    2: (3 GPs behind) idle=09a/0/0 drain=0 . timer=-1
[   65.971723]    3: (3 GPs behind) idle=088/0/0 drain=0 . timer=-1
[   65.971741]    (detected by 0, t=19708 jiffies)
[   65.971772] INFO: Stall ended before state dump start
[   66.355707] pci 0000:01:00.0: Boot video device

Given that I've got a workaround I think I'm just going to ignore it until 3.5...


I'm not sure whether the following patch applies cleanly to your tree (it does on 3.4.2 at least), but you could give it a try:


rcu: endless stalls

Code:
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 0da7b88..6462056d6 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -818,10 +818,25 @@ static void print_cpu_stall(struct rcu_state *rsp)
    set_need_resched();  /* kick ourselves to get things going. */
 }
 
+/**
+ * rcu_cpu_stall_reset - prevent further stall warnings in current grace period
+ *
+ * Set the stall-warning timeout way off into the future, thus preventing
+ * any RCU CPU stall-warning messages from appearing in the current set of
+ * RCU grace periods.
+ *
+ * The caller must disable hard irqs.
+ */
+void rcu_cpu_stall_reset(void)
+{
+   rcu_sched_state.jiffies_stall = jiffies + ULONG_MAX / 2;
+   rcu_bh_state.jiffies_stall = jiffies + ULONG_MAX / 2;
+   rcu_preempt_stall_reset();
+}
+
 static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
 {
-   unsigned long j;
-   unsigned long js;
+   unsigned long j, js, flags;
    struct rcu_node *rnp;
 
    if (rcu_cpu_stall_suppress)
@@ -832,13 +847,23 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
    if ((ACCESS_ONCE(rnp->qsmask) & rdp->grpmask) && ULONG_CMP_GE(j, js)) {
 
       /* We haven't checked in, so go dump stack. */
+      rcu_cpu_stall_suppress = 1;
       print_cpu_stall(rsp);
+      local_irq_save(flags);
+      rcu_cpu_stall_reset();
+      local_irq_restore(flags);
+      rcu_cpu_stall_suppress = 0;
 
    } else if (rcu_gp_in_progress(rsp) &&
          ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY)) {
 
       /* They had a few time units to dump stack, so complain. */
+      rcu_cpu_stall_suppress = 1;
       print_other_cpu_stall(rsp);
+      local_irq_save(flags);
+      rcu_cpu_stall_reset();
+      local_irq_restore(flags);
+      rcu_cpu_stall_suppress = 0;
    }
 }
 
@@ -848,22 +873,6 @@ static int rcu_panic(struct notifier_block *this, unsigned long ev, void *ptr)
    return NOTIFY_DONE;
 }
 
-/**
- * rcu_cpu_stall_reset - prevent further stall warnings in current grace period
- *
- * Set the stall-warning timeout way off into the future, thus preventing
- * any RCU CPU stall-warning messages from appearing in the current set of
- * RCU grace periods.
- *
- * The caller must disable hard irqs.
- */
-void rcu_cpu_stall_reset(void)
-{
-   rcu_sched_state.jiffies_stall = jiffies + ULONG_MAX / 2;
-   rcu_bh_state.jiffies_stall = jiffies + ULONG_MAX / 2;
-   rcu_preempt_stall_reset();
-}
-
 static struct notifier_block rcu_panic_block = {
    .notifier_call = rcu_panic,
 };
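
whether the diff applies to a given tree can be checked without touching any files. A minimal sketch using patch's dry-run mode; patch_applies is a made-up helper name, and the tree and patch file paths are whatever you use locally:

```shell
#!/bin/sh
# Sketch: test whether a patch applies to a source tree without modifying it.
# patch_applies is a hypothetical helper name.
patch_applies() {
    tree="$1"        # top of the kernel tree, e.g. /usr/src/linux
    patchfile="$2"   # the diff saved to a file
    # -p1 strips the a/ and b/ prefixes, --dry-run changes nothing,
    # -f avoids interactive questions, -s stays quiet unless there is an error
    patch -p1 --dry-run -f -s -d "$tree" < "$patchfile"
}
```

a zero exit status means all hunks would apply. Forum software tends to turn tabs into spaces, so a copy-pasted diff may additionally need patch's -l (--ignore-whitespace) option.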

 



it's been some time since I last went through lkml

abulak

Posted: Sun Nov 11, 2012 3:33 pm    Post subject: Problems

In case anyone has problems with sleep/suspend or hibernate not powering off the laptop: in my case it was related to the kernel trying to perform an asynchronous suspend of the drive (see this commit).

My solution is to

Code:
echo 0 > /sys/power/pm_async


before every suspend/hibernate (via /etc/pm/sleep.d hooks).

You may also try to revert the patch.
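
The hook mentioned above could look roughly like this. It is only a sketch: the file name (for example /etc/pm/sleep.d/00-pm-async) and the function name are made up, and the second argument exists purely so the function can be tried against a scratch file instead of the real sysfs knob. pm-utils passes the action (suspend, hibernate, resume or thaw) as the first argument.

```shell
#!/bin/sh
# Sketch of a pm-utils hook that turns off asynchronous device suspend.
# The file must be executable; name and function are hypothetical.
disable_pm_async() {
    action="$1"
    knob="${2:-/sys/power/pm_async}"   # second argument only exists for testing
    case "$action" in
        suspend|hibernate)
            # 0 = suspend devices one after another instead of asynchronously
            [ -w "$knob" ] && echo 0 > "$knob"
            ;;
    esac
    return 0   # never block the suspend because of this hook
}

disable_pm_async "${1:-}"
```

Writing 0 on every suspend/hibernate is harmless if the value is already 0, so the hook does not bother to restore the old setting on resume.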