Gentoo Forums
Kernel Bug net/core/skbuff.c:127
Jarli
n00b

Joined: 22 Mar 2012
Posts: 8

Posted: Thu Jan 31, 2013 4:29 pm    Post subject: Kernel Bug net/core/skbuff.c:127

Quote:

[870499.556664] skbuff: skb_over_panic: text:ffffffff816ba890 len:1568 put:289 head:ffff880f76ca7000 data:ffff880f76ca7160 tail:0x780 end:0x6c0 dev:<NULL>
[870499.556685] ------------[ cut here ]------------
[870499.556687] kernel BUG at net/core/skbuff.c:127!
[870499.556690] invalid opcode: 0000 [#1] SMP
[870499.556692] CPU 0
[870499.556693] Modules linked in: microcode
[870499.556698]
[870499.556700] Pid: 17274, comm: openvpn Not tainted 3.5.7-gentoo #2 empty empty/S8230
[870499.556703] RIP: e030:[<ffffffff816315f1>] [<ffffffff816315f1>] skb_put+0x91/0xa0
[870499.556711] RSP: e02b:ffff88021a1afb98 EFLAGS: 00010296
[870499.556713] RAX: 000000000000008a RBX: 0000000000000121 RCX: 0000000000000044
[870499.556715] RDX: 00000000000000c6 RSI: 0000000000000007 RDI: ffff88021a1a0258
[870499.556716] RBP: ffff88021a1afbb8 R08: 000000000000ffff R09: 0000000000000a4b
[870499.556717] R10: 0000000000000000 R11: ffffffff81da4278 R12: 0000000000000607
[870499.556719] R13: ffff880055b69000 R14: ffff88006f832880 R15: 000000000000bd00
[870499.556728] FS: 00007fec19fee700(0000) GS:ffff880faa800000(0000) knlGS:0000000000000000
[870499.556731] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[870499.556732] CR2: 00007f4010395158 CR3: 000000077b509000 CR4: 0000000000000660
[870499.556734] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[870499.556736] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[870499.556738] Process openvpn (pid: 17274, threadinfo ffff88021a1ae000, task ffff88012e06d580)
[870499.556739] Stack:
[870499.556740] ffff880f76ca7160 0000000000000780 00000000000006c0 ffffffff819d4e1b
[870499.556743] ffff88021a1afc78 ffffffff816ba890 ffffffff8152512a ffff880fa6402900
[870499.556745] ffffffff8175461a ffffffff81754178 00000000000004ff 01ffffff81631a99
[870499.556748] Call Trace:
[870499.556753] [<ffffffff816ba890>] tcp_sendmsg+0x330/0xe10
[870499.556757] [<ffffffff8152512a>] ? tun_do_read.clone.24+0x1da/0x450
[870499.556761] [<ffffffff8175461a>] ? error_exit+0x2a/0x60
[870499.556764] [<ffffffff81754178>] ? retint_restore_args+0x5/0x6
[870499.556767] [<ffffffff816deb7f>] inet_sendmsg+0x5f/0xa0
[870499.556769] [<ffffffff81631c02>] ? __kfree_skb+0x42/0xa0
[870499.556772] [<ffffffff8162897e>] sock_sendmsg+0xee/0x120
[870499.556774] [<ffffffff81525488>] ? tun_chr_aio_read+0x78/0xc0
[870499.556779] [<ffffffff811309d2>] ? do_sync_read+0xe2/0x120
[870499.556781] [<ffffffff81628a24>] ? sockfd_lookup_light+0x24/0x80
[870499.556784] [<ffffffff8162b134>] sys_sendto+0x104/0x140
[870499.556787] [<ffffffff8103b1f5>] ? pvclock_clocksource_read+0x55/0xd0
[870499.556790] [<ffffffff81009a40>] ? xen_clocksource_read+0x20/0x30
[870499.556793] [<ffffffff81009bc9>] ? xen_clocksource_get_cycles+0x9/0x10
[870499.556796] [<ffffffff810854b2>] ? getnstimeofday+0x52/0xd0
[870499.556800] [<ffffffff81754939>] system_call_fastpath+0x16/0x1b
[870499.556802] Code: 00 00 48 89 44 24 10 8b 87 c0 00 00 00 48 89 44 24 08 48 8b 87 d0 00 00 00 48 c7 c7 d0 90 9d 81 48 89 04 24 31 c0 e8 84 fe 11 00 <0f> 0b 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83
[870499.556827] RIP [<ffffffff816315f1>] skb_put+0x91/0xa0
[870499.556830] RSP <ffff88021a1afb98>
[870499.556902] ---[ end trace 0ab373d3349c9b83 ]---
[892070.329129] ip (20187) used greatest stack depth: 744 bytes left



This is a bug that I hit this morning on my cluster. I am not the most skilled Gentoo person around, and I am still learning the system. The system was set up by a vendor, and we are trying to find out what caused this and how to fix it.

Any help would be greatly appreciated.

There are numerous reports findable through Google that are similar but not identical.

http://www.serverphorums.com/read.php?12,341550

The one above is plastered just about everywhere I can find.
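
The numbers on the skb_over_panic line in the quote can be checked by hand. As far as I read net/core/skbuff.c in the 3.x series, on 64-bit builds tail and end are printed as offsets from head, and skb_put() has already added the requested bytes to len and tail by the time the message is printed. The text: address is the caller of skb_put(), which the Call Trace resolves to tcp_sendmsg+0x330. The Python below is only a sketch of that arithmetic; the regular expression and the function name are my own illustration, not part of any kernel tooling.

```python
import re

# Fields of the skb_over_panic line as they appear in the quoted log.
# The pattern is illustrative and only covers this message layout.
PANIC_RE = re.compile(
    r"len:(?P<len>\d+) put:(?P<put>\d+) "
    r"head:(?P<head>[0-9a-f]+) data:(?P<data>[0-9a-f]+) "
    r"tail:(?P<tail>0x[0-9a-f]+) end:(?P<end>0x[0-9a-f]+)"
)

def decode_skb_over_panic(line):
    """Derive buffer offsets from one skb_over_panic log line."""
    m = PANIC_RE.search(line)
    if m is None:
        raise ValueError("not an skb_over_panic line")
    length = int(m["len"])
    put = int(m["put"])
    head = int(m["head"], 16)
    data = int(m["data"], 16)
    tail = int(m["tail"], 16)   # offset from head, already includes put
    end = int(m["end"], 16)     # offset from head
    return {
        "headroom": data - head,                 # bytes between head and data
        "overrun": tail - end,                   # bytes past the end of the buffer
        "room_before_put": end - (tail - put),   # space that was actually left
        "len_matches_tail": tail - (data - head) == length,
    }

LINE = ("skbuff: skb_over_panic: text:ffffffff816ba890 len:1568 put:289 "
        "head:ffff880f76ca7000 data:ffff880f76ca7160 tail:0x780 end:0x6c0 dev:<NULL>")
info = decode_skb_over_panic(LINE)
print(info)
```

Read this way, the trace says that a 289-byte skb_put() was attempted when fewer bytes than that remained in the buffer, which is the condition that triggers the BUG at skbuff.c:127.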
ce110ut
Apprentice

Joined: 27 Sep 2002
Posts: 199

Posted: Thu Jan 31, 2013 4:44 pm

Hello Jarli,

Can you share the following information:

- what kernel version are you using?

The following will list the kernels installed on the host in question:
Code:
equery list 'sys-kernel/*'


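The following will show which source tree (if multiple kernels are installed) is selected, i.e. where the /usr/src/linux symlink points; `uname -r` reports the kernel that is actually running: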
Code:
eselect kernel list


- how often does this happen?

- any diagnostics you can share? Is the host under unusual load when this happens?
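
To confirm that the selected source tree corresponds to the kernel that is booted, the two names can be compared. The sketch below assumes the usual Gentoo naming, where the tree for release `3.5.7-gentoo` is called `linux-3.5.7-gentoo`; the function name is hypothetical, and a custom local version suffix would break the simple comparison.

```python
import os
import platform

def tree_matches_running(release, symlink_target):
    """True if a source tree name such as 'linux-3.5.7-gentoo' matches a
    kernel release string such as '3.5.7-gentoo'."""
    name = os.path.basename(symlink_target.rstrip("/"))
    return name == "linux-" + release

# Values taken from the posts in this thread.
print(tree_matches_running("3.5.7-gentoo", "linux-3.5.7-gentoo"))
print(tree_matches_running("3.5.7-gentoo", "/usr/src/linux-3.5.2-gentoo"))

# On a live host (path assumed to follow the usual Gentoo layout):
# tree_matches_running(platform.release(), os.path.realpath("/usr/src/linux"))
```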
Jarli
n00b

Joined: 22 Mar 2012
Posts: 8

Posted: Thu Jan 31, 2013 4:47 pm

cannon1 ~ # equery list sys-kernel/*
* Searching for * in sys-kernel ...
[I--] [??] sys-kernel/gentoo-sources-3.2.1-r2:3.2.1-r2
[IP-] [ ] sys-kernel/gentoo-sources-3.3.8:3.3.8
[IP-] [ ] sys-kernel/gentoo-sources-3.4.9:3.4.9
[I--] [??] sys-kernel/gentoo-sources-3.5.2:3.5.2
[IP-] [ ] sys-kernel/gentoo-sources-3.5.7:3.5.7
[I--] [??] sys-kernel/git-sources-3.3_rc1:3.3_rc1
[IP-] [ ] sys-kernel/linux-headers-3.4:0

cannon1 ~ # eselect kernel list
Available kernel symlink targets:
[1] linux-3.2.1-gentoo-r2
[2] linux-3.3.8-gentoo
[3] linux-3.4.9-gentoo
[4] linux-3.5.2-gentoo
[5] linux-3.5.7-gentoo *

This is the first time we've had this issue occur. Just two weeks ago we had a RAID controller boot from the wrong drive and revert all of the data by a week. Fortunately I have backups of all the system data and was able to restore it. I have a BIOS update from the motherboard manufacturer that I still have to apply to fix that issue.

This seems to have occurred at some point last evening, but I can't be certain. Everything was operational at 5 PM when I left for the day.
ce110ut
Apprentice

Joined: 27 Sep 2002
Posts: 199

Posted: Thu Jan 31, 2013 6:11 pm

Is your RAID controller 'new' or recent?

I ask because the message may be misleading.

The last time I dealt with this was several years ago. The company where I worked at the time (~2004) migrated from the 2.4 to the 2.6 kernel. Most of the gear we had was older and the drivers worked, save for the new network controllers we had.

The vendor didn't officially support the 2.6 kernel, but they did provide us with a release candidate driver. We noticed intermittent kernel panics, and the last log line mentioned skbuff and SMP, just like yours.

It turned out that the driver didn't have blocks on a certain buffer. The driver presumably worked fine on a single-core processor / system. With SMP, that buffer was prone to a race condition, which led the skbuff facility to throw a panic.


That said, I can only see this happening if you're using new drivers. If you're running hardware that requires external drivers, that MAY be the problem. Other than that, I'm guessing you'll have to test and see if you can forcibly reproduce the panic.
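
The kind of race described above (two CPUs both deciding there is room in a buffer, then both writing) can be illustrated in user space. What follows is only an analogy in Python, not driver or kernel code; the class and function names are hypothetical, and the sizes are borrowed from the trace in the first post. The point is that the room check and the tail update sit under one lock, so the buffer refuses a write instead of overrunning.

```python
import threading

class BoundedBuffer:
    """Toy stand-in for a fixed-size packet buffer (names are hypothetical)."""

    def __init__(self, end):
        self.end = end      # capacity in bytes
        self.tail = 0       # bytes used so far
        self._lock = threading.Lock()

    def put(self, n):
        # The check and the update happen under one lock, so two threads
        # cannot both see "enough room" and then both advance tail.
        with self._lock:
            if self.tail + n > self.end:
                return False    # refuse instead of overrunning
            self.tail += n
            return True

def worker(buf, n, tries, results):
    # Count how many of this thread's writes were accepted.
    results.append(sum(1 for _ in range(tries) if buf.put(n)))

buf = BoundedBuffer(end=1728)
results = []
threads = [threading.Thread(target=worker, args=(buf, 289, 10, results))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

accepted = sum(results)
print(accepted, buf.tail, buf.end)
```

Without the lock, the check and the increment could interleave between threads and tail could end up past end, which is the user-space equivalent of the overrun the skbuff code detects.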
Jarli
n00b

Joined: 22 Mar 2012
Posts: 8

Posted: Thu Jan 31, 2013 6:43 pm

The cluster system is hardly a year old.

Drivers are possibly an issue; as I said before, I do have to update the BIOS on the RAID controller to resolve another issue.

So at this point you believe a driver issue with the RAID controller caused this?
ce110ut
Apprentice

Joined: 27 Sep 2002
Posts: 199

Posted: Thu Jan 31, 2013 7:18 pm

It's hard to say definitively, but that is where my money is, given the information you shared. I strongly recommend that your next step be some research.

I'd ask your team the following questions:

Did it only panic the two times you mentioned?

Do you have any monitoring that reports host activity for all the nodes in the cluster?

If so, how are the other nodes behaving?

Are the nodes the same build, tin and OS?
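
One way to answer the "how often" and "how are the other nodes behaving" questions is to scan each node's kernel log for BUG lines. The sketch below is a minimal Python example that works on log text in the format quoted in the first post; how the text is collected from each node (saved dmesg output, a syslog file, and so on) is left open, and the function name is my own.

```python
import re

# Matches lines such as "[870499.556687] kernel BUG at net/core/skbuff.c:127!"
BUG_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+kernel BUG at ([^!\s]+)!")

def find_kernel_bugs(log_text):
    """Return (seconds_since_boot, source_location) for each 'kernel BUG at' line."""
    return [(float(ts), where) for ts, where in BUG_RE.findall(log_text)]

# Lines copied from the trace in this thread.
SAMPLE = """\
[870499.556685] ------------[ cut here ]------------
[870499.556687] kernel BUG at net/core/skbuff.c:127!
[870499.556690] invalid opcode: 0000 [#1] SMP
[892070.329129] ip (20187) used greatest stack depth: 744 bytes left
"""

hits = find_kernel_bugs(SAMPLE)
for seconds, where in hits:
    print(f"{where} at {seconds / 86400:.1f} days after boot")
```

Running the same function over the logs of every node gives a quick count of how many hosts have hit the BUG and at what uptime.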
All times are GMT