Gentoo Forums
swapper: page allocation failure
fangorn
Veteran


Joined: 31 Jul 2004
Posts: 1886

Posted: Sat Oct 29, 2011 9:41 am    Post subject: swapper: page allocation failure

Hi,

I am getting the following messages on a production server, with increasing frequency.
The server ran for months without problems. Then a few of these messages appeared,
and then the machine hung completely without even noticing it.
Since a hard reset (I had to pull the power cables because the machine no longer
reacted to anything less intrusive), these messages appear several
times a day while the server is doing next to nothing: just a little NFS v3
server work for ten clients.

Code:
Oct 27 10:02:06 <server> kernel: [95082.443483] swapper: page allocation failure. order:0, mode:0x4020
Oct 27 10:02:06 <server> kernel: [95082.443489] Pid: 0, comm: swapper Not tainted 2.6.32-5-amd64 #1
Oct 27 10:02:06 <server> kernel: [95082.443493] Call Trace:
Oct 27 10:02:06 <server> kernel: [95082.443495]  <IRQ>  [<ffffffff810ba61a>] ? __alloc_pages_nodemask+0x592/0x5f4
Oct 27 10:02:06 <server> kernel: [95082.443514]  [<ffffffff810e696a>] ? new_slab+0x5b/0x1ca
Oct 27 10:02:06 <server> kernel: [95082.443520]  [<ffffffff810e6cc9>] ? __slab_alloc+0x1f0/0x39b
Oct 27 10:02:06 <server> kernel: [95082.443526]  [<ffffffff81249a8c>] ? __netdev_alloc_skb+0x29/0x45
Oct 27 10:02:06 <server> kernel: [95082.443532]  [<ffffffff810e76fb>] ? __kmalloc_node_track_caller+0xbb/0x11b
Oct 27 10:02:06 <server> kernel: [95082.443537]  [<ffffffff81249a8c>] ? __netdev_alloc_skb+0x29/0x45
Oct 27 10:02:06 <server> kernel: [95082.443544]  [<ffffffff81248ab9>] ? __alloc_skb+0x69/0x15a
Oct 27 10:02:06 <server> kernel: [95082.443550]  [<ffffffff8119bb88>] ? swiotlb_map_page+0x0/0xc4
Oct 27 10:02:06 <server> kernel: [95082.443552]  [<ffffffff81249a8c>] ? __netdev_alloc_skb+0x29/0x45
Oct 27 10:02:06 <server> kernel: [95082.443570]  [<ffffffffa00ce7b7>] ? e1000_alloc_rx_buffers+0x85/0x1b3 [e1000e]
Oct 27 10:02:06 <server> kernel: [95082.443576]  [<ffffffffa00cebcd>] ? e1000_clean_rx_irq+0x282/0x2bb [e1000e]
Oct 27 10:02:06 <server> kernel: [95082.443582]  [<ffffffffa00d0104>] ? e1000_clean+0x70/0x219 [e1000e]
Oct 27 10:02:06 <server> kernel: [95082.443585]  [<ffffffff810e584f>] ? __slab_free+0x7f/0x27a
Oct 27 10:02:06 <server> kernel: [95082.443591]  [<ffffffffa00ccb27>] ? e1000_put_txbuf+0x35/0x47 [e1000e]
Oct 27 10:02:06 <server> kernel: [95082.443596]  [<ffffffff8124fbe1>] ? net_rx_action+0xae/0x1c9
Oct 27 10:02:06 <server> kernel: [95082.443601]  [<ffffffff81053caf>] ? __do_softirq+0xdd/0x1a6
Oct 27 10:02:06 <server> kernel: [95082.443606]  [<ffffffffa00cce51>] ? e1000_intr_msix_tx+0x30/0x4f [e1000e]
Oct 27 10:02:06 <server> kernel: [95082.443611]  [<ffffffff81011cac>] ? call_softirq+0x1c/0x30
Oct 27 10:02:06 <server> kernel: [95082.443614]  [<ffffffff8101322b>] ? do_softirq+0x3f/0x7c
Oct 27 10:02:06 <server> kernel: [95082.443617]  [<ffffffff81053b1f>] ? irq_exit+0x36/0x76
Oct 27 10:02:06 <server> kernel: [95082.443619]  [<ffffffff81012922>] ? do_IRQ+0xa0/0xb6
Oct 27 10:02:06 <server> kernel: [95082.443622]  [<ffffffff810114d3>] ? ret_from_intr+0x0/0x11
Oct 27 10:02:06 <server> kernel: [95082.443623]  <EOI>  [<ffffffff8102c58c>] ? native_safe_halt+0x2/0x3
Oct 27 10:02:06 <server> kernel: [95082.443632]  [<ffffffffa018a1ad>] ? acpi_idle_do_entry+0x31/0x58 [processor]
Oct 27 10:02:06 <server> kernel: [95082.443636]  [<ffffffffa018a23c>] ? acpi_idle_enter_c1+0x68/0xb8 [processor]
Oct 27 10:02:06 <server> kernel: [95082.443640]  [<ffffffff81239e26>] ? cpuidle_idle_call+0x94/0xee
Oct 27 10:02:06 <server> kernel: [95082.443643]  [<ffffffff8100feb1>] ? cpu_idle+0xa2/0xda
Oct 27 10:02:06 <server> kernel: [95082.443648]  [<ffffffff814f3140>] ? early_idt_handler+0x0/0x71
Oct 27 10:02:06 <server> kernel: [95082.443651]  [<ffffffff814f3cdd>] ? start_kernel+0x3dc/0x3e8
Oct 27 10:02:06 <server> kernel: [95082.443654]  [<ffffffff814f33b7>] ? x86_64_start_kernel+0xf9/0x106
Oct 27 10:02:06 <server> kernel: [95082.443656] Mem-Info:
Oct 27 10:02:06 <server> kernel: [95082.443657] Node 0 DMA per-cpu:
Oct 27 10:02:06 <server> kernel: [95082.443659] CPU    0: hi:    0, btch:   1 usd:   0
Oct 27 10:02:06 <server> kernel: [95082.443661] CPU    1: hi:    0, btch:   1 usd:   0
Oct 27 10:02:06 <server> kernel: [95082.443663] CPU    2: hi:    0, btch:   1 usd:   0
Oct 27 10:02:06 <server> kernel: [95082.443665] CPU    3: hi:    0, btch:   1 usd:   0
Oct 27 10:02:06 <server> kernel: [95082.443666] Node 0 DMA32 per-cpu:
Oct 27 10:02:06 <server> kernel: [95082.443668] CPU    0: hi:  186, btch:  31 usd: 158
Oct 27 10:02:06 <server> kernel: [95082.443670] CPU    1: hi:  186, btch:  31 usd: 172
Oct 27 10:02:06 <server> kernel: [95082.443672] CPU    2: hi:  186, btch:  31 usd:  63
Oct 27 10:02:06 <server> kernel: [95082.443674] CPU    3: hi:  186, btch:  31 usd: 185
Oct 27 10:02:06 <server> kernel: [95082.443675] Node 0 Normal per-cpu:
Oct 27 10:02:06 <server> kernel: [95082.443677] CPU    0: hi:  186, btch:  31 usd: 192
Oct 27 10:02:06 <server> kernel: [95082.443679] CPU    1: hi:  186, btch:  31 usd: 159
Oct 27 10:02:06 <server> kernel: [95082.443680] CPU    2: hi:  186, btch:  31 usd:  66
Oct 27 10:02:06 <server> kernel: [95082.443682] CPU    3: hi:  186, btch:  31 usd: 180
Oct 27 10:02:06 <server> kernel: [95082.443687] active_anon:36074 inactive_anon:9148 isolated_anon:0
Oct 27 10:02:06 <server> kernel: [95082.443687]  active_file:533987 inactive_file:1177060 isolated_file:0
Oct 27 10:02:06 <server> kernel: [95082.443688]  unevictable:1517 dirty:4770 writeback:0 unstable:0
Oct 27 10:02:06 <server> kernel: [95082.443689]  free:10073 slab_reclaimable:251046 slab_unreclaimable:8685
Oct 27 10:02:06 <server> kernel: [95082.443690]  mapped:10507 shmem:77 pagetables:3683 bounce:0
Oct 27 10:02:06 <server> kernel: [95082.443692] Node 0 DMA free:15840kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15280kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Oct 27 10:02:06 <server> kernel: [95082.443701] lowmem_reserve[]: 0 2991 8041 8041
Oct 27 10:02:06 <server> kernel: [95082.443704] Node 0 DMA32 free:21788kB min:4264kB low:5328kB high:6396kB active_anon:16924kB inactive_anon:4576kB active_file:506692kB inactive_file:1980000kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3063264kB mlocked:0kB dirty:9788kB writeback:0kB mapped:380kB shmem:0kB slab_reclaimable:376116kB slab_unreclaimable:10168kB kernel_stack:312kB pagetables:2872kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Oct 27 10:02:06 <server> kernel: [95082.443714] lowmem_reserve[]: 0 0 5050 5050
Oct 27 10:02:06 <server> kernel: [95082.443717] Node 0 Normal free:2664kB min:7200kB low:9000kB high:10800kB active_anon:127372kB inactive_anon:32016kB active_file:1629256kB inactive_file:2728240kB unevictable:6068kB isolated(anon):0kB isolated(file):0kB present:5171200kB mlocked:6068kB dirty:9292kB writeback:0kB mapped:41648kB shmem:308kB slab_reclaimable:628068kB slab_unreclaimable:24556kB kernel_stack:2672kB pagetables:11860kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Oct 27 10:02:06 <server> kernel: [95082.443727] lowmem_reserve[]: 0 0 0 0
Oct 27 10:02:06 <server> kernel: [95082.443729] Node 0 DMA: 2*4kB 1*8kB 1*16kB 2*32kB 2*64kB 2*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15840kB
Oct 27 10:02:06 <server> kernel: [95082.443737] Node 0 DMA32: 314*4kB 8*8kB 341*16kB 33*32kB 86*64kB 53*128kB 5*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 21912kB
Oct 27 10:02:06 <server> kernel: [95082.443744] Node 0 Normal: 368*4kB 9*8kB 0*16kB 1*32kB 1*64kB 0*128kB 0*256kB 2*512kB 0*1024kB 0*2048kB 0*4096kB = 2664kB
Oct 27 10:02:06 <server> kernel: [95082.443751] 1707109 total pagecache pages
Oct 27 10:02:06 <server> kernel: [95082.443753] 0 pages in swap cache
Oct 27 10:02:06 <server> kernel: [95082.443754] Swap cache stats: add 0, delete 0, find 0/0
Oct 27 10:02:06 <server> kernel: [95082.443755] Free swap  = 11717624kB
Oct 27 10:02:06 <server> kernel: [95082.443757] Total swap = 11717624kB
Oct 27 10:02:06 <server> kernel: [95082.471457] 2097136 pages RAM
Oct 27 10:02:06 <server> kernel: [95082.471458] 49852 pages reserved
Oct 27 10:02:06 <server> kernel: [95082.471460] 1022202 pages shared
Oct 27 10:02:06 <server> kernel: [95082.471461] 1070407 pages non-shared
Oct 27 10:02:06 <server> kernel: [95082.471463] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
Oct 27 10:02:06 <server> kernel: [95082.471466]   cache: kmalloc-4096, object size: 4096, buffer size: 4096, default order: 3, min order: 0
Oct 27 10:02:06 <server> kernel: [95082.471468]   node 0: slabs: 1182, objs: 3044, free: 0


Am I reading this right: the kernel runs out of memory while allocating a network
communication buffer and drops the process?
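
One way to check that reading is to decode the `order` and `mode` fields of the first log line. The sketch below assumes the GFP flag bit values of the 2.6.32-era `include/linux/gfp.h` (later kernels renumbered them) and 4kB pages; the function and table names are made up for illustration. Under those assumptions `0x4020` is `__GFP_HIGH | __GFP_COMP`, i.e. a `GFP_ATOMIC`-style request. It was issued from interrupt context (note the `<IRQ>` marker and `e1000_alloc_rx_buffers` in the trace), so it is not allowed to sleep and wait for memory to be reclaimed. What fails is the allocation of a receive buffer; `swapper` is only the idle task that happened to be interrupted.

```python
import re

# Flag bits as defined in 2.6.32-era include/linux/gfp.h.
# Assumption: this table is only valid for kernels of that generation.
GFP_FLAGS_2_6_32 = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x08: "__GFP_MOVABLE",
    0x10: "__GFP_WAIT",      # caller may sleep / wait for reclaim
    0x20: "__GFP_HIGH",      # may dip into emergency reserves
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x100: "__GFP_COLD",
    0x200: "__GFP_NOWARN",
    0x400: "__GFP_REPEAT",
    0x800: "__GFP_NOFAIL",
    0x1000: "__GFP_NORETRY",
    0x4000: "__GFP_COMP",
    0x8000: "__GFP_ZERO",
}
KNOWN_MASK = sum(GFP_FLAGS_2_6_32)

HEADER_RE = re.compile(
    r"page allocation failure\. order:(\d+), mode:(0x[0-9a-fA-F]+)"
)


def decode_failure(line, page_kb=4):
    """Decode the header line of a page allocation failure report.

    Returns None if the line is not such a header.
    """
    match = HEADER_RE.search(line)
    if match is None:
        return None
    order = int(match.group(1))
    mode = int(match.group(2), 16)
    flags = [name for bit, name in sorted(GFP_FLAGS_2_6_32.items()) if mode & bit]
    return {
        "order": order,
        "size_kb": page_kb << order,        # 2**order contiguous pages
        "flags": flags,
        "may_sleep": bool(mode & 0x10),     # __GFP_WAIT set?
        "unknown_bits": mode & ~KNOWN_MASK,
    }
```

For the line above this yields order 0 (a single 4kB page), the flags `__GFP_HIGH` and `__GFP_COMP`, and `may_sleep` false.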

I have atop monitoring this machine every ten minutes and here are the printouts from before and after this error:

Code:
ATOP - <server>              2011/10/27  09:59:02              600 seconds elapsed
PRC | sys  36.15s | user  17.12s | #proc    257 | #zombie    0 | #exit    963 |
CPU | sys      6% | user      3% | irq       1% | idle    385% | wait      5% |
cpu | sys      3% | user      3% | irq       1% | idle     89% | cpu000 w  4% |
cpu | sys      2% | user      0% | irq       0% | idle     98% | cpu003 w  0% |
cpu | sys      1% | user      0% | irq       0% | idle     99% | cpu002 w  0% |
cpu | sys      1% | user      0% | irq       0% | idle     98% | cpu001 w  1% |
CPL | avg1   0.25 | avg5    0.26 | avg15   0.54 | csw  1309931 | intr 1282502 |
MEM | tot    7.8G | free  463.4M | cache   5.9G | buff  246.7M | slab    1.0G |
SWP | tot   11.2G | free   11.2G |              | vmcom 834.6M | vmlim  15.1G |
DSK |         sdb | busy     15% | read   12543 | write   2727 | avio    5 ms |
DSK |         sda | busy      0% | read     259 | write   4530 | avio    0 ms |
NET | transport   | tcpi  547521 | tcpo  409905 | udpi    2074 | udpo    1110 |
NET | network     | ipi   549273 | ipo   411027 | ipfrw      0 | deliv 549248 |
NET | eth0     0% | pcki  548716 | pcko  620165 | si 2971 Kbps | so 6149 Kbps |
NET | lo     ---- | pcki     858 | pcko     858 | si    3 Kbps | so    3 Kbps |

  PID  SYSCPU  USRCPU  VGROW  RGROW  RDDSK  WRDSK  ST EXC S  CPU CMD     1/47 
27213  10.91s   8.98s   272K   404K 54184K  1368K  --   - S   3% smbd
 1464  13.16s   0.36s     0K     0K     0K     0K  --   - S   2% gkrellmd
22360   6.67s   5.60s   152K   -24K 118.8M  1056K  --   - S   2% smbd


Code:
ATOP - <server>              2011/10/27  10:09:02              600 seconds elapsed
PRC | sys  56.75s | user   6.35s | #proc    258 | #zombie    0 | #exit   1530 |
CPU | sys      9% | user      1% | irq       5% | idle    342% | wait     43% |
cpu | sys      5% | user      1% | irq       5% | idle     64% | cpu000 w 26% |
cpu | sys      1% | user      0% | irq       0% | idle     97% | cpu003 w  1% |
cpu | sys      2% | user      0% | irq       0% | idle     87% | cpu001 w 11% |
cpu | sys      1% | user      0% | irq       0% | idle     95% | cpu002 w  4% |
CPL | avg1   0.16 | avg5    0.84 | avg15   0.77 | csw  4851725 | intr 4075693 |
MEM | tot    7.8G | free   56.4M | cache   6.9G | buff  226.4M | slab  440.5M |
SWP | tot   11.2G | free   11.2G |              | vmcom 836.2M | vmlim  15.1G |
PAG | scan 2019e3 | stall      0 |              | swin       0 | swout      0 |
DSK |         sdb | busy     40% | read   15233 | write 100967 | avio    2 ms |
DSK |         sda | busy      3% | read     874 | write   8171 | avio    2 ms |
NET | transport   | tcpi 7002578 | tcpo 3735999 | udpi    2212 | udpo    1335 |
NET | network     | ipi  7004629 | ipo  3737364 | ipfrw      0 | deliv 7005e3 |
NET | eth0    11% | pcki 7003715 | pcko 4866331 | si  111 Mbps | so   37 Mbps |
NET | lo     ---- | pcki    1185 | pcko    1185 | si    3 Kbps | so    3 Kbps |

  PID  SYSCPU  USRCPU  VGROW  RGROW  RDDSK  WRDSK  ST EXC S  CPU CMD     1/72 
22622  15.48s   2.26s     0K     0K    76K 418.8M  --   - S   3% apt-cacher-ng
 1464  13.87s   0.41s     0K     0K     0K     0K  --   - S   2% gkrellmd
 3089   3.36s   0.00s     0K     0K 217.6M   1.3G  --   - S   1% nfsd


As you might have noticed, this is not a Gentoo server, but in my experience
the people in this forum can give you answers that others can't. :wink:

The server has always had occasional performance problems that could not be
traced to any event in the syslog. My gut feeling tells me these problems
have a common cause. As this exact same software has worked for over
half a year, I fear this degradation is due to a worsening hardware problem in
either the network chip or the RAM.

I know of memtest for verifying the memory, although running it on a production
server means I have to replace that server temporarily.
Is there a test to verify that the NIC is healthy?
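
Short of a dedicated hardware test, a cheap first check is whether the interface itself is counting errors. The sketch below reads the per-interface counters that Linux exposes under `/sys/class/net/<iface>/statistics/`. The interface name `eth0` is taken from the atop output above; the function name and the selection of counters are illustrative assumptions, not a complete health check. As a rough guide, rising `rx_crc_errors` would tend to point at cabling or the NIC, while drops without CRC errors would fit better with the host failing to supply receive buffers.

```python
import os

# Counters worth watching; all are standard files in the statistics directory.
COUNTERS = (
    "rx_errors",
    "rx_crc_errors",
    "rx_dropped",
    "rx_missed_errors",
    "tx_errors",
)


def read_nic_counters(iface="eth0", sysfs_root="/sys/class/net"):
    """Return {counter_name: int or None} for one network interface.

    A value of None means the counter file could not be read.
    """
    stats_dir = os.path.join(sysfs_root, iface, "statistics")
    result = {}
    for name in COUNTERS:
        try:
            with open(os.path.join(stats_dir, name)) as fh:
                result[name] = int(fh.read().strip())
        except (OSError, ValueError):
            result[name] = None
    return result
```

Calling `read_nic_counters("eth0")` twice, some minutes apart, and comparing the two results shows whether any counter is still increasing.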

A little information on the machine:
Code:
-> uname -a
Linux lin71 2.6.32-5-amd64 #1 SMP Tue Jun 14 09:42:28 UTC 2011 x86_64 GNU/Linux

$ lspci
00:00.0 Host bridge: Intel Corporation 5520 I/O Hub to ESI Port (rev 22)
00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 22)
00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 22)
00:05.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 5 (rev 22)
00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 22)
00:09.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 9 (rev 22)
00:0d.0 Host bridge: Intel Corporation Device 343a (rev 22)
00:0d.1 Host bridge: Intel Corporation Device 343b (rev 22)
00:0d.2 Host bridge: Intel Corporation Device 343c (rev 22)
00:0d.3 Host bridge: Intel Corporation Device 343d (rev 22)
00:0d.4 Host bridge: Intel Corporation 5520/5500/X58 Physical Layer Port 0 (rev 22)
00:0d.5 Host bridge: Intel Corporation 5520/5500 Physical Layer Port 1 (rev 22)
00:0d.6 Host bridge: Intel Corporation Device 341a (rev 22)
00:0e.0 Host bridge: Intel Corporation Device 341c (rev 22)
00:0e.1 Host bridge: Intel Corporation Device 341d (rev 22)
00:0e.2 Host bridge: Intel Corporation Device 341e (rev 22)
00:0e.4 Host bridge: Intel Corporation Device 3439 (rev 22)
00:13.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub I/OxAPIC Interrupt Controller (rev 22)
00:14.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub System Management Registers (rev 22)
00:14.1 PIC: Intel Corporation 5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers (rev 22)
00:14.2 PIC: Intel Corporation 5520/5500/X58 I/O Hub Control Status and RAS Registers (rev 22)
00:14.3 PIC: Intel Corporation 5520/5500/X58 I/O Hub Throttle Registers (rev 22)
00:16.0 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.1 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.2 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.3 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.4 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.5 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.6 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.7 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:1a.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4
00:1a.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5
00:1a.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6
00:1a.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2
00:1c.0 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 1
00:1c.4 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 5
00:1c.5 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 6
00:1d.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
00:1d.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
00:1d.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
00:1d.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller
00:1f.2 IDE interface: Intel Corporation 82801JI (ICH10 Family) 4 port SATA IDE Controller #1
00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller
00:1f.5 IDE interface: Intel Corporation 82801JI (ICH10 Family) 2 port SATA IDE Controller #2
01:03.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200eW WPCM450 (rev 0a)
03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
04:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
07:00.0 RAID bus controller: 3ware Inc 9750 SAS2/SATA-II RAID PCIe (rev 05)

$ lspci -n
00:00.0 0600: 8086:3406 (rev 22)
00:01.0 0604: 8086:3408 (rev 22)
00:03.0 0604: 8086:340a (rev 22)
00:05.0 0604: 8086:340c (rev 22)
00:07.0 0604: 8086:340e (rev 22)
00:09.0 0604: 8086:3410 (rev 22)
00:0d.0 0600: 8086:343a (rev 22)
00:0d.1 0600: 8086:343b (rev 22)
00:0d.2 0600: 8086:343c (rev 22)
00:0d.3 0600: 8086:343d (rev 22)
00:0d.4 0600: 8086:3418 (rev 22)
00:0d.5 0600: 8086:3419 (rev 22)
00:0d.6 0600: 8086:341a (rev 22)
00:0e.0 0600: 8086:341c (rev 22)
00:0e.1 0600: 8086:341d (rev 22)
00:0e.2 0600: 8086:341e (rev 22)
00:0e.4 0600: 8086:3439 (rev 22)
00:13.0 0800: 8086:342d (rev 22)
00:14.0 0800: 8086:342e (rev 22)
00:14.1 0800: 8086:3422 (rev 22)
00:14.2 0800: 8086:3423 (rev 22)
00:14.3 0800: 8086:3438 (rev 22)
00:16.0 0880: 8086:3430 (rev 22)
00:16.1 0880: 8086:3431 (rev 22)
00:16.2 0880: 8086:3432 (rev 22)
00:16.3 0880: 8086:3433 (rev 22)
00:16.4 0880: 8086:3429 (rev 22)
00:16.5 0880: 8086:342a (rev 22)
00:16.6 0880: 8086:342b (rev 22)
00:16.7 0880: 8086:342c (rev 22)
00:1a.0 0c03: 8086:3a37
00:1a.1 0c03: 8086:3a38
00:1a.2 0c03: 8086:3a39
00:1a.7 0c03: 8086:3a3c
00:1c.0 0604: 8086:3a40
00:1c.4 0604: 8086:3a48
00:1c.5 0604: 8086:3a4a
00:1d.0 0c03: 8086:3a34
00:1d.1 0c03: 8086:3a35
00:1d.2 0c03: 8086:3a36
00:1d.7 0c03: 8086:3a3a
00:1e.0 0604: 8086:244e (rev 90)
00:1f.0 0601: 8086:3a16
00:1f.2 0101: 8086:3a20
00:1f.3 0c05: 8086:3a30
00:1f.5 0101: 8086:3a26
01:03.0 0300: 102b:0532 (rev 0a)
03:00.0 0200: 8086:10d3
04:00.0 0200: 8086:10d3
07:00.0 0104: 13c1:1010 (rev 05)


I'm grateful for any hints on how to address this server problem.

fangorn

Edit: Could the MTU setting have something to do with this? I do not remember
setting the server up with jumbo frames, but I am not sure right now. Normally I keep the
default mtu=1500, even though the Intel chips and the Ciscos should be able to handle jumbo frames.
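
The configured MTU does not have to be remembered; Linux exposes it at `/sys/class/net/<iface>/mtu`. A minimal sketch, assuming the interface is `eth0` as in the atop output (the function names are hypothetical):

```python
import os


def read_mtu(iface="eth0", sysfs_root="/sys/class/net"):
    """Return the MTU currently configured on the interface, in bytes."""
    with open(os.path.join(sysfs_root, iface, "mtu")) as fh:
        return int(fh.read().strip())


def uses_jumbo_frames(mtu, standard_mtu=1500):
    """True if the MTU is larger than the Ethernet default of 1500 bytes."""
    return mtu > standard_mtu
```

If `uses_jumbo_frames(read_mtu("eth0"))` is true, each received frame needs a larger buffer, which generally makes the receive-path allocations harder to satisfy under memory pressure.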
_________________
Video Encoding scripts collection | Project page
linuxtuxhellsinki
l33t


Joined: 15 Nov 2004
Posts: 700
Location: Hellsinki

Posted: Mon Oct 31, 2011 7:20 pm    Post subject:

Hello,

I've seen this on several old servers with older kernels, and there the solution was to increase the network memory buffers, as in this LINK. You can test it by echoing to /proc, e.g. echo "4096 655360 6553600" > /proc/sys/net/ipv4/tcp_wmem, and for a permanent solution add the settings to /etc/sysctl.conf and run sysctl -p
Code:
net/core/rmem_max = 8738000
net/core/wmem_max = 6553600

net/ipv4/tcp_rmem = 8192 873800 8738000
net/ipv4/tcp_wmem = 4096 655360 6553600

vm/min_free_kbytes = 65536

But this was more of a solution for some older 2.6.18 kernels or so...
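
Before changing anything it can help to see which of these values actually differ from what the running kernel uses. The sketch below parses sysctl.conf-style text (keys written with slashes, as above, or with dots) and compares it with the files under `/proc/sys`. The helper names are illustrative, and the dot-to-slash conversion is a simplification that ignores keys containing literal dots.

```python
import os

PROPOSED = """\
net/core/rmem_max = 8738000
net/core/wmem_max = 6553600

net/ipv4/tcp_rmem = 8192 873800 8738000
net/ipv4/tcp_wmem = 4096 655360 6553600

vm/min_free_kbytes = 65536
"""


def parse_sysctl_conf(text):
    """Parse 'key = value' lines into {path_relative_to_proc_sys: [fields]}."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key = value line
        key = key.strip()
        if "/" not in key:
            key = key.replace(".", "/")  # net.core.x -> net/core/x
        settings[key] = value.split()
    return settings


def read_current(key, proc_root="/proc/sys"):
    """Return the current value as a list of fields, or None if unreadable."""
    try:
        with open(os.path.join(proc_root, key)) as fh:
            return fh.read().split()
    except OSError:
        return None


def diff_settings(proposed, proc_root="/proc/sys"):
    """Return {key: (current, wanted)} for every setting that would change."""
    changes = {}
    for key, wanted in proposed.items():
        current = read_current(key, proc_root)
        if current != wanted:
            changes[key] = (current, wanted)
    return changes
```

`diff_settings(parse_sysctl_conf(PROPOSED))` only reads from `/proc/sys`; it never writes, so it is safe to run on the production machine.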

In some cases it helped to turn off TCP TSO with ethtool -K eth0 tso off

I've also seen similar swapper errors on some servers that use the Broadcom "tg3" driver, which is not your case.


Also check this thread about NFS+SLUB:
https://forums.gentoo.org/viewtopic-t-843865-start-0.html


I hope these help you find a solution to your problem.
_________________
1st use 'Search' & lastly add [Solved] to
the subject of your first post in the thread.
fangorn
Veteran


Joined: 31 Jul 2004
Posts: 1886

Posted: Tue Nov 01, 2011 12:18 pm    Post subject:

Thanks a lot.

I was hoping that one of the pros had seen such a thing.

Since I swapped the production and the backup server last
weekend, I now have something to test.

Thanks again.
fangorn
_________________
Video Encoding scripts collection | Project page
Treborius
Guru


Joined: 18 Oct 2005
Posts: 584
Location: Berlin

Posted: Fri Nov 04, 2011 11:55 am    Post subject:

Similar problem here, see https://forums.gentoo.org/viewtopic-t-899392.html

I get these messages:
Code:

Oct 25 12:00:53 ponyslaystation kernel: swapper: page allocation failure. order:1, mode:0x20
 Oct 25 12:00:53 ponyslaystation kernel: Pid: 0, comm: swapper Tainted: P        W   2.6.38-gentoo-r6-alix #9
 Oct 25 12:00:53 ponyslaystation kernel: Call Trace:
 Oct 25 12:00:53 ponyslaystation kernel: [<c106950d>] ? __alloc_pages_nodemask+0x4ad/0x650
 Oct 25 12:00:53 ponyslaystation kernel: [<c108585e>] ? cache_alloc_refill+0x2ae/0x470
 Oct 25 12:00:53 ponyslaystation kernel: [<c1085a92>] ? __kmalloc+0x72/0xa0
 Oct 25 12:00:53 ponyslaystation kernel: [<c1229219>] ? __alloc_skb+0x49/0x100
 Oct 25 12:00:53 ponyslaystation kernel: [<d00c504e>] ? ath_rxbuf_alloc+0x1e/0x80 [ath]
 Oct 25 12:00:53 ponyslaystation kernel: [<cfff9095>] ? ath_rx_tasklet+0x615/0x15b0 [ath9k]
 Oct 25 12:00:53 ponyslaystation kernel: [<c103ef51>] ? sched_clock_local.clone.1+0x41/0x170
 Oct 25 12:00:53 ponyslaystation kernel: [<c10220bd>] ? enqueue_task_rt+0x1d/0x120
 Oct 25 12:00:53 ponyslaystation kernel: [<c10221ea>] ? enqueue_task.clone.127+0x2a/0x60
 Oct 25 12:00:53 ponyslaystation kernel: [<cfff7274>] ? ath9k_tasklet+0x64/0x120 [ath9k]
 Oct 25 12:00:53 ponyslaystation kernel: [<c102aba9>] ? tasklet_action+0x39/0x70
 Oct 25 12:00:53 ponyslaystation kernel: [<c102b09c>] ? __do_softirq+0x6c/0xd0
 Oct 25 12:00:53 ponyslaystation kernel: [<c102b030>] ? __do_softirq+0x0/0xd0
 Oct 25 12:00:53 ponyslaystation kernel: <IRQ>  [<c102b1c5>] ? irq_exit+0x65/0x70
 Oct 25 12:00:53 ponyslaystation kernel: [<c1004215>] ? do_IRQ+0x35/0x90
 Oct 25 12:00:53 ponyslaystation kernel: [<c1002ef0>] ? common_interrupt+0x30/0x40
 Oct 25 12:00:53 ponyslaystation kernel: [<c100850c>] ? default_idle+0x2c/0x40
 Oct 25 12:00:53 ponyslaystation kernel: [<c10015f4>] ? cpu_idle+0x74/0x90
 Oct 25 12:00:53 ponyslaystation kernel: [<c138c5f0>] ? start_kernel+0x281/0x287
 Oct 25 12:00:53 ponyslaystation kernel: Mem-Info:
 Oct 25 12:00:53 ponyslaystation kernel: DMA per-cpu:
 Oct 25 12:00:53 ponyslaystation kernel: CPU    0: hi:    0, btch:   1 usd:   0
 Oct 25 12:00:53 ponyslaystation kernel: Normal per-cpu:
 Oct 25 12:00:53 ponyslaystation kernel: CPU    0: hi:   90, btch:  15 usd:  84
 Oct 25 12:00:53 ponyslaystation kernel: active_anon:11061 inactive_anon:12313 isolated_anon:0
 Oct 25 12:00:53 ponyslaystation kernel: active_file:14113 inactive_file:14172 isolated_file:0
 Oct 25 12:00:53 ponyslaystation kernel: unevictable:0 dirty:663 writeback:353 unstable:0
 Oct 25 12:00:53 ponyslaystation kernel: free:985 slab_reclaimable:5358 slab_unreclaimable:2492
 Oct 25 12:00:53 ponyslaystation kernel: mapped:3344 shmem:584 pagetables:406 bounce:0
 Oct 25 12:00:53 ponyslaystation kernel: DMA free:1076kB min:124kB low:152kB high:184kB active_anon:876kB inactive_anon:2608kB active_file:4824kB inactive_file:5016kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15800kB mlocked:0kB dirty:308kB writeback:80kB mapped:1304kB shmem:0kB slab_reclaimable:1344kB slab_unreclaimable:88kB kernel_stack:40kB pagetables:40kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
 Oct 25 12:00:53 ponyslaystation kernel: lowmem_reserve[]: 0 229 229
 Oct 25 12:00:53 ponyslaystation kernel: Normal free:2864kB min:1876kB low:2344kB high:2812kB active_anon:43368kB inactive_anon:46644kB active_file:51628kB inactive_file:51672kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:235392kB mlocked:0kB dirty:2344kB writeback:1332kB mapped:12072kB shmem:2336kB slab_reclaimable:20088kB slab_unreclaimable:9880kB kernel_stack:704kB pagetables:1584kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:9 all_unreclaimable? no
 Oct 25 12:00:53 ponyslaystation kernel: lowmem_reserve[]: 0 0 0
 Oct 25 12:00:53 ponyslaystation kernel: DMA: 269*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1076kB
 Oct 25 12:00:53 ponyslaystation kernel: Normal: 716*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2864kB
 Oct 25 12:00:53 ponyslaystation kernel: 29744 total pagecache pages
 Oct 25 12:00:53 ponyslaystation kernel: 874 pages in swap cache
 Oct 25 12:00:53 ponyslaystation kernel: Swap cache stats: add 15150, delete 14276, find 8958/9845
 Oct 25 12:00:53 ponyslaystation kernel: Free swap  = 1946376kB
 Oct 25 12:00:53 ponyslaystation kernel: Total swap = 1959924kB
 Oct 25 12:00:53 ponyslaystation kernel: 63392 pages RAM
 Oct 25 12:00:53 ponyslaystation kernel: 1600 pages reserved
 Oct 25 12:00:53 ponyslaystation kernel: 39078 pages shared
 Oct 25 12:00:53 ponyslaystation kernel: 33126 pages non-shared
 Oct 25 12:00:53 ponyslaystation kernel: skbuff alloc of size 3872 failed
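
The per-zone free-list lines in this trace show how an `order:1` request can fail while free pages remain: every free block listed is a single 4kB page, so no contiguous 8kB block exists. A small sketch that parses such a line and checks whether a block of a given order is available (function names are illustrative, 4kB pages are assumed, and zone watermarks are deliberately ignored):

```python
import re

# Matches entries such as "716*4kB" in a free-list line.
BLOCK_RE = re.compile(r"(\d+)\*(\d+)kB")


def parse_free_list(line, page_kb=4):
    """Return {order: free_block_count} from a 'Zone: N*4kB M*8kB ...' line."""
    blocks = {}
    for count, size_kb in BLOCK_RE.findall(line):
        # Block sizes are powers of two in pages: 4kB -> order 0, 8kB -> order 1.
        order = (int(size_kb) // page_kb).bit_length() - 1
        blocks[order] = int(count)
    return blocks


def total_free_kb(blocks, page_kb=4):
    """Sum of all free blocks in kB; should match the '= NkB' total in the log."""
    return sum(count * (page_kb << order) for order, count in blocks.items())


def can_satisfy(blocks, order):
    """True if a free block of the requested order or larger exists.

    A larger block can be split, so any order >= the requested one will do.
    Watermarks and reserves are not modelled here.
    """
    return any(count > 0 for o, count in blocks.items() if o >= order)


normal = parse_free_list(
    "Normal: 716*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB "
    "0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2864kB"
)
```

Here `total_free_kb(normal)` reproduces the 2864kB from the log, while `can_satisfy(normal, 1)` is false: the memory is free but too fragmented for a two-page allocation.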

_________________
Systems running gentoo :
Desktop, Laptop, ZOTAC AD-10 media-center, odroid-xu4 server / wLan-router
fangorn
Veteran


Joined: 31 Jul 2004
Posts: 1886

Posted: Fri Nov 04, 2011 12:12 pm    Post subject:

After switching from kernel 2.6.32 to 2.6.39, the servers have now been running for a week
without error messages. I have not had time for extensive testing so far.

I have not changed the network settings yet. But ultimately I want to go back to the distribution's
standard kernel, so I will test with 2.6.32 later and will probably need the network setting changes then.
_________________
Video Encoding scripts collection | Project page