Gentoo Forums
page allocation failure (Proliant HP Gen8, 4.4.6-gentoo)
atmosx
n00b


Joined: 17 Jul 2009
Posts: 42

PostPosted: Sat Jun 10, 2017 7:09 am    Post subject: page allocation failure (Proliant HP Gen8, 4.4.6-gentoo)

Hi,

I have an HP ProLiant Gen8 MicroServer running Gentoo. I'm consistently getting page allocation failures on kernel 4.4.6-gentoo. The system is otherwise stable, I don't get hangs or anything. dmesg shows:

Code:
[1053460.047725] rdiff-backup: page allocation failure: order:2, mode:0x2204020
[1053460.047726] CPU: 0 PID: 57583 Comm: rdiff-backup Tainted: P           OE   4.4.6-gentoo #1
[1053460.047727] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 11/02/2015
[1053460.047728]  0000000000000000 ffff88040b403ae8 ffffffff8129bc12 0000000002204020
[1053460.047730]  0000000000000002 ffff88040b403b70 ffffffff811475db ffffffff81aef378
[1053460.047732]  ffff880400000060 ffffffff81094064 0220402000000000 0000000000000100
[1053460.047735] Call Trace:
[1053460.047735]  <IRQ>  [<ffffffff8129bc12>] dump_stack+0x67/0x95
[1053460.047739]  [<ffffffff811475db>] warn_alloc_failed+0xdb/0x130
[1053460.047741]  [<ffffffff81094064>] ? __wake_up+0x44/0x50
[1053460.047744]  [<ffffffff8114ae18>] __alloc_pages_nodemask+0x878/0xa20
[1053460.047746]  [<ffffffff810c7dff>] ? clockevents_program_event+0x7f/0x120
[1053460.047748]  [<ffffffff8118d571>] cache_alloc_refill+0x2f1/0x590
[1053460.047750]  [<ffffffff8118dc2f>] __kmalloc+0x1ef/0x230
[1053460.047753]  [<ffffffffa0d35821>] ? tg3_alloc_rx_data+0x71/0x260 [tg3]
[1053460.047756]  [<ffffffffa0d35821>] tg3_alloc_rx_data+0x71/0x260 [tg3]
[1053460.047760]  [<ffffffffa0d3cf93>] tg3_poll_work+0x633/0xf10 [tg3]
[1053460.047762]  [<ffffffff814e3248>] ? __netif_receive_skb+0x18/0x60
[1053460.047765]  [<ffffffffa0d3d8b6>] tg3_poll_msix+0x46/0x160 [tg3]
[1053460.047768]  [<ffffffff814e4853>] net_rx_action+0x1d3/0x330
[1053460.047770]  [<ffffffff8105d1aa>] __do_softirq+0x12a/0x2d0
[1053460.047772]  [<ffffffff8105d4aa>] irq_exit+0x8a/0xa0
[1053460.047774]  [<ffffffff815ec704>] do_IRQ+0x54/0xd0
[1053460.047777]  [<ffffffff815eab49>] common_interrupt+0x89/0x89
[1053460.047777]  <EOI> Mem-Info:
[1053460.047781] active_anon:53278 inactive_anon:53422 isolated_anon:0
                  active_file:346165 inactive_file:173692 isolated_file:0
                  unevictable:0 dirty:452 writeback:0 unstable:0
                  slab_reclaimable:880470 slab_unreclaimable:1893874
                  mapped:6400 shmem:233 pagetables:1627 bounce:0
                  free:78959 free_pcp:417 free_cma:0
[1053460.047787] DMA free:15884kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15968kB managed:15884kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[1053460.047788] lowmem_reserve[]: 0 3778 15967 15967
[1053460.047794] DMA32 free:123960kB min:15488kB low:19360kB high:23232kB active_anon:50996kB inactive_anon:51316kB active_file:269096kB inactive_file:139136kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3946384kB managed:3868760kB mlocked:0kB dirty:204kB writeback:0kB mapped:2764kB shmem:464kB slab_reclaimable:812872kB slab_unreclaimable:1850016kB kernel_stack:3024kB pagetables:1704kB unstable:0kB bounce:0kB free_pcp:868kB local_pcp:696kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[1053460.047795] lowmem_reserve[]: 0 0 12189 12189
[1053460.047800] Normal free:175992kB min:49980kB low:62472kB high:74968kB active_anon:162116kB inactive_anon:162372kB active_file:1115564kB inactive_file:555632kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:12779516kB managed:12481760kB mlocked:0kB dirty:1604kB writeback:0kB mapped:22836kB shmem:468kB slab_reclaimable:2709008kB slab_unreclaimable:5725480kB kernel_stack:9712kB pagetables:4804kB unstable:0kB bounce:0kB free_pcp:800kB local_pcp:656kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[1053460.047802] lowmem_reserve[]: 0 0 0 0
[1053460.047804] DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15884kB
[1053460.047812] DMA32: 18972*4kB (UME) 6004*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 123920kB
[1053460.047818] Normal: 37239*4kB (UE) 3366*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 175884kB
[1053460.047825] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[1053460.047826] 520513 total pagecache pages
[1053460.047827] 386 pages in swap cache
[1053460.047828] Swap cache stats: add 5302, delete 4916, find 202551/203330
[1053460.047829] Free swap  = 8372680kB
[1053460.047830] Total swap = 8387580kB
[1053460.047830] 4185467 pages RAM
[1053460.047831] 0 pages HighMem/MovableOnly
[1053460.047832] 93866 pages reserved
[1053460.047833] 0 pages hwpoisoned


Following advice on Stack Overflow I set vm.min_free_kbytes to 65536, which is more than twice the original value (see the sketch after the vmstat output below). That considerably reduced the frequency of the page allocation failures and basically limited them to "fail2ban". I'm not running many processes, so this can't be an out-of-memory problem. This is a backup server running a few dockerized services and a ZFS pool, which used to trigger the page allocation failures before I raised "min_free_kbytes". To give an idea, this is the average memory usage (vmstat):

Quote:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 15004 155552 1118828 4239840 0 0 98 543 7 15 2 0 96 2 0
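
For reference, this is roughly how the min_free_kbytes change was applied and made persistent; just a sketch, assuming the usual sysctl mechanism (Gentoo reads /etc/sysctl.conf at boot):

Code:
# apply immediately (value in kB; 65536 is the figure mentioned above)
sysctl -w vm.min_free_kbytes=65536
# persist across reboots
echo "vm.min_free_kbytes = 65536" >> /etc/sysctl.conf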


Any idea why this happens?

Apart from a possible kernel upgrade, is there any workaround? Ideas, thoughts, etc. are welcome!

Thanks
tholin
Apprentice


Joined: 04 Oct 2008
Posts: 200

PostPosted: Sat Jun 10, 2017 11:06 am    Post subject:

I think occasional page allocation failures are normal. Here the memory allocation failed when the kernel received a network packet; the packet will be dropped and retransmitted. It can happen when the kernel has to allocate a lot of pages all at once, for example when there is suddenly a lot of network traffic.
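
If you want to see how often that happens, the drops show up in the interface statistics; a quick sketch (the interface name eth0 is just a placeholder, and the driver-specific counters depend on what tg3 exposes):

Code:
# generic RX "dropped" counter for the interface
ip -s link show eth0
# driver-specific counters, if available
ethtool -S eth0 | grep -i drop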

Is there a reason why you are running kernel 4.4.6? You're missing out on a lot of bug fixes in the memory management subsystem, and at least 100 security vulnerabilities in 4.4.6 have been fixed in later kernels.

Make sure your kernel is built with CONFIG_COMPACTION. Memory allocations can fail if memory is too fragmented, and without compaction the kernel can't do anything about that. ZFS also complicates things because it uses its own page cache. Reclaim of page cache pages is tricky even without ZFS. Make sure you use the most recent version of ZFS.
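
You can check that against the running kernel; a sketch, assuming /proc/config.gz is available (CONFIG_IKCONFIG_PROC=y), otherwise grep the .config under /usr/src/linux:

Code:
# should print CONFIG_COMPACTION=y
zgrep CONFIG_COMPACTION /proc/config.gz
# with compaction enabled you can also trigger it by hand as a test
echo 1 > /proc/sys/vm/compact_memory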
bunder
Bodhisattva


Joined: 10 Apr 2004
Posts: 5934

PostPosted: Sat Jul 01, 2017 6:08 am    Post subject:

if you're running zfs 0.6.5.x, you might consider upgrading to 0.7.0-rc3 or -rc4, as those include memory management improvements.
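
To check which ZFS version is currently loaded before deciding; a sketch, assuming the zfs module is already loaded:

Code:
cat /sys/module/zfs/version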