CanuteTheGreat n00b
Joined: 10 Feb 2007 Posts: 58 Location: Bellingham, WA, USA
|
Posted: Wed Mar 30, 2016 3:29 pm Post subject: [SOLVED] High server load, yet low CPU and disk utilization |
|
|
Hello!
I am trying to track down the cause of a relatively high load average on my servers. On the same test machines running Ubuntu 14.04 LTS the problem does not exist. The culprit is the TeamSpeak 3 server. I have tried teamspeak-server-bin 3.0.12.3 from Portage as well as versions from early 3.0.x up to 3.0.12.3 from the TeamSpeak site.
I have five test machines: two Amazon AWS EC2 instances (one t2.small and one t2.medium), a Xen-based VM hosted with VirtualHost (vr.org), and two locally hosted KVM-based VMs. All are configured similarly to the AWS instances. The two local KVM-based VMs and the VirtualHost VM were built from scratch using the latest minimal install ISO and stage3. The two AWS instances were built using the latest Pygoscelis Papua Linux AMIs. In all setups, the problem does not appear under Ubuntu.
Under Gentoo the load average is between 5.60 and 6.40 while idle with no TS3 connections. Under Ubuntu it is between 0.00 and 0.08 while idle with no TS3 connections. CPU usage is 99 to 100% idle, and disk I/O is zero or close to it while idle.
Unfortunately dumping TS3 isn't an option, so does anyone have any ideas on how to get the TS3 server to run reasonably on Gentoo?
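For anyone trying to reproduce this: on Linux the load average counts tasks that are runnable (state R) or in uninterruptible sleep (state D), not CPU time, so it can be high while the CPU is idle. Below is a minimal Python sketch, assuming a Linux /proc filesystem, that tallies thread states next to /proc/loadavg. The helper names parse_state and count_states are made up for illustration, and a single snapshot like this may well miss very short wakeups.

```python
import os
from collections import Counter


def parse_state(stat_line):
    """Return the one-letter task state from a /proc/<pid>/task/<tid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')' characters, so split after the LAST ')'.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def count_states(proc_root="/proc"):
    """Count threads per state (R, S, D, ...) across all processes."""
    counts = Counter()
    try:
        pids = [p for p in os.listdir(proc_root) if p.isdigit()]
    except OSError:
        return counts  # no readable /proc (hypothetical non-Linux host)
    for pid in pids:
        task_dir = os.path.join(proc_root, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as fh:
                    counts[parse_state(fh.read())] += 1
            except (OSError, ValueError, IndexError):
                continue  # thread vanished or line was malformed
    return counts


def main():
    try:
        with open("/proc/loadavg") as fh:
            print("loadavg:", fh.read().strip())
    except OSError:
        print("no /proc/loadavg here (not Linux?)")
        return
    states = count_states()
    print("R (runnable):", states["R"], " D (uninterruptible):", states["D"])


if __name__ == "__main__":
    main()
```

Running it a few times in a row on the idle Gentoo and Ubuntu machines would show whether the R or D counts differ between the two.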
Last edited by CanuteTheGreat on Mon May 09, 2016 10:53 pm; edited 1 time in total |
|
|
|
Tatsh Apprentice
Joined: 22 Jul 2007 Posts: 187
|
Posted: Mon Apr 04, 2016 7:29 am Post subject: |
|
|
What does strace show?
strace -ff -y -s 200 -p <PID of the parent process> |
|
|
|
CanuteTheGreat n00b
Joined: 10 Feb 2007 Posts: 58 Location: Bellingham, WA, USA
|
Posted: Mon Apr 04, 2016 2:23 pm Post subject: |
|
|
Code: |
# strace -ff -y -s 200 -p 17868
Process 17868 attached with 25 threads
[pid 17898] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 17897] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 17896] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 17895] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 17894] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 17893] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 17892] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 17891] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 17890] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 17889] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 17888] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 17887] epoll_wait(22<anon_inode:[eventpoll]>, <unfinished ...>
[pid 17886] futex(0xf672fc, FUTEX_WAIT_PRIVATE, 4, NULL <unfinished ...>
[pid 17885] futex(0xf672fc, FUTEX_WAIT_PRIVATE, 4, NULL <unfinished ...>
[pid 17884] futex(0xf672fc, FUTEX_WAIT_PRIVATE, 4, NULL <unfinished ...>
[pid 17883] futex(0xf672fc, FUTEX_WAIT_PRIVATE, 4, NULL <unfinished ...>
[pid 17882] epoll_wait(24<anon_inode:[eventpoll]>, <unfinished ...>
[pid 17881] gettimeofday( <unfinished ...>
[pid 17880] select(19, [18<socket:[2430008]>], NULL, [18<socket:[2430008]>], {0, 90936} <unfinished ...>
[pid 17879] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 17875] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 17874] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 17873] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 17871] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 17868] futex(0xdc73e4, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 17881] <... gettimeofday resumed> {1459779258, 379448}, NULL) = 0
[pid 17881] gettimeofday({1459779258, 379480}, NULL) = 0
[pid 17881] epoll_wait(16<anon_inode:[eventpoll]>, <unfinished ...>
[pid 17898] <... restart_syscall resumed> ) = 0
[pid 17897] <... restart_syscall resumed> ) = 0
[pid 17896] <... restart_syscall resumed> ) = 0
[pid 17895] <... restart_syscall resumed> ) = 0
[pid 17894] <... restart_syscall resumed> ) = 0
[pid 17893] <... restart_syscall resumed> ) = 0
[pid 17892] <... restart_syscall resumed> ) = 0
[pid 17891] <... restart_syscall resumed> ) = 0
[pid 17890] <... restart_syscall resumed> ) = 0
[pid 17889] <... restart_syscall resumed> ) = 0
[pid 17898] poll(0x7fd0b80008e0, 0, 10 <unfinished ...>
[pid 17897] poll(0xf7bed0, 0, 10 <unfinished ...>
[pid 17896] poll(0x7fd0b80008c0, 0, 10 <unfinished ...>
[pid 17895] poll(0x7fd0c40008c0, 0, 10 <unfinished ...>
[pid 17894] poll(0x7fd0c00008c0, 0, 10 <unfinished ...>
[pid 17893] poll(0x7fd0c40008e0, 0, 10 <unfinished ...>
[pid 17892] poll(0x7fd0c00008e0, 0, 10 <unfinished ...>
[pid 17891] poll(0x7fd0cc0008e0, 0, 10 <unfinished ...>
[pid 17890] poll(0x7fd0c80008e0, 0, 10 <unfinished ...>
[pid 17889] poll(0x7fd0d40008e0, 0, 10 <unfinished ...>
[pid 17898] <... poll resumed> ) = 0 (Timeout)
[pid 17897] <... poll resumed> ) = 0 (Timeout)
[pid 17896] <... poll resumed> ) = 0 (Timeout)
[pid 17895] <... poll resumed> ) = 0 (Timeout)
[pid 17894] <... poll resumed> ) = 0 (Timeout)
[pid 17893] <... poll resumed> ) = 0 (Timeout)
[pid 17892] <... poll resumed> ) = 0 (Timeout)
[pid 17891] <... poll resumed> ) = 0 (Timeout)
[pid 17890] <... poll resumed> ) = 0 (Timeout)
[pid 17889] <... poll resumed> ) = 0 (Timeout)
[pid 17888] <... restart_syscall resumed> ) = 0
[pid 17898] poll(0x7fd0b80008e0, 0, 10 <unfinished ...>
[pid 17897] poll(0xf7bed0, 0, 10 <unfinished ...>
[pid 17896] poll(0x7fd0b80008c0, 0, 10 <unfinished ...>
[pid 17895] poll(0x7fd0c40008c0, 0, 10 <unfinished ...>
[pid 17894] poll(0x7fd0c00008c0, 0, 10 <unfinished ...>
[pid 17893] poll(0x7fd0c40008e0, 0, 10 <unfinished ...>
[pid 17892] poll(0x7fd0c00008e0, 0, 10 <unfinished ...>
[pid 17891] poll(0x7fd0cc0008e0, 0, 10 <unfinished ...>
[pid 17890] poll(0x7fd0c80008e0, 0, 10 <unfinished ...>
[pid 17889] poll(0x7fd0d40008e0, 0, 10 <unfinished ...>
[pid 17888] poll([{fd=26<socket:[2430014]>, events=POLLIN}], 1, 100 <unfinished ...>
[pid 17879] <... restart_syscall resumed> ) = 0
[pid 17875] <... restart_syscall resumed> ) = 0
[pid 17874] <... restart_syscall resumed> ) = 0
[pid 17873] <... restart_syscall resumed> ) = 0
[pid 17879] gettimeofday( <unfinished ...>
[pid 17875] nanosleep({0, 100000000}, <unfinished ...>
[pid 17874] nanosleep({0, 100000000}, <unfinished ...>
[pid 17873] nanosleep({0, 100000000}, <unfinished ...>
[pid 17879] <... gettimeofday resumed> {1459779258, 420121}, NULL) = 0
[pid 17879] nanosleep({0, 100000000}, <unfinished ...>
[pid 17898] <... poll resumed> ) = 0 (Timeout)
[pid 17897] <... poll resumed> ) = 0 (Timeout)
[pid 17896] <... poll resumed> ) = 0 (Timeout)
[pid 17895] <... poll resumed> ) = 0 (Timeout)
[pid 17894] <... poll resumed> ) = 0 (Timeout)
[pid 17893] <... poll resumed> ) = 0 (Timeout)
[pid 17892] <... poll resumed> ) = 0 (Timeout)
[pid 17891] <... poll resumed> ) = 0 (Timeout)
[pid 17890] <... poll resumed> ) = 0 (Timeout)
[pid 17889] <... poll resumed> ) = 0 (Timeout)
[pid 17898] poll(0x7fd0b80008e0, 0, 10 <unfinished ...>
[pid 17897] poll(0xf7bed0, 0, 10 <unfinished ...>
[pid 17896] poll(0x7fd0b80008c0, 0, 10 <unfinished ...>
[pid 17895] poll(0x7fd0c40008c0, 0, 10 <unfinished ...>
[pid 17894] poll(0x7fd0c00008c0, 0, 10 <unfinished ...>
[pid 17893] poll(0x7fd0c40008e0, 0, 10 <unfinished ...>
[pid 17892] poll(0x7fd0c00008e0, 0, 10 <unfinished ...>
[pid 17891] poll(0x7fd0cc0008e0, 0, 10 <unfinished ...>
[pid 17890] poll(0x7fd0c80008e0, 0, 10 <unfinished ...>
[pid 17889] poll(0x7fd0d40008e0, 0, 10 <unfinished ...>
...snip... |
Larger sample: https://www.dropbox.com/s/5t2wsd48brhubbi/ts.log?dl=0 |
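In the trace above, ten threads (17889 to 17898) repeatedly call poll() with zero file descriptors and a 10 ms timeout, which works out to roughly 100 wakeups per second per thread while idle. A rough Python sketch for summarising such a log per syscall follows. The names count_calls and threads_per_syscall are illustrative, and the regular expression only assumes the "[pid N] name(" line shape seen in this thread.

```python
import re
from collections import Counter

# Matches lines where a syscall starts, e.g. "[pid 17898] poll(0x7fd0...".
# Lines such as "[pid 17898] <... poll resumed> ..." are deliberately skipped.
CALL_START = re.compile(r"^\[pid\s+(\d+)\]\s+(\w+)\(")

# A few lines copied from the trace posted above.
SAMPLE = """\
[pid 17898] poll(0x7fd0b80008e0, 0, 10 <unfinished ...>
[pid 17897] poll(0xf7bed0, 0, 10 <unfinished ...>
[pid 17898] <... poll resumed> ) = 0 (Timeout)
[pid 17897] <... poll resumed> ) = 0 (Timeout)
[pid 17898] poll(0x7fd0b80008e0, 0, 10 <unfinished ...>
[pid 17875] nanosleep({0, 100000000}, <unfinished ...>
"""


def count_calls(lines):
    """Count started syscalls per (thread id, syscall name)."""
    counts = Counter()
    for line in lines:
        match = CALL_START.match(line.strip())
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts


def threads_per_syscall(counts):
    """Map each syscall name to the number of distinct threads issuing it."""
    threads = {}
    for tid, name in counts:
        threads.setdefault(name, set()).add(tid)
    return {name: len(tids) for name, tids in threads.items()}


if __name__ == "__main__":
    calls = count_calls(SAMPLE.splitlines())
    for name, nthreads in sorted(threads_per_syscall(calls).items()):
        total = sum(n for (_, s), n in calls.items() if s == name)
        print(f"{name}: {total} calls from {nthreads} thread(s)")
```

To use it on a full capture, pass the lines of the saved log (for example an open file object) to count_calls instead of SAMPLE.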
|
|
|
Tatsh Apprentice
Joined: 22 Jul 2007 Posts: 187
|
Posted: Mon Apr 04, 2016 6:31 pm Post subject: |
|
|
What does strace look like on the good machine? And does it output as much data in a very short span of time? |
|
|
|
CanuteTheGreat n00b
Joined: 10 Feb 2007 Posts: 58 Location: Bellingham, WA, USA
|
Posted: Mon Apr 04, 2016 7:12 pm Post subject: |
|
|
It appears to produce output at about the same rate, maybe a little faster, on the good machine.
Code: |
strace -ff -y -s 200 -p 1115
Process 1115 attached with 25 threads
[pid 1142] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 1141] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 1140] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 1139] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 1138] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 1137] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 1136] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 1135] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 1134] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 1133] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 1132] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 1131] epoll_wait(22<anon_inode:[eventpoll]>, <unfinished ...>
[pid 1130] futex(0x106e44c, FUTEX_WAIT_PRIVATE, 4, NULL <unfinished ...>
[pid 1129] futex(0x106e44c, FUTEX_WAIT_PRIVATE, 4, NULL <unfinished ...>
[pid 1128] futex(0x106e44c, FUTEX_WAIT_PRIVATE, 4, NULL <unfinished ...>
[pid 1127] futex(0x106e44c, FUTEX_WAIT_PRIVATE, 4, NULL <unfinished ...>
[pid 1126] epoll_wait(23<anon_inode:[eventpoll]>, <unfinished ...>
[pid 1125] gettimeofday( <unfinished ...>
[pid 1124] select(19, [18<socket:[12378]>], NULL, [18<socket:[12378]>], {0, 8820} <unfinished ...>
[pid 1125] <... gettimeofday resumed> {1459796821, 533470}, NULL) = 0
[pid 1125] gettimeofday({1459796821, 533548}, NULL) = 0
[pid 1125] epoll_wait(16<anon_inode:[eventpoll]>, <unfinished ...>
[pid 1123] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 1120] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 1119] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 1118] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 1117] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 1115] futex(0xdc73e4, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 1142] <... restart_syscall resumed> ) = 0
[pid 1142] poll(0x7f42dc0008e0, 0, 10 <unfinished ...>
[pid 1141] <... restart_syscall resumed> ) = 0
[pid 1141] poll(0x7f42d00008e0, 0, 10 <unfinished ...>
|
Larger sample: https://www.dropbox.com/s/ewlg2y6a9xb34fe/ts3-ubuntu.log?dl=0 |
|
|
|
CanuteTheGreat n00b
Joined: 10 Feb 2007 Posts: 58 Location: Bellingham, WA, USA
|
Posted: Mon May 09, 2016 10:53 pm Post subject: |
|
|
I finally found the solution by comparing a .config from Ubuntu 14.04 LTS with a few different ones from my Gentoo servers, plus a bit of trial and error. The key is switching to a tickless kernel. Specifically, these settings work:
Code: |
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
# CONFIG_NO_HZ_IDLE is not set
CONFIG_NO_HZ_FULL=y
# CONFIG_NO_HZ_FULL_ALL is not set
CONFIG_NO_HZ_FULL_SYSIDLE=y
CONFIG_NO_HZ_FULL_SYSIDLE_SMALL=8
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
|
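To check which tick mode a given kernel config selects, a small Python sketch along these lines may help. It parses .config-style text and reports which of CONFIG_NO_HZ_FULL, CONFIG_NO_HZ_IDLE or CONFIG_HZ_PERIODIC is enabled. The function names parse_kconfig and tick_mode are illustrative, and the embedded text is simply the config fragment posted above.

```python
import re

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")

# The timer settings from the post above.
POSTED = """\
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
# CONFIG_NO_HZ_IDLE is not set
CONFIG_NO_HZ_FULL=y
# CONFIG_NO_HZ_FULL_ALL is not set
CONFIG_NO_HZ_FULL_SYSIDLE=y
CONFIG_NO_HZ_FULL_SYSIDLE_SMALL=8
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
"""


def parse_kconfig(text):
    """Return {option: value}; options marked 'is not set' map to 'n'."""
    opts = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = SET_RE.match(line)
        if match:
            opts[match.group(1)] = match.group(2)
            continue
        match = UNSET_RE.match(line)
        if match:
            opts[match.group(1)] = "n"
    return opts


def tick_mode(opts):
    """Return the name of the enabled tick-mode option, or None."""
    for name in ("CONFIG_NO_HZ_FULL", "CONFIG_NO_HZ_IDLE", "CONFIG_HZ_PERIODIC"):
        if opts.get(name) == "y":
            return name
    return None


if __name__ == "__main__":
    print("tick mode:", tick_mode(parse_kconfig(POSTED)))
```

On a running system the config text could instead be read from /proc/config.gz with gzip.open (only present if the kernel was built with CONFIG_IKCONFIG_PROC) or from the .config in the kernel source tree.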
|
|
|
|
|
|
|
|