Mike81 n00b
Joined: 05 Jan 2011 Posts: 39
Posted: Mon Jan 28, 2013 12:49 pm Post subject: arp problem |
Hi,
well, previously I thought this was just a Gentoo problem, but I can now reproduce it on both Debian and Gentoo. It is not easy to reproduce: an empty ARP table is not enough, I also have to wait ~10 min without any traffic to the gateway. I'll update the thread:
We are experiencing the following problem in our network:
The systems haven't sent any packets to the router for a while (I am logged in to these systems via SSH over the LAN).
If you now view the ARP table, it looks like this:
Code: | gentoo-test ~ # arp -a -n
? (192.168.5.10) at 8c:89:a5:XX:XX:XX [ether] on eth0 |
192.168.5.10 is the Windows system I am connected from via SSH. There is no entry for the router (192.168.5.254).
If you then ping google.com, for example, it takes some time to get a reply, and the first packets fail:
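To check the gateway entry from a script instead of reading the `arp` output by eye, something like the following minimal Python sketch could be used. It assumes the usual Linux `/proc/net/arp` layout (six whitespace-separated columns, flags in hex with 0x2 = complete and 0x4 = permanent). The function names and the MAC addresses in the sample table are placeholders of mine, not taken from the affected systems.

```python
def read_arp_table(path="/proc/net/arp"):
    """Read the kernel ARP table as text (Linux only)."""
    with open(path) as handle:
        return handle.read()


def gateway_arp_state(arp_table_text, gateway_ip):
    """Classify the entry for gateway_ip in /proc/net/arp style text.

    Returns 'missing', 'incomplete', 'dynamic' or 'static'.
    """
    # Skip the header line, then look for the row of the gateway.
    for row in arp_table_text.splitlines()[1:]:
        fields = row.split()
        if len(fields) < 6 or fields[0] != gateway_ip:
            continue
        flags = int(fields[2], 16)
        if flags & 0x4:      # ATF_PERM: entry was set manually
            return "static"
        if flags & 0x2:      # ATF_COM: entry is resolved
            return "dynamic"
        return "incomplete"  # resolution in progress or failed
    return "missing"


# Sample table with placeholder MAC addresses (assumption, for illustration).
SAMPLE_TABLE = """\
IP address       HW type     Flags       HW address            Mask     Device
192.168.5.10     0x1         0x2         8c:89:a5:00:00:00     *        eth0
192.168.5.20     0x1         0x0         00:00:00:00:00:00     *        eth0
192.168.5.30     0x1         0x6         bc:05:43:00:00:00     *        eth0
"""

print(gateway_arp_state(SAMPLE_TABLE, "192.168.5.254"))
```

On a real system, `gateway_arp_state(read_arp_table(), "192.168.5.254")` could be polled to see when the router entry disappears.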
Code: | gentoo-test ~ # ping google.com
PING google.com (173.194.35.128) 56(84) bytes of data.
From gentoo-test.intern (192.168.5.147): icmp_seq=1 Destination Host Unreachable
From gentoo-test.intern (192.168.5.147): icmp_seq=2 Destination Host Unreachable
From gentoo-test.intern (192.168.5.147): icmp_seq=3 Destination Host Unreachable
From gentoo-test.intern (192.168.5.147): icmp_seq=4 Destination Host Unreachable
From gentoo-test.intern (192.168.5.147): icmp_seq=5 Destination Host Unreachable
From gentoo-test.intern (192.168.5.147): icmp_seq=6 Destination Host Unreachable
From gentoo-test.intern (192.168.5.147): icmp_seq=7 Destination Host Unreachable
From gentoo-test.intern (192.168.5.147): icmp_seq=8 Destination Host Unreachable
From gentoo-test.intern (192.168.5.147): icmp_seq=9 Destination Host Unreachable
64 bytes from muc03s01-in-f0.1e100.net (173.194.35.128): icmp_seq=10 ttl=58 time=16.5 ms
64 bytes from muc03s01-in-f0.1e100.net (173.194.35.128): icmp_seq=11 ttl=58 time=15.8 ms
64 bytes from muc03s01-in-f0.1e100.net (173.194.35.128): icmp_seq=12 ttl=58 time=16.1 ms
^C
--- google.com ping statistics ---
12 packets transmitted, 3 received, +9 errors, 75% packet loss, time 11128ms
rtt min/avg/max/mdev = 15.849/16.167/16.525/0.313 ms, pipe 4 |
While pinging, tcpdump on gentoo-test captures:
Code: | 12:55:28.710420 00:0c:29:XX:XX:XX > ff:ff:ff:ff:ff:ff, ARP, length 42: Request who-has 192.168.5.254 tell 192.168.5.147, length 28
12:55:29.711018 00:0c:29:XX:XX:XX > ff:ff:ff:ff:ff:ff, ARP, length 42: Request who-has 192.168.5.254 tell 192.168.5.147, length 28
12:55:30.709549 00:0c:29:XX:XX:XX > ff:ff:ff:ff:ff:ff, ARP, length 42: Request who-has 192.168.5.254 tell 192.168.5.147, length 28
12:55:31.739028 00:0c:29:XX:XX:XX > ff:ff:ff:ff:ff:ff, ARP, length 42: Request who-has 192.168.5.254 tell 192.168.5.147, length 28
12:55:32.737369 00:0c:29:XX:XX:XX > ff:ff:ff:ff:ff:ff, ARP, length 42: Request who-has 192.168.5.254 tell 192.168.5.147, length 28
12:55:33.751372 00:0c:29:XX:XX:XX > ff:ff:ff:ff:ff:ff, ARP, length 42: Request who-has 192.168.5.254 tell 192.168.5.147, length 28
12:55:34.781021 00:0c:29:XX:XX:XX > ff:ff:ff:ff:ff:ff, ARP, length 42: Request who-has 192.168.5.254 tell 192.168.5.147, length 28
12:55:35.779318 00:0c:29:XX:XX:XX > ff:ff:ff:ff:ff:ff, ARP, length 42: Request who-has 192.168.5.254 tell 192.168.5.147, length 28
12:55:36.779004 00:0c:29:XX:XX:XX > ff:ff:ff:ff:ff:ff, ARP, length 42: Request who-has 192.168.5.254 tell 192.168.5.147, length 28
12:55:37.822902 00:0c:29:XX:XX:XX > ff:ff:ff:ff:ff:ff, ARP, length 42: Request who-has 192.168.5.254 tell 192.168.5.147, length 28
12:55:38.821268 00:0c:29:XX:XX:XX > ff:ff:ff:ff:ff:ff, ARP, length 42: Request who-has 192.168.5.254 tell 192.168.5.147, length 28
12:55:38.821894 bc:05:43:XX:XX:XX > 00:0c:29:XX:XX:XX, ARP, length 60: Reply 192.168.5.254 is-at bc:05:43:XX:XX:XX, length 46 |
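To put a number on the delay in the capture above, here is a minimal Python sketch that counts the ARP requests for one address and measures the time from the first request to the first reply. It assumes the tcpdump line format shown above (time of day first, then "who-has <ip>" or "Reply <ip> is-at"), and it ignores captures that cross midnight. The function name is my own, and the sample contains only the first request, the last request and the reply.

```python
from datetime import datetime


def arp_resolution_delay(lines, target_ip):
    """Return (request_count, seconds from first request to first reply).

    The delay is None if no reply for target_ip follows a request.
    """
    first_request = None
    requests = 0
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        try:
            # tcpdump prints the time of day with microseconds.
            stamp = datetime.strptime(parts[0], "%H:%M:%S.%f")
        except ValueError:
            continue  # not a packet line
        if f"who-has {target_ip} " in line:
            requests += 1
            if first_request is None:
                first_request = stamp
        elif f"Reply {target_ip} is-at" in line and first_request is not None:
            return requests, (stamp - first_request).total_seconds()
    return requests, None


# Shortened sample taken from the capture above (MAC addresses stay masked).
SAMPLE = """\
12:55:28.710420 00:0c:29:XX:XX:XX > ff:ff:ff:ff:ff:ff, ARP, length 42: Request who-has 192.168.5.254 tell 192.168.5.147, length 28
12:55:38.821268 00:0c:29:XX:XX:XX > ff:ff:ff:ff:ff:ff, ARP, length 42: Request who-has 192.168.5.254 tell 192.168.5.147, length 28
12:55:38.821894 bc:05:43:XX:XX:XX > 00:0c:29:XX:XX:XX, ARP, length 60: Reply 192.168.5.254 is-at bc:05:43:XX:XX:XX, length 46
"""

count, delay = arp_resolution_delay(SAMPLE.splitlines(), "192.168.5.254")
print(f"{count} requests, reply after {delay:.3f} s")
```

Fed with the full capture, this would count 11 requests and a delay of roughly 10.1 s before the router answers. The reply itself arrives less than 1 ms after the last request.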
We see this on almost every system in our network (although it is hard to make sure that there is no traffic for a while). Once the ARP entry for the router has timed out, it takes some time to get it back.
So, for example, you can be sure that the first connection attempt to any server on the internet will fail with an error like "no route to host". The second and all further attempts will succeed until the ARP entry is removed again (through a normal timeout caused by inactivity).
So my question is:
What could be the reason for that?
On most systems we never noticed the problem, because there is "always" traffic between the system and the router, so the ARP entry never times out. But when we make sure that there is no traffic for ~10-15 min, the next attempt takes some time. We are currently trying to reproduce it on some Windows systems, too.
What could it be?
What could we do to prevent it? Set a static ARP entry? Ping the gateway periodically to prevent a timeout? That sounds like a bad hack...
Could somebody confirm that this isn't normal, and that an idle system should get the ARP entry within milliseconds? |