Gentoo Forums
AMD64 + Network Layer [SOLVED]
Forum: Gentoo on AMD64
omschaub
Tux's lil' helper


Joined: 03 Sep 2004
Posts: 132
Location: Roanoke, VA

Posted: Thu Mar 24, 2005 5:42 pm    Post subject: AMD64 + Network Layer [SOLVED]

Hello fellow AMD64-ers,

I am having network problems and am looking for advice.

My underlying problem is best demonstrated by running Azureus. The newest (CVS) version of Azureus can detect "network layer" issues (as one of the Azureus devs calls them), but my problem is more widespread than that. First, though, here is his comment on my problem from the Azureus forum:
Quote:
That message is shown when socket selector spin is detected, and it's pretty hard to trip it off without there actually being a problem. Without getting into specifics, a socket selector is a high-speed way of knowing when a socket is ready to be written to / read from (à la poll()). In this case, to warrant a popup alert, a select op would have to return prematurely, without having selected any channels for readiness, 10,000 times in a row. The detection was added because under Windows (under the JRE 1.4 series in particular, it seemed), when the underlying OS network transport was reset (say, when the network cable is unplugged and then replugged), it caused the selector to spin profusely, causing 100% CPU issues. Now, I don't really know why you would be seeing this problem under JRE 1.5 (where the spin bug was supposedly fixed) or under Linux, but since you are seeing the alerts, I'd have to guess something IS wrong, as the chances of a false positive are extremely low imho. The first thing I'd check is whether the OS network layer is behaving. I'd suggest running a continuous ping in the background ('ping -t google.com' under Windows) or some long-term file transfer, to see if they stall at the same time Az throws that selector error.
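The detection rule described in the quote can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not Azureus's actual code (Azureus itself is written in Java): `SpinDetector` and `select_once` are hypothetical names, and the rule assumed here is that an empty `select()` returning before its timeout counts as one spin, anything else resets the count, and 10,000 consecutive spins trip the alert.

```python
import selectors
import socket
import time

SPIN_THRESHOLD = 10_000  # consecutive premature empty selects, per the quote


class SpinDetector:
    """Hypothetical helper: counts select() calls that return early with nothing ready."""

    def __init__(self, threshold: int = SPIN_THRESHOLD) -> None:
        self.threshold = threshold
        self.consecutive = 0

    def record(self, ready_count: int, elapsed: float, timeout: float) -> bool:
        """Record one select() result; return True once the spin threshold is reached."""
        if ready_count == 0 and elapsed < timeout:
            self.consecutive += 1  # woke up early with no ready channels
        else:
            self.consecutive = 0  # real readiness or a genuine timeout resets the run
        return self.consecutive >= self.threshold


def select_once(sel: selectors.BaseSelector, detector: SpinDetector, timeout: float = 0.5):
    """Run one select() pass and feed its outcome to the detector."""
    start = time.monotonic()
    events = sel.select(timeout)
    elapsed = time.monotonic() - start
    return events, detector.record(len(events), elapsed, timeout)


if __name__ == "__main__":
    # A connected socket pair stands in for a real network connection.
    a, b = socket.socketpair()
    sel = selectors.DefaultSelector()
    sel.register(a, selectors.EVENT_READ)
    b.sendall(b"ping")  # makes `a` readable, so select() returns one event
    events, spinning = select_once(sel, SpinDetector())
    print(len(events), spinning)
    sel.close()
    a.close()
    b.close()
```

A genuine timeout (nothing ready after the full timeout) is deliberately not counted, since an idle connection is not a fault; only early, empty wake-ups add up.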


Now, my own description of the issue:
I have both the Sun JRE 1.5.0 and the Blackdown one installed, and both give the same problem, but the issue is more widespread than that. Connections are being dropped by pretty much any 'high connection' program. For example, in KLibido one or more of the threads just stalls until a reset happens, and when Liferea reads my many RSS feeds, one or two will stall and error out until the next update.

As the Azureus dev suggested, I can ping google till the cows come home and still see 0% packet loss. I can transfer HUGE files back and forth with no real issues, but large 'multi-connect' programs are always throwing errors.

I have an Intel Gigabit controller card (e1000) and a nice hub. Incidentally, I am not seeing this on any of the other computers connected to my cable modem, just my AMD64.

So now my questions:
1. How can I go about testing for this problem?
2. Are there programs that can hunt down these types of issues, or logs that display errors in this area?
3. This seems to be a new problem: I have had this AMD64 for 9 months or so, and the problem has only appeared lately. Could it be a new kernel version? I am running dev sources 2.6.11-r4.

Thanks in advance for any ideas in helping me hunt this issue down.
_________________
Support bacteria -- it's the only culture some people have!


Last edited by omschaub on Fri Mar 25, 2005 12:54 pm; edited 1 time in total
adaptr
Watchman


Joined: 06 Oct 2002
Posts: 6730
Location: Rotterdam, Netherlands

Posted: Thu Mar 24, 2005 8:24 pm

You say huge transfers are not a problem, but large numbers of connections are.
An easy way to reproduce this is to flood-ping something; it may as well be on your own hub so you don't piss people off ;-)
Code:
ping -f -c 10000 a.b.c.d

If it completes (it should finish within a second), then the raw number of packets is not the issue.
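To repeat the flood test from a script, a thin wrapper around the same command might look like this. It assumes the Linux iputils `ping`, where `-f` normally requires root; `flood_ping_cmd` and `run_flood_ping` are hypothetical helper names, and the 60-second timeout is an arbitrary choice.

```python
import subprocess


def flood_ping_cmd(host: str, count: int = 10_000) -> list[str]:
    """Build the same command as above: ping -f -c COUNT HOST."""
    return ["ping", "-f", "-c", str(count), host]


def run_flood_ping(host: str, count: int = 10_000, timeout_s: float = 60.0) -> str:
    """Run the flood ping and return its text output (usually needs root for -f)."""
    result = subprocess.run(
        flood_ping_cmd(host, count),
        capture_output=True,
        text=True,
        timeout=timeout_s,  # raises subprocess.TimeoutExpired if the run hangs
    )
    # No check=True: the statistics are returned even if ping exits non-zero.
    return result.stdout
```

Usage would be something like `print(run_flood_ping("a.b.c.d"))` with a host on your own LAN in place of the placeholder.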
Why am I even suggesting this?
Well.. you have a GigE card connected to a hub.

Two more disparate pieces of networking equipment would be hard to find ;-)

In short: a hub does not forward packets very fast compared to a switch (depends on quality as well, of course) and GigE can obviously flood the hub's internal buffers in under a millisecond.
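To put a number on "under a millisecond": a gigabit link carries 125,000 bytes per millisecond. Counting a full-size Ethernet frame as 1,538 bytes on the wire (1,500-byte payload, 18 bytes of header and FCS, 8 bytes of preamble and 12 bytes of inter-frame gap), that is about 81 full frames per millisecond, versus about 8 at 100 Mbit/s. The helper names below are illustrative, and how much buffering any particular hub or switch has is not stated in this thread.

```python
ETH_OVERHEAD_BYTES = 14 + 4 + 8 + 12  # header + FCS + preamble/SFD + inter-frame gap


def bytes_per_ms(link_bps: int) -> int:
    """Bytes a link can carry in one millisecond."""
    return link_bps // 8 // 1000


def full_frames_per_ms(link_bps: int, mtu: int = 1500) -> int:
    """Whole MTU-sized frames that fit on the wire in one millisecond."""
    bits_per_frame = (mtu + ETH_OVERHEAD_BYTES) * 8
    return (link_bps // 1000) // bits_per_frame


for name, bps in [("100 Mbit/s", 100_000_000), ("1 Gbit/s", 1_000_000_000)]:
    print(name, bytes_per_ms(bps), full_frames_per_ms(bps))
# 100 Mbit/s 12500 8
# 1 Gbit/s 125000 81
```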

You don't have jumbo frames enabled by any chance? ;-)
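To answer the jumbo-frame question on a Linux box, one option is to read the interface MTU from sysfs (`/sys/class/net/<iface>/mtu`); anything above the standard 1500-byte Ethernet MTU means jumbo frames are enabled. This is a sketch: `parse_mtu`, `read_mtu` and `jumbo_enabled` are hypothetical helpers, and the interface name `eth0` is an assumption.

```python
from pathlib import Path

STANDARD_ETHERNET_MTU = 1500  # bytes; larger MTUs are commonly called jumbo frames


def parse_mtu(text: str) -> int:
    """Parse the text of a sysfs mtu file (a number plus a trailing newline)."""
    return int(text.strip())


def read_mtu(iface: str, sysfs_root: str = "/sys/class/net") -> int:
    """Read an interface's MTU from Linux sysfs."""
    return parse_mtu(Path(sysfs_root, iface, "mtu").read_text())


def jumbo_enabled(mtu: int) -> bool:
    """True if the MTU is above the standard Ethernet payload size."""
    return mtu > STANDARD_ETHERNET_MTU


if __name__ == "__main__":
    try:
        mtu = read_mtu("eth0")  # assumed interface name; adjust as needed
    except OSError as exc:
        print(f"could not read MTU: {exc}")
    else:
        print(mtu, "jumbo frames enabled" if jumbo_enabled(mtu) else "standard MTU")
```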
_________________
>>> emerge (3 of 7) mcse/70-293 to /
Essential tools: gentoolkit eix profuse screen
omschaub

Posted: Thu Mar 24, 2005 9:11 pm

Hub was probably a poor choice of words. I have a D-Link DGS-1005D Gigabit Switch handling my 4 Gigabit-enabled computers. That is connected to a Zonet ZFS3016 16-port Ethernet switch, which also has my other computers and wireless AP on it, as well as my main router, a PC running M0n0wall.

I tried your ping command -- even adding a zero :) and got this:

Quote:
ping -f -c 100000 192.168.1.197
PING 192.168.1.197 (192.168.1.197) 56(84) bytes of data.

--- 192.168.1.197 ping statistics ---
100000 packets transmitted, 100000 received, 0% packet loss, time 13551ms
rtt min/avg/max/mdev = 0.042/0.113/1.281/0.023 ms, ipg/ewma 0.135/0.115 ms


To a wireless computer:
Quote:
ping -f -c 100000 192.168.1.195
PING 192.168.1.195 (192.168.1.195) 56(84) bytes of data.

--- 192.168.1.195 ping statistics ---
100000 packets transmitted, 100000 received, 0% packet loss, time 123665ms
rtt min/avg/max/mdev = 0.785/1.182/9.817/0.535 ms, ipg/ewma 1.236/1.019 ms

and to the router:
Quote:
ping -f -c 10000 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.

--- 192.168.1.1 ping statistics ---
10000 packets transmitted, 7848 received, 21% packet loss, time 39574ms
rtt min/avg/max/mdev = 0.092/0.106/0.736/0.040 ms, ipg/ewma 3.957/0.102 ms
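The summary lines above are easy to check mechanically. The sketch below (hypothetical helper `parse_ping_summary`, assuming the Linux iputils output format quoted in this thread) pulls out the packet counts and total time, then derives the loss and the average time per packet. For the router run it gives 2,152 lost packets out of 10,000, i.e. 21.52%, which ping reported rounded down as 21%, and about 3.96 ms per packet, against roughly 0.14 ms for the host on the gigabit switch and 1.24 ms for the wireless one. These agree closely with the ipg figures ping itself printed.

```python
import re
from dataclasses import dataclass

SUMMARY_RE = re.compile(
    r"(?P<tx>\d+) packets transmitted, (?P<rx>\d+) received.*?time (?P<ms>\d+)ms"
)


@dataclass
class PingSummary:
    transmitted: int
    received: int
    time_ms: int

    @property
    def loss_pct(self) -> float:
        """Exact loss percentage (ping's own summary may round this down)."""
        return 100.0 * (self.transmitted - self.received) / self.transmitted

    @property
    def ms_per_packet(self) -> float:
        """Average time per transmitted packet over the whole run."""
        return self.time_ms / self.transmitted


def parse_ping_summary(line: str) -> PingSummary:
    """Extract counts and total time from a ping statistics line."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError(f"not a ping summary line: {line!r}")
    return PingSummary(int(m["tx"]), int(m["rx"]), int(m["ms"]))


runs = {
    "192.168.1.197 (gigabit switch)": "100000 packets transmitted, 100000 received, 0% packet loss, time 13551ms",
    "192.168.1.195 (wireless)": "100000 packets transmitted, 100000 received, 0% packet loss, time 123665ms",
    "192.168.1.1 (router)": "10000 packets transmitted, 7848 received, 21% packet loss, time 39574ms",
}
for host, line in runs.items():
    s = parse_ping_summary(line)
    print(f"{host}: {s.loss_pct:.2f}% loss, {s.ms_per_packet:.3f} ms/packet")
```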


So! Thanks to your cute trick, it seems we have found my problem. Now... ugh... is it the Zonet, the M0n0wall PC, or its LAN-side network card? :?:

But many thanks for giving me a push in the right direction...

Now, any hints on how to hunt this down? Start swapping out equipment, etc.?
adaptr

Posted: Thu Mar 24, 2005 9:45 pm

Well... I've never heard of Zonet, so that might be a good place to start, yeah...
Also check the NIC in the m0n0wall box; swap it for a decent 3Com if possible.
Depending on the box, m0n0wall may not be able to keep up with that amount of traffic.
I know it is purported to run on as little as a 486, but don't expect that to properly handle a 4 Mbit DSL line...

As an aside, we just got our new WatchGuard Firebox X1000 at work ;-)
P3-1200 / 256MB / 6 LAN ifaces / 50 Mbit/s VPN throughput... oh boy would you want that!

But it would be much (much) simpler to just switch some cables around and test all the gigabit equipment on the Zonet switch; if that also fails, then you know where the problem lies...

"Poor choice of words" indeed!
All hubs should be burned.
omschaub

Posted: Fri Mar 25, 2005 12:53 pm

OK, I replaced the NIC and cable in the M0n0wall box and bypassed the Zonet switch... and still got 24% packet loss. So I got to thinking and asked my brother to do the same test on his home LAN (he also uses M0n0wall)... 24% packet loss! I am not sure exactly why, but even on a 1 GHz machine with 512 MB RAM, M0n0wall limits and drops packets if they come at it too fast. So I switched over to Clark Connect Home (which uses RH Fedora Core 2) and ~poof~ all my problems went away! I ran the ping command and got 0% packet loss, and Azureus no longer gives me the Socket Disconnect Errors. Man, I feel I should go to the m0n0wall forums and post something, but after discovering Clark Connect and its MANY features, I will never turn back!
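The behaviour described here, answering packets only up to some rate and silently dropping the rest, can be illustrated with a simple token-bucket model. This is a toy sketch: nothing in the thread says m0n0wall uses a token bucket, `answered_packets` is a hypothetical helper, and the rate and burst values are arbitrary assumptions chosen only to show the effect. Packets arrive at a fixed interval, each answer spends one token, and tokens refill at `rate_per_s` up to `burst`.

```python
def answered_packets(interval_us: int, n_packets: int, rate_per_s: float, burst: int) -> int:
    """Count how many evenly spaced packets a rate-limited responder would answer."""
    tokens = float(burst)  # start with a full bucket
    last_us = 0
    answered = 0
    for i in range(n_packets):
        now_us = i * interval_us
        # Refill tokens for the time since the previous packet, capped at the bucket size.
        tokens = min(burst, tokens + (now_us - last_us) * rate_per_s / 1_000_000)
        last_us = now_us
        if tokens >= 1.0:
            tokens -= 1.0
            answered += 1  # responder has capacity: packet is answered
        # otherwise the packet is dropped, which ping would report as loss
    return answered


if __name__ == "__main__":
    n = 10_000
    for interval_us in (10_000, 1_000, 100):  # 100, 1,000 and 10,000 packets per second
        got = answered_packets(interval_us, n, rate_per_s=200, burst=10)
        print(f"{1_000_000 // interval_us:>6} pkt/s: {100 * (n - got) / n:.1f}% loss")
```

Because the model ignores how flood ping paces itself (it waits for replies before sending more), its loss figures are not expected to match the 21 to 24% seen in this thread; it only shows why loss appears once packets arrive faster than the responder will answer them.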

adaptr, thanks so much for the help and hand-holding! It is really nice to have my network back. I will mark this solved :D
All times are GMT