Gentoo Forums
what causes tcp checksum errors [SOLVED]
msutton
n00b

Joined: 20 Jan 2005
Posts: 56

Posted: Fri Sep 09, 2005 11:43 pm    Post subject: what causes tcp checksum errors [SOLVED]

I am running iptables as my primary firewall.
INPUT and FORWARD always jump to a chain called Firewall.
OUTPUT is set to always ACCEPT.
Inside the Firewall chain I have
-A Firewall -m state --state RELATED,ESTABLISHED -j ACCEPT

and I have a NAT rule
-A POSTROUTING -s 192.168.1.0/255.255.255.0 -j MASQUERADE

Now when I am browsing the web I get a whole bunch of DROP lines in my log, like so:

Sep 9 16:21:20 [kernel] [IPTABLES DROP] : IN=eth1 OUT= MAC=<MAC> SRC=<SRC> DST=<MY EXT IP> LEN=1500 TOS=0x00 PREC=0x00 TTL=50 ID=12331 DF PROTO=TCP SPT=443 DPT=3837 WINDOW=17520 RES=0x00 ACK URGP=0

This is of course an example of HTTPS, but it also happens with HTTP and rsync, and when I connect with gaim.
Why is that rule not working all the time?
It isn't dropping all ESTABLISHED/RELATED packets, or I would not be able to connect to anything, because my iptables rule set is DROP based.

Any help would be appreciated.


Last edited by msutton on Thu Sep 22, 2005 7:21 pm; edited 3 times in total

buzzin
Apprentice

Joined: 17 Oct 2003
Posts: 264
Location: St. Albans, UK.

Posted: Fri Sep 09, 2005 11:59 pm

I think you need a rule for the NEW state, e.g.:

Code:
 iptables -A block -m state --state NEW ! -i $EXTIF -j ACCEPT


IMHO it is better to make a separate chain for the states. Below is an example script you could edit; it works OK for me.

Code:

#!/bin/sh
#
# Stateful NAT firewall sketch: drop INVALID, accept ESTABLISHED/RELATED,
# and accept NEW connections unless they arrive on the external interface.

echo "   enabling forwarding.."
echo "1" > /proc/sys/net/ipv4/ip_forward

# outbound (external) interface
EXTIF="eth0"
# inbound (internal) interface; not referenced by the rules below
INTIF="eth1"

echo "   clearing any existing rules and setting default policy.."
iptables --flush
iptables -P INPUT ACCEPT
iptables -F INPUT
iptables -P OUTPUT ACCEPT
iptables -F OUTPUT
iptables -P FORWARD DROP
iptables -F FORWARD
iptables -t nat -F

## Create chain which blocks new connections, except if coming from inside.
# -X only succeeds if the chain exists and is unreferenced; on a first run
# it prints an error that can be ignored.
iptables -X block
iptables -N block
iptables -A block -m state --state INVALID -j DROP
iptables -A block -m state --state ESTABLISHED,RELATED -j ACCEPT
# "! -i" is the current spelling of the older "-i ! $EXTIF" negation
iptables -A block -m state --state NEW ! -i "$EXTIF" -j ACCEPT
iptables -A block -j DROP

## Jump to that chain from the INPUT, FORWARD and OUTPUT chains.
iptables -A INPUT -j block
iptables -A FORWARD -j block
iptables -A OUTPUT -j block

echo "   Enabling SNAT (MASQUERADE) functionality on $EXTIF"
iptables -t nat -A POSTROUTING -o "$EXTIF" -j MASQUERADE

msutton
n00b

Joined: 20 Jan 2005
Posts: 56

Posted: Sat Sep 10, 2005 2:24 am    Post subject: still no go

I did as you said and I get the same thing.

It just seems odd that it is hit and miss like that.

Do you know of anything else I could check?

buzzin
Apprentice

Joined: 17 Oct 2003
Posts: 264
Location: St. Albans, UK.

Posted: Sat Sep 10, 2005 12:04 pm

Did you try the above script?

Make sure you are matching traffic by the interface it arrives on (-i) and not by source IP (-s), as source IPs can be faked.

msutton
n00b

Joined: 20 Jan 2005
Posts: 56

Posted: Sat Sep 10, 2005 6:00 pm    Post subject: seems strange still

OK, I loaded the above script.

Now the log says those packets are INVALID (I added a logging rule before the drop rules).

So why would those packets be invalid if I am browsing that site?

It seems like iptables is not tracking connections correctly.

Sep 10 13:00:56 [kernel] [IPTABLES INVALID] : IN=eth1 OUT= MAC=<MAC> SRC=<SRC IP> DST=<MY IP> LEN=1500 TOS=0x00 PREC=0x00 TTL=50 ID=26190 DF PROTO=TCP SPT=873 DPT=33069 WINDOW=57920 RES=0x00 ACK URGP=0 OPT (0101080A0F5AAE8503F720C4)

This one is logged when I start an rsync against an rsync server: iptables says the returned packets are INVALID.
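
For anyone following along, a logging rule like the one mentioned above could look roughly like this. It is only a sketch: it assumes the chain is called block as in buzzin's script, and the prefix and option flags are chosen to mirror what shows up in the posted log line. The rule has to sit ahead of the rule that drops INVALID packets, otherwise the packet never reaches it.

```shell
# Sketch only (needs root): insert a LOG rule at position 1 of the
# assumed "block" chain so INVALID packets are logged before the
# rule that drops them.
iptables -I block 1 -m state --state INVALID \
    -j LOG --log-prefix "[IPTABLES INVALID] : " \
    --log-tcp-options --log-ip-options
```

LOG is a non-terminating target, so the packet carries on to the next rule in the chain after it has been logged.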

buzzin
Apprentice

Joined: 17 Oct 2003
Posts: 264
Location: St. Albans, UK.

Posted: Sat Sep 10, 2005 6:31 pm

Strange, not sure what's up.

What kernel are you using? Also, can you post the output of iptables --list -v please?

Maybe try another kernel and then re-emerge iptables?

msutton
n00b

Joined: 20 Jan 2005
Posts: 56

Posted: Sat Sep 10, 2005 7:26 pm

kernel=Linux gentoo 2.6.12-gentoo-r10

# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
Matt       all  --  anywhere             anywhere

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
Matt       all  --  anywhere             anywhere

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain Matt (2 references)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
LOG        all  --  anywhere             anywhere            state INVALID LOG level warning tcp-options ip-options prefix `[IPTABLES INVALID] : '
REJECT     all  --  anywhere             anywhere            state INVALID reject-with icmp-port-unreachable
ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere            state NEW
ACCEPT     all  --  <Friends Static IP>  anywhere
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:3390
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:smtp
LOG        all  --  anywhere             anywhere            LOG level warning tcp-options ip-options prefix `[IPTABLES DROP] : '
REJECT     all  --  anywhere             anywhere            reject-with icmp-port-unreachable


When I download a file, I notice it logs a drop 3-4 times every second.

msutton
n00b

Joined: 20 Jan 2005
Posts: 56

Posted: Sat Sep 10, 2005 8:48 pm    Post subject: still weird

I downgraded to 2.6.11-r11 and re-emerged iptables, and I still get the same behavior.

I am at a loss.

It does not happen on another Gentoo box I have access to.

Is there any setting in /proc I could check for this problem?

msutton
n00b

Joined: 20 Jan 2005
Posts: 56

Posted: Sat Sep 10, 2005 10:19 pm    Post subject: suggestion

Could it be because the packets are received out of order?
Congestion at a switch?

I added that log line to my other 2 Gentoo firewalls, which are on the same T1 with different IPs, and they are logging INVALID packets too, for packets that should be considered ESTABLISHED.

CriminalMastermind
Tux's lil' helper

Joined: 19 Nov 2003
Posts: 132
Location: toronto

Posted: Sat Sep 10, 2005 11:40 pm

hmm... interesting.
i wonder what would happen if you used SNAT instead of MASQUERADE, changing this rule from your first post:
msutton wrote:
-A POSTROUTING -s 192.168.1.0/255.255.255.0 -j MASQUERADE

to:
Code:
-A POSTROUTING -s 192.168.1.0/255.255.255.0 -o $YOUR_EXTERNAL_INTERFACE -j SNAT --to-source $YOUR_EXTERNAL_IP

(POSTROUTING can only match on the outgoing interface, so that is -o with the external interface, not -i.)

may want to give that a shot and see what happens. i'm not sure if anything will change, but it's worth a try.

hope that helps
_________________
"I can picture a perfect world that knows of no war... and I can picture me attacking that world, because they'd never expect it."

msutton
n00b

Joined: 20 Jan 2005
Posts: 56

Posted: Sun Sep 11, 2005 5:07 am    Post subject: still the same

Changed from MASQ to SNAT and I still get the same behavior.

=(

I cannot figure out what is the matter.
I run a Gentoo firewall at home too. I added the INVALID logging there and have yet to see any INVALID packets at home.

msutton
n00b

Joined: 20 Jan 2005
Posts: 56

Posted: Wed Sep 14, 2005 3:07 am    Post subject: further investigations

After further investigation with ethereal and tcpdump:

Every time a packet is logged as INVALID, the packet has a bad checksum.
What causes a bad checksum?

All 3 machines running firewalls on the T1 are receiving them. They are all Athlon XP 2800+ boxes with a gig of RAM and 3Com 3c905C NICs.

Could it be the T1 dropping packets, thus making the checksum invalid?
I would not think it is a slow or faulty NIC, since it happens on all 3.

Any insight would be helpful
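
A rough sketch of how the bad-checksum packets can be picked out of tcpdump output. The filter below is an assumption about the wording: recent tcpdump versions run with -vv print something like "cksum 0x... (incorrect -> 0x...)", older ones print "bad tcp cksum", and the exact text may differ on your version. bad_cksum_lines is just a helper name made up for this example.

```shell
# Keep only tcpdump lines that report a checksum failure.
# Assumption: failures are marked with "incorrect" after "cksum",
# or with "bad ... cksum", depending on the tcpdump version.
bad_cksum_lines() {
    grep -i -E 'cksum.*incorrect|bad .*cksum'
}

# Usage (needs root; eth1 is the external interface in this thread):
#   tcpdump -n -vv -i eth1 tcp | bad_cksum_lines
```

Only incoming packets are meaningful here. On NICs that compute checksums in hardware, packets sent by the capturing machine itself can show up as "incorrect" because tcpdump sees them before the card fills the checksum in.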

CriminalMastermind
Tux's lil' helper

Joined: 19 Nov 2003
Posts: 132
Location: toronto

Posted: Thu Sep 15, 2005 6:35 am

msutton wrote:
With further investigation using ethereal and tcpdump

Every time a packet is logged invalid the packet has a bad checksum.


wow. sounds like you've done some homework. it's nice to see people put a good amount of effort into their problems.
msutton wrote:
What causes a bad checksum?

corrupt data?

are these drops on your external/internet interface?

are you sure when they arrive on your external interface they have a bad checksum?

from what i remember, ip has a checksum, and tcp also has a checksum... which one is failing?
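
For reference, both checksums use the same 16-bit one's-complement sum (RFC 1071). The IP one covers only the IP header, while the TCP one covers a pseudo-header plus the TCP header and the payload. Here is a small sketch of the arithmetic in plain sh. The function name inet_cksum and the sample header words are made up for illustration, not taken from any capture in this thread.

```shell
# Internet checksum (RFC 1071) over a list of 16-bit words given in hex.
# Prints the checksum as 4 hex digits.
inet_cksum() {
    sum=0
    for word in "$@"; do
        sum=$(( sum + 0x$word ))
    done
    # Fold any carries above 16 bits back into the low 16 bits.
    while [ $(( sum >> 16 )) -ne 0 ]; do
        sum=$(( (sum & 0xFFFF) + (sum >> 16) ))
    done
    # The checksum is the one's complement of the folded sum.
    printf '%04x\n' $(( ~sum & 0xFFFF ))
}

# Sample 20-byte IP header with the checksum field zeroed: prints b861
#   inet_cksum 4500 0073 0000 4000 4011 0000 c0a8 0001 c0a8 00c7
# Same header with the checksum filled in: prints 0000, i.e. it verifies
#   inet_cksum 4500 0073 0000 4000 4011 b861 c0a8 0001 c0a8 00c7
```

A receiver runs the same sum over what it got, checksum field included. Anything other than 0000 means at least one bit changed somewhere between the sender computing the checksum and the receiver verifying it.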

one thing you could try, if you are sure the checksum is already bad when the packet gets to you, is rebooting anything between you and the router where the bad packets are arriving (i.e. a cable modem, switch or hub). if that doesn't work and you can, try rebooting the router.

i'm just guessing at where i think the problem could be.

hope something there helps.

msutton
n00b

Joined: 20 Jan 2005
Posts: 56

Posted: Thu Sep 15, 2005 1:31 pm

The drops are on the external interface.

The T1 uplink was on a 5-port Linksys switch. I moved it to a Cisco switch on its own VLAN and still have the problem, which rules out the switch.

The only explanation left, after reading an enormous amount of info on the net, is that the T1 router is corrupting the packet header when it sends the packet to me, or that the uplink cat 5 is bad.

I still need to replace the uplink cat 5 from the router to the switch. It is just a pain because of the drop ceilings and having to move ceiling tiles. I really need to do it after hours so I can string the cable along the ground and test before getting the ladder out.

The reason I believe it could be the T1 router is this. As I understand it from reading the net, the router should verify the TCP checksum itself, and if the packet is corrupt it should not accept it and should ask for a retransmit. Since it is not asking for a retransmit but passing the packet on, and it adds its own IP header when it passes it on, it could be the one corrupting it.

First I will power-cycle the router, then I will re-wire from the router to the switch and from the switch to the firewall, and then change the NIC.
I will do this during the weekend. If this doesn't work, my next plan is to call the telco to check their router.
From what I read on the net, TCP checksum errors making it all the way to the destination are very rare, because a corrupt packet should not be passed on.

But tcpdump says the protocol is TCP, shows the packet info with the packet ID, and reports that the checksum failed.
And when I look up the packet ID of an INVALID entry in the iptables log, it matches the ID in tcpdump.
How would I check for the IP checksum??
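
The ID matching described above can be sketched in shell like this. It assumes the log lines carry the "[IPTABLES INVALID]" prefix and an "ID=<number>" field, as in the examples posted in this thread. invalid_ids is a made-up helper name, and the file names in the usage comment are placeholders.

```shell
# Print the IP ID of every packet that iptables logged as INVALID.
# Assumption: log lines look like the "[IPTABLES INVALID]" examples above.
invalid_ids() {
    grep -F '[IPTABLES INVALID]' \
        | sed -n 's/.* ID=\([0-9][0-9]*\) .*/\1/p'
}

# Usage sketch (paths are assumptions; tcpdump.txt would hold saved
# "tcpdump -v" output, which prints the ID as "id 46968,"):
#   invalid_ids < /var/log/messages | while read -r id; do
#       grep "id $id," tcpdump.txt
#   done
```

IP IDs are only 16 bits wide and get reused, so it is safer to compare timestamps and ports as well before concluding that two lines describe the same packet.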

buzzin
Apprentice

Joined: 17 Oct 2003
Posts: 264
Location: St. Albans, UK.

Posted: Thu Sep 15, 2005 3:41 pm

Wow, sorry to see this is turning into an epic headache for you.

ethereal should let you inspect those checksums

CriminalMastermind
Tux's lil' helper

Joined: 19 Nov 2003
Posts: 132
Location: toronto

Posted: Fri Sep 16, 2005 7:15 am

msutton wrote:
The T1 uplink was on a 5 port linksys switch and I moved it to a Cisco Switch on its own VLAN and still have the problems ruling out the switch.

makes sense to me.

msutton wrote:
The only thing after reading an enormous amount of info on the net is that the T1 router is corrupting the packet header when it sends it to me

ya, that is why i suggested a reboot if possible.

msutton wrote:
or the uplink cat 5 is bad.

i'm not sure. i wouldn't think a bad cable would behave like this. most bad cables i've seen either don't work at all, or start and stop working if you wiggle them by the connector. i guess i could see the shielding on the cable having a rip somewhere and noise being introduced, or some equipment the cable runs past misbehaving and radiating lots of noise, but i've never seen that. i seem to remember ethernet having a pretty high tolerance for noise, so i think this is pretty unlikely. i'm not too sure on any of this though. i'd make a tin-foil helmet and wear it before approaching anything generating enough noise to interfere with ethernet.

another thing is that an ethernet cable can only be so long before it has to go into a hub or some other device. for twisted pair the limit is 100 m per segment, and as far as i know that is the same for 10, 100 and gig-e over copper. if you are running it a long way, check whether you are over that or close to it.

if you want to be 100% sure about the cable, i think i've played with a fancy cable checker that did a frequency sweep to make sure the cable will handle everything ethernet will throw at it and that it's not too long. i think it was kind of expensive, but i don't really know. you could try to get your hands on one and check the cable while it's in place.

msutton wrote:
And the reason I believe it could be the T1 router, after reading the net this is what I understand, is that the router should do a tcp checksum by itself and if the packet is corrupt then it should not accept it and should ask for a retransmit. And since it is not asking for a retransmit and it passing it on and when it passes it on it adds its own IP header it could be corrupting.

yep, rebooting the router that sits in front of you is where my money is. again, good to see you have done your homework, but i should point out a correction. i'm pretty sure routers don't check the tcp checksum, they check the ip checksum. i don't think routers know or care about tcp/udp at all, just ip.

buzzin wrote:
ethereal should let you inspect those checksums

yes, i second that. ethereal is your friend. if you don't have X11 installed on any of these boxes, i know it is possible to use tcpdump with some flags to capture the packets to a file, then transfer the file to a computer with ethereal and open it there. i don't remember the exact flags, but you should be able to google for them.
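
A sketch of the capture-and-transfer approach, for what it's worth. The interface and port come from this thread (eth1 is external, 873 is rsync); the file name invalid.pcap and the host name workstation are placeholders made up for the example.

```shell
# Sketch only (needs root). Capture whole packets on the firewall.
# -s 0 keeps full packets instead of truncating them,
# -w writes raw packets to a file instead of printing them.
# Stop the capture with Ctrl-C once the problem has shown up.
tcpdump -n -i eth1 -s 0 -w invalid.pcap tcp port 873

# Copy the capture to a machine with a GUI and open the file there.
scp invalid.pcap workstation:

# Or read it back on the firewall itself; -vv makes tcpdump verify
# and print the IP and TCP checksums.
tcpdump -n -vv -r invalid.pcap
```

Capturing to a file keeps the exact bytes that arrived on the wire, so the same packet can be checked later against the iptables log.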

also, what happens if you ping the router before you? do you start getting packet loss? getting back replies with bad ip checksums?

i'm not sure what that would prove, but it sounds like a good thing to try.

it sure sounds like you are having fun. hope something i've said helped.

mjensen42
n00b

Joined: 23 Aug 2005
Posts: 23
Location: Austin, TX

Posted: Fri Sep 16, 2005 4:39 pm    Post subject: Checksum errors

Hey, are you sure the checksum failures are at the TCP level and not lower in the protocol stack? The symptoms you describe are somewhat consistent with ethernet devices trying to communicate with mismatched speed and/or duplex settings, but that would cause a checksum error on the ethernet frame itself, not just in the TCP segment.
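
One way to look for trouble at that lower level is to read the interface error counters. The sketch below parses /proc/net/dev and assumes the usual Linux column layout (two header lines, then one line per interface with 8 receive and 8 transmit counters). link_errors is a helper name made up for this example.

```shell
# Print receive/transmit error counters for one interface.
# Reads /proc/net/dev-style text on stdin; $1 is the interface name.
link_errors() {
    awk -v ifc="$1" '
        NR > 2 {
            sub(/:/, " ")    # "eth1:123" -> "eth1 123", so fields split cleanly
            if ($1 == ifc)
                printf "rx_errs=%s rx_frame=%s tx_errs=%s collisions=%s\n", $4, $7, $12, $15
        }'
}

# Usage:
#   link_errors eth1 < /proc/net/dev
```

As a rule of thumb, counters that keep climbing while traffic flows (frame/CRC errors on one end, collisions on the other) are a typical sign of a duplex mismatch. Counters that stay at zero point back toward the upper layers.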

msutton
n00b

Joined: 20 Jan 2005
Posts: 56

Posted: Sat Sep 17, 2005 4:00 am

OK I reset the router.

I still get invalids but now it does not complain about the checksum.

For example, when rsyncing (which generates more invalids than anything else):
eth0 is internal
eth1 is external

This is iptables saying packet ID 46968 is INVALID:

Sep 16 22:36:57 [kernel] [IPTABLES INVALID] : IN=eth1 OUT= MAC=<Mac> SRC=<RSYNC SERVER> DST=<MY EXT IP> LEN=1500 TOS=0x00 PREC=0x00 TTL=50 ID=46968 DF PROTO=TCP SPT=873 DPT=57082 SEQ=3987671666 ACK=169538902 WINDOW=58400 RES=0x00 ACK URGP=0

This is the tcpdump output for packet ID 46968:

22:36:57.818118 IP (tos 0x0, ttl 50, id 46968, offset 0, flags [DF], length: 1500) <RSYNC SERVER>.rsync > <MY EXT IP>.57082: . 14144952:14146412(1460) ack 29488 win 58400

Can you see anything unusual in this, or do you need more detail out of tcpdump?
If you need more detail please let me know what flags to use.

msutton
n00b

Joined: 20 Jan 2005
Posts: 56

Posted: Mon Sep 19, 2005 2:34 pm

I moved the firewall into the telco closet where the T1 comes in.
I made new cables and am still getting the errors, which rules out the cat 5.

Now, how can I set the network card duplex and speed manually from Linux for this card?
Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)

With ethtool I get this:
Settings for eth1:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 24
        Transceiver: internal
        Auto-negotiation: on
        Current message level: 0x00000001 (1)
        Link detected: yes

I want to set it to 10 Mb/s first, then 10 Mb/s half duplex, and see whether these INVALIDs go away.
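
A sketch of how the speed and duplex can be forced with ethtool. Whether a given driver honours every combination is an assumption here, and forcing only one end of a link can itself create a duplex mismatch, so the device on the other end should be set to match.

```shell
# Sketch only (needs root): force link settings on eth1.

# 10 Mb/s, full duplex, autonegotiation off
ethtool -s eth1 speed 10 duplex full autoneg off

# 10 Mb/s, half duplex
ethtool -s eth1 speed 10 duplex half autoneg off

# Back to autonegotiation when done testing
ethtool -s eth1 autoneg on

# Check what the link actually settled on
ethtool eth1
```

Running plain "ethtool eth1" after each change shows whether the setting really took effect, rather than assuming it did.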

msutton
n00b

Joined: 20 Jan 2005
Posts: 56

Posted: Mon Sep 19, 2005 2:52 pm

OK, I changed the speed and duplex, and restarted the interface after each change.

The link only works at 100 Mb/s full duplex, nothing else.

msutton
n00b

Joined: 20 Jan 2005
Posts: 56

Posted: Mon Sep 19, 2005 9:29 pm

Did some more reading.

echo 255 > /proc/sys/net/ipv4/netfilter/ip_conntrack_log_invalid

Now I get this in my log file, which confirms my earlier findings:
Sep 19 16:24:52 [kernel] ip_ct_tcp: bad TCP checksum IN= OUT= SRC=<SRC IP> DST=<My IP> LEN=1500 TOS=0x00 PREC=0x00 TTL=113 ID=6318 DF PROTO=TCP SPT=80 DPT=4148 SEQ=3946414554 ACK=1679195815 WINDOW=16861 RES=0x00 ACK URGP=0

Any other suggestions to fix the TCP checksum errors?
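
A side note on the setting above: as far as I know, the value is a protocol number (6 logs only TCP) and 255 means all protocols, and on newer kernels the same knob is called nf_conntrack_log_invalid. With that logging on, a rough sketch for seeing whether the bad checksums cluster on particular senders could look like this. It assumes the kernel lines look like the ip_ct_tcp example above; bad_cksum_by_src is a made-up helper name.

```shell
# Count "bad TCP checksum" log lines per source address, busiest first.
# Assumption: each line carries the sender in a "SRC=<address>" field.
bad_cksum_by_src() {
    grep -F 'bad TCP checksum' \
        | sed -n 's/.* SRC=\([^ ]*\) .*/\1/p' \
        | sort | uniq -c | sort -rn
}

# Usage (the log path is an assumption; adjust for your syslog setup):
#   bad_cksum_by_src < /var/log/messages
```

If the errors are spread evenly over every remote host, the cause is more likely close to the receiving end than out on the wider net.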

buzzin
Apprentice

Joined: 17 Oct 2003
Posts: 264
Location: St. Albans, UK.

Posted: Mon Sep 19, 2005 10:33 pm

maybe try another network card which uses a different kernel driver?

CriminalMastermind
Tux's lil' helper

Joined: 19 Nov 2003
Posts: 132
Location: toronto

Posted: Wed Sep 21, 2005 8:40 am

sorry for the late reply,

buzzin wrote:
maybe try another network card which uses a different kernel driver?

i think the chance that the problem is the network driver is pretty slim. i don't think network drivers know anything about the layers above ethernet. they may have some knowledge of ip, but i'm pretty sure they don't know about udp or tcp.

msutton wrote:
Any other suggestions to fix the TCP checksum errors?

not really. you could check that they are leaving their source host ok, if you had access to it, but i doubt the hosts are sending tcp packets with bad checksums. i think i've seen tcp performance checkers somewhere... but again, i don't think that will help you.

i'm pretty sure routers don't have any knowledge of tcp/udp or anything above the ip level, so i don't think they will be looking at anything above it. that means any of the routers the packets pass through could be corrupting them as they go through its buffers.

the only way i can think of to figure out which router is corrupting the packets would be to look for a pattern, like the same hosts always giving you corrupt packets once in a while, and then use traceroute to find the routers they have in common. that doesn't sound like anything resembling fun, and i think it has an extremely low probability of success. even if you did find what you thought was the guilty router, i don't know how you could go about proving it had a problem.

this may be one of those cases where things look broken, yet are working as designed. ip and tcp have checksums built into them for a reason. data does get corrupted every once in a while along the trip. i've never looked at this, so i couldn't comment on how much traffic ends up getting corrupted, but i'd look at the percentage of corrupted traffic versus good traffic you get back. i don't know where you could find what an acceptable percentage is, but if it seems low, perhaps this is just the way things are.
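
The percentage idea above can be sketched like this. bad_percent is a made-up helper, and the log path and interface in the usage comment are assumptions. The two counts also have to cover the same time window for the number to mean anything.

```shell
# Percentage of received packets that failed the checksum, to two decimals.
# $1 = number of bad packets, $2 = total packets received.
bad_percent() {
    awk -v bad="$1" -v total="$2" 'BEGIN {
        if (total <= 0) { print "n/a"; exit }
        printf "%.2f\n", 100 * bad / total
    }'
}

# Usage sketch:
#   bad=$(grep -c 'bad TCP checksum' /var/log/messages)
#   total=$(awk '{ sub(/:/, " ") } $1 == "eth1" { print $3 }' /proc/net/dev)
#   bad_percent "$bad" "$total"
```

For example, 3 bad packets out of 1200 received works out to 0.25.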

hope that helped.

msutton
n00b

Joined: 20 Jan 2005
Posts: 56

Posted: Wed Sep 21, 2005 1:40 pm

Last night I figured it out.
It is the Cisco IAD 2431-8FXS.
The ethernet port on the back flips in and out of full and half duplex, and when you unplug the cable all the lights stay on even though nothing is plugged in.
I believe the port is borked.
I swapped the card for a Realtek, and the Realtek logged the duplex changes and mismatches where the 3Com 3c905C did not.
I guess the difference in logging is just a driver thing.

But hopefully this will get squared away soon and my internet will be up at full speed again.

Thanks for all your help guys.