Gentoo Forums
Network connectivity problems: bad NIC? bad switch? bug?
seifn06
Tux's lil' helper


Joined: 19 Sep 2004
Posts: 79
Location: Lowell, Michigan

Posted: Tue Dec 30, 2014 8:36 pm    Post subject: Network connectivity problems: bad NIC? bad switch? bug?

Any help troubleshooting or solving the following network connectivity problem is most appreciated. I'm trying to identify the cause of network connection failures between several computers so I can fix it and avoid these failures in the future.

I run a small LAN with two computers running Gentoo and eight 64-bit Windows 7 Pro computers. Both Gentoo boxes were built on the x86_64 platform and both run gentoo-sources kernels: 3.16.5 and 3.17.7 respectively. I run several network services on the 3.16.5 computer including Samba for file sharing, sendmail/courier-imap for email, and dnsmasq for DNS and DHCP. Samba, sendmail, courier-imap and dnsmasq are mostly the current stable versions available through Portage.
Seemingly at random and with no warning, the Win7 boxes lose their connection to the Samba share on the first Gentoo box (the 3.16.5 server), and I am unable to ping that box (by either its IP address or its hostname) from the same Win7 computers. What's odd is that I can ping the second Gentoo box from the affected Win7 computers. I can ssh into the second Gentoo box and then ping and/or ssh into the first Gentoo box. All of the Windows computers on this LAN are configured with static IP addresses and with only one DNS server, which is the LAN IP address of the Gentoo box running dnsmasq.

The first server shows no relevant info in /var/log/messages. dmesg on the first Gentoo box may offer a clue to what's going on, but I do not know when the NIC up/down messages occurred because I don't understand how dmesg timestamps these events. (Does dmesg show a timestamp here? Ex: in seconds into the current epoch?, etc.?) I don't see a corresponding NIC up/down message in /var/log/messages around the time of the most recent connection failure. This Gentoo box's motherboard has an onboard NIC, which I'm not using, as well as a PCI Express Intel dual-port gigabit NIC. I'm only using one of the two ports on the Intel NIC.
Code:

# dmesg
...
[    7.672468] systemd-udevd[1396]: renamed network interface eth1 to enp3s0f1
[    7.678459] systemd-udevd[1392]: renamed network interface eth0 to enp3s0f0
[    7.732424] systemd-udevd[1397]: renamed network interface eth2 to enp6s0
[    8.535134] ntfs: driver 2.1.30 [Flags: R/W DEBUG MODULE].
[    8.862414] EXT4-fs (sda4): re-mounted. Opts: (null)
[    9.028111] Adding 8388604k swap on /dev/sda3.  Priority:-1 extents:1 across:8388604k
[   10.154779] e1000e 0000:03:00.1: irq 70 for MSI/MSI-X
[   10.255183] e1000e 0000:03:00.1: irq 70 for MSI/MSI-X
[   10.255380] IPv6: ADDRCONF(NETDEV_UP): enp3s0f1: link is not ready
[   12.186125] e1000e: enp3s0f1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[   12.186273] IPv6: ADDRCONF(NETDEV_CHANGE): enp3s0f1: link becomes ready
[   12.431828] systemd-udevd (1404) used greatest stack depth: 12576 bytes left
[130546.987786] e1000e: enp3s0f1 NIC Link is Down
[130550.483897] e1000e: enp3s0f1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[218219.185772] e1000e: enp3s0f1 NIC Link is Down
[218222.695873] e1000e: enp3s0f1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[822229.204920] e1000e: enp3s0f1 NIC Link is Down
[822232.286229] e1000e: enp3s0f1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx


Logs in /var/log/samba show little information and nothing that seems relevant to this problem.

One affected Win7 client computer is connected to the same unmanaged 24-port gigabit network switch that both Gentoo boxes connect to. A second affected Win7 client computer connects to a second unmanaged 5-port gigabit switch, which is uplinked to the 24-port switch connecting the first three computers.

Rebooting the client computers which have lost connectivity to the server usually fixes the problem: the clients are once again able to connect to the Gentoo server. Both Windows 7 client computers run Avast Internet Security 2015. I have not yet tried disabling Avast during one of these events to see if I can then ping/ssh into the Gentoo box from the affected Windows computers. In the past, I have had problems with Avast breaking the connection between Thunderbird on the Win7 computers and courier-imapd on the Gentoo server. I would not think that Avast would prevent me from pinging, though.

The fact that two different client computers experience the same issue suggests to me that the problem is with the Gentoo server, its network card or its port on the network switch. But the fact that I can indirectly access the first Gentoo box from the second one really throws me.

On the primary Gentoo box in question:
Code:

 # uname -a
Linux ____ 3.16.5-gentoo #2 SMP Sun Dec 7 23:32:59 EST 2014 x86_64 Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz GenuineIntel GNU/Linux


Code:
 
# lspci | grep net
03:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
03:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)


Code:

 # emerge -p gentoo-sources samba dnsmasq

These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild  NS    ] sys-kernel/gentoo-sources-3.17.7 [3.16.5] USE="-build -deblob -experimental -symlink"
[ebuild   R    ] net-dns/dnsmasq-2.72
[ebuild     U  ] net-fs/samba-3.6.24 [3.6.23-r1]



Questions:
(1) How do I know when an error/event showing in dmesg occurred given that it does not appear to be timestamped with a month/day/year, etc.?
(2) Does the [822229.204920] e1000e: enp3s0f1 NIC Link is Down event showing in dmesg tell me that the NIC on my Gentoo box is failing? Or, could it also mean that the connection between the NIC and the network switch failed? Might it suggest a loose or faulty network cable? Or could it also suggest a problem with the network switch?
(3) Could a failing hard drive on the Gentoo box cause these problems if network sockets/information stored on the disk are lost?
(4) How is it that two Win7 computers cannot ping the IP address or the hostname of the Gentoo server described above, cannot access Samba shares on the same Gentoo server but can ssh into a second Gentoo box and then ssh into the first, seemingly inaccessible/offline Gentoo box?

Any input or suggestions as to how to troubleshoot from here are greatly appreciated!
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 42569
Location: 56N 3W

Posted: Tue Dec 30, 2014 8:56 pm    Post subject:

seifn06,

(1)
Code:
[130546.987786] e1000e: enp3s0f1 NIC Link is Down
[130550.483897] e1000e: enp3s0f1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx

The number 130546.987786 is seconds since the last boot.
uptime may tell you when that was. Beware that it wraps on some systems.

(2) "Link is Down" tells you just that. It gives no hint as to why, so all of your possibilities are plausible.

(3) Probably not. That info should be in /run or /tmp, which really should be in tmpfs in RAM. In any case, a HDD error would probably affect all NICs equally and it's just
Code:
03:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
03:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06) <---
that is affected.

Can you swap the network cables over in the dual port Intel card and see if the problem stays on the same port or moves with the cable?
You will need to adjust the network setup to suit.
If it stays on the same port, it's probably the card.
If it moves with the cable, it's the cable or something at the other end of the cable.
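
To compare link behaviour before and after the cable swap, it may help to pull the Down/Up pairs out of dmesg and see how long each drop lasted. The following is only a sketch: the regular expression assumes the e1000e message format shown in the dmesg output earlier in this thread, and the names LINK_RE, link_outages and SAMPLE are made up for illustration.

```python
import re

# Assumed format: "[  offset] e1000e: <iface> NIC Link is Up|Down ..."
# as it appears in the dmesg output posted above.
LINK_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+e1000e: (?P<iface>\S+) NIC Link is (?P<state>Up|Down)"
)


def link_outages(lines):
    """Return (iface, down_ts, up_ts, duration_s) for each Down/Up pair."""
    pending = {}  # iface -> offset of the most recent unmatched "Down"
    outages = []
    for line in lines:
        m = LINK_RE.match(line.strip())
        if not m:
            continue
        ts = float(m.group("ts"))
        iface = m.group("iface")
        if m.group("state") == "Down":
            pending[iface] = ts
        elif iface in pending:
            # An "Up" with no preceding "Down" (e.g. at boot) is ignored.
            down = pending.pop(iface)
            outages.append((iface, down, ts, ts - down))
    return outages


# The link messages from the dmesg output in the first post.
SAMPLE = """\
[   12.186125] e1000e: enp3s0f1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[130546.987786] e1000e: enp3s0f1 NIC Link is Down
[130550.483897] e1000e: enp3s0f1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[218219.185772] e1000e: enp3s0f1 NIC Link is Down
[218222.695873] e1000e: enp3s0f1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[822229.204920] e1000e: enp3s0f1 NIC Link is Down
[822232.286229] e1000e: enp3s0f1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
""".splitlines()

if __name__ == "__main__":
    for iface, down, up, duration in link_outages(SAMPLE):
        print(f"{iface}: down at {down:.0f}s after boot, back up after {duration:.1f}s")
```

On the posted log this finds three drops on enp3s0f1, each lasting a little over three seconds. After swapping the cables, the interface name in any new entries shows whether the drops stayed with the port or moved with the cable.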
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
seifn06
Tux's lil' helper


Joined: 19 Sep 2004
Posts: 79
Location: Lowell, Michigan

Posted: Tue Dec 30, 2014 10:10 pm    Post subject:

I'll try swapping the network cables on the Intel NIC and see if the problem follows the cable. It may be a few days before I post results, as that's how intermittent this problem is. Thank you for the quick reply!
Hu
Moderator


Joined: 06 Mar 2007
Posts: 13498

Posted: Tue Dec 30, 2014 11:07 pm    Post subject:

For (1), you should check the log file written by your system logger. That log file will have both the kernel's offset-based timestamp and a traditional day/month/hour/minute/second timestamp. The system logger configuration file will show you which file(s) (if any) it writes kernel log messages to. If you are in a hurry, check /var/log/kern.log, which is the traditional name for kernel-specific logging on machines that save kernel messages to a separate file.
Ant P.
Watchman


Joined: 18 Apr 2009
Posts: 5592

Posted: Tue Dec 30, 2014 11:32 pm    Post subject:

`dmesg -H` will convert those timestamps into something readable for you.
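
If you want the wall-clock time of a particular entry yourself, the arithmetic is boot time plus the kernel offset. Here is a minimal sketch, assuming a Linux system where the first field of /proc/uptime is the uptime in seconds; the function names are made up for illustration. Treat the result as approximate, since the kernel's log clock is not updated across suspend/resume.

```python
from datetime import datetime, timedelta, timezone


def boot_time(now, uptime_seconds):
    """Estimate the boot instant as 'now' minus the current uptime."""
    return now - timedelta(seconds=uptime_seconds)


def event_time(boot, offset_seconds):
    """Wall-clock time of a dmesg entry: boot time plus its [offset]."""
    return boot + timedelta(seconds=offset_seconds)


def read_uptime(path="/proc/uptime"):
    """Read the uptime in seconds (first field of /proc/uptime on Linux)."""
    with open(path) as f:
        return float(f.read().split()[0])


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    boot = boot_time(now, read_uptime())
    # Offsets of the "Link is Down" entries from the first post.
    for offset in (130546.987786, 218219.185772, 822229.204920):
        print(f"[{offset:>13.6f}] -> {event_time(boot, offset).isoformat()}")
```

This only gives meaningful dates when run on the same machine, during the same boot, as the dmesg output it is applied to.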
seifn06
Tux's lil' helper


Joined: 19 Sep 2004
Posts: 79
Location: Lowell, Michigan

Posted: Thu Jan 08, 2015 9:22 pm    Post subject:

Thank you, everyone, for your input and quick replies. I'm looking into reconfiguring syslog-ng for kernel logging. And dmesg -H does make those timestamps more useful.

I'm thinking now that this is not a Gentoo problem or even a problem with my Gentoo box: all network connections between the Win7 client computers and the Gentoo server appear to work normally as soon as I disable the Avast Internet Security 2015 software on the Win7 clients. I'm taking this issue up with Avast. I'll update with any new information.
All times are GMT