Gentoo Forums
kernel failure: hardware fault ("Got tx_timeout")
Vieri
l33t


Joined: 18 Dec 2005
Posts: 881

Posted: Tue Mar 24, 2015 12:23 pm    Post subject: kernel failure: hardware fault ("Got tx_timeout")

Hi,

My Gentoo firewall was running fine until, at some point, pings started failing and I had to reboot the machine. Oddly, the system was still responsive from the console, but there was no time to investigate, so I simply issued a shutdown. The server was stopped, unplugged, drained of residual charge and cold booted. I then checked /var/log/messages and found that the kernel had reported the following issue before the general failure:

Code:

Mar 24 12:05:18 kernel: ------------[ cut here ]------------
Mar 24 12:05:18 kernel: WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0xeb/0x17a()
Mar 24 12:05:18 kernel: NETDEV WATCHDOG: enp0s8 (forcedeth): transmit queue 0 timed out
Mar 24 12:05:18 kernel: Modules linked in: cdc_acm sha1_generic ppp_mppe ppp_async crc_ccitt ppp_generic slhc authenc xfrm6_mode_tunnel xfrm4_mode_tunnel arc4 ecb md4 cifs fscache xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 af_key xfrm_algo autofs4 act_police cls_basic cls_flow cls_fw cls_u32 sch_fq_codel sch_tbf sch_prio sch_htb sch_hfsc sch_ingress sch_sfq bridge stp llc xt_CHECKSUM ipt_rpfilter xt_statistic xt_CT xt_LOG xt_connlimit xt_realm xt_addrtype ip_set_hash_ip xt_comment xt_recent xt_nat ipt_ULOG ipt_REJECT ipt_MASQUERADE ipt_ECN ipt_CLUSTERIP ipt_ah xt_set ip_set nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_udplite nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_defrag_ipv6 ipv6 xt_time xt_TCPMSS xt_tcpmss xt_sctp xt_policy xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_connmark xt_CLASSIFY xt_tcpudp xt_state iptable_raw iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables snd_hda_codec_analog snd_hda_codec_generic k8temp parport_pc thermal floppy fan parport snd_hda_intel snd_hda_controller snd_hda_codec snd_pcm snd_timer asus_atk0110 snd processor i2c_nforce2 soundcore ata_generic ohci_pci button pata_acpi thermal_sys xts gf128mul sha256_generic fuse jfs btrfs zlib_deflate multipath linear raid0 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid10 xor raid6_pq dm_snapshot dm_bufio dm_crypt dm_mirror dm_region_hash dm_log dm_mod hid_sunplus hid_sony hid_samsung hid_pl hid_petalynx hid_monterey hid_microsoft hid_logitech hid_gyration hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech sl811_hcd ohci_hcd uhci_hcd usb_storage ehci_pci ehci_hcd mpt2sas raid_class aic94xx libsas lpfc crc_t10dif crct10dif_common qla2xxx megaraid_sas megaraid_mbox megaraid_mm megaraid aacraid sx8 DAC960 hpsa cciss 3w_9xxx 3w_xxxx mptsas scsi_transport_sas mptfc scsi_transport_fc scsi_tgt mptspi mptscsih mptbase atp870u dc395x qla1280 dmx3191d sym53c8xx gdth advansys initio BusLogic arcmsr aic7xxx aic79xx scsi_transport_spi sg pdc_adma sata_inic162x sata_mv ata_piix ahci libahci sata_qstor sata_vsc sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24 sata_sil sata_promise pata_sl82c105 pata_cs5530 pata_cs5520 pata_via pata_jmicron pata_marvell pata_sis pata_netcell pata_sc1200 pata_pdc202xx_old pata_triflex pata_atiixp pata_opti pata_amd pata_ali pata_it8213 pata_pcmcia pcmcia pcmcia_core pata_ns87415 pata_ns87410 pata_serverworks pata_platform pata_artop pata_it821x pata_optidma pata_hpt3x2n pata_hpt3x3 pata_hpt37x pata_hpt366 pata_cmd64x pata_efar pata_rz1000 pata_sil680 pata_radisys pata_pdc2027x pata_mpiix libata
Mar 24 12:05:18 kernel: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.15.10-hardened-r1 #1
Mar 24 12:05:18 kernel: Hardware name: System manufacturer System Product Name/M2N-E, BIOS ASUS M2N-E ACPI BIOS Revision 1401 04/01/2008
Mar 24 12:05:18 kernel: 00000000 c1523207 f64b3f68 c102ca77 c14a7917 f650e000 00000000 11148026
Mar 24 12:05:18 kernel: 00000001 c102caf2 00000009 f64b3f68 c177a66b f64b3f80 c14a7917 c177a6a4
Mar 24 12:05:18 kernel: 00000108 c177a66b f650e000 c17681e5 00000000 80000100 c14a782c f64b3fb8
Mar 24 12:05:18 kernel: Call Trace:
Mar 24 12:05:18 kernel: [<c1523207>] ? dump_stack+0x3e/0x4e
Mar 24 12:05:18 kernel: [<c102ca77>] ? warn_slowpath_common+0x61/0x74
Mar 24 12:05:18 kernel: [<c14a7917>] ? dev_watchdog+0xeb/0x17a
Mar 24 12:05:18 kernel: [<c102caf2>] ? warn_slowpath_fmt+0x29/0x2d
Mar 24 12:05:18 kernel: [<c14a7917>] ? dev_watchdog+0xeb/0x17a
Mar 24 12:05:18 kernel: [<c14a782c>] ? pfifo_fast_dequeue+0xa2/0xa2
Mar 24 12:05:18 kernel: [<c10344a8>] ? call_timer_fn.isra.33+0xf/0x5a
Mar 24 12:05:18 kernel: [<c10346f9>] ? run_timer_softirq+0x126/0x16f
Mar 24 12:05:18 kernel: [<c102fb08>] ? __do_softirq+0x97/0x176
Mar 24 12:05:18 kernel: [<c102fa71>] ? __hrtimer_tasklet_trampoline+0x13/0x13
Mar 24 12:05:18 kernel: [<c100337a>] ? do_softirq_own_stack+0x1a/0x1f
Mar 24 12:05:18 kernel: <IRQ>  [<c102fd0c>] ? irq_exit+0x31/0x3d
Mar 24 12:05:18 kernel: [<c1021be1>] ? smp_apic_timer_interrupt+0x30/0x39
Mar 24 12:05:18 kernel: [<c152764d>] ? apic_timer_interrupt+0x2d/0x40
Mar 24 12:05:18 kernel: [<c1007fce>] ? default_idle+0x2/0x3
Mar 24 12:05:18 kernel: [<c1008077>] ? amd_e400_idle+0xa8/0xaa
Mar 24 12:05:18 kernel: [<c100841c>] ? arch_cpu_idle+0x6/0x7
Mar 24 12:05:18 kernel: [<c1052c6b>] ? cpu_startup_entry+0x184/0x189
Mar 24 12:05:18 kernel: [<c102086d>] ? start_secondary+0x1e1/0x209
Mar 24 12:05:18 kernel: ---[ end trace 806bc1f42cde7565 ]---
Mar 24 12:05:18 kernel: forcedeth 0000:00:08.0 enp0s8: Got tx_timeout. irq status: 00000020


The log then continues as if the machine were OK, except that every once in a while I see "kernel: forcedeth 0000:00:08.0 enp0s8: Got tx_timeout. irq status: 00000020".
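
To get a feel for how often the message recurs, the log can be scanned for "Got tx_timeout" lines. The following is only a minimal sketch in Python: the function names and the regular expression are illustrative assumptions, and the pattern is modelled solely on the log line quoted in this post (with an optional hostname field, which some syslog setups add).

```python
import re
from collections import Counter

# Pattern modelled on the line quoted in the post, e.g.
# "Mar 24 12:05:18 kernel: forcedeth 0000:00:08.0 enp0s8: Got tx_timeout. irq status: 00000020"
# The optional group allows for a hostname between the timestamp and "kernel:".
TX_TIMEOUT = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?:\S+\s+)?kernel:\s+"
    r"(?P<driver>\S+)\s+(?P<pci>\S+)\s+(?P<iface>\S+):\s+Got tx_timeout\."
    r"\s+irq status:\s+(?P<irq>[0-9a-fA-F]+)"
)


def find_tx_timeouts(lines):
    """Return (timestamp, interface, irq_status) for each tx_timeout line."""
    hits = []
    for line in lines:
        match = TX_TIMEOUT.search(line)
        if match:
            hits.append((match.group("stamp"), match.group("iface"), match.group("irq")))
    return hits


def count_by_interface(hits):
    """Count tx_timeout events per network interface."""
    return Counter(iface for _, iface, _ in hits)


# Two lines taken from the log excerpt above; only the second is a tx_timeout.
sample = [
    "Mar 24 12:05:18 kernel: NETDEV WATCHDOG: enp0s8 (forcedeth): transmit queue 0 timed out",
    "Mar 24 12:05:18 kernel: forcedeth 0000:00:08.0 enp0s8: Got tx_timeout. irq status: 00000020",
]
for stamp, iface, irq in find_tx_timeouts(sample):
    print(stamp, iface, irq)
```

To run it against a real log, pass an open file instead of the sample list, for example `find_tx_timeouts(open("/var/log/messages", errors="replace"))`. A growing count on one interface only would point at that NIC or its driver rather than at the network stack in general.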

So what can I deduce from this?
Is it a NIC hardware error (enp0s8)?

Thanks,

Vieri
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 9679
Location: almost Mile High in the USA

Posted: Tue Mar 24, 2015 5:33 pm

Very likely. Unless it was a very well timed disconnection of your ethernet cable...

Is this the nForce on-chip Ethernet?

I was never very impressed by the nForce on-chip Ethernet on one of my motherboards. I'm not sure whether the reverse-engineered driver is to blame, but who knows... You might want to try another Ethernet card instead of that onboard hardware.
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
Vieri
l33t


Joined: 18 Dec 2005
Posts: 881

Posted: Wed Mar 25, 2015 7:47 am

So it's most likely to be the NIC.

It's an NVIDIA on-board network interface.

This is what I have on this system:

Code:

00:08.0 Bridge: NVIDIA Corporation MCP55 Ethernet (rev a2)
01:07.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
02:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
02:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)


My supposedly failing NIC is this one:

Code:

# lspci -vv -k -s 00:08.0
00:08.0 Bridge: NVIDIA Corporation MCP55 Ethernet (rev a2)
        Subsystem: ASUSTeK Computer Inc. Device 8239
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0 (250ns min, 5000ns max)
        Interrupt: pin A routed to IRQ 21
        Region 0: Memory at fe02a000 (32-bit, non-prefetchable) [size=4K]
        Region 1: I/O ports at b000 [size=8]
        Region 2: Memory at fe029000 (32-bit, non-prefetchable) [size=256]
        Region 3: Memory at fe028000 (32-bit, non-prefetchable) [size=16]
        Capabilities: [44] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable+ DSel=0 DScale=0 PME-
        Capabilities: [70] MSI-X: Enable- Count=8 Masked-
                Vector table: BAR=2 offset=00000000
                PBA: BAR=3 offset=00000000
        Capabilities: [50] MSI: Enable- Count=1/8 Maskable+ 64bit+
                Address: 0000000000000000  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [6c] HyperTransport: MSI Mapping Enable- Fixed+
        Kernel driver in use: forcedeth


I guess I'll have to install a new network card.
Is it worth reporting the issue to the kernel/forcedeth team, or is there so little interest that it's unlikely ever to be addressed? (I mean, with all the chips out there, I doubt it's worth the programming/debugging effort.)
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 9679
Location: almost Mile High in the USA

Posted: Wed Mar 25, 2015 2:15 pm

That's just my opinion of this Ethernet chip and its manufacturer (and of the history of their graphics chips), based on the state of the driver, but you can always ask. Since there isn't much documentation on specific chips, and some specific event on your system caused the lockup, I don't know what they could do with the report. Is it repeatable? Can you trigger it with a specific sequence of events? Also make sure you say who made the hardware, in case the problem is specific to your board.
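
For a report, the board maker and the driver in use can be pulled out of the `lspci -vv -k` output posted earlier in the thread. This is a rough sketch in Python, assuming the output format shown there; `summarize_lspci` is a hypothetical helper name, not part of any existing tool.

```python
def summarize_lspci(text):
    """Extract the device, subsystem and driver lines from `lspci -vv -k` output
    for a single device (format as posted earlier in this thread)."""
    summary = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the echoed shell prompt line ("# lspci ...").
        if not line or line.startswith("#"):
            continue
        if "device" not in summary and not raw[0].isspace():
            # The first non-indented line names the device itself.
            summary["device"] = line
        elif line.startswith("Subsystem:"):
            # The subsystem vendor is normally the board manufacturer.
            summary["subsystem"] = line.split(":", 1)[1].strip()
        elif line.startswith("Kernel driver in use:"):
            summary["driver"] = line.split(":", 1)[1].strip()
    return summary


# Excerpt of the output posted earlier in the thread.
posted_output = """\
# lspci -vv -k -s 00:08.0
00:08.0 Bridge: NVIDIA Corporation MCP55 Ethernet (rev a2)
        Subsystem: ASUSTeK Computer Inc. Device 8239
        Interrupt: pin A routed to IRQ 21
        Kernel driver in use: forcedeth
"""
for key, value in summarize_lspci(posted_output).items():
    print(f"{key}: {value}")
```

Together with the kernel version from the trace (3.15.10-hardened-r1) and the motherboard name (ASUS M2N-E), this gives the basic hardware identification a bug report would need.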
Vieri
l33t


Joined: 18 Dec 2005
Posts: 881

Posted: Wed Mar 25, 2015 5:08 pm

I can't reproduce the failure. At least not yet.
I'll check the log again to see whether any event could have triggered it, but nothing obvious stands out.
I guess I'll report the issue only if I manage to reproduce it.
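
One way to look for a triggering event is to print the few log lines that precede each watchdog warning. The sketch below is in Python and rests on assumptions: `context_before` is an illustrative name, and the default marker string is simply the "NETDEV WATCHDOG" text from the trace in the first post.

```python
from collections import deque


def context_before(lines, marker="NETDEV WATCHDOG", before=5):
    """Yield (marker_line, preceding_lines) for every line containing `marker`.

    `preceding_lines` holds up to `before` lines seen just before the match.
    """
    recent = deque(maxlen=before)  # rolling window of the most recent lines
    for line in lines:
        line = line.rstrip("\n")
        if marker in line:
            yield line, list(recent)
        recent.append(line)


# Small demonstration using lines from the trace in the first post.
demo = [
    "Mar 24 12:05:18 kernel: ------------[ cut here ]------------",
    "Mar 24 12:05:18 kernel: WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0xeb/0x17a()",
    "Mar 24 12:05:18 kernel: NETDEV WATCHDOG: enp0s8 (forcedeth): transmit queue 0 timed out",
]
for hit, previous in context_before(demo, before=2):
    print(hit)
    for line in previous:
        print("    before:", line)
```

On a real system the input would be the open log file, and a larger `before` value (a few dozen lines) is more likely to catch link changes, firewall reloads or similar events shortly before the timeout.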

Thanks
All times are GMT