Derek,

I was/am having similar issues with the Atheros wireless drivers on x86_64. The DMA stuff was kicking in for some reason. As near as I can tell, yesterday's update mostly cleared it up for me (it went from once every 45 minutes down to about once a day). My issue sounds very similar to yours, but it has been going on for about six kernel updates.

I was lucky: they added a debugging patch to the DMA stop function, so the problem actually got logged. There may not be an equivalent in your driver.

Otherwise, IIRC, when I was working with the Pogoplug, I did have an issue with duplex settings that kept flaking out: the switch would drop into half-duplex mode on a whim. Changing the cable fixed it, even though the old cable appeared to work. I think it was something like "microfractures" in the wire. It may also have ended up on a different port on the switch when I swapped it. That is the type of problem that usually shows up under high load. I haven't looked at the Freescale FEC chip or driver.
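
If you want to rule the duplex theory in or out, a rough sketch like the one below could be run while the transfer is going. It reads the negotiated speed and duplex that the kernel exposes under /sys/class/net. The function name link_settings and the SYSFS_NET override are made up for illustration (the override only exists so the function can be pointed at a test directory), and eth0 is assumed from the log lines in your mail.

```shell
#!/bin/sh
# Print the negotiated speed (in Mb/s) and duplex mode of an interface,
# read from the kernel's sysfs attributes.
# SYSFS_NET is a hypothetical override for the sysfs root, used for testing.
link_settings() {
    iface="${1:-eth0}"
    base="${SYSFS_NET:-/sys/class/net}/$iface"

    if [ ! -d "$base" ]; then
        echo "$iface: no such interface" >&2
        return 1
    fi

    # Reading these attributes can fail (for example while the link is
    # down), so fall back to "unknown" instead of aborting.
    speed=$(cat "$base/speed" 2>/dev/null) || speed=unknown
    duplex=$(cat "$base/duplex" 2>/dev/null) || duplex=unknown

    echo "$iface speed=$speed duplex=$duplex"
}
```

Calling link_settings eth0 every minute or so from a loop and watching for duplex=half (or a speed drop) before the hang would show whether the link renegotiated. Running ethtool eth0 by hand reports the same values along with more detail.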
 
Sean



From: Derek Atkins <warlord@MIT.EDU>
To: arm@lists.fedoraproject.org
Sent: Friday, March 4, 2016 11:01 PM
Subject: [fedora-arm] Wandboard Quad network dies under load?

Hi,

I'm having an issue with two different wandboard quad systems; one is
running F22, the other is running F23.  When the system is under high
network load, specifically high transmit load, after a while the network
just gives up.  Technically it's not VERY high load, only about 2MB/s,
but it's high transmit load -- high download load seems to be fine as
far as I can tell.  I know that "gives up" isn't a very technical term,
but I frankly don't know what else to call it.

* dmesg doesn't say anything about the link going down
* ifconfig shows the interface still has an IP address
* arp, however, seems to start failing (and my NFS server has an
  incomplete arp address)
* ping doesn't work to anywhere (regardless of the contents of the arp table)
* DNS doesn't work (obviously -- no packets are coming or going).

I can usually recover by doing:

  nmcli con down "Wired connection 1"
  nmcli con up "Wired connection 1"

(the 'up' results in the message "Error: Connection activation failed.")
After that I need to pull the ethernet plug, count to 5-10, and then
plug it in again.  Then I'll get the messages:

[30540.554006] fec 2188000.ethernet eth0: Link is Down       
[30553.558837] fec 2188000.ethernet eth0: Link is Up - 1Gbps/Full - flow contro

(sorry for the cut messages; minicom serial console doesn't wrap lines)

After I do this the system has network again.  However, it's quite
frustrating that I have to jump through all these hoops.  Note that just
pulling the network cable by itself does not seem sufficient to reset
the network.

Is this a hardware problem or a software problem (or a combination of
the two)?  I've had it happen on this one system three times today; I
can definitely reliably repeat it (although it does take a couple hours
until it dies).  It's also happened on another system, but I've not seen
it happen since I stopped pulling data from it.

Any suggestions?  I'd like to not have to go out and spend more money to
buy an Atom-based solution, even though it might be better for my use
case due to AES-NI.

Thanks,

-derek

--
      Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
      Member, MIT Student Information Processing Board  (SIPB)
      URL: http://web.mit.edu/warlord/   PP-ASEL-IA    N1NWH
      warlord@MIT.EDU                        PGP key available