Hi again,

I now understand what is going on :)

There is a bug in the driver's TCP Segmentation Offload (TSO) support.

The workaround is to disable TSO with the following commands:

    ethtool -K eth0 tso off
    ethtool -K eth0 gso off
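
To confirm that the workaround took effect, the offload state can be read back with `ethtool -k` (lowercase k). The following is only a minimal sketch, assuming the interface is eth0 as above; the helper name `offload_state` is made up for illustration.

```shell
# Hypothetical helper: read `ethtool -k <iface>` output on stdin and print
# the state ("on" or "off") of the named offload feature.
offload_state() {
    awk -v feat="$1" -F': *' '$1 == feat { split($2, s, " "); print s[1]; exit }'
}

# Usage (needs ethtool and the real interface, so left commented out):
#   ethtool -k eth0 | offload_state tcp-segmentation-offload
#   ethtool -k eth0 | offload_state generic-segmentation-offload
```

Both should report "off" after the two commands above. Settings made with `ethtool -K` do not survive a reboot, so they have to be reapplied at boot.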

A patch exists that disables TSO by default, with the following comment:

/* TSO seems to be having some issue with Selective Acknowledge (SACK) that
 * results in lost data never being retransmitted.
 * Disable it by default now, but adds a module parameter to enable it for
 * debug purposes (the full cause is not currently understood).
 */


I issued these commands, retried the scp transfer, and it now works perfectly.


Issue logged here https://github.com/raspberrypi/linux/issues/3395


Thanks a lot

Fox





On 1/6/20 3:46 PM, Stefan Wahren wrote:
Hi,

On 06.01.20 15:30, RENARD Pierre-Francois wrote:
OK,
upgrade done, now running kernel 5.4.7-200.fc31, same issue with scp.
This sounds like a known issue [1]. There is at least a workaround [2].

[1] - https://github.com/raspberrypi/linux/issues/2482
[2] -
https://github.com/raspberrypi/linux/commit/5f0e4c1cc51a2aee86b2a554b65cb0a7909a6e02

Should I open an issue on Red Hat Bugzilla or kernel Bugzilla?
It would be best to report this to linux-netdev and the lan78xx
maintainer.

Thanks
Stefan

Thanks
Fox


On 1/6/20 12:48 PM, Peter Robinson wrote:
Kernel is 5.3.11-300.fc31 (the one I am using these days; for F30 I
don't remember, but I guess the issue was there whatever the kernel).

My devices are all 3B+, but I can do tests with 3B also.

There is no router between client and server.

No packets are lost.
Output from ping:

200 packets transmitted, 200 received, 0% packet loss, time 206903ms
rtt min/avg/max/mdev = 0.357/0.453/0.659/0.036 ms



I am trying this small test:

      in one shell:
          cd /net/NFSSERVER/directories_path/
          while true; do sleep 1; ls -al ; done

      in another shell:
          cd /net/NFSSERVER/directories_path/
          dd if=/dev/zero of=./data.dd bs=4k count=1000000 status=progress
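
The ls loop in the first shell can also be written so that a stalled listing is reported with a timestamp instead of silently blocking. This is only a sketch: the helper name `probe_dir` and the 5-second limit are assumptions, and `timeout` comes from GNU coreutils.

```shell
# Hypothetical helper: list a directory, but give up after LIMIT seconds
# (default 5). Prints a timestamp followed by "ok" or "stalled or failed".
probe_dir() {
    dir=$1
    limit=${2:-5}
    # SIGKILL is used because a process blocked on NFS may ignore SIGTERM.
    if timeout -s KILL "$limit" ls -al "$dir" >/dev/null 2>&1; then
        echo "$(date +%T) ok"
    else
        echo "$(date +%T) stalled or failed"
    fi
}

# Usage, in place of the plain ls loop:
#   while true; do sleep 1; probe_dir /net/NFSSERVER/directories_path/ 5; done
```

The helper cannot tell a timeout apart from an ordinary ls error, hence the combined message.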

In the first shell, an ls command runs every second.
When I launch the second command, there are no more updates (ls hangs),
whatever the configuration.
dd keeps running, with output such as:

Output from RPI3B:
    1227259904 bytes (1.2 GB, 1.1 GiB) copied, 99 s, 12.4 MB/s
Output from RPI3B+ & usb ethernet:
    2765832192 bytes (2.8 GB, 2.6 GiB) copied, 106.039 s, 26.1 MB/s
Output from RPI3B+ & native ethernet:
    378548224 bytes (379 MB, 361 MiB) copied, 8 s, 47.1 MB/s
  On the last one the dd progress output also hangs; here is a later update:
    744927232 bytes (745 MB, 710 MiB) copied, 247 s, 3.0 MB/s
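
The rate dd prints is a cumulative average since the start, so the drop from 47.1 MB/s to 3.0 MB/s understates how slow the transfer was between the two samples. A rough calculation, using only the figures quoted above (the helper name `rate_mbs` is made up, and the 8 s sample is rounded by dd, so the result is approximate):

```shell
# Hypothetical helper: average rate in MB/s (1 MB = 10^6 bytes, as dd reports).
rate_mbs() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", bytes / secs / 1e6 }'
}

# Cumulative average at the later sample on the native ethernet port.
rate_mbs 744927232 247                                # prints 3.0

# Average over the interval between the two samples (8 s and 247 s).
rate_mbs $((744927232 - 378548224)) $((247 - 8))      # prints 1.5
```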



With the RPI3B, I was able to ctrl-c the dd command, and ls resumed looping.

With the RPI3B+ & usb ethernet card, I was able to ctrl-c the dd command, and
ls resumed looping.

With the RPI3B+ & native ethernet, I could not ctrl-c the dd command; I had to
kill -9 the dd process, and I got these messages in the journal (ls then
resumed looping):

Jan 06 12:02:43 pi12.intranet.net kernel: nfs: server syno01 not responding, still trying
Jan 06 12:02:43 pi12.intranet.net kernel: nfs: server syno01 not responding, still trying
Jan 06 12:02:43 pi12.intranet.net kernel: nfs: server syno01 not responding, still trying
Jan 06 12:02:43 pi12.intranet.net kernel: nfs: server syno01 not responding, still trying
Jan 06 12:02:45 pi12.intranet.net kernel: nfs: server syno01 not responding, still trying
Jan 06 12:02:45 pi12.intranet.net kernel: nfs: server syno01 not responding, still trying
Jan 06 12:02:46 pi12.intranet.net kernel: nfs: server syno01 not responding, still trying
Jan 06 12:02:46 pi12.intranet.net kernel: nfs: server syno01 OK
Jan 06 12:02:46 pi12.intranet.net kernel: nfs: server syno01 not responding, still trying
Jan 06 12:02:46 pi12.intranet.net kernel: nfs: server syno01 OK
Jan 06 12:02:46 pi12.intranet.net kernel: nfs: server syno01 not responding, still trying
Jan 06 12:02:46 pi12.intranet.net kernel: nfs: server syno01 not responding, still trying
Jan 06 12:02:47 pi12.intranet.net kernel: nfs: server syno01 OK
Jan 06 12:02:47 pi12.intranet.net kernel: nfs: server syno01 OK
Jan 06 12:02:47 pi12.intranet.net kernel: nfs: server syno01 OK
Jan 06 12:02:47 pi12.intranet.net kernel: nfs: server syno01 OK
Jan 06 12:02:47 pi12.intranet.net kernel: nfs: server syno01 OK
Jan 06 12:02:47 pi12.intranet.net kernel: nfs: server syno01 OK
Jan 06 12:02:47 pi12.intranet.net kernel: nfs: server syno01 OK
Jan 06 12:02:47 pi12.intranet.net kernel: nfs: server syno01 OK
Jan 06 12:02:48 pi12.intranet.net kernel: rpc_check_timeout: 397 callbacks suppressed
Jan 06 12:02:48 pi12.intranet.net kernel: nfs: server syno01 not responding, still trying
Jan 06 12:02:48 pi12.intranet.net kernel: nfs: server syno01 not responding, still trying
Jan 06 12:02:48 pi12.intranet.net kernel: nfs: server syno01 not responding, still trying
Jan 06 12:02:48 pi12.intranet.net kernel: nfs: server syno01 not responding, still trying
Jan 06 12:02:48 pi12.intranet.net kernel: nfs: server syno01 not responding, still trying
Jan 06 12:02:48 pi12.intranet.net kernel: nfs: server syno01 not responding, still trying
Jan 06 12:02:48 pi12.intranet.net kernel: nfs: server syno01 not responding, still trying
Jan 06 12:02:48 pi12.intranet.net kernel: nfs: server syno01 not responding, still trying
Jan 06 12:02:48 pi12.intranet.net kernel: nfs: server syno01 not responding, still trying
Jan 06 12:02:48 pi12.intranet.net kernel: nfs: server syno01 not responding, still trying


I am also running this test:

generating a 4GB file locally and copying it to the NFS server with scp

RPI3B                        : 100% 4000MB   8.2MB/s   08:06
RPI3B+ & usb ethernet        : 100% 4000MB   8.8MB/s   07:32
RPI3B+ & native ethernet     : scp freezes after a few very slow MB
                               transferred: "5%  216MB 4.7MB/s"
and finally I got
client_loop: send disconnect: Broken pipe
lost connection
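
For anyone wanting to repeat the scp test, a sketch of one way to do it. The message does not say how the 4GB file was generated, so the dd invocation, the helper name `make_test_file`, the file path, and the `user@NFSSERVER:/tmp/` destination are all assumptions.

```shell
# Hypothetical helper: create a zero-filled file of SIZE MiB at PATH
# (GNU dd; status=none suppresses the transfer summary).
make_test_file() {
    dd if=/dev/zero of="$1" bs=1M count="$2" status=none
}

# Usage (placeholder path, user and host):
#   make_test_file /tmp/test.bin 4000
#   scp /tmp/test.bin user@NFSSERVER:/tmp/
```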




I don't think this is an NFS issue, but more generally an issue with the
driver of the native ethernet card?
Quite possibly a combination of driver issues with your particular
setup.

We don't do any direct development on the drivers, we simply ship
upstream kernel and drivers as we don't have the resources to do that
with the RPi or any other arm device.

That said, there were some fixes in the 5.4 series to the ethernet
driver used on the 3B+, and the current stable kernel is 5.4.7.

Peter

On 12/31/19 5:40 PM, Peter Robinson wrote:
I am running Fedora 31, aarch64 release on many Raspberry PI 3B+.
You don't state which kernel you're running.

I have exactly the same behaviour with all the RPIs.
Are they all 3B+ devices?

When trying to access an NFS server (which works perfectly with my laptops
running Fedora x86) with an ls command while a huge transfer is ongoing,
I get a lot of "NFS server not responding" in the journal and ls hangs.


I tried replacing the onboard ethernet with a USB gigabit ethernet adapter and
I don't have this issue at all.

I also tried to change port on the switch with the same result ( not
working with on board card, working with USB one)

I also had this network issue with Fedora 30.
Which kernel are you running with F-30? Are you stating you see the
same issue as on F-31?

Are you aware of such behavior? If not, I will open an issue.
Not had one reported, and there are others using NFS on their
Raspberry Pis. We would need a LOT more information than what you've
supplied, the problem with NFS issues is that they are often very
dependent on the local setup, client/server/switch/network config etc
so are often hard to replicate, especially without a lot of extra
details.

          
_______________________________________________
arm mailing list -- arm@lists.fedoraproject.org
List Archives:
https://lists.fedoraproject.org/archives/list/arm@lists.fedoraproject.org