On 01/12/2017 05:29 AM, Jan Gutter wrote:
When enumerating devices on systems with a large number of
network virtual functions, the netlink receive buffer can be
too small, in which case the message gets truncated.
This patch enables peeking: libnl will first query the size of
the pending message, expand the receive buffer to that size,
then receive the full message.
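
As a rough illustration of the peek-then-read sequence described above
(this is not libnl's actual code), the sketch below uses a Unix datagram
socketpair as a stand-in for a netlink socket. The helper names
recv_whole_datagram and demo_roundtrip are hypothetical, as is the
4096-byte fixed buffer, and the sketch assumes the Linux behaviour where
recv() with MSG_PEEK | MSG_TRUNC reports the full length of the pending
datagram.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define FIXED_BUF_SIZE 4096 /* hypothetical "too small" receive buffer */

/* Hypothetical helper: receive one whole datagram from fd by peeking at
 * its size first. Returns a malloc'd buffer (caller frees) and stores the
 * number of bytes received in *len_out; returns NULL on failure. */
static char *recv_whole_datagram(int fd, size_t *len_out)
{
    char probe;
    ssize_t need, got;
    char *buf;

    assert(len_out != NULL);

    /* MSG_PEEK leaves the datagram queued; on Linux, MSG_TRUNC makes
     * recv() return the real datagram length even though the probe
     * buffer holds only one byte. */
    need = recv(fd, &probe, sizeof(probe), MSG_PEEK | MSG_TRUNC);
    if (need < 0)
        return NULL;

    buf = malloc(need > 0 ? (size_t)need : 1);
    if (buf == NULL)
        return NULL;

    /* Now read the datagram for real into a buffer that is big enough. */
    got = recv(fd, buf, (size_t)need, 0);
    if (got < 0) {
        free(buf);
        return NULL;
    }
    *len_out = (size_t)got;
    return buf;
}

/* Send one datagram of `size` bytes through a socketpair and return how
 * many bytes the receiver obtains: with use_peek != 0 via the peeking
 * helper, otherwise via a single read into a fixed-size buffer.
 * Returns -1 on error. */
static long demo_roundtrip(size_t size, int use_peek)
{
    int sv[2];
    long result = -1;
    char *msg;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    msg = malloc(size > 0 ? size : 1);
    if (msg != NULL) {
        memset(msg, 'x', size);
        if (send(sv[0], msg, size, 0) == (ssize_t)size) {
            if (use_peek) {
                size_t len = 0;
                char *buf = recv_whole_datagram(sv[1], &len);
                if (buf != NULL) {
                    result = (long)len;
                    free(buf);
                }
            } else {
                char fixed[FIXED_BUF_SIZE];
                /* Excess bytes of the datagram are silently discarded. */
                ssize_t got = recv(sv[1], fixed, sizeof(fixed), 0);
                if (got >= 0)
                    result = (long)got;
            }
        }
        free(msg);
    }
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

On Linux, demo_roundtrip(20000, 0) should come back with a truncated
4096-byte read, while demo_roundtrip(20000, 1) should return all 20000
bytes.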
For a similar issue in libvirt.git, look at commit ID:
8c70d04bab7278c96390a913fa949a17cd3124f9
The difference between the two situations is that in libvirt's case,
libvirt itself is actually using a libnl socket to send/receive netlink
messages, so it can be expected that it needs to set the proper options
for buffer size, but netcf never explicitly sends/receives any netlink
messages - instead it sets up link and IP address caches (which are
filled in by libnl). This is a minor distinction, but still important -
if libnl is reading netlink messages on its own behalf, it should make
sure that the netlink sockets it uses internally are set up to read
the entire message. For that reason, upstream libnl
has recently been changed so that message peeking is turned on by
default (and a RHEL build with that change will be released soon).
Here is the upstream libnl commit that causes the increase in message size:
https://github.com/thom311/libnl/commit/90c6ebec9bd7adbe6dc7aca114b4304c1...
and here is the commit that turns on message peeking by default:
https://github.com/thom311/libnl/commit/55ea6e6b6cd805f441b410971c9dd7575...
After all that background, though, I don't have a problem with
explicitly enabling message peeking in netcf too - it will eliminate bug
reports for anyone who has a libnl build that is from between those two
commits (but also has a new enough netcf). If your aim is to get this
fixed in RHEL or CentOS, though, it will happen sooner if you just
wait for the libnl update.
ACK to the patch. I added the information about the libnl commits to the
commit log, and added your name/email to the AUTHORS file, then pushed
it. Thanks for the contribution!
Reviewed-by: Dinan Gunawardena <dinan.gunawardena(a)netronome.com>
Signed-off-by: Jan Gutter <jan.gutter(a)netronome.com>
---
src/dutil_linux.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/dutil_linux.c b/src/dutil_linux.c
index f1bf8e0..742153a 100644
--- a/src/dutil_linux.c
+++ b/src/dutil_linux.c
@@ -687,6 +687,7 @@ int netlink_init(struct netcf *ncf) {
         goto error;
     if (nl_connect(ncf->driver->nl_sock, NETLINK_ROUTE) < 0)
         goto error;
+    nl_socket_enable_msg_peek(ncf->driver->nl_sock);
     ncf->driver->link_cache =
         __rtnl_link_alloc_cache(ncf->driver->nl_sock);
     if (ncf->driver->link_cache == NULL)