Discovering machines (or: OpenSLP, a laundry list)

Wednesday, 11 December 2013

Hi,

one of the next steps for Cockpit to take would be to discover machines
on the network and show them in the "Add Server" dialog, as a
convenience.

This is a little report of where I am with that.

- OpenSLP relies on unicast replies to multicast queries.  This does not
  work in the presence of NAT or normal firewalls.

  The fundamental problem seems to be that the connection tracking
  facility in Linux does not expect the replies and thus doesn't create
  a 'RELATED' connection flow for them.  Both NAT and normal firewall
  rules rely on connection tracking to work.

  http://www.cs.helsinki.fi/linux/linux-kernel/2001-12/0789.html
  https://lkml.org/lkml/2013/5/7/614
  https://lkml.org/lkml/2013/5/7/830

  The proper fix would be to write a user space conntrack helper for use
  with "nfct helper add".  nfct is part of conntrack-tools, and we
  should probably contact upstream for advice before writing any code.
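
To make the "unicast replies to multicast queries" pattern concrete, here
is a minimal sketch of an SLPv2 service request sent by hand.  It assumes
the RFC 2608 wire format; build_srvrqst and discover are made-up names,
not part of OpenSLP or Cockpit, and the sketch leaves out the
retransmission and previous-responder handling that a real user agent
does.

```python
import socket
import struct

SLP_MCAST_ADDR = "239.255.255.253"  # SLP multicast group
SLP_PORT = 427


def slp_string(text):
    """Encode a string the way SLPv2 does: 16-bit length, then bytes."""
    data = text.encode("utf-8")
    return struct.pack("!H", len(data)) + data


def build_srvrqst(service_type, xid=1, scopes="DEFAULT", lang="en"):
    """Build a multicast SLPv2 Service Request (function id 1)."""
    body = (slp_string("")              # previous responder list (empty)
            + slp_string(service_type)
            + slp_string(scopes)
            + slp_string("")            # predicate (none)
            + slp_string(""))           # SLP SPI (none)
    lang_tag = lang.encode("ascii")
    total = 14 + len(lang_tag) + len(body)  # fixed header part is 14 bytes
    header = (struct.pack("!BB", 2, 1)      # version 2, SrvRqst
              + total.to_bytes(3, "big")    # total message length
              + struct.pack("!H", 0x2000)   # flags: "request multicast"
              + (0).to_bytes(3, "big")      # no extensions
              + struct.pack("!HH", xid, len(lang_tag))
              + lang_tag)
    return header + body


def discover(service_type, timeout=2.0):
    """Multicast one query and collect whatever unicast replies arrive."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    replies = []
    try:
        sock.sendto(build_srvrqst(service_type), (SLP_MCAST_ADDR, SLP_PORT))
        while True:
            try:
                data, sender = sock.recvfrom(65535)
            except socket.timeout:
                break
            replies.append((sender, data))
    finally:
        sock.close()
    return replies
```

The replies come back from each agent's own address, port 427, to the
ephemeral port of the sending socket.  From conntrack's point of view
they do not belong to the outgoing flow (which went to the multicast
address), and that is why they get dropped.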

- Before we have the proper fix, we can try to work around the conntrack
  problem.

- In order to receive the unicast replies, we can punch a small hole in
  the firewall while we listen for those replies.  The hole would allow
  packets from *:427 to the socket we listen on, nothing else.  This is
  done inside cockpitd.
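
A rough sketch of what "punching a hole" could look like, assuming we go
through firewalld's direct interface.  The names hole_rule and
firewall_hole are hypothetical, and whether cockpitd should really shell
out to firewall-cmd (rather than talk to firewalld over D-Bus) is an open
question; this only shows the shape of the rule and its lifetime.

```python
import contextlib
import subprocess


def hole_rule(action, local_port):
    """Return the firewall-cmd argv that adds or removes the hole.

    The rule accepts UDP packets from any host's port 427 to our
    listening port, and nothing else.
    """
    if action not in ("add", "remove"):
        raise ValueError("action must be 'add' or 'remove'")
    return ["firewall-cmd", "--direct", "--%s-rule" % action,
            "ipv4", "filter", "INPUT", "0",
            "-p", "udp", "--sport", "427",
            "--dport", str(local_port), "-j", "ACCEPT"]


@contextlib.contextmanager
def firewall_hole(local_port, run=subprocess.check_call):
    """Keep the hole open only while the body of the 'with' block runs."""
    run(hole_rule("add", local_port))
    try:
        yield
    finally:
        run(hole_rule("remove", local_port))
```

The local port would be whatever the query socket got bound to (for
example via sock.getsockname()[1]), so the hole only exists for that one
socket and goes away as soon as we stop listening.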

- We can explicitly disable NAT for packets to the SLP multicast address
  239.255.255.253.  This needs to be done on the machine doing the NAT,
  which is normally the host for the virtual test machines.  We would
  thus need to ask people to mess with an important firewall.

  The challenge is to make a permanent change to the host firewall setup
  that doesn't get clobbered by libvirt.  To get there, the virtual
  network for the virtual machines needs to use forward mode 'route', we
  explicitly enable masquerading in the firewall (globally), and then
  exempt the multicast address from it:

     # firewall-cmd --add-masquerade
     # firewall-cmd --direct --add-rule ipv4 nat POSTROUTING 0 \
       -d 239.255.255.253 -j ACCEPT

  (Direct rules go before the general masquerade rule.  Not sure if this
  is guaranteed, though.)

- We can also switch NAT on before PREPARE and switch it off before
  VERIFY.  This makes OpenSLP work, and we also get better isolated test
  runs.  We really don't want to access the Internet from tests.

  (We can't have two networks and just use the right one since FreeIPA
  needs the same IP address during setup as during normal operation, so
  this needs to happen on the same network.)

- OpenSLP also has a startup problem where it enters the failed state
  reliably on every boot.  This should be easily fixable.

- We also need to open the firewall for incoming SLP queries.  Trivial.

- Using Avahi would just work.  We could list all instances of
  _workstation._tcp.local, say, or _cockpit-ssh._tcp.local.

  Avahi also has dynamic notifications, so the list of servers in the
  "Add Server" dialog would be 'live' without any extra magic like
  polling or out-of-band notifications.
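
For comparison, here is a small sketch of listing machines via Avahi by
running "avahi-browse -r -t -p" and parsing its semicolon-separated
output.  The field positions and the \DDD decimal escaping are how I
understand the parsable format and should be checked against a real run;
parse_resolved and browse are made-up names.  This is a one-shot
snapshot, so it does not show the live notifications, which would come
from Avahi's D-Bus API instead.

```python
import re
import subprocess


def unescape(name):
    """Undo the \\DDD (decimal) escapes that avahi-browse -p emits."""
    return re.sub(r"\\(\d{3})", lambda m: chr(int(m.group(1))), name)


def parse_resolved(output):
    """Pick the resolved ('=') records out of avahi-browse -p output."""
    hosts = []
    for line in output.splitlines():
        fields = line.split(";")
        if fields[0] != "=" or len(fields) < 9:
            continue  # '+' and '-' lines carry no address
        hosts.append({
            "name": unescape(fields[3]),
            "host": fields[6],
            "address": fields[7],
            "port": int(fields[8]),
        })
    return hosts


def browse(service_type="_workstation._tcp"):
    """Run avahi-browse once and return the machines it resolved."""
    out = subprocess.check_output(
        ["avahi-browse", "-r", "-t", "-p", service_type], text=True)
    return parse_resolved(out)
```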

So:

* NAT on/off switch in vm-prep.
* PREPARE and VERIFY check that the switch is in the right position.
* Bugs in OpenSLP startup are worked around.
* cockpitd learns to punch holes and to talk SLP.
* The "Add Server" dialog shows the result of find-srvs:service-agent.
* Tests for the above.
