Dear All,
This should be the last RFC for today :-). We discussed some problems
related to cleanup at the slaves recently. The problem is that slave
cleanup is directed from the controller. If the controller terminates
before actually cleaning up the slave, the slave is left hanging there
in an inconsistent state.
To solve this, we use the -c option, which forces the cleanup before
the recipes are executed. However, this cleanup works basically by
removing all the kernel modules for soft devices and loading them back
again. That means we cannot use these modules for anything other than
testing (for instance, the controller interface cannot be a bond). This
is not the biggest problem, though; the number of configuration options
and tasks keeps growing, and we have to be able to roll them back as
well (namely system config, NetworkManager connections, active instances
of teamd and more).
In my opinion, we should think about having some sort of "configuration
journal" that keeps track of the configuration actions that have been
done, so that we are able to roll them back automatically upon receiving
a new HELLO from the controller. We discussed this with Ondrej, who had a
similar idea.
It could work like this:
1. HELLO from the controller for run #1
2. Controller configures the slave
3. Controller starts the test
4. Something goes wrong; the controller fails.
5. HELLO from the controller for run #2
6. The slave detects that it has not been cleaned up properly
7. Slave does a rollback
8. Controller configures the slave
9. Controller starts the test
10. Controller sends BYE
11. Slave does a rollback
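To make the sequence above concrete, here is a minimal sketch of the slave side, assuming the slave process keeps running across controller runs and keeps its undo records in memory. The names SlaveSession, record, on_hello and on_bye are hypothetical and do not correspond to the existing LNST slave API. A non-empty undo stack at HELLO time is treated as "not cleaned up properly" (step 6).

```python
from typing import Callable, List


class SlaveSession:
    """Hypothetical slave-side HELLO/BYE handling with automatic rollback."""

    def __init__(self) -> None:
        # Undo callables recorded while the controller configures the slave.
        self._undo_stack: List[Callable[[], None]] = []
        self.rollback_count = 0

    def record(self, undo: Callable[[], None]) -> None:
        """Remember how to revert one configuration action."""
        self._undo_stack.append(undo)

    def rollback(self) -> None:
        """Run the recorded undo actions newest-first (LIFO order)."""
        while self._undo_stack:
            undo = self._undo_stack.pop()
            undo()
        self.rollback_count += 1

    def on_hello(self) -> None:
        # Steps 6-7: leftovers from a previous run mean it never reached BYE.
        if self._undo_stack:
            self.rollback()

    def on_bye(self) -> None:
        # Step 11: a regular end of the run, so always clean up.
        self.rollback()


if __name__ == "__main__":
    log: List[str] = []
    slave = SlaveSession()
    slave.on_hello()                                # run #1
    slave.record(lambda: log.append("remove bond0"))
    # ... the controller dies here, no BYE is sent ...
    slave.on_hello()                                # run #2 triggers the rollback
    slave.record(lambda: log.append("remove team0"))
    slave.on_bye()
    print(log)
```

Undoing newest-first matters because later steps often depend on earlier ones (e.g. enslaving a port to a bond that was created before it).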
Additionally, we could add a new action to lnst-ctl called "reset" that
would just send HELLO and BYE to a slave to clean it up. But even this
would be somewhat redundant if the slaves are able to clean up
themselves.
The "configuration journal" could basically be a list of objects per
interface that remember how each interface was configured; each object
would have a deconfigure method. The rollback feature already exists for
system configs.
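A rough sketch of such a journal, assuming every configuration step is wrapped in an object exposing a deconfigure() method. ConfigAction, AddAddress and ConfigJournal are hypothetical names, and AddAddress only manipulates an in-memory dict standing in for real interface state; it is not how LNST applies addresses.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class ConfigAction(ABC):
    """A configuration step that knows how to undo itself (hypothetical)."""

    @abstractmethod
    def deconfigure(self) -> None:
        """Revert whatever this action changed."""


class AddAddress(ConfigAction):
    """Toy action: adds an address to an in-memory model of the interfaces."""

    def __init__(self, ifaces: Dict[str, List[str]], name: str, addr: str) -> None:
        self.ifaces, self.name, self.addr = ifaces, name, addr
        self.ifaces.setdefault(name, []).append(addr)  # apply immediately

    def deconfigure(self) -> None:
        self.ifaces[self.name].remove(self.addr)


class ConfigJournal:
    """Records actions per interface and rolls them back newest-first."""

    def __init__(self) -> None:
        # One global log keeps cross-interface ordering (e.g. bond vs. its ports).
        self._log: List[Tuple[str, ConfigAction]] = []

    def record(self, iface: str, action: ConfigAction) -> None:
        self._log.append((iface, action))

    def actions_for(self, iface: str) -> List[ConfigAction]:
        """Per-interface view of the journal, in the order recorded."""
        return [action for name, action in self._log if name == iface]

    def is_clean(self) -> bool:
        return not self._log

    def rollback(self) -> None:
        # Undo in reverse order of configuration.
        while self._log:
            _, action = self._log.pop()
            action.deconfigure()
```

The per-interface grouping is kept as a view over a single ordered log rather than as separate lists, since undoing interface by interface could break dependencies between interfaces.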
Cheers,
Radek