Hi everybody,
I'm sending a set of patches that implement primary server support as I see
it. It is based on comment #6 in the related ticket:
https://fedorahosted.org/sssd/ticket/1128
Patch #0001 adds the necessary support to the failover code.
Patches #0002 - #0004 extend this support to each provider.
Patch #0005 documents the new concept in the failover section of the man pages.
Patches #0006 - #0008 add new options to each provider that make use of this
concept.
A brief note on the approach: when a new server is added to the list of
servers for a service, it can be marked either as primary or as
secondary.
When selecting a new server from the failover list, the algorithm iterates
over the list twice: first it looks for a primary server and, if none is
found, it falls back to a secondary server.
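To make the two-pass selection concrete, here is a minimal sketch in C. The
struct server and enum srv_status types are hypothetical stand-ins, not the
structures the failover code actually uses, and "usable" is simplified to
"not marked as not working":

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration only; the real failover code
 * has its own server and status structures. */
enum srv_status { SRV_NEUTRAL, SRV_WORKING, SRV_NOT_WORKING };

struct server {
    const char *name;
    bool primary;            /* true = primary, false = secondary */
    enum srv_status status;
};

/* Return the first usable server of the requested class, or NULL. */
static struct server *find_usable(struct server *list, size_t n, bool primary)
{
    for (size_t i = 0; i < n; i++) {
        if (list[i].primary == primary && list[i].status != SRV_NOT_WORKING) {
            return &list[i];
        }
    }
    return NULL;
}

/* Two passes: primary servers first, secondary servers only as a fallback. */
struct server *select_server(struct server *list, size_t n)
{
    struct server *s = find_usable(list, n, true);

    if (s == NULL) {
        s = find_usable(list, n, false);
    }
    return s;
}
```

Note that a secondary server listed before a primary one is still skipped in
the first pass; list order only matters within each class.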
If the server returned from failover is not a primary server, a timeout is
set (currently hard-coded to 30 seconds) for a primary server lookup. This
timeout is rescheduled until a primary server is found. If a primary server
(either working or neutral) is found when this timeout fires, the status of the
backend is reset, i.e. first all offline callbacks and then all online callbacks
are called. This interrupts the connection to the secondary server in favor of
a new connection to the primary server.
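The timer logic can be sketched as follows. This is only an illustration under
simplifying assumptions: struct backend and its callbacks are hypothetical, the
event loop is replaced by explicit calls to backend_tick() with a simulated
clock, and the caller tells it whether a working or neutral primary is
available:

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* The patches hard-code this to 30 seconds. */
#define PRIMARY_RETRY_TIMEOUT 30

/* Hypothetical backend state, for illustration only. */
struct backend {
    bool retry_armed;         /* is the primary lookup timer scheduled? */
    time_t retry_at;          /* when the timer fires */
    void (*offline_cb)(struct backend *);
    void (*online_cb)(struct backend *);
};

/* Called whenever failover hands us a server. */
void backend_server_selected(struct backend *be, bool primary, time_t now)
{
    be->retry_armed = !primary;
    if (!primary) {
        be->retry_at = now + PRIMARY_RETRY_TIMEOUT;
    }
}

/* Called from the (simulated) event loop. primary_found says whether a
 * primary server in working or neutral state is available right now. */
void backend_tick(struct backend *be, time_t now, bool primary_found)
{
    if (!be->retry_armed || now < be->retry_at) {
        return;                         /* timer not armed or not due yet */
    }
    if (!primary_found) {
        be->retry_at = now + PRIMARY_RETRY_TIMEOUT;   /* reschedule */
        return;
    }
    /* Reset the backend: all offline callbacks first, then all online
     * callbacks, which drops the secondary connection. */
    be->retry_armed = false;
    be->offline_cb(be);
    be->online_cb(be);
}
```

Calling the offline callbacks before the online ones is what forces the
existing connection to be torn down before a new one is opened.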
I have a couple of concerns that I have yet to investigate. First, when the
connection to the old server is interrupted, what happens to an operation that
is currently in progress on that connection? I know ticket #1027 involves a
similar scenario. To kill two birds with one stone, I could design an extension
that immediately invokes the callbacks of all operations running on the
existing connection. What do you think about that?
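One possible shape for such an extension, sketched in C purely as an
assumption: each connection keeps a list of its pending operations, and
interrupting it completes every one of them immediately with an error. The
names (struct connection, conn_add_op, conn_interrupt) and the fixed-size
array are hypothetical and not part of the patches:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-connection bookkeeping, for illustration only. */
#define MAX_PENDING 16

typedef void (*op_done_fn)(void *private_data, int error);

struct pending_op {
    op_done_fn done;
    void *private_data;
};

struct connection {
    struct pending_op ops[MAX_PENDING];
    size_t n_ops;
};

/* Register an operation running on this connection. Returns 0 or ENOSPC. */
int conn_add_op(struct connection *conn, op_done_fn done, void *pvt)
{
    if (conn->n_ops == MAX_PENDING) {
        return ENOSPC;
    }
    conn->ops[conn->n_ops].done = done;
    conn->ops[conn->n_ops].private_data = pvt;
    conn->n_ops++;
    return 0;
}

/* Interrupt the connection: finish every pending operation right away
 * with the given error so callers can retry on the new connection. */
void conn_interrupt(struct connection *conn, int error)
{
    struct pending_op ops[MAX_PENDING];
    size_t n = conn->n_ops;

    /* Detach the list first so a callback may safely register a new
     * operation on this connection while we are still iterating. */
    memcpy(ops, conn->ops, n * sizeof(ops[0]));
    conn->n_ops = 0;
    for (size_t i = 0; i < n; i++) {
        ops[i].done(ops[i].private_data, error);
    }
}
```

A caller would pass something like ECONNABORTED so that each request can tell
an interrupted connection apart from a genuine server error and retry.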
My second concern is about the port status timeout. After some time, a port
that was marked as not working is marked as neutral again. That effectively
makes the second primary server lookup succeed even though the server is still
not running, so the backend is reset. As a result, an unnecessary reconnection
to the secondary server is performed once some data are needed from the server.
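The problem can be reproduced with a small model in C. It assumes a
hypothetical struct port with a last_change timestamp and a reset_timeout
parameter (the actual timeout value and comparison are not taken from the
failover code); it only shows why a neutral status alone cannot distinguish a
recovered primary from one whose status simply expired:

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical port model, for illustration only. */
enum port_status { PORT_NEUTRAL, PORT_WORKING, PORT_NOT_WORKING };

struct port {
    enum port_status status;
    time_t last_change;     /* when the status was last set */
};

/* A not-working port is reported as neutral once reset_timeout seconds
 * have passed since it was marked, so it becomes eligible again. */
enum port_status port_get_status(const struct port *p, time_t now,
                                 time_t reset_timeout)
{
    if (p->status == PORT_NOT_WORKING &&
        now - p->last_change >= reset_timeout) {
        return PORT_NEUTRAL;
    }
    return p->status;
}

/* The primary server lookup accepts working *or* neutral ports, so it
 * treats an expired not-working port as available. */
bool primary_looks_available(const struct port *p, time_t now,
                             time_t reset_timeout)
{
    return port_get_status(p, now, reset_timeout) != PORT_NOT_WORKING;
}
```

With a dead primary marked at t=1000 and a 30-second reset, the lookup at
t=1010 correctly fails, but the one at t=1030 succeeds and triggers the
backend reset described above.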
Thank you very much for your opinions on this.
Jan