On 17.01.2023 at 22:30, Chris Murphy
<lists(a)colorremedies.com> wrote:
On Tue, Jan 17, 2023, at 11:51 AM, Peter Boy wrote:
>> On 16.01.2023 at 13:23, Lennart Poettering <mzerqung(a)0pointer.de> wrote:
>>
>> Just to say this clearly btw: when we introduced the time-out initially
>> we were coming from sysvinit where no such time-out existed at
>> all. Hence we picked a conservative (i.e. overly long) value to not
>> upset things too badly. And yes, some people were very much upset we
>> now defaulted to a time-out.
>>
>> If we'd start from scratch without sysvinit heritage, I think we
>> would have started with something much much lower right-away.
>
> When introducing a timeout, you obviously had the grace to choose a
> fairly conservative (i.e. cautious) default value that did not lead to
> major problems. It would be interesting to know what would have happened
> if you had started with 15 sec.
Why? it was 0 sec before systemd.
As far as I understood Lennart, there was no timeout in Sys V that killed a hanging
process. But that is not the relevant point.
If anything, the time out behavior is masking problems with services
not shutting down in a timely manner.
It's not necessarily that. It is only one of at least 2 possibilities.
One possibility is indeed that a service "hangs" and therefore does not
terminate in a timely manner. This is then a bug or inappropriate programming in the
service. And there is no point in waiting for this service, you have to abort, the sooner
the better.
The other possibility, especially on a heavily loaded server, is that processes impede each
other in the special situation of a shutdown, through resource bottlenecks or resource
contention. This does not depend on the individual service, but on the multitude of
services and their interdependencies. The process is not deterministic; it is driven by
chance. The time required for a single event, i.e. an individual shutdown, is not
predictable. At best, one can approximate a range. If the range is exceeded, the
assumption of a non-faulty shutdown becomes increasingly improbable and there is no point in
waiting for any service anymore. No more improvement can be expected. You have to abort.
Unfortunately, we have no data in this case, only different "feelings". We
can't estimate a plausible range, we can only guess. And in the case of a
server, we might accept waiting a little longer in light of potential major follow-on
issues.
So, the current decision is not optimal, but OK and manageable.
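To illustrate the idea of approximating a range: the following is only a sketch in Python under made-up assumptions. The simulation model, its parameters, and the names suggest_timeout and simulate_shutdowns are hypothetical and not based on any measured Fedora data. It merely shows how, if real shutdown durations were collected, a timeout could be derived from a high quantile of the samples plus a safety margin.

```python
import random
import statistics


def suggest_timeout(durations, quantile=0.99, margin=1.5):
    """Derive a stop timeout (seconds) from observed shutdown durations.

    Takes the requested quantile of the samples and multiplies it by a
    safety margin. Both defaults are arbitrary illustration values.
    """
    if len(durations) < 2:
        raise ValueError("need at least two samples")
    # With n=100, statistics.quantiles returns the 99 cut points p1..p99.
    cuts = statistics.quantiles(durations, n=100, method="inclusive")
    index = min(max(round(quantile * 100) - 1, 0), 98)
    return cuts[index] * margin


def simulate_shutdowns(runs=1000, services=20, seed=1):
    """Invented model: a shutdown lasts as long as its slowest service,
    stretched by a random contention factor. Not real data."""
    rng = random.Random(seed)
    samples = []
    for _ in range(runs):
        slowest = max(rng.expovariate(1 / 2.0) for _ in range(services))
        contention = rng.uniform(1.0, 3.0)
        samples.append(slowest * contention)
    return samples


if __name__ == "__main__":
    samples = simulate_shutdowns()
    print(f"median shutdown: {statistics.median(samples):.1f}s")
    print(f"suggested timeout: {suggest_timeout(samples):.1f}s")
```

With real measurements in place of simulate_shutdowns, the same function would give a data-based range instead of a guess.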
> The way it is proposed it doesn’t make a lot of sense. Desktops and
> servers work very differently and have different requirements. For
> servers, this proposal in its present form makes no sense at all, and
> is on the contrary dangerous.
Why? It's been said in this thread that servers come with a higher expectation of
rebooting upon request rather than indefinitely hanging, in contrast to desktops where
there can be some tolerance for delay in exchange for safety.
Maybe I don’t fully understand this due to translation issues. On a server, a reboot is a
rare event. Optimally it is up 24/7/365. If I suffer the misfortune of having to reboot
the server, it doesn't matter whether it takes 45 sec, 2 min or 5 min. All important
services are redundant, so there is no total failure. And the BIOS processing at startup often
takes longer than any (regular) shutdown process. So a 15 sec timeout instead of
2 min is no noticeable improvement. The most important thing is to get back up without
any damage.
What I've seen on Fedora Server when there are services that hold
things up is that sshd invariably quits immediately, so now I can't even log back in to
find out what's holding up the reboot. It's quite substantially a worse UX than on
the desktop. I mean, ostensibly I know what I'm doing on my own server and don't
need to be second guessed like a desktop user.
Yes, it's pretty annoying that ssh always reliably stops immediately, unlike all other
processes. It would be most helpful if systemd would terminate ssh last.
At least postgresql and libvirtd are configured to inhibit
reboot/shutdown indefinitely until they properly quit. Services can opt into this
behavior, overriding the default. But indefinite delay would pose a bigger problem on
servers than on desktops, due to the loss of any feedback and control.
Agreed. Nobody voted for an indefinite delay, as far as I have read the posts. It's
all about who is willing to wait for how long, and about the severity of possible damage.
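For reference, a per-service override of the stop timeout is normally done with a systemd drop-in that sets TimeoutStopSec= in the [Service] section. The small Python sketch below only renders such a drop-in as text. The helper names, the file name stop-timeout.conf and the unit example.service are hypothetical; it does not claim to show how postgresql or libvirtd are actually configured.

```python
def render_stop_timeout_dropin(timeout="infinity"):
    """Return the text of a drop-in overriding a unit's stop timeout.

    TimeoutStopSec= is the systemd.service option; "infinity" disables
    the stop timeout, a value such as "5min" sets a finite one.
    """
    return f"[Service]\nTimeoutStopSec={timeout}\n"


def dropin_path(unit, name="stop-timeout.conf"):
    """Return the conventional drop-in location for an admin override.

    The file name is an arbitrary example.
    """
    return f"/etc/systemd/system/{unit}.d/{name}"


if __name__ == "__main__":
    # example.service is a placeholder unit name.
    print(dropin_path("example.service"))
    print(render_stop_timeout_dropin("5min"), end="")
```

After placing such a file, `systemctl daemon-reload` is needed for the change to take effect.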
--
Peter Boy
https://fedoraproject.org/wiki/User:Pboy
pboy(a)fedoraproject.org
Timezone: CET (UTC+1) / CEST (UTC+2)
Fedora Server Edition Working Group member
Fedora docs team contributor
Java developer and enthusiast