On 01/22/2014 07:33 PM, Vivek Goyal wrote:
> On Mon, Jan 13, 2014 at 01:39:11PM +0100, Marek Grac wrote:
> [..]
>> if you have a lot of memory, you should set fence_kdump to wait
>> longer (default 60 seconds)
>> pcs stonith update myfence pcmk_reboot_timeout=600 --force
> Hi Marek,
> I think this is a problem. How would we know in advance how long
> the dump will take to finish? It will vary depending on so many
> things (size of memory, speed of the network, etc.).
You don't need to know this in advance. This is set on the cluster
side, and the administrator should be able to set this timeout to a
proper value.
> By default, why can't this value be very high? Or this value could
> act more like a watchdog: as long as you keep getting ticks, you keep
> resetting an internal counter. If you don't get a tick (a message
> from the node that is saving the vmcore) for 60 seconds, then you
> assume that something went wrong with the node and power-cycle it.
> Trying to keep an upper limit of 60 seconds and assuming the dump
> will finish in that time will not help.
This is a general fence-agent setting in the cluster, and fence_kdump
is the only agent that uses the 'ticking' mechanism; all the others
should finish in a much more fixed time. Setting this value for the
kdump agent is fine, as fence_kdump itself contains a different
timeout mechanism based on 'ticks'. I agree that this should be
explained in the documentation/kbase, but it is not something that can
be changed at the fence-agent level.
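To illustrate the 'ticking' idea: the overall dump can take far longer
than 60 seconds, because what is bounded is only the *silence* between
messages, not the total duration. A minimal sketch (function and event
names are illustrative, not fence_kdump's real wire protocol):

```python
def dump_watchdog(events, tick_timeout=60):
    """events: sorted list of (timestamp, kind), where kind is "tick"
    (keepalive from the node saving the vmcore) or "done" (dump
    finished). Returns True if the dump completes before any gap
    between messages exceeds tick_timeout; False means the caller
    would give up and power-cycle the node."""
    last = 0  # time of the last message; monitoring starts at t=0
    for ts, kind in events:
        if ts - last > tick_timeout:
            return False          # too long without a tick: node presumed dead
        if kind == "done":
            return True           # dump completed while still ticking
        last = ts                 # got a tick: reset the watchdog
    return False                  # stream ended without completion
```

Note that a 140-second dump still succeeds as long as no single gap
exceeds 60 seconds, which is exactly why a fixed 60-second upper limit
on the whole operation would not be needed.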
The cluster (pacemaker/corosync) accepts that some fence agents are
slower than others, so it is possible to set this timeout value per
instance of an agent with the given command.
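Putting the pieces together, a per-instance configuration might look
like this (resource and host names are hypothetical; the update
command is the one quoted above, and pcmk_reboot_timeout only affects
this stonith resource, not other agents in the cluster):

```shell
# Create a fence_kdump stonith resource for the nodes being monitored
pcs stonith create myfence fence_kdump pcmk_host_list="node1 node2"
# Raise the fencing timeout for this instance only, e.g. for hosts
# with a lot of memory whose dumps take long
pcs stonith update myfence pcmk_reboot_timeout=600 --force
```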