On Thu, Jan 23, 2014 at 08:21:58PM +0100, Marek Grac wrote:
[..]
>>How would a cluster admin know how long it will take to save the dump,
>>and what's the right value for this parameter?
>Documentation, but mainly it is a matter of experience and testing. It
>was the same in previous versions.
But dump time varies with the machine type. If you add a machine with
a large amount of memory to the cluster, it could easily take 30 minutes
to dump.
And there is no documentation that explains how long a dump will take.
Nobody knows.
[..]
>>IOW, as long as fence_kdump keeps sending messages to the
>>manager/nodes every 60 seconds, theoretically the dump could take
>>infinitely long?
>Nope. The default is 60 seconds for the fence agent; after that the
>cluster decides that it has failed. This is tunable.
>If you set this value to a really high number (like 1 day), it will
>still work with fence_kdump, because if there is no 'tick' it will
>fail and the timeout will not be applied. In general, an admin can
>set it to such a high number without risk. But if there is a problem
>in fence_kdump (we believe there is none), it is possible that the
>node will continue and potentially destroy data. I wanted to add a
>link to a fence_kdump technical paper, but unfortunately it is not
>online anymore (I will contact the author).
I am sorry, I still don't understand how this timeout logic works.
- Is it a tick-based mechanism, where 60 seconds is the interval
  in which at least one tick must be received?
- Or is it an absolute upper limit on the time in which the dump must complete?
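To make the two readings concrete, here is a minimal sketch. It is not
fence_kdump's actual code: the function names, the 10-second tick
period and the 30-minute dump are hypothetical, chosen only to show how
the two semantics diverge on a long dump.

```python
from typing import Iterable


def gives_up_tick_based(tick_times: Iterable[float], dump_done: float,
                        interval: float = 60.0) -> bool:
    """Reading 1 (hypothetical): an inactivity timer reset by every tick.

    The watcher starts waiting at t=0 and gives up if it ever goes longer
    than `interval` seconds without a tick before the dump finishes at
    `dump_done`.
    """
    last = 0.0  # time of the most recent tick (or the start of waiting)
    for t in sorted(tick_times):
        if t >= dump_done:
            break  # ticks after the dump finished do not matter
        if t - last > interval:
            return True  # timer expired at last + interval, before this tick
        last = t
    # After the final tick, the dump must finish within one more interval.
    return dump_done - last > interval


def gives_up_absolute(dump_done: float, limit: float = 60.0) -> bool:
    """Reading 2 (hypothetical): a hard deadline for the whole dump."""
    return dump_done > limit


if __name__ == "__main__":
    # Hypothetical 30-minute (1800 s) dump with a tick every 10 seconds.
    ticks = range(10, 1800, 10)
    print(gives_up_tick_based(ticks, dump_done=1800.0))  # False: never silent > 60 s
    print(gives_up_absolute(dump_done=1800.0))           # True: 1800 s > 60 s
    # Ticks stop after 30 s (e.g. the sender hangs): the tick watcher gives up.
    print(gives_up_tick_based([10, 20, 30], dump_done=1800.0))  # True
```

Under the tick-based reading a long dump is fine as long as ticks keep
arriving; under the absolute reading the same dump fails as soon as it
runs past the limit. That difference is exactly what the question above
is asking about.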
Thanks
Vivek