On Fri, Jun 06, 2014 at 01:55:09PM +0800, WANG Chao wrote:
On 06/04/14 at 09:57am, Vivek Goyal wrote:
> On Wed, Jun 04, 2014 at 11:13:45AM +0800, WANG Chao wrote:
>
> [..]
> > > > if [ $_ret -ne 0 ]; then
> > > > + echo "ssh failed after multiple tries"
> > > > echo "Could not create $DUMP_TARGET:$SAVE_PATH, you probably need to run \"kdumpctl propagate\"" >&2
> > >
> > > Hold on. So assume that network is up but keys are not propagated or keys
> > > are not valid, we will still keep on retrying? That does not sound right.
> > >
> > > We need to retry only if network interface is not up. If ssh fails because
> > > of no keys or wrong keys, then we should not retry.
> >
> > I'm not sure how we can do this; the return code from ssh is always 255
> > in any case of failure, i.e. wrong key, no key, network issue.
>
> Hey from DUMP_TARGET, can't we figure out which local network interface
> it is routed through and then check the status of that network interface?
When network isn't ready, we can't really figure out which interface
routes to DUMP_TARGET.
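
For reference, a rough sketch of the lookup being discussed, assuming the target has already been resolved to an IP address ("ip route get" takes an address, not a user@host string). The helper names route_dev and iface_is_up are made up for illustration and are not part of kdumpctl. When no route exists yet, "ip route get" typically reports an error and prints no route, so route_dev returns an empty string, which is exactly the problem described above.

```shell
# route_dev: read the text printed by "ip route get <address>" on stdin and
# print the interface name that follows the word "dev". Prints nothing if
# no route was reported.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# iface_is_up: succeed only if the interface's operstate file reads "up".
# The optional second argument (a sysfs root) is an assumption added so the
# check can be pointed at a fake directory tree.
iface_is_up() {
    [ -r "${2:-/sys/class/net}/$1/operstate" ] &&
        [ "$(cat "${2:-/sys/class/net}/$1/operstate")" = "up" ]
}
```

It could be used as: dev=$(ip route get 192.0.2.10 2>/dev/null | route_dev), followed by iface_is_up "$dev". Note that some interfaces report "unknown" rather than "up" in operstate, so this check alone may be too strict.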
There can be situations where the local network is up, but there's something
wrong with the network connection between the host and local system, or
host network is initializing.
I think we need to ask networking folks and also check how apache waits
for the interfaces.
In this case, should we fail right away without trying a few more times?
So I'm not too particular about stopping the retries when local network is
up and ssh fails.
I think it's not too bad to fail after 180 seconds. If it's a
configuration issue (wrong key, no key..), user could fix it after the
first time the kdump service fails, and the next time there would be no
such issues and the retry will be only for polling network connection.
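
A minimal sketch of such a bounded retry, to make the behaviour concrete. The function name retry_until and the polling interval are assumptions. Only the 180 second limit comes from this thread. It reruns whatever command it is given until that command succeeds or the time budget is used up, and it does not distinguish why the command failed.

```shell
# retry_until TIMEOUT INTERVAL CMD [ARGS...]
# Rerun CMD every INTERVAL seconds until it succeeds, giving up once
# roughly TIMEOUT seconds have been spent sleeping.
# Returns 0 on success, 1 if the budget ran out.
retry_until() {
    _ru_timeout=$1
    _ru_interval=$2
    shift 2
    _ru_waited=0
    while ! "$@"; do
        if [ "$_ru_waited" -ge "$_ru_timeout" ]; then
            return 1
        fi
        sleep "$_ru_interval"
        _ru_waited=$((_ru_waited + _ru_interval))
    done
    return 0
}
```

For example, retry_until 180 5 some_check would call the hypothetical some_check every 5 seconds for about 180 seconds. Since ssh exits with 255 for both key and network problems, a wrong key would still burn the full budget with this approach.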
In simplest form we could probably use something like "ping" and try to
ping target.
But this will be an issue if the target is configured not to respond to
ping requests.
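
A sketch of what the ping probe might look like, assuming the Linux iputils ping (-c for count, -W for the reply timeout in seconds). The function name and the PING_BIN override are hypothetical. The override exists only so the function can be tried without a network.

```shell
# target_answers_ping HOST
# Succeed if HOST answers a single echo request within 2 seconds.
# PING_BIN is a hypothetical override for the ping binary (default: ping).
target_answers_ping() {
    "${PING_BIN:-ping}" -c 1 -W 2 "$1" >/dev/null 2>&1
}
```

A failure here cannot tell "network not ready" apart from "target drops ICMP", which is the limitation noted above.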
What do you think?
I am really not convinced that we should continue to retry if the keys are
wrong. Expect a string of bugs on this.
We need to think of something else.
Thanks
Vivek