Re: How much downtime do we afford for nagios?

Sunday, 27 April 2008

...
>  > So if a service or host is unreachable for 3 or 4 mins, we get a
>  > notification. (However, in most cases it is a false positive, due to
>  > congestion or other causes.)
>  Looking through my email, from what I can recall there are no false
>  positives.  xen6 had to be power-cycled which caused all the other
>  collateral notifications.

 How long was it down? Why should a normal reboot send 23 mails?
 A reboot is not anything exceptional, is it?
 An alert should be sent only when it's absolutely necessary...
 It should report only when xen6 comes back up but a service does not.
 What do you think?
 Thanks.

Remembering that unresponsive and down are different things, it looks like
it went unresponsive at ~0210 UTC (2-3 minutes before the first email). I
*think* this might have just been the domUs at that point; from the IRC logs
it looks like the dom0 was rebooted sometime around 0228 (possibly
beforehand; I do not know).

It's one email per checked item, for each down and each up, and I guess, in
perspective, it was quite a big outage...
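For reference, Nagios can cut down collateral mails like these if the guests declare their dependency on the dom0 with the `parents` directive: while xen6 is DOWN, its guests are classed as UNREACHABLE rather than DOWN, and leaving `u` out of their `notification_options` means no mail is sent for that state. A guest that is still down once xen6 is back up then goes DOWN and mails as usual, which is roughly the behaviour asked for above. A minimal sketch only, assuming a host template named `generic-host` that supplies the remaining required directives; the host name and address are hypothetical placeholders, not the actual Fedora configuration.

```
# Hypothetical guest running on xen6 (name and address are placeholders)
define host {
    use                   generic-host        ; assumed template with the other required directives
    host_name             guest1.example.org
    alias                 guest1
    address               192.0.2.10
    parents               xen6                ; guest is only reachable through the xen6 dom0
    notification_options  d,r                 ; mail on DOWN and RECOVERY, not while UNREACHABLE
}
```

The trade-off is that an outage of xen6 itself still produces its own DOWN/UP mails, but the per-guest noise is limited to guests that genuinely fail to come back.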

IMO these reports are 'absolutely necessary', and I personally like to
check them every now and then, especially after an outage like this, to see
if everything came back up (the service/host overview in the Nagios web
interface is handy for this).

- Nigel
...

 --
 Regards,
 Susmit.

 =============================================
 ssh
 0x86DD170A
 http://www.fedoraproject.org/wiki/SusmitShannigrahi
 =============================================

 _______________________________________________
 Fedora-infrastructure-list mailing list
 Fedora-infrastructure-list(a)redhat.com
 https://www.redhat.com/mailman/listinfo/fedora-infrastructure-list
