How much downtime can we afford for Nagios?

Sunday, 27 April 2008

Hi,

For a few days the number of false Nagios notifications went down, but
it has increased again.

Looking at /configs/system/nagios/services/template.cfg shows that it
is configured with
max_check_attempts = 4 and retry_check_interval = 1 for hosts
and
max_check_attempts = 3 and retry_check_interval = 1 for services.

So if a service or host is unreachable for 3 or 4 minutes, we get a
notification. (In most cases, however, it is a false positive caused by
congestion or some other transient problem.)

How about working out a delay that we can afford when a service or
host is really down? How about 10 minutes (5 attempts x 2 minutes)?
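
To compare the current and proposed settings, here is a rough sketch of
the arithmetic above. It uses the same approximation as this mail
(attempts x retry interval) and assumes the Nagios default
interval_length of 60 seconds, so one interval unit is one minute. The
function name is made up for illustration, and the real delay also
depends on when the first failed check happens.

```python
# Rough estimate only: treats the delay before a notification as
# max_check_attempts x retry_check_interval, as the mail does.
# Assumes interval_length = 60 seconds (one interval unit = one minute).

def minutes_to_notify(max_check_attempts, retry_check_interval,
                      interval_length=60):
    """Approximate minutes of failure before a notification is sent."""
    return max_check_attempts * retry_check_interval * interval_length / 60


if __name__ == "__main__":
    # (max_check_attempts, retry_check_interval) pairs from this mail.
    settings = {
        "hosts (current)": (4, 1),
        "services (current)": (3, 1),
        "proposed": (5, 2),
    }
    for label, (attempts, retry) in settings.items():
        print(f"{label}: about {minutes_to_notify(attempts, retry):g} min")
```

With the proposed values the estimate comes out at 10 minutes, against
4 and 3 minutes for the current host and service settings.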

We could also list which services/hosts are critical and which are not.
That would help us define different notification periods for the
different hosts/services.

I thought I would do this after the freeze, but it's becoming too annoying.

Thanks

-- 
Regards,
Susmit.

=============================================
ssh
0x86DD170A
http://www.fedoraproject.org/wiki/SusmitShannigrahi
=============================================
