Hi,
For a few days the number of false Nagios notifications went down, but it has increased again.
Looking at /configs/system/nagios/services/template.cfg, hosts are configured with max_check_attempts = 4 and retry_check_interval = 1, and services with max_check_attempts = 3 and retry_check_interval = 1.
So if a service or host is unreachable for 3 or 4 minutes, we get a notification. However, in most cases it is a false positive caused by network congestion or other transient problems.
How about agreeing on a delay we can afford when a service or host is really down? Would 10 minutes work (5 attempts x 2 minutes)?
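To compare the current settings with the proposal, here is a rough Python sketch. It assumes the usual Nagios soft/hard state logic (the first failed check is attempt 1, each retry runs retry_check_interval later, and the notification goes out when attempt max_check_attempts fails) and the default interval_length of 60 seconds, so intervals are in minutes. It ignores first_notification_delay, flap detection and notification periods. The function names and the normal check interval of 5 minutes are hypothetical, not values taken from template.cfg.

```python
# Rough estimate of how long an outage lasts before Nagios notifies,
# ASSUMING standard soft/hard state handling and interval_length = 60 s:
#   - the first failed check is attempt 1 (SOFT state),
#   - each retry runs retry_interval minutes after the previous attempt,
#   - the HARD state (and notification) comes on attempt max_check_attempts.


def delay_after_first_failure(max_check_attempts: int, retry_interval: int) -> int:
    """Minutes from the first failed check to the HARD state."""
    # There are (max_check_attempts - 1) retry gaps between the attempts.
    return (max_check_attempts - 1) * retry_interval


def worst_case_delay(max_check_attempts: int, retry_interval: int,
                     check_interval: int) -> int:
    """Adds up to one normal check interval before the outage is first seen."""
    return check_interval + delay_after_first_failure(max_check_attempts, retry_interval)


if __name__ == "__main__":
    # check_interval = 5 is a HYPOTHETICAL value, not read from template.cfg.
    for label, attempts, retry in [("hosts (current)", 4, 1),
                                   ("services (current)", 3, 1),
                                   ("proposal", 5, 2)]:
        print(f"{label}: {delay_after_first_failure(attempts, retry)} min after "
              f"the first failure, up to {worst_case_delay(attempts, retry, 5)} min "
              f"from the start of the outage")
```

Under these assumptions, 5 attempts x 2 minutes gives about 8 minutes after the first failed check (the retries are the gaps between attempts), and up to 13 minutes if the outage starts just after a normal check.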
We could also list which services/hosts are critical and which are not. That would help us define different notification periods for different hosts/services.
I was going to do this after the freeze, but it's becoming too annoying.
Thanks
infrastructure@lists.fedoraproject.org