So if a service or host is unreachable for 3 or 4 minutes, we get a notification. (However, in most cases it is a false positive, due to congestion or other causes.)
Looking through my email, as far as I can recall there were no false positives. xen6 had to be power-cycled, which caused all the other collateral notifications.
How long was it down? Why should a normal reboot send 23 mails? A reboot is not an exceptional thing, is it? An alert should be sent only when it's absolutely necessary... it should report only when xen6 comes up but a service does not come up. What do you think? Thanks.
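To make the suggestion above concrete, here is a minimal sketch of that suppression rule: a down check only produces a mail when the host it depends on is itself up. This is not how our Nagios instance is configured; the `Check` class, the `notifications` function and the check names other than xen6 are hypothetical and only for illustration. (Nagios has its own mechanisms in this area, such as host `parents` and dependency definitions.)

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Check:
    name: str               # e.g. "xen6" or "guest1/http" (hypothetical names)
    parent: Optional[str]   # check this one depends on; None for a top-level host
    up: bool                # current state as seen by the monitor


def notifications(checks: List[Check]) -> List[str]:
    """Return the names of the checks that should trigger a mail."""
    status = {c.name: c.up for c in checks}
    alerts = []
    for c in checks:
        if c.up:
            continue  # nothing to report
        # Suppress the alert when the parent is down too:
        # the parent's own alert already covers this outage.
        if c.parent is not None and not status.get(c.parent, True):
            continue
        alerts.append(c.name)
    return alerts


# xen6 is down, so its guests and their services are collateral: one mail.
outage = [
    Check("xen6", None, up=False),
    Check("guest1", "xen6", up=False),
    Check("guest1/http", "guest1", up=False),
]
print(notifications(outage))

# xen6 and guest1 are back, but a service did not come up: that is reported.
after_reboot = [
    Check("xen6", None, up=True),
    Check("guest1", "xen6", up=True),
    Check("guest1/http", "guest1", up=False),
]
print(notifications(after_reboot))
```

The trade-off is that the per-item down/up mails are lost as a record of what was affected, so this only helps if the web overview is used to confirm recovery.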
Remembering that unresponsive and down are different things, it looks like it went unresponsive at ~0210 UTC (2-3 minutes before the first email). I *think* this might have just been the domUs at that point; from the IRC logs it looks like the dom0 was rebooted sometime around 0228 (potentially beforehand, I do not know).
It's 1 email per checked item for down/up and I guess in perspective, it was quite big...
IMO these reports are 'absolutely necessary', and I personally like to check them every now and then, especially after an outage like this, to see if everything is back up (the service/host overview on the Nagios web interface is handy for this).
- Nigel
-- Regards, Susmit.
============================================= ssh 0x86DD170A http://www.fedoraproject.org/wiki/SusmitShannigrahi =============================================
Fedora-infrastructure-list mailing list Fedora-infrastructure-list@redhat.com https://www.redhat.com/mailman/listinfo/fedora-infrastructure-list