On Mon, 28 Apr 2008, Nigel Jones wrote:
On Sun, April 27, 2008 11:01 pm, Jeroen van Meeuwen wrote:
Nigel Jones wrote:
Looking through my email, from what I can recall there were no false positives. xen6 had to be power-cycled, which caused all the other collateral notifications.
Collateral notifications can be caught using service dependencies and parent hosts. Do we currently use any?
I believe we do, but they wouldn't have helped in this case (I've done a bit more digging).
Half the notifications came from the external nagios instance on noc2, while the xen6/db alerts came from the internal nagios instance. Another reason why I like the current setup and don't think we should change a thing :)
Also, the UNKNOWN alerts weren't that bad; they were a precursor to the box having to be restarted. Only in this case were the up/down alerts a little useless. However, I'd sooner keep them as is, because otherwise we run the risk of not noticing a box down immediately and getting everyone under the moon asking "why can't I access fedoraproject.org... it's down, your OS can't be that good".
One thing I would like implemented is event handlers. Some things (probably not this thing) could be handled automatically for us.
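To make that concrete, here is a rough sketch of what an event handler script might look like. It assumes Nagios calls it with the $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$ macros as arguments, in that order. The restart command, the httpd service name and the "third soft attempt" threshold are placeholders I picked for illustration, not anything we actually have configured.

```python
import subprocess
import sys

# Hypothetical restart command; substitute whatever fits the service in question.
RESTART_CMD = ["/sbin/service", "httpd", "restart"]


def should_restart(state, state_type, attempt):
    """Decide whether the handler should try to restart the service.

    Acts only on CRITICAL: once on the third SOFT check attempt (before
    contacts are notified) and again when the state goes HARD.
    """
    if state != "CRITICAL":
        return False
    if state_type == "HARD":
        return True
    return state_type == "SOFT" and attempt == 3


def main(argv):
    # argv[1:] is expected to hold the state, state type and attempt number.
    state, state_type, attempt = argv[1], argv[2], int(argv[3])
    if should_restart(state, state_type, attempt):
        subprocess.call(RESTART_CMD)
    return 0


# Only act when all three macro values were supplied on the command line.
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(sys.argv))
```

The script would be hooked up through a command definition and the event_handler directive on the service. Something like a power-cycle of xen6 obviously wouldn't be covered by this; it is more for a daemon that falls over and can safely be restarted unattended.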
-Mike