On 06/10/2011 12:24 PM, Mark McLoughlin wrote:
> On Sat, 2011-06-11 at 00:00 +1000, Justin Clift wrote:
>> On 10/06/2011, at 2:36 AM, Carl Trieloff wrote:
>>> Justin,
>>>
>>> Can we get this info linked from the Aeolus web pages?
>>
>> Ok, this is my conversion of that info to an Aeolus web page
>> (dev version):
>>
>>   http://justinclift.fedorapeople.org/aeolus/high_availability.html
> Kudos Justin and Steve, I love the focus on why this is interesting to
> users. Very cool stuff.
>
> I've tried to get my head around the architecture of the technical
> integration here, but I'm still missing some of the big picture.
> e.g.
>
>  - matahari needs to be installed in all the guests because it's what
>    does the application monitoring and restarting. That sounds like a
>    job for our "service" concept. You can include a matahari service
>    descriptor in your deployable or template and it will be configured
>    at either boot-time or build-time, respectively
Ack.

 * Build-time config would require guest images to be built to include
   the matahari packages
 * Post-boot config would need to configure matahari to talk to the
   proper brokers in the cloud
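To make that a bit more concrete, a matahari service descriptor in a
deployable could look something like the sketch below. All element and
parameter names here are hypothetical (the actual deployable schema may
well differ) -- the point is just that build-time pulls in the packages,
and post-boot hands matahari its broker details:

```xml
<deployable name="webapp-ha">
  <assemblies>
    <assembly name="frontend" hwp="small">
      <services>
        <!-- build-time: image is built with the matahari packages -->
        <service name="matahari">
          <!-- post-boot: point matahari at the cloud's QMF broker -->
          <executable url="http://example.com/scripts/configure-matahari.sh"/>
          <parameters>
            <parameter name="broker_host">
              <value>qpid.cloud.example.com</value>
            </parameter>
            <parameter name="broker_port">
              <value>5672</value>
            </parameter>
          </parameters>
        </service>
      </services>
    </assembly>
  </assemblies>
</deployable>
```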
>  - the DPE - is that on the guest and built on matahari? or is it the
>    server-side piece that talks to matahari on the guest
The DPE is part of the server-side infrastructure. It communicates with
the guests via the QMF bus/Matahari.
>  - the CPE is server-side. Is there one of these per cloud? Similar to
>    the config server? Is it plausible for it to be combined with the
>    config server somehow? e.g. if we have an AMI for the config server
>    on EC2, would we put the CPE on it too?
I'll let Steve take that one :)
>  - it sounds like CPE is going to consume the deployable XML. Does
>    conductor push this to CPE, or does CPE watch for new deployables
>    and pull it from conductor?
Ditto
>  - the assumption is that the deployable XML will describe the
>    applications that the CPE needs to monitor. They'll be described as
>    services. And the service descriptions in the deployable XML will
>    include the information required for matahari to restart the
>    daemons configured by the deployable.
Correct, though as I mentioned in another thread, I think it would be
better to call these services that need to be monitored "Resources", as
that prevents confusion with the other things we're calling services and
is closer to what other cluster stacks (for HA) do.
> at first glance, I'm not sure that's the best way of doing this.
>
> e.g. if the script associated with a service (the script runs at
> build or boot time, depending on whether the service is in the
> deployable or the template) enables some daemons and configures
> matahari with the information it needs to restart it, does that
> work?
The script associated with an HA Resource (again, avoiding "Service"
since that term is overloaded) is controlled via either a:

 * SysV/LSB init script
 * OCF-compliant Resource Agent (see the resource-agents package in Fedora)

Matahari doesn't need to be configured by post-boot or build-time
scripts on the guest with 'what Resources to monitor'. The only thing
wrt Matahari that needs to be handled is the initial
bootstrap/connection to the Broker/Bus.

Once that occurs, telling the matahari-services agent which Resources to
monitor (via a SysV/LSB/OCF RA) would be done via the DPE, based on the
information it is given about the Deployable and Assemblies.
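In other words, on the guest a Resource is just a standard init script
or resource agent -- nothing matahari-specific lives in it. A minimal
sketch of the LSB-style "status" action that matahari-services would
invoke (daemon name and pidfile path are placeholders, and a real init
script obviously needs start/stop as well):

```shell
#!/bin/sh
# Sketch of the LSB "status" action for a hypothetical daemon
# ("mydaemon"). LSB convention: return 0 if running, 3 if stopped.
# matahari-services only needs these standard actions/exit codes.

mydaemon_status() {
  pidfile="$1"
  if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
    echo "running"
    return 0
  else
    echo "stopped"
    return 3
  fi
}

# Example: use our own shell's PID as a stand-in for the daemon
echo $$ > /tmp/mydaemon.pid.$$
mydaemon_status /tmp/mydaemon.pid.$$   # prints "running"
rm -f /tmp/mydaemon.pid.$$
```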
> think about where the service script is a puppet manifest. The
> recipe describes the daemons that need to be running. Does conductor
> need to be involved here, or can matahari just figure out from the
> manifest (which will be available on the VM) that it needs to
> monitor and restart some daemons?
The manifest is just a listing of packages, no? How would we know from
that list of packages (RPMs) which packages are:

 * Daemons that need to be monitored
 * Daemons that don't need to be monitored
 * Things that aren't daemons at all

So unless the manifest contains additional information that flags a
specific package as daemon/not-daemon/HA-daemon, we don't have enough
info for the DPE to tell Matahari what to monitor.

So something in the Deployable/Assembly creation step needs to store
that information and then provide it to the CPE/DPEs.
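i.e. something along these lines would need to exist at
Deployable/Assembly creation time. This is purely hypothetical markup
(none of these attributes exist today) -- it just illustrates the kind
of flag that's currently missing:

```xml
<assembly name="frontend">
  <packages>
    <!-- a plain library: not a daemon, nothing to monitor -->
    <package name="libxml2"/>
    <!-- an HA daemon: monitored via its standard init script -->
    <package name="httpd" daemon="true" monitor="true" agent="lsb:httpd"/>
    <!-- a daemon we deliberately leave unmonitored -->
    <package name="ntpd" daemon="true" monitor="false"/>
  </packages>
</assembly>
```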
>  - does conductor have a hard-dependency on pacemaker-cloud? or is it
>    an optional dependency? or does conductor know nothing at all about
>    pacemaker-cloud?
Well, right now it certainly doesn't have a dependency, since the first
releases of Conductor will not be integrated at all with Pacemaker Cloud.
Whether or not there is a hard dependency still needs to be worked
out... One thing to keep in mind is: Pacemaker Cloud is the integration
point with Matahari. So even taking away all of the HA concepts, it's
still useful to rely on pcmk-cloud purely for the guest monitoring and
introspection aspects. So I think it would make sense to always use
Conductor with pcmk-cloud, and the HA functionality specifically could
be enabled/disabled as desired. But the general OS monitoring should
probably always be present.
>  - should pacemaker-cloud be a part of Aeolus releases so that we can
>    pimp the HA stuff as an Aeolus feature? if so, is it okay for it to
>    follow the Aeolus release schedule?
I think that sounds like a reasonable idea. The notion of guest
monitoring is sort of essential, so I'd like to see tight integration.
pcmk-cloud is a separable component though, so I don't think it would be
in the same tarball as another Aeolus component, but I'd have no
objection to bundling pcmk-cloud releases with Aeolus overall releases,
and also no objection to tying their schedules together.
Steve, any thoughts on the above or objections?
Perry