RFC: Pluggable modules to allow administrators to select their preferred provider selection algorithm

Thursday, 5 April 2012

We should present administrators with the ability to configure the 
launch-time provider account selection policy for a specific pool, and 
to set a global default policy to apply in pools where no custom policy 
is defined.

The policy would be applied after a set of viable provider accounts has 
been identified. Those will be the provider accounts to which the 
relevant images have been pushed, and for which a set of hardware 
profile matches can be made, etc.

The selection policy should work by defining a probability distribution, 
stating how likely each provider account is to be selected to host the 
new deployment, expressed as a percentage. Once those percentages are 
calculated, Conductor should pick a random number between 1 and 100 and 
attempt to launch on the lucky provider account in whose share of that 
range the number falls.
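The draw described above could be sketched as follows. This is a minimal illustration rather than Conductor code: the function name `pick_account` and the account names are hypothetical, and the shares are assumed to be plain percentages keyed by account name.

```python
import random

def pick_account(shares, rng=random):
    """Return one account name, chosen with probability proportional to its share."""
    # Accounts with a zero share can never win, so leave them out of the draw.
    candidates = [(name, share) for name, share in shares.items() if share > 0]
    if not candidates:
        raise ValueError("no provider account has a non-zero share")
    total = sum(share for _, share in candidates)
    roll = rng.random() * total          # a point somewhere in [0, total)
    running = 0.0
    for name, share in candidates:
        running += share                 # upper edge of this account's slice
        if roll < running:
            return name
    return candidates[-1][0]             # guard against floating-point rounding

# Hypothetical accounts and percentages, for illustration only.
example_shares = {"ec2-east": 50.0, "vsphere-a": 30.0, "rhevm-b": 20.0}
chosen = pick_account(example_shares)
```

Scaling the roll by the total, rather than assuming exactly 100, keeps the draw fair even if rounding leaves the percentages slightly short of 100.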

Using a probability range and randomly selecting within that range might 
seem counter-intuitive: having done the maths and assigned a numerical 
probability to each provider account, based on its suitability to host 
the deployment, why not just launch on the "best" provider? The issue is 
one of scale. When considering a single launch, selecting the account 
which gathered the highest score makes sense, but once Conductor is 
managing a large volume of deployments, the downside of that approach 
becomes clear: without the randomness, a provider account which 
gathered more than 50% of the probability ranking would get 100% of the 
instances.

Whilst the various policies should be stackable, one of the two 
following policies should be the initial basis for the calculation:

*Round robin, with optional weighting:*

With this policy, Conductor would use each of the available provider 
accounts equally, by assigning the same probability to each of them. 
Varying the probabilities, to assign a weighting, would be useful in 
instances where the private cloud providers associated with each 
provider account are of differing sizes, e.g. three vSphere clusters, 
one of which has double the capacity of the other two. In that 
circumstance, the Administrator could adjust the weighting ratios to 
more closely reflect the actual capacities of each cluster.
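To make the weighting concrete, here is a minimal sketch that turns administrator-assigned weights into percentages. The function name `weighted_shares`, the idea of a single numeric weight per account, and the cluster names are all assumptions for illustration.

```python
def weighted_shares(weights):
    """Convert per-account weights into percentages that add up to 100."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one provider account needs a positive weight")
    return {name: 100.0 * weight / total for name, weight in weights.items()}

# Equal weights give plain "round robin" odds.
equal = weighted_shares({"cluster-a": 1, "cluster-b": 1, "cluster-c": 1})

# The vSphere example: one cluster has double the capacity of the other two.
skewed = weighted_shares({"cluster-a": 2, "cluster-b": 1, "cluster-c": 1})
```

With the second set of weights, the larger cluster receives half of the launches on average and the two smaller clusters a quarter each.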

It is worth noting that this isn't strictly round robin. The provider 
accounts wouldn't be selected in strict rotation, though over a large 
number of launches the overall distribution is the same.

*Least used, with optional weighting:*

This policy would make most sense in scenarios where Conductor is the 
sole means by which instances are launched on private cloud providers. 
Conductor would seek to ensure that the usage of the providers was 
balanced, by giving a higher probability to whichever provider accounts 
are currently least used. As with round robin, the weightings could be 
adjusted to reflect differing capacities between providers.
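One possible reading of "least used" is sketched below, assuming each provider account has a known capacity (standing in for the weighting) and a count of running instances. Making the share proportional to the remaining headroom is an assumption of this sketch, not something the proposal fixes; the names are hypothetical.

```python
def least_used_shares(running, capacity):
    """Give each account a share proportional to its unused capacity."""
    headroom = {
        name: max(capacity[name] - running.get(name, 0), 0)
        for name in capacity
    }
    total = sum(headroom.values())
    if total == 0:
        # Every account is full: fall back to equal shares.
        return {name: 100.0 / len(capacity) for name in capacity}
    return {name: 100.0 * free / total for name, free in headroom.items()}

# Hypothetical figures: two equally sized clusters, one far busier than the other.
usage_shares = least_used_shares(
    running={"cluster-a": 80, "cluster-b": 20},
    capacity={"cluster-a": 100, "cluster-b": 100},
)
```

Here the busier cluster has 20 free slots against 80 on the quieter one, so the quieter cluster is four times as likely to be chosen.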

Having used one of those two policies to acquire an initial set of 
probabilities, administrators could then elect to apply additional 
policies, including:

*Assigned priority:*

The probability assigned to each provider account would be adjusted 
according to the provider account's priority, by increasing the 
probability ranking percentage of the higher priority provider accounts 
at the expense of the lower priority ones.
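The proposal does not say how strongly priority should shift the percentages, so the following is only one possible sketch. It assumes an integer priority level per account (0 meaning "no preference") and a configurable multiplier per level; both, along with the function name, are hypothetical.

```python
def apply_priority(shares, priority, boost=1.5):
    """Scale each share by boost**priority, then rescale so the total is 100."""
    scaled = {
        name: share * boost ** priority.get(name, 0)
        for name, share in shares.items()
    }
    total = sum(scaled.values())
    if total == 0:
        return dict(shares)              # nothing to redistribute
    return {name: 100.0 * value / total for name, value in scaled.items()}

# Two accounts start level; "vsphere-a" is given one level of priority.
prioritised = apply_priority(
    {"vsphere-a": 50.0, "ec2-east": 50.0},
    priority={"vsphere-a": 1},
    boost=3.0,
)
```

Because the shares are rescaled afterwards, any gain for a high-priority account comes directly out of the others, as described above.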

*Punishing failure:*

Once the audit history records past failures, for each occurrence of a 
launch failure within a configurable period (6 hours feels reasonable), 
a provider account would be fined 5% from its probability ranking. This 
would serve to reduce the attempts to launch on a provider which is 
running out of capacity, or experiencing hardware issues etc.
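A sketch of the fine, under two assumptions that the text leaves open: the 5% is read as five percentage points per recent failure, and the remaining shares are rescaled afterwards so that they still add up to 100. The function name and the shape of the failure history (a list of timestamps per account) are hypothetical.

```python
from datetime import datetime, timedelta

def punish_failures(shares, failures, now, window=timedelta(hours=6), fine=5.0):
    """Deduct `fine` points per launch failure inside the window, then rescale."""
    cutoff = now - window
    fined = {}
    for name, share in shares.items():
        recent = sum(1 for when in failures.get(name, []) if cutoff <= when <= now)
        fined[name] = max(share - fine * recent, 0.0)   # never go below zero
    total = sum(fined.values())
    if total == 0:
        return dict(shares)              # everyone failed heavily: leave as-is
    return {name: 100.0 * value / total for name, value in fined.items()}

now = datetime(2012, 4, 5, 12, 0)
history = {
    "vsphere-a": [
        now - timedelta(hours=1),        # counted
        now - timedelta(hours=2),        # counted
        now - timedelta(hours=7),        # outside the 6 hour window, ignored
    ],
}
after_fines = punish_failures({"vsphere-a": 50.0, "ec2-east": 50.0}, history, now)
```

In this example the failing account drops from 50 to 40 points before rescaling, which leaves it with roughly 44% against 56% for the healthy account.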

*Cost:*

There are three principal cloud uses which can incur costs: consumption 
of network bandwidth, consumption of storage and running a VM.

Happily, only one of these needs to be a factor when Conductor is 
selecting a provider account to launch: the cost of running the VM.

The amount of network bandwidth that a deployment will consume is pretty 
much unknowable at launch time. And, if it is known because, for 
example, a deployment is for a streaming media server, then 
Administrators can minimize costs by only launching that deployment in a 
"Low cost bandwidth" pool.

As long as we're not supporting deployments which include the allocation 
of additional storage, the costs of storage consumption are an issue to 
consider at build & push time, rather than at launch time.

So, in order to allow cost to be another factor which affects the 
probability rankings, all we need is a cost per realm, per hour, for 
each provider hardware profile, for each provider account.

Admins are going to have to enter that data themselves. That's not as 
onerous as it sounds, given that, for example, it will often be the case 
that costs will not vary across realms, so the UI can help by pre-filling.
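The cost data could be held in something as simple as the lookup sketched below, which also shows how an account-wide figure can stand in for realms that have no price of their own. The class name, method names and prices are made up for illustration; nothing here reflects an existing Conductor model.

```python
class CostTable:
    """Hourly cost per provider account, hardware profile and (optionally) realm."""

    def __init__(self):
        self._costs = {}

    def set_cost(self, account, profile, hourly_cost, realm=None):
        # realm=None records the default used for every realm without its own entry.
        self._costs[(account, profile, realm)] = hourly_cost

    def hourly_cost(self, account, profile, realm):
        specific = (account, profile, realm)
        if specific in self._costs:
            return self._costs[specific]
        # Fall back to the account-wide default; raises KeyError if none was entered.
        return self._costs[(account, profile, None)]

table = CostTable()
table.set_cost("ec2-east", "m1.small", 0.08)                     # applies to all realms
table.set_cost("ec2-east", "m1.small", 0.09, realm="us-west-1")  # one realm differs
```

An administrator would then only need to enter a per-realm figure where it differs from the default.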

Clearly, for private clouds, no alternative means for getting pricing 
data into Conductor exists. For public providers, it would be beneficial 
if their APIs exported list pricing, however:

- Few organizations which operate on a scale which justifies using 
Aeolus are likely to be paying list price.

- Organizations may wish to store and export the adjusted costs that 
they'll be assigning to users, rather than the basic costs appearing on 
the provider's monthly invoice.

Once the cost data is in Conductor, adjusting the probabilities of each 
provider account to favour whichever provider could more cheaply host 
the specific range of hardware profiles would be a relatively simple 
matter of increasing the selection probability percentages of cheaper 
provider accounts, by a configurable amount, at the expense of the more 
costly provider accounts.
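One way the cost adjustment might look is sketched below. It assumes the hourly cost of hosting the deployment has already been totalled per provider account, and it uses a hypothetical `strength` parameter as the "configurable amount": 0 leaves the shares untouched, and larger values favour cheaper accounts more heavily. The linear formula is an assumption of this sketch.

```python
def favour_cheaper(shares, hourly_cost, strength=0.5):
    """Shift share towards accounts that cost less than the average."""
    if not shares:
        return {}
    mean = sum(hourly_cost[name] for name in shares) / len(shares)
    if mean == 0:
        return dict(shares)              # everything is free: nothing to favour
    scaled = {}
    for name, share in shares.items():
        # Cheaper than average gives a factor above 1, dearer gives one below 1.
        factor = 1.0 + strength * (mean - hourly_cost[name]) / mean
        scaled[name] = share * max(factor, 0.0)
    total = sum(scaled.values())
    if total == 0:
        return dict(shares)
    return {name: 100.0 * value / total for name, value in scaled.items()}

# Hypothetical prices: one account costs half as much per hour as the other.
by_cost = favour_cheaper(
    {"cheap-cloud": 50.0, "dear-cloud": 50.0},
    hourly_cost={"cheap-cloud": 0.10, "dear-cloud": 0.20},
    strength=1.0,
)
```

With these figures the cheaper account ends up with about two thirds of the probability and the dearer one with about one third.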

Having completed the stack of modules' calculations, the result is a 
final set of probabilities. At this point, Conductor would roll the 
loaded dice and attempt to launch on the winner.
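Stacking could be as simple as passing the percentages through each enabled module in turn and rescaling after every step. The sketch below assumes each module is a plain function from shares to shares; `run_stack`, `normalise` and the stand-in `halve` module are hypothetical names, not a proposed interface.

```python
import random

def normalise(scores):
    """Rescale non-negative scores so that they add up to 100."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("at least one provider account needs a positive score")
    return {name: 100.0 * score / total for name, score in scores.items()}

def run_stack(base_scores, modules):
    """Apply each module in order, keeping the shares normalised throughout."""
    shares = normalise(base_scores)
    for module in modules:
        shares = normalise(module(shares))
    return shares

def halve(account):
    """Stand-in module: halve one account's share (in place of a real policy)."""
    def module(shares):
        return {
            name: share / 2 if name == account else share
            for name, share in shares.items()
        }
    return module

final = run_stack({"vsphere-a": 1, "ec2-east": 1}, [halve("ec2-east")])

# Roll the loaded dice: random.choices draws in proportion to the weights.
winner = random.choices(list(final), weights=list(final.values()), k=1)[0]
```

Normalising between modules means each module can be written without caring whether its output still adds up to 100.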

The UI to allow Administrators to enable modules, and to tune the 
parameters associated with them, could give a real-time representation 
of the effect of the current settings for a specific deployable. A 
certain type of Administrator would be very happy, tuning options and 
seeing an immediate change in, for example, a pie chart, which showed 
the resultant probability ranking percentages.

In future, we could provide Administrators with an interface to 
implement their own selection modules. They might choose, for example, 
to vary the selection probability percentages according to time and 
date, to increase usage of private clouds at times when they would 
otherwise be relatively idle.
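As an illustration of what such an administrator-written module might look like, the sketch below boosts private cloud accounts during a quiet overnight window. Everything about it is hypothetical: the function signature, the choice of quiet hours and the size of the boost.

```python
from datetime import datetime

def off_peak_private_boost(shares, private_accounts, now,
                           quiet_hours=range(0, 6), boost=2.0):
    """Multiply private accounts' shares by `boost` during the quiet hours."""
    if now.hour not in quiet_hours:
        return dict(shares)              # outside the window: no change
    scaled = {
        name: share * (boost if name in private_accounts else 1.0)
        for name, share in shares.items()
    }
    total = sum(scaled.values())
    if total == 0:
        return dict(shares)
    return {name: 100.0 * value / total for name, value in scaled.items()}

start = {"vsphere-a": 50.0, "ec2-east": 50.0}
at_night = off_peak_private_boost(start, {"vsphere-a"}, datetime(2012, 4, 5, 3, 0))
at_midday = off_peak_private_boost(start, {"vsphere-a"}, datetime(2012, 4, 5, 14, 0))
```

At 03:00 the private account's share rises to about two thirds, while at 14:00 the shares pass through unchanged.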