On Mon, 2010-05-10 at 14:26 -0400, Jeff Darcy wrote:
> On 05/10/2010 11:38 AM, Steven Dake wrote:
>> As there is only one writer in every case, a simple Lamport timestamp
>> would do the trick.
> How is there only one writer? Either there's only one possible writer
> globally, there's only one possible writer per object, or you have to
> coordinate/reconcile between possible writers - potentially across
> sites, despite partitions, etc. Nothing in the very brief description
> I've seen implies either the first or second.

There is only ever one VM which may write to the VM store. During
migration, Lamport timestamps can be used to determine which node has
the most recent replica. I don't see a strong motivation for vector
clocks. I don't have a clear migration solution worked out yet, but I
don't see it as the most pressing difficulty of the project...

> The first is so non-scalable it's beneath notice, so that's just as
> well. The second is possible if you adopt a write-once model where the
> object ID has the writer ID as a prefix and objects are immutable once
> written, but I haven't seen any mention of that, and in any case it
> leaves open the question of how to know an object is fully written. In
> either the initial-write or the migration case (which is specifically
> mentioned in the requirements), it's necessary to define what will
> happen when someone at one end of a slow/flaky network connection
> writes a block and someone at the far end asks for it while it's still
> in transit, or what happens when two nodes on opposite sides of a
> network partition - i.e. lacking any means of coordination - attempt
> to write the same block. Who blocks? Who fails? Who reconciles
> conflicts? Lamport clocks and their relatives do not solve the problem
> by themselves. They merely provide information that some other part of
> the system must use (often using multiple clock vectors for multiple
> related pieces of data) to arrive at a correct result.
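To make that concrete, here is a minimal sketch (purely illustrative,
not from the proposed design): two nodes on opposite sides of a
partition write the same block, their Lamport timestamps come out
equal, and any tie-break is arbitrary - whereas a vector clock at
least makes the conflict visible, though still not resolved.

```python
# Illustrative only: why a Lamport timestamp alone cannot reconcile
# concurrent writes. Names and structure are hypothetical.

class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):                  # local event (e.g. a block write)
        self.time += 1
        return self.time

    def receive(self, remote_time):  # merge on message receipt
        self.time = max(self.time, remote_time) + 1
        return self.time

# Two partitioned nodes each write "block 7" with no coordination.
a, b = LamportClock(), LamportClock()
ts_a = a.tick()   # node A's write: timestamp 1
ts_b = b.tick()   # node B's write: also timestamp 1
assert ts_a == ts_b   # equal stamps; a tie-break by node ID is arbitrary

# A vector clock at least *detects* the conflict instead of hiding it:
def concurrent(v1, v2):
    """True if neither vector clock dominates the other."""
    return (any(v1[k] > v2.get(k, 0) for k in v1) and
            any(v2[k] > v1.get(k, 0) for k in v2))

va = {"A": 1}   # node A's write, unseen by B
vb = {"B": 1}   # node B's write, unseen by A
assert concurrent(va, vb)   # conflict visible - but still unresolved
```

Either way, something above the clock still has to decide which write
survives; the clock only supplies the ordering information.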
>> The reason those full database systems you mention have all that
>> complicated consistency management is because they potentially have
>> multiple writers. With multiple writers, after a failure during an
>> update, a consistent view must be created out of the mess. With one
>> writer, the worst case that could happen is that *some* of the blocks
>> were not written.
> This is not something that only happens under failure conditions. In a
> *distributed system*, data might be absent locally even if it's present
> somewhere else in the system, and an EOF before the actual end of the
> file - as when the writer was notified of completion at a clearly prior
> time - is incorrect data. The situation's even worse when the data are
> being written to multiple destinations with no assurance of order
> between them, and I've seen nothing so far that assures order.
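The "how do you know an object is fully written" question can be made
concrete with a small sketch (hypothetical, not anything in the
proposal): record the object's final length only after the last block
lands, and have readers treat a missing commit record as "in transit"
rather than as EOF.

```python
# Hypothetical sketch: a reader must not confuse "blocks missing so
# far" with "end of object". An object becomes readable only once a
# commit record carrying its final block count has been written.

class ObjectStore:
    def __init__(self):
        self.blocks = {}      # (obj_id, block_no) -> bytes
        self.committed = {}   # obj_id -> total block count

    def write_block(self, obj_id, block_no, data):
        self.blocks[(obj_id, block_no)] = data

    def commit(self, obj_id, n_blocks):
        # Written last, after every block; marks the object complete.
        self.committed[obj_id] = n_blocks

    def read(self, obj_id):
        n = self.committed.get(obj_id)
        if n is None:
            raise IOError("object in transit: retry, don't report EOF")
        return b"".join(self.blocks[(obj_id, i)] for i in range(n))

store = ObjectStore()
store.write_block("vm1/disk", 0, b"aa")
# A reader arriving here must see "in transit", not a short file.
store.write_block("vm1/disk", 1, b"bb")
store.commit("vm1/disk", 2)
assert store.read("vm1/disk") == b"aabb"
```

Note this only distinguishes complete from incomplete; it says nothing
about ordering between replicas, which is the harder part.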
>> This happens with normal storage in a block system all the time during
>> failure and is well tolerated by journaling filesystems.
> Yes, by having a journal that's fully/sequentially/consistently written
> before the live filesystem, so that it can be replayed accurately even
> if the writes to the live filesystem are interrupted. What do you
> propose other nodes should do if they ask for data that's sitting in
> the journal of a failed/unreachable node? Would those blocks be
> unavailable for either reading or writing until that node comes back
> up, or do you propose to implement a global distributed journal with
> all of the necessary consistency and ordering to enable recovery? At
> some point the hard distributed-system problems have to be solved, not
> just pushed into another component.
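For reference, the single-node discipline being described - make the
journal record durable, in order, before touching the live data, so an
interrupted update can always be replayed - can be sketched as follows
(hypothetical code; it deliberately does not answer the distributed
questions above, since a remote reader still cannot see data stuck in
the journal of an unreachable node).

```python
# Single-node write-ahead journal sketch (hypothetical, in-memory;
# a real implementation would fsync the journal before step 2).

live = {}      # the "live filesystem": block -> data
journal = []   # append-only records, made durable first

def write(block, data):
    journal.append(("set", block, data))   # 1. journal the update
    live[block] = data                     # 2. only then update live data

def replay():
    """After a crash, reapply the whole journal; replay is idempotent."""
    for op, block, data in journal:
        if op == "set":
            live[block] = data

write(7, b"new")
live.clear()   # simulate losing the live copy mid-update
replay()
assert live[7] == b"new"
```

The recovery guarantee holds only because replay happens on the node
that owns the journal; nothing here helps a peer across a partition.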
My model for developing software is a little like blowing up many
balloons. I make a few small balloons, and blow into them, and over time
they grow to encompass the problem space I am interested in solving.
The balloon blowing happens in an iterative fashion. I have found that
most experienced and/or even high-performing engineers are unaware of
80% of the requirements until they are about 80% done... What better way
to get to 80% than to blow up the balloons? Rather than try to solve
every problem all at once, let us focus on the smaller problems we can
tackle to give ourselves a foothold.
Regards
-steve