On Mon, 2010-05-10 at 14:26 -0400, Jeff Darcy wrote:
> On 05/10/2010 11:38 AM, Steven Dake wrote:
>> As there is only one writer in every case, a simple Lamport timestamp
>> would do the trick.
> How is there only one writer? Either there's only one possible writer
> globally, there's only one possible writer per object, or you have to
> coordinate/reconcile between possible writers - potentially across
> sites, despite partitions, etc. Nothing in the very brief description
> I've seen implies either the first or second.

There is only ever one VM which may write to the VM store. During
migration, Lamport timestamps can be used to determine which node has
the most recent replica. I don't see a strong motivation for vector
clocks. I don't have a clear migration solution worked out yet, but I
don't see it as the most pressing difficulty of the project...

> The first is so non-scalable it's beneath notice, so that's just as
> well. The second is possible if you adopt a write-once model where the
> object ID has the writer ID as a prefix and objects are immutable once
> written, but I haven't seen any mention of that, and in any case it
> leaves open the question of how to know an object is fully written. In
> either the initial-write or the migration case (which is specifically
> mentioned in the requirements), it's necessary to define what will
> happen when someone at one end of a slow/flaky network connection
> writes a block and someone at the far end asks for it while it's still
> in transit, or what happens when two nodes on opposite sides of a
> network partition - i.e. lacking any means of coordination - attempt
> to write the same block. Who blocks? Who fails? Who reconciles
> conflicts? Lamport clocks and their relatives do not solve the problem
> by themselves. They merely provide information that some other part of
> the system must use (often using multiple clock vectors for multiple
> related pieces of data) to arrive at a correct result.
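To make that concrete, here is a minimal sketch (purely illustrative,
not from the proposed design): two nodes on opposite sides of a
partition write the same block, their Lamport timestamps come out
equal, and any tie-break is arbitrary - whereas a vector clock at
least makes the conflict visible, though still not resolved.

```python
# Illustrative only: why a Lamport timestamp alone cannot reconcile
# concurrent writes. Names and structure are hypothetical.

class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):                  # local event (e.g. a block write)
        self.time += 1
        return self.time

    def receive(self, remote_time):  # merge on message receipt
        self.time = max(self.time, remote_time) + 1
        return self.time

# Two partitioned nodes each write "block 7" with no coordination.
a, b = LamportClock(), LamportClock()
ts_a = a.tick()   # node A's write: timestamp 1
ts_b = b.tick()   # node B's write: also timestamp 1
assert ts_a == ts_b   # equal stamps; a tie-break by node ID is arbitrary

# A vector clock at least *detects* the conflict instead of hiding it:
def concurrent(v1, v2):
    """True if neither vector clock dominates the other."""
    return (any(v1[k] > v2.get(k, 0) for k in v1) and
            any(v2[k] > v1.get(k, 0) for k in v2))

va = {"A": 1}   # node A's write, unseen by B
vb = {"B": 1}   # node B's write, unseen by A
assert concurrent(va, vb)   # conflict visible - but still unresolved
```

Either way, something above the clock still has to decide which write
survives; the clock only supplies the ordering information.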
>> The reason those full database systems you mention have all that
>> complicated consistency management is because they potentially have
>> multiple writers. With multiple writers, after a failure during an
>> update, a consistent view must be created out of the mess. With one
>> writer, the worst case that could happen is that *some* of the blocks
>> were not written.
> This is not something that only happens under failure conditions. In a
> *distributed system*, data might be absent locally even if it's present
> somewhere else in the system, and an EOF before the actual end of the
> file - as when the writer was notified of completion at a clearly prior
> time - is incorrect data. The situation's even worse when the data are
> being written to multiple destinations with no assurance of order
> between them, and I've seen nothing so far that assures order.
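The "how do you know an object is fully written" question can be made
concrete with a small sketch (hypothetical, not anything in the
proposal): record the object's final length only after the last block
lands, and have readers treat a missing commit record as "in transit"
rather than as EOF.

```python
# Hypothetical sketch: a reader must not confuse "blocks missing so
# far" with "end of object". An object becomes readable only once a
# commit record carrying its final block count has been written.

class ObjectStore:
    def __init__(self):
        self.blocks = {}      # (obj_id, block_no) -> bytes
        self.committed = {}   # obj_id -> total block count

    def write_block(self, obj_id, block_no, data):
        self.blocks[(obj_id, block_no)] = data

    def commit(self, obj_id, n_blocks):
        # Written last, after every block; marks the object complete.
        self.committed[obj_id] = n_blocks

    def read(self, obj_id):
        n = self.committed.get(obj_id)
        if n is None:
            raise IOError("object in transit: retry, don't report EOF")
        return b"".join(self.blocks[(obj_id, i)] for i in range(n))

store = ObjectStore()
store.write_block("vm1/disk", 0, b"aa")
# A reader arriving here must see "in transit", not a short file.
store.write_block("vm1/disk", 1, b"bb")
store.commit("vm1/disk", 2)
assert store.read("vm1/disk") == b"aabb"
```

Note this only distinguishes complete from incomplete; it says nothing
about ordering between replicas, which is the harder part.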
>> This happens with normal storage in a block system all the time during
>> failure and is well tolerated by journaling filesystems.
> Yes, by having a journal that's fully/sequentially/consistently written
> before the live filesystem, so that it can be replayed accurately even
> if the writes to the live filesystem are interrupted. What do you
> propose other nodes should do if they ask for data that's sitting in
> the journal of a failed/unreachable node? Would those blocks be
> unavailable for either reading or writing until that node comes back
> up, or do you propose to implement a global distributed journal with
> all of the necessary consistency and ordering to enable recovery? At
> some point the hard distributed-system problems have to be solved, not
> just pushed into another component.
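For reference, the single-node discipline being described - make the
journal record durable, in order, before touching the live data, so an
interrupted update can always be replayed - can be sketched as follows
(hypothetical code; it deliberately does not answer the distributed
questions above, since a remote reader still cannot see data stuck in
the journal of an unreachable node).

```python
# Single-node write-ahead journal sketch (hypothetical, in-memory;
# a real implementation would fsync the journal before step 2).

live = {}      # the "live filesystem": block -> data
journal = []   # append-only records, made durable first

def write(block, data):
    journal.append(("set", block, data))   # 1. journal the update
    live[block] = data                     # 2. only then update live data

def replay():
    """After a crash, reapply the whole journal; replay is idempotent."""
    for op, block, data in journal:
        if op == "set":
            live[block] = data

write(7, b"new")
live.clear()   # simulate losing the live copy mid-update
replay()
assert live[7] == b"new"
```

The recovery guarantee holds only because replay happens on the node
that owns the journal; nothing here helps a peer across a partition.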
My model for developing software is a little like blowing up many
balloons. I make a few small balloons, and blow into them, and over time
they grow to encompass the problem space I am interested in solving.
The balloon blowing happens in an iterative fashion. I have found that
most experienced and/or even high-performing engineers are unaware of
80% of the requirements until they are about 80% done... What better way
to get to 80% than to blow up the balloons? Rather than try to solve
every problem all at once, let us focus on the smaller problems we can
tackle to give ourselves a foothold.
Regards
-steve