On 04/27/2010 07:36 PM, Steven Dake wrote:
> Some of our requirements:
> * Easy to use, deploy, and manage.
> * 100,000 host count scalability.
> * Only depend on commodity hardware systems.
> * Migration works seamlessly within a datacenter without SAN hardware.
> * VM block images can be replicated to N where N is configurable per
>   VM image.
> * VM block images can be replicated to various data centers.
> * Low latency block storage access for all VMs.
> * Tuneable block sizes per VM.
> * Use standard network mechanisms to transmit blocks to the various
>   replicas.
> * Avoid multicast.
> * Ensure only authorized host machines may connect to the vinzvault
>   storage areas.
> * No central metadata server - everything is 100% distributed.

Neat project. There is a definite niche for vinzvault in the open
source-o-sphere.

Comments, in no particular order:

1) On the client side, block storage access should be via a kernel
block device driver. This ensures a minimum number of data copies,
as well as the maximum level of OS integration. Of course, that
likely limits vinzvault to Linux-only, unless people want to step up
and write non-Linux drivers.
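
To make that concrete, a bare-bones skeleton of such a driver against
the current 2.6 block layer might look like the code below. Every
vv_*/vinzvault_* identifier is invented for illustration, and the
actual DHT hand-off and error unwinding are elided:

/* Hypothetical skeleton of a vinzvault client block driver.
 * All vv_*/vinzvault_* names are invented for illustration. */
#include <linux/module.h>
#include <linux/blkdev.h>
#include <linux/genhd.h>

static int vv_major;
static struct gendisk *vv_disk;
static struct request_queue *vv_queue;
static DEFINE_SPINLOCK(vv_lock);

static const struct block_device_operations vv_fops = {
	.owner = THIS_MODULE,
};

/* Each block request would be translated into DHT get/put
 * operations against the replica set for this VM image. */
static void vinzvault_request(struct request_queue *q)
{
	struct request *rq;

	while ((rq = blk_fetch_request(q)) != NULL) {
		/* ...hand rq's sectors to the DHT transport here... */
		__blk_end_request_all(rq, 0);
	}
}

static int __init vinzvault_init(void)
{
	vv_major = register_blkdev(0, "vinzvault");
	if (vv_major < 0)
		return vv_major;

	/* error unwinding elided for brevity */
	vv_queue = blk_init_queue(vinzvault_request, &vv_lock);
	vv_disk = alloc_disk(16);		/* up to 16 minors */
	vv_disk->major = vv_major;
	vv_disk->first_minor = 0;
	vv_disk->fops = &vv_fops;
	vv_disk->queue = vv_queue;
	sprintf(vv_disk->disk_name, "vvault0");
	set_capacity(vv_disk, 2 * 1024 * 1024);	/* 512 B sectors: 1 GB */
	add_disk(vv_disk);
	return 0;
}
module_init(vinzvault_init);
MODULE_LICENSE("GPL");
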
2) How "close" is the client to the D1HT? By that, I mean, will a
vinzvault client also participate fully in the D1HT, and provide some
amount of local storage to its peers? Or will there be a clear
distinction between vinzvault clients and vinzvault servers?

This is a key design decision. Pros: if a vinzvault client is also
a server, one reduces latency and the total number of data copies.
Cons: if a vinzvault client is also a server, the client must
dedicate CPU and storage resources to unrelated processes, and may
be subject to unanticipated, large resource loads.
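
Whichever way you go, the attraction of D1HT is the single-hop
lookup: every full participant maintains the complete membership
table, so a block key resolves directly to its home node. A sketch
of that lookup (the identifiers and key scheme are mine, purely
illustrative):

/* One-hop lookup sketch: 'nodes' is the full membership table,
 * sorted by ring position, which D1HT keeps current on every peer. */
#include <stdint.h>

struct vv_node {
	uint64_t id;		/* position on the hash ring */
	/* network address, connection state, ... */
};

static struct vv_node *vv_lookup(struct vv_node *nodes, int n,
				 uint64_t key)
{
	int lo = 0, hi = n;

	/* binary search for the first node with id >= key */
	while (lo < hi) {
		int mid = (lo + hi) / 2;
		if (nodes[mid].id < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return &nodes[lo % n];	/* wrap around the ring */
}

A client that is a full peer pays the cost of keeping that table
current (and of serving its share of blocks); a thin client would
instead proxy lookups through a nearby server node, adding a network
hop to every request.
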
3) Consider the client->storage protocol carefully. Will the client
access the vinzvault data store via a TCP/IP network, or something
faster (RDMA or a proprietary virtualization bus)? If traversing a
TCP/IP network, I recommend using a standardized storage protocol
like iSCSI. In such a scenario, vinzvault could be implemented as a
plug-in for the SCSI target project[1], thereby eliminating a large
amount of the client-side work (the kernel already has an iSCSI
initiator). I also have my own iSCSI target daemon[2], but it is
less mature than STGT.
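
If vinzvault does become an STGT plug-in, the backing store's main
job reduces to mapping SCSI LBA ranges onto DHT block keys. A rough
sketch of the READ side, where dht_get() and the per-image key
scheme are purely hypothetical:

/* Hypothetical READ path for a vinzvault backing store: split a
 * sector range into (image, block) DHT keys.  dht_get() is an
 * invented stand-in for the real DHT client call. */
#include <stdint.h>
#include <string.h>

#define VV_BLOCK_SIZE	4096		/* tunable per VM image */

extern int dht_get(uint64_t image_id, uint64_t block_no, void *buf);

static int vv_read(uint64_t image_id, uint64_t lba, uint32_t nsect,
		   uint8_t *out)
{
	uint64_t off = lba * 512;
	uint64_t end = off + (uint64_t)nsect * 512;
	uint8_t blk[VV_BLOCK_SIZE];

	while (off < end) {
		uint64_t bno  = off / VV_BLOCK_SIZE;
		uint32_t boff = off % VV_BLOCK_SIZE;
		uint64_t len  = VV_BLOCK_SIZE - boff;

		if (len > end - off)
			len = end - off;
		if (dht_get(image_id, bno, blk) < 0)
			return -1;	/* replica unreachable */
		memcpy(out, blk + boff, len);
		out += len;
		off += len;
	}
	return 0;
}
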
4) Locality, locality, locality. A distributed hash table is fantastic
for widely distributing data across nodes -- but such a wide
distribution can become a problem. Inefficiency and latency from
overhead increase dramatically if a client must contact 1,000 nodes
to read 1,000 512-byte sectors, for example.
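
Back-of-the-envelope, the per-block round trips dominate quickly.
This toy program counts the worst-case node contacts (every block on
a different node) for that same 1,000-sector sequential read at a
few block sizes:

/* Worst case: every block lands on a different node. */
#include <stdio.h>

int main(void)
{
	const long read_bytes = 1000L * 512;	/* 1,000 sectors */
	long bs;

	for (bs = 512; bs <= 1024 * 1024; bs *= 16)
		printf("block size %7ld B -> up to %4ld node contacts\n",
		       bs, (read_bytes + bs - 1) / bs);
	return 0;
}

That drops from 1,000 contacts at 512-byte blocks to four at 128 KB
blocks, which argues for the tunable per-VM block sizes already on
your list, plus locality-aware placement of consecutive blocks.
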
Jeff

[1] http://stgt.sourceforge.net/
[2] http://marc.info/?l=hail-devel&m=127275292729710&w=2