short update on where we are headed
by Steven Dake
Hi,
Posting a short update as to where we are headed. I have had a good
long look at the sheepdog source base and believe the sheepdog devs and
community have developed a great solution. To say it is brilliant is an
understatement.
Since there is no point in re-inventing the wheel when a better one
already exists, I want us to solve one obstacle to wide-scale sheepdog
deployment.
In simple terms, sheepdog requires a membership system and a
low-performance atomic commit framework. Corosync provides a membership
system and a very high-performance atomic commit framework, but only at
low scale. By relaxing the performance requirements around commits, we
can scale to very high node counts.
As a result, the goals of this project are changing: we will provide a
high-scale membership protocol (D1HT) as well as a low-performance commit
framework. While I personally feel this has many applications outside
of sheepdog, our first target will be to provide a framework that
sheepdog can use for membership and commit.
Regards
-steve
13 years, 10 months
announcement of the vinzvault project
by Steven Dake
CCing the openais and linux-cluster mailing lists, since some cluster
developers there may be interested in participating in this project.
I am pleased to announce we are starting a new project called vinzvault
to help resolve some of the difficulties in deploying virtual machines
in data-centers.
There are other projects that use similar technology or have goals
similar to ours. The Ceph filesystem provides a cloud filesystem for
large-scale machines to use as storage. Hail provides an S3 API for
accessing information. Cassandra provides an eventually consistent,
replicated, Bigtable-style distributed database using techniques
similar to those we are planning to use.
Our project is focused on one goal: providing a small-footprint
(10kloc) highly available block storage area for virtual machines,
optimized for Linux data-centers. Our plans don't depend on SAN
hardware or software, hardware fencing devices, or any hardware other
than what is commonly available in commodity systems. We intend to trade
these lower-scale, high-cost technologies for higher-scale, lower-cost
techniques.
Some of our requirements:
* Easy to use, deploy, and manage.
* 100,000 host count scalability.
* Only depend on commodity hardware systems.
* Migration works seamlessly within a datacenter without SAN hardware.
* VM block images can be replicated N times, where N is configurable per
VM image.
* VM block images can be replicated to various data centers.
* Low latency block storage access for all VMs.
* Tunable block sizes per VM.
* Use standard network mechanisms to transmit blocks to the various
replicas.
* Avoid multicast.
* Ensure only authorized host machines may connect to the vinzvault
storage areas.
* No central metadata server - everything is 100% distributed.
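To make the per-VM block size and replica count requirements concrete, here is a minimal sketch, assuming a block is named by an (image id, block index) pair and that this pair is what would later be hashed to locate replicas. The struct and function names (vm_image_config, block_key, block_for_offset, offset_in_block) are hypothetical and do not come from any vinzvault code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-image settings, one instance per VM block image. */
struct vm_image_config {
    uint64_t image_id;       /* identifier of the VM block image */
    uint32_t block_size;     /* tunable per VM, in bytes */
    uint32_t replica_count;  /* N: copies kept of every block of this image */
};

/* A block is named by its image and its index within the image; this pair
 * is what would be hashed to locate the block's replicas. */
struct block_key {
    uint64_t image_id;
    uint64_t block_index;
};

/* Map a byte offset within the image to the block that contains it. */
static struct block_key block_for_offset (const struct vm_image_config *cfg,
    uint64_t offset)
{
    struct block_key key;

    assert (cfg->block_size > 0);
    key.image_id = cfg->image_id;
    key.block_index = offset / cfg->block_size;
    return key;
}

/* Position of that byte inside its block. */
static uint32_t offset_in_block (const struct vm_image_config *cfg,
    uint64_t offset)
{
    assert (cfg->block_size > 0);
    return (uint32_t)(offset % cfg->block_size);
}
```

For example, with a 64 KiB block size, byte offset 200000 falls in block 3, at offset 3392 within that block. Changing block_size per image changes only this mapping, not the rest of the storage path.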
We plan to execute this project using an overlay distributed hash table
(DHT) called D1HT (1). The 1 in D1HT indicates that, in the majority of
cases, only one network request/response is required per block of
storage. Like all
solutions that trade performance for scale/cost, our project may not
meet your deployment needs, but we aim to focus on correctness first and
performance second. We hope readers will participate in the development
of this LGPL/GPL open source project.
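To illustrate why one request/response usually suffices: in D1HT every peer keeps the full membership table, so finding the peer responsible for a block is a local computation and only the block request itself crosses the network. The sketch below assumes, as in Chord-style rings, that a key is stored on its successor (the first peer ID at or after the key, wrapping around) and that N replicas go to the successor and the next N-1 peers clockwise. The replica placement and all names are assumptions, and FNV-1a stands in for SHA-1 only to keep the example free of dependencies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a 64-bit hash: a dependency-free stand-in for SHA-1 in this sketch;
 * it is not what D1HT or vinzvault specify. */
static uint64_t fnv1a_64 (const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t h = 1469598103934665603ULL;   /* FNV offset basis */
    size_t i;

    for (i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;             /* FNV prime */
    }
    return h;
}

/*
 * Index of the peer responsible for key_id: its successor on the ring,
 * i.e. the first peer ID >= key_id, wrapping to the lowest ID. Every peer
 * holds the full sorted table, so this is a local binary search.
 */
static size_t successor_index (const uint64_t *peer_ids, size_t count,
    uint64_t key_id)
{
    size_t lo = 0, hi = count;

    assert (count > 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (peer_ids[mid] < key_id) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return lo == count ? 0 : lo;           /* wrap around the ring */
}

/* Fill out[] with the indexes of the peers holding the n replicas of
 * key_id: the successor and the following peers clockwise. */
static size_t replica_peers (const uint64_t *peer_ids, size_t count,
    uint64_t key_id, size_t n, size_t *out)
{
    size_t first = successor_index (peer_ids, count, key_id);
    size_t i;

    if (n > count) {
        n = count;          /* cannot keep more distinct replicas than peers */
    }
    for (i = 0; i < n; i++) {
        out[i] = (first + i) % count;
    }
    return n;
}
```

For example, with peer IDs {100, 200, 300, 400}, a key of 350 is owned by peer 400, and with N = 3 its replicas land on peers 400, 100 and 200. In practice the key would be the hash of a block name, such as fnv1a_64 over an (image id, block index) pair.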
Our mailing list is vinzvault(a)fedorahosted.org.
One final note: there is no code in our repo yet. That is for developers
interested in this technology to make happen (this is a from-scratch
implementation). Let's get cracking!
Regards
-steve
(1) http://www.cos.ufrj.br/~monnerat/D1HT_paper.html
[PATCH] calculate SHA1 of joining members
by Steven Dake
Use the Network Security Services (NSS) SHA-1 hash to calculate a hash of
each joining member. The hash is printed to the user in hex.
Signed-off-by: Steven Dake <sdake(a)redhat.com>
---
src/d1htedra.c | 23 +++++++++++++++++++----
1 files changed, 19 insertions(+), 4 deletions(-)
diff --git a/src/d1htedra.c b/src/d1htedra.c
index 68eef63..1bd72f3 100644
--- a/src/d1htedra.c
+++ b/src/d1htedra.c
@@ -41,6 +41,7 @@
#include <sys/time.h>
#include <sys/poll.h>
#include <limits.h>
+#include <nss3/sechash.h>
#include "logsys.h"
@@ -148,6 +149,8 @@ struct d1htedra_instance {
void *d1htnet_context;
struct d1ht_config *d1ht_config;
+
+ HASHContext *hash;
};
struct message_handlers {
@@ -342,6 +345,8 @@ int d1htedra_initialize (
main_deliver_fn,
main_iface_change_fn);
+ instance->hash = HASH_Create (HASH_AlgSHA1);
+
*srp_context = instance;
return (0);
@@ -393,7 +398,10 @@ static inline void member_add (
struct d1ht_ip_address *addr)
{
uint32_t i;
-
+ unsigned char result[20];
+ uint32_t result_len;
+ char result_string[128];
+
/*
* Prevent duplicate membership additions
*/
@@ -402,9 +410,16 @@ static inline void member_add (
return;
}
}
-
- log_printf (LOGSYS_LEVEL_NOTICE, "JOINED member (%s)\n",
- d1htip_print (addr));
+ HASH_Begin (instance->hash);
+ HASH_Update (instance->hash, (const unsigned char*)addr,
+ sizeof (struct d1ht_ip_address));
+ HASH_End (instance->hash, result, &result_len, 20);
+ for (i = 0; i < 20; i++) {
+ sprintf (&result_string[i*2], "%02x", result[i]);
+ }
+
+ log_printf (LOGSYS_LEVEL_NOTICE, "JOINED member (%s) hash(%s)\n",
+ d1htip_print (addr), result_string);
memcpy (&instance->my_member_list[instance->my_member_count],
addr, sizeof (struct d1ht_ip_address));
instance->my_member_count++;
--
1.6.2.5
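The loop in member_add that formats the 20-byte digest could be factored into a bounds-checked helper. This is a minimal sketch with the hypothetical names hex_encode and SHA1_LENGTH (neither appears in the patch); it takes the digest bytes as HASH_End would return them, so it compiles without NSS.

```c
#include <assert.h>
#include <stddef.h>

#define SHA1_LENGTH 20   /* SHA-1 digests are 20 bytes */

/*
 * Write len bytes of digest as lowercase hex into out, which must hold at
 * least 2 * len + 1 chars (41 for a SHA-1 digest). Returns out so the
 * result can be passed straight to a log call.
 */
static char *hex_encode (const unsigned char *digest, size_t len,
    char *out, size_t out_size)
{
    static const char hex_digits[] = "0123456789abcdef";
    size_t i;

    assert (out_size >= 2 * len + 1);
    for (i = 0; i < len; i++) {
        out[2 * i] = hex_digits[digest[i] >> 4];         /* high nibble */
        out[2 * i + 1] = hex_digits[digest[i] & 0x0f];   /* low nibble */
    }
    out[2 * len] = '\0';
    return out;
}
```

Compared with calling sprintf per byte into a fixed 128-byte buffer, the explicit size check documents that a 20-byte digest needs exactly 41 chars.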
[PATCH] Initialize NSS crypto in UDP layer
by Steven Dake
This patch initializes Network Security Services (NSS) crypto in the UDP
layer unconditionally by removing the HAVE_NSS guard around
init_nss_crypto().
Signed-off-by: Steven Dake <sdake(a)redhat.com>
---
src/d1htudp.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)
diff --git a/src/d1htudp.c b/src/d1htudp.c
index 0f64992..3f2e2b4 100644
--- a/src/d1htudp.c
+++ b/src/d1htudp.c
@@ -582,9 +582,7 @@ static void init_crypto(struct d1htudp_instance *instance)
log_printf(LOGSYS_LEVEL_NOTICE,
"Initializing transmit/receive security: not using security (mode 0).\n");
}
-#ifdef HAVE_NSS
init_nss_crypto (instance);
-#endif
}
int d1htudp_crypto_set (
--
1.6.2.5
[PATCH] Build binary with Network Security Services library dependency
by Steven Dake
We will be using Network Security Services for our encryption and
authentication, so NSS is now a mandatory dependency rather than an
optional one. This patch adds the required library includes and ldflags.
It also fixes UDP code that failed to compile when NSS is enabled.
Signed-off-by: Steven Dake <sdake(a)redhat.com>
---
configure.ac | 7 ++-----
src/Makefile.am | 5 ++---
src/d1htudp.c | 25 +++++++++++--------------
3 files changed, 15 insertions(+), 22 deletions(-)
diff --git a/configure.ac b/configure.ac
index f5fe4e2..16a35c3 100644
--- a/configure.ac
+++ b/configure.ac
@@ -214,11 +214,8 @@ else
fi
# Look for libnss
-if test "x${enable_nss}" = xyes; then
- PKG_CHECK_MODULES([nss],[nss])
- AC_DEFINE_UNQUOTED([HAVE_LIBNSS], 1, [have libnss])
- PACKAGE_FEATURES="$PACKAGE_FEATURES nss"
-fi
+PKG_CHECK_MODULES([nss],[nss])
+AC_DEFINE_UNQUOTED([HAVE_LIBNSS], 1, [have libnss])
if test "x${enable_testagents}" = xyes; then
AC_DEFINE_UNQUOTED([HAVE_TESTAGENTS], 1, [have testagents])
diff --git a/src/Makefile.am b/src/Makefile.am
index d356004..603a1cd 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -19,14 +19,13 @@ MAINTAINERCLEANFILES = Makefile.in
AM_CFLAGS = -fPIC
-INCLUDES = -I$(top_builddir)/include -I$(top_srcdir)/include $(nss_CFLAGS) $(rdmacm_CFLAGS) $(ibverbs_CFLAGS)
-
+INCLUDES = -I$(top_builddir)/include -I$(top_srcdir)/include $(nss_CFLAGS)
sbin_PROGRAMS = vinzvault
vinzvault_SOURCES = d1htpoll.c d1htip.c d1htnet.c d1htudp.c \
d1htedra.c logsys.c main.c
-vinzvault_LDADD =
+vinzvault_LDADD = $(nss_LIBS)
vinzvault_DEPENDENCIES =
noinst_HEADERS = tlist.h d1htnet.h d1htudp.h d1htip.h d1htedra.h d1ht.h
diff --git a/src/d1htudp.c b/src/d1htudp.c
index ee30dd0..0f64992 100644
--- a/src/d1htudp.c
+++ b/src/d1htudp.c
@@ -238,17 +238,15 @@ static void init_nss_crypto (struct d1htudp_instance *instance)
goto out;
}
- aes_slot = PK11_GetBestSlot(instance->d1ht_config->crypto_crypt_type, NULL);
- if (aes_slot == NULL)
- {
+ aes_slot = PK11_GetBestSlot(instance->d1ht_config->crypto_type, NULL);
+ if (aes_slot == NULL) {
log_printf(LOGSYS_LEVEL_NOTICE, "Unable to find security slot (err %d)\n",
PR_GetError());
goto out;
}
sha1_slot = PK11_GetBestSlot(CKM_SHA_1_HMAC, NULL);
- if (sha1_slot == NULL)
- {
+ if (sha1_slot == NULL) {
log_printf(LOGSYS_LEVEL_NOTICE, "Unable to find security slot (err %d)\n",
PR_GetError());
goto out;
@@ -261,13 +259,12 @@ static void init_nss_crypto (struct d1htudp_instance *instance)
key_item.len = 32; /* Use 128 bits */
instance->nss_sym_key = PK11_ImportSymKey(aes_slot,
- instance->d1ht_config->crypto_crypt_type,
+ instance->d1ht_config->crypto_type,
PK11_OriginUnwrap, CKA_ENCRYPT|CKA_DECRYPT,
&key_item, NULL);
- if (instance->nss_sym_key == NULL)
- {
+ if (instance->nss_sym_key == NULL) {
log_printf(LOGSYS_LEVEL_NOTICE,
- intf(instance->d1htudp_log_level_security, "Failure to import key into NSS (err %d)\n",
+ "Failure to import key into NSS (err %d)\n",
PR_GetError());
goto out;
}
@@ -341,7 +338,7 @@ static int encrypt_and_sign_nss (
iv_item.len = sizeof (nss_iv_data);
nss_sec_param = PK11_ParamFromIV (
- instance->d1ht_config->crypto_crypt_type,
+ instance->d1ht_config->crypto_type,
&iv_item);
if (nss_sec_param == NULL) {
log_printf(LOGSYS_LEVEL_NOTICE,
@@ -354,7 +351,7 @@ static int encrypt_and_sign_nss (
* Create cipher context for encryption
*/
enc_context = PK11_CreateContextBySymKey (
- instance->d1ht_config->crypto_crypt_type,
+ instance->d1ht_config->crypto_type,
CKA_ENCRYPT,
instance->nss_sym_key,
nss_sec_param);
@@ -364,7 +361,7 @@ static int encrypt_and_sign_nss (
err[PR_GetErrorTextLength()] = 0;
log_printf(LOGSYS_LEVEL_NOTICE,
"PK11_CreateContext failed (encrypt) crypt_type=%d (err %d): %s\n",
- instance->d1ht_config->crypto_crypt_type,
+ instance->d1ht_config->crypto_type,
PR_GetError(), err);
return -1;
}
@@ -390,7 +387,7 @@ static int encrypt_and_sign_nss (
PR_GetErrorText(err);
err[PR_GetErrorTextLength()] = 0;
log_printf(LOGSYS_LEVEL_NOTICE,
- ,"encrypt: PK11_CreateContext failed (digest) err %d: %s\n",
+ "encrypt: PK11_CreateContext failed (digest) err %d: %s\n",
PR_GetError(), err);
return -1;
}
@@ -504,7 +501,7 @@ static int authenticate_and_decrypt_nss (
ivdata.len = sizeof(header->salt);
enc_context = PK11_CreateContextBySymKey(
- instance->d1ht_config->crypto_crypt_type,
+ instance->d1ht_config->crypto_type,
CKA_DECRYPT,
instance->nss_sym_key, &ivdata);
if (!enc_context) {
--
1.6.2.5
[PATCH] Verify events are from predecessor before resetting tdetect
by Steven Dake
In a 4-node cluster, stopping two nodes at about the same time could result
in one node's leave event not being detected. This was caused by tdetect
being reset on every event prior to this patch. This patch resets the
tdetect timer only when the event is from the predecessor on the ring.
Further, once tdetect expired, a node was unable to detect a new failure of
its predecessor, because tdetect was not reset in the tdetect timer handler.
Now, as long as there is more than one member, tdetect is reset when it
expires so that a failure of two or more nodes can be caught.
Signed-off-by: Steven Dake <sdake(a)redhat.com>
---
src/d1htedra.c | 14 +++++++++++++-
1 files changed, 13 insertions(+), 1 deletions(-)
diff --git a/src/d1htedra.c b/src/d1htedra.c
index 60a4e6b..68eef63 100644
--- a/src/d1htedra.c
+++ b/src/d1htedra.c
@@ -586,6 +586,9 @@ static void timer_function_tdetect (void *data)
target);
}
reset_events (instance);
+ if (instance->my_member_count > 1) {
+ reset_timer_tdetect (instance);
+ }
}
static void cancel_timer_lookup (struct d1htedra_instance *instance)
@@ -638,8 +641,17 @@ static int message_handler_event (
int i;
uint32_t my_ttl;
+ struct d1ht_ip_address *pred;
+
log_printf (LOGSYS_LEVEL_DEBUG, "message_handler_event\n");
- reset_timer_tdetect (instance);
+
+ /*
+ * Verify that the event is from pred before resetting the tdetect timer
+ */
+ member_pred (instance, 1, &pred);
+ if (pred && (d1htip_equal (pred, &msg_event->source) == 1)) {
+ reset_timer_tdetect (instance);
+ }
for (i = 0; i < msg_event->stored_events_count; i++) {
if (msg_event->stored_events[i].type == EVENT_TYPE_JOIN) {
member_add (instance, &msg_event->stored_events[i].source);
--
1.6.2.5
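The two rules in this patch can be modeled outside the daemon. This is a simplified sketch, not the d1htedra.c code: ring_node, ring_pred, on_event and on_tdetect_expired are hypothetical, and reset_timer_tdetect here only counts re-arms instead of driving a real timer. It shows why an event relayed by some other member no longer masks a dead predecessor, and why the timer is re-armed after expiry whenever other members remain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified node state: member IDs in ring order. */
struct ring_node {
    uint32_t self;
    const uint32_t *members;   /* sorted ring order, includes self */
    size_t member_count;
    int tdetect_resets;        /* counts timer re-arms, standing in for a timer */
};

/* Predecessor of self on the ring; false if self is absent or alone. */
static bool ring_pred (const struct ring_node *node, uint32_t *pred)
{
    size_t i;

    if (node->member_count < 2) {
        return false;
    }
    for (i = 0; i < node->member_count; i++) {
        if (node->members[i] == node->self) {
            *pred = node->members[(i + node->member_count - 1) % node->member_count];
            return true;
        }
    }
    return false;
}

static void reset_timer_tdetect (struct ring_node *node)
{
    node->tdetect_resets++;    /* a real node would re-arm its timer here */
}

/* Event handler: only traffic from the predecessor proves it is alive. */
static void on_event (struct ring_node *node, uint32_t source)
{
    uint32_t pred;

    if (ring_pred (node, &pred) && pred == source) {
        reset_timer_tdetect (node);
    }
}

/* Timer expiry: after reporting the predecessor as failed (not modeled),
 * re-arm so that a further failure can still be detected. */
static void on_tdetect_expired (struct ring_node *node)
{
    if (node->member_count > 1) {
        reset_timer_tdetect (node);
    }
}
```

With members {10, 20, 30, 40}, node 30 now ignores an event from 40 for tdetect purposes and resets the timer only for events from its predecessor 20.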
[PATCH] Fix make distcheck
by Steven Dake
The make distcheck target was failing. Add d1ht.h to the noinst_HEADERS
list in src/Makefile.am so that it is included in the distribution.
Signed-off-by: Steven Dake <sdake(a)redhat.com>
---
src/Makefile.am | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/src/Makefile.am b/src/Makefile.am
index 19d298f..d356004 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -29,7 +29,7 @@ vinzvault_SOURCES = d1htpoll.c d1htip.c d1htnet.c d1htudp.c \
vinzvault_LDADD =
vinzvault_DEPENDENCIES =
-noinst_HEADERS = tlist.h d1htnet.h d1htudp.h d1htip.h d1htedra.h
+noinst_HEADERS = tlist.h d1htnet.h d1htudp.h d1htip.h d1htedra.h d1ht.h
EXTRA_DIST =
--
1.6.2.5
Initial D1HT implementation
by Steven Dake
Hello
Attached is a patch that forms the start of the vinzvault repo.
I have gone ahead and committed it to our repository so people can
easily clone the source tree with git. This first patch sets up the
automake build and implements the D1HT EDRA algorithm. Please read the
README file for more details on software configuration.
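This mail does not describe EDRA (Event Detection and Reporting Algorithm) itself. As a rough illustration only, based on our reading of the D1HT paper: a peer that detects an event sends it in a message with TTL l to the peer 2^l positions ahead, for each l below ceil(log2 n), and a receiver re-forwards it only in messages with smaller TTLs. The sketch simulates one event on an n-peer ring and checks that every peer hears of it exactly once. Event buffering, acknowledgements and the real vinzvault message formats are omitted, skipping sends that would pass the origin is our simplification, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PEERS 1024

/* Hops needed to cover n peers: rho = ceil(log2(n)). */
static unsigned rho_for (size_t n)
{
    unsigned rho = 0;

    while (((size_t)1 << rho) < n) {
        rho++;
    }
    return rho;
}

/*
 * Deliver the event to the peer at ring distance dist from the origin,
 * carried with TTL ttl, then forward it with each smaller TTL j to the peer
 * 2^j positions further along. Sends that would wrap past the origin
 * (dist + 2^j >= n) are skipped.
 */
static void deliver (size_t n, size_t dist, unsigned ttl,
    int *received, int *messages)
{
    unsigned j;

    received[dist]++;
    for (j = 0; j < ttl; j++) {
        size_t next = dist + ((size_t)1 << j);
        if (next < n) {
            (*messages)++;
            deliver (n, next, j, received, messages);
        }
    }
}

/* Simulate one event detected by the origin (distance 0); returns the
 * number of messages sent. received[i] counts deliveries to peer i. */
static int disseminate (size_t n, int *received)
{
    int messages = 0;
    size_t i;

    assert (n > 0 && n <= MAX_PEERS);
    for (i = 0; i < n; i++) {
        received[i] = 0;
    }
    deliver (n, 0, rho_for (n), received, &messages);
    return messages;
}
```

Each peer at distance d is reached along the set bits of d, highest first, so an event costs n - 1 messages in total and reaches everyone within about log2(n) hops.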
In the future each change will be mailed as a separate patch to allow
people to follow our progress and review changes.
To clone the repo and look at source directly:
git clone git://git.fedorahosted.org/git/vinzvault.git
Regards
-steve
Re: [vinzvault] Vinzvault consistency?
by Steven Dake
CCing the vinzvault mailing list, since the question is generally applicable there.
On Mon, 2010-05-10 at 11:11 -0400, Jeff Darcy wrote:
> One of the things that I was surprised not to see in your list of
> requirements is consistency. I know that you don't need instantaneous
> fine-grain serializable consistency, but it seems like there still has
> to be some assurance that an instantiated image won't be subject to
> arbitrary data-retrieval failures or staleness even though the image was
> written somewhere else quite recently with an unknown network
> environment in between. As soon as there's a need for any level of
> consistency at all, there's an issue of how to prevent/avoid conflicts
> or reconcile results after they've occurred - e.g. Hinted Handoff, Read
> Repair, and Anti-Entropy in the Dynamo/Cassandra model - without
> creating a bottleneck at the coordination point(s). An image-storage
> system that's "best effort" and either fails or delivers incorrect
> (stale) data under conditions that are impossible to predict or to
> identify post-mortem might not be very broadly accepted among the people
> who have thousands of machines to look after. Given enough machines and
> enough time, even exotic failures do occur. Do you have some thoughts
> on what easily-checked rules vinzvault will apply to ensure data
> consistency (and BTW integrity as well)?
As there is only one writer in every case, a simple Lamport timestamp
would do the trick. The reason those full database systems you mention
have all that complicated consistency management is that they
potentially have multiple writers. With multiple writers, after a
failure during an update, a consistent view must be created out of the
mess. With one writer, the worst that can happen is that *some* of the
blocks were not written. This happens all the time with normal block
storage during failures and is well tolerated by journaling filesystems.
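A minimal sketch of the single-writer scheme, under assumptions the mail does not spell out: each write to a block carries a version from the writer's counter (with one writer that never receives timestamps, a Lamport clock reduces to a plain counter), replicas ignore anything not newer than what they hold, and a reader asks every replica and keeps the highest version. The names and the tiny BLOCK_SIZE are hypothetical, and migration of the writer, which would require resuming from the highest version seen, is not covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16   /* tiny block for illustration */

/* One replica's copy of a block, tagged with the writer's logical clock. */
struct replica_block {
    uint64_t version;                 /* 0 = never written */
    unsigned char data[BLOCK_SIZE];
};

/* The single writer's logical clock. */
struct writer {
    uint64_t clock;
};

/* Stamp a new write; the caller sends (version, data) to each replica. */
static uint64_t writer_next_version (struct writer *w)
{
    return ++w->clock;
}

/* Replica side: accept a write only if it is newer than what is stored,
 * so delayed or duplicated messages cannot roll the block back. */
static bool replica_apply (struct replica_block *r, uint64_t version,
    const unsigned char *data)
{
    if (version <= r->version) {
        return false;
    }
    r->version = version;
    memcpy (r->data, data, BLOCK_SIZE);
    return true;
}

/* Reader side: keep the replica with the highest version. Replicas with a
 * lower version are stale and could be repaired from it. */
static const struct replica_block *read_newest (
    const struct replica_block *reps, size_t n)
{
    const struct replica_block *best = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (best == NULL || reps[i].version > best->version) {
            best = &reps[i];
        }
    }
    return best;
}
```

If a failure interrupts a write so that only some replicas receive version 2, a reader still finds version 2 on the replica that has it, and the others are identifiable as stale rather than silently wrong.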
Regards
-steve