There are three basic issues that need to be addressed in the encryption
module: type of cipher used, initialization-vector handling, and
conflict management. Each is non-trivial, so I'll address them in turn.
= Cipher
The main factor affecting our choice of ciphers (or APIs to them) is
that we need to be able to deal efficiently with updates both in the
middle of the file and at the end. At EOF, the problem is that we need
a whole cipher-block in order to decrypt, but the file might actually
end at any byte boundary within that cipher-block. Therefore, we have
to deal with the "residue" somehow. The obvious options are:
* Store the residue in an xattr.
* Store a whole cipher-block at the end, record the amount of padding in
an xattr.
* Use a stream cipher (or block cipher converted to a stream cipher).
This problem is further compounded by the striping case, where EOF for a
stripe component (local file stored on one brick) might not be EOF for
the entire file (union of all stripe components).
Since the two xattr-based approaches both require extra calls, the
stream-cipher approach has been used, with the cipher resetting at block
(e.g. 4KB) boundaries to allow efficient middle-of-file updates. As it
turns out, pure stream ciphers are relatively uncommon. More often,
CFB/OFB/CTR methods are used to convert a block cipher into a stream
cipher. The OpenSSL documentation is *amazingly* bad, but it looks like
it should be pretty easy to use any of these techniques with AES as well
as with DES.
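To make the block-reset idea concrete, here's a toy Python sketch of the CTR-style construction: a keystream derived from (key, IV, block number, counter), restarted at every 4KB boundary. SHA-256 stands in for the real block cipher here purely so the example is self-contained; the actual module would use AES or DES through OpenSSL, and all names below are invented for illustration.

```python
import hashlib

BLOCK = 4096  # keystream restarts at this boundary, enabling mid-file updates

def _keystream(key: bytes, iv: bytes, blk_no: int, length: int) -> bytes:
    """CTR-style keystream for one filesystem block. SHA-256 stands in
    for the real block cipher; the structure is the same either way:
    F(key, iv, blk_no, counter) for counter = 0, 1, 2, ..."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + blk_no.to_bytes(8, "big") +
                              counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xcrypt(key: bytes, iv: bytes, offset: int, data: bytes) -> bytes:
    """Encrypt or decrypt (XOR is its own inverse) at an arbitrary file
    offset. Only the keystream for the touched blocks is generated, so a
    middle-of-file update costs O(bytes written), not O(file size)."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        blk_no, off = divmod(offset + pos, BLOCK)
        n = min(BLOCK - off, len(data) - pos)
        ks = _keystream(key, iv, blk_no, off + n)
        out += bytes(d ^ k for d, k in zip(data[pos:pos + n], ks[off:]))
        pos += n
    return bytes(out)
```

Note the stream-cipher property that motivates this whole design: re-encrypting any byte range in isolation yields exactly the bytes that a full-file encryption would have produced at that range, and there is no residue problem at EOF because no padding is ever needed.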
= Initialization vector
Right now, the code uses a constant IV, which is totally unacceptable
from a security standpoint and was always meant to be changed before
release. The question is: what should we use for an IV? GlusterFS does
attach a supposedly unique "gfid" as an xattr on each file, so that
might be usable as a basis for the IV. First, though, we'd have to
verify that gfids are universal and stable enough; if one were ever
missing or changed, the data encrypted under it would become
unrecoverable.
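Assuming the gfid does pan out, the derivation could be as simple as hashing it down to the cipher's IV size. This is a sketch only; the function name and the domain-separation prefix are my inventions, and it inherits the caveat above that the gfid must be unique and stable for the file's lifetime.

```python
import hashlib

def iv_from_gfid(gfid: bytes) -> bytes:
    """Derive a 16-byte IV from the file's gfid xattr. Hashing (rather
    than using the gfid bytes directly) gives a uniformly distributed IV
    even when gfids share internal structure. The prefix just separates
    this use of the hash from any other use of the same gfid."""
    return hashlib.sha256(b"cloudfs-iv:" + gfid).digest()[:16]
```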
= Conflict management
For partial-block writes, the encryption module needs to do the
following atomically.
* Read the current block contents.
* Decrypt.
* Overlay the new partial block on the old whole block.
* Encrypt.
* Write the entire block.
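The five steps above can be sketched as a single read-modify-write unit. The I/O and cipher callables here are placeholders for the real brick calls, not actual GlusterFS APIs; atomicity has to come from the transaction machinery discussed below, not from this function itself.

```python
BLOCK = 4096  # cipher-reset block size, per the Cipher section

def partial_block_write(read_block, write_block, decrypt, encrypt,
                        blk_no, offset, new_data):
    """One partial-block update. read_block/write_block/decrypt/encrypt
    stand in for the real brick I/O and cipher operations; the whole
    sequence must execute atomically w.r.t. other writers."""
    assert offset + len(new_data) <= BLOCK
    old_ct = read_block(blk_no)                      # 1. read current block
    old_pt = decrypt(blk_no, old_ct)                 # 2. decrypt
    new_pt = (old_pt[:offset] + new_data             # 3. overlay new data
              + old_pt[offset + len(new_data):])
    new_ct = encrypt(blk_no, new_pt)                 # 4. encrypt
    write_block(blk_no, new_ct)                      # 5. write whole block
```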
There's some additional complexity to do with EOF, but that's the basic
idea. The current code eschews locks in favor of "optimistic"
concurrency control in which a server-side "oplock" translator maintains
a generation number for each inode. Clients can start a "transaction"
before they read, associating the current inode generation with their
connection. The next write on that connection compares the stored
generation number against the current one. If they're not the same, that
means there was another write since the transaction started, and the
write is rejected so the client can start over. Unfortunately, this
does not account for "self conflicts", in which one client sends
multiple writes to the same file in parallel. The standard
performance/write-behind translator does this constantly, which is why
it has to be disabled when using cloudfs encryption, and there are many
other ways for it to happen.
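A toy model of the generation-number scheme shows the self-conflict directly; the class and method names are mine, not the oplock translator's:

```python
class OpLock:
    """Toy model of the server-side oplock translator: one generation
    number per inode, bumped on every successful write."""
    def __init__(self):
        self.generation = 0

    def begin_txn(self):
        # Transaction start: return the generation the client read at.
        return self.generation

    def write(self, txn_generation):
        # Reject if anyone wrote since the transaction began -- even if
        # that "anyone" was another in-flight write from the same client.
        if txn_generation != self.generation:
            return False
        self.generation += 1
        return True
```

Two transactions opened back-to-back by the same client (exactly what write-behind produces) both capture the same generation, so whichever write lands second is rejected and must retry.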
My first inclination would be to add client code which detects and
avoids such self-conflict, but I have a sneaking suspicion that will be
pretty complex and have to be tweaked a lot to avoid compromising
performance. I kind of suspect that server-side queuing might be the
right answer here. If a transaction is begun which conflicts with
another already in progress, the new one is simply queued behind the
old one, and the transaction-begin call (actually a special setxattr)
is resumed when the earlier transactions complete. This also addresses
fairness/forward-progress issues inherent in both the locking and retry
models, though we'll need to put some thought into recovery from faults.