Hi, all,

I'm applying for the GSoC project "Implement a Cassandra/NoSQL Connector or Translator for GlusterFS".
I have completed my GSoC proposal and would like to post it here; any suggestions are welcome.

Here is my application on the Fedora Project wiki:

https://fedoraproject.org/wiki/GSOC_2013/Student_Application_Jilinxpd

Here is my application, including the proposal, on Google Melange:

https://google-melange.appspot.com/gsoc/proposal/review/google/gsoc2013/jilinxpd/18001

Best regards,

Peidong

---------- Forwarded message ----------
From: Jilin Xpd <jilinxpd@gmail.com>
Date: 2013/4/25
Subject: Fwd: [GSoC] Implement a Cassandra/NoSQL Connector or Translator for GlusterFS
To: avati@redhat.com, Anand Babu Periasamy <abperiasamy@gmail.com>, johnmark@redhat.com
Cc: Buddhike Kurera <bckurera@fedoraproject.org>

Dear mentors,

I'm Peidong, the student applying for the GSoC project "Implement a Cassandra/NoSQL Connector or Translator for GlusterFS".

I have finished my proposal and hope you can help review it. Thanks very much!

Here is my application on the Fedora Project wiki:

https://fedoraproject.org/wiki/GSOC_2013/Student_Application_Jilinxpd

Here is my application, including the proposal, on Google Melange:

https://google-melange.appspot.com/gsoc/proposal/review/google/gsoc2013/jilinxpd/18001

Best Regards,

Peidong

---------- Forwarded message ----------
From: Jilin Xpd <jilinxpd@gmail.com>
Date: 2013/4/23
Subject: Fwd: [GSoC] Implement a Cassandra/NoSQL Connector or Translator for GlusterFS
To: avati@redhat.com, abperiasamy@gmail.com, johnmark@redhat.com
Cc: Buddhike Kurera <bckurera@fedoraproject.org>

Dear mentors,

I'm a student who would like to apply for the GSoC project "Implement a Cassandra/NoSQL Connector or Translator for GlusterFS".

I contacted Mr. Walker earlier, but he hasn't replied yet.

As I'm now writing my proposal, I have some questions about this project.

Would you kindly help me answer them? Thanks very much!

My questions are as follows:

(1) As I understand it, the project is to write a storage translator for GlusterFS, so that GlusterFS can use Cassandra as its backend storage.

One of the benefits is that legacy applications which are incompatible with NoSQL can then store key-value pairs in Cassandra indirectly.

Am I right?

(2) Since users will only store key-value pairs as files in our system, they may not use directories, file attributes, or extended file attributes. Do we need to provide fops to support these features?

If we do, directories do not look very difficult to support, since a directory can be mapped to super columns and column families in Cassandra.
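To make the directory question in (2) more concrete, here is a minimal sketch of one possible layout, assuming each directory becomes a row and each file in it becomes a column. It uses a plain in-memory dictionary as a stand-in for a column family, not the real Cassandra or Thrift API, and the names `ToyColumnStore`, `put_file`, `get_file`, and `list_dir` are hypothetical. A real translator would be written in C and might choose a different mapping.

```python
import posixpath
from collections import defaultdict


class ToyColumnStore:
    """In-memory stand-in for a column family: row key -> {column name: value}.

    Assumption for illustration only: one row per directory,
    one column per file inside that directory.
    """

    def __init__(self):
        self.rows = defaultdict(dict)

    def _split(self, path):
        # "/docs/a.txt" -> ("/docs", "a.txt")
        return posixpath.split(posixpath.normpath(path))

    def put_file(self, path, data):
        parent, name = self._split(path)
        self.rows[parent][name] = data

    def get_file(self, path):
        parent, name = self._split(path)
        # Returns None when the directory or the file does not exist.
        return self.rows.get(parent, {}).get(name)

    def list_dir(self, path):
        # A directory listing is just the column names of one row.
        return sorted(self.rows.get(posixpath.normpath(path), {}))


store = ToyColumnStore()
store.put_file("/docs/a.txt", b"k1=v1")
store.put_file("/docs/b.txt", b"k2=v2")
print(store.list_dir("/docs"))
```

With this layout a readdir-style fop only needs to read one row, which is the reason directories seem cheap to support.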

Those are all my questions. Thanks for your time!

I'm still designing and writing my proposal; I will send it to you all as soon as I finish.

Best regards,

Peidong

---------- Forwarded message ----------
From: Jilin Xpd <jilinxpd@gmail.com>
Date: 2013/4/22
Subject: [GSoC] Implement a Cassandra/NoSQL Connector or Translator for GlusterFS
To: johnmark@redhat.com

Hi, Mr Walker,

I'm Peidong Xie, a third-year master's student at the Institute of Software, Chinese Academy of Sciences.

Sorry for contacting you so late. I would like to express my interest in the idea "Implement a Cassandra/NoSQL Connector or Translator for GlusterFS".

I have read the documentation on the GlusterFS website, from which I learned about the GlusterFS architecture and how to write translators.
I have also skimmed the code of the posix and bdb translators and worked out the skeleton of a storage translator.

I noticed that GlusterFS used to have bdb as one of its storage backends, but it is now obsolete. For implementing a Cassandra translator for GlusterFS, I think the bdb translator is a good reference.
Cassandra doesn't provide a native C interface. There is a C++ client (libQtCassandra), but it depends on third-party libraries, so I think it's better to use the raw Thrift API in GlusterFS.

I have participated in several projects, and most of my work is related to file systems:

(1) In 2011, together with another student, I developed a shared file system based on FUSE. It is used to store libvirt checkpoint files and image files, so that multiple VMs can read and write a checkpoint or image simultaneously. The key idea is to split the whole file into small blocks and cache them in memory, so that VMs can share the file blocks. COW (copy-on-write) is used to make sure one VM's writes don't affect the others.

(2) During last year's GSoC, I added mmap support to smbfs (the CIFS client) in illumos. First, I implemented mmap with block I/O; the main work was to implement the VFS interfaces, such as smbfs_mmap, smbfs_getpage, and smbfs_putpage. Second, I added page cache support to file I/O, mainly by modifying smbfs_read and smbfs_write. With mmap, smbfs can cache files in memory and reduce the I/O requests sent over the wire, so I/O efficiency increases.

(3) Last year, I spent some time porting ecryptfs-utils to RedFlag Linux, making it work with ecryptfs to support encrypted home directories.
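As a rough illustration of the block sharing and copy-on-write idea from project (1), here is a minimal sketch. It is a toy model written for this message, not the actual FUSE code; the names `SharedBlocks` and `VmView` and the tiny block size are hypothetical.

```python
BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow


class SharedBlocks:
    """A file split into fixed-size blocks that all VMs share read-only."""

    def __init__(self, data, block_size=BLOCK_SIZE):
        self.blocks = [
            data[i:i + block_size] for i in range(0, len(data), block_size)
        ]


class VmView:
    """One VM's view of the file: shared blocks plus private copies."""

    def __init__(self, shared):
        self.shared = shared
        self.private = {}  # block index -> this VM's modified copy

    def read_block(self, idx):
        # Prefer the private copy if this VM has written to the block.
        return self.private.get(idx, self.shared.blocks[idx])

    def write(self, idx, offset, payload):
        block = bytearray(self.read_block(idx))
        if offset + len(payload) > len(block):
            raise ValueError("write crosses the block boundary")
        # Copy-on-write: modify a copy and keep it private to this VM.
        block[offset:offset + len(payload)] = payload
        self.private[idx] = bytes(block)


shared = SharedBlocks(b"abcdefgh")
vm1, vm2 = VmView(shared), VmView(shared)
vm1.write(0, 1, b"ZZ")
print(vm1.read_block(0), vm2.read_block(0))
```

Only the blocks a VM has written are duplicated; every other block stays shared in memory.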

Currently, I am concentrating on storage issues in big data. I have studied several distributed systems, such as HDFS, HBase, MongoDB, and Cassandra, and storage engines such as BDB and LevelDB.

I hope my project experience and background knowledge will help with "Implement a Cassandra/NoSQL Connector or Translator for GlusterFS".
I haven't finished my proposal yet, but I will finish it in one or two days.

Best regards,
Peidong