From Fedora Project Wiki
<p class="MsoNormal" align="left"><strong><span style="text-decoration: underline;"><span style="font-size: 10pt; font-family: Arial, sans-serif;" lang="EN-US">Contact Information</span></span></strong></p>
<p class="MsoNormal" align="left"><span style="font-size: 9pt; font-family: Verdana, sans-serif;" lang="EN-US">1.<span style="font-size: 7pt; font-family: 'Times New Roman';">&nbsp;&nbsp;&nbsp; </span></span><span style="font-size: 10pt; font-family: Arial, sans-serif;" lang="EN-US">Email Address: </span><span style="font-size: 9pt; font-family: Verdana, sans-serif;" lang="EN-US">jilinxpd@gmail.com</span></p>
<p class="MsoNormal" align="left"><span style="font-size: 9pt; font-family: Verdana, sans-serif;" lang="EN-US">2.<span style="font-size: 7pt; font-family: 'Times New Roman';">&nbsp;&nbsp;&nbsp; </span></span><span style="font-size: 10pt; font-family: Arial, sans-serif;" lang="EN-US">Telephone: </span><span style="font-size: 9pt; font-family: Verdana, sans-serif;" lang="EN-US">+8615201294712</span></p>
<p class="MsoNormal" align="left"><span style="font-size: 9pt; font-family: Verdana, sans-serif;" lang="EN-US">3.<span style="font-size: 7pt; font-family: 'Times New Roman';">&nbsp;&nbsp;&nbsp; </span></span><span style="font-size: 10pt; font-family: Arial, sans-serif;" lang="EN-US">Blog URL: </span><span style="font-size: 9pt; font-family: Verdana, sans-serif;" lang="EN-US">http://www.cnblogs.com/zszmhd</span></p>
<p class="MsoNormal" align="left"><span style="font-size: 9pt; font-family: Verdana, sans-serif;" lang="EN-US">4.<span style="font-size: 7pt; font-family: 'Times New Roman';">&nbsp;&nbsp;&nbsp; </span></span><span style="font-size: 10pt; font-family: Arial, sans-serif;" lang="EN-US">Freenode IRC Nick: </span><span style="font-size: 9pt; font-family: Verdana, sans-serif;" lang="EN-US">jilinxpd</span></p>
<p class="MsoNormal" align="left"><span style="font-size: 10pt; font-family: Arial, sans-serif;" lang="EN-US">NOTE: We require all students to blog about the progress of their project. You are strongly encouraged to register on the Freenode network and participate in our IRC channels. For more information and other instructions, see: http://groups.google.com/group/redhat-summer/web/gsoc-getting-started</span></p>
<p class="MsoNormal" align="left"><span style="font-size: 9pt; font-family: Verdana, sans-serif;" lang="EN-US">1.<span style="font-size: 7pt; font-family: 'Times New Roman';">&nbsp;&nbsp;&nbsp; </span></span><strong><span style="font-size: 10pt; font-family: Arial, sans-serif;" lang="EN-US">Why do you want to work with the Fedora Project?</span></strong></p>
<p class="MsoNormal" style="margin-left: 36pt;" align="left"><span style="font-size: 9pt; font-family: Verdana, sans-serif;" lang="EN-US">Fedora is my favorite Linux distribution. I have been using it since 2010 and find it helpful. I am currently doing research on Hadoop, which I run on Fedora 16. I want to participate in the Fedora Project, and I would be proud to contribute even small changes that make it better.</span></p>
<p class="MsoNormal" align="left"><span style="font-size: 9pt; font-family: Verdana, sans-serif;" lang="EN-US">2.<span style="font-size: 7pt; font-family: 'Times New Roman';">&nbsp;&nbsp;&nbsp; </span></span><strong><span style="font-size: 10pt; font-family: Arial, sans-serif;" lang="EN-US">Do you have any past involvement with the Fedora project or another open source project as a contributor?</span></strong></p>
<p class="MsoNormal" style="margin-left: 36pt;" align="left"><span style="font-size: 9pt; font-family: Verdana, sans-serif;" lang="EN-US">I participated in the illumos project during GSoC last year. My work was to implement mmap support for smbfs (the CIFS client) in illumos. (http://cr.illumos.org/~webrev/jilinxpd/)</span></p>
<p class="MsoNormal" align="left"><span style="font-size: 9pt; font-family: Verdana, sans-serif;" lang="EN-US">3.<span style="font-size: 7pt; font-family: 'Times New Roman';">&nbsp;&nbsp;&nbsp; </span></span><strong><span style="font-size: 10pt; font-family: Arial, sans-serif;" lang="EN-US">Did you participate in past GSoC programs? If so, in which years and with which organizations?</span></strong></p>
<p class="MsoNormal" style="margin-left: 36pt;" align="left"><span style="font-size: 9pt; font-family: Verdana, sans-serif;" lang="EN-US">Yes, in 2012 with the illumos project. (http://www.google-melange.com/gsoc/project/google/gsoc2012/jilinxpd/13001)</span></p>
<p class="MsoNormal" align="left"><span style="font-size: 9pt; font-family: Verdana, sans-serif;" lang="EN-US">4.<span style="font-size: 7pt; font-family: 'Times New Roman';">&nbsp;&nbsp;&nbsp; </span></span><strong><span style="font-size: 10pt; font-family: Arial, sans-serif;" lang="EN-US">Will you continue contributing to or supporting the Fedora Project after the GSoC 2012 program? If yes, which team(s) are you interested in?</span></strong></p>
<p class="MsoNormal" style="margin-left: 36pt;" align="left"><span style="font-size: 9pt; font-family: Verdana, sans-serif;" lang="EN-US">Yes, I will continue contributing to the Fedora Project. I'm interested in file systems and NoSQL, so I chose the GSoC idea "Implement a Cassandra/NoSQL Connector or Translator for GlusterFS". I think the Cloud SIG and NoSQL SIG may fit me.</span></p>
<p class="MsoNormal" align="left"><span style="font-size: 9pt; font-family: Verdana, sans-serif;" lang="EN-US">5.<span style="font-size: 7pt; font-family: 'Times New Roman';">&nbsp;&nbsp;&nbsp; </span></span><strong><span style="font-size: 10pt; font-family: Arial, sans-serif;" lang="EN-US">Why should we choose you over other applicants?</span></strong></p>
<p class="MsoNormal" style="margin-left: 36pt;" align="left"><span style="font-size: 9pt; font-family: Verdana, sans-serif;" lang="EN-US">(1) I major in storage and have studied the Linux kernel, file systems, distributed systems, and NoSQL for several years, so I have the required fundamental knowledge.</span></p>
<p class="MsoNormal" style="margin-left: 36pt;" align="left"><span style="font-size: 9pt; font-family: Verdana, sans-serif;" lang="EN-US">(2) I have participated in several file-system projects, so I know how file systems behave.</span></p>
<p class="MsoNormal" style="margin-left: 36pt;" align="left"><span style="font-size: 9pt; font-family: Verdana, sans-serif;" lang="EN-US">(3) I have experience participating in open source projects. I'm familiar with the workflow: setting up a development environment, coding, debugging, committing, and requesting reviews.</span></p>
<p class="MsoNormal" style="margin-left: 36pt;" align="left"><span style="font-size: 9pt; font-family: Verdana, sans-serif;" lang="EN-US">(4) I'm hard-working, and I will do my best at the work I love.</span></p>
<p class="MsoNormal" align="left"><strong><span style="text-decoration: underline;"><span style="font-size: 10pt; font-family: Arial, sans-serif;" lang="EN-US">Proposal Description</span></span></strong></p>
<p class="MsoNormal" align="left"><span style="font-size: 10pt; font-family: Arial, sans-serif;" lang="EN-US">Please describe your proposal in detail. Include:</span></p>
<p><span style="font-family: Arial, 'Helvetica Neue', Helvetica, sans-serif; font-size: 13px; line-height: 19px;"><strong>Have you communicated with a potential mentor? If so, who?</strong></span></p>
<p><span style="font-family: Arial, 'Helvetica Neue', Helvetica, sans-serif; font-size: 13px; line-height: 19px;"><strong><br /></strong></span></p>
<p><strong style="font-family: Arial, 'Helvetica Neue', Helvetica, sans-serif; font-size: 13px; line-height: 19px;">Note:&nbsp;</strong><span style="font-family: Arial, 'Helvetica Neue', Helvetica, sans-serif; font-size: small;"><span style="line-height: 19px;">Make sure you have completed the following tasks to qualify;&nbsp;failing&nbsp;to complete any task will result in rejection of your application.</span></span></p>
<pre>x You have subscribed with the summer-coding mailing list
x Create a FAS account
x Your application is available on the Fedora project wiki
x Your application is submitted to google-melange</pre>

Latest revision as of 05:49, 3 May 2013

Proposal Description

Please describe your proposal in detail. Include:

  •  An overview of your proposal

I will implement a new storage translator for GlusterFS. With this translator, GlusterFS can use Cassandra as its backend storage. The translator can be divided into two layers: the translating layer, which provides the POSIX file API, and the connecting layer, which interacts with Cassandra. Since users only want to store key-value pairs through the POSIX file API, GlusterFS only needs to support regular files and directories, excluding symlinks, file attributes, etc. A file represents a key-value pair from the user and is mapped to a row in Cassandra. A directory represents a database or a table from the user and is mapped to a keyspace or a column family in Cassandra.

  • The need you believe it fulfills

Legacy applications were written many years ago, before NoSQL existed. They can only access data through the POSIX file API, so even when they want to store key-value data, which is very common nowadays, they have to store it in a general-purpose file system. A general-purpose file system is not the best place to store key-value data; a key-value store is. To let legacy applications store key-value data in a KV store, we need an intermediate layer between the legacy applications and the KV store that translates the POSIX file API into a key-value API. GlusterFS is a natural place for this layer, but it cannot do this right now, so I will implement a new NoSQL translator for GlusterFS.

  • Any relevant experience you have

1.      I have the required fundamental knowledge.

(1)    I major in storage and have studied the Linux kernel, especially the file system, for several years.

(2)    I now concentrate on storage issues in Big Data. I have studied and done research on distributed systems such as HDFS, HBase, MongoDB, Cassandra, and Dynamo, and on storage engines such as BDB and LevelDB.

2.      I have participated in several file-system projects:

(1)    In 2011, another student and I developed a shared file system based on FUSE. It stores libvirt checkpoint and image files so that multiple VMs can read and write a checkpoint or image simultaneously. The key idea is to split the whole file into small blocks and cache them in memory, so that VMs can share the file blocks. Copy-on-write (COW) ensures that one VM's writes do not affect the others.

(2)    In 2012, as a GSoC project, I added mmap support to smbfs (the CIFS client) in illumos. First, I implemented mmap with block I/O; the main work was implementing VFS interfaces such as smbfs_mmap, smbfs_getpage, and smbfs_putpage. Second, I added page cache support to file I/O, mainly by modifying smbfs_read and smbfs_write. With mmap, smbfs can cache files in memory and reduce I/O requests over the wire, which improves I/O efficiency.

(3)    Also in 2012, I spent some time porting ecryptfs-utils to RedFlag Linux so that it works with eCryptfs and supports encrypted home directories.

  • How you intend to implement your proposal

Principle

What legacy applications need is to store key-value data through the POSIX file API, so I expect them to require GlusterFS to support only regular files and directories; they do not need symlinks, file attributes, or extended file attributes.

A file represents a key-value pair from the user and is mapped to a row in Cassandra. Each row has at least two columns: file_name and file_content.

key-value <---> file <---> row

A directory represents a database or a table from the user and is mapped to a keyspace or a column family in Cassandra.

database <---> directory <---> keyspace

table <---> sub-directory <---> column family
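
As an illustration of the mapping above, the following is a minimal Python sketch that resolves a file path to the place its data would live in Cassandra. It assumes that every file sits exactly two directories deep, in the form /<database>/<table>/<key>; that layout, and the names RowAddress and path_to_row, are assumptions made for this example and are not fixed by the design.

```python
from typing import NamedTuple


class RowAddress(NamedTuple):
    """Where a file would live in Cassandra (hypothetical helper type)."""
    keyspace: str        # from the top-level directory (database)
    column_family: str   # from the sub-directory (table)
    row_key: str         # from the file name (key)


def path_to_row(path: str) -> RowAddress:
    """Map /<database>/<table>/<key> to (keyspace, column family, row key).

    Assumes exactly three path components; anything else is rejected.
    """
    parts = [p for p in path.split("/") if p]
    if len(parts) != 3:
        raise ValueError(f"expected /<database>/<table>/<key>, got {path!r}")
    return RowAddress(*parts)
```

For example, path_to_row("/shop/users/alice") yields keyspace "shop", column family "users", and row key "alice".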

To put a key-value pair, the user makes three file API calls in order: create() with the key as the file name, write() with the value as the file content, and close(). Write-behind is used here: only when the file is closed does the translator make an RPC to Cassandra to insert the key-value pair.

To get the value associated with a key, the user also makes three calls: open() with the key as the file name, read(), and close(). Read-ahead is used here: the value is fetched from Cassandra at the first read() and cached in GlusterFS.
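
The put and get sequences above can be sketched in Python as follows. This is only a model of the intended behavior, not translator code: FakeStore is an in-memory stand-in for Cassandra that counts round trips, and KeyValueFile, create, and open_ are hypothetical names chosen for this example.

```python
class FakeStore:
    """In-memory stand-in for Cassandra; counts simulated RPCs."""

    def __init__(self):
        self.rows = {}
        self.calls = 0

    def insert(self, key, value):
        self.calls += 1
        self.rows[key] = value

    def get(self, key):
        self.calls += 1
        return self.rows[key]


class KeyValueFile:
    """One open file that represents one key-value pair."""

    def __init__(self, store, key, writable):
        self.store = store
        self.key = key
        self.writable = writable
        self.pending = bytearray()  # write-behind buffer
        self.cached = None          # value fetched by the first read()

    def write(self, data):
        if not self.writable:
            raise PermissionError("file was opened read-only")
        self.pending += data        # buffered only, no RPC yet
        return len(data)

    def read(self):
        if self.cached is None:     # only the first read() reaches the store
            self.cached = self.store.get(self.key)
        return self.cached

    def close(self):
        if self.writable:           # a single insert when the file is closed
            self.store.insert(self.key, bytes(self.pending))


def create(store, key):
    """Start a put: the key becomes the file name."""
    return KeyValueFile(store, key, writable=True)


def open_(store, key):
    """Start a get: the key is the file name."""
    return KeyValueFile(store, key, writable=False)
```

In this model, any number of write() calls before close() costs one insert, and any number of read() calls on the same open file costs one get.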

 

Architecture

Two systems are involved in my project: GlusterFS and the Cassandra key-value store. Cassandra acts as the backend storage for GlusterFS. To make this work, I will develop a storage translator that connects GlusterFS to Cassandra.

GlusterFS with the translator acts as an intermediate layer between the legacy applications and the Cassandra key-value store. The translator therefore does two things: it provides the POSIX file API, which I call the translating layer, and it connects to Cassandra, which I call the connecting layer.

translator(translating layer, connecting layer) <---> Cassandra

The translating layer implements the necessary file API and translates it to the key-value API provided by the connecting layer. The connecting layer is effectively a Cassandra client that interacts with the Cassandra server through the Thrift API.
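
The boundary between the two layers might look like the Python sketch below. It is an assumption-laden illustration: the real translator would be written against the GlusterFS translator interface, and the real connecting layer would talk to Cassandra over Thrift. Here the class and method names (ConnectingLayer, InMemoryConnector, TranslatingLayer, write_file, read_file) are hypothetical, and a dictionary replaces the Cassandra server.

```python
from abc import ABC, abstractmethod


class ConnectingLayer(ABC):
    """Key-value API that the translating layer calls (hypothetical)."""

    @abstractmethod
    def put(self, key, value):
        ...

    @abstractmethod
    def get(self, key):
        ...

    @abstractmethod
    def delete(self, key):
        ...


class InMemoryConnector(ConnectingLayer):
    """Stand-in for the Cassandra client; keeps rows in a dict."""

    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key):
        return self._rows[key]

    def delete(self, key):
        del self._rows[key]


class TranslatingLayer:
    """Turns file-style operations into calls on a ConnectingLayer."""

    def __init__(self, connector):
        self._connector = connector

    def write_file(self, name, content):
        self._connector.put(name, content)   # file content becomes the value

    def read_file(self, name):
        return self._connector.get(name)     # file name is the key

    def unlink(self, name):
        self._connector.delete(name)         # removing a file deletes the row
```

Because the translating layer depends only on the put/get/delete interface, the in-memory connector could be swapped for a real Cassandra client without changing the file-facing code.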

 

  • A rough timeline for your progress

Preparation period:

1.      ~May 21. Reconsider the design, especially how to map key-value pairs to files and how to map files and directories to Cassandra terminology.

2.      ~May 31. Continue learning GlusterFS internals: how translators work and how a storage translator communicates with backend storage. At the same time, set up GlusterFS and use it.

3.      ~June 6. Write a minimal translator, then compile, debug, and run it.

4.      ~June 13. Learn the Cassandra Thrift API. Write some programs that interact with a Cassandra server.

5.      ~June 16. Set up the development and test environment: text editor, debugging tools, code cross-reference, code repository, and virtual machines on which to install GlusterFS and Cassandra.

Coding Period:

6.      ~July 25. Implement a simple storage translator, including a simple translating layer and a simple connecting layer:

The translating layer supports only operations on regular files: create, open, read, write, close, lookup, unlink, etc.

The connecting layer handles open, close, put, get, and delete operations on Cassandra.

7.      ~August 1. Write test suites and test the simple storage translator.

8.      ~September 14. Add directory operations to the translating layer, such as mkdir, rmdir, opendir, and readdir. Add batch or range operations to the connecting layer, such as batch_insert and multi_get.

9.      ~September 26. Write tests and documentation.


Have you communicated with a potential mentor? If so, who?