From Fedora Project Wiki

Project Description

BTRFS is a new, actively developed file system with various advanced features. I wish to implement a content-based-storage mode for the BTRFS file system. This project is also listed in the TODO list on the BTRFS ideas page.

In some applications, such as Internet content caches, the data is read-only more often than not. For such cases, lookup time is the most important metric, and storing data in a conventional file-path-based manner is inefficient. In content-based-storage mode, data is stored on disk solely on the basis of the hash of its content. Lookup is also hash-based and therefore very quick. Another advantage of hash-based storage is that identical content is stored only once, so duplicate data does not take up extra space.
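The idea of storing and looking up data purely by the hash of its content can be illustrated with a minimal in-memory sketch. This is not Btrfs code: the `ContentStore` class and its method names are hypothetical, and SHA-256 is chosen here only for illustration, since the proposal does not name a hash function.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (illustrative only)."""

    def __init__(self):
        # Maps hex digest -> stored bytes; stands in for on-disk extents.
        self._blocks = {}

    def put(self, data: bytes) -> str:
        """Store data under the hash of its content and return that hash."""
        key = hashlib.sha256(data).hexdigest()
        # Identical content hashes to the same key, so it is kept only once.
        self._blocks.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        """Look up data by content hash; a single dictionary lookup."""
        return self._blocks[key]

    def __len__(self) -> int:
        return len(self._blocks)
```

Writing the same bytes twice returns the same key and leaves a single stored copy, which is the deduplication property described above.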

My research at CMU aims at building content caches for routers (https://github.com/harshadjs/xia-content-cache), which demands a file system that allows such a storage mode. I think it would serve both the interests of the BTRFS community and the research at CMU if I could work on this project over the summer.

Biography and Technical Background:

I am a Computer Science Graduate student at Carnegie Mellon University with research interest primarily in Computer Networks. I use Linux daily and am passionate about Open source software development.

In my undergraduate years, I worked on an open-source Linux kernel project, "Snapshots for Ext4 filesystem". Patches were sent to the Ext4 community for review. My contribution to the project was mentioned at http://lwn.net/Articles/442078/ .

We were interested in extending the Ext4 snapshots project, and so I participated in Google Summer of Code 2011. My proposal for "Snapshot revert feature for Ext4" was accepted by The Fedora Project, and I successfully completed the project. I look forward to continuing my association with the Fedora Project by submitting the proposal "Content-storage mode for BTRFS" for the year 2015.

I worked at a Wi-Fi technology startup, "AirTight Networks", for 3 years (2011-2014), where I was part of the Linux device drivers team.

I then joined Carnegie Mellon University in May 2014, where my main area of studies is Computer Networks.

You can expect a very high level of fluency with C and Kernel programming from me. This is something that I love to do.

Goals

75% Goal

- Create a new "Content" tree. This tree should store hashes of all the extents in the file system.
- Provide option to enable / disable content-storage-mode at mount-time or mkfs-time (TBD).
- Implement all the reference counting mechanisms for extents in this content-tree.

100% Goal

- Intercept writes and check if the data that is being written is already in the content tree.
- Enhance debugging methods available in btrfs (I am not sure which ones are available) to support debugging content-trees.

125% Goal

- Provide various mount-time configuration options, such as:
  - Remove or Don't remove extents if reference count becomes 0. (Especially useful for our routing application.)
  - Verify or Trust the checksum of extents.
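The goals above can be sketched together as a small user-space model: a content tree keyed by extent hash, reference counts that grow when a write matches existing content, and the two proposed configuration options. This is only a sketch under stated assumptions; the class name, method names, and the option names `keep_unreferenced` and `verify` are hypothetical, and SHA-256 stands in for whatever hash the real design would use.

```python
import hashlib


class ContentTree:
    """Toy model of a content tree with reference-counted extents."""

    def __init__(self, keep_unreferenced=False, verify=True):
        # Hypothetical options mirroring the proposed mount-time settings.
        self.keep_unreferenced = keep_unreferenced
        self.verify = verify
        self._extents = {}  # digest -> [data, refcount]

    def write(self, data: bytes) -> str:
        """Intercept a write: reuse an existing extent if the content matches."""
        digest = hashlib.sha256(data).hexdigest()
        entry = self._extents.get(digest)
        if entry is None:
            self._extents[digest] = [data, 1]  # new content, first reference
        else:
            entry[1] += 1  # duplicate content, only bump the reference count
        return digest

    def read(self, digest: str) -> bytes:
        data = self._extents[digest][0]
        # "Verify" recomputes the hash on read; "trust" skips the check.
        if self.verify and hashlib.sha256(data).hexdigest() != digest:
            raise IOError("extent content does not match its hash")
        return data

    def release(self, digest: str) -> None:
        """Drop one reference; optionally remove the extent at zero."""
        entry = self._extents[digest]
        if entry[1] > 0:
            entry[1] -= 1
        if entry[1] == 0 and not self.keep_unreferenced:
            del self._extents[digest]

    def refcount(self, digest: str) -> int:
        entry = self._extents.get(digest)
        return entry[1] if entry else 0
```

With `keep_unreferenced=True`, an extent whose reference count drops to 0 stays readable by hash, which is the behaviour a content cache would want; with the default, it is removed.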

Milestones of the Project:

M1: Understand the design and code of Btrfs. Focus especially on how the current extent trees, subvolume trees, and snapshot trees are set up initially. Study the on-disk data structures; most likely, we will need to add some bits to the super-block, for example "content-storage-mode-on/off".

M2: Understand and identify the code areas where the hooks are to be applied. Hooks need to be found for:

- Intercepting writes
- Reading extents
- Debugging interfaces

M3: Write a detailed design draft covering the overall goal, the required on-disk changes, and the functions to be modified. Share the draft with the BTRFS community and get their views.

M4: Implementation and testing of the code: 75%

M5: Implementation and testing of the code: 100%

M6: Implementation and testing of the code: 125% (If time permits)

M7: Write documentation of the final product

Plan of action

- By the end of week 1: M1, M2
- By the end of week 2: M3
- (Midterm)
- By the end of week 5: M4
- By the end of week 7: M5
- By the end of week 9: M6
- (End)
- By the end of week 10: M7

Why choose me?

- Past successful GSoC student (2011).
- Past experience of working with the open source community.
- Strong understanding of file systems, C programming language, the UNIX philosophy, Linux.
- Passionate about contributing to Linux.

Time commitment

Apart from this project, I have a research commitment at CMU, so I expect to spend at least 30 hrs / week on this project. My final exams end on 13th May 2015, and I hope to start right after that. I will be visiting my hometown (Pune, India) around the end of May / first week of June. That is the only period when I may be less available. For the rest of the summer, I will be fully focused on the project.