Latest revision as of 01:42, 11 April 2015
Project Description
BTRFS is a new, actively developed file system with various advanced features. I wish to implement a content-based storage mode for the BTRFS file system. This project is also listed in the TODO list on the BTRFS ideas page.
In some applications, such as Internet content caches, the data is read-only more often than not. For such cases, lookup time is the most important metric, and storing data in the conventional file-path based manner is inefficient. In content-based storage mode, data is stored on disk solely on the basis of the hash of its content. Lookup is also hash-based, and therefore fast. Another advantage of hash-based storage is that identical data is stored only once, so duplication is avoided.
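To make the idea concrete, here is a minimal user-space sketch in C of a content-addressed store. It is not BTRFS code: the names `cas_store`, `cas_put`, and `cas_get`, the fixed table sizes, and the use of 64-bit FNV-1a as the hash are all assumptions chosen for illustration. A real implementation would presumably use a stronger hash and on-disk trees rather than an in-memory table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CAS_SLOTS    64  /* hypothetical table size; must be a power of two */
#define CAS_MAX_DATA 32  /* hypothetical per-object size limit */

struct cas_entry {
    int used;
    uint64_t hash;
    size_t len;
    unsigned char data[CAS_MAX_DATA];
};

struct cas_store {
    struct cas_entry slots[CAS_SLOTS];
    size_t count;  /* number of distinct objects stored */
};

/* 64-bit FNV-1a: a simple, non-cryptographic stand-in for the content hash. */
uint64_t cas_hash(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint64_t h = 0xcbf29ce484222325ULL;

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/*
 * Store a buffer and report its hash in *key. Identical content is stored
 * only once. Returns 0 on success and -1 if the table is full.
 */
int cas_put(struct cas_store *s, const void *buf, size_t len, uint64_t *key)
{
    assert(len <= CAS_MAX_DATA);
    uint64_t h = cas_hash(buf, len);

    /* Linear probing, starting at the slot selected by the hash. */
    for (size_t i = 0; i < CAS_SLOTS; i++) {
        struct cas_entry *e = &s->slots[(h + i) & (CAS_SLOTS - 1)];

        if (!e->used) {
            e->used = 1;
            e->hash = h;
            e->len = len;
            memcpy(e->data, buf, len);
            s->count++;
            *key = h;
            return 0;
        }
        if (e->hash == h && e->len == len && memcmp(e->data, buf, len) == 0) {
            *key = h;  /* duplicate content: nothing new is stored */
            return 0;
        }
    }
    return -1;
}

/* Look up an object by its hash; returns NULL if it is not present. */
const struct cas_entry *cas_get(const struct cas_store *s, uint64_t key)
{
    for (size_t i = 0; i < CAS_SLOTS; i++) {
        const struct cas_entry *e = &s->slots[(key + i) & (CAS_SLOTS - 1)];

        if (!e->used)
            return NULL;
        if (e->hash == key)
            return e;
    }
    return NULL;
}
```

With a zero-initialized `struct cas_store`, storing the same buffer twice yields the same key and leaves `count` at 1, which is the deduplication property described above. The lookup needs only the hash, not a path.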
My research at CMU aims at building content caches for routers (https://github.com/harshadjs/xia-content-cache), which requires a file system that supports such a storage mode. Working on this project over the summer would serve both the BTRFS community and the research at CMU.
Biography and Technical Background
I am a Computer Science graduate student at Carnegie Mellon University, with research interests primarily in computer networks. I use Linux daily and am passionate about open-source software development.
In my undergraduate years, I worked on an open-source Linux kernel project, "Snapshots for Ext4 filesystem". Patches were sent to the Ext4 community for review. I received a mention for my contribution to the project at http://lwn.net/Articles/442078/ .
We were interested in extending the Ext4 snapshots project, so I participated in Google Summer of Code 2011. My proposal for "Snapshot revert feature for Ext4" was accepted by The Fedora Project, and I successfully completed the project. I look forward to continuing my association with the Fedora Project by submitting the proposal "Content-storage mode for BTRFS" for 2015.
I worked at a Wi-Fi technology startup, "AirTight Networks", for 3 years (2011-2014), in the Linux device drivers team.
I then joined Carnegie Mellon University in May 2014, where my main area of study is computer networks.
You can expect a very high level of fluency with C and kernel programming from me. This is something that I love to do.
Goals
- 75% Goal
  - Create a new "Content" tree. This tree should store hashes of all the extents in the file system.
  - Create a "File Hash" tree. This tree will store the mapping from the hash of a file to its inode.
  - Provide an option to enable / disable content-storage mode at mount time or mkfs time (TBD).
  - Implement all the reference counting mechanisms for extents in the content tree.
- 100% Goal
  - Intercept writes and check whether the data being written is already in the content tree.
  - Intercept reads
    - Given the hash of a file, look up the file's inode in the "File Hash" tree.
  - Enhance the debugging methods available in btrfs (I am not sure which ones are available) to support debugging content trees.
- 125% Goal
  - Provide various mount-time configuration options, such as:
    - Remove or keep extents when their reference count drops to 0. (Especially useful for our routing application.)
    - Verify or trust the checksum of extents.
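The reference counting and the "remove or keep at zero references" option can be sketched in user-space C as follows. This is only an illustration under stated assumptions: `content_tree`, `ct_ref`, `ct_unref`, `ct_find`, and the `keep_unreferenced` field are hypothetical names, and a flat array stands in for what would be a B-tree keyed by extent hash in BTRFS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CT_MAX 16  /* hypothetical capacity of the toy "tree" */

struct ct_item {
    uint64_t hash;  /* hash of the extent's content */
    uint32_t refs;  /* number of files referencing this extent */
    bool present;
};

struct content_tree {
    struct ct_item items[CT_MAX];
    bool keep_unreferenced;  /* hypothetical mount option: keep extents at 0 refs */
};

/* Find the item for a hash, or NULL if the content is not stored. */
struct ct_item *ct_find(struct content_tree *t, uint64_t hash)
{
    assert(t != NULL);
    for (size_t i = 0; i < CT_MAX; i++)
        if (t->items[i].present && t->items[i].hash == hash)
            return &t->items[i];
    return NULL;
}

/*
 * Write path: take a reference on the content with this hash.
 * Returns 1 if the content already existed (no new data needs writing),
 * 0 if a new item was created, and -1 if the tree is full.
 */
int ct_ref(struct content_tree *t, uint64_t hash)
{
    struct ct_item *it = ct_find(t, hash);

    if (it) {
        it->refs++;
        return 1;
    }
    for (size_t i = 0; i < CT_MAX; i++) {
        if (!t->items[i].present) {
            t->items[i].hash = hash;
            t->items[i].refs = 1;
            t->items[i].present = true;
            return 0;
        }
    }
    return -1;
}

/*
 * Drop a reference. When the count reaches 0, the item is removed unless
 * keep_unreferenced is set. Returns 0 on success, or -1 if the hash is
 * unknown or already has no references.
 */
int ct_unref(struct content_tree *t, uint64_t hash)
{
    struct ct_item *it = ct_find(t, hash);

    if (!it || it->refs == 0)
        return -1;
    if (--it->refs == 0 && !t->keep_unreferenced)
        it->present = false;
    return 0;
}
```

In this sketch, keeping unreferenced extents means a later write of the same content is a hit again, which is the behaviour a content cache would want: cached data outlives the last file that pointed at it.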
Milestones of the Project
- M1: Understand the design and code of Btrfs, focusing especially on how the current extent trees, subvolume trees, and snapshot trees are set up initially. Study the on-disk data structures; most likely, we will need to add some bits to the super-block, for example "content-storage-mode-on/off".
- M2: Understand and identify the code areas where the hooks are to be applied. Need to find hooks for:
  - Intercepting writes
  - Reading extents
  - Debugging interfaces
- M3: Write a detailed design draft covering the overall goal, the required on-disk changes, and the functions to be modified. Share the draft with the BTRFS community and get their views.
- M4: Implementation and testing of the code: 75%
- M5: Implementation and testing of the code: 100%
- M6: Implementation and testing of the code: 125% (If time permits)
- M7: Write documentation of the final product
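The super-block bit mentioned in M1 could look roughly like the following C sketch. Everything here is hypothetical: `demo_super`, its `feature_flags` field, and the `DEMO_FEATURE_CONTENT_STORAGE` bit are invented for illustration and do not correspond to the actual BTRFS super-block layout, which M1 is meant to study.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bit: content-storage mode on/off. */
#define DEMO_FEATURE_CONTENT_STORAGE (1ULL << 0)

/* Hypothetical, heavily simplified super-block. */
struct demo_super {
    uint64_t feature_flags;
};

/* Turn content-storage mode on or off, e.g. at mkfs or mount time. */
void demo_set_content_storage(struct demo_super *sb, bool on)
{
    assert(sb != NULL);
    if (on)
        sb->feature_flags |= DEMO_FEATURE_CONTENT_STORAGE;
    else
        sb->feature_flags &= ~DEMO_FEATURE_CONTENT_STORAGE;
}

/* Report whether content-storage mode is enabled. */
bool demo_content_storage_enabled(const struct demo_super *sb)
{
    assert(sb != NULL);
    return (sb->feature_flags & DEMO_FEATURE_CONTENT_STORAGE) != 0;
}
```

A single persistent bit like this would let the read and write hooks from M2 check cheaply whether the content-storage code paths apply to a given file system.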
Plan of action
- By the end of week 1: M1, M2
- By the end of week 2: M3
- (Midterm) By the end of week 5: M4
- By the end of week 7: M5
- By the end of week 9: M6
- (End) By the end of week 10: M7
Why choose me?
- Past successful GSoC student (2011).
- Past experience of working with the open source community.
- Strong understanding of file systems, C programming language, the UNIX philosophy, Linux.
- Passionate about contributing to Linux.
Time commitment
Apart from this project, I have research commitments at CMU, so I expect to spend at least 30 hrs / week on this project. My final exams end on 13th May 2015 and I hope to start right after that. I will be visiting my hometown (Pune, India) around the end of May / first week of June. That is the only time when I may be less available. For the rest of the summer, I will be on top of the project.