From Fedora Project Wiki

Project Title : Shumgrepper

Contact Information

About Me

  • I am a junior at the Indian Institute of Information Technology, Allahabad, majoring in Information Technology.
  • I have worked on various projects in C/C++, Java and Python. I am also comfortable with JavaScript, PHP, CSS and HTML, and with Python web development frameworks such as Flask and Django.
  • I have been using open source software for many years and have always been interested in working for the community. I started with localization and community building for the Mozilla Foundation. Later I was selected for the Outreach Program for Women internship (Round 7) to work on the Fedora project Datagrepper.


Why do you want to work with the Fedora Project?

Fedora is my favorite Linux distro: I love its user-friendly interface and support forums, and it is the best fit for my web application development environment. I recently worked on a fedora-infra project for three months, and I must say "it was an awesome experience". I also enjoy the Fedora community a lot; I find everyone helpful and always encouraging. So far I have contributed to only one project, and I am looking forward to working more, contributing more and supporting more community projects.

Do you have any past involvement with the Fedora project or another open source project as a contributor?

Yes, I have contributed to the Datagrepper project and to Fedora-Packages.

Why should we choose you over other applicants?

I have 3 years of experience coding in Python, which is the basic required skill for the project. Shumgrepper will integrate most of the features of Datagrepper; I have already worked on Datagrepper and understand it well. The project also involves developing the web front-end with the Python Flask framework, which I am already familiar with. This makes me a strong candidate for this project.

Did you participate with the past GSoC programs, if so which years, which organizations?

No, I am applying to GSoC for the first time.

Will you continue contributing/ supporting the Fedora project after the GSoC 2013 program, if yes, which team(s), you are interested with?

I would love to work more with the Fedora-Infra team because its projects are really cool and match my interests completely. I worked on Fedora Badges for some time, and this project will be my next checkpoint for contribution after GSoC.

Will you have any other time commitments, such as school work, another job, planned vacation, etc., during the duration of the program?

Our college summer vacation starts in early May and ends in late July, so I can commit full time to this project. I can dedicate at least 40 hours per week to the work, and I have no other obligations from early May until mid-August.

Past Experience

I am the Firefox Student Ambassador of my college and am involved in raising awareness about open source software, FOSS organisations, Mozilla and its various products. I have always been involved in inspiring friends and juniors to contribute to open source projects. I have also given a talk at my institute to encourage people to apply for GSoC and OPW. Besides this, I have organized many other coding events.

Apart from this, I am always interested in getting more women involved in free and open source software.

OPW internship: Fedora-Datagrepper

  • Worked on the front-end of Datagrepper, i.e. made HTML cards and improved the CSS. [1]
  • Added an odometer to the front page that displays the total number of messages. [2]
  • Integrated datagrepper-messages into fedora-packages. [3]
  • Made a MediaWiki extension that lets users add datagrepper-messages to their Fedora wiki user page. [4]

Link to pull requests:

Blog: http://honeycoding.wordpress.com

Goal

Shumgrepper is a web app built on top of Summershum. Summershum is a project that collects the md5sum, sha1sum and sha256sum of every file in every package. This data can be used to check how many packages carry the full GPL license, to find how many files across all of Fedora have a particular hash sum, to check the database in a Taskotron test, and so on.

It will open up a lot of new possibilities for developers, system administrators and committers by allowing them to verify the integrity and authenticity of a tarball, a package or an individual source file. It will also help them identify changes made in packages and bundles by comparing checksums.
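As a rough illustration of the checksum comparison described above, the sketch below computes the three digests that Summershum collects for a local file and compares them with known values. The function names and the shape of the `known` dictionary are assumptions made for this example; they are not part of Summershum.

```python
import hashlib


def file_checksums(path, chunk_size=65536):
    """Return the md5, sha1 and sha256 hex digests of the file at `path`."""
    digests = {
        "md5sum": hashlib.md5(),
        "sha1sum": hashlib.sha1(),
        "sha256sum": hashlib.sha256(),
    }
    with open(path, "rb") as handle:
        # Read in chunks so that large tarballs do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}


def matches_known_sums(path, known):
    """True if every checksum listed in `known` matches the file at `path`.

    `known` is a hypothetical dict such as {"sha256sum": "..."}; in practice
    the values would come from the checksum database.
    """
    actual = file_checksums(path)
    return all(actual.get(name) == value for name, value in known.items())
```

A file whose digests differ from the stored ones has either been modified or is a different version, which is the basis for the change detection mentioned above.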

I will develop a JSON API along with a web front-end for the data generated by summershum. It will integrate most of the features of datagrepper, along with a user-friendly real-time visualization tool that shows ongoing changes in package hashes. The final web app will be exposed to various attacks such as DDoS, so I will take appropriate preventive security measures against them.

Project Details

The project will mainly be divided into 5 phases.

Phase 1: Query building for Database

In this phase of development I will add methods that query the summershum database to get the required data.

  • Adding several arguments for requesting data from the summershum database.

-- One of the filtering arguments can be ‘filename’, which takes a filename as its value.
-- Other arguments can be ‘pkg_name’ (takes a package name as its value), tar_file, tar_sum, date, md5sum, sha256sum and sha1sum.
e.g. a query for a file can be made as:

     
http get "https://apps.fedoraproject.org/shumgrepper?pkg_name=java-1.8.0-openjdk&filename=/jdk8/jdk/test/java/rmi/reliability/juicer/OrangeImpl.java"

The above query will return output in json form. Sample output will be:

{"sha256sum": "36828180b4cc683a13c2192fa0cf20aad0cdbc6ef194a40f6d20047b68a02c1c",
 "sha1sum": "00014cdb3441f22fc1bcece2f94f29c29ab77fb2",
 "md5sum": "5aaa3caaf74ff0dbe00d85e113253480",
 "tar_file": "aarch64-port-rc4.tar.xz",
 "tar_sum": "8f359f63b6b09ecfaa40259427f133ad",
 "created_on": "2014-03-08 21:36:14.421454"}

  • Add methods in summershum that return data according to the arguments in the query.
  • Format the returned data as human- and machine-readable output such as CSV and JSON.
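The steps of this phase can be sketched as follows. This is only a minimal illustration under stated assumptions: it uses the standard-library sqlite3 module and a hypothetical `files` table whose columns mirror the fields of the sample output, whereas the real summershum schema and query layer may look different. Date filtering is left out for brevity.

```python
import csv
import io
import json
import sqlite3

# Filter names are taken from the proposal; the table layout is assumed.
ALLOWED_FILTERS = ("filename", "pkg_name", "tar_file", "tar_sum",
                   "md5sum", "sha1sum", "sha256sum")
COLUMNS = ("pkg_name", "filename", "tar_file", "tar_sum",
           "md5sum", "sha1sum", "sha256sum", "created_on")


def build_query(args):
    """Turn request arguments into a parameterised SELECT statement."""
    clauses, params = [], []
    for name in ALLOWED_FILTERS:
        if name in args:
            # Only whitelisted column names reach the SQL text;
            # values are always passed as bound parameters.
            clauses.append(name + " = ?")
            params.append(args[name])
    sql = "SELECT " + ", ".join(COLUMNS) + " FROM files"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


def query_files(conn, args):
    """Run the filtered query and return the rows as a list of dicts."""
    sql, params = build_query(args)
    rows = conn.execute(sql, params).fetchall()
    return [dict(zip(COLUMNS, row)) for row in rows]


def to_json(records):
    """Serialise the records as a JSON array."""
    return json.dumps(records, sort_keys=True)


def to_csv(records):
    """Serialise the records as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Unknown arguments are ignored rather than interpolated into the statement, which keeps the query safe against SQL injection through argument names or values.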

Phase 2: Web API Wrapper of the app

This phase defines the web API wrapper, which involves:

  • Defining the directory structure of the app.
  • Enabling responses as 'application/json' or 'text/html' as per accept header.
  • Defining various end-points of the app.

The implementation will use the Flask framework: a Flask app is a WSGI application, which makes it easy to create RESTful web services and to handle GET (as shown in Phase 1) and POST requests.
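To make the Accept-header behaviour concrete, here is a small sketch that uses only the standard library instead of Flask, so it shows the mechanism rather than the planned implementation. `pick_mimetype` and `application` are hypothetical names, the payload is placeholder data, and the negotiation is simplified (it ranks by q-value and header order only). In a Flask view the same decision would usually be delegated to Werkzeug's `request.accept_mimetypes.best_match(...)`.

```python
import json


def pick_mimetype(accept_header, offered=("application/json", "text/html")):
    """Pick the offered type the client prefers, by q-value then header order."""
    best, best_q = None, 0.0
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type, quality = fields[0].lower(), 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    quality = float(field[2:])
                except ValueError:
                    quality = 0.0
        for candidate in offered:
            wildcard = candidate.split("/")[0] + "/*"
            matches = media_type in (candidate, wildcard, "*/*")
            if matches and quality > best_q:
                best, best_q = candidate, quality
                break
    # Fall back to the first offered type when nothing acceptable was sent.
    return best or offered[0]


def application(environ, start_response):
    """Minimal WSGI app: one end-point, JSON or HTML depending on Accept."""
    payload = {"message": "shumgrepper sketch"}  # placeholder data
    mimetype = pick_mimetype(environ.get("HTTP_ACCEPT", ""))
    if mimetype == "text/html":
        body = "<p>" + payload["message"] + "</p>"
    else:
        body = json.dumps(payload)
    data = body.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", mimetype + "; charset=utf-8"),
        ("Content-Length", str(len(data))),
    ])
    return [data]
```

A browser, which normally lists text/html first, would get the HTML rendering, while a command-line client that sends "Accept: application/json" would get the JSON form of the same data.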

Sample directory structure of the app:

 shumgrepper\
      shumgrepper\
        static\                               # contains web app images, css and javascript files. 
                                              # Files within it are accessible to users via HTTP.
            css\
                views.css
        templates\                            # store app's web templates 
            index.html
        docs
        __init__.py
        views.py                               # contains definition for various end-points.
        util.py
      summershum.models\                       # files containing methods that will query summershum
      fedmsg.d\                                # display fedmsg messages in human readable format
      run.py                                   # used to run the server. 
      setup.py

Flask maps well onto the MVC architecture: it has views (templates), a controller (views.py) and a model (summershum).

Phase 3: Web-Frontend

In this phase my aim is to build a user-friendly web interface. I will use the Jinja2 template library to build the web front-end for the end-points defined in the earlier phase. I will also design and build the front page of the app.

As discussed earlier, the website is susceptible to various attacks, one of the most prominent being DDoS. To mitigate it, it is important to distinguish between legitimate website visitors and automated or malicious clients.
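The proposal does not name a specific technique, so the following is only one possible building block: a per-client sliding-window rate limiter that flags clients sending more requests than a human visitor plausibly would. The class name, the limits and the use of the client address as identifier are all assumptions for illustration. A limiter like this does not stop a distributed attack on its own and would normally be combined with protection at the proxy or network level.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within `window` seconds."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested
        self.hits = defaultdict(deque)

    def allow(self, client_id):
        """Record a request and report whether it is within the limit."""
        now = self.clock()
        recent = self.hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True
```

A request handler could call `limiter.allow(client_address)` before doing any database work and answer with HTTP status 429 when it returns False.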

This phase can be implemented in the following steps:

  • Designing the front-end of the page and deciding what information to display.
  • Converting JSON data into human-readable strings using fedmsg.meta functions.
  • Improving the styling (CSS) of the page as per the design.
  • Developing the front page of the app. I have hosted a sample front page on Divshot.
  • The front page may contain an odometer for the count of the different GPL licenses.
  • A graph that shows the changes in hashes for the files of a package (files of a package vs. number of times each has changed).

Figure 1: This is a sample graph showing the frequency of changes made in hashes of the file.
y-axis: file-name
x-axis: time
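The data behind such a graph could be derived roughly as shown below. This is a sketch under assumptions: each record is a dict with `filename`, `sha256sum` and `created_on` keys, as in the sample output of Phase 1, and a "change" means that a file's checksum differs from the checksum of its previous recorded version.

```python
from collections import defaultdict


def count_hash_changes(records):
    """Count, per file, how often its sha256sum differs from the previous version."""
    history = defaultdict(list)
    # Order by timestamp so that consecutive entries are consecutive versions.
    for record in sorted(records, key=lambda r: r["created_on"]):
        history[record["filename"]].append(record["sha256sum"])
    changes = {}
    for filename, sums in history.items():
        changes[filename] = sum(
            1 for old, new in zip(sums, sums[1:]) if old != new
        )
    return changes
```

The resulting mapping of file name to change count can be handed to any plotting library on the front-end.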

Phase 4: Deployment

In this phase I will deploy the Shumgrepper package in the Fedora production environment. This will enable users to install it via yum.

It involves the following steps:

  • Packaging Shumgrepper as an RPM.
  • Deploying the RPM in the Fedora staging infrastructure to check that it works there.
  • Pushing everything out to the Fedora production environment.
  • Documenting the work done so far.

Phase 5: Integration

This phase involves integrating the app into other applications and testing it.

  • The graph shown in Figure 1 can be integrated into Fedora Packages, in the description of each package.
  • It can also be integrated into Koji, on the information page of each package.
  • If time permits, I will try to integrate and test the app with other applications as well.

Deliverables

  • JSON/API of summershum
  • Web-front end of summershum
  • Deployment of the app
  • Manual and Documentation
  • Testing and integrating the app on other applications

Timeline

Period                           Task
April 22 - May 5                 Community bonding, reading documentation and getting familiar with the code.
May 6 - May 18                   Design phase: work on the design of the entire application.
May 19                           Official GSoC coding period begins.
May 19 - June 9 (3 weeks)        Phase I (Query building for Database).
June 10 - June 14 (2 weeks)      Phase II (Web API Wrapper of the app).
June 15 - June 23 (9 days)       Begin Phase III (Web-Frontend).
June 24 - June 27                Mid-term evaluation period.
June 28 - July 3 (1 week)        Continue Phase III (Web-Frontend): improving the front page and embedding security features.
July 4 - July 10 (1 week)        Phase IV: deployment of the app.
July 11 - August 4 (3 weeks)     Phase V: integration of the app into other applications, and documentation.
August 5 - August 12 (1 week)    Final phase: cleaning up code, documenting everything, reviewing all functionality and fixing bugs.
August 12 - August 18            Pencils-down period. Submitting the project for final evaluation.