GSOC 2015/Student Application Ethcelon

Contact Information

Name: Om Bhallamudi
Email Address: om DOT bhallamudi AT gmail DOT com
IRC Nick: Ethcelon in freenode/oftc/mozilla
GitHub: https://github.com/Ethcelon
Blog URL: omb.svbtle.com
FAS Account: Ethcelon
Fedora userpage: https://fedoraproject.org/wiki/User:Ethcelon
Location: Visakhapatnam, India, UTC +5:30

About Me

I am a third-year undergraduate student of computer science and engineering at GITAM University, Vizag, India.
I program in Python and C/C++, and a little in Java. I like building web apps with Flask and writing tests. I am also comfortable with JavaScript, CSS and HTML.
I learnt to love testing while contributing to firefox-ui-tests, and I discovered Flask while writing tests and database code for centinel-server.

Questions & Answers

Why do you want to work with the Fedora Project?

I find Fedora to be a good community with really helpful people. I want to be a part of it and contribute to the Fedora Project.

Do you have any past involvement with the Fedora project or any other open source project as a contributor?

I contribute to the Mozilla automation team:

Porting Firefox UI tests to Python:

https://github.com/mozilla/firefox-ui-tests/

Added a feature & wrote tests:

https://github.com/mozilla/mozdownload

Added a feature (using mozlog):

https://github.com/mozilla/mozmill-automation

I wrote code and tests for centinel-server:

Centinel is a tool used to detect network interference and internet censorship. Centinel-server is a Flask app used to communicate with Centinel nodes.

I have written tests for it and used SQLAlchemy for the database.

https://github.com/iclab/centinel-server

Libpynexmo: Python nexmo API

https://github.com/marcuz/libpynexmo

I also opensource my own code:

API server for android app written using Flask. (poll for updates)

https://github.com/Ethcelon/carnival

Extracting university results data for data analytics using Beautiful Soup:

https://github.com/Ethcelon/gitam-results-scraping

Did you participate with the past GSoC programs, if so which years, which organizations?

No, I did not.

Will you continue contributing/ supporting the Fedora project after the GSoC 2015 program, if yes, which team(s), you are interested with?

Yes! I am interested in working with the fedora-infra team.

Why should we choose you over other applicants?

I have experience writing Python and developing with Flask. I have used Flask to write apps for myself and have contributed to existing Flask projects. I have written tests for Flask apps and worked with databases using SQLAlchemy. Apart from this, I am also well versed in front-end work with HTML, CSS and JavaScript.

Deploying shumgrepper is a major task of this project, and I am confident I can do it as I have deployed Flask apps with both Apache and nginx before. I am a quick learner, and I am sure I will learn how to write Ansible configurations before the coding period starts and finish the deployment successfully.

During my time at the University I have studied programming, data structures, networks, databases, software engineering and computer architecture, to name a few things. Wanting to become a developer, I taught myself Python and learnt Flask to make web applications. I learnt Python by writing scripts to solve my own problems. I am currently attempting to teach myself a Lisp.

I am interested in working with the fedora-infra team after GSoC.

Will you have any other time commitments, such as school work, another job, planned vacation, etc., during the duration of the program?

2 days of planned travel in the second week of June and 3 days of examinations in the third or fourth week of July. My summer vacation starts in May and ends during the second week of June. Apart from the days mentioned above, I am ready to put in 35-40 hours a week towards completing my GSoC project.

Proposal

My project is to finish and deploy shumgrepper.

I have written some mockup code: https://github.com/Ethcelon/gsocprototype

Summary of idea: Finish and deploy the shumgrepper project

OUTLINE

shumgrepper was started last year and offers an API to query the data stored by summershum. This data consists of the md5, sha1, sha256 and sha512 checksums of every file in every package in Fedora, which makes it easy to find files duplicated across multiple packages.
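
To make this data concrete, here is a minimal sketch of how the four checksums of a file can be computed and how duplicates can be found by grouping on one of them. The function names and the in-memory sample data are assumptions made for illustration; this is not summershum's actual code.

```python
import hashlib
from collections import defaultdict

ALGORITHMS = ("md5", "sha1", "sha256", "sha512")


def checksums(content):
    """Return a dict mapping each algorithm name to the hex digest of content (bytes)."""
    return {name: hashlib.new(name, content).hexdigest() for name in ALGORITHMS}


def find_duplicates(files):
    """Group (package, filename) pairs by sha256; keep groups spanning several packages.

    files is an iterable of (package, filename, content) tuples.
    """
    by_sum = defaultdict(list)
    for package, filename, content in files:
        by_sum[checksums(content)["sha256"]].append((package, filename))
    return {
        digest: entries
        for digest, entries in by_sum.items()
        if len({package for package, _ in entries}) > 1
    }


# Hypothetical sample data, not real Fedora packages.
sample = [
    ("pkg-a", "COPYING", b"licence text"),
    ("pkg-b", "LICENSE", b"licence text"),
    ("pkg-b", "setup.py", b"print('hi')"),
]
duplicates = find_duplicates(sample)
```

Here `duplicates` holds a single group containing the two licence files, because they have identical contents but belong to different packages.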

My idea is to finish and deploy the shumgrepper project. Along with this, I wish to write a tool that uses the shumgrepper API and will be used as part of the taskotron toolkit to find red flags in packages.

I have split the work into the following steps:

1. Understand the codebase and learn the basics of ansible

2. Improving the existing codebase:

Split the views in api.py and __init__.py using Flask blueprints

We currently have the views in two files, api.py and __init__.py. I would like to split these into two blueprints and register the API under the /api URL prefix. The documentation on flask.pocoo.org notes that blueprints "can greatly simplify how large applications work and provide a central means for Flask extensions to register operations on applications".

After doing this, we will have a scalable structure for the Flask application:

.
├── shumgrepper
│   ├── api
│   │   ├── __init__.py
│   │   └── views.py
│   ├── __init__.py
│   └── web
│       ├── __init__.py
│       ├── static
│       ├── templates
│       │   └── hello.html
│       └── views.py
├── README.md
├── run.py
└── tests.py
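
As a rough sketch of the split described above, the two blueprints could be defined and registered as follows. The blueprint names, the create_app helper and the placeholder data are assumptions for illustration; in the proposed layout the two sets of views would live in shumgrepper/api/views.py and shumgrepper/web/views.py rather than in one file.

```python
from flask import Blueprint, Flask, jsonify

# Both blueprints share one file here only to keep the sketch short.
api = Blueprint("api", __name__)
web = Blueprint("web", __name__)


@api.route("/packages")
def packages():
    # Placeholder data; the real view would query the summershum database.
    return jsonify(packages=["package1", "package2"])


@web.route("/hello")
def hello():
    return "Hello from shumgrepper"


def create_app():
    """Build the application and mount each blueprint."""
    app = Flask(__name__)
    # Every route of the api blueprint is served under /api.
    app.register_blueprint(api, url_prefix="/api")
    app.register_blueprint(web)
    return app


app = create_app()
```

With this registration, the API view answers at /api/packages while the web view stays at /hello, so the API can grow without touching the web views.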

Testing:

This involves writing tests for the endpoints of all the views and for the methods in utils. We can write tests for this Flask app using unittest. We can also choose to use the Flask-Testing extension.

import unittest

# Assumes the Flask application object is importable from the shumgrepper package.
from shumgrepper import app


class MyTest(unittest.TestCase):

    def setUp(self):
        # The test client issues requests without starting a real server.
        self.app = app.test_client()

    def test_api_mock(self):
        url = '/api/packages'
        response = self.app.get(url)
        self.assertEqual(response.status, '200 OK')
        # response.data is a byte string, so compare against a bytes literal.
        self.assertEqual(response.data, b'["package1", "package2"]')

    def test_web_mock(self):
        url = '/hello'
        response = self.app.get(url)
        self.assertEqual(response.status, '200 OK')


if __name__ == '__main__':
    unittest.main()

Mockup: https://github.com/Ethcelon/gsocprototype

Documentation:

Write documentation for the views and utils.

3. I would like to add the following features to shumgrepper:

Adding comparison visualisation at /compare

Shumgrepper uses the summershum database, which has details of all the packages. The idea is to use shumgrepper not only to query by checksum but also to visualise the structure of the packages. We already have API methods to see the differences between packages; it would be nice to have a good visualisation too.

1. Have another way of viewing the filenames like a tree at /tarball/<name>/filenames

The view can look like the output of the Linux tree command. This is more intuitive and shows the folder structure in a way that is easy to understand.

Put a [+] in front of the folder names to open and close folders. This will make it easier to see the structure and files when a package has many files.
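
A minimal sketch of how a flat list of file names could be turned into such a tree view is shown below. The function names and the sample file list are hypothetical, and the sketch only produces text; the [+] toggles would be handled in the front end.

```python
def build_tree(paths):
    """Turn flat paths like 'a/b/c.py' into nested dicts; files map to empty dicts."""
    root = {}
    for path in paths:
        node = root
        for part in path.strip("/").split("/"):
            node = node.setdefault(part, {})
    return root


def render_tree(tree, prefix=""):
    """Return lines in the style of the tree command, sorted by name."""
    lines = []
    names = sorted(tree)
    for index, name in enumerate(names):
        last = index == len(names) - 1
        lines.append(prefix + ("└── " if last else "├── ") + name)
        child_prefix = prefix + ("    " if last else "│   ")
        lines.extend(render_tree(tree[name], child_prefix))
    return lines


# Hypothetical file list for one tarball.
filenames = ["pkg/README.md", "pkg/src/main.py", "pkg/src/utils.py"]
print("\n".join(render_tree(build_tree(filenames))))
```

The nested dictionary is also a convenient shape to serialise as JSON, so the same structure could feed a collapsible view in the browser.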

2. Using shumgrepper we can compare two different versions to see the differences. I want to improve this by providing visual cues through colour, such as green for unchanged files and orange for changed files. This can be done by having a page at /compare.

Essentially, a comparison is done between 2 tarballs, which can have files in common and files that differ.
In the context of a single package, we compare 2 different versions.
- When a new version is released, some of the files change.
- We can show this 'git-diff-way' by listing all the filenames from both versions and using colours to mark the modified files.
Otherwise, we are comparing two different tarballs.
So the comparison can be done by visualising 3 things:
- Same sums. These are identical files.
- Same name but different sums. These are files which have been changed.
- Different sums & names.
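
The three categories above could be computed along these lines. This is a sketch under simplifying assumptions: each tarball is represented as a hypothetical {filename: checksum} dictionary, files are matched by name first, and files whose name and sum both differ are reported as present in only one tarball.

```python
def compare_tarballs(old, new):
    """Classify the files of two tarballs given as {filename: checksum} dicts."""
    identical, changed = [], []
    for name in sorted(set(old) & set(new)):
        # Same name in both tarballs: the checksum decides the category.
        (identical if old[name] == new[name] else changed).append(name)
    return {
        "identical": identical,                       # same name, same sum
        "changed": changed,                           # same name, different sum
        "only_in_old": sorted(set(old) - set(new)),   # no match in the other tarball
        "only_in_new": sorted(set(new) - set(old)),
    }


# Hypothetical checksums, shortened for readability.
old = {"README": "aaa", "setup.py": "bbb", "old.txt": "ccc"}
new = {"README": "aaa", "setup.py": "ddd", "new.txt": "eee"}
result = compare_tarballs(old, new)
```

The page at /compare could then colour the "identical" entries green and the "changed" entries orange.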

3. Usage from /compare:

The selection can be done by having two lists to select from.

|packages||versions|  vs |packages||versions|
|        ||        |     |        ||        |

Clicking a package name displays its versions in the version list (options).
The user selects the two tarballs to compare and clicks compare.

4. Usage from the /package/<package-name> page:

This page has a list of all versions of a package.
The addition will be an option to select two versions and click "compare".
This will redirect them to /compare, where they will see the comparison.

Give an option to order the list of files at /tarball/<name>/filenames by name or date modified

A small tweak to add an option to order the list of file names at /tarball/<name>/filenames

4. Deployment

Shumgrepper is to be deployed using Apache and mod_wsgi.

This is again divided into these steps:

Package the dependencies for YUM, if they aren't already packaged.

Note: We don't use pip to install dependencies.

Using Ansible:
Ansible is used to automate IT tasks.
Ansible will be used to setup and manage the following:
- Server and its configuration files (Apache)
- Flask, fedmsg, summershum, and the other dependencies
  - Install dependencies using YUM.
  - For this we can use the package management module of ansible.
- Run shumgrepper

In this step I shall write the necessary .yml config files required by Ansible, while taking cues from Fedora's Ansible config repository.

Documentation:

Finish the documentation on how to set up the stack

Tool for taskotron

Along with changes to shumgrepper, I want to write a tool that will be used as part of the taskotron toolkit.

Taskotron uses automated tasks which are triggered by fedmsg. I want to write a tool that uses the shumgrepper API to find red flags in packages. This will put the summershum database to good use in the taskotron toolkit.

Idea:

When a new package is sent to taskotron for an update, it will be useful to check for the following:

1. Bundled dependencies that have not been updated.

2. Bundled source files that are known to cause problems.

3. Missing, improper or old licences.

To do this, I want to write a tool that:

1. Will be called by taskotron slaves with the package as an argument.

2. Parses the package to find the things mentioned above and identifies them by querying shumgrepper's API.

3. Gives back results that are used by taskotron.

An example workflow of the tool:

Task: to verify the LICENCE files in a package:

1. Tool is called by a taskotron slave

2. Tool takes md5 of all the LICENCE files in the package

3. Now the following things can happen:

Find a LICENCE file whose md5 does not match any known LICENCE. This is a red flag. It might be a modified or improper licence.
Find a LICENCE file whose md5 matches a known LICENCE that is not supposed to be used. This is a red flag.
Find a LICENCE file whose md5 matches the set of "good" licences. This file passes the check.
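
The three outcomes can be sketched as a small classification function. Everything named here is an assumption for illustration: the two reference sets stand in for data that the real tool would obtain by querying shumgrepper's API, and the licence texts are placeholders.

```python
import hashlib


def md5_of(content):
    """Return the md5 hex digest of a file's contents (bytes)."""
    return hashlib.md5(content).hexdigest()


# Hypothetical reference data; the real tool would look these sums up
# through the shumgrepper API instead of keeping them in memory.
GOOD_LICENCES = {md5_of(b"approved licence text")}
BAD_LICENCES = {md5_of(b"forbidden licence text")}


def check_licences(licence_files, good=GOOD_LICENCES, bad=BAD_LICENCES):
    """Map each licence filename to 'ok', 'disallowed' or 'unknown'."""
    results = {}
    for name, content in licence_files.items():
        digest = md5_of(content)
        if digest in good:
            results[name] = "ok"
        elif digest in bad:
            results[name] = "disallowed"  # known licence that must not be used
        else:
            results[name] = "unknown"     # possibly modified or improper
    return results


report = check_licences({
    "LICENCE": b"approved licence text",
    "vendor/LICENCE": b"approved licence text, edited",
    "extra/LICENCE": b"forbidden licence text",
})
```

Any result other than "ok" would be reported back to taskotron as a red flag.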

Note: To be honest, I only discovered taskotron on 26th March. I still have to figure out how we can identify the bundled sources and build a database of files that can cause problems. I will work on this during the weeks ahead and get a good idea of what needs to be done.

Timeline (2015)

March, April & May:

Learn Ansible and understand the current codebase.

25th May to 2nd June:

Refactor directory structure to support blueprints

Split shumgrepper into respective blueprints

Write documentation (continuous)

2nd June to 30th June:

Write tests

I can also use the Flask-Testing extension to make this easier.

Add the new features

Finish documentation for the new features.

26th June:

Mid-term evaluation

30th June to 10th July:

Write tests

Write Ansible .yml files and set up the server

Package python modules for installation with yum

Start deployment

Community feedback & make requested changes

26th July to August 17:

Write the tool that uses the shumgrepper API as a part of the taskotron toolchain.

Community feedback & make requested changes

Finish documentation

21st August:

Firm 'pencils down' date.

Next steps

Improve on the tool that uses shumgrepper.

Find other potential use cases for shumgrepper.

Integrate features into other fedora-infra and fedora-qa tools that use the shumgrepper API
