;Why do you want to work with the Fedora Project?
I have worked on various Fedora projects and had a great experience working on them. I participated in GSoC last year, working on this same project, Shumgrepper, and this year I would like to bring it to completion. Besides, Fedora is my favorite Linux distribution and it gives me immense pleasure to contribute to its projects. This year I also met a few Fedora contributors at a conference. I had a great time with them and learned many new things. They are energetic people who love what they are doing, and this inspires me a lot.
;Do you have any past involvement with the Fedora project or another open source project as a contributor?
Yes, I have mainly contributed to:
* [https://github.com/fedora-infra/datagrepper/ Datagrepper]: I mainly built the web front end of the app.
* [https://github.com/fedora-infra/fedora-packages Fedora-Packages]: I integrated the datagrepper package feed into the package description page.
* [https://github.com/fedora-infra/shumgrepper/ Shumgrepper]: I wrote this app from scratch. I set up the Flask app, built its API and web front end, and deployed its dev instance.
* [https://github.com/fedora-infra/summershum/ Summershum]: I made database modifications and added the various SQLAlchemy queries required by shumgrepper.
;Did you participate with the past GSoC programs, if so which years, which organizations?
Yes, I participated last year (2014) with the Fedora organisation.
;Will you continue contributing/ supporting the Fedora project after the GSoC 2015 program, if yes, which team(s), you are interested with?
Yes, I will keep contributing to the Fedora Project in my spare time after the GSoC 2015 program. I would prefer to contribute to more projects under the Fedora-infra team, as its projects closely match my area of interest. I am particularly interested in contributing to pkgdb2 and fedocal. I have been studying their codebases for reference and would love to fix bugs in them.
;Why should we choose you over other applicants?
I have already been actively contributing to Fedora projects and worked on this project last year, so I have a good understanding of its codebase and requirements. I am confident that this time I will be able to complete the project. I have created the following pull requests on summershum:
* [https://github.com/fedora-infra/summershum/pull/44 Add unit-tests for sqlalchemy database]
* [https://github.com/fedora-infra/summershum/pull/42 Add method for new table creation]
* [https://github.com/fedora-infra/summershum/pull/40 Write alembic script to update current models to database schema in branch pkg_table]
==Proposal Description==
===Overview and The Need===
[https://github.com/fedora-infra/shumgrepper Shumgrepper] is a webapp built on top of [https://github.com/fedora-infra/summershum Summershum]. Summershum collects the md5sum, sha1sum and sha256sum of every file in every package. Shumgrepper uses this information to check integrity and duplication across packages. By comparing sha256 values, it can find the files that are common to, or different between, various packages. It also lets you query files by their checksum (shum) values, list the files bundled within a package, and compare different packages and tar_files.
:* Dev instance: http://209.132.184.120/
;Any relevant experience you have
I worked on Shumgrepper last year as a GSoC project and built its UI and API. Before that, I had contributed to Datagrepper. I have been writing Python code for more than 3 years. I have also built many applications with Flask and webapp2 using Jinja2 templates, and I have good experience working as a backend developer.
===How do you intend to implement your proposal===
1. '''Database Migrations''': We have made some changes to the summershum [https://github.com/charulagrl/summershum/blob/pkg_table/summershum/model.py schema] so that the [http://209.132.184.120/packages Packages list page] renders faster. As a first step, I will write an alembic migration script to update the current database to the new schema. After that, it is important to check and compare the query times and query results.
* I have started working on this and wrote an alembic script to [https://github.com/fedora-infra/summershum/pull/40 add and modify] a few tables. Next, I need to figure out how to run the queries so that the data import can take place.
* pingou suggested that we can create the new tables using the createdb file. I have created a [https://github.com/fedora-infra/summershum/pull/42 pull request] that adds a method to create new database tables.
* I think it would be wise to keep a compressed copy of the existing database before running the migrations, so that we can roll back if the new model doesn't work out.
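A minimal sketch of the keep-a-compressed-copy idea, assuming the database is a single SQLite file. For a PostgreSQL deployment, a pg_dump dump would play the same role. The function names, the files table and the file name summershum.sqlite are hypothetical. demo only exercises a backup, a simulated failed migration, and the rollback.

```python
import gzip
import os
import shutil
import sqlite3
import tempfile


def backup_database(db_path, backup_dir):
    """Write a gzip-compressed copy of a SQLite database file and return its path."""
    backup_path = os.path.join(backup_dir, os.path.basename(db_path) + ".gz")
    with open(db_path, "rb") as src, gzip.open(backup_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return backup_path


def restore_database(backup_path, db_path):
    """Roll back: overwrite db_path with the decompressed contents of backup_path."""
    with gzip.open(backup_path, "rb") as src, open(db_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def demo(workdir):
    """Back up a toy database, simulate a migration that goes wrong, then roll back."""
    db_path = os.path.join(workdir, "summershum.sqlite")  # hypothetical file name
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE files (filename TEXT, sha256sum TEXT)")
    conn.execute("INSERT INTO files VALUES ('LICENSE', 'abc123')")
    conn.commit()
    conn.close()  # close before copying so the file on disk is complete

    backup_path = backup_database(db_path, workdir)

    # Simulated bad migration that destroys the table.
    conn = sqlite3.connect(db_path)
    conn.execute("DROP TABLE files")
    conn.commit()
    conn.close()

    restore_database(backup_path, db_path)
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT filename, sha256sum FROM files").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo(tempfile.mkdtemp()))
```

Closing every connection before copying or restoring avoids copying a database that is being written to mid-transaction.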
2. '''Running unit-tests''': To make sure the code does not break existing functionality as it grows, we need tests that check each feature of the application. Each test exercises one piece of functionality and checks that it returns the expected results. Running unit tests before launching into production helps minimise failures.
* I will write the tests using the [https://docs.python.org/dev/library/unittest.html#module-unittest unittest] package.
* Recently, I wrote unit tests for summershum to check that the SQLAlchemy database works, and I have pushed them [https://github.com/fedora-infra/summershum/pull/44 here]. The test first creates the database and adds one file object to it, then deletes the database afterwards. The test_query_file function checks that the object can be queried back from the database with the expected results.
* For shumgrepper, I will write unit tests for every endpoint of both the API and the UI. For example, the following test checks whether /package/<package_name> returns the right results:
<pre>
import json
import unittest
from shumgrepper import app  # assumes the Flask app object is exposed as shumgrepper.app

class FlaskTestCase(unittest.TestCase):
    def test_package(self):
        tester = app.test_client()
        response = tester.get('/package/fotoxx', content_type='application/json')
        self.assertEqual(response.status_code, 200)
        # Sort both sides so the test does not depend on the order rows come back in.
        expected = ["fotoxx-14.03.1.tar.gz", "fotoxx-14.05.tar.gz", "fotoxx-14.04.2.tar.gz", "fotoxx-14.05.1.tar.gz", "fotoxx-14.04.tar.gz"]
        self.assertEqual(sorted(json.loads(response.data)), sorted(expected))

if __name__ == '__main__':
    unittest.main()
</pre>
* '''Custom HTTP Error Handlers''': These are optional, because Flask already provides its own error pages. If required, we may write custom error pages for some errors.
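The summershum database test described above (create a database, add one file object, query it back, then delete the database) can be sketched with the standard library alone. This sketch swaps SQLAlchemy for the built-in sqlite3 module so it runs without summershum installed. The table layout and the sample row are hypothetical and do not reflect summershum's actual model.

```python
import sqlite3
import unittest


class FileDatabaseTest(unittest.TestCase):
    """Stand-in for the summershum SQLAlchemy test, using the stdlib sqlite3 module."""

    def setUp(self):
        # A fresh in-memory database for every test, so tests never share state.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE files (pkg_name TEXT, filename TEXT, sha256sum TEXT)"
        )
        self.conn.execute(
            "INSERT INTO files VALUES (?, ?, ?)",
            ("fotoxx", "fotoxx-14.05/README", "0f1e2d"),  # hypothetical row
        )
        self.conn.commit()

    def tearDown(self):
        # Closing an in-memory connection discards the whole database.
        self.conn.close()

    def test_query_file(self):
        row = self.conn.execute(
            "SELECT pkg_name, filename FROM files WHERE sha256sum = ?", ("0f1e2d",)
        ).fetchone()
        self.assertEqual(row, ("fotoxx", "fotoxx-14.05/README"))

    def test_unknown_checksum_returns_nothing(self):
        row = self.conn.execute(
            "SELECT * FROM files WHERE sha256sum = ?", ("does-not-exist",)
        ).fetchone()
        self.assertIsNone(row)


if __name__ == "__main__":
    unittest.main()
```

Creating the database in setUp and closing it in tearDown mirrors the create/delete cycle of the summershum test, while keeping each test method independent.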
3. '''Deployment''': The shumgrepper codebase will be turned into a (non-development) deployment through three stages:
* Build Stage: The shumgrepper code repository is transformed into an executable bundle known as a build. In our case, the build is an RPM package. Using the version of the code at a commit specified by the deployment process, the build stage fetches and vendors dependencies and compiles binaries and assets.
* Release Stage: This stage takes the build produced by the build stage (the app's RPM package) and combines it with the deployment's current config. For shumgrepper, we will test the build against the development and production configs and provision it with Ansible accordingly. The resulting release contains both the build and the config, and it is ready for immediate execution in the execution environment.
* Run Stage: The run stage (also known as "runtime") runs the app in the execution environment by launching some set of the app's processes against a selected release. The build, along with the config, will be deployed on the Fedora apps server.
;Individual Pieces
Having outlined the deployment stages, we now discuss the steps and tools that simplify deployment, in the context of shumgrepper.
As the first step in deployment, we have to create an RPM package for shumgrepper. This involves the following steps:
* Install the development tools and set up an account.
* Create a .spec file containing all the necessary information about the software being packaged.
* Add the license to the package and complete the other small steps needed to make it ready for packaging.
* Use rpmlint to identify and fix common errors in SPEC files, RPMs and SRPMs.
* Build the binary RPMs and SRPMs from the SPEC file and check the binary RPMs with rpmlint.
* Use Mock to check that the build dependencies are listed accurately, and Koji to test the SRPM on other platforms.
* After all this, I can create a review request on Bugzilla.
* Once the request is approved, I can make an SCM request, upload the package to SCM, and push the package to the public repository.
Here is the spec file I have created for shumgrepper:
<pre>
Name: shumgrepper
Version: 0.0.1
Release: 1%{?dist}
Summary: A webapp of summershum
License: GPLv2+
URL: https://github.com/fedora-infra/shumgrepper
Source0: https://pypi.python.org/packages/source/s/%{name}/%{name}-%{version}.tar.gz
BuildArch: noarch

BuildRequires: python2-devel
BuildRequires: python-setuptools
BuildRequires: fedmsg
BuildRequires: python-flask
BuildRequires: python-docutils
BuildRequires: python-fedmsg-meta-fedora-infrastructure
BuildRequires: m2crypto
BuildRequires: python-m2ext
BuildRequires: python-flask-wtf
BuildRequires: python-summershum

Requires: fedmsg >= 0.7.0
Requires: python-flask
Requires: python-docutils
Requires: python-fedmsg-meta-fedora-infrastructure
Requires: m2crypto
Requires: python-m2ext
Requires: python-flask-wtf
Requires: python-summershum

%description
Shumgrepper is a webapp that queries from summershum's database which collects
the md5sum, sha1sum, sha256sum of every file present in every package in
Fedora. Shumgrepper will allow you to query by shum values like sha1sum,
sha256sum, md5sum and tar_sum, find the files bundled within a package and
compare different packages and tar_files.

%prep
%setup -q

%build
%{__python} setup.py build

%install
%{__python} setup.py install -O1 --skip-build \
    --install-data=%{_datadir} --root %{buildroot}
mkdir -p %{buildroot}%{_datadir}/%{name}/apache/
install -m 644 apache/%{name}.wsgi %{buildroot}%{_datadir}/%{name}/apache/%{name}.wsgi
mkdir -p %{buildroot}%{_sysconfdir}/%{name}
install -m 644 apache/%{name}.cfg %{buildroot}%{_sysconfdir}/%{name}/%{name}.cfg
mkdir -p %{buildroot}%{_sysconfdir}/httpd/conf.d
install -m 644 apache/%{name}.conf %{buildroot}%{_sysconfdir}/httpd/conf.d/%{name}.conf

%files
%doc README.md LICENSE
%config(noreplace) %{_sysconfdir}/httpd/conf.d/shumgrepper.conf
%config(noreplace) %{_sysconfdir}/%{name}/%{name}.cfg
%{_datadir}/%{name}/
%{python_sitelib}/%{name}/
%{python_sitelib}/%{name}-%{version}-py%{python_version}.egg-info/

%changelog
</pre>
'''Ansible''': Ansible is an automation system that makes deployment much easier and faster, and it doesn't require any agent to be installed on the managed hosts. The shumgrepper app will have an <code>/ansible</code> folder at its root level containing the following components:
* hosts: This file contains information about the hosts and maps each host to its deployment config.
<nowiki>
<pre>
[remote:children]
production
staging
[servers:children]
production
staging
vagrant
[production]
shumgrepper.devel.fedora-apps.com nickname=production vm=0 branch=master
[staging]
shumgrepper.staging.fedora-apps.com nickname=staging vm=0 branch=develop
[vagrant]
default ansible_ssh_host=127.0.0.1 ansible_ssh_port=2222 nickname=local vm=1 branch=develop
[local]
localhost
</pre>
</nowiki>
* setup.yml: This playbook will set up the environment needed to deploy the Flask app to the server. The version below prepares the local SSH configuration used to reach it.
<nowiki>
<pre>
---
- hosts: local
  connection: local
  vars_files:
    - vars.yml
  gather_facts: false
  tasks:
    - name: Create the SSH directory.
      file: state=directory path=~/.ssh/conf.d/{{ project_name }}
    - name: Copy over config
      template: src=files/conf/ssh.local.tpl dest=~/.ssh/conf.d/{{ project_name }}/config
    - name: Copy over private key
      copy: mode=0600 src=files/ssl/{{ project_name }}_id_rsa.encrypted dest=~/.ssh/conf.d/{{ project_name }}/id_rsa
    - name: Compile SSH config
      shell: ./compile.sh chdir=~/.ssh
</pre>
</nowiki>
* vars.yml: This file contains the variables used by the playbooks, including the packages needed on the server and by shumgrepper.
<nowiki>
<pre>
---
project_name: shumgrepper
project_home: /home/shumgrepper
project_root: /var/projects/shumgrepper
project_repo: https://github.com/fedora-infra/shumgrepper.git
system_packages:
  - build-essential
  - git
  - libevent-dev
  - nginx
  - libmysqlclient-dev
  - mysql-server
  - python-dev
  - python-setuptools
  - postfix
  - python-pip
  - nodejs
pip_packages:
  - uwsgi
  - virtualenvwrapper
  - Flask
  - SQLAlchemy
</pre>
</nowiki>
* setupdatabase.yml: This playbook creates the PostgreSQL database user and the database.
<nowiki>
<pre>
---
- hosts: local
  user: root
  sudo: yes
  sudo_user: postgres
  tasks:
    - name: create a test database user
      action: postgresql_user user=testuser password=test1ng
    - name: create test database
      action: postgresql_db name=test_ansible_db owner=testuser
  handlers:
    - name: restart postgresql
      action: service name=postgresql state=restarted
</pre>
</nowiki>
: * These playbooks can be run with the command <code>ansible-playbook playbook.yml</code>.
: * Ansible playbooks consist of tasks and handlers. Tasks are commands executed on the server. Handlers are like tasks, but they run only when a task notifies them that something has changed on the managed system.
* After the package is built, the next major step is to configure shumgrepper on the fedoraproject.org server. This is done much as we would set up shumgrepper on a local Fedora machine, and it involves:
: * First, installing the summershum package from its tarball and setting up a SQLAlchemy database by running the summershum-cli command.
: * Installing the shumgrepper package and changing sqlalchemy.url so that it uses the same database created by summershum.
: * Setting up and starting the Apache server.
: * Checking the Apache log to debug any errors.
4. '''Testing and optimisation''': On the remote server, comparing packages is slow. Each file of one package is compared with every file of the other packages to find common or different files, so these queries take too long to return results. I need to find ways to optimise them.
* For example, getting the common files between two tarballs, fedora-release-21.tar.bz2 (I) and fedora-release-22.tar.bz2 (II) [http://209.132.184.120/compare/common?tar_file=fedora-release-21.tar.bz2&%20\%20tar_file=fedora-release-22.tar.bz2 (link)], takes roughly 58 seconds.
* It does this by first querying all the data of tarballs (I) and (II) ([https://github.com/fedora-infra/summershum/blob/develop/summershum/model.py#L112-L117 code]), and then using nested [https://github.com/fedora-infra/shumgrepper/blob/master/shumgrepper/__init__.py#L253-L266 loops] to compute the results.
* Instead, we can let the database do the matching with a single self-join query:
<pre>
SELECT table1.filename, table2.filename, table1.sha256sum
FROM files table1, files table2
WHERE table1.tarball = 'fedora-release-21.tar.bz2'
  AND table2.tarball = 'fedora-release-22.tar.bz2'
  AND table1.sha256sum = table2.sha256sum
</pre>
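To check that the self-join returns the same rows as the current nested-loop approach, here is a small sketch using an in-memory SQLite database. The files table, with its tarball, filename and sha256sum columns, mirrors the query above. The rows and the short checksum strings are made up for illustration. The index on (tarball, sha256sum) is an assumed addition; I have not checked which indexes summershum defines.

```python
import sqlite3

# The self-join from the proposal, with placeholders instead of literal tarball names.
COMMON_FILES_SQL = """
    SELECT t1.filename, t2.filename, t1.sha256sum
    FROM files AS t1
    JOIN files AS t2 ON t1.sha256sum = t2.sha256sum
    WHERE t1.tarball = ? AND t2.tarball = ?
"""


def build_demo_db():
    """In-memory database with a toy files table (hypothetical rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (tarball TEXT, filename TEXT, sha256sum TEXT)")
    # Assumed index: lets the join look up matching checksums within one tarball.
    conn.execute("CREATE INDEX idx_files_tarball_sum ON files (tarball, sha256sum)")
    conn.executemany(
        "INSERT INTO files VALUES (?, ?, ?)",
        [
            ("fedora-release-21.tar.bz2", "LICENSE", "aaa"),
            ("fedora-release-21.tar.bz2", "README", "bbb"),
            ("fedora-release-22.tar.bz2", "LICENSE", "aaa"),
            ("fedora-release-22.tar.bz2", "README", "ccc"),
        ],
    )
    return conn


def files_of(conn, tarball):
    """Fetch (filename, sha256sum) pairs for one tarball, as the current code does."""
    return conn.execute(
        "SELECT filename, sha256sum FROM files WHERE tarball = ?", (tarball,)
    ).fetchall()


def common_files_loop(rows_a, rows_b):
    """Current approach: compare every file of one tarball with every file of the other."""
    return [
        (name_a, name_b, sum_a)
        for name_a, sum_a in rows_a
        for name_b, sum_b in rows_b
        if sum_a == sum_b
    ]


def common_files_sql(conn, tarball_a, tarball_b):
    """Proposed approach: one self-join, so the matching happens inside the database."""
    return conn.execute(COMMON_FILES_SQL, (tarball_a, tarball_b)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    a, b = "fedora-release-21.tar.bz2", "fedora-release-22.tar.bz2"
    print(common_files_sql(conn, a, b))
    print(common_files_loop(files_of(conn, a), files_of(conn, b)))
```

With an index, the database can look up matching checksums instead of trying every pair of files, which is where the nested loops spend their time. Real timings on the production data would still need to be measured.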
5. '''GPL License''': We already have the checksums of the files within packages, so we can use them to check whether a package ships a genuine GPL license and which version of the GPL it ships. If we know the checksum of the original GPL license text, we can compare it with the checksum of the license file in the package.
6. '''Querying by GPL license''': We can add a filter to query the packages that ship a genuine GPL license. Packages usually include a LICENSE file. If we know the hash values (sha1, sha256 or md5) of the original license text, we can compare them to find out whether a package ships the genuine GPL license.
* This will also involve adding one more attribute, License, to the package table. It will hold a boolean value that indicates the presence or absence of a genuine license. For this, I will again write a migration script and run the database migrations.
* When giving an overview of a package on '''/package/<package>''', we can mention which license the package has.
* We can also display the total number of packages with a GPL license on the /packages page.
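Here is a sketch of the license check. It assumes the reference digests are computed from the canonical GPL texts; the short byte strings in the usage example are placeholders, not real license texts. The lookup compares a file's stored sha256sum against those digests, and is_genuine_gpl returns the boolean that the proposed License column would hold. All function names are hypothetical.

```python
import hashlib


def sha256_hex(data):
    """Hex sha256 digest of raw bytes, the same kind of value summershum stores per file."""
    return hashlib.sha256(data).hexdigest()


def build_reference(license_texts):
    """Map digest -> license name, from {license name: canonical license text as bytes}."""
    return {sha256_hex(text): name for name, text in license_texts.items()}


def classify_license(file_sha256, reference):
    """Return the name of the license whose canonical text has this digest, or None."""
    return reference.get(file_sha256)


def is_genuine_gpl(file_sha256, reference):
    """Boolean for the proposed License column: True only for an exact GPL text match."""
    name = classify_license(file_sha256, reference)
    return name is not None and name.startswith("GPL")


if __name__ == "__main__":
    # Placeholder texts; real use would read the official license files.
    reference = build_reference({
        "GPLv2": b"placeholder for the GPLv2 text",
        "GPLv3": b"placeholder for the GPLv3 text",
    })
    print(classify_license(sha256_hex(b"placeholder for the GPLv3 text"), reference))
```

Because this is an exact hash match, a LICENSE file that differs from the official text only in whitespace or line wrapping will not be recognised. That limitation would be worth documenting alongside the filter.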
7. '''Testing & Documentation''': This will involve testing all the endpoints and their results, and documenting everything implemented so far.
* It will require keeping track of how long queries take and looking for further optimisations in case of excessive delays.
* It will also involve maintaining the package and updating it for further changes.
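One lightweight way to keep track of query times is to put a timing decorator around the query functions. This is a sketch. The timed name, the logger name, the common_files example function and the default threshold are all hypothetical choices.

```python
import functools
import logging
import time

log = logging.getLogger("shumgrepper.timing")  # hypothetical logger name


def timed(threshold_seconds=1.0):
    """Decorator that records how long each call takes and warns about slow ones."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs even if the query raises, so failed calls are timed too.
                elapsed = time.perf_counter() - start
                wrapper.timings.append(elapsed)
                if elapsed > threshold_seconds:
                    log.warning("%s took %.2f s", func.__name__, elapsed)
        wrapper.timings = []  # per-function list of call durations, in seconds
        return wrapper
    return decorator


# Usage: wrap a (hypothetical) query function.
@timed(threshold_seconds=5.0)
def common_files(tarball_a, tarball_b):
    return []  # placeholder for the real database query
```

Keeping the durations on the wrapped function makes it easy to inspect them in tests. The warning log highlights the queries that exceed the threshold in a running instance.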
* We can add some visualisation (in the form of bar charts) that gives an overview of the changes among different packages.
* When comparing three packages to find the files that differ among them, we can provide stats listing the number of differences between every pair of packages.
* On [http://209.132.184.120/package/fotoxx/filenames /package/filenames], the filenames present in each package are listed. We can add a link for each file to a page with information specific to that file. This may include:
:: * The sha1sum, sha256sum and md5sum values of the file.
:: * The number of other packages that contain the file.
:: * A link to the '''/filename/<filename>''' page.
* Currently, users can only compare packages on the basis of sha256sum. We should also let them compare on the basis of md5sum or sha1sum.
* Currently, querying by checksum returns a table showing all the information related to that checksum, including a column with the queried checksum itself. Ideally that column shouldn't be there, because its value is the same in every row.
* On /package/<package>/filenames, querying for all the files in a package simply lists every file of that package. It would be more readable to list the filenames version by version and to mark repeating ones with different colours.
* Most other projects have a nice logo, but shumgrepper doesn't have one yet. We can plan one in collaboration with the design team.
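The pairwise difference counts suggested above could be computed from each package's set of sha256sums, and the same numbers could feed the bar charts. Here is a sketch, assuming the sets have already been loaded from the database. The function name and the package names in the test are placeholders.

```python
from itertools import combinations


def pairwise_difference_counts(packages):
    """For each pair of packages, count checksums found in only one of the two.

    `packages` maps a package name to the set of sha256sums of its files.
    """
    counts = {}
    # Sorting by name gives a stable, predictable order for the pairs.
    for (name_a, sums_a), (name_b, sums_b) in combinations(sorted(packages.items()), 2):
        # Symmetric difference: files present in exactly one of the two packages.
        counts[(name_a, name_b)] = len(sums_a ^ sums_b)
    return counts
```

Using sets makes each pairwise comparison roughly linear in the number of files, rather than comparing every file with every other file.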
==Deliverables==
* Deployment of the app
* Manual or Documentation
==Timeline==
|-
| May 25
| Official GSoC coding period begins. I will discuss the deployment plans with the Fedora team.
|-
| May 25 - May 31 (6 days)
| Data migration: implement procedures for migrating the database using alembic scripts.
|-
| June 1 - June 4 (5 days)
| Write unit tests for the different components of shumgrepper, such as database query tests and Flask endpoint tests.
|-
| June 5 - June 30 (25 days) (this may take more time)
| Deploy the app, which involves building the package and provisioning it with Ansible.
|-
| July 1 - July 10 (10 days)
| Optimise the queries by writing better ones, and explore multithreading.
|-
| July 10 - July 19 (10 days)
| Implement the GPL license check and querying by it.
|-
| July 20 - August 3 (13 days)
| Write example cases and project documentation, using the Sphinx documentation generator.
|-
| August 4 - August 14 (10 days)
| Improve the GUI for the new features.
|-
| August 15 - August 21 (1 week)
|-
| August 21
| Pencil-down date and submission of the final package.
Latest revision as of 18:51, 27 March 2015
Project Title : Shumgrepper
Personal Information
- Name : charul
- Fedora Profile : charul
- GitHub : charulagrl
- Timezone : India, UTC +5:30
Contact Information
- E-mail : charul.agrl@gmail.com
- Phone : 91-8879018082
- IRC nick : charul at irc.freenode.net
- Blog url : https://honeycoding.wordpress.com
- Why do you want to work with the Fedora Project?
I have worked on various fedora projects and had a great experience while working on them. I already had participated in GSoC last year and worked on the same project Shumgrepper and this year I would like to bring this project to its completion. Besides all this, fedora is my favorite linux distro and it gives me immense pleasure in contributing to its projects. I also met with few fedora contributors this year in a conference and I must say I had a great time with them and learned many new stuff. They are energetic folks who loves what they are doing and this inspires me a lot.
- Do you have any past involvement with the Fedora project or another open source project as a contributor?
Yes, I have mainly contributed to
- Datagrepper I mainly contributed to built the web-frontend of the app
- Fedora-Packages I integrated datagrepper package feed on the package description page,
- Shumgrepper I wrote this app from start, setting up the the flask app, built api and web frontend for it, deploying its dev instance and
- Summershum I did database modification, added various sqlalchemy queries as per required in shumgrepper.
- Did you participate with the past GSoC programs, if so which years, which organizations?
Yes, I participated last year i.e. in year 2014 and worked with Fedora organisation.
- Will you continue contributing/ supporting the Fedora project after the GSoC 2015 program, if yes, which team(s), you are interested with?
Yes, I will keep contributing to the Fedora Project in my spare time even after the GSoC 2015 program. I would prefer contributing to more projects under Fedora-infra team as its projects completely intersect with my area of interest. I would be interested in doing contributions to pkgdb2 and fedocal as I have been looking at their codebase for references and would love to solve any bugs in them.
- Why should we choose you over other applicants?
I have already been actively involved in contributing to fedora projects and have worked on this project last year thereby, having a good understanding of project codebase and its requirements. I am pretty much sure that this time i will be able to complete the project. I have created the following pull requests on summershum:
- Add unit-tests for sqlalchemy database
- Add method for new table creation
- Write alembic script to update current models to database schema in branch pkg_table
Proposal Description
Overview and The Need
Shumgrepper is a webapp which is built on top of Summershum. Summershum collects md5sum, shasum and sha256sum of every file in every package. Shumgrepper uses this information to check the integrity and duplication among different packages. It can be used to find the common or different files among various packages by comparing their sha256 values. It also let you to query files by their shum values, files bundled within a package and compare different packages and tar_files.
- Dev instance: http://209.132.184.120/
- Any relevant experience you have
I worked on Shumgrepper last year as a GSoC project and built UI and API for it. Before this, I had contributed to Datagrepper. Besides this, I have been writing codes in Python for more than 3 years. Also, I have built many applications in Flask, webapp2, used jinja2 template and have good experience of working as a backend developer.
How do you intend to implement your proposal
1. Database Migrations: We had made some changes in the summershum schema so that Packages list page can be rendered fast. As a first step, I would be writing an alembic migration script to update current database according to the new schema. After this, it is important to check and compare the time taken and query results.
- I have started working on this and wrote alembic script to add and modify few tables. Now, I have to figure out how should I run the queries so that data import can take place.
- It was suggested to me by pingou that we can create files using createdb file, I have created a pull request that will add a method to create new database tables.
- According to me, it would be wise to keep a copy of existing database somewhere and then doing all the migrations, so that in case the existing model doesn’t work out we can roll back. I can keep the data by compressing it.
2. Running unit-tests: In order to make sure that our code does not break its existing functionality while it grows in size, we need to write some tests that checks for all the different features for the application and check for possible features. This test is done on different functionalities one by one and checking if it returns expected results. It's important to run unit-tests before launching the product into production in order to minimise failures.
- I will be writing tests using unittest package.
- Recently, I wrote unit-tests for summershum to check if sqlalchemy database is working. I have pushed it here. Here I have written unit-test to check sqlalchemy database. First it check for creation of database and adding one file object to the database and then deletion of that database. test_query_file function will check if it is able to query that object from the database and if we are getting expected results.
- For shumgrepper I will be writing unit-test for every end-point both for api and ui. As an example to check if /package/<package_name> returns right results or not.
import json import unittest class FlaskTestCase(unittest.TestCase): def test_package(self): tester = app.test_client(self) response = tester.get(‘/package/fotoxx’, content_type=’application/json’) self.assertEqual(response.status_code, 200) self.assertEqual(json.loads(response.data), ["fotoxx-14.03.1.tar.gz", "fotoxx-14.05.tar.gz", "fotoxx-14.04.2.tar.gz", "fotoxx-14.05.1.tar.gz", "fotoxx-14.04.tar.gz"]
- Custom HTTP Error Handlers: This is not necessary, as Flask already its own error pages. But as per requirements we may write custom error pages for some errors.
3. Deployment: Shumgrepper codebase will transformed into a (non-development) deploy through three stages:
- Build Stage: In this stage shumgrepper code repo will be transformed into an executable bundle known as build, in python i.e. rpm build. Using a version of the code at a commit specified by the deployment process, the build stage fetches and vendors dependencies and compiles binaries and assets.
- Release Stage: This takes the build produced by the build stage (in our case it is the rpm package of the app) and combines it with the deploy’s current config. For shumgrepper, we will test the build on develop, production config and will provision it using ansible accordingly. The resulting release of the app will contains both the build and the config and will ready for immediate execution in the execution environment.
- Run Stage: it (also known as “runtime”) runs the app in the execution environment, by launching some set of the app’s processes against a selected release. The build and along with the config will deployed on the fedora apps server.
- Individual Pieces
As discussed above deployment stages, there are some steps and tools which we use to simplify deployment and here we will be discussing about them in context to shumgrepper.
The first step in deployment is to create an RPM package for shumgrepper. This involves the following steps. First, install the development tools and set up an account. Then, create a .spec file containing all the necessary information about the software being packaged, add the license to the package, and take the other small steps needed to make it ready for packaging. To identify and fix common errors in the spec file, RPMs and SRPMs, we can use rpmlint. Next, build the binary RPMs and SRPMs from the spec file and check the binary RPMs with rpmlint. Use Mock to check that the build dependencies are listed accurately, and Koji to test the SRPM on other platforms. After all this, I can create a review request on Bugzilla. Once the request is approved, I can make an SCM request, upload the package to SCM and then push it to the public repository.
Here, I have created a spec file for shumgrepper.
<pre>
Name:           shumgrepper
Version:        0.0.1
Release:        1%{?dist}
Summary:        A webapp of summershum

License:        GPLv2+
URL:            https://github.com/fedora-infra/shumgrepper
Source0:        https://pypi.python.org/packages/source/s/%{name}/%{name}-%{version}.tar.gz

BuildArch:      noarch

BuildRequires:  python2-devel
BuildRequires:  python-setuptools
BuildRequires:  fedmsg
BuildRequires:  python-flask
BuildRequires:  python-docutils
BuildRequires:  python-fedmsg-meta-fedora-infrastructure
BuildRequires:  m2crypto
BuildRequires:  python-m2ext
BuildRequires:  python-flask-wtf
BuildRequires:  python-summershum

Requires:       fedmsg >= 0.7.0
Requires:       python-flask
Requires:       python-docutils
Requires:       python-fedmsg-meta-fedora-infrastructure
Requires:       m2crypto
Requires:       python-m2ext
Requires:       python-flask-wtf
Requires:       python-summershum

%description
Shumgrepper is a webapp that queries summershum's database, which collects
the md5sum, sha1sum and sha256sum of every file present in every package in
Fedora. Shumgrepper lets you query by checksum (sha1sum, sha256sum, md5sum)
and tar_sum, find the files bundled within a package, and compare different
packages and tarballs.

%prep
%setup -q

%build
%{__python} setup.py build

%install
%{__python} setup.py install -O1 --skip-build \
    --install-data=%{_datadir} --root %{buildroot}

mkdir -p %{buildroot}%{_datadir}/%{name}/apache/
install -m 644 apache/%{name}.wsgi %{buildroot}%{_datadir}/%{name}/apache/%{name}.wsgi

mkdir -p %{buildroot}%{_sysconfdir}/%{name}
install -m 644 apache/%{name}.cfg %{buildroot}%{_sysconfdir}/%{name}/%{name}.cfg

mkdir -p %{buildroot}%{_sysconfdir}/httpd/conf.d
install -m 644 apache/%{name}.conf %{buildroot}%{_sysconfdir}/httpd/conf.d/%{name}.conf

%files
%doc README.md LICENSE
%config(noreplace) %{_sysconfdir}/httpd/conf.d/shumgrepper.conf
%config(noreplace) %{_sysconfdir}/%{name}/%{name}.cfg
%{_datadir}/%{name}/
%{python_sitelib}/%{name}/
%{python_sitelib}/%{name}-%{version}-py%{python_version}.egg-info/

%changelog
</pre>
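rpmbuild expands the macros in a spec file before building, and names its outputs `name-version-release.arch.rpm`, with `.src.rpm` for the source package. The toy expander below handles only the `%{name}` and `%{?name}` forms used in the spec above; it illustrates how the fields combine and is not a replacement for rpm's macro engine. The `.fc22` dist value is an assumption (on Fedora it is supplied by the build system), and `expand` and `package_file_names` are hypothetical helper names.

```python
import re
from typing import Dict, Tuple

# Fields from the spec above. "dist" is an assumed value for illustration;
# on Fedora it is normally supplied by the build system.
SPEC_FIELDS = {
    "name": "shumgrepper",
    "version": "0.0.1",
    "release": "1%{?dist}",
    "dist": ".fc22",
}

MACRO_RE = re.compile(r"%\{(\??)(\w+)\}")


def expand(text: str, macros: Dict[str, str]) -> str:
    """Expand %{key} and %{?key}; %{?key} becomes "" when key is undefined.

    Toy subset of rpm's macro language: no nesting, no recursion.
    """
    def replace(match):
        optional, key = match.group(1), match.group(2)
        if key in macros:
            return macros[key]
        return "" if optional else match.group(0)  # keep unknown %{key} as-is
    return MACRO_RE.sub(replace, text)


def package_file_names(fields: Dict[str, str], arch: str = "noarch") -> Tuple[str, str]:
    """Return the (SRPM, binary RPM) file names rpmbuild would produce."""
    release = expand(fields["release"], fields)
    nvr = f"{fields['name']}-{fields['version']}-{release}"
    return f"{nvr}.src.rpm", f"{nvr}.{arch}.rpm"
```

For example, `expand("%{name}-%{version}.tar.gz", SPEC_FIELDS)` gives the tarball name that Source0 points to, and `package_file_names(SPEC_FIELDS)` gives the files that rpmlint, Mock and Koji would then be run against.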
Ansible: Ansible is an automation system that makes deployment easier and faster and doesn’t require any agent on the managed machines. The shumgrepper app will therefore have an /ansible folder at its root level containing the following components:
- hosts: This inventory file contains information about the hosts. It maps hostnames to the different deployment configs (production, staging, vagrant).
<pre>
[remote:children]
production
staging

[servers:children]
production
staging
vagrant

[production]
shumgrepper.devel.fedora-apps.com nickname=production vm=0 branch=master

[staging]
shumgrepper.staging.fedora-apps.com nickname=staging vm=0 branch=develop

[vagrant]
default ansible_ssh_host=127.0.0.1 ansible_ssh_port=2222 nickname=local vm=1 branch=develop

[local]
localhost
</pre>
- setup.yml: This file will have information about setting up the Flask environment on the server
<pre>
---
- hosts: local
  connection: local
  vars_files:
    - vars.yml
  gather_facts: false
  tasks:
    - name: Create the SSH directory.
      file: state=directory path=~/.ssh/conf.d/{{ project_name }}

    - name: Copy over config
      template: src=files/conf/ssh.local.tpl dest=~/.ssh/conf.d/{{ project_name }}/config

    - name: Copy over private key
      copy: mode=700 src=files/ssl/{{ project_name }}_id_rsa.encrypted dest=~/.ssh/conf.d/{{ project_name }}/id_rsa

    - name: Compile SSH config
      shell: ./compile.sh chdir=~/.ssh
</pre>
- vars.yml: This file contains the variables and dependencies for the server as well as for shumgrepper.
<pre>
---
project_name: changeme
project_home: /home/changeme
project_root: /var/projects/changeme
project_repo: git@bitbucket.org:myuser/changeme.git

system_packages:
  - build-essential
  - git
  - libevent-dev
  - nginx
  - libmysqlclient-dev
  - mysql-server
  - python-dev
  - python-setuptools
  - postfix
  - python-pip
  - nodejs

pip_packages:
  - uwsgi
  - virtualenvwrapper
  - Flask
  - SQLAlchemy
</pre>
- setupdatabase.yml: This playbook creates the database user and the database.
<pre>
---
- hosts: local
  user: root
  sudo: yes
  sudo_user: postgres
  tasks:
    - name: create a test database user
      action: postgresql_user user=testuser password=test1ng

    - name: create test database
      action: postgresql_db name=test_ansible_db owner=testuser

  handlers:
    - name: restart postgresql
      action: service name=postgresql state=restarted
</pre>
- * These playbooks can be run using the command ansible-playbook playbook.yml.
- * An Ansible playbook consists of tasks and handlers: tasks are commands that are executed on the server, and handlers are like tasks but only run when a task notifies them that it has made changes on the managed system.
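Handlers are a common source of confusion, so here is a toy Python model of the behaviour described above; it is not how Ansible is implemented. Following Ansible's documented behaviour, a notified handler runs once at the end of the play, however many tasks notified it, and handlers run in the order they are defined. In a real playbook a task triggers a handler such as `restart postgresql` from setupdatabase.yml through a `notify:` entry. `Task` and `run_play` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Task:
    name: str
    action: Callable[[], bool]    # returns True if it changed the system
    notify: Optional[str] = None  # handler to notify when the task changes something


def run_play(tasks: List[Task], handlers: Dict[str, Callable[[], None]]) -> List[str]:
    """Run tasks, then the notified handlers; return a log of what ran."""
    log: List[str] = []
    notified = set()
    for task in tasks:
        changed = task.action()
        log.append(f"task {task.name}: {'changed' if changed else 'ok'}")
        if changed and task.notify:
            notified.add(task.notify)
    # Handlers run once each, after all tasks, in definition order,
    # and only if at least one changed task notified them.
    for name, handler in handlers.items():
        if name in notified:
            handler()
            log.append(f"handler {name}")
    return log
```

In this model two changed tasks notifying `restart postgresql` still restart it only once, and a handler notified only by an unchanged task never runs.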
- After the package is built, the next major step is to configure shumgrepper on the fedoraproject.org server. The server runs a Linux operating system, so shumgrepper can be set up there much as it would be on a Fedora machine. This involves:
- * First, installing the summershum package from its tarball and setting up an SQLAlchemy database by running the summershum-cli command.
- * Installing the shumgrepper package and changing its sqlalchemy.url so that it uses the same database created by summershum.
- * Setting up the Apache server and starting it.
- * Checking the Apache logs to debug any errors.
4. Testing and optimisation: On the remote server, comparing different packages currently works by comparing each file of one package with every file of the other packages to find the common or different files, so these queries take too long to return results. I need to find ways to optimise these queries.
- For example, getting the common files between two tarballs, fedora-release-21.tar.bz2 (I) and fedora-release-22.tar.bz2 (II) (link), takes roughly 58 seconds.
- The current code first queries all the rows for tarballs (I) and (II) and then uses nested loops to build the result.
- Instead, we can let the database do the matching with a single self-join query like this:
<pre>
SELECT table1.filename, table2.filename, table1.sha256sum
FROM files table1, files table2
WHERE table1.tarball = 'fedora-release-21.tar.bz2'
  AND table2.tarball = 'fedora-release-22.tar.bz2'
  AND table1.sha256sum = table2.sha256sum
</pre>
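The following self-contained sketch uses Python's built-in sqlite3 module to show that a single self-join returns the same rows as the nested-loop approach. The `files` table is a stand-in with only the three columns used in the query, the rows and checksum values are made up, and the composite index is an assumption about what would help; on the real database the plan should be checked with EXPLAIN. The sketch compares checksums with `=`, since `==` is accepted by SQLite but is not standard SQL and is rejected by PostgreSQL.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[str, str, str]

SCHEMA = """
CREATE TABLE files (filename TEXT, tarball TEXT, sha256sum TEXT);
CREATE INDEX idx_files_tarball_sha256 ON files (tarball, sha256sum);
"""

# Made-up sample rows: (filename, tarball, sha256sum).
SAMPLE_ROWS = [
    ("LICENSE", "fedora-release-21.tar.bz2", "sha-aaa"),
    ("README", "fedora-release-21.tar.bz2", "sha-bbb"),
    ("fedora.repo", "fedora-release-21.tar.bz2", "sha-ccc"),
    ("LICENSE", "fedora-release-22.tar.bz2", "sha-aaa"),
    ("README", "fedora-release-22.tar.bz2", "sha-ddd"),
    ("fedora.repo", "fedora-release-22.tar.bz2", "sha-ccc"),
]

COMMON_FILES_SQL = """
SELECT t1.filename, t2.filename, t1.sha256sum
FROM files AS t1
JOIN files AS t2 ON t1.sha256sum = t2.sha256sum
WHERE t1.tarball = ? AND t2.tarball = ?
ORDER BY t1.filename, t2.filename
"""


def make_demo_db(rows: List[Row]) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO files VALUES (?, ?, ?)", rows)
    return conn


def common_files_join(conn: sqlite3.Connection, a: str, b: str) -> List[Row]:
    """One query: the database matches checksums across the two tarballs."""
    return conn.execute(COMMON_FILES_SQL, (a, b)).fetchall()


def common_files_loops(conn: sqlite3.Connection, a: str, b: str) -> List[Row]:
    """The slow pattern: fetch both tarballs, then compare every pair in Python."""
    query = "SELECT filename, sha256sum FROM files WHERE tarball = ?"
    rows_a = conn.execute(query, (a,)).fetchall()
    rows_b = conn.execute(query, (b,)).fetchall()
    matches = [(fa, fb, sa) for fa, sa in rows_a for fb, sb in rows_b if sa == sb]
    return sorted(matches)
```

The nested loop does work proportional to the product of the two file counts in Python, while the join lets the database use an index; that is the likely source of the delay, though profiling the real query should confirm it.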
5. GPL License: We already have the checksums of the files within packages. These can be used to check whether a package ships a genuine GPL license, and which version of the GPL it is. If we know the original GPL license text, we can calculate its checksum and compare it with the checksum of the license file that a package ships.
6. Querying by GPL license: We can add a filter to query the packages that have a genuine GPL license. Packages typically ship a LICENSE file, and if we know the hash values (sha1, sha256 or md5) of the original license, we can compare them and find out whether the package ships a genuine GPL license.
- This will also involve adding one more attribute to the package table, License, holding a boolean value that records the presence or absence of a genuine license. For this, I will again write a migration script and run the database migrations.
- When giving an overview of a package on /package/<package>, we can mention which license that package has.
- We can also display the total number of packages having a GPL license on the /packages page.
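A minimal sketch of the check described in items 5 and 6, using Python's hashlib. It assumes we hold the canonical license texts (for example, downloaded from gnu.org) and compares only sha256 digests; the texts in the test below are short placeholders rather than the real GPL, and `file_checksums`, `build_reference` and `detect_license` are hypothetical names.

```python
import hashlib
from typing import Dict, Optional


def file_checksums(data: bytes) -> Dict[str, str]:
    """The three checksums summershum records for each file."""
    return {
        "md5sum": hashlib.md5(data).hexdigest(),
        "sha1sum": hashlib.sha1(data).hexdigest(),
        "sha256sum": hashlib.sha256(data).hexdigest(),
    }


def build_reference(license_texts: Dict[str, bytes]) -> Dict[str, str]:
    """Map each known license name to the sha256 of its canonical text."""
    return {name: hashlib.sha256(text).hexdigest() for name, text in license_texts.items()}


def detect_license(license_sha256: str, reference: Dict[str, str]) -> Optional[str]:
    """Return the name of the license whose checksum matches exactly, else None."""
    for name, digest in reference.items():
        if digest == license_sha256:
            return name
    return None
```

Because this is an exact match, a LICENSE file that differs only in whitespace or line endings will not match, so a `None` result means "not a byte-identical copy of a known text" rather than "not GPL".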
7. Testing & Documentation: This will involve testing all the endpoints and their results, and documenting everything implemented so far.
- It will require keeping track of how much time queries take and looking for further optimisations in case of excessive delays.
- It will also involve maintaining the package and updating it for further changes.
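To keep track of how long queries take, one option is a small timing decorator around the functions that run them. This sketch uses only the standard library; the logger name, the threshold and the optional `timings` dictionary are assumptions for illustration, not existing shumgrepper settings.

```python
import functools
import logging
import time

log = logging.getLogger("shumgrepper.timing")  # hypothetical logger name


def timed(threshold_seconds=1.0, timings=None):
    """Decorator: measure each call and warn when it exceeds the threshold.

    If a dict is passed as `timings`, every duration is also appended to
    timings[function_name] so slow queries can be compared later.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                if timings is not None:
                    timings.setdefault(func.__name__, []).append(elapsed)
                if elapsed > threshold_seconds:
                    log.warning("%s took %.2f s", func.__name__, elapsed)
        return wrapper
    return decorator
```

Wrapping a query function, for example one that compares two tarballs, with `@timed(threshold_seconds=5.0)` would log a warning whenever that comparison exceeds five seconds.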
8. Improving the GUI: We can improve the user experience of the app by making considerable changes to the UI. This could involve:
- Adding visualisations (in the form of bar charts) that give an overview of the changes among different packages.
- When comparing three packages to find the files that differ among them, we can provide some stats listing the number of differences between every two packages.
- On /package/filenames, which lists the filenames present in each package, we can add a link for each file to a page with information specific to that file. This may include:
- * The sha1sum, sha256sum and md5sum values of the file.
- * The number of other packages that contain that file.
- * A link to the /filename/<filename> page.
- Currently, users can only compare different packages on the basis of sha256sum; we should also let them compare on the basis of md5sum or sha1sum.
- Currently, querying by checksum returns a table showing all the information related to that checksum, including the queried checksum itself in every row. Since that value is the same for all rows, it shouldn’t be repeated in the table.
- Querying for all the files in a package on /package/<package>/filenames just lists all the files of that package. It would be more readable to list the filenames version by version, and to highlight the repeating ones in different colours.
- Most other projects have a nice logo, but shumgrepper doesn’t have one yet. We can plan to create one in collaboration with the design team.
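The pairwise statistics proposed above can be computed from the set of file checksums in each package: for every pair, the symmetric difference holds the checksums found in only one of the two. A minimal sketch with a hypothetical function name, assuming each package has already been loaded as a set of sha256sum values:

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def pairwise_difference_counts(packages: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """For every pair of packages, count checksums present in exactly one of them."""
    counts = {}
    ordered = sorted(packages.items(), key=lambda item: item[0])
    for (name_a, sums_a), (name_b, sums_b) in combinations(ordered, 2):
        counts[(name_a, name_b)] = len(sums_a ^ sums_b)
    return counts
```

The counts could feed directly into the bar charts mentioned above; using sets also means a file repeated within one package is counted once.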
Deliverables
- Migration of current data according to new schema.
- Testing, debugging and finishing off the project.
- Deployment of the app
- Manual or Documentation
Timeline
Period | Task |
---|---|
May 25 | Official GSoC coding period begins. I will discuss the deployment plans with the Fedora team. |
May 25 - May 31 (7 days) | Data migration: implement procedures for migrating the database using Alembic scripts. |
June 1 - June 4 (4 days) | Writing unit tests for different components of shumgrepper, such as database query tests and Flask endpoint tests. |
June 5 - June 30 (26 days; this may take more time) | Deploying the app, which involves package builds and provisioning using Ansible. |
July 1 - July 10 (10 days) | Optimising queries by writing better queries and by exploring multithreading. |
July 10 - July 19 (10 days) | Implementing the GPL license check and querying by it. |
July 20 - August 3 (15 days) | Example cases and documentation of the project, using the Sphinx documentation generator. |
August 4 - August 14 (11 days) | Improving the GUI for the new features. |
August 15 - August 21 (1 week) | Final phase of the project: cleaning up code, documenting everything, reviewing all the functionality and fixing bugs. |
August 21 | Pencils-down date and submission of the final package. |