(Raw dump of ideas from mailing list discussion) |
(Formatting round 1) |
||
Line 206: | Line 206: | ||
=== Deployment === | === Deployment === | ||
We commonly perform three types of deployment: hotfixes, minor deployments, and major deployments. | |||
Hotfixes are handled in puppet and ansible. That will be written up in a separate section. | |||
The other upgrades are handled via an rpm upgrade, config file updates, and database migrations (depending on how big the changes are). The difference between major and minor deployments are: | |||
* API breakage: If API is known to be broken, it's probably a major update. | |||
* Minor, incremental releases (hopefully these would correlate with not having API breaks :-) | |||
* Leaf services that are not essential to releasing Fedora are minor deployments | |||
** ask, nuancier, elections, easyfix, badges, paste, nuancier | |||
** There's a lot of boderline cases too -- is fedocal essential enough to warrant being under this policy? Since the wiki is used via its API should that fall under this as well? | |||
==== Major Deployments ==== | |||
* | |||
* We have to make people aware when a new deployment means API breaks. | |||
** Be clear that the new deployment means API breaks in every call for testing. Send announcements to infrastructure list and depending on the service to devel list. | |||
** Have a separate announcement besides the standard outage notification that says that an API breaking update is planned for $date | |||
** When we set a date for the new deployment, discuss it at least once in a weekly infrastructure meeting. | |||
** See also the solution in #3 below | |||
* It would be really nice for people to do more testing in stg. | |||
** Increase rube coverage. rube does end-to-end testing so it's better at catching cross-app issues where API changes better than unittests which try to be small and self-contained | |||
*** A flock session where everyone/dev in infra gets to write one rube test so we get to know the framework | |||
*** Run rube daily- Could we run rube in an Xvfb on an infrastructure host? | |||
*** Continue to work towards a complete replica of production in the stg environment. | |||
* "Mean time to repair is more important than mean time between failure." It seems like anytime there's a major update there's unexpected things that break. Let's anticipate the unexpected happening. | |||
** Explicitly plan for everyone to spend their day firefighting when we make a major new deployment. If you've already found all the places your code is affected and pre-ported it and the deployment goes smoothly then hey, you've got 6 extra working hours to shift back to doing other things. If it's not smooth, then we've planned to have the attention of the right people for the unexpected difficulties that arise. | |||
** As part of this, we need to identify people outside of infrastructure that should also be ready for breakage. Reach out to rel-eng, docs, qa, cvsadmins, etc if there's a chance that they will be affected. | |||
* Buggy code happens. How can we make it happen less? | |||
** More unittests would be good however we know from experience with bodhi that unittests don't catch a lot of things that are changes in behaviour rather than true "bugs". Unexpected API changes that cause people porting pain can be as simple as returning None instead of an empty list which causes a no-op iteration in running code to fail while the unittests survive because they're checking that "no results were returned". | |||
** Pingou has championed making API calls and WebUI calls into separate URL endpoints. I think that coding style makes it easier to control bugs related to updating the webui while trying to preserve the API so we probably want to move to that model as we move onto the next major version of our apps. | |||
** Not returning json-ified versions of internal data structures (like database tables) but instead parsing the results and returning a specific structure would also help divorce internal changes from external API. | |||
* Version our APIs | |||
** What's our goal here? To allow for clients to port ahead of time (Just need server to provide a version and the client checks the version and invokes appropriate code). Or do we want to aid in identifying out what apps are using old api? (Could have clients query for a set of API versions and the server replies with the one it supports. Or could have the whole API behind a version number). | |||
== Security Best Practices == | == Security Best Practices == |
Revision as of 23:04, 17 June 2014
Coding style
- Python code should be PEP8 compliant
Tools to improve style
You can run pylint over your code once in a while to help catch PEP8 violations. pyflakes and pep8 (the program) are fast enough that you could setup your editor to run them whenever you save a python file (but they don't catch nearly as many things as pylint does).
Minimum Python Version
Except in very specific cases (certain libraries that may need to be run on RHEL5 clients), python code should target compatibility with python-2.6 (RHEL6 and all Fedora releases.) You should accept patches for compatibility with the latest version of python3 but at this time we are not writing code that runs solely on python3 (bloating the systems with both python2 and python3; inability to run mod_wsgi for both python2 and python3 on a single machine, etc).
Projects with lower minimum python2 versions:
- python-kitchen -- This needs to target RHEL5 until RHEL5 goes EOL.
python-fedora-- This now targets RHEL6. However, if some people are still communicating with Fedora Services on RHEL5, they may need fixes backported to the 0.3.29.x release.
Common Programming Issues
Centralized logging
Most of our apps use the standard Python logging module, which usually ends up logging to /var/log/httpd/error_log on the app server.
To have your app logs shipped to our central logging server, you can configure the SysLogHandler to do so.
For config-file based logging setups, you can do something like the following:
[logging] [[handlers]] [[[syslog_out]]] class='handlers.SysLogHandler' args="('/dev/log', handlers.SysLogHandler.LOG_LOCAL4, socket.SOCK_STREAM)" formatter='full_content' [[loggers]] [[[bodhi]]] level='DEBUG' qualname='bodhi' handlers=['syslog_out'] propagate=0
(Note that the socket.SOCK_STREAM argument is only available on python 2.7+ and can be omitted on RHEL6 and earlier.)
Here is an example of doing it in pure Python:
import logging import logging.handlers syslogger = logging.getLogger('bodhi') syslogger.setLevel(logging.DEBUG) handler = logging.handlers.SysLogHandler(address='/dev/log', facility=logging.handlers.SysLogHandler.LOG_LOCAL4) syslogger.addHandler(handler) syslogger.info('Hello SysLog!')
The app logs will then appear in /var/log/hosts/<HOST>/<YEAR>/<MONTH>/<DAY>/apps.log as well as the merged log /var/log/merged/apps.log on our central rsyslog server.
Translations
Use transifex to manage translations and add the projects to the Fedora Project Team so the Fedora i18n team can translate.
Do not mark Exception messages for translation. We want those to remain untranslated for several reasons:
- It's easier to do a web search for the exceptions if the messages aren't translated.
- Translators don't know what all the technical information in exception messages are so the translations aren't always accurate.
- The exception messages aren't intended for end users -- they're to help with debugging or other coders. So it's not very useful to translate them.
fedmsg
Think about how to add fedmsg hooks into every event that changes data. However, also make it so that fedmsg is optional (ie: the app doesn't fail if fedmsg is not available/installed). This makes it easier to test apps outside of infrastructure and makes us more robust in case fedmsg starts failing sometime.
Libraries to use
- kitchen: A library of common code that we've needed in Fedora. Lots of code to deal with unicode issues and localization. A few other things as well. Docs
yum install python-kitchen
Web Frameworks
Use one of Flask, Pyramid, or TurboGears2. Limiting the number of web frameworks we're responsible for will greatly help with our long term maintenance burden.
Version numbering
Version numbers for releases should generally be three incrementing numbers separated by dots ("."). Each number has a name:
MAJOR.MINOR.MICRO
There is a fourth number that can be added after MICRO
called PATCHLEVEL
. We'll talk about that in the #Immature_versions section.
The MAJOR
number should be incremented anytime there is a backwards incompatible change (examples: deleting methods, removing attributes from the return value of a method, sometimes changing the ordering of a list that is publically available). The MINOR
number should be incremented whenever there's an addition to API or other change in behaviour that is compatible but especially noteworthy (For instance, adding a method. Adding an entry to a dict that's being returned.). The MICRO
number should be incremented when a new release just changes internals in a way that isn't visible to the user (Bugfixes generally fall into this category).
One thing to note is that there's a small amount of flexibility in this scheme. Some bugfixes change API -- for instance if a field wasn't supposed to be public but was being returned as part of a dict by mistake. Removing this on the grounds that no one should be using it and only bumping the MINOR
number (ordinarily, removals would bump the MAJOR
number) may be okay. Err on the side of incrementing numbers as it's better for consumers of libraries to be warned by the version number bump to check their code and find that the change does not affect them than for them to not be warned of a change that breaks things for them.
Immature Versions
Much of our code can be considered immature. We release relatively frequently and may be updating the API often. For these pieces of software we should still be adhering to a versioning scheme to help consumers know that changes to the API they are consuming may be coming. The scheme for those is very similar to the MAJOR.MINOR.MICRO
given above but shifted over one position. Start by setting the MAJOR
number to zero and then using MINOR
for the same purpose as MAJOR
in the previous section. Releases, where there are not API incompatible changes can bump MICRO
whether or not there are forwards compatible changes to the API or just bugfixes. In the special case where there is a problem with a newly released version (usually within a few hours to one day) is found to have a defect that must be fixed, add a fourth number, PATCHLEVEL
to show that you are replacing the previous release with a new one. The version for one of these would look like this: 0.3.32.1
Prereleases
There are two ways to do pre-release versioning. I have a slight preference for the first but I'll mention both:
Make the releases use the version number of the previous release with a MINOR
(or PATCHLEVEL
if it's an #Immature_Version) of .90
. The next prerelease would use .91
. And so on. The advantage of this is that computers can properly parse the order of these compared to the previous release and the next release. The disadvantage is that we are reusing a field that we use for other purposes for this.
The other method is to append a1
for the first alpha release, a2
for the second alpha, b1
for the first beta, and so on. This method is not parsed correctly by computers without giving them knowledge of the pre-release numbering scheme, however, we do have well documented methods of dealing with them. For instance, the Packaging:NamingGuidelines#Non-Numeric_Version_in_Release Prerelease versioning scheme for Fedora rpms. Just be sure that you follow those guidelines when building rpms otherwise you won't be able to upgrade to your final release properly. If you use this strategy, you should also be sure that your numbers and letters comply with the meanings assigned in PEP 386 for versions. (See, if you'd just use the first method, you could avoid having to read all of these standards documents ;-)
Common Development Tasks
Use git flow
We would like to have our upstream repository reflect what's deployed in infrastructure as well as what we are working on at the moment. git flow's ideas of separate branches maps well to this vision. The master branch in git flow should be very close to the code we have deployed in production. It should contain the code from the latest release plus any hotfixes that we've applied over the top (in infrastructure we use a puppet module to deploy the hotfix. In the upstream repo, we can use git flow hotfixes to manage applying the hotfix to the master and develop branch).
Getting started
Quickstart to using git flow.
yum install gitflow
.
Then get clone the repository and initialize it for use:
# First thing when you initially clone a repo $ git clone $URL $ git flow init # This will ask you questions. The defaults are almost always # fine because we've setup the repositories to work with it
You should always work on the 'devel' or 'develop' branch if just doing small maintenance. Changes on develop will get pushed out as the next major release.
New major feature
# Working on a major feature # git flow creates feature branches that are prefixed with feature/ $ git flow feature start my-new-feature # make edits, commit # Share that feature with others for review. This pushes your branch up to github $ git flow feature publish my-new-feature
Go to the github interface and open a pull request. (In the future, perhaps we could package http://defunkt.io/hub/ and get a more automated way to do this step).
Ping someone(s) to review the change for you and +1 on the pull request. We aren't very strict about how thorough a code review must be at this point. The code review is partly to catch errors in the code but it's also to expose new contributors to how our code works. So a perfunctory code review by someone who's new to this infrastructure project has value just as a review by someone who's a better coder than you. If you want the change to be reviewed by someone specific for specific things that you're unsure about, be sure to ping them specifically.
# When you're done.. you can merge it on github with a click, or if you'd like to do so from the CLI $ git flow feature finish my-new-feature
Hotfix
hotfixes end up on both the master and develop branches. They branch off of master.
$ git flow hotfix start fix-traceback-on-login # make some edits $ git flow hotfix publish fix-traceback-on-login
Ping the #fedora-apps channel or specific people to review and +1 your change. Note that small hotfixes are especially good for getting people who aren't core contributors to this code base to start becoming familiar with it whereas core committers might be a little bored with looking at them. If we have new contributors wanting to join, this is a good way to introduce them to the project.
hotfixes sometimes have a time element so remember that we deploy hotfixes in infrastructure independently of when the code is merged upstream. If you're confident of the fix, you may want to commit the hotfix to our production (or staging) deployments in infrastructure's puppet repo and create the hotfix ticket in trac while you're waiting for the hotfix to be reviewed.
# When you're done.. you can merge it from the CLI like this: $ git flow hotfix finish my-new-feature # If you want to do it on github, you need to both merge the pull request to # the develop and the master branches.
Making a release
# Releases end up on the master branch $ git flow release start 0.3.32 $ # make some edits, bump version numbers, commit, release to pypi. $ git flow release finish 0.3.32 $ git push --all $ git push --tags
Further Reading
You can also read this quick overview and a more in depth overview of what the various branches are.
Using Vanilla git
If you're more comfortable using vanilla git without the gitflow addon, the things to be aware of are:
- new commits should go to the devel branch (the default on the repos we set up on github).
- master is used to make releases from and should reflect what's in production (or in an rpm, ready to be pushed to production) at any given time.
- hotfixes have their own branches that need to be merged into both master (for release) and devel.
Removing git branches
When a feature, release, etc has been merged or abandoned we can clean up the old branches. Removing them will make it easier for contributors (both new and old) to see what branches are currently important for the project. The commits that make up the branches will still be around if the changes were merged to devel. It's just that the nice way of referencing them as a separate branch will be gone.
If you merge on the github interface, there will be a button on the pull request "delete this branch". Pushing that will remove the branch.
From the git cli, you can do git push origin :$NAME_OF_FEATURE_BRANCH
For instance, git push origin :feature/timeout-fix
.
Ordinarily, when you use git push
, git figures out where you want to push to (usually origin
for the github server). It then pushes the current local branch to a branch of the same name on the origin. If you were explicit about this, the command line syntax would be: git push origin develop:develop
. When we delete, we're using this explicit syntax but we're telling git that instead of a local branch to the origin, we want to push nothing (thus, having nothing on the left side of the colon).
Deployment
We commonly perform three types of deployment: hotfixes, minor deployments, and major deployments.
Hotfixes are handled in puppet and ansible. That will be written up in a separate section.
The other upgrades are handled via an rpm upgrade, config file updates, and database migrations (depending on how big the changes are). The difference between major and minor deployments are:
- API breakage: If API is known to be broken, it's probably a major update.
- Minor, incremental releases (hopefully these would correlate with not having API breaks :-)
- Leaf services that are not essential to releasing Fedora are minor deployments
- ask, nuancier, elections, easyfix, badges, paste, nuancier
- There's a lot of boderline cases too -- is fedocal essential enough to warrant being under this policy? Since the wiki is used via its API should that fall under this as well?
Major Deployments
- We have to make people aware when a new deployment means API breaks.
- Be clear that the new deployment means API breaks in every call for testing. Send announcements to infrastructure list and depending on the service to devel list.
- Have a separate announcement besides the standard outage notification that says that an API breaking update is planned for $date
- When we set a date for the new deployment, discuss it at least once in a weekly infrastructure meeting.
- See also the solution in #3 below
- It would be really nice for people to do more testing in stg.
- Increase rube coverage. rube does end-to-end testing so it's better at catching cross-app issues where API changes better than unittests which try to be small and self-contained
- A flock session where everyone/dev in infra gets to write one rube test so we get to know the framework
- Run rube daily- Could we run rube in an Xvfb on an infrastructure host?
- Continue to work towards a complete replica of production in the stg environment.
- Increase rube coverage. rube does end-to-end testing so it's better at catching cross-app issues where API changes better than unittests which try to be small and self-contained
- "Mean time to repair is more important than mean time between failure." It seems like anytime there's a major update there's unexpected things that break. Let's anticipate the unexpected happening.
- Explicitly plan for everyone to spend their day firefighting when we make a major new deployment. If you've already found all the places your code is affected and pre-ported it and the deployment goes smoothly then hey, you've got 6 extra working hours to shift back to doing other things. If it's not smooth, then we've planned to have the attention of the right people for the unexpected difficulties that arise.
- As part of this, we need to identify people outside of infrastructure that should also be ready for breakage. Reach out to rel-eng, docs, qa, cvsadmins, etc if there's a chance that they will be affected.
- Buggy code happens. How can we make it happen less?
- More unittests would be good however we know from experience with bodhi that unittests don't catch a lot of things that are changes in behaviour rather than true "bugs". Unexpected API changes that cause people porting pain can be as simple as returning None instead of an empty list which causes a no-op iteration in running code to fail while the unittests survive because they're checking that "no results were returned".
- Pingou has championed making API calls and WebUI calls into separate URL endpoints. I think that coding style makes it easier to control bugs related to updating the webui while trying to preserve the API so we probably want to move to that model as we move onto the next major version of our apps.
- Not returning json-ified versions of internal data structures (like database tables) but instead parsing the results and returning a specific structure would also help divorce internal changes from external API.
- Version our APIs
- What's our goal here? To allow for clients to port ahead of time (Just need server to provide a version and the client checks the version and invokes appropriate code). Or do we want to aid in identifying out what apps are using old api? (Could have clients query for a set of API versions and the server replies with the one it supports. Or could have the whole API behind a version number).
Security Best Practices
- Make yourself familiar with the top ten Web Application vulnerabilities and take measures against them.
- Always use prepared statements for SQL queries or a framework's abstraction that does this
- Only if no input from a user is added to a SQL query or if it is not possible to use prepared statements, build the query yourself. Then let someone review the query and add warnings about this to the source code
- Use a template engine to build web pages that takes care of proper encoding of user input to avoid Cross-Site Scripting
- Handy ressources: Cross-Site Scripting Prevention Cheat Sheet, HTML5 security cheat sheet
- Protect authentication cookies with the flag
HttpOnly
- Enable CSRF protection in your framework
- Follow the REST model, do use GET requests to change state of the application, because GET requests might not be covered by CSRF protection
- Always use HTTPS for web apps that allow to login
- Check that the certificate is valid for the intended/published hostname
- Do not load external JavaScript code
- If you need to do this, do it via HTTPS
- Protect authentication cookies with the flag
Secure
- If the service runs on a server where all web applications should only be accessed via HTTPS enable HTTP Strict Transport Security
- Set the X-Frame-Options header unless your applications needs to be included in a frame
- Whenever someone logs in, create a new session ID and authenticate only the new session ID
- If someone logs out, invalidate the session ID on the server side