From Fedora Project Wiki
Revision as of 22:49, 17 June 2014

Coding style

Tools to improve style

You can run pylint over your code once in a while to help catch PEP8 violations. pyflakes and pep8 (the program) are fast enough that you could set up your editor to run them whenever you save a python file (but they don't catch nearly as many things as pylint does).

Minimum Python Version

Except in very specific cases (certain libraries that may need to run on RHEL5 clients), python code should target compatibility with python-2.6 (RHEL6 and all Fedora releases). You should accept patches for compatibility with the latest version of python3, but at this time we are not writing code that runs solely on python3 (it would bloat systems with both python2 and python3, mod_wsgi cannot run for both python2 and python3 on a single machine, etc).

Projects with lower minimum python2 versions:

  • python-kitchen -- This needs to target RHEL5 until RHEL5 goes EOL.
  • python-fedora -- This now targets RHEL6. However, if some people are still communicating with Fedora Services on RHEL5, they may need fixes backported to the 0.3.29.x release.

Common Programming Issues

Centralized logging

Most of our apps use the standard Python logging module, which usually ends up logging to /var/log/httpd/error_log on the app server.

To have your app logs shipped to our central logging server, you can configure the SysLogHandler to do so.

For config-file based logging setups, you can do something like the following:

[logging]

[[handlers]]

[[[syslog_out]]]
class='handlers.SysLogHandler'
args="('/dev/log', handlers.SysLogHandler.LOG_LOCAL4, socket.SOCK_STREAM)"
formatter='full_content'

[[loggers]]

[[[bodhi]]]
level='DEBUG'
qualname='bodhi'
handlers=['syslog_out']
propagate=0

(Note that the socket.SOCK_STREAM argument is only available on python 2.7+ and can be omitted on RHEL6 and earlier.)

Here is an example of doing it in pure Python:

import logging
import logging.handlers

syslogger = logging.getLogger('bodhi')
syslogger.setLevel(logging.DEBUG)
handler = logging.handlers.SysLogHandler(address='/dev/log', facility=logging.handlers.SysLogHandler.LOG_LOCAL4)
syslogger.addHandler(handler)
syslogger.info('Hello SysLog!')

The app logs will then appear in /var/log/hosts/<HOST>/<YEAR>/<MONTH>/<DAY>/apps.log as well as the merged log /var/log/merged/apps.log on our central rsyslog server.
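If the same logging setup has to run on both RHEL6 (python 2.6) and newer hosts, the 2.7-only socktype argument can be added conditionally. A minimal sketch (the handler construction is left commented out because it needs a live /dev/log):

```python
import socket
import sys
import logging.handlers

# Arguments for SysLogHandler; socktype is only added where the running
# interpreter supports it (the parameter appeared in python 2.7).
kwargs = {
    'address': '/dev/log',
    'facility': logging.handlers.SysLogHandler.LOG_LOCAL4,
}
if sys.version_info >= (2, 7):
    kwargs['socktype'] = socket.SOCK_STREAM

# On a host with a syslog daemon listening on /dev/log:
# handler = logging.handlers.SysLogHandler(**kwargs)
```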

Translations

Use transifex to manage translations and add the projects to the Fedora Project Team so the Fedora i18n team can translate.

Do not mark Exception messages for translation. We want those to remain untranslated for several reasons:

  1. It's easier to do a web search for the exceptions if the messages aren't translated.
  2. Translators don't know what all the technical information in exception messages means, so the translations aren't always accurate.
  3. The exception messages aren't intended for end users -- they're there to help other coders with debugging. So it's not very useful to translate them.
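As an illustration (the function and its strings are hypothetical), user-facing strings get the usual _() treatment while exception text is left alone:

```python
from gettext import gettext as _

def rename_user(users, old, new):
    if old not in users:
        # Exception text stays untranslated: it's searchable on the web
        # and unambiguous to other developers reading a traceback.
        raise KeyError('user %r not found' % old)
    users[new] = users.pop(old)
    # Only the message shown to the end user is marked for translation.
    return _('Renamed %(old)s to %(new)s') % {'old': old, 'new': new}
```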

fedmsg

Think about how to add fedmsg hooks into every event that changes data. However, also make fedmsg optional (i.e., the app doesn't fail if fedmsg is not available or installed). This makes it easier to test apps outside of infrastructure and makes us more robust in case fedmsg itself starts failing at some point.
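A sketch of that pattern -- fedmsg.publish() is the real call, but the wrapper and its behaviour on failure are just one way to do it:

```python
try:
    import fedmsg
except ImportError:
    # Running outside of infrastructure without fedmsg installed is fine.
    fedmsg = None

def notify(topic, msg):
    """Publish a fedmsg notification without ever letting a missing or
    broken fedmsg stop the data change that triggered it."""
    if fedmsg is None:
        return False
    try:
        fedmsg.publish(topic=topic, msg=msg)
        return True
    except Exception:
        # In a real app, log the failure here and carry on.
        return False
```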

Libraries to use

  • kitchen: A library of common code that we've needed in Fedora. Lots of code to deal with unicode issues and localization, and a few other things as well. Install it with yum install python-kitchen

Web Frameworks

Use one of Flask, Pyramid, or TurboGears2. Limiting the number of web frameworks we're responsible for will greatly help with our long term maintenance burden.

Version numbering

The version numbering for libraries and applications that provide an API (almost all of our web apps) needs to be much stricter than the version numbering for other applications, but the following can be considered best practices for both.

Version numbers for releases should generally be three incrementing numbers separated by dots ("."). Each number has a name: MAJOR.MINOR.MICRO There is a fourth number that can be added after MICRO called PATCHLEVEL. We'll talk about that in the #Immature_versions section.

The MAJOR number should be incremented anytime there is a backwards incompatible change (examples: deleting methods, removing attributes from the return value of a method, sometimes changing the ordering of a list that is publicly available). The MINOR number should be incremented whenever there's an addition to the API or another change in behaviour that is compatible but especially noteworthy (for instance, adding a method, or adding an entry to a dict that's being returned). The MICRO number should be incremented when a new release only changes internals in a way that isn't visible to the user (bugfixes generally fall into this category).
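These rules can be sketched as a small helper (illustrative only, not part of any of our libraries):

```python
def bump(version, part):
    """Increment MAJOR (part=0), MINOR (part=1), or MICRO (part=2) in a
    'MAJOR.MINOR.MICRO' string, zeroing everything to the right."""
    nums = [int(n) for n in version.split('.')]
    nums[part] += 1
    nums[part + 1:] = [0] * (len(nums) - part - 1)
    return '.'.join(str(n) for n in nums)

print(bump('1.2.3', 0))  # incompatible change -> 2.0.0
print(bump('1.2.3', 1))  # compatible addition -> 1.3.0
print(bump('1.2.3', 2))  # internal bugfix     -> 1.2.4
```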

One thing to note is that there's a small amount of flexibility in this scheme. Some bugfixes change API -- for instance, if a field wasn't supposed to be public but was being returned as part of a dict by mistake, removing it on the grounds that no one should have been using it and only bumping the MINOR number (ordinarily, removals would bump the MAJOR number) may be okay. Err on the side of incrementing numbers: it's better for consumers of libraries to be warned by a version number bump to check their code and find that the change does not affect them than to not be warned of a change that breaks things for them.

Immature Versions

Much of our code can be considered immature. We release relatively frequently and may be updating the API often. For these pieces of software we should still adhere to a versioning scheme to help consumers know that changes to the API they are consuming may be coming. The scheme for those is very similar to the MAJOR.MINOR.MICRO one given above, but shifted over one position. Start by setting the MAJOR number to zero and then use MINOR for the same purpose as MAJOR in the previous section. Releases without API-incompatible changes can bump MICRO, whether they contain forwards compatible changes to the API or just bugfixes. In the special case where a newly released version is found (usually within a few hours to one day) to have a defect that must be fixed, add a fourth number, PATCHLEVEL, to show that you are replacing the previous release with a new one. The version for one of these would look like this: 0.3.32.1

Prereleases

There are two ways to do pre-release versioning. I have a slight preference for the first but I'll mention both:

Make the prereleases use the version number of the previous release with a MINOR (or PATCHLEVEL if it's an #Immature_Version) of .90. The next prerelease would use .91, and so on. The advantage of this is that computers can properly parse the order of these compared to the previous release and the next release. The disadvantage is that we are reusing a field for this that we normally use for other purposes.

The other method is to append a1 for the first alpha release, a2 for the second alpha, b1 for the first beta, and so on. This method is not parsed correctly by computers without giving them knowledge of the pre-release numbering scheme; however, we do have well documented methods of dealing with them. For instance, the Packaging:NamingGuidelines#Non-Numeric_Version_in_Release prerelease versioning scheme for Fedora rpms. Just be sure that you follow those guidelines when building rpms, otherwise you won't be able to upgrade to your final release properly. If you use this strategy, you should also be sure that your numbers and letters comply with the meanings assigned in PEP 386 for versions. (See, if you'd just used the first method, you could have avoided reading all of these standards documents ;-)
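The ordering advantage of the first method is easy to demonstrate: purely numeric versions compare correctly as plain tuples, while a1/b1 suffixes would need scheme-aware parsing:

```python
def as_tuple(version):
    # Only handles purely numeric versions; an 'a1' or 'b1' component
    # would raise ValueError here, which is exactly the parsing problem
    # the second method introduces.
    return tuple(int(part) for part in version.split('.'))

# A 0.3.32.90 prerelease sorts between the 0.3.32 release and the
# eventual 0.4.0 release:
assert as_tuple('0.3.32') < as_tuple('0.3.32.90') < as_tuple('0.4.0')
```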

Common Development Tasks

Use git flow

We would like to have our upstream repository reflect what's deployed in infrastructure as well as what we are working on at the moment. git flow's ideas of separate branches map well to this vision. The master branch in git flow should be very close to the code we have deployed in production. It should contain the code from the latest release plus any hotfixes that we've applied over the top (in infrastructure we use a puppet module to deploy the hotfix; in the upstream repo, we can use git flow hotfixes to manage applying the hotfix to the master and develop branches).

Getting started

Quickstart to using git flow.

yum install gitflow.

Then clone the repository and initialize it for use:

# First thing when you initially clone a repo
$ git clone $URL
$ git flow init
# This will ask you questions.  The defaults are almost always
# fine because we've set up the repositories to work with it

You should always work on the 'devel' or 'develop' branch if just doing small maintenance. Changes on develop will get pushed out as the next major release.

New major feature

# Working on a major feature
# git flow creates feature branches that are prefixed with feature/

$ git flow feature start my-new-feature
# make edits, commit

# Share that feature with others for review.  This pushes your branch up to github
$ git flow feature publish my-new-feature

Go to the github interface and open a pull request. (In the future, perhaps we could package http://defunkt.io/hub/ and get a more automated way to do this step).

Ping someone (or several someones) to review the change for you and +1 the pull request. We aren't very strict about how thorough a code review must be at this point. The code review is partly to catch errors in the code, but it's also to expose new contributors to how our code works. So a perfunctory code review by someone who's new to this infrastructure project has value, just as a review by someone who's a better coder than you does. If you want the change to be reviewed by someone specific for specific things that you're unsure about, be sure to ping them directly.

# When you're done.. you can merge it on github with a click, or if you'd like to do so from the CLI
$ git flow feature finish my-new-feature

Hotfix

hotfixes end up on both the master and develop branches. They branch off of master.

$ git flow hotfix start fix-traceback-on-login
# make some edits
$ git flow hotfix publish fix-traceback-on-login

Ping the #fedora-apps channel or specific people to review and +1 your change. Note that small hotfixes are especially good for getting people who aren't core contributors to this code base to start becoming familiar with it whereas core committers might be a little bored with looking at them. If we have new contributors wanting to join, this is a good way to introduce them to the project.

hotfixes sometimes have a time element so remember that we deploy hotfixes in infrastructure independently of when the code is merged upstream. If you're confident of the fix, you may want to commit the hotfix to our production (or staging) deployments in infrastructure's puppet repo and create the hotfix ticket in trac while you're waiting for the hotfix to be reviewed.

# When you're done.. you can merge it from the CLI like this:
$ git flow hotfix finish fix-traceback-on-login

# If you want to do it on github, you need to both merge the pull request to
# the develop and the master branches.

Making a release

# Releases end up on the master branch
$ git flow release start 0.3.32
$ # make some edits, bump version numbers, commit, release to pypi.
$ git flow release finish 0.3.32
$ git push --all
$ git push --tags

Further Reading

You can also read this quick overview and a more in-depth overview of what the various branches are.

Using Vanilla git

If you're more comfortable using vanilla git without the gitflow addon, the things to be aware of are:

  • new commits should go to the devel branch (the default on the repos we set up on github).
  • master is used to make releases from and should reflect what's in production (or in an rpm, ready to be pushed to production) at any given time.
  • hotfixes have their own branches that need to be merged into both master (for release) and devel.

Removing git branches

When a feature, release, etc has been merged or abandoned we can clean up the old branches. Removing them will make it easier for contributors (both new and old) to see what branches are currently important for the project. The commits that make up the branches will still be around if the changes were merged to devel. It's just that the nice way of referencing them as a separate branch will be gone.

If you merge on the github interface, there will be a button on the pull request "delete this branch". Pushing that will remove the branch.

From the git cli, you can do git push origin :$NAME_OF_FEATURE_BRANCH. For instance, git push origin :feature/timeout-fix.

Ordinarily, when you use git push, git figures out where you want to push to (usually origin for the github server). It then pushes the current local branch to a branch of the same name on the origin. If you were explicit about this, the command line syntax would be: git push origin develop:develop. When we delete, we're using this explicit syntax but we're telling git that instead of a local branch to the origin, we want to push nothing (thus, having nothing on the left side of the colon).

Deployment

1: We have to make people aware when a new deployment means API breaks.

 * Be clear that the new deployment means API breaks in every call for
   testing.  Send announcements to infrastructure list and depending on the
   service to devel list.
 * Have a separate announcement besides the standard outage notification
   that says that an API breaking update is planned for $date
 * When we set a date for the new deployment, discuss it at least once in
   a weekly infrastructure meeting.
 * See also the solution in #3 below

2: It would be really nice for people to do more testing in stg.

 * Increase rube coverage.  rube does end-to-end testing so it's better at
   catching cross-app issues caused by API changes than unittests, which
   try to be small and self-contained
   - A flock session where everyone in infra gets to write one rube
     test so we get to know the framework
 * Run rube daily
   - Could we run rube in an Xvfb on an infrastructure host?
 * Continue to work towards a complete replica of production in the stg
   environment.

3: "Mean time to repair is more important than mean time between failure."

  It seems like anytime there's a major update there are unexpected things
  that break.  Let's anticipate the unexpected happening.
 * Explicitly plan for everyone to spend their day firefighting when we
   make a major new deployment.  If you've already found all the places
   your code is affected and pre-ported it and the deployment goes smoothly
   then hey, you've got 6 extra working hours to shift back to doing other
   things.  If it's not smooth, then we've planned to have the attention of
   the right people for the unexpected difficulties that arise.
 * As part of this, we need to identify people outside of infrastructure
   that should also be ready for breakage.  Reach out to rel-eng, docs, qa,
   cvsadmins, etc if there's a chance that they will be affected.

4: Related to the FAS release: Buggy code happens.  How can we make it
  happen less?
 * More unittests would be good however we know from experience with bodhi
   that unittests don't catch a lot of things that are changes in behaviour
   rather than true "bugs".  Unexpected API changes that cause people
   porting pain can be as simple as returning None instead of an empty list
   which causes a no-op iteration in running code to fail while the
   unittests survive because they're checking that "no results were
   returned".
 * Pingou has championed making API calls and WebUI calls into separate
   URL endpoints. I think that coding style makes it easier to control
   bugs related to updating the webui while trying to preserve the API, so
   we probably want to move to that model as we move onto the next major
   version of our apps.
 * Not returning json-ified versions of internal data structures (like
   database tables) but instead parsing the results and returning
   a specific structure would also help divorce internal changes from
   the external API.

What should we apply this to?

  • Probably can skip if:
 - Things that we don't think have API breaks
 - Things that are minor releases (hopefully these would correlate with not having API breaks :-)
 - Leaf services that are not essential to releasing Fedora.
   + ask, nuancier, elections, easyfix, badges, paste
   + There are a lot of borderline cases too -- is fedocal essential enough
     to warrant being under this policy?  Since the wiki is used via its
     API, should that fall under this as well?

Security Best Practices

  • Make yourself familiar with the top ten Web Application vulnerabilities (the OWASP Top Ten) and take measures against them.
  • Always use prepared statements for SQL queries, or a framework's abstraction that does this
    • Only if no input from a user is added to a SQL query, or if it is not possible to use prepared statements, build the query yourself. Then have someone review the query and add warnings about this to the source code
  • Use a template engine to build web pages that takes care of proper encoding of user input to avoid Cross-Site Scripting
  • Enable CSRF protection in your framework
    • Follow the REST model; do not use GET requests to change the state of the application, because GET requests might not be covered by CSRF protection
  • Always use HTTPS for web apps that allow logging in
    • Check that the certificate is valid for the intended/published hostname
    • Do not load external JavaScript code
      • If you need to do this, do it via HTTPS
    • Protect authentication cookies with the Secure flag
    • If the service runs on a server where all web applications should only be accessed via HTTPS, enable HTTP Strict Transport Security
  • Set the X-Frame-Options header unless your application needs to be included in a frame
  • Whenever someone logs in, create a new session ID and authenticate only the new session ID
  • If someone logs out, invalidate the session ID on the server side
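To illustrate the prepared-statement rule with python's stdlib sqlite3 (the table and values here are made up):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (name TEXT, email TEXT)')
conn.execute("INSERT INTO users VALUES ('bob', 'bob@example.com')")

# Untrusted input is bound as a parameter, never interpolated into the
# SQL text, so it cannot change the structure of the query.
evil = "bob' OR '1'='1"
rows = conn.execute('SELECT email FROM users WHERE name = ?', (evil,)).fetchall()
print(rows)  # [] -- the injection attempt matches no row

rows = conn.execute('SELECT email FROM users WHERE name = ?', ('bob',)).fetchall()
print(rows)  # [('bob@example.com',)]
```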