= Some speed optimization ideas for yum =
* use Cython to compile one or more of the .py files to .c code and compile them into DSOs
* use PyPy; would require building out a full PyPy stack: an alternative implementation of Python. Last time I looked at the generated .c code, I wasn't comfortable debugging the result (I didn't feel that debugging a crash in the result would be feasible at 3am)
* use Unladen Swallow for Python 2: would require porting the US 2.6 stack to 2.7, and a separate Python stack
* use Unladen Swallow for Python 3: wait until it gets merged (in Python 3.3); port yum to python 3

Using Cython seems to be the least invasive approach.
I've looked at the generated code and it seems debuggable; I'd be able to debug issues arising.
= Notes on Cython =
From upstream, on one simple example: "Simply compiling this in Cython merely gives a 35% speedup. This is better than nothing, but adding some static types can make a much larger difference."
The approach I'm currently thinking of is to simply compile the .py files with Cython, without using any of the non-Python extensions that Cython adds.

This would ensure that yum can continue to be usable without Cython: no non-standard language features.

It might perhaps be appropriate to add targeted optimizations to Cython, so that it does a better job when handling yum as input.
Going further, we could perhaps introduce pragma-style comments to yum sources, e.g.
<pre># cython: isinstance(foo, dict)</pre>
which Cython could take as an optimization hint to generate specialized code for handling the case when foo is a PyDictObject (might be more productive to support exact type matches here, though).
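As a concrete sketch of the idea (the function name and the fast-path behavior described in the comments are hypothetical, not existing Cython behavior):
<pre>
# Hypothetical example of the proposed hint; this is plain Python and
# stays runnable without Cython, since the hint is just a comment.
def count_provides(pkg_map):
    # cython: isinstance(pkg_map, dict)
    # A hint-aware Cython could emit a fast path using PyDict_Next()
    # when pkg_map is exactly a dict, falling back to the generic
    # iteration protocol otherwise.
    total = 0
    for name in pkg_map:
        total += len(pkg_map[name])
    return total
</pre>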
== Example of generated code ==
See [http://dmalcolm.fedorapeople.org/python-packaging/depsolve.html depsolve.html]. You can see the generated .c code by clicking on the yellow-colored .py code. This was generated using the "-a" option to Cython. Note that this was generated using a development copy of Cython, which has support for some lambdas.
Note in particular that the generated C code has comments throughout indicating which line of .py code (in context) it corresponds to: this is important for my own comfort level, in feeling that this is supportable.
== Optimization potential ==
Why would it be faster to simply compile with Cython?
* it avoids the bytecode dispatch loop: the module's operations are "unrolled" directly into libpython API calls, which the C compiler can optimize individually: error handling can be marked as unlikely, and the unrolled .c code should give us better CPU branch prediction. The result should also be more directly amenable to further optimization work: C-level profiling tools such as oprofile would indicate specifically where we're spending time in the .py code.
* it avoids some stack manipulation: temporaries are stored directly as locals within the .c code, and twiddling of ob_refcnt fields can be reduced (see the sketch after this list)
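To make the dispatch-loop and stack-traffic points concrete, here is a small, self-contained illustration (the function is invented for demonstration):
<pre>
import dis

def add_len(a, b):
    return len(a) + len(b)

# Each LOAD_GLOBAL/LOAD_FAST/CALL_FUNCTION/BINARY_ADD line printed below
# is one pass through the interpreter's dispatch loop, with intermediate
# values pushed onto and popped off the value stack; the Cython-generated
# .c code performs the equivalent libpython calls directly, holding
# temporaries in C locals.
dis.dis(add_len)
</pre>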
Using Cython "bakes in" some values for builtins: calls to the builtin "len" are turned directly into calls to PyObject_Length, rather than double-checking each time what the value of __builtins__.len is, and calling it. So this is a semantic difference from regular Python, and some monkey-patching is ruled out, but I think it's a reasonable optimization.
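A minimal sketch of the semantic difference (the function and values are invented for illustration): under CPython, rebinding the builtin changes the behavior of already-defined functions, whereas a Cython-compiled middle() would keep calling PyObject_Length and still return 2:
<pre>
import __builtin__  # Python 2; the module is named "builtins" on Python 3

def middle(seq):
    return seq[len(seq) // 2]

print(middle([1, 2, 3]))        # 2

# Monkey-patch the builtin: interpreted code sees the new binding...
original_len = __builtin__.len
__builtin__.len = lambda seq: 0
print(middle([1, 2, 3]))        # now 1 under CPython
__builtin__.len = original_len  # ...restore the real len
</pre>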
== Pessimization risks ==
* looking at _sort_reqs(), the line
<pre>mapper = {'EQ' : 1, 'LT' : 2, 'LE' : 3, 'GT' : 4, 'GE' : 5,</pre> uses a dictionary which is never modified. The Cython-generated C code rebuilds this dictionary every time the function is called. As it turns out, Python 2.6's peephole optimizer doesn't optimize this either:
<pre>
>>> dis(depsolve.Depsolve._sort_reqs)
811           0 BUILD_MAP                6
              3 LOAD_CONST               1 (1)
              6 LOAD_CONST               2 ('EQ')
              9 STORE_MAP
             10 LOAD_CONST               3 (2)
             13 LOAD_CONST               4 ('LT')
             16 STORE_MAP
             17 LOAD_CONST               5 (3)
             20 LOAD_CONST               6 ('LE')
             23 STORE_MAP
             24 LOAD_CONST               7 (4)
             27 LOAD_CONST               8 ('GT')
             30 STORE_MAP
             31 LOAD_CONST               9 (5)
             34 LOAD_CONST              10 ('GE')
             37 STORE_MAP
</pre>
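One possible source-level fix, sketched below with invented names (the real _sort_reqs() has a different signature), is to hoist the constant dict to module scope, so it is built once at import time rather than on every call, under either CPython or Cython:
<pre>
# Build the mapping once at import time instead of per call.
_FLAG_ORDER = {'EQ': 1, 'LT': 2, 'LE': 3, 'GT': 4, 'GE': 5}

def flag_rank(flag):
    # Dictionary lookup only; no BUILD_MAP on each invocation.
    return _FLAG_ORDER.get(flag, 10)
</pre>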
TODO:
* measure the impact of using a Cython .c build of depsolve.py
** try building with Cython
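For the "try building with Cython" item, a minimal build sketch might look like the following; the module and file names follow the depsolve.py example above, and this assumes the Cython.Distutils build_ext machinery, which can compile plain .py sources:
<pre>
# setup.py: compile depsolve.py as-is into an extension module (DSO).
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup(
    cmdclass={'build_ext': build_ext},
    ext_modules=[Extension('depsolve', ['depsolve.py'])],
)
</pre>
Building with "python setup.py build_ext --inplace" should then produce a depsolve.so that shadows depsolve.py on sys.path, allowing timing comparisons against the pure-Python module.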