From Fedora Project Wiki
Revision as of 16:35, 24 August 2010

Some speed optimization ideas for yum

  • use Cython to translate one or more of the .py files into .c code, and compile that code into DSOs
  • use PyPy, an alternative implementation of Python; this would require building out a full PyPy stack. The last time I looked at the generated .c code, I wasn't comfortable debugging the result (I didn't feel that debugging a crash in it would be feasible at 3am)
  • use Unladen Swallow for Python 2: this would require porting the Unladen Swallow 2.6 stack to 2.7, plus a separate Python stack
  • use Unladen Swallow for Python 3: wait until it is merged (in Python 3.3), then port yum to Python 3

Using Cython seems to be the least invasive approach.

I've looked at the generated code and it seems debuggable; I'd be able to debug any issues that arise.

Example of generated code

See depsolve.html (http://dmalcolm.fedorapeople.org/python-packaging/depsolve.html). Clicking on a yellow-highlighted line of .py code shows the .c code generated for it. The page was produced with Cython's "-a" (annotate) option, using a development copy of Cython.
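
As a rough sketch of the two-step pipeline (Cython translates a .py file into .c, then a C compiler builds a DSO that Python can import), the helper below only assembles the command lines; it does not run them. The helper name `build_commands`, the gcc flags and the output naming are assumptions for illustration, not a record of how depsolve.html was actually produced.

```python
import sysconfig
from pathlib import Path


def build_commands(py_file, annotate=True):
    """Return hypothetical commands that would turn py_file into a DSO.

    Step 1: Cython translates the .py source into .c ("-a" also writes the
    annotated .html view next to the .c file).
    Step 2: gcc compiles that .c into a shared object whose filename suffix
    matches what this interpreter expects for extension modules.
    """
    src = Path(py_file)
    c_file = src.with_suffix(".c")
    # EXT_SUFFIX is e.g. ".cpython-312-x86_64-linux-gnu.so"; fall back to ".so".
    ext_suffix = sysconfig.get_config_var("EXT_SUFFIX") or ".so"
    so_file = src.with_name(src.stem + ext_suffix)
    include_dir = sysconfig.get_paths()["include"]  # directory holding Python.h

    cython_cmd = ["cython"]
    if annotate:
        cython_cmd.append("-a")
    cython_cmd += [str(src), "-o", str(c_file)]

    # Assumed flags: build position-independent code as a shared object.
    gcc_cmd = ["gcc", "-shared", "-fPIC", "-O2", "-I", include_dir,
               str(c_file), "-o", str(so_file)]
    return [cython_cmd, gcc_cmd]


for cmd in build_commands("depsolve.py"):
    print(" ".join(cmd))
```

Keeping the helper side-effect free makes it easy to inspect the commands before running them by hand (or via subprocess) on a machine with Cython and gcc installed.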

Notes on Cython

In theory, compiling with Cython avoids both bytecode dispatch and stack manipulation, and should give us better CPU branch prediction. The result should also be more directly amenable to further optimization work: C-level profiling tools such as oprofile would show specifically where in the .py code we're spending time.
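
To see the bytecode dispatch and stack manipulation in question, CPython's standard dis module can list the instructions the interpreter loop executes for a small function. The function below is a made-up stand-in, not code from depsolve.py, and the exact opcode names vary between CPython versions.

```python
import dis


def total_size(packages):
    # Hypothetical stand-in for a hot loop in depsolve.py.
    total = 0
    for pkg in packages:
        total += len(pkg)
    return total


# Each instruction is one trip through CPython's dispatch loop, and most of
# them push or pop values on the frame's value stack.
opnames = [ins.opname for ins in dis.get_instructions(total_size)]
for name in opnames:
    print(name)
```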

From upstream, on one simple example: "Simply compiling this in Cython merely gives a 35% speedup. This is better than nothing, but adding some static types can make a much larger difference."

Using Cython "bakes in" some values for builtins: calls to the builtin "len" are turned directly into calls to PyObject_Length, rather than looking up the current value of __builtins__.len on every call and then calling it. This is a semantic difference from regular Python, and it rules out some monkey-patching, but I think it's a reasonable optimization.
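
The kind of monkey-patching this rules out can be shown under plain CPython: a function that calls len resolves that name through the module globals and then the builtins on every call, so replacing builtins.len changes its result. This sketch only demonstrates the regular-Python side; it does not run Cython, and the patched value 42 is arbitrary.

```python
import builtins


def count(items):
    # Plain CPython resolves "len" (globals first, then builtins) at call time.
    return len(items)


original_len = builtins.len
builtins.len = lambda obj: 42  # monkey-patch the builtin for every module
try:
    patched = count([1, 2, 3])
finally:
    builtins.len = original_len  # always restore the real builtin

unpatched = count([1, 2, 3])
print(patched, unpatched)  # prints "42 3" under CPython
```

A build that binds len straight to PyObject_Length, as described above, would keep returning the real length even while the patch is in place.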

TODO:

  • measure the impact of using a Cython .c build of depsolve.py
    • try building with Cython
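
For the measurement item above, a minimal harness built on the standard timeit module could time the same call against the pure-Python depsolve.py and against a Cython-built copy. The workload function and the way the two module variants would be imported are hypothetical placeholders; only the timing and ratio logic is shown.

```python
import timeit


def best_time(func, *args, number=100, repeat=5):
    """Best-of-`repeat` wall time, in seconds, for `number` calls of func(*args)."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number))


def speedup(pure_seconds, compiled_seconds):
    """Ratio of pure-Python time to compiled time; > 1.0 means compiled is faster."""
    return pure_seconds / compiled_seconds


def workload(n):
    # Hypothetical stand-in; in practice this would call into depsolve.py.
    return sum(i * i for i in range(n))


pure = best_time(workload, 1_000)
# compiled = best_time(compiled_workload, 1_000)  # hypothetical Cython-built variant
# print(speedup(pure, compiled))
print(f"pure Python: {pure:.4f}s per 100 calls")
```

Taking the minimum over several repeats, rather than the mean, reduces noise from other processes on the machine.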