Revision as of 16:35, 24 August 2010

Some speed optimization ideas for yum

use Cython to compile one or more of the .py files to .c code, then compile those into DSOs
use PyPy, an alternative implementation of Python: would require building out a full PyPy stack. Last time I looked at the generated .c code, I wasn't comfortable debugging the result (I didn't feel that debugging a crash in it would be feasible at 3am)
use Unladen Swallow for Python 2: would require porting the US 2.6 stack to 2.7, and a separate Python stack
use Unladen Swallow for Python 3: wait until it gets merged (in Python 3.3), then port yum to Python 3

Using Cython seems to be the least invasive approach.

I've looked at the generated code and it seems debuggable; I'd be able to debug issues arising.

Example of generated code

See depsolve.html (http://dmalcolm.fedorapeople.org/python-packaging/depsolve.html). You can see the generated .c code by clicking on the yellow-colored .py code. This was generated using the "-a" option to Cython, with a development copy of Cython.

Notes on Cython

In theory this avoids both bytecode dispatch and stack manipulation, and should give us better CPU branch prediction. The result should also be more directly amenable to further optimization work: C-level profiling tools such as oprofile would indicate specifically where we're spending time in the .py code.
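To make "bytecode dispatch and stack manipulation" concrete, here is a minimal sketch that lists the bytecode instructions the interpreter steps through for a small loop. The function total_length is a hypothetical stand-in, not code from yum, and the sketch assumes Python 3's dis module; the exact opcode names vary between Python versions.

```python
import dis


def total_length(items):
    # Hypothetical stand-in for a hot loop; not taken from yum.
    total = 0
    for item in items:
        total += len(item)
    return total


# Each instruction below is dispatched by the interpreter's main loop
# and pushes or pops values on the evaluation stack.
instructions = list(dis.get_instructions(total_length))
opnames = [ins.opname for ins in instructions]

for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)
```

A compiled build of the same function replaces this instruction stream with straight-line C calls into the Python runtime, which is the saving the paragraph above describes.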

From upstream, on one simple example: "Simply compiling this in Cython merely gives a 35% speedup. This is better than nothing, but adding some static types can make a much larger difference."

Using Cython "bakes in" some values for builtins: calls to the builtin "len" are turned directly into calls to PyObject_Length, rather than double-checking each time what the value of __builtins__.len is and then calling it. So this is a semantic difference from regular Python, and some monkey-patching is ruled out, but I think it's a reasonable optimization.
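The semantic difference can be seen from the interpreted side with a small sketch. It assumes Python 3, where the module is named builtins (Python 2 calls it __builtin__); count_items and patched_len are hypothetical names used only for illustration.

```python
import builtins


def count_items(seq):
    # In interpreted Python, "len" is looked up on every call:
    # module globals first, then the builtins module.
    return len(seq)


def patched_len(obj):
    # Hypothetical replacement, used only for this demonstration.
    return -1


original_len = builtins.len
before = count_items([1, 2, 3])

builtins.len = patched_len
try:
    # The interpreted function now sees the monkey-patched builtin.
    during = count_items([1, 2, 3])
finally:
    builtins.len = original_len

after = count_items([1, 2, 3])
print(before, during, after)
```

Run under the regular interpreter, the middle call returns the patched value. Per the note above, a Cython-compiled count_items would be expected to ignore the patch, because the call to len has already been turned into a direct C call.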

TODO:

measure the impact of using a Cython .c build of depsolve.py
- try building with Cython
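One way to approach the measurement is a small timing harness run once against the plain .py module and once against the compiled build. This is only a sketch: sample_workload is a hypothetical placeholder for a real call into the depsolve code, and best_time is a helper name introduced here.

```python
import timeit


def best_time(func, repeat=5, number=100):
    """Return the best per-call time in seconds over several runs."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number


def sample_workload():
    # Hypothetical stand-in; replace with a call into the code under test.
    return sum(len(str(i)) for i in range(1000))


baseline = best_time(sample_workload)
print("best per-call time: %.6f s" % baseline)
```

Taking the minimum over several repeats reduces noise from other processes, which matters when the expected difference is a few tens of percent.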


DaveMalcolm/YumOptimizations: Difference between revisions
