Revision as of 16:35, 24 August 2010
Some speed optimization ideas for yum
- use Cython to compile one or more of the .py files to .c code and compile them into DSOs
- use PyPy, an alternative implementation of Python; this would require building out a full PyPy stack. Last time I looked at the generated .c code, I wasn't comfortable debugging it (I didn't feel that debugging a crash in it would be feasible at 3am)
- use Unladen Swallow for Python 2: would require porting the US 2.6 stack to 2.7, and a separate Python stack
- use Unladen Swallow for Python 3: wait until it gets merged (in Python 3.3); port yum to Python 3
Using Cython seems to be the least invasive approach.
I've looked at the generated code and it seems debuggable; I'd be able to debug issues arising.
Example of generated code
See depsolve.html (http://dmalcolm.fedorapeople.org/python-packaging/depsolve.html). Clicking a yellow-highlighted line of .py code shows the .c code generated for it. The page was produced with Cython's "-a" option, using a development copy of Cython.
Notes on Cython
In theory, compiling with Cython avoids both bytecode dispatch and stack manipulation, and should give us better CPU branch prediction. The result should also be more directly amenable to further optimization work: C-level profiling tools such as oprofile would indicate specifically where we're spending time in the .py code.
From upstream, on one simple example: "Simply compiling this in Cython merely gives a 35% speedup. This is better than nothing, but adding some static types can make a much larger difference."
Using Cython "bakes in" some values for builtins: calls to the builtin "len" are turned directly into calls to PyObject_Length, rather than looking up the current value of __builtins__.len on every call and calling that. This is a semantic difference from regular Python, and it rules out some monkey-patching, but I think it's a reasonable optimization.
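To make the semantic difference concrete, here is a minimal sketch of the behaviour in regular (interpreted) Python, where "len" is looked up at call time. It is written for Python 3, so it uses the "builtins" module in place of __builtins__; the names count_items and demo are hypothetical and not taken from yum. A Cython-compiled version of count_items would be expected to ignore the patch, since the call would go straight to PyObject_Length.

```python
import builtins


def count_items(seq):
    # In regular CPython, "len" is resolved on every call:
    # module globals first, then the builtins module.
    return len(seq)


def demo():
    """Return the results before, during and after patching builtins.len."""
    original_len = builtins.len
    before = count_items([1, 2, 3])
    try:
        # Monkey-patch the builtin; the interpreted function sees the change.
        builtins.len = lambda seq: -1
        patched = count_items([1, 2, 3])
    finally:
        # Always restore the real builtin.
        builtins.len = original_len
    after = count_items([1, 2, 3])
    return before, patched, after
```

In the interpreter, demo() shows the patched value in the middle position; any code relying on that kind of patching is what the Cython build would break.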
TODO:
- measure the impact of using a Cython .c build of depsolve.py
  - try building with Cython
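One possible way to do the measurement is a small timing harness built on the standard library's timeit module. This is only a sketch: best_time, speedup and sample_workload are hypothetical names, and sample_workload is a stand-in for a real depsolve run, which would need a yum test setup not shown here.

```python
import timeit


def best_time(func, repeat=5, number=1000):
    """Return the best per-call time in seconds over several repeats."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    # The minimum is the least noisy estimate of the achievable time.
    return min(timings) / number


def speedup(baseline, candidate, repeat=5, number=1000):
    """Return how many times faster candidate runs than baseline."""
    return best_time(baseline, repeat, number) / best_time(candidate, repeat, number)


def sample_workload():
    # Hypothetical stand-in for the code under test (e.g. a depsolve run).
    return sum(i * i for i in range(1000))
```

The idea would be to pass the same workload twice, once calling into the plain .py module and once calling into the Cython-built DSO, and compare the result of speedup() against the 35% figure quoted from upstream.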