Systemtap Static Probes
WORK IN PROGRESS: Currently a copy of Features/SystemtapStaticProbes, will turn into a new feature page for f14.
Summary
Systemtap allows event tracing of programs when they have static probes inserted. This allows for tracing specifics of an application on a higher level that is meaningful to the application user so they don't have to know the exact source code details for tracing what is happening.
Owner
- Name: Mark Wielaard
- email: mjw@redhat.com
Current status
- Targeted release: Fedora 13
- Last updated: 11 Feb 2009
- Percentage of completion: 75%, in a testable state for java, python and postresql, needs work for other targets (could reduce scope to just those ready).
- systemtap-sdt-devel 1.1 is now available.
- java (since 1:1.6.0-21.b16 and since 34.b17 also jni and jstack support), postgresql (since 8.3.6-4) and python (2.6.4-19) have had static probes enabled.
- Tracking bug: https://bugzilla.redhat.com/show_bug.cgi?id=546295
- See under scope for individual package status.
TODO
- Some experiments showed some greater than expected impact on performance in the dormant case. So for each package identified we need to do some benchmarking to make sure there is zero overhead in to normal case. (Add any relevants benchmarks to the scopes section under the relevant package please).
- elfutils cfi reading bugs https://bugzilla.redhat.com/show_bug.cgi?id=563528 - Fixed upstream
- semantic error: No cfa_ops supplied, but needed by DW_OP_call_frame_cfa https://bugzilla.redhat.com/show_bug.cgi?id=556944 - Fixed upstream
DONE
- Identified deficiencies upstream:
* [PR10013] support ENABLED sdt probe macro - Fixed in 1.1 * [PR10601] user-space deref/registers in loc2c (i386 mainly, but could affect x86_64 and other arches also) - Fixed in 1.1
Detailed Description
By packaging a new version of systemtap, that enables programs that already have static dtrace probe markers in their sources and by making those packages build depend on the new systemtap-sdt-devel package and recompiling them with probe points enabled, users of those packages will be able to trace any high level events that these packages provide.
Benefit to Fedora
It will be easier for developers and users to observe what is really happening on their system on a higher (application) level.
Scope
- Work with upstream to identify any issues with the new capabilities while we activate probes in packages.
- Package new version of Systemtap (including new subpackage systemtap-sdt-devel).
- Identify packages that already include static user probes (see below)
- Work with package maintainer to enable them in the Fedora build spec file.
- Add documentation on enabled probes and how to use them with a systemtap tapset.
Currently identified packages:
postgresql
Tracking bug: https://bugzilla.redhat.com/show_bug.cgi?id=488941 Already able to build something that works with current rpm. Documentation: upstream docs Example: example trace Screencast: video presentation
xorg-x11-server
- Need tweaks to systemtap to gen proper header from .d file (should be fixed with systemtap 1.1 package).
- Add patch from upstream docs
java-1.6.0-openjdk
Tracking bug: https://bugzilla.redhat.com/show_bug.cgi?id=498109
- upstream docs
- Static probes ready, plus hotspot tapset, jni tapset and java backtraces all done..
tcl
Tracking bug: https://bugzilla.redhat.com/show_bug.cgi?id=489017
- tcl-8.4.16+: Has a single generic/tclDTrace.d file.
- upstream docs
- Needs implementation of probe_ENABLED(), which has been implemented. http://sourceware.org/bugzilla/show_bug.cgi?id=10013
Firefox
Tracking bug: https://bugzilla.redhat.com/show_bug.cgi?id=496113
- firefox-3.1-0.6 (upstream docs) - rajan
- Profile Firefox using SystemTap on DTrace probes: ProfilingFirefox
mysql
- mysql 6.0.8: really new alpha version from mysql.com has probes. However, the version in fedora 5.0.67 doesn't. A backport would be required. Won't be in next fedora
apachetop/httpd
- apachetop-0.12.6 (newer httpd have patches directly [1] - note needs some build tweaking)
Perl
perl-5.10.0: implemented as an out-of-tree patch to the core (5.10.1 has some, see INSTALL)
Python
We're tracking our Python work in our downstream bugzilla as bug 545179.
- DONE: Our Python 2 and Python 3 builds contain: (from python-2.6.4-19.fc13 and python3-3.1.1-25.fc13 onwards)
- a tapset providing these probepoints:
python.function.entry
python.function.return
- built with systemtap patches that add the static markers that implement the above to the libpython2.6 and libpython3.1 shared libraries
- contains an example of usage added to docs in the python-libs and python3-libs subpackages, logging all Python function calls/return hierarchically across the whole system or for one process
- dmalcolm has tested the example script on a rawhide box and verified that it works on i686 for both Python runtimes.
- dmalcolm has done initial testing of the performance of the python 2 patch using the Unladen Swallow benchmark suite, initial indications suggest the patch we're using imposes negligible performance cost for the case when the probe points are compiled in but aren't in active use
- a tapset providing these probepoints:
- TODO:
- Double-check generated machine code
- Test with and without probes, on both architectures, with both python 2 and python 3, and with multilib installs on 64-bit
- More documentation
- Send this work upstream (we have taken an out-of-tree patch to the core adding DTrace static markers (upstream RFE 4111), reworked the patch to enable it to work with SystemTap, fixed a performance issue, added a tapset to make the markers easy to use, and written an example script that uses the resulting probe points, and ported the patch to python 3).
- Address error handling within the Python 3 probe.
- Ideas for additional probe points:
- function calls/returns (this is what the dtrace probe has)
- GIL events: instrument the raw function to claim/release of the lock, then capture the times at which it happens, then render stats
- threads starting/stopping
- bytecode execution metrics: e.g. trace individual bytecodes; how often does LOAD_GLOBAL get invoked
- exceptions being thrown
- exceptions being handled (e.g. for tracking down exactly where code is "swallowing" an error)
- unhandled exceptions
- arenas being claimed/freed
- dictionaries switching to inefficient form: http://lewk.org/blog/python-dictionary-optimizations
- _warnings.c: do_warn() (e.g. whole-system python3 warnings for all python 2 running on your system)
- py-level backtraces
See also Mark's blog post about our initial work on Python/SystemTap at FUDCon Toronto
PHP
- php-5.2.8: implemented as an add-on module (shared library listed in /etc/php.ini) that intercepts internal php interpreter function pointers and wraps those calls with markers
Ruby
- ruby-1.8.6-p287: similar to php, but richer & far more complicated
See http://ruby-dtrace.rubyforge.org/
Haskell
- ghc very recently got some static probe support see http://hackage.haskell.org/trac/ghc/wiki/DTrace (not tried to build yet).
- darcs patch
Notes
It seems as if several of the above were dtrace-instrumented in code that was never merged into the upstream versions of the package, but instead represented as run-time add-ons or private patches for Solaris distributions. Disappointing, but perhaps we can do better and engage the respective upstream teams. This will of course take time and panache.
At least the patches tend to be very small so we have some freedom to choose between approaches (adding STAP_PROBE/whatever hooks directly to the core upstream code; or fedora local patches; or add-on shared libraries like for php/httpd).
Another approach worth considering is adding tapsets that map process.mark() events to process.function/statement() to approximate the dtrace out-of-tree patches.
How To Test
Whether systemtap and static markers are working in general can be tested by installing systemtap, kernel-debuginfo and the systemtap-testsuite. Running sudo make installcheck in /usr/share/systemtap/testsuite
When applications get static markers enabled we should add them to a testing page listing:
- Package install instructions.
- Setup and sample run of the application
- A reference to the probe names.
- And an simple example stap invocation listing markers that can be enabled.
Question: Is there a convention/template for adding such test pages for test days?
Answer: https://fedoraproject.org/wiki/QA/Test_Days/Create
User Experience
For packages that have static probes enabled users will be able to trace high-level events, like for example database transactions, through stap.
Dependencies
- A new version of systemtap with the systemtap-sdt-devel subpackage.
- Any package wishing to expose existing probes in its (upstream) sources depending on systemtap-sdt-devel and adding an --enable-dtrace or equivalent to its spec file.
- Not required for this feature, but since systemtap uses debuginfo, this feature is enhanced by the Debug Info Revamp Feature.
Contingency Plan
Even if all the tracing will not work, packages that are converted to provide static probes will not be impacted since the probe points have (near) zero overhead, so in the worse case some packages were recompiled to enable the feature, but users will still not be able to use it.
Documentation
The upstream wiki is the best description for now http://sourceware.org/systemtap/wiki/UsingStaticUserMarkers the systemtap list has an example on converting a package http://sourceware.org/ml/systemtap/2009-q1/msg00140.html
While working on this feature this section will be expanded to list packages that have probe points enabled and pointers to (upstream) package documentation on the probe names and semantics like for postgresql http://www.postgresql.org/docs/8.2/static/dynamic-trace.html
Python
The following Systemtap probe points have been added to Fedora 13's Python 2 and Python 3 packages:
Probe point | Parameters | Overview | Example of usage |
---|---|---|---|
python.function.entry |
|
Indicates that execution of a Python function has begun |
stap \ -e'probe python.function.entry {log(filename);}' \ -c yum help |
python.function.return |
|
Indicates that the Python runtime has returned from a function | Probing modules visited as the python runtime starts up: stap \ -e'probe python.function.return {log(filename);}' \ -c "python -c 'pass'" |
A sample script that uses these probe points to show the hierarchy of function calls and returns within a python process (or across the whole system) has been added to the python-libs
and python3-libs
subpackages.
Here's an example of running it (in verbose mode) to trace what happens during the invocation of a python script (the yum
tool, as it happens).
# stap -v /usr/share/doc/python-libs-2.6.4/systemtap-example.stp -c yum help Pass 1: parsed user script and 66 library script(s) using 20224virt/12248res/2040shr kb, in 150usr/10sys/160real ms. Pass 2: analyzed script: 2 probe(s), 14 function(s), 2 embed(s), 2 global(s) using 25184virt/14572res/3380shr kb, in 20usr/0sys/14real ms. Pass 3: using cached /root/.systemtap/cache/5a/stap_5a80297603ac4434b77b22e6f4127f00_5903.c Pass 4: using cached /root/.systemtap/cache/5a/stap_5a80297603ac4434b77b22e6f4127f00_5903.ko Pass 5: starting run. 0 yum(23287): => <module> in /usr/lib/python2.6/site.py:59 439 yum(23287): => <module> in /usr/lib/python2.6/os.py:22 1021 yum(23287): => <module> in /usr/lib/python2.6/posixpath.py:11 1146 yum(23287): => <module> in /usr/lib/python2.6/stat.py:4 1163 yum(23287): <= <module> in /usr/lib/python2.6/stat.py:94 1272 yum(23287): => <module> in /usr/lib/python2.6/genericpath.py:5 1292 yum(23287): <= <module> in /usr/lib/python2.6/genericpath.py:85 1483 yum(23287): => <module> in /usr/lib/python2.6/warnings.py:1 1677 yum(23287): => <module> in /usr/lib/python2.6/linecache.py:6 1698 yum(23287): <= <module> in /usr/lib/python2.6/linecache.py:68 (etc)
The Python 3 version of the probe point sends the strings back to SystemTap in UTF-8 encoding. For example, if you create a python script with an "interesting" name:
# cat print("Yaarrr!") > ☠.py # cat ☠.py print("Yarrrrr!") # stap -v /usr/share/doc/python3-libs-3.1.1/systemtap-example.stp -c "python3 ☠.py" (copious output snipped) 0 python3(28262): => <module> in ☠.py:1 22 python3(28262): <= <module> in ☠.py:1
Release Notes
Systemtap has been extended to support user space tracing, and in particular to support static (dtrace compatible) markers enabled in various programs in Fedora 12. This enables users, developers and administrators a high level overview of what is going on with their system or deep down in a specific program or subsystem.
Systemtap comes with a tutorial, a language reference manual, a tapsets reference and an examples directory under /usr/share/doc/systemtap-?.?/
- TODO: Should have a list of which packages were enabled with markers when finished.
Comments and Discussion