From Fedora Project Wiki

Revision as of 09:44, 17 July 2009 by Mjw (talk | contribs) (External links don't use double brackets.)


Systemtap Tracing Refresh

Summary

A new and improved systemtap with much better documentation, examples and tools. It is updated to take advantage of modern gcc debuginfo (DWARF) output and kernel tracepoints, and it provides a static user space marker implementation for developers who want to expose high level tracing events in their applications.

Owner

  • email: mjw@redhat.com

Current status

  • Targeted release: Fedora 12
  • Last updated: July 17 2009
  • Percentage of completion: 75%

Detailed Description

This feature packages a new version of systemtap that is tuned for updated gcc debuginfo output and kernel tracepoints, and that comes with better examples, tools and development extensions that enable programmers to embed static probe markers in their sources. As a result, Fedora users will have much better observability of their whole system.

Benefit to Fedora

It will be easier for developers and users to observe what is really happening on their system.

Scope

Most of this work has been done upstream and by coordinating with the gcc and kernel maintainers for better debuginfo output and more tracepoints. Specific improvements that will be delivered through this feature are:

  • Better support for F12 GCC, mainly the new, improved, more compact and more accurate debuginfo output. This is basically done now except for some bug fixing here and there, and it needs coordination with elfutils for a new release.
  • Better support for F12 Kernel (2.6.31). The test results already look pretty OK. We are still bug hunting some stuff, but I don't believe anything really nasty is blocking.
  • Better kernel tracepoint support and, with thanks to Will, lots more documentation on the various tracepoints in the kernel.
  • Support for the unified kernel trace buffer.
  • Lots of passes are faster now, which especially helps those who want a quick list of the probes that can be set.
  • Simple GUI client to visualize some standard tracing/profiling issues. Open question: is it already enabled in the standard rpm spec package build?
  • Eclipse Systemtap plugin work.
  • User space backtracing and a more accurate kernel dwarf unwinder. We did provide that in an update for F11, but it is more robust now, although there is still some work to be done to make it really easy, reliable and intuitive to use.
  • Dwarfless syscall probing for kernel-debuginfo-less setups.
  • Module signing and a refresh of the systemtap-client-server setup (Dave is writing a whitepaper about it).
  • User space probing is much more robust and accurate (especially in the face of prelinking, separate debuginfo and 32-on-64 executable quirks), with support for symbol aliases, etc. Some more testing and bugfixing is still needed, especially for C++ code. (Mark is writing a whitepaper about it.)
  • More tapset functions, more examples, more documentation and some extensions to stap language constructs.

This feature will also be the basis for adding more static probing to Fedora packages in general. Some packages already have those enabled (java, postgresql) and we will coordinate with their maintainers to make those probes work seamlessly with the other systemtap improvements. Integrating more static probes into other packages is outside the scope of this feature, though; that will be done through the Systemtap Static Probes Feature.
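
To illustrate what a static user space marker looks like from the application developer's side, here is a minimal sketch. It assumes the sys/sdt.h header that systemtap's sdt development package is expected to provide, and it falls back to a no-op macro when that header is not installed. The function handle_request, the provider name myapp and the probe name request_done are hypothetical names chosen only for this example.

```c
/*
 * Sketch: embedding a static user space marker in an application.
 * Assumption: <sys/sdt.h> is available (shipped with systemtap's sdt
 * development files). If it is not, the marker compiles to a no-op.
 */
#if defined(__has_include)
#  if __has_include(<sys/sdt.h>)
#    include <sys/sdt.h>
#  endif
#endif

#ifndef DTRACE_PROBE2
/* Header missing: compile the marker away so the example still builds. */
#  define DTRACE_PROBE2(provider, name, arg1, arg2) \
     do { (void)(arg1); (void)(arg2); } while (0)
#endif

/* Hypothetical application function: returns twice the request size. */
int handle_request(int id, int size)
{
    int result = size * 2;

    /* High level event: request 'id' finished with 'result'.
     * "myapp" and "request_done" are illustrative names. */
    DTRACE_PROBE2(myapp, request_done, id, result);

    return result;
}
```

With the real header in place, a stap script should be able to attach to this marker with a probe point of the form process("PATH").mark("request_done") and read the two values as $arg1 and $arg2, where PATH stands for wherever the binary is installed.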

How To Test

Whether systemtap works in general with the kernel or with a user space application can be tested by installing systemtap together with kernel-debuginfo and/or the application's debuginfo. There is also the systemtap-testsuite package: installing it and running sudo make installcheck in /usr/share/systemtap/testsuite gives an overview of how well tracing works on the system.

TODO: Add specific examples of interesting traces of the kernel and applications, and add them to a testing page listing:

  • Package install instructions.
  • Setup and sample run of the application.
  • A reference to the probes and systemtap tapset functions used.
  • A simple example stap invocation listing the markers that can be enabled.
  • Doing the same through the simple gui and/or eclipse plugin.

Question: Is there a convention/template for adding such test pages for test days?

User Experience

When debuginfo is installed for packages, users will be able to trace at a low level what those applications (and/or the kernel) are doing.

Dependencies

This feature needs some coordination with gcc (to sync on debuginfo improvements), elfutils (for some new features taking advantage of the gcc debuginfo improvements) and the kernel (for the new tracepoints included). All this is already being done upstream; for Fedora we just need to make sure the latest versions are packaged. For packages that already have static markers enabled (java and postgresql), some testing of the results between package updates will be necessary to make sure the user experience is as smooth as can be.

Contingency Plan

Some of the features listed in the scope might not be fully completed, but that just means the end user will have less functionality for observing certain behaviour. Apart from the risk that systemtap works less well than desired, there is no impact on other packages.

Documentation

  • TODO expand (ref whitepapers?)

The upstream website has lots of documentation and examples.

Release Notes

  • TODO expand with more specifics

Systemtap has been extended to better support user space tracing and kernel tracepoints, to take advantage of modern gcc debuginfo (DWARF) output, and to provide a static user space marker implementation for developers who want to expose high level tracing events in their applications. This gives users, developers and administrators a low level overview of what is going on in their kernel or deep down in a specific program or subsystem.

Systemtap comes with a tutorial, a language reference manual, a tapsets reference and an examples directory under /usr/share/doc/systemtap-?.?/

Comments and Discussion