From Fedora Project Wiki
(→Handling program crashes in Fedora: whoops - fix client/server to be sub-sections of Summary)
m (→Comments: bad hr!)
 
(18 intermediate revisions by 4 users not shown)
Line 4: Line 4:
== Summary ==
== Summary ==


As of about Fedora 6, packages no longer include the "debuginfo" data necessary for local crash handlers to get a useful stack trace.  See:
# A crash handler which notifies the user when a program crashes and allows them to submit a report to the Fedora developers, and
http://fedoraproject.org/wiki/Packaging/Debuginfo and http://fedoraproject.org/wiki/StackTraces
# A server for collecting crash reports and mining useful data from them.
 
What we want is a system that gets information about the crash to developers in a form with complete stack trace data.  There are several
options for this.  First, the Apport system developed by Ubuntu.  See the old [[Features/Apport]] feature page.
 
[http://www.redhat.com/archives/rhl-devel-list/2008-June/msg01250.html A discussion] on fedora-devel-list came to the conclusion that the Apport system as designed won't work for Fedora because it sends the complete core dump over the network.
 
Another option (currently used by GNOME upstream) is [http://code.google.com/p/google-breakpad/ Breakpad] and [http://code.google.com/p/socorro/ Socorro].
 
The plan has two parts:
 
=== Client ===
* Create a program to catch crashing programs and write out a crash report / stack trace
** This should be able to produce Breakpad reports, among other output formats
* Notify the user when a program crashes, and allow them to
** Save the crash data and create a report
** Ignore further crashes of that program
** Ignore all further crashes
 
=== Server ===
* Get a Socorro server running in Fedora's infrastructure
* Point the default breakpad configuration to it (easy)


== Owner ==
== Owner ==
Line 32: Line 11:


== Current status ==
== Current status ==
* Targeted release:
* Targeted release: [[Releases/{{FedoraVersion||next}} | {{FedoraVersion|long|next}} ]]
* Last modified: [[Date(2008-06-09)]] 
* Last modified: {{date|2008-12-01}}
* Percent complete: 0%
* Percent complete: 0%


== Usage cases / rationale ==
== Benefit to Fedora ==
* See summary
 
By providing an automated mechanism for tracking application crashes, we will be able to:
* see bugs earlier, and fix them earlier
* see what bugs are hit most
* get usage and crash data from people who are unable or unwilling to interact with Bugzilla


== Benefit to Fedora ==
Better crash data leads to more crash fixes, which leads to a higher-quality distribution.
* See summary


== Scope ==
== Scope ==


Requires running a new server in the Fedora infrastructure.   
As of about Fedora 6, packages no longer include the "debuginfo" data necessary for local crash handlers to get a useful stack trace.  See [[Packaging/Debuginfo]] and [[StackTraces]] for details.


== Testing ==
What we want is a system that gets information about the crash to developers in a form with complete stack trace data.


Cause a program to crash and get a report submitted to Socorro.  Test that Socorro correctly retraces it and gets enough information for a developer to identify the problem.
The plan has two major parts: a crash handler that runs on the client, and a server that receives and aggregates crash reports.


== Dependencies ==
=== Client ===
==== crash-handler ====
A program to catch crashing programs and write out a crash report / stack trace.
* Catching the crash is trivial using the kernel's core pattern piping support, e.g.:
** <code>echo '|/usr/sbin/crash-handler --pid %p --rlimit %c' > /proc/sys/kernel/core_pattern</code>
* Write crashes to a (configurable) standard location, such as <code>/var/crash</code>
* This crash handler should be able to produce [http://code.google.com/p/google-breakpad/wiki/ClientDesign Breakpad] minidumps
** The same output format is used by GNOME (in {{package|bug-buddy}}) and {{package|firefox}}.


# Need to package the socorro server
==== crash-watcher ====
A small daemon to:
* watch the crash location for new dumps
* clean up old/unneeded dumps, based on user preferences (maximum age/disk space/etc.)


== Details ==
When a new dump is found, send notifications to the user allowing them to:
* Send a report (''iff'' the binary was provided by Fedora)
** Optional "Always send report automatically" checkbox
* Ignore further crashes of that program
* Ignore all further crashes


==== crash-submitter ====
Sends minidumps to the server to be retraced. {{package|bug-buddy}} might work for this.
* Submit report to Socorro server (or similar)
** Configured to use Fedora server by default, but allow user to set their own server
*** Future work: allow per-package overrides (so GNOME dumps go to GNOME, etc)
* Save UUID for that report somewhere, as with {{package|kerneloops}}


== Optional ==
=== Server ===
* Get a Socorro server running in Fedora's infrastructure
* Point the default breakpad configuration to it (easy)
 
=== Open questions ===
* Do symbol resolution on the client or the server?
* How to do symbol resolution? FUSE? littlebottom?
* How much backtracing can be done without debuginfo installed at the client?
* Tie it to smolt profiles?
* Run a separate kerneloops server?
* Why not use breakpad?
** Breakpad is a library that has to be linked into each program; we don't want to use LD_PRELOAD everywhere to magically link it in when needed.
 
== How To Test ==
 
Cause a program to crash and get a report submitted to Socorro.  Test that Socorro correctly retraces it and gets enough information for a developer to identify the problem.


== User Experience ==
== User Experience ==


A program crashes.  We display a dialog or notification that the program has crashed and save a useful stack trace to a well-known location.
A program crashes.  We display a dialog or notification that the program has crashed and save a useful stack trace to a well-known location.
== Dependencies ==


== Contingency plan ==
== Contingency plan ==
 
# Don't enable the agent
If this plan fails for some unforeseen reason, we can reinvestigate other options such as Apport.
# Don't ship the agent
# Reinvestigate other options such as Apport.


== Documentation ==
== Documentation ==


None needed.
Some simple documentation on how to enable and disable crash reporting, and how to have reports sent automatically.


== Release Notes ==
== Release Notes ==


We will want to explain to developers of Free programs how to find crash dumps.
(We will want to explain to developers of Free programs how to find crash dumps.)


== Comments ==
== Comments ==


----
* See [[Talk:Features/CrashHandling]]
* New development continues here [[Features/CrashCatcher]]


[[Category:FeaturePageIncomplete]]
[[Category:FeaturePageIncomplete]]

Latest revision as of 21:48, 26 January 2009

Handling program crashes in Fedora

Summary

  1. A crash handler which notifies the user when a program crashes and allows them to submit a report to the Fedora developers, and
  2. A server for collecting crash reports and mining useful data from them.

Owner

  • Name: [none currently]

Current status

Benefit to Fedora

By providing an automated mechanism for tracking application crashes, we will be able to:

  • see bugs earlier, and fix them earlier
  • see what bugs are hit most
  • get usage and crash data from people who are unable or unwilling to interact with Bugzilla

Better crash data leads to more crash fixes, which leads to a higher-quality distribution.

Scope

As of about Fedora 6, packages no longer include the "debuginfo" data necessary for local crash handlers to get a useful stack trace. See Packaging/Debuginfo and StackTraces for details.

What we want is a system that gets information about the crash to developers in a form with complete stack trace data.

The plan has two major parts: a crash handler that runs on the client, and a server that receives and aggregates crash reports.

Client

crash-handler

A program to catch crashing programs and write out a crash report / stack trace.

  • Catching the crash is trivial using the kernel's core pattern piping support, e.g.:
    • echo '|/usr/sbin/crash-handler --pid %p --rlimit %c' > /proc/sys/kernel/core_pattern
  • Write crashes to a (configurable) standard location, such as /var/crash
  • This crash handler should be able to produce Breakpad minidumps
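
To make the core pattern hand-off concrete, here is a minimal sketch in Python of what the receiving end might look like. With a piped core_pattern the kernel writes the core image to the handler's standard input and substitutes %p (process ID) and %c (core size soft limit) into the command line. Everything else below is an assumption for illustration: the --crash-dir option, the file naming scheme, and the policy of storing at most --rlimit bytes are hypothetical, and the sketch stores the raw core rather than producing a Breakpad minidump.

```python
import argparse
import os
import time


def parse_args(argv):
    # --pid and --rlimit mirror the core_pattern line in the proposal.
    parser = argparse.ArgumentParser(prog="crash-handler")
    parser.add_argument("--pid", type=int, required=True)
    parser.add_argument("--rlimit", type=int, required=True)
    # Hypothetical option: the proposal only says the location is configurable.
    parser.add_argument("--crash-dir", default="/var/crash")
    return parser.parse_args(argv)


def store_core(stream, pid, rlimit, crash_dir, chunk_size=65536):
    """Copy at most `rlimit` bytes from `stream` into a new file in crash_dir.

    Returns the path of the file that was written.
    """
    os.makedirs(crash_dir, exist_ok=True)
    # Hypothetical naming scheme: core.<pid>.<unix timestamp>
    path = os.path.join(crash_dir, "core.%d.%d" % (pid, int(time.time())))
    remaining = rlimit
    with open(path, "wb") as out:
        while remaining > 0:
            chunk = stream.read(min(chunk_size, remaining))
            if not chunk:
                break  # end of the core image
            out.write(chunk)
            remaining -= len(chunk)
    return path


def main(argv, stream):
    args = parse_args(argv)
    return store_core(stream, args.pid, args.rlimit, args.crash_dir)

# A real entry point would call: main(sys.argv[1:], sys.stdin.buffer)
```

Taking the input stream as a parameter keeps the copying logic testable without an actual crash; only the entry point needs to know that the data arrives on standard input.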

crash-watcher

A small daemon to:

  • watch the crash location for new dumps
  • clean up old/unneeded dumps, based on user preferences (maximum age/disk space/etc.)

When a new dump is found, send notifications to the user allowing them to:

  • Send a report (iff the binary was provided by Fedora)
    • Optional "Always send report automatically" checkbox
  • Ignore further crashes of that program
  • Ignore all further crashes
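
The clean-up half of the watcher can be sketched as a small policy function. This is only one possible reading of "maximum age/disk space": dumps older than the age limit are dropped, then the remaining dumps are walked from newest to oldest and each one is kept only if it still fits in the size budget. The function names and the (path, mtime, size) tuple layout are assumptions made for this example.

```python
import os


def scan_crash_dir(crash_dir):
    """Return a list of (path, mtime, size) for regular files in crash_dir."""
    entries = []
    with os.scandir(crash_dir) as it:
        for entry in it:
            if entry.is_file():
                st = entry.stat()
                entries.append((entry.path, st.st_mtime, st.st_size))
    return entries


def find_expired(dumps, now, max_age_seconds, max_total_bytes):
    """Return the paths that should be deleted under the given limits."""
    to_delete = []
    fresh = []
    for path, mtime, size in dumps:
        if now - mtime > max_age_seconds:
            to_delete.append(path)  # too old
        else:
            fresh.append((path, mtime, size))

    # Newest first, so recent crashes get first claim on the disk budget.
    fresh.sort(key=lambda d: d[1], reverse=True)
    total = 0
    for path, mtime, size in fresh:
        if total + size > max_total_bytes:
            to_delete.append(path)  # does not fit in the remaining budget
        else:
            total += size
    return to_delete
```

A watcher loop would call something like find_expired(scan_crash_dir("/var/crash"), time.time(), ...) periodically and unlink the returned paths; the limits would come from the user preferences mentioned above.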

crash-submitter

Sends minidumps to the server to be retraced. bug-buddy might work for this.

  • Submit report to Socorro server (or similar)
    • Configured to use Fedora server by default, but allow user to set their own server
      • Future work: allow per-package overrides (so GNOME dumps go to GNOME, etc)
  • Save UUID for that report somewhere, as with kerneloops
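
The server selection and UUID bookkeeping described above could look roughly like the following sketch. The default URL is a placeholder, not a real Fedora service, and the precedence (per-package override, then the user's server, then the default) and the JSON state file are assumptions chosen for illustration; the proposal does not specify either, and this is not a description of how kerneloops stores its IDs.

```python
import json
import os

# Placeholder only: the real default would be the Fedora-run server.
DEFAULT_SERVER = "https://crash-reports.example.org/submit"


def pick_server(package, user_server=None, overrides=None):
    """Choose the submission URL for a crash in `package`.

    Assumed precedence: per-package override, then the user's own
    server, then the distribution default.
    """
    if overrides and package in overrides:
        return overrides[package]
    return user_server or DEFAULT_SERVER


def record_report_id(state_file, dump_name, report_id):
    """Remember which report ID the server assigned to a dump.

    The mapping is kept in a small JSON file (a hypothetical format)
    and the updated mapping is returned.
    """
    records = {}
    if os.path.exists(state_file):
        with open(state_file) as f:
            records = json.load(f)
    records[dump_name] = report_id
    with open(state_file, "w") as f:
        json.dump(records, f, indent=2)
    return records
```

Keeping the ID next to the dump name lets the user, or a later tool, look up the server-side report for a given local crash.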

Server

  • Get a Socorro server running in Fedora's infrastructure
  • Point the default breakpad configuration to it (easy)

Open questions

  • Do symbol resolution on the client or the server?
  • How to do symbol resolution? FUSE? littlebottom?
  • How much backtracing can be done without debuginfo installed at the client?
  • Tie it to smolt profiles?
  • Run a separate kerneloops server?
  • Why not use breakpad?
    • Breakpad is a library that has to be linked into each program; we don't want to use LD_PRELOAD everywhere to magically link it in when needed.

How To Test

Cause a program to crash and get a report submitted to Socorro. Test that Socorro correctly retraces it and gets enough information for a developer to identify the problem.

User Experience

A program crashes. We display a dialog or notification that the program has crashed and save a useful stack trace to a well-known location.

Dependencies

Contingency plan

  1. Don't enable the agent
  2. Don't ship the agent
  3. Reinvestigate other options such as Apport.

Documentation

Some simple documentation on how to enable and disable crash reporting, and how to have reports sent automatically.

Release Notes

(We will want to explain to developers of Free programs how to find crash dumps.)

Comments