Latest revision as of 12:08, 26 March 2012
ABRT Backtrace Deduplication Service
Summary
Backtrace deduplication service solves the problem of many duplicate crash reports being submitted by ABRT to Red Hat Bugzilla. It helps ABRT users to find duplicate reports before filing a new bug, and it helps package maintainers to triage/reassign/merge already reported bugs.
Owner
- Name: Karel Klic
- Email: kklic at redhat.com
- Name: Michal Toman
- Email: mtoman at redhat.com
- Name: Miroslav Lichvar
- Email: mlichvar at redhat.com
- Name: Jan Smejda
Current status
- Targeted release: Fedora 17
- Last updated: 2012-03-26
- Percentage of completion: 100%
Detailed Description
Backtrace deduplication server is a collection of newly developed tools that will be deployed on the ABRT Retrace Server hardware, which is part of the Fedora infrastructure. ABRT will contain a client tool and integration with the server.
Benefit to Fedora
- Red Hat Bugzilla receives many duplicate crash reports from ABRT clients, even within a single component. This makes ABRT reports less useful and leads developers to give them lower priority. Bugzilla also receives many low-quality reports, which should be closed without intervention from maintainers. For example, the simple-scan component is badly affected by low-quality ABRT reports: many of its bug reports are duplicates, and some incorrectly show __libc_message and similar functions as the crashing function.
- Red Hat Bugzilla contains multiple crash reports filed against end-user applications that are all caused by a single bug in a library. Various developers then analyze these reports separately, which wastes their time.
Scope
- Implementation of backtrace metrics and indexes in Btparser (http://fedorahosted.org/btparser):
  - Damerau-Levenshtein distance
  - Jaro-Winkler distance
- Implementation of backtrace optimization in Btparser.
- Backtrace deduplication service for C/C++ backtraces, which takes a backtrace and a component name and checks it against the Bugzilla backtraces of all related components (the libraries used by the crashed binary).
  - name: faf-btserver-find-duplicates
- HTTP interface to the backtrace deduplication service, implemented as a CGI script.
  - name: faf-btserver-cgi
  - must provide a machine interface (plain text)
  - must provide a human interface (HTML)
  - distinguishes between the two by reading the HTTP_ACCEPT environment variable
  - Apache configuration file to activate the CGI script
  - requires a backtrace, a component name, and an operating system version from the user
  - responds with a list of bug IDs, bug components, operating system versions, and similarity:

        625354 glib2 14 94%
        688952 glib2 15 94%
        654789 emacs 14 92%
- Crash report cleanup service, which merges crashes that are already reported in Bugzilla. It also finds low-quality reports and duplicates and closes or reassigns them. The implementation consists of four scripts:
  - faf-btserver-cluster
    - Merging is done at the component level, where similar bugs from the same component are merged, and at the cross-component level, where bugs in applications are matched to bugs in their library dependencies, and bugs in libraries are detected by searching for duplicates among components with shared dependencies.
    - Achieve the right balance between blaming the application and blaming the library. For example, many applications crash in a strcmp call, but we can reasonably assume there is no bug in strcmp itself.
    - Compute distances and similarity indices between a bug (its backtrace) and all relevant bugs.
    - Compute backtrace quality.
    - Store the computed data in a bug report.
    - The number of crash combinations to check is huge. Optimizations might be needed to limit checks to backtraces that have the same library calls on the stack.
  - faf-btserver-prepare-actions
    - find similar bugs in the bug reports
    - check bug statuses and generate a list of desired actions to be performed on Bugzilla
  - faf-btserver-push-actions-bugzilla
    - Performs the desired actions on Bugzilla.
    - If a bug filed against an application is detected to belong to a library, the bug is either reassigned or a comment such as the following is added:

          It appears that this bug should be moved to component glib2. Other bugs from emacs (bug #644532) and evolution (bugs #758654, #749564) are duplicates of this bug. Please consider marking them as duplicates and moving this bug to glib2.
  - faf-btserver-actions-log: generates a log of the desired Bugzilla actions in a text file; useful for development, tuning, and debugging
- Synchronization script to update server metadata (bugs, backtraces, builds, RPMs)
- ABRT (https://fedorahosted.org/abrt) client using the backtrace deduplication server
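The Scope list above names Damerau-Levenshtein distance as one of the backtrace metrics for Btparser. The following is a minimal Python sketch, not Btparser's C code in lib/metrics.[ch]. It assumes a backtrace has already been reduced to a list of function names, top frame first, and it computes the optimal string alignment variant (restricted Damerau-Levenshtein), which counts inserted, deleted, substituted, and adjacently swapped frames. The normalization to a similarity between 0 and 1 is also an assumption.

```python
from typing import Sequence


def osa_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Optimal string alignment (restricted Damerau-Levenshtein) distance.

    Elements are function names of backtrace frames, top frame first.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all frames of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all frames of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Transposition of two adjacent frames.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def osa_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Map the distance to [0, 1], where 1.0 means identical backtraces."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - osa_distance(a, b) / longest
```

For example, the frame lists g_hash_table_lookup, g_object_get_data, gtk_widget_show, main and g_object_get_data, g_hash_table_lookup, gtk_widget_show, main differ only by one adjacent swap, so their distance is 1 and their similarity is 0.75. Unlike unrestricted Damerau-Levenshtein, this variant never edits a substring twice, which keeps the dynamic program simple.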
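Jaro-Winkler distance, the second metric in the Scope list, can be sketched the same way. This is an illustrative Python version, not Btparser's implementation, again assuming backtraces are lists of function names with the top frame first. The parameters prefix_scale=0.1 and max_prefix=4 are the customary Winkler defaults, not values taken from Btparser.

```python
from typing import Sequence


def jaro(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaro similarity between two frame lists (1.0 means identical)."""
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    # Frames only count as matching if they are at most `window` positions apart.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, frame in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_matched[j] and b[j] == frame:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Half the number of matched frames that appear in a different order.
    a_seq = [f for f, m in zip(a, a_matched) if m]
    b_seq = [f for f, m in zip(b, b_matched) if m]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) / 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a: Sequence[str], b: Sequence[str],
                 prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro score for a shared prefix of top frames."""
    sim = jaro(a, b)
    prefix = 0
    for x, y in zip(a, b):
        if x != y or prefix == max_prefix:
            break
        prefix += 1
    return sim + prefix * prefix_scale * (1 - sim)
```

The Winkler prefix bonus is plausibly a good fit for backtraces because it rewards agreement in the top frames, closest to the crash point, more than agreement deeper in the stack.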
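The faf-btserver-cgi requirements above (a plain-text machine interface, an HTML human interface, and a choice between them based on HTTP_ACCEPT) can be illustrated with the Python sketch below. Everything here is an assumption, not the actual faf code: the names Candidate and render_response are hypothetical, the rule "HTML if the Accept header mentions text/html, otherwise plain text" is one plausible reading of the requirement, and the rows in main() are the sample rows from the Scope list rather than real query results.

```python
import html
import os
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    bug_id: int
    component: str
    os_version: str
    similarity: float  # 0.0 to 1.0


def render_response(candidates: List[Candidate], http_accept: str) -> Tuple[str, str]:
    """Return (content_type, body) for a list of duplicate candidates."""
    # Best matches first; sorted() is stable, so ties keep their input order.
    ranked = sorted(candidates, key=lambda c: c.similarity, reverse=True)
    if "text/html" in http_accept:
        # Human interface: a simple HTML table.
        rows = "".join(
            "<tr><td>{}</td><td>{}</td><td>{}</td><td>{:.0%}</td></tr>".format(
                c.bug_id, html.escape(c.component), html.escape(c.os_version),
                c.similarity)
            for c in ranked
        )
        return "text/html", "<table>" + rows + "</table>\n"
    # Machine interface: "bug_id component os_version similarity" per line.
    return "text/plain", "".join(
        "{} {} {} {:.0%}\n".format(c.bug_id, c.component, c.os_version, c.similarity)
        for c in ranked
    )


def main() -> None:
    # Hypothetical hard-coded results; a real service would compute them
    # from the submitted backtrace, component name and OS version.
    results = [
        Candidate(625354, "glib2", "14", 0.94),
        Candidate(688952, "glib2", "15", 0.94),
        Candidate(654789, "emacs", "14", 0.92),
    ]
    # A CGI server passes the client's Accept header as HTTP_ACCEPT.
    content_type, body = render_response(results, os.environ.get("HTTP_ACCEPT", ""))
    print("Content-Type: " + content_type)
    print()
    print(body, end="")


if __name__ == "__main__":
    main()
```

Defaulting to plain text when the header is missing keeps the interface simple for scripted clients such as ABRT, while browsers, which normally send text/html in their Accept header, get the HTML table.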
How To Test
- via ABRT
- via the web interface
User Experience
- Maintainers: ABRT will open fewer duplicate bugs.
- Maintainers: duplicate bugs across components will be marked as duplicates, either:
  - by adding a comment to each bug with links to the other bugs, or
  - by closing all bugs except one as duplicates, with the remaining open bug reassigned to a common library
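One plausible way to get from pairwise backtrace similarities to "close all bugs except one as duplicates" is to group bugs with union-find and keep one bug per group open. The Python sketch below is hypothetical and is not the logic of faf-btserver-prepare-actions: the 0.9 threshold, the rule of keeping the lowest (oldest) bug ID, and the action tuple format are all assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[int, int]


def cluster_bugs(bug_ids: Iterable[int], similarity: Dict[Pair, float],
                 threshold: float = 0.9) -> List[List[int]]:
    """Group bugs whose pairwise similarity reaches the threshold (union-find)."""
    ids = list(bug_ids)
    parent = {bug: bug for bug in ids}

    def find(bug: int) -> int:
        while parent[bug] != bug:
            parent[bug] = parent[parent[bug]]  # path halving
            bug = parent[bug]
        return bug

    for (a, b), score in similarity.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                # Keep the lower (older) bug ID as the group representative.
                parent[max(root_a, root_b)] = min(root_a, root_b)

    groups: Dict[int, List[int]] = {}
    for bug in ids:
        groups.setdefault(find(bug), []).append(bug)
    # Only groups with at least two bugs contain duplicates.
    return [sorted(group) for group in groups.values() if len(group) > 1]


def plan_actions(clusters: List[List[int]]) -> List[Tuple[str, int, int]]:
    """Keep the oldest bug of each cluster open; close the rest as its duplicates."""
    actions = []
    for cluster in clusters:
        keep, *duplicates = cluster
        for bug in duplicates:
            actions.append(("close_as_duplicate", bug, keep))
    return actions


if __name__ == "__main__":
    # Hypothetical bug IDs and similarity scores.
    clusters = cluster_bugs([101, 102, 103, 104],
                            {(101, 103): 0.95, (103, 104): 0.92, (101, 102): 0.40})
    print(clusters)
    print(plan_actions(clusters))
```

With the sample data, bugs 101, 103 and 104 form one group (103 and 104 are linked through 103 even though 101 and 104 were never compared directly), and 103 and 104 are closed as duplicates of 101. This transitivity is convenient but can also chain loosely related bugs together, so a real service might need a stricter rule.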
Dependencies
None
Contingency Plan
ABRT continues to detect duplicates using duplicate hashes, as before. Without the backtrace deduplication server, ABRT bugs are still filed against the software component that owns the crashed binary. Duplicates within a single component can be closed by extending an existing script, without deploying a server.
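To see why the duplicate-hash fallback only catches exact duplicates within one component, consider the Python sketch below. It assumes the hash covers the component name plus the function names of the top few frames; the depth of 3, the use of SHA-1, and the function name duplicate_hash are assumptions, not ABRT's documented algorithm.

```python
import hashlib
from typing import Sequence


def duplicate_hash(component: str, frames: Sequence[str], depth: int = 3) -> str:
    """Hash the component name and the top `depth` function names (hypothetical scheme)."""
    key = "\n".join([component, *frames[:depth]])
    return hashlib.sha1(key.encode("utf-8")).hexdigest()
```

Two reports hash identically only if they share the component and the exact top frames; a single differing frame, or the same library crash reached from a different application, produces a different hash. Those near-duplicates and cross-component duplicates are exactly what the similarity metrics of the deduplication server are meant to find.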
Documentation
No documentation is currently available.
For implementation details, see the source code:
- faf-btserver-* source code in the Faf repository (https://fedorahosted.org/faf/browser)
- btparser source code (especially lib/metrics.[ch]) in the Btparser repository (https://fedorahosted.org/btparser/browser)
Release Notes
Fedora's bug reporting tool (ABRT) now uses new, sophisticated server-side algorithms to discover duplicate bugs and to direct new reports to the right operating system component.
Comments and Discussion
- See Talk:Features/ABRTBacktraceDeduplication