From Fedora Project Wiki
(→‎Current status: updated last updated date per wiki history)
 
Line 7: Line 7:


== Summary ==
== Summary ==
Backtrace deduplication service solves the problem of many duplicate crash reports  being submitted  by ABRT to  Red Hat Bugzilla.  It is designed to help ABRT users to find duplicate reports before filing a new bug, and to help package maintainers to triage/reassign/merge already reported bugs.
Backtrace deduplication service solves the problem of many duplicate crash reports  being submitted  by ABRT to  Red Hat Bugzilla.  It helps ABRT users to find duplicate reports before filing a new bug, and it helps package maintainers to triage/reassign/merge already reported bugs.
 
Backtrace deduplication server is a collection of newly-developed tools that will be deployed on the retrace server hardware, which is a part of Fedora infrastructure. ABRT will contain a client tool and integration with the server.


== Owner ==
== Owner ==
Line 20: Line 18:
* Name: [[User:mlichvar|Miroslav Lichvar]]
* Name: [[User:mlichvar|Miroslav Lichvar]]
* Email: mlichvar at redhat.com
* Email: mlichvar at redhat.com
* Name: Jan Smejda


== Current status ==
== Current status ==
* Targeted release: [[Releases/17|Fedora 17]]  
* Targeted release: [[Releases/17|Fedora 17]]  
* Last updated: 2012-01-17
* Last updated: 2012-03-26
* Percentage of completion: 60%
* Percentage of completion: 100%


== Detailed Description ==
== Detailed Description ==
<!-- Expand on the summary, if appropriate.  A couple sentences suffices to explain the goal, but the more details you can provide the better. -->
<!-- Expand on the summary, if appropriate.  A couple sentences suffices to explain the goal, but the more details you can provide the better. -->
Backtrace deduplication server is a collection of newly-developed tools that will be deployed on the ABRT Retrace Server hardware, which is a part of Fedora infrastructure. ABRT will contain a client tool and integration with the server.


== Benefit to Fedora ==
== Benefit to Fedora ==
Line 34: Line 35:


== Scope ==
== Scope ==
<!-- What work do the developers have to accomplish to complete the feature in time for release?  Is it a large change affecting many parts of the distribution or is it a very isolated change? What are those changes?-->
<ol>
  <li>Implementation of backtrace metrics and indexes in [http://fedorahosted.org/btparser Btparser].
    <ol>
      <li>Damerau-Levenshtein distance</li>
      <li>Jaro-Winkler distance</li>
    </ol>
  </li>
  <li>Implementation of backtrace optimization in Btparser.</li>
  <li>Backtrace deduplication service for C/C++ backtraces, which
    takes a backtrace and component, and checks backtraces from all
    related components (of libraries used by the crashed binary) in
    Bugzilla
    <ul>
      <li>name: faf-btserver-find-duplicates</li>
    </ul>
  </li>
  <li>HTTP interface to the backtrace deduplication service,
    implemented as a CGI script
    <ul>
      <li>name: faf-btserver-cgi</li>
      <li>must contain a machine interface (plain text)</li>
      <li>must contain a human interface (HTML)</li>
      <li>distinguishes between the two by reading the HTTP_ACCEPT
environment variable</li>
      <li>Apache configuration file to activate the CGI script</li>
      <li>requires a backtrace, component name, and operating system
version from the user</li>
      <li>respond with a list of bug ids, bug components, operating
      system version, and similarity:
<pre>625354 glib2 14 94%
688952 glib2 15 94%
654789 emacs 14 92%</pre>
      </li>
    </ul>
  </li>
  <li>Crash report cleanup service, which merges crashes that are
    already reported in Bugzilla. It also finds low-quality reports
    and duplicates, and closes or reassigns them. The implementation consists of four scripts:
    <ul>
      <li>faf-btserver-cluster
<ul>
  <li>The merging is done on a component level, where similar
    bugs from the same component are merged, and also on a
    cross-component level, where bugs from applications are
    matched to those of their library dependencies, and bugs
    in libraries are detected by searching duplicates between
    components with shared dependencies.</li>
  <li>Achieve the right balance between application bug and
    library bug blaming. For example, many applications are
    crashing on a <code>strcmp</code> call, but we can
    reasonably assume there is no bug
    in <code>strcmp</code>.</li>
  <li>Compute distances and similarity indices between a bug
    (backtrace of bug) and all relevant bugs</li>
  <li>Compute backtrace quality</li>
  <li>Store the computed data in a bug report</li>
  <li>The number of crash combinations to check is
    huge. Optimizations might be needed to limit checks to
    backtraces having the same library calls on stack.</li>
</ul>
      </li>
      <li>faf-btserver-prepare-actions
<ul>
  <li>find similar bugs in the bug reports</li>
  <li>check bug statuses and generate a list of desired
actions to be performed on Bugzilla</li>
</ul>
      </li>
      <li>faf-btserver-push-actions-bugzilla
<ul>
  <li>Performs desired actions on Bugzilla</li>
  <li>If a bug that is filed on an application but belongs to
    a library is detected, it will be either reassigned or a
    comment will be added:<br/>
    <code>It appears that this bug should be moved to
      component glib2. Other bugs from emacs (bug #644532) and
      evolution (bugs #758654, #749564) are duplicates of this
      bug. Please consider marking them as duplicates and
      moving this bug to glib2.</code>
  </li>
</ul>
      </li>
      <li>faf-btserver-actions-log - generate a log of desired actions
on Bugzilla in a text file; this is good for development,
tweaking, debugging</li>
    </ul>
  </li>
  <li>Synchronization script to update server metadata &mdash; bugs,
    backtraces, builds, RPMs</li>
  <li>[https://fedorahosted.org/abrt ABRT] client using
    Backtrace deduplication server</li>
</ol>
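
The HTTP interface described above chooses between the machine and human formats from the HTTP_ACCEPT environment variable. The following is only a minimal sketch of that dispatch, not the code of faf-btserver-cgi: the function names <code>wants_html</code> and <code>render</code> are hypothetical, and the HTML layout is an assumption. Only the plain-text line format is taken from the example above.

```python
import html
import os


def wants_html(environ):
    """Return True when the Accept header lists text/html (assumed rule)."""
    accept = environ.get("HTTP_ACCEPT", "")
    # Compare media types only, ignoring parameters such as ";q=0.9".
    media_types = [part.split(";")[0].strip().lower()
                   for part in accept.split(",")]
    return "text/html" in media_types


def render(matches, environ):
    """Format (bug_id, component, os_version, similarity) tuples.

    Returns a (content_type, body) pair. The HTML table is a hypothetical
    layout; the plain-text form follows the example in the Scope list.
    """
    if wants_html(environ):
        rows = "".join(
            "<tr><td>{}</td><td>{}</td><td>{}</td><td>{}%</td></tr>".format(
                bug_id, html.escape(component), os_version, similarity)
            for bug_id, component, os_version, similarity in matches)
        return "text/html", "<table>" + rows + "</table>"
    lines = ["{} {} {} {}%".format(*match) for match in matches]
    return "text/plain", "\n".join(lines)


if __name__ == "__main__":
    sample = [(625354, "glib2", 14, 94),
              (688952, "glib2", 15, 94),
              (654789, "emacs", 14, 92)]
    content_type, body = render(sample, os.environ)
    # A CGI response is a header block, a blank line, then the body.
    print("Content-Type: {}\n".format(content_type))
    print(body)
```

A client that sends no Accept header falls through to the plain-text branch, which keeps the machine interface the default in this sketch.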


== How To Test ==
== How To Test ==
# via ABRT
# via web interface
<!-- This does not need to be a full-fledged document.  Describe the dimensions of tests that this feature is expected to pass when it is done.  If it needs to be tested with different hardware or software configurations, indicate them.  The more specific you can be, the better the community testing can be.  
<!-- This does not need to be a full-fledged document.  Describe the dimensions of tests that this feature is expected to pass when it is done.  If it needs to be tested with different hardware or software configurations, indicate them.  The more specific you can be, the better the community testing can be.  


Line 53: Line 147:
== User Experience ==
== User Experience ==
<!-- If this feature is noticeable by its target audience, how will their experiences change as a result?  Describe what they will see or notice. -->
<!-- If this feature is noticeable by its target audience, how will their experiences change as a result?  Describe what they will see or notice. -->
# Maintainers: ABRT will open fewer duplicate bugs
# Maintainers: Bugs across components will be marked as duplicates
## by adding comment to each bug with links to other bugs
## by closing all bugs except one as duplicates, with the remaining open bug being reassigned to a common library


== Dependencies ==
== Dependencies ==
None
<!-- What other packages (RPMs) depend on this package?  Are there changes outside the developers' control on which completion of this feature depends?  In other words, completion of another feature owned by someone else and might cause you to not be able to finish on time or that you would need to coordinate?  Other upstream projects like the kernel (if this is not a kernel feature)? -->
<!-- What other packages (RPMs) depend on this package?  Are there changes outside the developers' control on which completion of this feature depends?  In other words, completion of another feature owned by someone else and might cause you to not be able to finish on time or that you would need to coordinate?  Other upstream projects like the kernel (if this is not a kernel feature)? -->


== Contingency Plan ==
== Contingency Plan ==
None necessary, revert to ABRT using duphash Bugzilla search mechanism.
ABRT uses duplicate hashes to detect duplicates as usual. Without
the backtrace deduplication server, ABRT bugs are still filed on the
software component that owns the crashed binary. Duplicates within
a single component can be closed by extending an existing script,
without having a server deployed.


== Documentation ==
== Documentation ==
<!-- Is there upstream documentation on this feature, or notes you have written yourself?  Link to that material here so other interested developers can get involved. -->
<!-- Is there upstream documentation on this feature, or notes you have written yourself?  Link to that material here so other interested developers can get involved. -->
*
No documentation is currently available.
 
If you want to see more details about implementation, you can check the source code:
* See faf-btserver-* source code in [https://fedorahosted.org/faf/browser Faf repository]
* See btparser source code (esp. lib/metrics.[ch]) in [https://fedorahosted.org/btparser/browser Btparser repository]


== Release Notes ==
== Release Notes ==
<!-- The Fedora Release Notes inform end-users about what is new in the release.  Examples of past release notes are here: http://docs.fedoraproject.org/release-notes/ -->
<!-- The Fedora Release Notes inform end-users about what is new in the release.  Examples of past release notes are here: http://docs.fedoraproject.org/release-notes/ -->
<!-- The release notes also help users know how to deal with platform changes such as ABIs/APIs, configuration or data file formats, or upgrade concerns.  If there are any such changes involved in this feature, indicate them here.  You can also link to upstream documentation if it satisfies this need.  This information forms the basis of the release notes edited by the documentation team and shipped with the release. -->
<!-- The release notes also help users know how to deal with platform changes such as ABIs/APIs, configuration or data file formats, or upgrade concerns.  If there are any such changes involved in this feature, indicate them here.  You can also link to upstream documentation if it satisfies this need.  This information forms the basis of the release notes edited by the documentation team and shipped with the release. -->
*
Fedora's bug reporting tool (ABRT) now uses new, sophisticated server-side algorithms to discover duplicate bugs and direct new reports to the right operating system component.


== Comments and Discussion ==
== Comments and Discussion ==
* See [[Talk:Features/ABRTBacktraceDeduplication]]
* See [[Talk:Features/ABRTBacktraceDeduplication]]


[[Category:FeaturePageIncomplete]]
[[Category:FeatureAcceptedF17]]
<!-- When your feature page is completed and ready for review -->
<!-- When your feature page is completed and ready for review -->
<!-- remove Category:FeaturePageIncomplete and change it to Category:FeatureReadyForWrangler -->
<!-- remove Category:FeaturePageIncomplete and change it to Category:FeatureReadyForWrangler -->
<!-- After review, the feature wrangler will move your page to Category:FeatureReadyForFesco... if it still needs more work it will move back to Category:FeaturePageIncomplete-->
<!-- After review, the feature wrangler will move your page to Category:FeatureReadyForFesco... if it still needs more work it will move back to Category:FeaturePageIncomplete-->
<!-- A pretty picture of the page category usage is at: https://fedoraproject.org/wiki/Features/Policy/Process -->
<!-- A pretty picture of the page category usage is at: https://fedoraproject.org/wiki/Features/Policy/Process -->

Latest revision as of 12:08, 26 March 2012


ABRT Backtrace Deduplication Service

Summary

Backtrace deduplication service solves the problem of many duplicate crash reports being submitted by ABRT to Red Hat Bugzilla. It helps ABRT users to find duplicate reports before filing a new bug, and it helps package maintainers to triage/reassign/merge already reported bugs.

Owner

  • Name: Jan Smejda

Current status

  • Targeted release: Fedora 17
  • Last updated: 2012-03-26
  • Percentage of completion: 100%

Detailed Description

Backtrace deduplication server is a collection of newly-developed tools that will be deployed on the ABRT Retrace Server hardware, which is a part of Fedora infrastructure. ABRT will contain a client tool and integration with the server.

Benefit to Fedora

  1. Red Hat Bugzilla receives a lot of duplicate crash reports from ABRT clients, even for a single component. This makes ABRT reports less useful and causes developers to give them lower priority. Red Hat Bugzilla also receives a lot of low-quality reports, which should be closed without intervention from maintainers. For example, the simple-scan component is badly affected by low-quality ABRT reports: many of its bug reports are duplicates, and some reports incorrectly show __libc_message and similar functions as the crash function.
  2. Red Hat Bugzilla contains multiple crash reports filed on end-user applications that are caused by a single bug in a library. The crash reports are then analyzed multiple times by various developers, which wastes their time.

Scope

  1. Implementation of backtrace metrics and indexes in Btparser.
    1. Damerau-Levenshtein distance
    2. Jaro-Winkler distance
  2. Implementation of backtrace optimization in Btparser.
  3. Backtrace deduplication service for C/C++ backtraces, which takes a backtrace and component, and checks backtraces from all related components (of libraries used by the crashed binary) in Bugzilla
    • name: faf-btserver-find-duplicates
  4. HTTP interface to the backtrace deduplication service, implemented as a CGI script
    • name: faf-btserver-cgi
    • must contain a machine interface (plain text)
    • must contain a human interface (HTML)
    • distinguishes between the two by reading the HTTP_ACCEPT environment variable
    • Apache configuration file to activate the CGI script
    • requires a backtrace, component name, and operating system version from the user
    • respond with a list of bug ids, bug components, operating system version, and similarity:
      625354 glib2 14 94%
      688952 glib2 15 94%
      654789 emacs 14 92%
  5. Crash report cleanup service, which merges crashes that are already reported in Bugzilla. It also finds low-quality reports and duplicates, and closes or reassigns them. The implementation consists of four scripts:
    • faf-btserver-cluster
      • The merging is done on a component level, where similar bugs from the same component are merged, and also on a cross-component level, where bugs from applications are matched to those of their library dependencies, and bugs in libraries are detected by searching duplicates between components with shared dependencies.
      • Achieve the right balance between application bug and library bug blaming. For example, many applications are crashing on a strcmp call, but we can reasonably assume there is no bug in strcmp.
      • Compute distances and similarity indices between a bug (backtrace of bug) and all relevant bugs
      • Compute backtrace quality
      • Store the computed data in a bug report
      • The number of crash combinations to check is huge. Optimizations might be needed to limit checks to backtraces having the same library calls on stack.
    • faf-btserver-prepare-actions
      • find similar bugs in the bug reports
      • check bug statuses and generate a list of desired actions to be performed on Bugzilla
    • faf-btserver-push-actions-bugzilla
      • Performs desired actions on Bugzilla
      • If a bug that is filed on an application but belongs to a library is detected, it will be either reassigned or a comment will be added:
        It appears that this bug should be moved to component glib2. Other bugs from emacs (bug #644532) and evolution (bugs #758654, #749564) are duplicates of this bug. Please consider marking them as duplicates and moving this bug to glib2.
    • faf-btserver-actions-log - generate a log of desired actions on Bugzilla in a text file; this is good for development, tweaking, debugging
  6. Synchronization script to update server metadata — bugs, backtraces, builds, RPMs
  7. ABRT client using Backtrace deduplication server
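
The Damerau-Levenshtein metric listed in item 1 can be illustrated on backtraces treated as sequences of function names. The sketch below is not the Btparser implementation (that lives in C, in lib/metrics.[ch]); it uses the restricted "optimal string alignment" variant of Damerau-Levenshtein, and both the function names osa_distance and similarity and the normalization to a 0..1 score are assumptions made for illustration.

```python
def osa_distance(a, b):
    """Optimal string alignment distance between two frame sequences.

    This is the restricted Damerau-Levenshtein variant: insertions,
    deletions, substitutions, and swaps of two adjacent frames each
    cost 1, and no frame takes part in more than one swap.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i leading frames of a
    for j in range(cols):
        d[0][j] = j  # insert all j leading frames of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                # transposition of two adjacent frames
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def similarity(a, b):
    """Hypothetical normalization: 1.0 means identical, 0.0 unrelated."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - osa_distance(a, b) / longest


if __name__ == "__main__":
    first = ["raise", "abort", "g_assertion_message", "g_main_loop_run", "main"]
    second = ["raise", "abort", "g_assertion_message", "gtk_main", "main"]
    print(osa_distance(first, second), similarity(first, second))
```

Comparing whole function names rather than characters means that one differing frame counts as a single edit, regardless of how long the symbol is.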

How To Test

  1. via ABRT
  2. via web interface
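
When testing through the web interface, the machine-readable response can be checked mechanically. The sketch below assumes the response consists only of the four whitespace-separated columns shown in the Scope section (bug id, component, operating system version, similarity); the Match type and the parse_response function are hypothetical names, and the behaviour of the real service on errors is not covered.

```python
from collections import namedtuple

# Hypothetical record type mirroring the four response columns.
Match = namedtuple("Match", "bug_id component os_version similarity")


def parse_response(text):
    """Parse lines such as '625354 glib2 14 94%' into Match records.

    Lines that do not have exactly four fields ending in a percentage
    are skipped rather than treated as errors.
    """
    matches = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or not fields[3].endswith("%"):
            continue
        bug_id, component, os_version, similarity = fields
        matches.append(Match(int(bug_id), component, os_version,
                             int(similarity.rstrip("%"))))
    return matches


if __name__ == "__main__":
    sample = "625354 glib2 14 94%\n688952 glib2 15 94%\n654789 emacs 14 92%\n"
    for match in parse_response(sample):
        print(match.bug_id, match.component, match.similarity)
```

The operating system version is kept as a string here, since the source does not say whether it is always numeric.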

User Experience

  1. Maintainers: ABRT will open fewer duplicate bugs
  2. Maintainers: Bugs across components will be marked as duplicates
    1. by adding comment to each bug with links to other bugs
    2. by closing all bugs except one as duplicates, with the remaining open bug being reassigned to a common library

Dependencies

None

Contingency Plan

ABRT uses duplicate hashes to detect duplicates as usual. Without the backtrace deduplication server, ABRT bugs are still filed on the software component that owns the crashed binary. Duplicates within a single component can be closed by extending an existing script, without having a server deployed.

Documentation

No documentation is currently available.

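If you want to see more details about the implementation, you can check the source code:

  • See faf-btserver-* source code in the Faf repository (https://fedorahosted.org/faf/browser)
  • See btparser source code (esp. lib/metrics.[ch]) in the Btparser repository (https://fedorahosted.org/btparser/browser)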

Release Notes

Fedora's bug reporting tool (ABRT) now uses new, sophisticated server-side algorithms to discover duplicate bugs and direct new reports to the right operating system component.

Comments and Discussion

  • See Talk:Features/ABRTBacktraceDeduplication