From Fedora Project Wiki

Revision as of 13:05, 7 December 2016

Markup for ELF objects

{This page is here in order to encourage discussion about this project. It is hoped that anyone who is interested will edit this page to add their questions, comments and ideas}.

This project intends to add markers to ELF objects so that it is possible to determine whether they have certain properties. The three overarching goals are:

  • Determine if all objects implement the same ABI (e.g., they agree upon the format of long double). This would be both at link time and at load time. This would also need to include negative properties so that, for example, if a shared library does not use the wchar_t type, then it can be linked with an application that uses any size of wchar_t. Ideally we want to be able to find the answer to these questions:
    • Which (architecture specific) ABI variant is in use in object X and is it compatible with object Y ?
    • What are the sizes of the basic types used in object X ? (For those types not explicitly covered by the ABI, e.g. enum and wchar_t). If the object does not use a particular type then this should be discoverable as well.
  • Determine if an object was compiled according to applicable security policies (e.g., -fstack-protector-strong was used at compile time). This also includes the ability to check which tool(s) were used to create the object, so that, for example, it is possible to determine if the object was compiled with an out of date version of the compiler. Questions that we want to be able to answer here include:
    • Has every function in object X been compiled with option Y ?
    • Has every function in object X been compiled with version Y of the compiler (or newer) ?
    • Has object X been linked with option Y enabled (e.g. relro) ?
  • Determine the run-time requirements of the object (e.g. the hardware version they need, or the amount of stack space that they require). This could also be extended to cover symbols that need special binding considerations. For example functions that call execve might need immediate binding even if the rest of the executable uses lazy binding. So questions in this section include:
    • Which symbols in object X possess attribute Y, given that this affects the loading of X ?
    • What hardware resources are needed by object X ? (Architecture, memory footprint, stack size, more ?)

One issue with determining this information is that it is possible for a single ELF object to have multiple, possibly conflicting, properties. For example an object might contain ifuncs which support different hardware versions, or function specific optimizations may have been used to change the security of individual functions. In fact using a function level scope for this kind of information may not be enough. It may be that properties need to be associated with a specific set of address ranges instead.

A second issue is that if this information is going to be used at load-time, then it has to be fast and simple to access and process. The loader is a highly optimized program and changes to it need to be small and robust.

A side issue is that storing this information in an ELF object will increase its size. If lots of information is stored in a space-inefficient way then this could make it harder to get this proposal accepted by package maintainers.

Implementation

The current plan for implementing this proposal is a two-pronged approach using ELF Notes. One small set of notes would be stored in an allocatable section, and would contain just the information needed by the loader. This is the scheme proposed by H.J.Lu.

A second, non-allocatable section would contain more detailed notes that can be analysed by separate, static, tools. This second section would have the ability to record information on an address range basis as well as file level and application level scope. The necessary information would be gathered by a gcc plugin, so there would be no need to modify the compiler sources directly. The notes can be concatenated together, so there is no need to modify the linker, and scripts can be used in conjunction with the readelf program to parse the notes and answer questions about them.
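
As a rough illustration of how a script might walk such concatenated notes, the sketch below iterates over the raw bytes of a note section. It assumes the conventional ELF note record layout as commonly emitted by GNU tools (three 4-byte words giving the name size, description size and type, followed by the name and the description, each padded to a 4-byte boundary). The function names iter_notes and make_note are hypothetical and are not part of this proposal.

```python
import struct


def iter_notes(data: bytes, little_endian: bool = True):
    """Yield (type, name, desc) for each note record in raw note-section bytes.

    Assumes 4-byte header words and 4-byte padding of name and description.
    """
    fmt = "<III" if little_endian else ">III"
    offset = 0
    while offset + 12 <= len(data):
        namesz, descsz, ntype = struct.unpack_from(fmt, data, offset)
        offset += 12
        name = data[offset:offset + namesz]
        offset += (namesz + 3) & ~3  # skip the name plus its padding
        desc = data[offset:offset + descsz]
        offset += (descsz + 3) & ~3  # skip the description plus its padding
        yield ntype, name, desc


def make_note(ntype: int, name: bytes, desc: bytes,
              little_endian: bool = True) -> bytes:
    """Build one note record (hypothetical helper, used here for illustration)."""
    fmt = "<III" if little_endian else ">III"

    def pad(raw: bytes) -> bytes:
        return raw + b"\0" * (-len(raw) % 4)

    return struct.pack(fmt, len(name), len(desc), ntype) + pad(name) + pad(desc)
```

Because each record carries its own sizes, joining the bytes of two note sections and passing the result to iter_notes yields the notes of the first section followed by those of the second, in their original order.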

Proposed Specification for non-loaded notes

  • The information is stored in a new section in the file using the ELF
 NOTE format.
 Creator tools (compilers, assemblers, etc.) place the notes into the
 binary files.  Linkers merge the notes together.  Consumer tools 
 read the notes (possibly using readelf) and answer questions about 
 the binaries concerned.
  • The new section is called .gnu.build.attributes. It has the type
 SHT_NOTE and a special flag bit set: SHF_GNU_BUILD_ATTRIBUTES
 (suggested value: 0x00100000).  It does *not* have the SHF_ALLOC
 flag bit set.  The sh_link and sh_info fields should be set to 0.
  • The section contains ELF format notes. The type field of a note is
 used to distinguish the range of memory over which an attribute 
 applies.  The name field identifies the attribute and gives it a 
 value.  The description field specifies the starting address for 
 where the attribute is applied.
 Two new note types are defined:  NT_GNU_BUILD_ATTRIBUTE_OPEN (0x100)
 and NT_GNU_BUILD_ATTRIBUTE_FUNC (0x101).  These are used by the
 description field (see below).
 
 The description field of the note is a 4-byte or 8-byte wide address
 which indicates the starting location for an attribute.  If the
 bottom bit of the type field is set then the address is for a
 function and the attribute terminates at the end of the function,
 reverting back to the previous value for that attribute.  If the
 bottom bit of the type is clear then the address is for the start of
 an open-ended range.  The range ends only when another open-ended
 attribute of the same name is defined, although it may be
 temporarily overridden by a function based address.
 Notes:
 
   + In unrelocated files the numbers should instead be zero, with a
     relocation present to set the actual value once the file is
     linked.
   + The numbers are stored in the same endian format as that
     specified in the EI_DATA field of the ELF header of the file
     containing the note.  The size of the numbers is dictated by the
     EI_CLASS field of the ELF header.
   + An empty description field is a special case.  It should be
     treated as if it applies to the same region as the nearest
     preceding NT_GNU_BUILD_ATTRIBUTE_OPEN note with a non-empty
     description field.  This will probably be a version note.
 The name field identifies the type and value of the attribute.  The
 first character indicates the kind of attribute, based upon the
 following table:
   * - The attribute takes a numeric value.  Numbers are stored in
       little endian binary format.
   $ - The attribute takes a string value.  Strings should be NUL terminated.
   ! - The attribute takes a boolean value, and the value is false.
   + - The attribute takes a boolean value, and the value is true.
 
 The next character indicates the specific attribute:
   ascii printable - first character of a string name.  The string is
                     NUL-terminated.
   1               - version of this specification supported.  Must
                     be string type.
   2               - stack protector
   3               - relro
   4               - stack size
   5               - build tool & version
   6               - ABI
   
 For * and $ type attributes the value is then appended.
 Some examples:
   *foo\0\001\0\002        Attribute 'foo' with numeric value 0x20001
   *bar\0\0                Attribute 'bar' with numeric value 0
   $fred\0hello\0          Attribute 'fred' with string value "hello"
   *4\377\377              Attribute stack size with numeric value 0xffff
   +2                      Attribute -fstack-protector enabled.
   !2                      Attribute -fstack-protector disabled.
   $11\0                   Attribute version with string "1"
   $5gcc v7.0\0            Attribute build tool "gcc v7.0"
 Multiple notes for the same attribute can exist, providing that they
 have different values and that their description address ranges do
 not overlap.  The exception to this rule is that
 NT_GNU_BUILD_ATTRIBUTE_FUNC attributes are allowed to overlap
 NT_GNU_BUILD_ATTRIBUTE_OPEN attributes.
 The first note should be a version note.
  • When the linker merges two or more files containing these notes it
 should ensure that the above rules are maintained.  Simply
 concatenating the incoming note sections should ensure this.
 The linker can, if it wishes, create its own notes and append or
 insert them into the note section, e.g. to indicate that -z relro is
 enabled.
 The order of the notes from an incoming section must be preserved in
 the outgoing section.  Notes do not have to be sorted by address
 range although this often happens automatically when sections are
 concatenated.
 If this is a final link, then relocations on the notes should of
 course be resolved.
 The linker, or another tool, may wish to eliminate redundant notes
 in the note section.  When doing this the following rules must be
 observed:

   1. Preserve the ordering of the notes.
   2. Preserve any NT_GNU_BUILD_ATTRIBUTE_FUNC notes.
   3. Eliminate any NT_GNU_BUILD_ATTRIBUTE_OPEN notes that have
      the same full name field as the immediately preceding
      note with the same type of name.
   4. If an NT_GNU_BUILD_ATTRIBUTE_OPEN note is going to be
      preserved and its description field is empty then the
      nearest preceding OPEN note with a non-empty
      description field must also be preserved *OR* the
      description field of the note must be changed to
      contain the starting address to which it refers.
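
The name encoding and the elimination rules above can be sketched in a few lines. This is only one possible reading of the specification, not a reference implementation, and it makes several assumptions: the predefined attribute identifiers are accepted either as raw byte values 1 to 6 or as the ASCII digits used in the examples; "the immediately preceding note with the same type of name" is taken to mean the most recent OPEN note for the same attribute; and rule 4 is satisfied by copying the starting address into an empty description field. The names decode_name and eliminate_redundant are hypothetical.

```python
NT_OPEN, NT_FUNC = 0x100, 0x101

PREDEFINED = {1: "version", 2: "stack protector", 3: "relro",
              4: "stack size", 5: "build tool & version", 6: "ABI"}


def decode_name(name: bytes):
    """Split a note name field into (attribute, value)."""
    kind, ident, rest = name[0:1], name[1], name[2:]
    if ident in PREDEFINED:                    # assumption: raw byte 1..6
        attr = PREDEFINED[ident]
    elif ident - ord("0") in PREDEFINED:       # assumption: ASCII digit '1'..'6'
        attr = PREDEFINED[ident - ord("0")]
    else:                                      # NUL-terminated string name
        end = name.index(b"\0", 1)
        attr, rest = name[1:end].decode("ascii"), name[end + 1:]
    if kind == b"*":
        value = int.from_bytes(rest, "little")  # numbers are little endian
    elif kind == b"$":
        value = rest.rstrip(b"\0").decode("ascii")
    elif kind in (b"+", b"!"):
        value = kind == b"+"                    # '+' is true, '!' is false
    else:
        raise ValueError(f"unknown attribute kind {kind!r}")
    return attr, value


def eliminate_redundant(notes):
    """Apply rules 1-4 to a list of (type, name, desc) tuples."""
    kept = []
    last_name = {}    # attribute -> full name of the latest OPEN note seen
    last_desc = b""   # latest non-empty OPEN description seen
    for ntype, name, desc in notes:            # rule 1: order is preserved
        if ntype != NT_OPEN:
            kept.append((ntype, name, desc))   # rule 2: keep FUNC notes
            continue
        if desc:
            last_desc = desc
        attr = decode_name(name)[0]
        if last_name.get(attr) == name:
            continue                           # rule 3: same full name, drop
        last_name[attr] = name
        kept.append((ntype, name, desc or last_desc))  # rule 4: fill address
    return kept
```

With this reading, decode_name(b"$5gcc v7.0\0") gives the build tool attribute with the string "gcc v7.0", and a repeated "+2" OPEN note that follows an earlier "+2" OPEN note is dropped while any FUNC notes between them are kept.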

Wiki page categories

We use wiki categories to track progress.