Latest revision as of 20:10, 11 June 2024
Markup for ELF objects
{This page is here in order to encourage discussion about this project. It is hoped that anyone who is interested will edit this page to add their questions, comments and ideas}.
This project intends to add markers to ELF objects so that it is possible to determine whether they have certain properties. The three overarching goals are:
- Determine if all objects implement the same ABI (e.g., they agree upon the format of long double). This would be both at link time and at load time. This would also need to include negative properties so that, for example, if a shared library does not use the wchar_t type, then it can be linked with an application that uses any size of wchar_t. Ideally we want to be able to find the answer to these questions:
  - Which (architecture specific) ABI variant is in use in object X, and is it compatible with object Y?
  - What are the sizes of the basic types used in object X? (For those types not explicitly covered by the ABI, e.g. enum and wchar_t.) If the object does not use a particular type then this should be discoverable as well.
- Determine if an object was compiled according to applicable security policies (e.g., -fstack-protector-strong was used at compile time). This also includes the ability to check which tool(s) were used to create the object, so that, for example, it is possible to determine if the object was compiled with an out of date version of the compiler. Questions that we want to be able to answer here include:
  - Has every function in object X been compiled with option Y?
  - Has every function in object X been compiled with version Y of the compiler (or newer)?
  - Has object X been linked with option Y enabled (e.g. relro)?
- Determine the run-time requirements of the object (e.g. the hardware version it needs, or the amount of stack space that it requires). This could also be extended to cover symbols that need special binding considerations. For example, functions that call execve might need immediate binding even if the rest of the executable uses lazy binding. So questions in this section include:
  - Which symbols in object X possess attribute Y, given that this affects the loading of X?
  - What hardware resources are needed by object X? (Architecture, memory footprint, stack size, more?)
One issue with determining this information is that it is possible for a single ELF object to have multiple, possibly conflicting, properties. For example, an object might contain ifuncs (http://www.agner.org/optimize/blog/read.php?i=167) which support different hardware versions, or function specific optimizations (https://gcc.gnu.org/onlinedocs/gcc-6.2.0/gcc/Function-Specific-Option-Pragmas.html#Function-Specific-Option-Pragmas) may have been used to change the security of individual functions. In fact, using a function level scope for this kind of information may not be enough. It may be that properties need to be associated with a specific set of address ranges instead.
A second issue is that if this information is going to be used at load-time, then it has to be fast and simple to access and process. The loader is a highly optimized program and changes to it need to be small and robust.
A side issue is that storing this information in an ELF object will increase its size. If lots of information is stored in a space inefficient way then this could prove a problem for getting this proposal accepted by package maintainers.
Implementation
The current plan for implementing this proposal is a two-pronged approach using ELF Notes. One small set of notes would be stored in an allocatable section, and would just contain the information needed by the loader. This is the scheme proposed by H.J. Lu (https://sourceware.org/ml/gnu-gabi/2016-q4/msg00000.html).
A second, non-allocatable section would contain more detailed notes that can be analysed by separate, static, tools. This second section would have the ability to record information on an address range basis as well as file level and application level scope. The necessary information would be gathered by a gcc plugin, so there would be no need to modify the compiler sources directly. The notes can be concatenated together, so there is no need to modify the linker, and scripts can be used in conjunction with the readelf program to parse the notes and answer questions about them.
Proposed Specification for non-loaded notes
The information is stored in a new section in the file using the ELF NOTE format. Creator tools (compilers, assemblers etc) place the notes into the binary files. Linkers merge the notes together. Consumer tools read the notes (possibly using readelf) and answer questions about the binaries concerned.
The new section is called .gnu.build.attributes. It has the type SHT_NOTE and a special flag bit set: SHF_GNU_BUILD_ATTRIBUTES (suggested value: 0x00100000). It does *not* have the SHF_ALLOC flag bit set. The sh_link and sh_info fields should be set to 0.
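The section header requirements above can be sketched as a simple check. This is only an illustration: the function name is hypothetical, SHT_NOTE and SHF_ALLOC use their standard ELF values, and SHF_GNU_BUILD_ATTRIBUTES uses the value suggested in the text, which may change.

```python
SHT_NOTE = 7                           # standard ELF section type
SHF_ALLOC = 0x2                        # standard ELF section flag
SHF_GNU_BUILD_ATTRIBUTES = 0x00100000  # value suggested in the text above


def is_build_attribute_section(name, sh_type, sh_flags, sh_link, sh_info):
    """Return True if the header fields match the proposed section layout.

    Hypothetical helper: the arguments are the already-decoded section name
    and section header fields.
    """
    return (
        name == ".gnu.build.attributes"
        and sh_type == SHT_NOTE
        and bool(sh_flags & SHF_GNU_BUILD_ATTRIBUTES)
        and not (sh_flags & SHF_ALLOC)   # the section must not be loaded
        and sh_link == 0
        and sh_info == 0
    )
```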
The section contains ELF format notes. The type field of a note is used to distinguish the range of memory over which an attribute applies. The name field identifies the attribute and gives it a value and the description field specifies the starting and ending addresses for where the attribute is applied.
The new note types are: NT_GNU_BUILD_ATTRIBUTE_OPEN (0x100) and NT_GNU_BUILD_ATTRIBUTE_FUNC (0x101). These are used by the description field to indicate an open address range or a symbol constrained address range.
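As a rough sketch of how a consumer tool might walk the section, the following splits raw section contents into notes using the generic ELF note layout (name size, description size, type, then the name and description). The function name is hypothetical, and the 4-byte padding of the name and description fields is an assumption based on common GNU practice rather than something stated in this proposal.

```python
import struct

NT_GNU_BUILD_ATTRIBUTE_OPEN = 0x100
NT_GNU_BUILD_ATTRIBUTE_FUNC = 0x101


def iter_notes(data, little_endian=True):
    """Yield (type, name, desc) for each note record in a section's bytes."""
    fmt = "<III" if little_endian else ">III"
    offset = 0
    while offset + 12 <= len(data):
        namesz, descsz, ntype = struct.unpack_from(fmt, data, offset)
        offset += 12
        name = data[offset:offset + namesz]
        offset += (namesz + 3) & ~3   # assumption: fields padded to 4 bytes
        desc = data[offset:offset + descsz]
        offset += (descsz + 3) & ~3
        yield ntype, name, desc
```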
The description field of the note is either 0 bytes long, or else a pair of 4-byte wide (for 32-bit targets) or 8-byte wide (for 64-bit targets) addresses which indicate the starting and ending location for the attribute.
If the description field is empty, the note should be treated as if it applies to the same region as the nearest preceding note of the same type (i.e. either OPEN or FUNC).
In unrelocated files the addresses should instead be zero, with a relocation present to set the actual value once the file is linked.
The numbers are stored in the same endian format as that specified in the EI_DATA field of the ELF header of the file containing the note. The size of the numbers is dictated by the EI_CLASS field of the ELF header.
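A minimal sketch of decoding the description field under these rules follows. The function name and its two boolean parameters are hypothetical; a real tool would derive them from the EI_CLASS and EI_DATA bytes of the ELF header.

```python
import struct


def parse_description(desc, is_64bit, little_endian):
    """Return (start, end) addresses, or None for an empty description.

    An empty description means the note inherits the range of the nearest
    preceding note of the same type.
    """
    if len(desc) == 0:
        return None
    fmt = ("<" if little_endian else ">") + ("QQ" if is_64bit else "II")
    if len(desc) != struct.calcsize(fmt):
        raise ValueError("description is not a pair of addresses")
    return struct.unpack(fmt, desc)
```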
The name field identifies the type and value of the attribute. The name starts with the string "GA", which is an abbreviation for GNU Attribute. The abbreviation is used in order to save space. The string is there so that tools that do not know about these notes will still be able to parse the note structure.
The character following the identifier string indicates the kind of attribute, based upon the following table:
  * - The attribute takes a numeric value. Numbers are stored in little endian binary format.
  $ - The attribute takes a string value.
  ! - The attribute takes a boolean value, and the value is false.
  + - The attribute takes a boolean value, and the value is true.
The next character indicates the specific attribute:
  ASCII value
  0      - Reserved for future use.
  1      - Version of the specification supported and producer(s) of the notes. (See below)
  2      - -fstack-protector status
  3      - relro
  4      - stack size
  5      - build tool & version
  6      - ABI
  7      - Position Independence (0=>static, 1=>pic, 2=>PIC, 3=>pie)
  8      - short enums
  9-31   - Reserved for future use.
  32-126 - The first character of an entirely string based attribute.
  127+   - Reserved for future use.
For * and $ type attributes the value is then appended.
Per the ELF note spec the name must end with a NUL byte.
Here are some examples:
  GA*foo\001\0\002\0    Attribute 'foo' with numeric value 0x200010 (assuming a little endian target).
  GA*bar\0\0            Attribute 'bar' with numeric value 0
  GA$fred\0hello\0      Attribute 'fred' with string value "hello"
  GA*\004\377\377\0     Attribute stack size with numeric value 0xffff
  GA*\002\001\0         -fstack-protector has been enabled.
  GA*\002\004\0         -fstack-protector-explicit has been enabled.
  GA$\001\002p1\0       Supports spec version 2, notes generated by plugin version 1.
  GA$\005gcc v7.0\0     Attribute build tool "gcc v7.0".
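One possible reading of the name field encoding is sketched below. The function and the ATTRIBUTE_IDS table are hypothetical names, and the sketch assumes that the final NUL byte is the terminator required by the ELF note format rather than part of a numeric value.

```python
ATTRIBUTE_IDS = {
    1: "version", 2: "stack protector", 3: "relro", 4: "stack size",
    5: "build tool", 6: "ABI", 7: "position independence", 8: "short enums",
}


def decode_name(name):
    """Decode a note name field (bytes) into an (attribute, value) pair."""
    if not name.startswith(b"GA") or len(name) < 5 or name[-1] != 0:
        raise ValueError("not a GNU build attribute name")
    kind = chr(name[2])
    if name[3] < 32:
        # Single-byte numeric attribute identifier.
        attribute = ATTRIBUTE_IDS.get(name[3], name[3])
        rest = name[4:]
    elif name[3] < 127:
        # String-named attribute, terminated by a NUL byte.
        end = name.index(0, 3)
        attribute = name[3:end].decode("ascii")
        rest = name[end + 1:]
    else:
        raise ValueError("reserved attribute identifier")
    payload = rest[:-1]  # assumption: the last byte is the terminating NUL
    if kind == "*":
        return attribute, int.from_bytes(payload, "little")
    if kind == "$":
        return attribute, payload.decode("latin-1")
    if kind == "!":
        return attribute, False
    if kind == "+":
        return attribute, True
    raise ValueError("unknown attribute kind")
```

For example, decode_name(b"GA*\x04\xff\xff\x00") yields the stack size attribute with value 0xffff, matching the fourth example above.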
Multiple notes for the same attribute can exist, providing that they have different values and that their description address ranges do not overlap. The exception to this rule is that NT_GNU_BUILD_ATTRIBUTE_FUNC attributes are allowed to overlap NT_GNU_BUILD_ATTRIBUTE_OPEN attributes.
Every set of notes should include a version note. Ideally the version note will be the first one in the sequence, but this is not a hard requirement. The version note string should consist of an odd number of characters. The first character is the ASCII code for the number of the version of this protocol supported by the notes. The next pair of characters indicate who produced the notes and which version of this producer has been used. A 'p' character indicates a compiler plugin. An 'l' character indicates the linker. Other characters may be defined in the future. Multiple producers can contribute to the notes. Their identifying pair of characters should be appended to the version note.
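The version string layout described above could be unpacked along these lines. This is a sketch only: the function name is hypothetical, and it takes the string value of the version note with the note prefix and trailing NUL already removed.

```python
PRODUCERS = {"p": "compiler plugin", "l": "linker"}


def parse_version_value(value):
    """Split a version note value into (spec_version, [(producer, version)])."""
    if len(value) % 2 != 1:
        raise ValueError("version string must have an odd number of characters")
    spec_version = ord(value[0])  # first character encodes the spec version
    producers = [
        (PRODUCERS.get(value[i], value[i]), value[i + 1])
        for i in range(1, len(value), 2)
    ]
    return spec_version, producers
```

Producer characters that are not yet defined are passed through unchanged, since the text allows further characters to be added in the future.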
When the linker merges two or more files containing these notes it should ensure that the above rules are maintained. Simply concatenating the incoming note sections should ensure this. The linker can, if it wishes, create its own notes and append or insert them into the note section, for example to indicate that -z relro is enabled.
The order of the notes from an incoming section must be preserved in the outgoing section. Notes do not have to be sorted by address range although this often happens automatically when sections are concatenated.
If this is a final link, then relocations on the notes should of course be resolved.
The linker, or another tool, may wish to eliminate redundant notes in the note section. It is recommended that if there are relocations against the notes, then they should not be merged. When merging the following rules must be observed:
- Preserve the ordering of the notes.
- Preserve any NT_GNU_BUILD_ATTRIBUTE_FUNC notes.
- Eliminate any NT_GNU_BUILD_ATTRIBUTE_OPEN notes that have the same full name field as the immediately preceding note with the same type of name.
- Combine the numeric value of any NT_GNU_BUILD_ATTRIBUTE_OPEN notes of type GNU_BUILD_ATTRIBUTE_STACK_SIZE.
- If an NT_GNU_BUILD_ATTRIBUTE_OPEN note is going to be preserved and its description field is empty then the nearest preceding OPEN note with a non-empty description field must also be preserved *OR* the description field of the note must be changed to contain the starting address to which it refers.
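The merging rules above might be sketched as follows. This is one interpretation, not a reference implementation: the Note class and function names are hypothetical, "the same type of name" is read as "the same attribute identifier", an inherited range is restored by copying the whole inherited description field, and the combining of stack size notes is left out because the text does not say how the values are combined.

```python
from dataclasses import dataclass, replace

OPEN, FUNC = 0x100, 0x101


@dataclass(frozen=True)
class Note:
    type: int
    name: bytes
    desc: bytes = b""  # empty: inherit the range of the preceding note


def attribute_id(name):
    """Identify the attribute of a 'GA' name field, ignoring its value."""
    if name[3] < 32:
        return name[3:4]
    return name[3:name.index(0, 3)]


def merge_notes(notes):
    """Drop redundant OPEN notes while preserving order and FUNC notes."""
    kept = []
    last_name = {}      # attribute id -> full name of the latest OPEN note
    input_range = b""   # range in force for OPEN notes in the input
    output_range = b""  # range in force for OPEN notes already kept
    for note in notes:
        if note.type != OPEN:
            kept.append(note)          # FUNC notes are always preserved
            continue
        if note.desc:
            input_range = note.desc
        key = attribute_id(note.name)
        if last_name.get(key) == note.name:
            continue                   # same attribute, same value: redundant
        last_name[key] = note.name
        if not note.desc and input_range != output_range:
            # The note it inherited its range from was dropped, so the
            # range is written into this note explicitly.
            note = replace(note, desc=input_range)
        if note.desc:
            output_range = note.desc
        kept.append(note)
    return kept
```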
A proposed implementation of a gcc plugin to generate these notes can be found here: https://bugzilla.redhat.com/show_bug.cgi?id=1451407
Questions
- What happened to SHT_GNU_ATTRIBUTES and how does it relate to what you are proposing?
GNU Attributes still exist and are a close match to the requirements of this specification. There is one major problem however - backwards compatibility. In order to use the SHT_GNU_ATTRIBUTES section type and the corresponding section contents it would be necessary to add support for section-relative and symbol-relative attributes to the binutils. The GNU Attributes specification does include support for these types of attributes, but so far nobody has been using them, and support for them in the assembler and linker is almost completely lacking.
In addition, according to the GNU attributes specification, when multiple input files have conflicting file-level attributes the linker must generate new section-level attributes to cover all of the conflicts. Similarly section-level attribute conflicts must be resolved by creating new symbol-level attributes. All of this leads to a lot more work for the linker, a potential source of new bugs, larger .gnu.attribute sections (compared to the ELF Note based solution proposed here), and lack of backwards compatibility.
The big plus of the current ELF Notes based specification is that it does not require any changes to the compiler, assembler or linker. It can in fact be implemented using the currently existing tools, or even older versions of these tools, which makes uptake of this solution a lot easier.
- What is being done to ensure the attributes are space and time efficient for dynamic link comparison in the dynamic linker? Speed of checking 10,000 DSOs (scalability) for ABI compatibility is going to be a very important requirement.
This is the purview of H.J.'s run-time annotations proposal. The basic idea, I believe, is to store only the information needed by the dynamic linker, and to store it in the form of bit masks for quick combination and verification. The main body of this proposal is a specification for non-allocatable notes that would only be examined by static tools, and never used by the loader.
- Have you compared the markup used by DSEE (Domain Software Engineering Environment) from Apollo Computer in the 1980's? Given an executable it was standard practice to recreate [and cache] the entire exact tool chain that generated that file. When combined with the source version (also encoded in the binary), it was possible to retrieve the original source, apply patches, re-generate (compile, link, post-process) and update the program with a guarantee of runtime compatibility. Applying no patches would produce a bit-for-bit identical result, because every step was performed by the bit-identical tools that were used initially.
String Notes
Experience with using ELF notes for implementing this protocol has shown that they are insufficiently compact and cause serious problems when building large projects. So an alternative approach is to use strings instead. In this version of the protocol the notes are stored as strings in a separate section. Each note is its own string, and since string sections are mergeable, the linker will automatically eliminate duplicate notes.
The format of a string note is a two character note identifier, followed by a colon and then whatever value the note requires. Ranges are not encoded. Practice has shown that the range information provides very little benefit to the analysis tools, and causes a lot of problems when building binaries.
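A minimal sketch of reading string notes in this format follows. The function names and the identifiers used in the usage note ("XX", "YY") are hypothetical, and the sketch assumes the section holds NUL-terminated strings, as mergeable ELF string sections normally do.

```python
def parse_string_note(note):
    """Split one string note into (identifier, value)."""
    ident, sep, value = note.partition(":")
    if sep != ":" or len(ident) != 2:
        raise ValueError("expected a two character identifier and a colon")
    return ident, value


def parse_string_section(data):
    """Parse every NUL-terminated string note in a section's raw bytes."""
    return [
        parse_string_note(s.decode("ascii"))
        for s in data.split(b"\0")
        if s
    ]
```

With made-up identifiers, parse_string_section(b"XX:1\0YY:a:b\0") gives [("XX", "1"), ("YY", "a:b")]; only the first colon separates the identifier from the value.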
Wiki page categories
We use wiki categories to track progress.
- Proposed properties subject to discussion and revision.
- Accepted properties pending implementation.
- Implemented properties