Adding support for atomic operations on ARM systems
A number of packages in Fedora require atomic operations for synchronization. Unfortunately, older ARM processors (prior to ARMv6) did not provide hardware instructions to support this. Kernel helpers were added to work around this limitation, but making use of them required platform-specific inline assembly code in each affected package. Jon Masters wrote a good summary of the state of support and of how to use these helpers.
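As a rough sketch of what that platform-specific code looks like, the ARM Linux kernel maps a compare-and-swap helper, __kernel_cmpxchg, at the fixed address 0xffff0fc0; it returns zero when it successfully changes *ptr from oldval to newval. Packages typically reached it through inline assembly or, as in the libgcc-style approach sketched below, through a function-pointer cast to that address. The helper_add_and_fetch name is purely illustrative and not taken from any particular package:

/* ARM Linux kuser helper: compare-and-swap at a fixed address.
 * Returns zero if *ptr was changed from oldval to newval. */
typedef int (__kernel_cmpxchg_t) (int oldval, int newval, volatile int *ptr);
#define __kernel_cmpxchg (*(__kernel_cmpxchg_t *) 0xffff0fc0)

/* Illustrative atomic add built on the helper: retry the compare-and-swap
 * until no other thread has modified *ptr in between. */
static int helper_add_and_fetch(volatile int *ptr, int value)
{
    int old;
    do {
        old = *ptr;
    } while (__kernel_cmpxchg(old, old + value, ptr) != 0);
    return old + value;
}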
Using these could be a bit tricky, and it was suggested that, instead of using platform-specific inline assembly and kernel helpers, we should consider using the GCC built-in functions to implement the necessary functionality. Looking in the GCC documentation, there appeared to be two alternatives: the __sync built-in functions for atomic memory access (available as of GCC-4.1, now considered legacy) and the built-in functions for memory-model-aware atomic operations (__atomic, available as of GCC-4.7).
The __sync built-in functions are more widely supported (they are available in older versions of GCC), but they are now considered legacy and will eventually be deprecated. In addition, they do not provide support for older ARM processors (ARMv5 and earlier). The __atomic built-in functions are the replacement for __sync and do support ARMv5 through fall-back library functions, but they implement the C++11 memory model and require GCC-4.7 or newer.
Since Fedora 18 already provides GCC-4.7 and still supports ARMv5, the __atomic built-in functions were used to add support to some packages that failed to build on ARM. To maintain backward compatibility on other architectures and with older compilers, the version of GCC must be checked and the appropriate code included. The GCC version can be determined using the following macro:
#define GCC_VERSION (__GNUC__ * 10000 \
                     + __GNUC_MINOR__ * 100 \
                     + __GNUC_PATCHLEVEL__)
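As a minimal sketch of how the macro is then used (the counter variable and bump_counter function are placeholders; 40700 and 40100 correspond to GCC 4.7.0 and 4.1.0, matching the thresholds in the examples that follow):

static unsigned counter;

static unsigned bump_counter(void)
{
#if defined(GCC_VERSION) && GCC_VERSION >= 40700
    /* GCC 4.7.0 or newer: the __atomic built-ins are available. */
    return __atomic_add_fetch(&counter, 1, __ATOMIC_SEQ_CST);
#elif defined(GCC_VERSION) && GCC_VERSION >= 40100
    /* GCC 4.1.0 or newer: fall back to the legacy __sync built-ins. */
    return __sync_add_and_fetch(&counter, 1);
#else
    /* Older compiler: platform-specific code would still be needed here. */
#error "no atomic implementation available"
#endif
}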
When updating support for atomic operations, there are generally two cases to address:
1) replace the legacy __sync built-in functions with __atomic built-in functions
2) replace platform-specific inline assembly with __atomic built-in functions
The first case is relatively straightforward. For each __sync built-in function there is a corresponding __atomic built-in function. The arguments to these functions match in position and type, but the __atomic built-ins take one additional argument specifying the memory model. Choosing the strongest memory model (__ATOMIC_SEQ_CST) is safe when replacing __sync built-in functions, since the legacy __sync built-ins are generally documented as full barriers. Here is an example of such a replacement for atomic prefix/postfix increment/decrement (taken from the mongodb package).
#if defined(GCC_VERSION) && GCC_VERSION >= 40700
    // in GCC version >= 4.7.0 we can use the built-in __atomic operations
    inline void AtomicUInt::set(unsigned newX) {
        __atomic_store_n(&x, newX, __ATOMIC_SEQ_CST);
    }
    AtomicUInt AtomicUInt::operator++() { // ++prefix
        return __atomic_add_fetch(&x, 1, __ATOMIC_SEQ_CST);
    }
    AtomicUInt AtomicUInt::operator++(int) { // postfix++
        return __atomic_fetch_add(&x, 1, __ATOMIC_SEQ_CST);
    }
    AtomicUInt AtomicUInt::operator--() { // --prefix
        return __atomic_add_fetch(&x, -1, __ATOMIC_SEQ_CST);
    }
    AtomicUInt AtomicUInt::operator--(int) { // postfix--
        return __atomic_fetch_add(&x, -1, __ATOMIC_SEQ_CST);
    }
    void AtomicUInt::signedAdd(int by) {
        __atomic_fetch_add(&x, by, __ATOMIC_SEQ_CST);
    }
#elif defined(GCC_VERSION) && GCC_VERSION >= 40100
    // in GCC version >= 4.1.0 we can use the __sync built-in atomic operations
    inline void AtomicUInt::set(unsigned newX) {
        __sync_synchronize();
        x = newX;
    }
    AtomicUInt AtomicUInt::operator++() {
        return __sync_add_and_fetch(&x, 1);
    }
    AtomicUInt AtomicUInt::operator++(int) {
        return __sync_fetch_and_add(&x, 1);
    }
    AtomicUInt AtomicUInt::operator--() {
        return __sync_add_and_fetch(&x, -1);
    }
    AtomicUInt AtomicUInt::operator--(int) {
        return __sync_fetch_and_add(&x, -1);
    }
    void AtomicUInt::signedAdd(int by) {
        __sync_fetch_and_add(&x, by);
    }
#else
    // use any platform-specific inline assembly
    :
#endif
For the second case, select the built-in function whose behavior most closely matches that of the inline assembly code being replaced, for example:
#if defined(GCC_VERSION) && GCC_VERSION >= 40700
    // If GCC version >= 4.7.0, we can use the built-in __atomic operations
    static T compareAndSwap(volatile T* dest, T expected, T newValue) {
        // __atomic_compare_exchange_n() returns a bool and leaves the value
        // previously stored in *dest in 'expected', so return 'expected' to
        // match the old-value semantics of the cmpxchg version below.
        __atomic_compare_exchange_n(dest, &expected, newValue, 0,
                                    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
        return expected;
    }
    static T swap(volatile T* dest, T newValue) {
        return __atomic_exchange_n(dest, newValue, __ATOMIC_SEQ_CST);
    }
    static T load(volatile const T* value) {
        return __atomic_load_n(value, __ATOMIC_SEQ_CST);
    }
    static void store(volatile T* dest, T newValue) {
        __atomic_store_n(dest, newValue, __ATOMIC_SEQ_CST);
    }
    static T fetchAndAdd(volatile T* dest, T increment) {
        return __atomic_fetch_add(dest, increment, __ATOMIC_SEQ_CST);
    }
#else // GCC version < 4.7
    // use legacy (platform-specific) atomic operations
    static T compareAndSwap(volatile T* dest, T expected, T newValue) {
        T result;
        asm volatile ("lock cmpxchg %[src], %[dest]"
                      : [dest] "+m" (*dest), "=a" (result)
                      : [src] "r" (newValue), "a" (expected)
                      : "memory", "cc");
        return result;
    }
    static T swap(volatile T* dest, T newValue) {
        T result = newValue;
        // No need for "lock" prefix on "xchg".
        asm volatile ("xchg %[r], %[dest]"
                      : [dest] "+m" (*dest), [r] "+r" (result)
                      :
                      : "memory");
        return result;
    }
    static T load(volatile const T* value) {
        asm volatile ("mfence" ::: "memory");
        T result = *value;
        asm volatile ("mfence" ::: "memory");
        return result;
    }
    static void store(volatile T* dest, T newValue) {
        asm volatile ("mfence" ::: "memory");
        *dest = newValue;
        asm volatile ("mfence" ::: "memory");
    }
    static T fetchAndAdd(volatile T* dest, T increment) {
        T result = increment;
        asm volatile ("lock xadd %[src], %[dest]"
                      : [dest] "+m" (*dest), [src] "+r" (result)
                      :
                      : "memory", "cc");
        return result;
    }
#endif
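One difference between the two compareAndSwap() implementations above is worth noting: unlike cmpxchg, __atomic_compare_exchange_n() reports success through its bool return value and passes the previously stored value back through its 'expected' argument rather than returning it directly. The following small, self-contained sketch of that behavior (variable names are illustrative; it requires GCC 4.7 or newer) can be compiled and run to see both the success and failure paths:

#include <stdio.h>

int main(void)
{
    int value = 5;      /* the shared location */
    int expected = 5;   /* what we believe it currently holds */

    /* Succeeds: value becomes 9, the built-in returns true (1),
     * and 'expected' still holds the old value (5). */
    int ok = __atomic_compare_exchange_n(&value, &expected, 9, 0,
                                         __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    printf("ok=%d value=%d old=%d\n", ok, value, expected);  /* ok=1 value=9 old=5 */

    /* Fails: 'expected' (5) no longer matches 'value' (9), so the built-in
     * returns false (0) and overwrites 'expected' with the current value (9). */
    ok = __atomic_compare_exchange_n(&value, &expected, 7, 0,
                                     __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    printf("ok=%d value=%d old=%d\n", ok, value, expected);  /* ok=0 value=9 old=9 */

    return 0;
}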
The above example shows the 32-bit x86 inline assembly code for selected functions. If the GCC built-in functions are not used, code similar to the platform-specific examples above must be included for every supported architecture. When the requisite hardware support is not provided by the architecture, as with ARMv5, kernel helper functions and inline assembly to call them must be used.
While only a few examples of the GCC built-in __atomic operations are shown here, they illustrate how to add ARM support to packages where it was previously lacking, and how to replace old inline assembly with platform-neutral code that should be more readable and maintainable going forward.