Thumb2PortingHowto

Revision 4 as of 2010-01-29 16:54:10

UNDER CONSTRUCTION

When you see some assembler in a source package, there are some things which you need to consider when porting for Thumb-2 compatibility.

[FIXME - we really need examples to illustrate this page! When some packages have already been looked at, links should be added here]

How to Port Packages

[WRITE ME] [find the affected bits of code] [work out the implications and fix, depending on the issue types]

Key Thumb-2 compatibility Issues

Here's a quick breakdown of the key issues which may need attention:

Procedure calls and returns

When using Thumb-2, the system will generally contain a mixture of ARM and Thumb-2 functions (depending on how libraries and binaries, and their component objects and functions, were assembled).

The processor does not automatically know which instruction set is used for the code being executed after a branch, procedure call or procedure return --- instead, it must be told which instruction set to use at the time of the branch or return.

Getting this right is known as "interworking".

For C code, it's magic and will "just work", but for assembler, when you need to jump around, you need to do it the right way and not the wrong way... otherwise the processor will try to interpret the code using the wrong instruction set and sooner or later crash the running process (it certainly won't be doing what the programmer intended... much as if you branched into some arbitrary data).

How it works

The target instruction set state is determined in different ways depending on the type of branch.

Note that the optional condition code at the end of each instruction mnemonic is omitted.

b <label>: no switch
- This is (usually) the right way to do a non-returning jump to a label or function.
- If <label> is an external symbol defined elsewhere, the linker will magically patch it up to switch appropriately.
- Caution is required if ARM and Thumb are mixed in a single source file (rare), since there is no automatic instruction set switch for local symbols (in this unlikely case you may need to use bx instead). The assembler may or may not magically introduce a veneer/trampoline depending on whether or not it knows that the destination is in a different instruction set and is definitely a code symbol (à la .type <symbol>, %function or .thumb_func). Note that just because a symbol appears in a code section it is not assumed to be a code symbol unless specifically tagged in one of the aforementioned ways.
bl <label>: no switch
- This is (usually) the right way to do a procedure call to a label or function.
- If <label> is an external symbol defined elsewhere, the linker will magically patch it up to switch appropriately --- but this does not guarantee a successful return unless the called function does a correct interworking return.
- The link register (LR or r14) is automatically set to the return address, just following the call. The bottom bit of LR is automatically set to 0 (ARM) or 1 (Thumb) to indicate which instruction set to switch to when returning. (This is not automatically done if you use an instruction like mov lr, pc to determine the return address --- this can lead to problems when the called function returns.)
- Caution is required if ARM and Thumb are mixed in a single source file (rare). As for b <label>, the assembler may or may not magically introduce a veneer/trampoline depending on whether or not it knows that the destination is in a different instruction set and is definitely a code symbol (à la .type <symbol>, %function or .thumb_func).
bx <register> (register is usually lr): switches depending on bottom bit of <register>
- This is one of the two right ways to do a procedure return.
- Assumes the LR value (or whatever <register> is used) has the bottom bit set correctly to indicate ARM or Thumb (which will be the case if it comes from the LR value set by a correct procedure call).
- It is not supported on ARMv4, so for Debian compatibility a workaround is needed. Note that bx lr is better for performance on newer processors, since mov pc, lr may harm branch prediction performance.

#if defined (__ARM_ARCH_4T__) || defined (__ARM_ARCH_4__)
        "mov    pc, lr"
#else
        "bx     lr"
#endif

blx <register> (register should usually not be lr): saves the return address in lr and calls a function at the address held in the specified <register>. Switches instruction set depending on bottom bit of <register>.
- This is the preferred way to do a procedure call to a computed or variable address.
- The called function must still do a correct return.
- This instruction is not supported on ARMv4(T), so code intended to build and work on Debian must use an alternative workaround. Because Debian does not use Thumb code, the following snippet is usually sensible (see "Computed destinations and returns" for an explanation of why this isn't safe for Thumb code, though):

#if defined (__ARM_ARCH_4T__) || defined (__ARM_ARCH_4__)
        "mov    lr, pc\n\t"
        "mov    pc, <register>"
#else
        "blx    <register>"
#endif

ldr pc, [...] or pop {..., pc} or ldmfd sp!, {..., pc}: switches depending on the bottom bit of the value loaded for PC.
- The other right way to do a procedure return.
- Assumes you saved the LR value as part of the function prologue.
- Assumes the LR value has the bottom bit set correctly to indicate ARM or Thumb (which will be the case if the procedure call was done in the right way).
- Debian-compatible
- Additional registers can be restored from the stack as part of the return, in the usual way.
mov pc, <register>: no switch unless executed from ARM code AND the processor is >= ARMv7
- Often the right way to implement inline jump tables (see "PC arithmetic and position-independent addressing")
- Debian-compatible way to do a procedure return or computed/variable branch (ARM only)
- Not recommended for procedure returns on ARMv7.
- Generally, it's best to avoid relying on the interworking behaviour of this instruction, since newer ARM processors are not all optimised to do efficient branch-prediction in this case, so something like this is preferable:

#if defined (__ARM_ARCH_4T__) || defined (__ARM_ARCH_4__)
        "mov    pc, lr"
#else
        "bx     lr"
#endif
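
The switching rule shared by bx, blx <register> and loads into PC can be modelled in a few lines of portable C. This is only an illustrative sketch: the names isa_state, branch_target and resolve_interworking_branch are made up for this page, and nothing here touches real processor state.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum isa_state { ISA_ARM = 0, ISA_THUMB = 1 };

struct branch_target {
    enum isa_state state;   /* instruction set selected by the branch */
    uint32_t address;       /* address execution continues from */
};

/*
 * Model of an interworking branch such as "bx <register>":
 * bit 0 of the register value selects the instruction set and is
 * not part of the address that instructions are fetched from.
 * (For an ARM target the value is expected to be word-aligned.)
 */
static struct branch_target resolve_interworking_branch(uint32_t value)
{
    struct branch_target target;

    target.state = (value & 1u) ? ISA_THUMB : ISA_ARM;
    target.address = value & ~UINT32_C(1);
    return target;
}
```

Seen this way, a return address produced by bl or blx carries its own instruction set marker, whereas one computed by hand only does so if the code sets bit 0 itself.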

Computed destinations and returns

Whenever a destination or return address is variable or otherwise determined at run-time, you need to be careful to set the "thumb bit" (bit 0) in the address correctly and/or do the correct type of branch, to make sure that the call (and return, if applicable) switch instruction set appropriately.

labels or functions:

If you reference an external label or function defined in another object, the linker will magically give you an address with the "Thumb bit" (bit 0) set appropriately. This means you can branch to it safely with bx or blx, or store it in memory and load it into PC later, pass it to other functions as a callback, etc.
If you reference a symbol internal to the object, life is more "interesting":
- If the symbol is a C function the Thumb bit in the address you get will be set appropriately.
- If the symbol is tagged using an assembler directive .type <symbol>, %function or .thumb_func, the Thumb bit in the address you get will be set appropriately. (GCC always does this for C functions.)
- Otherwise, the Thumb bit will not be set appropriately.
- When referencing GNU assembler local labels (0b, 1f etc.) the Thumb bit will not be set appropriately.

".":

The current assembly location symbol in the GNU assembler (.) never has the Thumb bit set.
This behaviour is usually useful but sometimes unexpected. As a consequence, code like ldr r0, =. ; bx r0 is not an infinite loop in Thumb --- instead, it would re-execute the instructions as ARM and will probably crash the process.
Note that because b and bl do not switch instruction state, subs r0, r0, #1 ; bne . - 2 will work as a simple delay loop in Thumb (but you should never write it this way; see "PC and . arithmetic and position-independent addressing").

In cases where the Thumb bit is not set appropriately, it will simply be left as 0. For this reason, the distinction is not important when executing in ARM (where no instruction set change is implied by the Thumb bit), but is important in Thumb (where there may be an unintentional switch to ARM if you don't take corrective action).

The general rule is as follows: if the address will be passed to any other function or object (as a return address, method address, callback etc.) then you must ensure that the Thumb bit gets set if the code is assembler in Thumb.
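
As a rough illustration of this rule, a helper along the following lines could be applied to an address taken from an untagged label before it is handed to other code. It is a sketch, not an established API: code_pointer_for is a hypothetical name, and the only thing relied on is that GCC predefines __thumb__ when compiling for Thumb.

```c
#include <stdint.h>

/*
 * Hypothetical helper: turn the raw address of a label in the current
 * object into a value that is safe to pass to bx/blx, store for a later
 * load into PC, or hand to another function as a callback.
 */
static uintptr_t code_pointer_for(uintptr_t raw_address)
{
#if defined(__thumb__)
    /* Built as Thumb: the label is Thumb code, so set the Thumb bit. */
    return raw_address | (uintptr_t)1;
#else
    /* Built as ARM (or not for ARM at all): bit 0 stays clear. */
    return raw_address;
#endif
}
```

Addresses of C functions and of symbols tagged with .type <symbol>, %function or .thumb_func already have the bit set, so applying the helper to them changes nothing.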

However, if the little bit of code you're hacking knows that the Thumb bit is never set in the address, it may be safe not to set it so long as you bear this in mind. This can make sense for inline jump tables etc. -- see "Jump tables".

PC and "." arithmetic and position-independent addressing

Jump tables

The SWP instruction

The SWP instruction performs a locked read-write operation on a piece of memory, similar to the x86 xchg instruction. This can have a bad impact on performance in modern systems with a complex hardware architecture and/or multiple processors or other bus masters, so this instruction is deprecated.

Because the Thumb-2 instruction set was introduced after SWP became deprecated, there is no encoding for SWP in Thumb-2 at all. Any use of SWP therefore causes a build failure when building for Thumb-2.

See "Atomic Operations" for a more general discussion of how to port these cases.

Atomic operations SWP, LDREX, STREX and similar

[WRITE ME] [port to GCC intrinsics] [or port to LDREX/STREX --- check for memory barriers]
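
One possible shape for the "port to GCC intrinsics" route is sketched below. It assumes a compiler that provides the __atomic builtins (GCC 4.7 or later, or Clang), which is newer than this page; on older compilers __sync_lock_test_and_set is the nearest equivalent, with weaker ordering guarantees. The function name atomic_swap_word is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical replacement for a hand-written "swp" sequence:
 * store new_value to *location and return the value that was there
 * before, as a single atomic operation.
 */
static uint32_t atomic_swap_word(volatile uint32_t *location,
                                 uint32_t new_value)
{
    /*
     * Sequentially consistent ordering is the conservative choice.
     * The compiler emits whatever the target needs (for example an
     * LDREX/STREX loop plus barriers when building for ARMv7).
     */
    return __atomic_exchange_n(location, new_value, __ATOMIC_SEQ_CST);
}
```

Because the builtin is expanded by the compiler, the same source builds for ARM and for Thumb-2 without any instruction-set-specific assembler.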

Operand combinations

Thumb-2 is generally a bit more restricted with regard to instruction operands. You may find that you get assembler errors when assembling for Thumb-2.

Use of PC and SP

Note:

PC may also be denoted by "r15" in assembler source.
SP may also be denoted by "r13" in assembler source.

Generally, doing fancy stuff with the program counter and stack pointer registers is deprecated in ARMv7, and may not be allowed in Thumb-2 at all. Existing code which does such things will generally need some porting (though it is generally safe not to worry if you don't get errors or warnings when building).

If it sounds from the register's name like it may not have been designed for what you're doing, you are probably doing "fancy stuff" and should avoid it...

Details:

Stack Pointer (SP) register

For SP, you can push, pop, ldmfd sp!,..., stmfd sp!,... or add or sub or mov, but you should generally avoid doing anything else. Do not use SP as a destination register in other operations, and do not attempt to push sp itself onto the stack or pop it from the stack. Older assemblers may not accept push and pop in ARM code; this may be (but probably is not) an issue for Debian compatibility.

Using SP as a base register for load and store operations is allowed, and you may add an offset to the address, but you may not multiply/shift/scale SP.

Upwards-growing stacks (ldmea sp!, ..., stmea sp!, ... etc.) are deprecated, but it is rare to encounter these.

Program Counter (PC) register

For PC, you cannot use it as the destination register in most operations, except for mov, ldr, and pop {..., pc} or ldmfd sp!, {..., pc}.

Using the PC as a source register in simple operations (add <reg>, pc, ..., sub <reg>, pc, ..., or mov <reg>, pc) is allowed but may produce different results in Thumb compared with ARM, and non-Thumb-aware code which does these things will need to be ported. In particular, attempts to manually determine a return address (mov lr, pc or similar) or index inline jump tables (ldr pc, [pc, <index>]) or similar may need attention.

Using the PC as a base address register (i.e., the first operand inside the brackets [pc ...]) is allowed in simple load and store operations, but you may not multiply/shift/scale or auto-update the PC (! or [pc],<index> syntax). Again, the results may be different between ARM and Thumb.

See "PC arithmetic and position-independent addressing" for more detail on this and how to handle these cases.
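
The ARM/Thumb difference comes from how far ahead of the current instruction the PC reads. The following sketch models the architectural rule (current instruction address plus 8 in ARM state, plus 4 in Thumb state, with PC-relative loads in Thumb additionally rounding the base down to a multiple of 4). The function names are hypothetical and the code is a model for reasoning about offsets, not something to link into a package.

```c
#include <stdint.h>

/*
 * Value seen when the instruction at insn_address reads PC as an
 * ordinary source register (for example "mov <reg>, pc").
 */
static uint32_t pc_read_value(uint32_t insn_address, int is_thumb)
{
    return insn_address + (is_thumb ? 4u : 8u);
}

/*
 * Base address used by a PC-relative load such as
 * "ldr <reg>, [pc, #offset]". In Thumb the base is word-aligned first.
 */
static uint32_t pc_relative_base(uint32_t insn_address, int is_thumb)
{
    uint32_t pc = pc_read_value(insn_address, is_thumb);

    return is_thumb ? (pc & ~UINT32_C(3)) : pc;
}
```

This is why mov lr, pc followed immediately by a single branch instruction yields the correct return address in ARM code: PC reads as the address two instructions ahead. Offsets hard-coded on that assumption need to be rechecked for Thumb.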

You should generally not push PC onto the stack or store it via push, str, stmfd etc.; some of these operations may not be allowed in Thumb-2, and results may differ between ARM and Thumb. However, loading PC from the stack is explicitly allowed as one of the "correct" ways of doing a procedure return - see "Procedure calls and returns".

ARM versus Thumb

It's important to understand what kind of assembler you're looking at, and the instruction set it will be assembled for. The primary reason for this is to understand the requirements for interworking (function calls between ARM and Thumb code) to work properly.

There are a few possibilities here:

Traditional ARM assembler

out-of-line assembler files (.s, .S)
the following directives are not present in the source: .code 16, .thumb, .thumb_func, .syntax unified
3- or 4-operand instructions present (e.g., add r0, r1, #3) and arbitrary conditional instructions (e.g., subgt r1, r4, r5)
assembles to fixed-size 32-bit instructions

Unified assembler

out-of-line assembler files (.s, .S)
the following directive is present in the source: .syntax unified
code looks similar to traditional ARM assembler
hashes (#) in front of immediate operands are not required and may be absent
except for conditional branches, conditional instruction sequences must be preceded immediately by it directives (such as itte eq / moveq r0, r1 / subeq r2, 1 / movne r0, 0)
assembles either to fixed-size 32-bit (ARM) instructions, or mixed-size (16-/32-bit) Thumb-2 instructions, depending on the presence of .code, .thumb, .arm directives etc.

"Hybrid" assembler

all GCC inline assembler (.c, .h, .cpp, .cxx, .c++ and so on)
code intended to build for Thumb-2 (lucid default) or ARM, depending on GCC configuration and command-line switches (-marm, -mthumb)
code should be understandable as traditional ARM assembler and unified assembler
it blocks may be present (but are inferred otherwise --- may be better to omit them unless it is critical for performance, for compatibility with older Debian tools etc.)
hashes (#) in front of immediate integer constants should be present.
assembles either to fixed-size 32-bit (ARM) instructions, or mixed-size (16-/32-bit) Thumb-2 instructions (lucid default), depending on the GCC configuration and command-line options (-marm, -mthumb).
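
A small example of "hybrid" inline assembler follows, written so that it should be acceptable both as traditional ARM and as Thumb-2. It is a sketch under assumptions: the function name smaller_of is invented, the explicit it line is included only to show where it goes, and a plain C fallback is used when not building for ARM or when building for Thumb-1 (which has no it instruction).

```c
/* Hypothetical example: return the smaller of two signed integers. */
static int smaller_of(int a, int b)
{
#if defined(__arm__) && (!defined(__thumb__) || defined(__thumb2__))
    int result = a;

    __asm__ ("cmp   %0, %1\n\t"   /* compare result with b */
             "it    gt\n\t"       /* needed for Thumb-2; no code in ARM */
             "movgt %0, %1"       /* result = b when result > b */
             : "+r" (result)
             : "r" (b)
             : "cc");
    return result;
#else
    /* Not ARM, or Thumb-1: fall back to plain C. */
    return (a < b) ? a : b;
#endif
}
```

The same source line, movgt, is a conditional ARM instruction under -marm and the body of an it block under -mthumb, which is the property that makes this style buildable either way.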

Traditional Thumb-1 assembler (rare)

You probably won't see any of this!
out-of-line assembler files (.s, .S)
any of the following directives are present in the source: .code 16, .thumb, .thumb_func
the following directive is not present in the source: .syntax unified
mostly 2-operand instructions (add or sub can sometimes have 3)
all instructions are unconditional except for branches (beq etc.)
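
When it is unclear which instruction set a C file containing inline assembler is actually being built for, the compiler's predefined macros can be checked. The sketch below relies on __arm__, __thumb__ and __thumb2__, which GCC defines for ARM targets; the function name target_instruction_set is invented for illustration.

```c
/*
 * Hypothetical helper: report the instruction set this translation
 * unit is being compiled for, based on GCC's predefined macros.
 */
static const char *target_instruction_set(void)
{
#if defined(__thumb2__)
    return "Thumb-2";
#elif defined(__thumb__)
    return "Thumb-1";
#elif defined(__arm__)
    return "ARM";
#else
    return "not ARM";
#endif
}
```

Building the same file once with -marm and once with -mthumb should change the reported value, which is a quick way to confirm what the GCC configuration defaults to.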

Ubuntu Wiki