GCC内联汇编

1 什么是内联汇编

通过C标准或编译器特有的语法规则，用汇编语言编写函数，称为内联汇编(Inline Assembly)。C标准定义关键词__asm__，GCC扩展关键词为asm（参见这里）。

2 为什么使用内联汇编

使用内联汇编基于两个原因：

性能：通过精准控制汇编实现，提供更好的性能
功能：部分功能只能通过汇编语言实现，比如不使用系统调用获取时间戳(Intel RDTSC)

3 内联汇编语法格式

完整的清单请查看GCC Mnaual - For Using Assembly Language with C。

3.1 Basic Asm

asm [volatile] ( AssemblerInstruction )

可选的volatile没有任何作用，因为所有basic asm都是隐含volatile的。

AssemblerInstruction是字符串形式的汇编码，GCC不解析basic asm汇编字符串含义。如果有多条汇编指令，请添加分隔符，一般是"\n\t"。

通常建议使用extended asm，它能够生成更小、更安全、更快速的代码。但如果想在文件级别使用内联汇编（比如定义汇编宏或修改汇编指令），必须使用basic asm，因为 extended asm只能在C函数内部使用。

如果需要使用C语言中的数据，建议使用extend asm。GCC文档描述如何将basic asm转换为 extend asm，参考这里。

GCC可能会在优化时复制或消除一些汇编语句，因此如果在汇编中定义符号，可能导致编译错误。

一个简单的basic asm例子：

#define DebugBreak() asm("int $3")

#include <stdio.h>

int sum = 0;

int main(int argc, char *argv[argc])
{
        asm("movl $100, sum\n\t");
        printf("sum=%d\n", sum);
        return 0;
}

3.2 Extend Asm

asm [volatile] ( AssemblerTemplate
                 : OutputOperands
                 [ : InputOperands
                 [ : Clobbers ] ])

asm [volatile] goto ( AssemblerTemplate
                      :
                      : InputOperands
                      : Clobbers
                      : GotoLabels)

3.2.1 volatile

volatile和C语言中的一致，如果汇编指令产生边界效应，需要添加volatile禁止优化。如下示例：

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>

static inline uint64_t nv_rdtsc(void)
{
        uint64_t rts;

        asm ("rdtsc\n\t"    // Returns the time in EDX:EAX.
             "shl $32, %%rdx\n\t"  // Shift the upper bits left.
             "or %%rdx, %0"        // 'Or' in the lower bits.
             : "=a" (rts)
             :
             : "rdx");

        return rts;
}

static inline uint64_t rdtsc(void)
{
        uint64_t rts;

        asm volatile ("rdtsc\n\t"    // Returns the time in EDX:EAX.
                      "shl $32, %%rdx\n\t"  // Shift the upper bits left.
                      "or %%rdx, %0"        // 'Or' in the lower bits.
                      : "=a" (rts)
                      :
                      : "rdx");

        return rts;
}

int main(int argc, char *argv[argc])
{
        uint64_t rts1, rts2, rts3;

        rts1 = nv_rdtsc();
        usleep(10);
        rts2 = nv_rdtsc();
        usleep(20);
        rts3 = nv_rdtsc();

        printf("rts = %lu, %lu, %lu\n", rts1, rts2, rts3);

        rts1 = rdtsc();
        usleep(10);
        rts2 = rdtsc();
        usleep(20);
        rts3 = rdtsc();
        printf("rts = %lu, %lu, %lu\n", rts1, rts2, rts3);

        return 0;
}

Manual说明：

https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Volatile

GCC’s optimizers sometimes discard asm statements if they determine there is no need for the output variables. Also, the optimizers may move code out of loops if they believe that the code will always return the same result (i.e. none of its input values change between calls). Using the volatile qualifier disables these optimizations. asm statements that have no output operands, including asm goto statements, are implicitly volatile.

This i386 code demonstrates a case that does not use (or require) the volatile qualifier. If it is performing assertion checking, this code uses asm to perform the validation. Otherwise, dwRes is unreferenced by any code. As a result, the optimizers can discard the asm statement, which in turn removes the need for the entire DoCheck routine. By omitting the volatile qualifier when it isn’t needed you allow the optimizers to produce the most efficient code possible.
void DoCheck(uint32_t dwSomeValue)
{
   uint32_t dwRes;

   // Assumes dwSomeValue is not zero.
   asm ("bsfl %1,%0"
     : "=r" (dwRes)
     : "r" (dwSomeValue)
     : "cc");

   assert(dwRes > 3);
}
The next example shows a case where the optimizers can recognize that the input (dwSomeValue) never changes during the execution of the function and can therefore move the asm outside the loop to produce more efficient code. Again, using volatile disables this type of optimization.
void do_print(uint32_t dwSomeValue)
{
   uint32_t dwRes;

   for (uint32_t x=0; x < 5; x++)
   {
      // Assumes dwSomeValue is not zero.
      asm ("bsfl %1,%0"
        : "=r" (dwRes)
        : "r" (dwSomeValue)
        : "cc");

      printf("%u: %u %u\n", x, dwSomeValue, dwRes);
   }
}
The following example demonstrates a case where you need to use the volatile qualifier. It uses the x86 rdtsc instruction, which reads the computer’s time-stamp counter. Without the volatile qualifier, the optimizers might assume that the asm block will always return the same value and therefore optimize away the second call.
uint64_t msr;

asm volatile ( "rdtsc\n\t"    // Returns the time in EDX:EAX.
        "shl $32, %%rdx\n\t"  // Shift the upper bits left.
        "or %%rdx, %0"        // 'Or' in the lower bits.
        : "=a" (msr)
        :
        : "rdx");

printf("msr: %llx\n", msr);

// Do other work...

// Reprint the timestamp
asm volatile ( "rdtsc\n\t"    // Returns the time in EDX:EAX.
        "shl $32, %%rdx\n\t"  // Shift the upper bits left.
        "or %%rdx, %0"        // 'Or' in the lower bits.
        : "=a" (msr)
        :
        : "rdx");

printf("msr: %llx\n", msr);
GCC’s optimizers do not treat this code like the non-volatile code in the earlier examples. They do not move it out of loops or omit it on the assumption that the result from a previous call is still valid.

Note that the compiler can move even volatile asm instructions relative to other code, including across jump instructions. For example, on many targets there is a system register that controls the rounding mode of floating-point operations. Setting it with a volatile asm, as in the following PowerPC example, does not work reliably.
asm volatile("mtfsf 255, %0" : : "f" (fpenv));
sum = x + y;
The compiler may move the addition back before the volatile asm. To make it work as expected, add an artificial dependency to the asm by referencing a variable in the subsequent code, for example:
asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
sum = x + y;
Under certain circumstances, GCC may duplicate (or remove duplicates of) your assembly code when optimizing. This can lead to unexpected duplicate symbol errors during compilation if your asm code defines symbols or labels. Using ‘%=’ (see AssemblerTemplate) may help resolve this problem.

3.2.2 goto

允许汇编指令跳转到C语言定义的GotoLabels任一个，多个GotoLabels用逗号分割。允许 goto跳转的汇编块不允许包含输出。如下代码示例显示了内联跳转。

#include <stdio.h>

void func(int i)
{
        asm goto ("cmpl $1,%0\n\t"
                  "je %l[Label1]\n\t"
                  "cmpl $2, %0\n\t"
                  "je %l[Label2]\n\t"
                  : /* no outputs */
                  : "r"(i)
                  : /* no clobber */
                  : Label1, Label2
                 );

        printf("asm no goto\n");
        return;

Label1:
        printf("asm goto Label1\n");
        return;

Label2:
        printf("asm goto Label2\n");
        return;
}

int main(int argc, char *argv[argc])
{
        func(0);
        func(1);
        func(2);
        func(3);
        return 0;
}

注意上述示例中，根据gcc规则，常数必须是cmpl的第一个参数。

https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#GotoLabels

asm goto allows assembly code to jump to one or more C labels. The GotoLabels section in an asm goto statement contains a comma-separated list of all C labels to which the assembler code may jump. GCC assumes that asm execution falls through to the next statement (if this is not the case, consider using the __builtin_unreachable intrinsic after the asm statement). Optimization of asm goto may be improved by using the hot and cold label attributes (see Label Attributes).

An asm goto statement cannot have outputs. This is due to an internal restriction of the compiler: control transfer instructions cannot have outputs. If the assembler code does modify anything, use the "memory" clobber to force the optimizers to flush all register values to memory and reload them if necessary after the asm statement.

Also note that an asm goto statement is always implicitly considered volatile.

To reference a label in the assembler template, prefix it with ‘%l’ (lowercase ‘L’) followed by its (zero-based) position in GotoLabels plus the number of input operands. For example, if the asm has three inputs and references two labels, refer to the first label as ‘%l3’ and the second as ‘%l4’).

Alternately, you can reference labels using the actual C label name enclosed in brackets. For example, to reference a label named carry, you can use ‘%l[carry]’. The label must still be listed in the GotoLabels section when using this approach.

Here is an example of asm goto for i386:
asm goto (
    "btl %1, %0\n\t"
    "jc %l2"
    : /* No outputs. */
    : "r" (p1), "r" (p2)
    : "cc"
    : carry);

return 0;

carry:
return 1;
The following example shows an asm goto that uses a memory clobber.
int frob(int x)
{
  int y;
  asm goto ("frob %%r5, %1; jc %l[error]; mov (%2), %%r5"
            : /* No outputs. */
            : "r"(x), "r"(&y)
            : "r5", "memory"
            : error);
  return y;
error:
  return -1;
}

3.2.3 AssemblerTemplate

https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#AssemblerTemplate

An assembler template is a literal string containing assembler instructions. The compiler replaces tokens in the template that refer to inputs, outputs, and goto labels, and then outputs the resulting string to the assembler. The string can contain any instructions recognized by the assembler, including directives. GCC does not parse the assembler instructions themselves and does not know what they mean or even whether they are valid assembler input. However, it does count the statements (see Size of an asm).

You may place multiple assembler instructions together in a single asm string, separated by the characters normally used in assembly code for the system. A combination that works in most places is a newline to break the line, plus a tab character to move to the instruction field (written as ‘\n\t’). Some assemblers allow semicolons as a line separator. However, note that some assembler dialects use semicolons to start a comment.

Do not expect a sequence of asm statements to remain perfectly consecutive after compilation, even when you are using the volatile qualifier. If certain instructions need to remain consecutive in the output, put them in a single multi-instruction asm statement.

Accessing data from C programs without using input/output operands (such as by using global symbols directly from the assembler template) may not work as expected. Similarly, calling functions directly from an assembler template requires a detailed understanding of the target assembler and ABI.

Since GCC does not parse the assembler template, it has no visibility of any symbols it references. This may result in GCC discarding those symbols as unreferenced unless they are also listed as input, output, or goto operands.

Special format strings

In addition to the tokens described by the input, output, and goto operands, these tokens have special meanings in the assembler template:

‘%%’ Outputs a single ‘%’ into the assembler code.

‘%=’ Outputs a number that is unique to each instance of the asm statement in the entire compilation. This option is useful when creating local labels and referring to them multiple times in a single template that generates multiple assembler instructions.

‘%{’ ‘%|’ ‘%}’ Outputs ‘{’, ‘|’, and ‘}’ characters (respectively) into the assembler code. When unescaped, these characters have special meaning to indicate multiple assembler dialects, as described below.

Multiple assembler dialects in asm templates

On targets such as x86, GCC supports multiple assembler dialects. The -masm option controls which dialect GCC uses as its default for inline assembler. The target-specific documentation for the -masm option contains the list of supported dialects, as well as the default dialect if the option is not specified. This information may be important to understand, since assembler code that works correctly when compiled using one dialect will likely fail if compiled using another. See x86 Options.

If your code needs to support multiple assembler dialects (for example, if you are writing public headers that need to support a variety of compilation options), use constructs of this form:

{ dialect0 | dialect1 | dialect2… } This construct outputs dialect0 when using dialect #0 to compile the code, dialect1 for dialect #1, etc. If there are fewer alternatives within the braces than the number of dialects the compiler supports, the construct outputs nothing.

For example, if an x86 compiler supports two dialects (‘att’, ‘intel’), an assembler template such as this:
"bt{l %[Offset],%[Base] | %[Base],%[Offset]}; jc %l2"
is equivalent to one of
"btl %[Offset],%[Base] ; jc %l2"   /* att dialect */
"bt %[Base],%[Offset]; jc %l2"     /* intel dialect */
Using that same compiler, this code:
"xchg{l}\t{%%}ebx, %1"
corresponds to either
"xchgl\t%%ebx, %1"                 /* att dialect */
"xchg\tebx, %1"                    /* intel dialect */
There is no support for nesting dialect alternatives.

3.3 Output Operands

https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#OutputOperands

An asm statement has zero or more output operands indicating the names of C variables modified by the assembler code.

In this i386 example, old (referred to in the template string as %0) and *Base (as %1) are outputs and Offset (%2) is an input:
bool old;

__asm__ ("btsl %2,%1\n\t" // Turn on zero-based bit #Offset in Base.
         "sbb %0,%0"      // Use the CF to calculate old.
   : "=r" (old), "+rm" (*Base)
   : "Ir" (Offset)
   : "cc");

return old;
Operands are separated by commas. Each operand has this format:
[ [asmSymbolicName] ] constraint (cvariablename)
asmSymbolicName
Specifies a symbolic name for the operand. Reference the name in the assembler template by enclosing it in square brackets (i.e. ‘%[Value]’). The scope of the name is the asm statement that contains the definition. Any valid C variable name is acceptable, including names already defined in the surrounding code. No two operands within the same asm statement can use the same symbolic name.

When not using an asmSymbolicName, use the (zero-based) position of the operand in the list of operands in the assembler template. For example if there are three output operands, use ‘%0’ in the template to refer to the first, ‘%1’ for the second, and ‘%2’ for the third.

constraint
A string constant specifying constraints on the placement of the operand; See Constraints, for details.

Output constraints must begin with either ‘=’ (a variable overwriting an existing value) or ‘+’ (when reading and writing). When using ‘=’, do not assume the location contains the existing value on entry to the asm, except when the operand is tied to an input; see Input Operands.

After the prefix, there must be one or more additional constraints (see Constraints) that describe where the value resides. Common constraints include ‘r’ for register and ‘m’ for memory. When you list more than one possible location (for example, "=rm"), the compiler chooses the most efficient one based on the current context. If you list as many alternates as the asm statement allows, you permit the optimizers to produce the best possible code. If you must use a specific register, but your Machine Constraints do not provide sufficient control to select the specific register you want, local register variables may provide a solution (see Local Register Variables).

cvariablename
Specifies a C lvalue expression to hold the output, typically a variable name. The enclosing parentheses are a required part of the syntax.

When the compiler selects the registers to use to represent the output operands, it does not use any of the clobbered registers (see Clobbers and Scratch Registers).

Output operand expressions must be lvalues. The compiler cannot check whether the operands have data types that are reasonable for the instruction being executed. For output expressions that are not directly addressable (for example a bit-field), the constraint must allow a register. In that case, GCC uses the register as the output of the asm, and then stores that register into the output.

Operands using the ‘+’ constraint modifier count as two operands (that is, both as input and output) towards the total maximum of 30 operands per asm statement.

Use the ‘&’ constraint modifier (see Modifiers) on all output operands that must not overlap an input. Otherwise, GCC may allocate the output operand in the same register as an unrelated input operand, on the assumption that the assembler code consumes its inputs before producing outputs. This assumption may be false if the assembler code actually consists of more than one instruction.

The same problem can occur if one output parameter (a) allows a register constraint and another output parameter (b) allows a memory constraint. The code generated by GCC to access the memory address in b can contain registers which might be shared by a, and GCC considers those registers to be inputs to the asm. As above, GCC assumes that such input registers are consumed before any outputs are written. This assumption may result in incorrect behavior if the asm writes to a before using b. Combining the ‘&’ modifier with the register constraint on a ensures that modifying a does not affect the address referenced by b. Otherwise, the location of b is undefined if a is modified before using b.

asm supports operand modifiers on operands (for example ‘%k2’ instead of simply ‘%2’). Typically these qualifiers are hardware dependent. The list of supported modifiers for x86 is found at x86 Operand modifiers.

If the C code that follows the asm makes no use of any of the output operands, use volatile for the asm statement to prevent the optimizers from discarding the asm statement as unneeded (see Volatile).

This code makes no use of the optional asmSymbolicName. Therefore it references the first output operand as %0 (were there a second, it would be %1, etc). The number of the first input operand is one greater than that of the last output operand. In this i386 example, that makes Mask referenced as %1:
uint32_t Mask = 1234;
uint32_t Index;

asm ("bsfl %1, %0"
     : "=r" (Index)
     : "r" (Mask)
     : "cc");
That code overwrites the variable Index (‘=’), placing the value in a register (‘r’). Using the generic ‘r’ constraint instead of a constraint for a specific register allows the compiler to pick the register to use, which can result in more efficient code. This may not be possible if an assembler instruction requires a specific register.

The following i386 example uses the asmSymbolicName syntax. It produces the same result as the code above, but some may consider it more readable or more maintainable since reordering index numbers is not necessary when adding or removing operands. The names aIndex and aMask are only used in this example to emphasize which names get used where. It is acceptable to reuse the names Index and Mask.
uint32_t Mask = 1234;
uint32_t Index;

asm ("bsfl %[aMask], %[aIndex]"
     : [aIndex] "=r" (Index)
     : [aMask] "r" (Mask)
     : "cc");
Here are some more examples of output operands.
uint32_t c = 1;
uint32_t d;
uint32_t *e = &c;

asm ("mov %[e], %[d]"
     : [d] "=rm" (d)
     : [e] "rm" (*e));
Here, d may either be in a register or in memory. Since the compiler might already have the current value of the uint32_t location pointed to by e in a register, you can enable it to choose the best location for d by specifying both constraints.

3.4 Input Operands

https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#InputOperands

Input operands make values from C variables and expressions available to the assembly code.

Operands are separated by commas. Each operand has this format:
[ [asmSymbolicName] ] constraint (cexpression)
asmSymbolicName
Specifies a symbolic name for the operand. Reference the name in the assembler template by enclosing it in square brackets (i.e. ‘%[Value]’). The scope of the name is the asm statement that contains the definition. Any valid C variable name is acceptable, including names already defined in the surrounding code. No two operands within the same asm statement can use the same symbolic name.

When not using an asmSymbolicName, use the (zero-based) position of the operand in the list of operands in the assembler template. For example if there are two output operands and three inputs, use ‘%2’ in the template to refer to the first input operand, ‘%3’ for the second, and ‘%4’ for the third.

constraint
A string constant specifying constraints on the placement of the operand; See Constraints, for details.

Input constraint strings may not begin with either ‘=’ or ‘+’. When you list more than one possible location (for example, ‘"irm"’), the compiler chooses the most efficient one based on the current context. If you must use a specific register, but your Machine Constraints do not provide sufficient control to select the specific register you want, local register variables may provide a solution (see Local Register Variables).

Input constraints can also be digits (for example, "0"). This indicates that the specified input must be in the same place as the output constraint at the (zero-based) index in the output constraint list. When using asmSymbolicName syntax for the output operands, you may use these names (enclosed in brackets ‘[]’) instead of digits.

cexpression
This is the C variable or expression being passed to the asm statement as input. The enclosing parentheses are a required part of the syntax.

When the compiler selects the registers to use to represent the input operands, it does not use any of the clobbered registers (see Clobbers and Scratch Registers).

If there are no output operands but there are input operands, place two consecutive colons where the output operands would go:
__asm__ ("some instructions"
         : /* No outputs. */
         : "r" (Offset / 8));
Warning: Do not modify the contents of input-only operands (except for inputs tied to outputs). The compiler assumes that on exit from the asm statement these operands contain the same values as they had before executing the statement. It is not possible to use clobbers to inform the compiler that the values in these inputs are changing. One common work-around is to tie the changing input variable to an output variable that never gets used. Note, however, that if the code that follows the asm statement makes no use of any of the output operands, the GCC optimizers may discard the asm statement as unneeded (see Volatile).

asm supports operand modifiers on operands (for example ‘%k2’ instead of simply ‘%2’). Typically these qualifiers are hardware dependent. The list of supported modifiers for x86 is found at x86 Operand modifiers.

In this example using the fictitious combine instruction, the constraint "0" for input operand 1 says that it must occupy the same location as output operand 0. Only input operands may use numbers in constraints, and they must each refer to an output operand. Only a number (or the symbolic assembler name) in the constraint can guarantee that one operand is in the same place as another. The mere fact that foo is the value of both operands is not enough to guarantee that they are in the same place in the generated assembler code.
asm ("combine %2, %0"
     : "=r" (foo)
     : "0" (foo), "g" (bar));
Here is an example using symbolic names.
asm ("cmoveq %1, %2, %[result]"
     : [result] "=r"(result)
     : "r" (test), "r" (new), "[result]" (old));