GCC Usage Notes

1 Attributes

1.1 Alias

The GCC manual describes this attribute under Common Function Attributes: it declares a symbol as an alias for another symbol. For example:

#include <stdio.h>

void __f()
{
        printf("I am __f()\n");
}

void f() __attribute__ ((alias ("__f")));

int main(int argc, char *argv[])
{
        f();
        return 0;
}

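This defines the function __f() and declares f() as an alias for it. A related attribute is weak: combined with alias it defines a weak alias, a symbol that a strong definition of the same name can replace. It is mainly used for library functions that user code may override. A weak symbol and a strong symbol of the same name cannot both be defined in one source file; that is reported as a redefinition error.

Example: save the code below as a.c and b.c and compile them together. Because a.c contains a strong definition of f(), the linker prefers it over the weak alias in b.c, and the program prints "another function definition f()".

~$ gcc -o test-weak a.c b.c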

/* save as a.c */
#include <stdio.h>

extern void (f)();

int main(int argc, char *argv[])
{
        f();
        return 0;
}

void f()
{
        printf("another function definition f()\n");
}

/* save the following as b.c */
#include <stdio.h>

void __f()
{
        printf("I am __f()\n");
}

void f() __attribute__ ((weak, alias ("__f")));

1.2 __builtin_expect

1.2.1 Usage

The likely()/unlikely() macros common in the Linux kernel are built on it:

# define likely(x)      __builtin_expect(!!(x), 1)
# define unlikely(x)    __builtin_expect(!!(x), 0)

glibc has similar definitions:

#if __GNUC__ >= 3
# define __glibc_unlikely(cond) __builtin_expect ((cond), 0)
# define __glibc_likely(cond)   __builtin_expect ((cond), 1)
#else
# define __glibc_unlikely(cond) (cond)
# define __glibc_likely(cond)   (cond)
#endif

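1.2.2 Notes

__builtin_expect gives the compiler a branch-prediction hint so that it can generate more efficient code. if (__builtin_expect(expr, 0)) says the condition is unlikely to hold, and if (__builtin_expect(expr, 1)) says it is very likely to hold. Use the hint only when the outcome really is highly predictable; otherwise it can hurt performance. -fprofile-arcs can be used to collect run-time profile data to support the analysis.

A simple test of how __builtin_expect changes the generated assembly: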

#include <stdio.h>

int func_expect_0(int i)
{
        int result;

        if (__builtin_expect(i > 2, 0)) {
                result = 100;
        } else {
                result = 200;
        }

        return result;
}

int func_expect_1(int i)
{
        int result;

        if (__builtin_expect(i > 4, 1)) {
                result = 100;
        } else {
                result = 200;
        }

        return result;
}

int main(int argc, char **argv)
{

        int ret1, ret2;

        ret1 = func_expect_0(argc);
        ret2 = func_expect_1(argc);

        printf("ret1=%d, ret2=%d\n", ret1, ret2);

        return 0;
}

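With plain -O the compiler on my machine eliminated the difference, so compile with gcc -Og -S builtin_expect.c to observe it. The hint changes the branch layout at the jump: in each function the expected path falls through, and the unexpected one is placed after the ret (.L4 and .L7):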

        .globl  func_expect_0
        .def    func_expect_0;  .scl    2;      .type   32;     .endef
        .seh_proc       func_expect_0
func_expect_0:
        .seh_endprologue
        cmpl    $2, %ecx
        jg      .L4
        movl    $200, %eax
.L1:
        ret
.L4:
        movl    $100, %eax
        jmp     .L1
        .seh_endproc
        .globl  func_expect_1
        .def    func_expect_1;  .scl    2;      .type   32;     .endef
        .seh_proc       func_expect_1
func_expect_1:
        .seh_endprologue
        cmpl    $4, %ecx
        jle     .L7
        movl    $100, %eax
.L5:
        ret
.L7:
        movl    $200, %eax
        jmp     .L5
        .seh_endproc

Building an executable directly and disassembling it with objdump gives:

~$ gcc -Og -o builtin_expect builtin_expect.c
~$ objdump -d builtin_expect
...
00000000000006b0 <func_expect_0>:
 6b0:   83 ff 02                cmp    $0x2,%edi
 6b3:   7f 06                   jg     6bb <func_expect_0+0xb>
 6b5:   b8 c8 00 00 00          mov    $0xc8,%eax
 6ba:   c3                      retq
 6bb:   b8 64 00 00 00          mov    $0x64,%eax
 6c0:   c3                      retq

00000000000006c1 <func_expect_1>:
 6c1:   83 ff 04                cmp    $0x4,%edi
 6c4:   7e 06                   jle    6cc <func_expect_1+0xb>
 6c6:   b8 64 00 00 00          mov    $0x64,%eax
 6cb:   c3                      retq
 6cc:   b8 c8 00 00 00          mov    $0xc8,%eax
 6d1:   c3                      retq
...

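1.2.3 Performance comparison

The article "How much do __builtin_expect(), likely(), and unlikely() improve performance?" describes how much __builtin_expect changes performance in different scenarios, but I was unable to reproduce the difference. On Linux, bash provides its own built-in time, so the tests use the absolute path /usr/bin/time (install it with sudo apt-get install time). The test code is builtin_expect_test.c (not reproduced here); the runs are as follows: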

~$ cc -DDONT_EXPECT -O3 builtin_expect_test.c -o bn
~$ /usr/bin/time -f "%E real, %U user, %S sys" ./bn 1000
0, 1000000000
0:01.43 real, 1.42 user, 0.00 sys

~$ cc -DEXPECT_RESULT=0 -O3 builtin_expect_test.c -o b0
~$ /usr/bin/time -f "%E real, %U user, %S sys" ./b0 1000
0, 1000000000
0:01.41 real, 1.41 user, 0.00 sys

~$ cc -DEXPECT_RESULT=1 -O3 builtin_expect_test.c -o b1
~$ /usr/bin/time -f "%E real, %U user, %S sys" ./b1 1000
0, 1000000000
0:01.40 real, 1.40 user, 0.00 sys

1.2.4 onlinedocs

Built-in Function: long __builtin_expect (long exp, long c)
You may use __builtin_expect to provide the compiler with branch prediction information. In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform. However, there are applications in which this data is hard to collect.

The return value is the value of exp, which should be an integral expression. The semantics of the built-in are that it is expected that exp == c. For example:

if (__builtin_expect (x, 0))
  foo ();

indicates that we do not expect to call foo, since we expect x to be zero. Since you are limited to integral expressions for exp, you should use constructions such as

if (__builtin_expect (ptr != NULL, 1))
  foo (*ptr);

when testing pointer or floating-point values.

1.2.5 manpages

-fno-guess-branch-probability
Do not guess branch probabilities using heuristics.

GCC uses heuristics to guess branch probabilities if they are not provided by profiling feedback (-fprofile-arcs). These heuristics are based on the control flow graph. If some branch probabilities are specified by "__builtin_expect", then the heuristics are used to guess branch probabilities for the rest of the control flow graph, taking the "__builtin_expect" info into account. The interactions between the heuristics and "__builtin_expect" can be complex, and in some cases, it may be useful to disable the heuristics so that the effects of "__builtin_expect" are easier to understand.

The default is -fguess-branch-probability at levels -O, -O2, -O3, -Os.

builtin-expect-probability
Control the probability of the expression having the specified value. This parameter takes a percentage (i.e. 0 … 100) as input. The default probability of 90 is obtained empirically.

1.3 __builtin_constant_p

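The GCC built-in __builtin_constant_p(expr) tests whether expr is a compile-time constant: it returns 1 if it is and 0 otherwise. Code can use the result to decide whether to apply constant-expression optimizations.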

Built-in Function: int __builtin_constant_p (exp)
You can use the built-in function __builtin_constant_p to determine if a value is known to be constant at compile time and hence that GCC can perform constant-folding on expressions involving that value. The argument of the function is the value to test. The function returns the integer 1 if the argument is known to be a compile-time constant and 0 if it is not known to be a compile-time constant. A return of 0 does not indicate that the value is not a constant, but merely that GCC cannot prove it is a constant with the specified value of the -O option.

You typically use this function in an embedded application where memory is a critical resource. If you have some complex calculation, you may want it to be folded if it involves constants, but need to call a function if it does not. For example:

#define Scale_Value(X)                          \
        (__builtin_constant_p (X)               \
         ? ((X) * SCALE + OFFSET) : Scale (X))

You may use this built-in function in either a macro or an inline function. However, if you use it in an inlined function and pass an argument of the function as the argument to the built-in, GCC never returns 1 when you call the inline function with a string constant or compound literal (see Compound Literals) and does not return 1 when you pass a constant numeric value to the inline function unless you specify the -O option.

You may also use __builtin_constant_p in initializers for static data. For instance, you can write

static const int table[] = {
        __builtin_constant_p (EXPRESSION) ? (EXPRESSION) : -1,
        /* ... */
};

This is an acceptable initializer even if EXPRESSION is not a constant expression, including the case where __builtin_constant_p returns 1 because EXPRESSION can be folded to a constant but EXPRESSION contains operands that are not otherwise permitted in a static initializer (for example, 0 && foo ()). GCC must be more conservative about evaluating the built-in in this case, because it has no opportunity to perform optimization.

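An example follows. In C, val is a const-qualified variable rather than a constant expression, so the result depends on the optimization level: GCC typically prints iscc=0 at -O0 and iscc=1 with -O or higher.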

#include <stdio.h>

const int val = 10;

int main(int argc, char *argv[])
{
        int iscc = 0;

        if (__builtin_constant_p(val))
                iscc = 1;
        else
                iscc = 0;

        printf("iscc=%d\n", iscc);

        return 0;
}

The Linux kernel uses this built-in extensively:

/* Copy from arch/x86/boot/bitops.h */
#define test_bit(nr,addr)                       \
        (__builtin_constant_p(nr) ?             \
         constant_test_bit((nr),(addr)) :       \
         variable_test_bit((nr),(addr)))

1.4 weak

2 Debug Skills

2.1 -Wl,-wrap

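2.1.1 Usage: wrap system or custom library functions

  • For a function func, define __wrap_func and declare __real_func as extern, then link with -Wl,--wrap=func. The linker resolves calls to func to __wrap_func, and calls to __real_func to the original func. For the example below that means -Wl,--wrap=malloc -Wl,--wrap=free.

2.1.2 Example: wrapping the library malloc and free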

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

unsigned long mcount = 0;

extern void * __real_malloc(size_t size);
extern void  __real_free(void *ptr);

void *__wrap_malloc(size_t size)
{
        void *ptr = __real_malloc(size);

        if (ptr)
                ++mcount;

        printf("wrap malloc ptr=%p, size=%zu, count=%lu\n", ptr, size, mcount);

        return ptr;
}

void __wrap_free(void *ptr)
{
        __real_free(ptr);
        if (ptr)
                --mcount;
        printf("wrap free   ptr=%p, count=%lu\n", ptr, mcount);
}

int main(int argc, char *argv[])
{
        int *pi;

        pi = malloc(sizeof(*pi) * 10);
        if (pi == NULL)                 /* malloc can fail */
                return 1;
        memset(pi, 0, sizeof(*pi) * 10);
        free(pi);

        return 0;
}

3 References