Yanyg - Software Engineer

Compiler Undefined Behavior and Program Correctness

1 A Few Examples

1.1 Arithmetic Overflow

#include <stdio.h>
#include <stdint.h>

void checkAdd(int a)
{
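    if (a + 100 > a)    /* signed overflow (undefined behavior) when a > INT32_MAX - 100 */
    {
        printf("%d > %d\n", a + 100, a);
    }
    else
    {
        printf("%d > %d\n", a + 100, a);    /* same format string as above, so this branch also prints ">" */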
    }
}

int main(int argc, char *argv[])
{
    checkAdd(100);
    checkAdd(INT32_MAX);
    return 0;
}

[yanyg@x1{192.168.0.106} ~/test ]
$ ./a.out
200 > 100
-2147483549 > 2147483647

Passing extra flags to the compiler makes it possible to catch this kind of problem at run time:

[yanyg@x1{192.168.0.106} ~/test ]
$ gcc -Wall overflow.c -lubsan -fsanitize=undefined
[yanyg@x1{192.168.0.106} ~/test ]
$ ./a.out
200 > 100
overflow.c:6:17: runtime error: signed integer overflow: 2147483647 + 99 cannot be represented in type 'int'
overflow.c:12:9: runtime error: signed integer overflow: 2147483647 + 100 cannot be represented in type 'int'
-2147483549 > 2147483647
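
Detecting the overflow at run time is useful for debugging, but the check can also be written so that the overflow never happens. The sketch below is a minimal illustration and not part of the original program: `safeAdd100` is a hypothetical helper that compares against the limit before adding, so the addition runs only when the result fits in an `int`.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not from the original program): stores a + 100 in
 * *out and returns true, or returns false when the sum would not fit in
 * an int. The range check runs before the addition, so no signed overflow
 * can occur. */
bool safeAdd100(int a, int *out)
{
    assert(out != NULL);
    if (a > INT_MAX - 100)
    {
        return false;   /* a + 100 would exceed INT_MAX */
    }
    *out = a + 100;
    return true;
}
```

GCC and Clang also provide `__builtin_add_overflow`, which performs the same kind of checked addition without a hand-written range test.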

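1.2 Out-of-Bounds Array Access

A colleague once ran into this bug in production: the code used MIN to make sure a length could never exceed the buffer size, yet it exceeded it anyway.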

#include <stdio.h>
#include <string.h>

typedef struct Bar
{
    char name[16];
    int mode;
    int zero;
} Bar;

#define MIN(a, b)       ((a) < (b) ? (a) : (b))

int main(int argc, char *argv[])
{
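    Bar bar;
    /* Fill the whole struct with 0xff so that name has no terminating '\0'. */
    memset(&bar, 0xff, sizeof(bar));
    bar.zero = 0;
    /* Undefined behavior: strlen reads past name[15], through mode, and stops in zero. */
    size_t len = strlen(bar.name);
    size_t min = MIN(16, len);
    printf("len=%zu, min=%zu\n", len, min);

    bar.name[14] = '\0';    /* name is now a valid string of length 14 */
    len = strlen(bar.name);
    min = MIN(16, len);
    printf("len=%zu, min=%zu\n", len, min);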
    return 0;
}

[yanyg@x1{192.168.0.106} ~/test ]
$ gcc -Wall array-overflow.c -O2
[yanyg@x1{192.168.0.106} ~/test ]
$ ./a.out
len=20, min=20
len=14, min=14

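The compiler assumes the programmer never makes mistakes. Reading past the end of bar.name is undefined behavior, so a valid strlen(bar.name) can be at most 15 and never larger than sizeof(bar.name); the compiler therefore drops the MIN and emits min = len directly. The generated assembly shows this: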

// -O0
movl    $16, %eax
cmpq    $16, -8(%rbp)
cmovbe  -8(%rbp), %rax
movq    %rax, -16(%rbp)

// -O2
sbbq    $3, %rsi
xorl    %eax, %eax
subq    %rbx, %rsi
movq    %rsi, %rdx // min = MIN(16, len);
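
A bounded scan sidesteps the problem because it never reads past the array, which leaves the optimizer no undefined behavior to reason from. The following is a minimal sketch under that assumption, not the fix that was actually applied in production; `boundedLen` is a hypothetical helper built on the standard `memchr`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the string stored in buf, capped at cap.
 * memchr inspects at most cap bytes, so an unterminated buffer yields cap
 * instead of a read past the end of the array. */
size_t boundedLen(const char *buf, size_t cap)
{
    assert(buf != NULL);
    const char *end = memchr(buf, '\0', cap);
    return end != NULL ? (size_t)(end - buf) : cap;
}
```

With the Bar struct from the example, boundedLen(bar.name, sizeof(bar.name)) returns 16 for the unterminated buffer and 14 once bar.name[14] is set to '\0'.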

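2 Undefined Behavior in Compilers

2.1 Wikipedia's Description

The result of undefined behavior (UB) is unpredictable: anything is allowed to happen.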

In computer programming, undefined behavior (UB) is the result of executing a program whose behavior is prescribed to be unpredictable, in the language specification to which the computer code adheres. This is different from unspecified behavior, for which the language specification does not prescribe a result, and implementation-defined behavior that defers to the documentation of another component of the platform (such as the ABI or the translator documentation).

In the C community, undefined behavior may be humorously referred to as "nasal demons", after a comp.std.c post that explained undefined behavior as allowing the compiler to do anything it chooses, even "to make demons fly out of your nose".

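2.2 Assumptions Behind Compiler Optimizations

  • The compiler optimizes on the premise that single-threaded correctness is preserved;
  • The compiler optimizes on the assumption that the code (the programmer) is correct;
  • The compiler assumes that overflow (signed arithmetic overflow, out-of-bounds access) never happens;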
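
For arithmetic, the no-overflow assumption covers signed types only. Unsigned arithmetic is defined to wrap around, so the compiler has to keep a wraparound test written on unsigned operands. The sketch below illustrates the contrast with the signed check in section 1.1; `addWraps` is a hypothetical function name introduced here.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: reports whether a + 100 wraps around for unsigned a.
 * Unsigned overflow is well defined (modulo UINT_MAX + 1), so the compiler
 * cannot discard this comparison the way it may discard a signed
 * `a + 100 > a`. */
bool addWraps(unsigned int a)
{
    return a + 100u < a;
}
```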
