Yanyg - Software Engineer

C++ Memory Model

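1 An Example

1.1 Out-of-Order Execution

The code below runs without complaint on x86 but frequently reports an anomaly on ARM: the reader observes b to be 1 or 2 greater than a, even though the writer always increments a before b. For background, see Weak vs. Strong Memory Models: https://preshing.com/20120930/weak-vs-strong-memory-models/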

#include <pthread.h>
#include <stdio.h>
#include <time.h>

long a = 0;
long b = 0;

void* thread_func(void *arg)
{
    time_t tmOld = 0;
    unsigned long cc = 0;
    while (1)
    {
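        // Read b first, then a. The volatile casts make the compiler emit both
        // loads in this order; they place no ordering constraint on the CPU.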
        long rb = *(volatile long*)&b;
        long ra = *(volatile long*)&a;
        if (rb > ra)
        {
            ++cc;
            time_t tmCur = time(NULL);
            if (tmCur - tmOld > 1)
            {
                tmOld = tmCur;
                printf("Caught out-of-order(total=%lu). a=%ld, b=%ld\n",
                       cc, ra, rb);
            }
        }
    }

    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t tid;
    pthread_create(&tid, NULL, thread_func, NULL);

    while (1)
    {
        // Increment a first, then b. These are plain, unsynchronized writes.
        ++a;
        ++b;
    }

    return 0;
}
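
For comparison, here is a minimal sketch of how the same experiment might be written with C++11 atomics so that the anomaly cannot occur. It is an illustration under assumptions, not the original program: the helper name count_violations and the bounded iteration count are hypothetical, and because there is a single writer, a store of the loop counter stands in for ++a and ++b. The writer stores b with release ordering and the reader loads b with acquire ordering, so whenever the reader sees a given value of b, the matching store to a is already visible.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper (not part of the original program): runs the writer for
// `iterations` steps and returns how many times the reader saw b > a.
long count_violations(long iterations)
{
    std::atomic<long> a{0};
    std::atomic<long> b{0};
    std::atomic<bool> done{false};
    long violations = 0;  // written only by the reader, read after join()

    std::thread reader([&] {
        while (!done.load(std::memory_order_acquire))
        {
            // Read b first (acquire), then a.
            long rb = b.load(std::memory_order_acquire);
            long ra = a.load(std::memory_order_relaxed);
            if (rb > ra)
                ++violations;
        }
    });

    for (long i = 1; i <= iterations; ++i)
    {
        // Write a first, then b (release). With a single writer, storing the
        // loop counter is equivalent to the original ++a / ++b.
        a.store(i, std::memory_order_relaxed);
        b.store(i, std::memory_order_release);
    }
    done.store(true, std::memory_order_release);
    reader.join();
    return violations;
}
```

Calling count_violations(100000) should return 0 on both x86 and ARM, because the release/acquire pair on b forbids the reordering that the volatile version permits.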

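2 Concurrency in Programs

  • The compiler optimizes under the sole constraint of preserving single-threaded correctness;
  • The compiler optimizes on the assumption that the code (and the programmer) is correct, for example free of data races;
  • The processor executes instructions out of order;
    • Professional Assembly Language describes the CPU's out-of-order execution engine;
  • CPU caches are not yet synchronized:
    • Store buffer, quoted from Stack Overflow: https://stackoverflow.com/questions/11105827/what-is-a-store-buffer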

      A store buffer is a speculative structure that exists in the CPU, just like the load queue and is for allowing the CPU to speculate on stores. A write combining buffer is part of the memory system and essentially takes a bunch of small writes (think 8 byte writes) and packs them into a single larger transaction (a 64-byte cache line) before sending them to the memory system. These writes are not speculative and are part of the coherence protocol. The goal is to save bus bandwidth. Typically, a write combining buffer is used for uncached writes to I/O devices (often for graphics cards). It's typical in I/O devices to do a bunch of programming of device registers by doing 8 byte writes and the write combining buffer allows those writes to be combined into larger transactions when shipping them out past the cache.

    • L1, L2, L3 Cache;
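
The compiler-related points in the list above can be illustrated with a stop flag. This is a minimal sketch; the helper name spin_until_stopped is hypothetical. If stop were a plain bool, the compiler would be entitled to assume that no other thread modifies it (a data race is undefined behavior) and could hoist the load out of the loop, which may turn it into an infinite loop. Declaring the flag as std::atomic<bool> forces a fresh load on every iteration.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: spins a worker until the main thread asks it to stop,
// and returns how many loop iterations the worker completed.
unsigned long spin_until_stopped()
{
    std::atomic<bool> started{false};
    std::atomic<bool> stop{false};
    unsigned long spins = 0;  // written only by the worker, read after join()

    std::thread worker([&] {
        started.store(true);
        // The atomic load cannot be hoisted out of the loop by the compiler,
        // so the worker is guaranteed to notice the store made by main.
        do
        {
            ++spins;
        } while (!stop.load());
    });

    // Wait until the worker is running, then request termination.
    while (!started.load())
        std::this_thread::yield();
    stop.store(true);
    worker.join();
    return spins;
}
```

The loads and stores here use the default ordering (memory_order_seq_cst); the point of the sketch is only that the flag is atomic, not which ordering is chosen.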

3 Memory Model

3.1 Weak vs. Strong Memory Model

3.2 Memory order in C++

Inter-thread synchronization and memory ordering determine how evaluations and side effects of expressions are ordered between different threads of execution. They are defined in the following terms:

3.2.1 Relaxed - memory_order_relaxed

Relaxed operation: there are no synchronization or ordering constraints imposed on other reads or writes, only this operation's atomicity is guaranteed (see Relaxed ordering below).

In other words, nothing is guaranteed beyond the atomicity of the operation itself.
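
A typical use of relaxed ordering is a counter where only the final total matters. The following is a minimal sketch; the helper name count_with_relaxed and its parameters are hypothetical. Each increment is atomic, so no update is lost, but the increments impose no ordering on any other memory access.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: `threads` workers each add 1 to a shared counter
// `per_thread` times; returns the final counter value.
long count_with_relaxed(int threads, long per_thread)
{
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;

    for (int i = 0; i < threads; ++i)
    {
        pool.emplace_back([&counter, per_thread] {
            for (long n = 0; n < per_thread; ++n)
                counter.fetch_add(1, std::memory_order_relaxed);  // atomic, unordered
        });
    }
    for (auto& t : pool)
        t.join();  // join() makes all increments visible to this thread

    return counter.load(std::memory_order_relaxed);
}
```

count_with_relaxed(4, 10000) returns 40000: atomicity alone is enough for the sum to be exact.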

3.2.2 Acquire - memory_order_acquire

A load operation with this memory order performs the acquire operation on the affected memory location: no reads or writes in the current thread can be reordered before this load. All writes in other threads that release the same atomic variable are visible in the current thread (see Release-Acquire ordering below)

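Used for load operations. It guarantees that no read or write in the current thread is reordered before this load. Everything another thread wrote before its release store to the same atomic variable is visible in the current thread once the load observes that store.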

3.2.3 Release - memory_order_release

A store operation with this memory order performs the release operation: no reads or writes in the current thread can be reordered after this store. All writes in the current thread are visible in other threads that acquire the same atomic variable (see Release-Acquire ordering below) and writes that carry a dependency into the atomic variable become visible in other threads that consume the same atomic (see Release-Consume ordering below).

Used for store operations. It guarantees that no read or write in the current thread is reordered after this store.
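
Acquire and release are normally used as a pair. The sketch below shows the usual message-passing pattern under that assumption; the helper name pass_message and the payload "hello" are hypothetical. The producer writes ordinary, non-atomic data and then performs a release store on a flag; the consumer performs an acquire load on the same flag and only then reads the data.

```cpp
#include <atomic>
#include <string>
#include <thread>

// Hypothetical helper: passes a string from a producer thread to a consumer
// thread through a release store / acquire load on `ready`.
std::string pass_message()
{
    std::string payload;             // plain, non-atomic data
    std::atomic<bool> ready{false};  // the synchronization variable
    std::string received;

    std::thread producer([&] {
        payload = "hello";                             // (1) write the data
        ready.store(true, std::memory_order_release);  // (2) publish; (1) cannot move below this
    });

    std::thread consumer([&] {
        // (3) wait for the flag; the read in (4) cannot move above this load
        while (!ready.load(std::memory_order_acquire))
            std::this_thread::yield();
        received = payload;                            // (4) guaranteed to see "hello"
    });

    producer.join();
    consumer.join();
    return received;
}
```

If both operations on ready were relaxed instead, the read of payload would race with the write and the program would have undefined behavior.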

3.2.4 Acquire-Release - memory_order_acq_rel

A read-modify-write operation with this memory order is both an acquire operation and a release operation. No memory reads or writes in the current thread can be reordered before the load, nor after the store. All writes in other threads that release the same atomic variable are visible before the modification and the modification is visible in other threads that acquire the same atomic variable.

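Used for read-modify-write operations; it provides the acquire and release guarantees at the same time. No read or write in the current thread is reordered before the load, nor after the store.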
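
One common place for acq_rel is a countdown similar to reference counting, where the thread that performs the last decrement must see everything the other threads did. A minimal sketch, with a hypothetical helper name sum_by_last_finisher: every worker writes its own slot and then decrements the counter; the release half publishes its slot, and the acquire half lets the last worker read all slots safely.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: worker i writes i + 1 into its own slot; whichever
// worker finishes last sums all slots. Returns that sum (-1 if no workers).
long sum_by_last_finisher(int workers)
{
    std::vector<long> slots(static_cast<std::size_t>(workers), 0L);
    std::atomic<int> remaining{workers};
    long total = -1;  // written by exactly one worker, read after join()

    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i)
    {
        pool.emplace_back([&, i] {
            slots[static_cast<std::size_t>(i)] = i + 1;  // plain write
            // Release: publishes this worker's slot.
            // Acquire: sees the slots of every worker that decremented earlier.
            if (remaining.fetch_sub(1, std::memory_order_acq_rel) == 1)
            {
                long sum = 0;
                for (long v : slots)
                    sum += v;
                total = sum;
            }
        });
    }
    for (auto& t : pool)
        t.join();
    return total;
}
```

sum_by_last_finisher(4) returns 10 (1 + 2 + 3 + 4), regardless of which worker happens to finish last.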

3.2.5 Sequentially consistent - memory_order_seq_cst

A load operation with this memory order performs an acquire operation, a store performs a release operation, and read-modify-write performs both an acquire operation and a release operation, plus a single total order exists in which all threads observe all modifications in the same order (see Sequentially-consistent ordering below)

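Under seq_cst, a load satisfies the acquire constraint, a store satisfies the release constraint, and a read-modify-write satisfies both. The "total order" part is not yet fully understood here. In short, it requires that all seq_cst operations appear to every thread in one and the same global order.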
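
The effect of the single total order is easiest to see in the classic "store buffering" test, sketched below with a hypothetical helper name count_both_zero. Each thread stores to one variable and then loads the other. Because all four seq_cst operations must fit into one global order, whichever store comes first in that order is seen by the other thread's load, so both loads cannot return 0. With only acquire/release (or relaxed) ordering, that outcome is allowed and does occur on real hardware, x86 included, because a store can sit in the store buffer while the following load executes.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: runs the store-buffering test `trials` times and
// returns how often both threads read 0.
int count_both_zero(int trials)
{
    int both_zero = 0;
    for (int t = 0; t < trials; ++t)
    {
        std::atomic<int> x{0};
        std::atomic<int> y{0};
        int r1 = -1;
        int r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_seq_cst);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });
        t1.join();
        t2.join();

        // Forbidden under seq_cst: at least one thread must see the other's store.
        if (r1 == 0 && r2 == 0)
            ++both_zero;
    }
    return both_zero;
}
```

count_both_zero(200) returns 0 for any number of trials; the guarantee comes from the memory model, not from timing.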

4 References