Yanyg - Software Engineer

Memory Model

目录

编译器基于单线程正确性原则, 执行代码优化, 可能重排内存访问顺序. 而实际的程序很可能是多线程/多进程的.

为解决CPU 与内存之间的性能差异带来的性能问题, 引入了 CPU Storebuffer, Cache, 预取, 乱序执行引擎.

保证偏序, 不是全序.

部分代码的内存访问顺序是重要的. 例如 Producer-Consumer 多线程同步, 硬件条件检测与寄存器访问. 这要求部分代码需要严格控制顺序.

基于上述内存访问特征, 引入内存模型(memory-model), 解决 Data-Race 问题.

1. Acquire/Release

  • Acquire
    • Acquire 之后的代码, 不会乱序到 Acquire 之前;
    • Acquire 之前的代码, 可以乱序到 Acquire 之后;
    • 比 mfence 范围小;
  • Release
    • Release 之前的代码, 不会乱序到 Release 之后;
    • Release 之后的代码, 可能会乱序到 Release 之前;
    • 比 mfence 范围小;
  • Acquire/Release 在多线程上表现为偏序
    • 典型地, 不是全序;
    • SC(sequentially Consistency) 保证全序

2. Concepts

2.1. Atomic

All-or-nothing.

2.2. Non-Blocking-Algorithm

  • 任何线程异常退出, 不会阻塞系统整体推进;
  • Lock-Free: 系统层级, 总是有进展; 例如, CAS;
  • Wait-Free: 线程层级, 总是有进展; 例如,
  • 任何

2.3. Sequential Consistency (SC)

  • 1979, Lamport

2.4. SC-DRF (Sequential Consistency Data Race Free programs)

2.5. Race Condition

  • A memory Location (variable) can be simultaneously by two threads;
  • simultaneously == Not happened before;

3. Strong Memory Model and

4. Acquire/Release

5. Compiler Optimization

5.1. Example

// From:
x = 1;
y = "hello";
x = 2;

// to:
y = "hello";
x = 2;
// From:
for (int i = 0; i < M; ++i)
  {
    z += a[i];
  }

// To:
register long r = z;
for (int i = 0; i < M; ++i)
  {
    r += a[i];
  }
z = r;

// From:
x = 100;
y = 200;
z = 300;

// To:
z = 300;
x = 200;
y = 100;

5.2. Rules

Single-Thread.

6. Reference