Friday, 22 September 2017

Why do functions using std::mutex make a null check of the address of pthread_key_create?

Take this simple function that increments an integer under a lock implemented by std::mutex:

#include <mutex>

std::mutex m;

void inc(int& i) {
    std::unique_lock<std::mutex> lock(m);
    i++;
}

I would expect this (after inlining) to compile in a straightforward way to a call to m.lock(), an increment of i, and then a call to m.unlock().

Checking the generated assembly for recent versions of gcc and clang, however, we see an extra complication. Taking the gcc version first:

inc(int&):
  mov eax, OFFSET FLAT:__gthrw___pthread_key_create(unsigned int*, void (*)(void*))
  test rax, rax
  je .L2
  push rbx
  mov rbx, rdi
  mov edi, OFFSET FLAT:m
  call __gthrw_pthread_mutex_lock(pthread_mutex_t*)
  test eax, eax
  jne .L10
  add DWORD PTR [rbx], 1
  mov edi, OFFSET FLAT:m
  pop rbx
  jmp __gthrw_pthread_mutex_unlock(pthread_mutex_t*)
.L2:
  add DWORD PTR [rdi], 1
  ret
.L10:
  mov edi, eax
  call std::__throw_system_error(int)

It's the first three instructions that are interesting. The generated code tests the address of __gthrw___pthread_key_create (which appears to refer to pthread_key_create, the function that creates a thread-specific data key), and if that address is zero, it branches to .L2, which performs the increment in a single instruction without any locking at all.

If the address is non-zero, it proceeds as expected: lock, increment, unlock.
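To make the shape of that check concrete, here is a hand-written sketch of the same pattern using a weak symbol. It assumes GCC or clang on an ELF target such as Linux, where an undefined weak symbol resolves to a null address instead of causing a link error. The names hypothetical_thread_probe, threads_active and inc_guarded are invented for this illustration; this is not the library's actual code, only a guess at the kind of source that could produce assembly like the above.

```cpp
#include <cassert>
#include <mutex>

// Weak declaration (GCC/clang extension). If nothing in the final link
// defines this symbol, its address is null rather than a link error.
// The name is hypothetical and exists only for this sketch.
extern "C" void hypothetical_thread_probe() __attribute__((weak));

// True only when some linked object or library defines the probe symbol.
inline bool threads_active() {
    return hypothetical_thread_probe != nullptr;
}

std::mutex m;

// Increments i, taking the lock only when the probe symbol is present.
// Returns true if the locked path was taken, false for the lock-free path.
bool inc_guarded(int& i) {
    if (!threads_active()) {
        ++i;  // fast path: mirrors the single add at .L2
        return false;
    }
    std::unique_lock<std::mutex> lock(m);
    assert(lock.owns_lock());  // sanity check: the lock was acquired
    ++i;
    return true;  // lock released when `lock` goes out of scope
}
```

Because nothing defines hypothetical_thread_probe in this sketch, a program that links only this code would take the lock-free branch; adding a definition of that symbol to the link would switch it to the locked path without recompiling inc_guarded.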

clang does even more: it checks the address of the function twice, once before the lock and once before the unlock:

inc(int&): # @inc(int&)
  push rbx
  mov rbx, rdi
  mov eax, __pthread_key_create
  test rax, rax
  je .LBB0_4
  mov edi, m
  call pthread_mutex_lock
  test eax, eax
  jne .LBB0_6
  inc dword ptr [rbx]
  mov eax, __pthread_key_create
  test rax, rax
  je .LBB0_5
  mov edi, m
  pop rbx
  jmp pthread_mutex_unlock # TAILCALL
.LBB0_4:
  inc dword ptr [rbx]
.LBB0_5:
  pop rbx
  ret
.LBB0_6:
  mov edi, eax
  call std::__throw_system_error(int)

What's the purpose of this check?

Perhaps it is there to support the case where the object file is ultimately linked into a binary without pthreads support, falling back to a version without locking in that case? I couldn't find any documentation on this behavior.
