Say there is a simple element-wise add function
// c[i] = a[i] + b[i] for i in [0, n)
void add(const float * __restrict__ a,
const float * __restrict__ b,
float * __restrict__ c,
int n);
declared in some header test.hpp and implemented in test.cpp. As far as I can tell from my (extremely mediocre) disassembly analysis (naive inspection and diff), if __restrict__ is missing from the signature in the implementation (.cpp), the compiler does not apply it, even though the declaration in the header has it.
Does __restrict__ need to be in both the declaration (header) AND the definition (implementation) signature? If so, why?
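One way to probe this directly is a minimal sketch in which the declaration omits the qualifier and the definition carries it. As far as I understand the GCC documentation on restricted pointers, __restrict__ is an outermost parameter qualifier and is ignored when a declaration is matched to its definition, so both signatures below should name the same function. If that holds, only the definition's signature would influence code generation. The names add_decl_test and RESTRICT_SKETCH are hypothetical and exist only for this example.

```cpp
// Hypothetical macro for this sketch only; it mirrors the RESTRICT idea in test.hpp.
#if defined(__GNUC__) || defined(__clang__)
#define RESTRICT_SKETCH __restrict__
#elif defined(_MSC_VER)
#define RESTRICT_SKETCH __restrict
#else
#define RESTRICT_SKETCH /* does nothing */
#endif

// Declaration as a header might have it: no restrict qualifier at all.
void add_decl_test(const float *a, const float *b, float *c, int n);

// Definition with the qualifier on every pointer. If the qualifier were part
// of the function type, this would declare a second overload and the call
// site would fail to link against the declaration above.
void add_decl_test(const float * RESTRICT_SKETCH a,
                   const float * RESTRICT_SKETCH b,
                   float * RESTRICT_SKETCH c,
                   int n) {
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}
```

Calling add_decl_test on three separate arrays should behave exactly like add; the interesting part is comparing the disassembly of this definition against one without the qualifier.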
I believe the add function is an appropriate test case for this, but I'm not skilled enough to understand what is different in the generated code and why it matters. I am researching that aspect right now in the hope of refining the question. So far I just see fewer instructions, but I need to find where and why they matter, and will update this section.
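To make concrete what the qualifier changes, here is a small sketch of the case the compiler has to allow for when __restrict__ is absent: the output overlapping an input. The function add_may_alias is a hypothetical, unqualified copy of add written only for this illustration, and doubling_chain is a hypothetical helper that sets up the overlap. When c points one element past a, every iteration reads the value the previous iteration stored, so the loop cannot simply be executed several elements at a time. My guess is that this is where the differing instructions come from (for example a runtime overlap check or a scalar fallback), though I have not confirmed that in the disassembly.

```cpp
// Hypothetical unqualified copy of add(): the pointers are allowed to overlap.
void add_may_alias(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// Runs the loop with the output shifted one element past the input,
// so c[i] is the same object as a[i + 1]. buf must hold n + 1 floats.
void doubling_chain(float *buf, int n) {
    add_may_alias(buf, buf, buf + 1, n);
}
```

Starting from {1, 0, 0, 0}, doubling_chain(buf, 3) leaves {1, 2, 4, 8}, because each store feeds the next load. Passing overlapping pointers like this to a __restrict__-qualified version would be undefined behavior, which is the freedom the qualifier hands to the optimizer.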
The code below should be straightforward; it is included so there is something easy to compile. If you set USE_RESTRICT in test.cpp to 0, the disassembled results are different. The switch in the header (TEST_NO_RESTRICT) is there for additional permutations, but should remain 0 for this question, since setting it to 1 makes RESTRICT expand to nothing everywhere.
test.hpp
#pragma once
// set to 1 to make RESTRICT expand to nothing everywhere (leave at 0 for this question)
#define TEST_NO_RESTRICT 0
#if TEST_NO_RESTRICT
#define RESTRICT /* does nothing */
#else
#if defined(__GNUC__) || defined(__clang__)
#define RESTRICT __restrict__
#elif defined(_MSC_VER)
#define RESTRICT __restrict
#else
#define RESTRICT /* does nothing */
#endif
#endif // TEST_NO_RESTRICT
// c[i] = a[i] + b[i] for i in [0, n)
void add(const float * RESTRICT a, const float * RESTRICT b, float * RESTRICT c, int n);
test.cpp
#include "test.hpp"
// set to 0 to undo restrict
#define USE_RESTRICT 1
#if USE_RESTRICT
#define IMPL_RESTRICT RESTRICT
#else
#define IMPL_RESTRICT /* does nothing */
#endif
// c[i] = a[i] + b[i] for i in [0, n)
void add(const float * IMPL_RESTRICT a,
const float * IMPL_RESTRICT b,
float * IMPL_RESTRICT c,
int n) {
for (int i = 0; i < n; ++i)
c[i] = a[i] + b[i];
}
main.cpp
#include "test.hpp"
#include <cstdlib>   // malloc, free
#include <iostream>
#include <string>    // std::string
int main(void) {
int n = 4;
float *a = (float *)malloc(n * sizeof(float));
float *b = (float *)malloc(n * sizeof(float));
float *c = (float *)malloc(n * sizeof(float));
for (int i = 0; i < n; ++i) {
a[i] = i;
b[i] = i;
}
add(a, b, c, n);
auto print = [n](const std::string &desc, const float *arr) {
std::cout << desc << std::endl;
for (int i = 0; i < n; ++i)
std::cout << " " << arr[i];
std::cout << std::endl;
};
print("A: ", a);
print("B: ", b);
print("C: ", c);
free(a);
free(b);
free(c);
return 0;
}
To build with CMake (really we just need -std=c++11 and -O3):
CMakeLists.txt
cmake_minimum_required(VERSION 3.1.3 FATAL_ERROR)
project("restrict_test")
# C++11 required for this project
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
set(CMAKE_CXX_FLAGS "-O3 ${CMAKE_CXX_FLAGS}")
add_executable(restrict-test test.hpp test.cpp main.cpp)