I am adding a custom op in TensorFlow. The single-threaded Op works fine: it compiles and runs when imported.
(It also runs in pure Python, using the multiprocessing library with a shared Array.)
I am defining a shard function:
#include "work_sharder.h"

// Register the Op, then override the Compute function; inside Compute:
// arg1 and arg2 are const
auto shard = [arg1, arg2, &shared_array](int64 start, int64 limit) {
  // Reading shared_array (an Eigen::Tensor, sometimes passed by value)
  // often outside the [start, limit) safe zone.
  // Writing to shared_array only between indexes start and limit,
  // so there is no write/write concurrency. For instance:
  for (int counter = 1; counter < hard_limit; counter++) {
    if (shared_array(start - counter) == 1) {
      shared_array(start) = 0;
    }
  }
  // Not sure if relevant, but I am also using types from std, like std::pair.
};
const DeviceBase::CpuWorkerThreads& worker_threads =
    *(context->device()->tensorflow_cpu_worker_threads());
const int64 shard_cost = N;
Shard(worker_threads.num_threads, worker_threads.workers,
shared_array.size(), shard_cost, shard);
It compiles perfectly, but when I run this multi-threaded version I get 1592 segfaults.
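For reference, there is a way the shard body above can segfault even without any data race: Eigen's `operator()` does no bounds checking, and for shards where `start < hard_limit` the read at `shared_array(start - counter)` uses a negative index, landing before the buffer. A minimal standalone sketch of a guarded version, using `std::vector` as a stand-in for the Eigen tensor (names hypothetical):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for the shard body. Clamping the look-back so that
// start - counter never goes below 0 keeps every access in range.
void shard_body(std::vector<int>& shared_array,
                std::int64_t start, std::int64_t hard_limit) {
  const std::int64_t max_back = std::min(hard_limit, start + 1);
  for (std::int64_t counter = 1; counter < max_back; ++counter) {
    if (shared_array[start - counter] == 1) {
      shared_array[start] = 0;
    }
  }
}
```

With `start = 0` the loop body never executes instead of reading out of bounds; larger `start` values behave as before.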
Is this expected behavior because I am simultaneously writing to and reading from shared_array? Or should this kind of code handle that gracefully?
If it is expected behavior, what would be the minimal set of glue code to make it work?
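On the glue-code question, one race-free pattern worth sketching is double buffering: each shard reads anywhere in an immutable input buffer and writes only its own [start, limit) slice of a separate output buffer, so reads never see half-written data. A standalone sketch with std::thread (names hypothetical; in a real op, `in` would be the op's input tensor, `out` the allocated output, and `Shard` would replace the hand-rolled thread pool). This assumes each shard only needs the original values; if it depends on values other shards have already written, the computation is sequentially dependent and cannot be sharded this way.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Every shard reads anywhere in `in`, which nobody mutates during the
// run, and writes only its own [start, limit) slice of `out`, so there
// is no read/write race at all.
void shard_range(const std::vector<int>& in, std::vector<int>& out,
                 std::int64_t start, std::int64_t limit,
                 std::int64_t hard_limit) {
  for (std::int64_t i = start; i < limit; ++i) {
    out[i] = in[i];
    for (std::int64_t c = 1; c < hard_limit && i - c >= 0; ++c) {
      if (in[i - c] == 1) { out[i] = 0; break; }
    }
  }
}

std::vector<int> run_sharded(const std::vector<int>& in,
                             std::int64_t hard_limit, int num_threads) {
  std::vector<int> out(in.size());
  const std::int64_t n = static_cast<std::int64_t>(in.size());
  const std::int64_t chunk = (n + num_threads - 1) / num_threads;
  std::vector<std::thread> pool;
  for (int t = 0; t < num_threads; ++t) {
    const std::int64_t start = t * chunk;
    const std::int64_t limit = std::min(n, start + chunk);
    if (start >= limit) break;
    pool.emplace_back([&in, &out, start, limit, hard_limit] {
      shard_range(in, out, start, limit, hard_limit);
    });
  }
  for (std::thread& th : pool) th.join();
  return out;
}
```

The extra output buffer costs one allocation, but it makes each shard embarrassingly parallel, which is exactly the contract Shard expects.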