Tuesday, January 8, 2019

Multiple TensorFlow Sessions on Separate GPUs Do Not Seem to Speed Up Inference

I am trying to speed up TensorFlow inference by creating multiple Sessions, each loading its own copy of the graph on its own GPU. Running the model on a batch of 10 images on a single GPU takes about 2700 ms. I was hoping to run two batches concurrently, one per GPU, and process 20 images in roughly the same time frame. Instead, the run took about 5300 ms, almost exactly twice the single-batch time, as if the two batches executed serially rather than in parallel. So I did not get the speedup I was hoping for.
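The batch times were measured roughly like this (a simplified sketch, not my exact harness):

#include <chrono>
#include <iostream>

// Simplified wall-clock timing sketch (illustrative only).
// CallSessionRun() is my wrapper that feeds a list of images to Session::Run().
auto t0 = std::chrono::steady_clock::now();
auto detections = CallSessionRun(list0, 0);   // one batch of 10 images
auto t1 = std::chrono::steady_clock::now();
std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count()
          << " ms\n";                          // ~2700 ms on a single GV100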

I am running TensorFlow 1.7 with two Quadro GV100s. My code runs without any error messages. Below is my code:

auto options = SessionOptions();
options.config.mutable_gpu_options()->set_visible_device_list("0,1");

// Create one session per GPU; TF_CHECK_OK aborts with the status message on failure.
TF_CHECK_OK(NewSession(options, &m_session[0]));
TF_CHECK_OK(NewSession(options, &m_session[1]));

GraphDef graph_def0;
TF_CHECK_OK(ReadBinaryProto(Env::Default(), graphPath, &graph_def0));
// Pin every node of this graph to GPU 0 (SetDefaultDevice must run after
// the proto is loaded, or the loaded nodes won't carry the assignment).
graph::SetDefaultDevice("/device:GPU:0", &graph_def0);
TF_CHECK_OK(m_session[0]->Create(graph_def0));

GraphDef graph_def1;
TF_CHECK_OK(ReadBinaryProto(Env::Default(), graphPath, &graph_def1));
graph::SetDefaultDevice("/device:GPU:1", &graph_def1);
TF_CHECK_OK(m_session[1]->Create(graph_def1));

// list0 and list1 are lists of images; CallSessionRun()'s 2nd arg is the
// index into m_session. std::launch::async forces each task onto its own
// thread so the two Session::Run() calls can overlap (the default policy
// may defer the work until get(), which would serialize the batches).
auto fut0 = std::async(std::launch::async, [&]() {
    return CallSessionRun(list0, 0);
});

auto fut1 = std::async(std::launch::async, [&]() {
    return CallSessionRun(list1, 1);
});

auto ans0 = fut0.get();
auto ans1 = fut1.get();

My understanding is that graph::SetDefaultDevice is supposed to pin a graph to a dedicated GPU, and that calling m_session[i]->Run() from std::async should drive both sessions concurrently. But it does not seem to work. Am I missing something?
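One thing I could try to check whether the ops actually land on both GPUs (a debugging sketch, not part of my code above) is TensorFlow's device-placement logging:

// Debugging sketch: with log_device_placement enabled, TensorFlow logs the
// device assigned to each op. If ops from both graphs report /device:GPU:0,
// the SetDefaultDevice() call did not take effect and both sessions are
// contending for the same GPU.
auto options = SessionOptions();
options.config.mutable_gpu_options()->set_visible_device_list("0,1");
options.config.set_log_device_placement(true);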

Thank you very much in advance for your help!
