I am trying to speed up Tensorflow inference by creating multiple Sessions, with each Session loading its own graph on its own GPU. When I ran this same model for a batch of 10 images on a single GPU, it took about 2700 ms. I was hoping I can run 2 batches, one per GPU and process 20 images in the same time frame. Instead, the run time actually took about 5300 ms. So it seems like I was not able to get the speed up I was hoping for.
I am running Tensorflow 1.7 with 2 Quadro GV100's. I did not get any error messages running my code. Below is my code:
auto options = SessionOptions();
options.config.mutable_gpu_options()->set_visible_device_list("0,1");
NewSession(options, &m_session[0]);
NewSession(options, &m_session[1]);
GraphDef graph_def0;
graph::SetDefaultDevice("/device:GPU:0", &graph_def0);
ReadBinaryProto(Env::Default(), graphPath, &graph_def0);
m_session[0]->Create(graph_def0);
GraphDef graph_def1;
graph::SetDefaultDevice("/device:GPU:1", &graph_def1);
ReadBinaryProto(Env::Default(), graphPath, &graph_def1);
m_session[1]->Create(graph_def1);
//list0 and list1 are list of images, CallSessionRun()'s 2nd arg is index into m_session
std::future<std::vector<std::vector<tf_detection>>> fut0 = std::async([&]()->std::vector<std::vector<tf_detection>>{
auto detections = CallSessionRun(list0, 0);
return detections;
});
std::future<std::vector<std::vector<tf_detection>>> fut1 = std::async([&]()->std::vector<std::vector<tf_detection>>{
auto detections = CallSessionRun(list1, 1);
return detections;
});
auto ans0 = fut0.get();
auto ans1 = fut1.get();
graph::SetDefaultDevice is supposed to dedicate a GPU for a graph and calling m_session[i]->run() in std::async is supposed to utilize each session concurrently. But it didn't seem to work. Am I missing something?
Thank you very much for your help in advance!
Aucun commentaire:
Enregistrer un commentaire