Saturday, August 3, 2019

Why is vector "emplace_back" much slower in multiple threads than in a single thread?

I'm working on a project that needs to put lots of data into a vector. I found that calling "emplace_back" about 800,000 times on a vector inside a multithreaded callback function was much slower (about 4.5 seconds) than the same work in a single thread (about 0.04 s). I'd like to know why, and how to solve this problem.

My CPU has 18 cores (Xeon E5-2699 v3) and 2 × 8 GB of memory. I started 17 threads and built with VS2015, Release, x64. The Concurrency Visualizer reports 85% execution, and "emplace_back" accounts for about 98% of the inclusive samples. I wrote a small demo to test the performance; the code is shown below:

#include <Windows.h>
#include <stdio.h>
#include <process.h>
#include <time.h>
#include <vector>

/** brief: in the thread callback function, 800,000 emplace_back
* operations are performed on a local vector.
*/

unsigned int __stdcall ThreadFun(PVOID pM)
{
    double stop, start, durationTime;
    int x = 0;
    std::vector<int> indices_v;
    indices_v.reserve(10000000);
    //========= emplace_back  test==============
    start = clock();


    for (; x < 800000; ++x)
    {
        indices_v.emplace_back(7788);
    }
    stop = clock();
    durationTime = ((double)(stop - start)) / CLOCKS_PER_SEC;  // CLK_TCK is an obsolete alias

    printf("Thread ID %4lu ,time: %f\n",  // GetCurrentThreadId() returns a DWORD (unsigned long)
        GetCurrentThreadId(),durationTime);

    return 0;
}
/*
* Same task as ThreadFun(), but without reserve(10000000);
* still faster than the multithreaded version.
*/
void SingleThread()
{
    double stop, start, durationTime;
    int x = 0;
    std::vector<int> indices_v;
    //=========emplace_back  test==============
    start = clock();
    for (; x < 800000; ++x)
    {
        indices_v.emplace_back(7788);
    }
    stop = clock();
    durationTime = ((double)(stop - start)) / CLOCKS_PER_SEC;
    printf("Single Thread  time: %f\n", durationTime);
}

int main()
{
    const int ThreadNum = 17;
    // do 800,000 emplace_back calls on a single thread
    SingleThread();
    printf("\n");
    //===========MultiThreading======================
    HANDLE handle[ThreadNum];

    for (int i = 0; i < ThreadNum; i++)
    {
        handle[i] = (HANDLE)_beginthreadex(NULL, 0, ThreadFun, NULL, 0, NULL);
    }
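    WaitForMultipleObjects(ThreadNum, handle, TRUE, INFINITE);
    for (int i = 0; i < ThreadNum; i++)
    {
        CloseHandle(handle[i]);  // release the handles returned by _beginthreadex
    }
    Sleep(5000);
    return 0;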
}


Output:
Single Thread  time: 0.046000

Thread ID 28580 ,time: 0.050000
Thread ID 25132 ,time: 1.384000
Thread ID 15428 ,time: 3.059000
Thread ID 15964 ,time: 3.556000
Thread ID 17620 ,time: 3.849000
Thread ID 9056 ,time: 3.965000
Thread ID 18300 ,time: 4.191000
Thread ID 13328 ,time: 4.182000
Thread ID 24972 ,time: 4.184000
Thread ID 13352 ,time: 4.174000
Thread ID 29316 ,time: 4.293000
Thread ID 3056 ,time: 4.278000
Thread ID 25016 ,time: 4.111000
Thread ID 13976 ,time: 4.195000
Thread ID 652 ,time: 4.259000
Thread ID 22104 ,time: 4.174000
Thread ID 13772 ,time: 4.148000
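On MSVC, clock() returns elapsed wall-clock time since the process started rather than CPU time, so each per-thread figure above also includes any time that thread spent waiting. A portable way to repeat the measurement is the sketch below, which uses std::thread and std::chrono::steady_clock in place of _beginthreadex and clock(). It also lets you test one hypothesis, which is an assumption and not something the output above establishes: reserve() only sets aside capacity, and the operating system may not back those pages with physical memory until they are first written, so the timed loop in ThreadFun may be paying first-touch page-fault costs. The helper name run_threads and its prefault flag are hypothetical.

```cpp
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not from the question): starts thread_count threads, each
// filling its own local vector the way ThreadFun does, and returns each thread's
// wall-clock time for the emplace_back loop alone.
// If prefault is true, each thread writes the first `items` slots once (resize)
// and clears them before the timer starts, so the timed loop reuses memory
// that has already been touched.
std::vector<double> run_threads(int thread_count, std::size_t items,
                                std::size_t reserve_count, bool prefault)
{
    const std::size_t n = thread_count > 0 ? static_cast<std::size_t>(thread_count) : 0;
    std::vector<double> seconds(n, 0.0);
    std::vector<std::thread> workers;
    workers.reserve(n);

    for (std::size_t t = 0; t < n; ++t)
    {
        workers.emplace_back([&seconds, t, items, reserve_count, prefault]
        {
            std::vector<int> local;          // per-thread vector, nothing shared
            local.reserve(reserve_count);    // mirrors indices_v.reserve(10000000)
            if (prefault)
            {
                local.resize(items);         // value-initializes: writes every slot once
                local.clear();               // size back to 0, capacity kept
            }

            const auto start = std::chrono::steady_clock::now();
            for (std::size_t x = 0; x < items; ++x)
                local.emplace_back(7788);
            const auto stop = std::chrono::steady_clock::now();

            // Each thread writes only its own slot, so no lock is needed here.
            seconds[t] = std::chrono::duration<double>(stop - start).count();
        });
    }
    for (auto& w : workers)
        w.join();
    return seconds;
}
```

For example, comparing run_threads(17, 800000, 10000000, false) with run_threads(17, 800000, 10000000, true) would show whether the growing per-thread times persist once first-touch costs are moved outside the timed region. With 17 threads each reserving 10,000,000 ints (about 40 MB each), the run requests roughly 680 MB of capacity in total.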

I expected the time consumed by "emplace_back" in each thread to be similar to the single-threaded time, but it takes much longer. Why does this happen, and how can I fix it? Any help is appreciated.
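On the "how to solve it" side, the output above does not identify the cause, so what follows is only a commonly used pattern, not a confirmed fix for this slowdown. If the data ultimately ends up in one large vector, it can be allocated once at its final size, after which each thread writes a disjoint slice with plain stores instead of calling emplace_back. The function name parallel_fill is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: one allocation, then each thread fills its own slice.
std::vector<int> parallel_fill(std::size_t items, int thread_count, int value)
{
    std::vector<int> data(items);   // single allocation, zero-initialized here
    const std::size_t n = thread_count > 0 ? static_cast<std::size_t>(thread_count) : 1;
    const std::size_t chunk = (items + n - 1) / n;   // ceiling division

    std::vector<std::thread> workers;
    workers.reserve(n);
    for (std::size_t t = 0; t < n; ++t)
    {
        const std::size_t begin = std::min(items, t * chunk);
        const std::size_t end = std::min(items, begin + chunk);
        workers.emplace_back([&data, begin, end, value]
        {
            // Slices never overlap, so the threads need no synchronization.
            std::fill(data.begin() + static_cast<std::ptrdiff_t>(begin),
                      data.begin() + static_cast<std::ptrdiff_t>(end), value);
        });
    }
    for (auto& w : workers)
        w.join();
    return data;
}
```

Note that std::vector<int> data(items) zero-initializes every element on the calling thread, so the pages are touched there once, before the worker threads start; the workers then only overwrite memory that is already in use.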
