vendredi 26 août 2016

std::async performance on Windows and Solaris 10

I'm running a simple threaded test program on both a Windows machine (compiled using MSVS2015) and a server running Solaris 10 (compiled using GCC 4.9.3). On Windows I'm getting significant performance increases from increasing the threads from 1 to the amount of cores available; however, the very same code does not see any performance gains at all on Solaris 10.

The Windows machine has 4 cores (8 logical) and the Unix machine has 8 cores (16 logical).

What could be the cause for this? I'm compiling with -pthread, and it is creating threads since it prints all the "S"es before the first "F". I don't have root access on the Solaris machine, and from what I can see there's no installed tool which I can use to view a process' affinity.

Example code:

#include <iostream>
#include <vector>
#include <future>
#include <random>
#include <chrono>

std::default_random_engine gen(std::chrono::system_clock::now().time_since_epoch().count());
std::normal_distribution<double> randn(0.0, 1.0);

double generate_randn(uint64_t iterations)
{
    // Print "S" when a thread finishes
    std::cout << "S";
    std::cout.flush();

    double rvalue = 0;
    for (int i = 0; i < iterations; i++)
    {
        rvalue += randn(gen);
    }
    // Print "F" when a thread finishes
    std::cout << "F";
    std::cout.flush();

    return rvalue/iterations;
}

int main(int argc, char *argv[])
{
    if (argc < 2)
        return 0;

    uint64_t count = 100000000;
    uint32_t threads = std::atoi(argv[1]);

    double total = 0;

    std::vector<std::future<double>> futures;
    std::chrono::high_resolution_clock::time_point t1;
    std::chrono::high_resolution_clock::time_point t2;

    // Start timing
    t1 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < threads; i++)
    {
        // Start async tasks
        futures.push_back(std::async(std::launch::async, generate_randn, count/threads));
    }
    for (auto &future : futures)
    {
        // Wait for tasks to finish
        future.wait();
        total += future.get();
    }
    // End timing
    t2 = std::chrono::high_resolution_clock::now();

    // Take the average of the threads' results
    total /= threads;

    std::cout << std::endl;
    std::cout << total << std::endl;
    std::cout << "Finished in " << std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count() << " ms" << std::endl;
}

Aucun commentaire:

Enregistrer un commentaire