jeudi 27 septembre 2018

Why is single iteration for map insertion and map.find() much slower than two separate iterations for insert and for map.find()

I find an interesting phenomenon when I try to optimize my solution for the leetcode two sum problem (https://leetcode.com/problems/two-sum/description/). Leetcode description for the two-sum problem is:

Given an array of integers, return indices of the two numbers such that they add up to a specific target. You may assume that each input would have exactly one solution, and you may not use the same element twice.

Initially, I solve this problem by using two loops. First I loop through input array to store array value and array index as pair into a map. Then I loop through input array again to loop up each element and check if it exists in the map. The following is my solution from leetcode:

class Solution {
public:
    vector<int> twoSum(vector<int>& nums, int target) 
    {
        vector<int> res;
        map<int, int> store;

        for(int i = 0; i < nums.size(); ++i)
        {
            store[nums[i]] = i;
        }

        for(int i = 0; i < nums.size(); ++i)
        {
            auto iter = store.find(target - nums[i]);
            if(iter != store.end() && (iter -> second) != i)
            {
                res.push_back(i);
                res.push_back(iter -> second);
                break;
            }
        }
        return res;
    }
};

This solution takes 4ms in leetcode submission. Since I am looping through the same array twice, I was thinking to optimize my code by combining insert operation and map.find() into a single loop. Therefore I can check for a solution while inserting elements. Then I have the following solution:

class Solution {
public:
    vector<int> twoSum(vector<int>& nums, int target) 
    {
        vector<int> res;
        map<int, int> store;

        for(int i = 0; i < nums.size(); ++i)
        {
            auto iter = store.find(target - nums[i]);
            if(iter != store.end() && (iter -> second) != i)
            {
                res.push_back(i);
                res.push_back(iter -> second);
                break;
            }
            store[nums[i]] = i;
        }

        return res;
    }
};

However, the single loop version is much slower than two separate loops, which takes 12ms.

For further research, I made a test case where the input size is 100000001 and solution for this code will be [0, 100000001] (first index and last index). The following is my test code:

#include <iostream>
#include <vector>
#include <algorithm>
#include <map>
#include <iterator>
#include <cstdio>
#include <ctime>

using namespace std;

vector<int> twoSum(vector<int>& nums, int target) 
{
    vector<int> res;
    map<int, int> store;

    for(int i = 0; i < nums.size(); ++i)
    {
        auto iter = store.find(target - nums[i]);
        if(iter != store.end() && (iter -> second) != i)
        {
            res.push_back(i);
            res.push_back(iter -> second);
            break;
        }
        store[nums[i]] = i;
    }

    return res;
}


vector<int> twoSum2(vector<int>& nums, int target) 
{
    vector<int> res;
    map<int, int> store;

    for(int i = 0; i < nums.size(); ++i)
    {
        store[nums[i]] = i;
    }

    for(int i = 0; i < nums.size(); ++i)
    {
        auto iter = store.find(target - nums[i]);
        if(iter != store.end() && (iter -> second) != i)
        {
            res.push_back(i);
            res.push_back(iter -> second);
            break;
        }
    }
    return res;
}

int main()
{

    std::vector<int> test1;
    test1.push_back(4);
    for (int i = 0; i < 100000000; ++i)
    {
        test1.push_back(3);
    }
    test1.push_back(6);

    std::clock_t start;
    double duration;

    start = std::clock();
    auto res1 = twoSum(test1, 10);
    duration = ( std::clock() - start ) / (double) CLOCKS_PER_SEC;
    std::cout<<"single loop: "<< duration <<'\n';   
    cout << "result: " << res1[1] << ", " << res1[0] << endl;

    start = std::clock();
    res1 = twoSum2(test1, 10);
    duration = ( std::clock() - start ) / (double) CLOCKS_PER_SEC;
    std::cout<<"double loops: "<< duration <<'\n';   
    cout << "result: " << res1[0] << ", " << res1[1] << endl;

}

I still get a similar result, which single loop version (7.9s) is slower than double loops version (3.0s): results

I don't really understand why a single loop combined version is slower than a double loops separated version? I think the single loop version should reduce some redundant looping. Is it because of the STL map implementation that it is better to do insertion and map.find() operation separately in two loops, rather than do insertion and map.find() alternately in one loop?

BTW I am working on a MAC OS and using Apple LLVM version 10.0.0 (clang-1000.10.44.2).

Aucun commentaire:

Enregistrer un commentaire