lundi 25 février 2019

Pybind11 and std::vector -- How to free data using capsules?

I have a C++ function that returns a std::vector and, using Pybind11, I would like to return the contents of that vector as a Numpy array without having to copy the underlying data of the vector into a raw data array.

Current Attempt

In this well-written SO answer the author demonstrates how to ensure that a raw data array created in C++ is appropriately freed when the Numpy array has zero reference count. I tried to write a version of this using std::vector instead:

// aside - I made a templated version of the wrapper with which
// I create specific instances of in the PYBIND11_MODULE definitions:
//
//     m.def("my_func", &wrapper<int>, ...)
//     m.def("my_func", &wrapper<float>, ...)
// 
template <typename T>
py::array_t<T> wrapper(py::array_t<T> input) {
    auto proxy = input.template unchecked<1>();
    std::vector<T> result = compute_something_returns_vector(proxy);

    // give memory cleanup responsibility to the Numpy array
    py::capsule free_when_done(result.data(), [](void *f) {
        auto foo = reinterpret_cast<T  *>(f);
        delete[] foo;
    });

    return py::array_t<T>({result.size()}, // shape
                          {sizeof(T)},     // stride
                          result.data(),   // data pointer
                          free_when_done);
}

Observed Issues

However, if I call this from Python I observe two things: (1) the data in the output array is garbage and (2) when I manually delete the Numpy array I receive the following error (SIGABRT):

python3(91198,0x7fff9f2c73c0) malloc: *** error for object 0x7f8816561550: pointer being freed was not allocated

My guess is that this issue has to do with the line "delete[] foo", which presumably is being called with foo set to result.data(). This is not the way to deallocate a std::vector.

Possible Solutions

One possible solution is to create a T *ptr = new T[result.size()] and copy the contents of result to this raw data array. However, I have cases where the results might be large and I want to avoid taking all of that time to allocate and copy. (But perhaps it's not as long as I think it would be.)

Also, I don't know much about std::allocator but perhaps there is a way to allocate the raw data array needed by the output vector outside the compute_something_returns_vector() function call and then discard the std::vector afterwards, retaining the underlying raw data array?

The final option is to rewrite compute_something_returns_vector.

Aucun commentaire:

Enregistrer un commentaire