Tuesday, June 2, 2015

Guaranteed Detection of Temporary->Named Points

Suppose you write a matrix class with some operations:

class matrix
{
public:
    double operator()(size_t i, size_t j) const;
    ...
};

matrix operator*(const matrix &lhs, const matrix &rhs);
...

It makes sense to defer the evaluation of some matrix expressions: m0 * m1 * m2 * m3 * m4 (which is a series of four operator* calls) can benefit from the dynamic-programming matrix chain multiplication algorithm; the very common m0 * m1^T maps to a single, very efficient dgemm call; and so forth.

It consequently pays to defer the actual computation until it's needed. This changes the above to:

class matrix
{
private:
    /* Pointer to an abstract base class - either an actual matrix
     * or an expression tree. */
    std::shared_ptr<matrix_imp> m_imp;

public:
    // Forces compaction.
    double operator()(size_t i, size_t j) const;
    ...
};

/* Lazy; creates a matrix with an expression tree using the
 * internals of lhs and rhs. */
matrix operator*(const matrix &lhs, const matrix &rhs);
...

Each matrix holds a pointer to a base class object that can range from a real matrix to a complicated expression tree. Each operation tries to form a matrix using the laziest change to the internal implementation. Some operations have no choice but to actually evaluate things, compact the expression tree, and set the internal implementation to an actual matrix.
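
To make this concrete, here is a rough sketch of the kind of plumbing this implies. Only matrix_imp and m_imp appear above; dense_imp, product_imp, the virtuals (rows(), cols(), is_compact(), compact(), at()), and the constructors are illustrative assumptions rather than the actual implementation:

#include <cstddef>
#include <memory>
#include <vector>

struct matrix_imp
{
    virtual ~matrix_imp() = default;
    virtual std::size_t rows() const = 0;
    virtual std::size_t cols() const = 0;
    virtual bool is_compact() const = 0;                       // actual matrix vs. expression tree
    virtual std::shared_ptr<matrix_imp> compact() const = 0;   // evaluate to an actual matrix
    virtual double at(std::size_t i, std::size_t j) const = 0;
};

// An actual, materialized matrix.
struct dense_imp : matrix_imp
{
    std::size_t nr, nc;
    std::vector<double> data;

    dense_imp(std::size_t r, std::size_t c) : nr(r), nc(c), data(r * c) {}
    std::size_t rows() const override { return nr; }
    std::size_t cols() const override { return nc; }
    bool is_compact() const override { return true; }
    std::shared_ptr<matrix_imp> compact() const override { return std::make_shared<dense_imp>(*this); }
    double at(std::size_t i, std::size_t j) const override { return data[i * nc + j]; }
};

// An expression-tree node: it only keeps the operands' internals alive.
struct product_imp : matrix_imp
{
    std::shared_ptr<matrix_imp> lhs, rhs;

    product_imp(std::shared_ptr<matrix_imp> l, std::shared_ptr<matrix_imp> r)
        : lhs(std::move(l)), rhs(std::move(r)) {}
    std::size_t rows() const override { return lhs->rows(); }
    std::size_t cols() const override { return rhs->cols(); }
    bool is_compact() const override { return false; }

    // Chain ordering, dgemm dispatch, etc. would go here; a naive triple
    // loop stands in for them.
    std::shared_ptr<matrix_imp> compact() const override
    {
        auto l = lhs->compact(), r = rhs->compact();
        auto out = std::make_shared<dense_imp>(l->rows(), r->cols());
        for (std::size_t i = 0; i < out->nr; ++i)
            for (std::size_t j = 0; j < out->nc; ++j)
                for (std::size_t k = 0; k < l->cols(); ++k)
                    out->data[i * out->nc + j] += l->at(i, k) * r->at(k, j);
        return out;
    }
    double at(std::size_t i, std::size_t j) const override { return compact()->at(i, j); }
};

class matrix
{
public:
    explicit matrix(std::size_t rows, std::size_t cols)
        : m_imp(std::make_shared<dense_imp>(rows, cols)) {}

    // Forces compaction: the expression tree is replaced by an actual matrix.
    double operator()(std::size_t i, std::size_t j) const
    {
        if (!m_imp->is_compact())
            m_imp = m_imp->compact();
        return m_imp->at(i, j);
    }

    friend matrix operator*(const matrix &lhs, const matrix &rhs);

private:
    explicit matrix(std::shared_ptr<matrix_imp> imp) : m_imp(std::move(imp)) {}
    mutable std::shared_ptr<matrix_imp> m_imp;   // actual matrix or expression tree
};

// Lazy: no arithmetic, just a new tree node referencing the operands' internals.
matrix operator*(const matrix &lhs, const matrix &rhs)
{
    return matrix(std::make_shared<product_imp>(lhs.m_imp, rhs.m_imp));
}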


The problem was that, in practice, this caused huge memory overhead in very common cases. Let's say you read from a file a long-and-narrow matrix x of size p x q with p >> q, store x^T x in a variable, and discard x. x^T x is only q x q, but with lazy evaluation the stored expression tree keeps x's internals alive, so the retained memory is on the order of p*q rather than q*q. Load these in a loop, and it's a serious problem. (Of course, compaction can be forced after each load by the client code calling operator(), but requiring this without algorithmic justification is ugly and error prone.)
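
Concretely, the problematic pattern looks something like this; load_tall_matrix, file_names and gram_matrices are made-up names, and t() is the transpose member used further below:

// Hypothetical client code.
std::vector<matrix> gram_matrices;
for (const auto &f_name : file_names)
{
    matrix x = load_tall_matrix(f_name);   // p x q, p >> q
    gram_matrices.push_back(x.t() * x);    // mathematically only q x q...
}
// ...but each stored matrix is still an expression tree holding a shared_ptr
// to the corresponding x's p x q internals, so ~p*q doubles per file stay
// alive instead of ~q*q.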

Initially, I thought that the move ctor was a good point for automatic compaction - it's exactly the point where a temporary becomes a named object, and it's named objects that cause increasing memory consumption, so

matrix(matrix &&other); // <- Force compaction only here

would appear to solve everything, e.g.,

auto res = // <- temp becoming named
    a * // temp
    b * // temp
    c + // temp
    2 * // temp
    d;
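
A sketch of the compacting move ctor I had in mind, using the assumed is_compact()/compact() helpers from the sketch above:

matrix::matrix(matrix &&other)
    : m_imp(std::move(other.m_imp))
{
    // The temporary is becoming a named object: evaluate now, so the named
    // object holds an actual matrix rather than a tree pinning large operands.
    if (m_imp && !m_imp->is_compact())
        m_imp = m_imp->compact();
}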

But can it be counted on? E.g., consider

matrix load_xtx(const string &f_name)
{
    matrix x = ...
    return x.t() * x; 
}

auto xtx = load_xtx("foo.hdf5"); // (*)

Is the compiler forbidden from doing at (*) something similar to what it does with NRVO, namely just constructing the object in place, so that the move ctor never runs? And even if it doesn't do that here, might a compiler optimize the move away in other cases?
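
The concern is easy to observe with a toy type whose move ctor announces itself (an illustration only, not the matrix class): copy elision is permitted both at the return statement and when initializing the named variable, so the would-be compaction hook may simply never run.

#include <iostream>

struct probe
{
    probe() = default;
    probe(probe &&) { std::cout << "move ctor ran (compaction would happen here)\n"; }
};

probe make_probe()
{
    return probe{};   // a prvalue, like x.t() * x in load_xtx()
}

int main()
{
    auto p = make_probe();   // copy elision is allowed here and at the
    (void)p;                 // return above, so nothing may be printed
}

With elision disabled (e.g. gcc's -fno-elide-constructors) both moves show up; with it enabled, typically none do.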
