Tuesday, 25 August 2015

The C++ Copy and Move Assignment Operators and Constructors - Safe for the Paranoid and Fast for the Mad

Intro

There is a lot of info around the web, here on StackOverflow and in many other places, some of it contradictory, regarding "How to implement the Copy and Move Assignment Operators and Constructors".

Part of the problem is that there appear to be several possible methods, and the most appropriate one is sometimes a matter of opinion, or depends on the intended application of the software to be written. Sometimes the implementation choice is critical, and sometimes it is not.

There is no single question on StackOverflow with an answer to all the questions which follow. I believe I have also acquired some incorrect habits due to this confusion.


Example Code - My Best Effort (so far)

Example Class

Below is my best effort for implementation of these 4 functions. I have made references where I can.

Let us dive right in with a "trivial example". (It is not so trivial to me when thinking about the implementations of the next 4 functions, but the class is as simple as it could possibly be!)

First consider a class of the form:

#include <cstdint>   // uint64_t
#include <cstring>   // memcpy

class my_class
{

    protected:

    double *my_data;
    uint64_t my_data_length;
};

This type of class, with a pointer to heap-allocated data, will not be what everyone requires to answer their version of this question, but I suspect that it will be good enough for, say, 90% of readers.

The Copy and Move Assignment Operators and Constructors

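(Something I don't mention later on is the noexcept keyword. According to Wikipedia, it is required on the move operations to enable certain optimizations. For example, std::vector falls back to copying its elements during reallocation when their move constructor is not noexcept and a copy constructor is available.)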
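To make the noexcept remark concrete, here is a minimal sketch. The types loud_move and quiet_move and the helper copies_on_regrow are hypothetical names invented for this illustration; they are not part of the question's class. The sketch counts how many copies std::vector makes when it has to reallocate, depending on whether the element's move constructor is noexcept.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical type: the move constructor is NOT declared noexcept.
struct loud_move
{
    static int& copies() { static int n = 0; return n; }
    loud_move() = default;
    loud_move(const loud_move&) { ++copies(); }
    loud_move(loud_move&&) {}              // may throw, as far as the compiler knows
};

// Hypothetical type: identical, except the move constructor is noexcept.
struct quiet_move
{
    static int& copies() { static int n = 0; return n; }
    quiet_move() = default;
    quiet_move(const quiet_move&) { ++copies(); }
    quiet_move(quiet_move&&) noexcept {}
};

static_assert(!std::is_nothrow_move_constructible<loud_move>::value, "move may throw");
static_assert(std::is_nothrow_move_constructible<quiet_move>::value, "move cannot throw");

// Returns how many element copies a single vector reallocation performed.
template <typename T>
int copies_on_regrow()
{
    T::copies() = 0;
    std::vector<T> v;
    v.reserve(1);
    v.emplace_back();
    v.reserve(v.capacity() + 1);   // forces a reallocation of the one stored element
    return T::copies();
}
```

With the major standard library implementations, copies_on_regrow<loud_move>() reports one copy and copies_on_regrow<quiet_move>() reports none, because the vector only moves when doing so cannot throw halfway through the reallocation.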

//
// Function 1
// Copy Constructor
//

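// Members are initialized in declaration order (my_data first), so read the length from other
my_class(const my_class& other) : my_data{new double[other.my_data_length]}, my_data_length{other.my_data_length}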
    // *(Q8)* Do we need to check whether memory needs to be deleted before call to new?
{
    // Copy the data
    memcpy(my_data, other.my_data, my_data_length * sizeof(double));

    // *(A9)* Should we bother with if(my_data_length != other.my_data_length) ?
}


//
// Function 2:
// Move Constructor
//

my_class(my_class&& other) noexcept : my_data_length{other.my_data_length}, my_data{other.my_data}
    // *(Q8)* Do we need to check whether memory needs to be deleted before call to new?
{
    // Steal the data
    other.my_data = nullptr;
    other.my_data_length = 0;
}


//
// Function 3:
// Copy Assignment Operator
//

const my_class& operator=(const my_class& other)
{
    // Copy the data
    my_class tmp(other); // *(A4a/Q4b)*
    *this = std::move(tmp);

    return *this;

    // *(Q10)* Should we bother with if(this != &other) ?
    // *(A9)* Should we bother with if(my_data_length != other.my_data_length) ?
}


//
// Function 4:
// Move Assignment Operator
//

const my_class& operator=(my_class&& other)
{
    // Steal the data
    std::swap(my_data_length, other.my_data_length); // Bonus question:
    std::swap(my_data, other.my_data);               // Is there any point to std::swap other than is saves 2 lines of code?

    return *this;
}

The Naive Implementation

//
// Function 1
// Copy Constructor
//

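// Members are initialized in declaration order (my_data first), so read the length from other
my_class(const my_class& other) : my_data{new double[other.my_data_length]}, my_data_length{other.my_data_length}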
{
    // Copy the data
    memcpy(my_data, other.my_data, my_data_length * sizeof(double));
}


//
// Function 2:
// Move Constructor
//

my_class(my_class&& other) : my_data_length{other.my_data_length}, my_data{other.my_data}
{
    // Steal the data
    other.my_data = nullptr;
    other.my_data_length = 0;
}


//
// Function 3:
// Copy Assignment Operator
//

const my_class& operator=(const my_class& other)
{
    // Copy the data
    my_data_length = other.my_data_length;
    my_data = new double[my_data_length];
    memcpy(my_data, other.my_data, my_data_length * sizeof(double));

    return *this;
}


//
// Function 4:
// Move Assignment Operator
//

const my_class& operator=(my_class&& other)
{
    // Steal the data
    my_data_length = other.my_data_length;
    my_data = other.my_data;
    other.my_data_length = 0;
    other.my_data = nullptr;

    return *this;
}

There is already a question here. "Should I put a const for the return types of the assignment operators?"

  • (Q1a) What is the difference if I do or do not? By putting the const what do I prevent the user (user of the code at a later date) from doing?
  • (Q1b) Is there a concrete rule? Should I always put const here or is it sometimes useful not to?
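As a rough illustration of (Q1a), the sketch below compares the two return types. The structs plain_ref and const_ref and the function chain_then_modify are hypothetical names made up for this example. The built-in types and the standard library types return a non-const reference from assignment, and that is the convention the sketch treats as the default.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Conventional form: assignment returns a non-const reference.
struct plain_ref
{
    int value = 0;
    plain_ref& operator=(const plain_ref& other) { value = other.value; return *this; }
    void bump() { ++value; }
};

// The form used in the question: assignment returns a const reference.
struct const_ref
{
    int value = 0;
    const const_ref& operator=(const const_ref& other) { value = other.value; return *this; }
    void bump() { ++value; }
};

static_assert(std::is_same<decltype(std::declval<plain_ref&>() = std::declval<const plain_ref&>()),
                           plain_ref&>::value, "result can be modified");
static_assert(std::is_same<decltype(std::declval<const_ref&>() = std::declval<const const_ref&>()),
                           const const_ref&>::value, "result is read-only");

int chain_then_modify()
{
    plain_ref a, b;
    b.value = 41;
    (a = b).bump();        // compiles: the result of the assignment is a modifiable lvalue
    // const_ref c, d;
    // (c = d).bump();     // would not compile: bump() is non-const, the result is const
    return a.value;
}
```

Plain chaining such as a = b = c works with either return type. The const version only blocks uses that modify the result of the assignment expression, which ints and standard containers allow.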

(A2) There is another question also. "Should I set other.my_data and other.my_data_length to nullptr and 0 in the move constructor and move assignment operator? If so, or if not, why is it important to do so, or not to do so?" (Answer below, source here.)
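A minimal sketch of why the reset matters, assuming the class also has a destructor that releases the block (the example class above does not show one). The class buffer and the function source_is_empty_after_move are hypothetical names for this illustration. The moved-from object is still destroyed later, so it must not keep a pointer to memory it no longer owns.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

class buffer   // hypothetical stand-in for my_class, with a destructor added
{
public:
    explicit buffer(std::size_t n) : data_{new double[n]()}, length_{n} {}
    ~buffer() { delete[] data_; }          // delete[] on nullptr is a no-op

    buffer(buffer&& other) noexcept
        : data_{other.data_}, length_{other.length_}
    {
        // Without these two lines both destructors would delete[] the same block.
        other.data_ = nullptr;
        other.length_ = 0;
    }

    buffer(const buffer&) = delete;             // copying left out to keep the sketch short
    buffer& operator=(const buffer&) = delete;

    const double* data() const { return data_; }
    std::size_t length() const { return length_; }

private:
    double* data_;
    std::size_t length_;
};

bool source_is_empty_after_move()
{
    buffer a(4);
    buffer b(std::move(a));   // b takes over a's block
    return a.data() == nullptr && a.length() == 0 && b.length() == 4;
}
```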

(Q3) The 3rd question which comes to mind is, "When is it important to check whether memory needs deleting in order to prevent memory leaks?"
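One way to look at (Q3): a constructor starts from raw storage and owns nothing yet, while an assignment operator runs on an object that may already own a block. The sketch below, under that assumption, frees the old block in the copy assignment operator and counts live allocations to show that nothing leaks. The class tracked, the counter live_blocks and the function blocks_leaked_after_assignment are hypothetical names used only here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

static int live_blocks = 0;   // hypothetical counter, for illustration only

class tracked
{
public:
    explicit tracked(std::size_t n) : data_{allocate(n)}, length_{n} {}

    // Constructor: the new object owns nothing yet, so there is nothing to delete.
    tracked(const tracked& other) : data_{allocate(other.length_)}, length_{other.length_}
    {
        std::copy(other.data_, other.data_ + length_, data_);
    }

    ~tracked() { release(data_); }

    // Assignment: *this already owns a block, which must be released.
    tracked& operator=(const tracked& other)
    {
        if (this != &other)
        {
            double* fresh = allocate(other.length_);   // allocate first: if new throws, *this is untouched
            std::copy(other.data_, other.data_ + other.length_, fresh);
            release(data_);                            // free the block this object owned before
            data_ = fresh;
            length_ = other.length_;
        }
        return *this;
    }

private:
    static double* allocate(std::size_t n)
    {
        double* p = new double[n]();
        ++live_blocks;
        return p;
    }
    static void release(double* p)
    {
        if (p) --live_blocks;
        delete[] p;
    }

    double* data_;
    std::size_t length_;
};

int blocks_leaked_after_assignment()
{
    {
        tracked a(3), b(5);
        a = b;                 // a's original 3-element block is released here
    }
    return live_blocks;        // every block has been freed once both objects are gone
}
```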

In addition, we have written a large amount of duplicate code. (Which could be a lot of code for a less trivial class.) Some recommend implementing some of these operators in terms of the others. I believe that the Move Assignment Operator can be implemented in terms of the Move Constructor, and the Copy Assignment Operator can be implemented in terms of the Copy Constructor. (A4a) How can this be done, and (Q4b) how does this affect runtime efficiency? (Many may be of the opinion that a small penalty in runtime efficiency is worth paying for shorter code. Or perhaps there is no penalty? I don't know the answer to this.)
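One commonly cited way to do this is the copy-and-swap idiom, sketched below. It is one possible answer to (A4a), not the only one. The class swap_based and the function assign_from_lvalue_and_rvalue are hypothetical names for this example. A single assignment operator takes its argument by value, so the copy or move constructor does the real work and the operator only swaps.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

class swap_based
{
public:
    swap_based(std::size_t n, double value) : data_{new double[n]}, length_{n}
    {
        std::fill(data_, data_ + n, value);
    }
    swap_based(const swap_based& other) : data_{new double[other.length_]}, length_{other.length_}
    {
        std::copy(other.data_, other.data_ + length_, data_);
    }
    swap_based(swap_based&& other) noexcept : data_{other.data_}, length_{other.length_}
    {
        other.data_ = nullptr;
        other.length_ = 0;
    }
    ~swap_based() { delete[] data_; }

    // The by-value parameter is built by the copy constructor for lvalues
    // and by the move constructor for rvalues.
    swap_based& operator=(swap_based other) noexcept
    {
        swap(*this, other);
        return *this;
    }   // other now holds the old block; its destructor frees it

    friend void swap(swap_based& a, swap_based& b) noexcept
    {
        using std::swap;
        swap(a.data_, b.data_);
        swap(a.length_, b.length_);
    }

    std::size_t length() const { return length_; }
    double front() const { return data_[0]; }

private:
    double* data_;
    std::size_t length_;
};

double assign_from_lvalue_and_rvalue()
{
    swap_based a(2, 1.0), b(3, 2.0);
    a = b;                                  // copy construct the parameter, then swap
    double after_copy = a.front();
    a = swap_based(4, 5.0);                 // the temporary becomes the parameter, then swap
    return after_copy + a.front() + static_cast<double>(a.length());
}
```

On (Q4b), the likely cost is that a copy assignment always allocates a new block, even when the existing one is large enough to reuse, plus one extra swap. Note that this by-value operator cannot coexist with a separate operator=(my_class&&) overload, because calls with an rvalue would be ambiguous.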

I think it is beyond the scope of this question to consider what may happen if new fails. Checking and recovering from memory allocation errors will open a whole new can of worms I am sure, and there are already questions which address this issue. For example, here, here and here or here.

Bonus question: (Q5) Is return *this required for a move operation? For a copy assignment, return *this is required for "chaining", i.e. a = b = c. But what about the move assignment operator? Is it possible to have a move operation where 3 objects are involved? (According to Wikipedia, the return *this is required, but I cannot think of an example demonstrating why - I would like to see one.)
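A small sketch of a chain in which three objects are involved in a move assignment. The struct label and the function chain_with_move are hypothetical names for this illustration. The chain only compiles because the move assignment operator returns a reference to the object that was assigned to.

```cpp
#include <cassert>
#include <string>
#include <utility>

struct label   // hypothetical type, used only for this example
{
    std::string text;

    explicit label(std::string t) : text(std::move(t)) {}
    label(const label&) = default;
    label(label&&) = default;

    label& operator=(const label& other) { text = other.text; return *this; }
    label& operator=(label&& other) noexcept { text = std::move(other.text); return *this; }
};

std::string chain_with_move()
{
    label a("a"), b("b"), c("c");
    // Step 1: c is moved into b, and the move assignment returns a reference to b.
    // Step 2: that reference is an lvalue, so b is copied into a.
    a = b = std::move(c);
    return a.text + b.text;
}
```

If the move assignment operator returned void, the expression a = b = std::move(c) would fail to compile, as would uses such as passing the result of (b = std::move(c)) to a function.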


Dave the Mad and Bob the Paranoid

I would like to get enough information together to produce 2 sets of code:

  • A "safest possible" code for the paranoid
  • A "fastest possible" code for the mad

"Dave is mad and Bob is paranoid. They are both programmers, but at opposite extremes of the "programmer spectrum". How can we write 2 sets of code to satisfy each of them, from which any other C++ programmer can take ideas and tune his code as he sees fit? You might say that Dave the Mad really is mad, and therefore he allows for the possibility of memory leaks in the pursuit of raw performance, but no, Dave is not that mad."

Additional Points

This question has expanded a lot quicker than I anticipated, so I shall shorten this next section.

The following list is a summary of other suggestions I have seen previously. For each of these I ask the question: what are the advantages and disadvantages of implementing (or not implementing) the following ideas? To be more specific, what does each of the following allow or prevent you from doing? What effect do they have on runtime performance, and what runtime errors do they prevent? Importantly, which of the 4 functions (copy/move, assignment operator/constructor) are the following applicable to? (Note some of these have already been mentioned in the above section.)

  • (A4a/Q4b) Implement a copy assignment operator or copy constructor in terms of the other (to save typing)
  • (Q6) Same as above for move assignment or move constructor (to save typing)
  • (A7) Check to guard against a zero length memory allocation, see here
  • (Q8) Do we need to check to see if memory needs to be deleted before a call to new? If so, when and where?
  • (A9) Check to see if the block of memory to be allocated is the same size as the current block allocated, in which case we do not need to call new (which is dependent on whether the above point needs to be done, an example code is featured below)
  • Extending the above point, if the block of memory currently allocated is larger than the block required, but not more than twice as large, then don't bother to delete and new, just reuse the block and change the variable my_data_length [this is my own idea for an "optimization" which may be useful in some situations]
  • (Q10) Check for self-assignment. Is this required for constructors? What is an example code which causes the infinite loop problem? Is it important* to check for self-assignment?
  • Some "assignments" are made in the constructor initialization list rather than in the constructor body. I assume this is largely a matter of preference/style? Are there cases where it is not? One example is of calling a base class constructor. (Example below.)

    // Is there a reason to put it here?

    my_class(const my_class& other) : my_data{new double[other.my_data_length]}

  • The answer to this question uses the ternary operator ?:. I have already raised the question "is it important to check for zero length assignment?" however I wanted to include this link to demonstrate another method of doing that check, this time importantly (?) inside the initialization list. Is this important for any reason? (Q11)

  • The use of std::swap and std::move is also well summarized in the Wikipedia article.
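On the initialization list points above, a minimal sketch follows. It assumes the same member order as the example class, with the pointer declared before the length. The class ordered and the function empty_copy_allocates_nothing are hypothetical names. Two things are worth showing: members are initialized in declaration order rather than in the order written in the list, and the ternary check for a zero length (Q11) can sit directly in the initializer. Note that new double[0] is itself legal, so the check is a choice rather than a requirement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class ordered
{
public:
    explicit ordered(std::size_t n)
        : data_{n ? new double[n]() : nullptr}, length_{n} {}

    ordered(const ordered& other)
        // data_ is declared first, so it is initialized first: read other.length_,
        // never this->length_, which has no value yet at this point.
        : data_{other.length_ ? new double[other.length_] : nullptr}
        , length_{other.length_}
    {
        if (length_ != 0)   // passing a null pointer to memcpy is undefined, even for 0 bytes
            std::memcpy(data_, other.data_, length_ * sizeof(double));
    }

    ~ordered() { delete[] data_; }
    ordered& operator=(const ordered&) = delete;   // assignment left out of this sketch

    const double* data() const { return data_; }
    std::size_t length() const { return length_; }

private:
    double* data_;          // declared first, therefore initialized first
    std::size_t length_;
};

bool empty_copy_allocates_nothing()
{
    ordered empty(0);
    ordered copy(empty);
    return copy.data() == nullptr && copy.length() == 0;
}
```

The initialization list is also more than a style choice for const members, reference members, base classes, and members without a default constructor, since none of those can be assigned later in the constructor body.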

*It would be important if it were possible to subtly introduce an infinite loop by accidental self-assignment in such a way that it is not always obvious at runtime that the bug exists. Perhaps we enter the subjective here? Personally, I see no point bothering with this check considering what I know now, because I assume (perhaps incorrectly) that an infinite loop would always be easy to spot! ... Assuming our program is deterministic in a "self-contained" way (i.e. not random/stochastic/subject to external parameters [!], which if we have IO it always will be [!!]).
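As a hedged sketch for (Q10): the failure usually associated with self-assignment is reading from memory that has just been freed, rather than an infinite loop. It arises in a copy assignment operator that deletes its own block before copying from the source. The class guarded and the function survives_self_assignment are hypothetical names for this example. A constructor cannot normally be handed the object it is constructing, so the check applies to the assignment operators.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

class guarded
{
public:
    guarded(std::size_t n, double value) : data_{new double[n]}, length_{n}
    {
        std::fill(data_, data_ + n, value);
    }
    guarded(const guarded&) = delete;   // copy construction left out to keep the sketch short
    ~guarded() { delete[] data_; }

    guarded& operator=(const guarded& other)
    {
        // Without this check, the delete[] below would also destroy other's data
        // when other is *this, and the copy would then read freed memory.
        if (this == &other) return *this;

        delete[] data_;
        data_ = nullptr;                // keep the object valid if new throws
        length_ = 0;

        data_ = new double[other.length_];
        length_ = other.length_;
        std::copy(other.data_, other.data_ + length_, data_);
        return *this;
    }

    double at(std::size_t i) const { return data_[i]; }

private:
    double* data_;
    std::size_t length_;
};

double survives_self_assignment()
{
    guarded a(3, 7.5);
    guarded& alias = a;   // self-assignment rarely looks like "a = a" in real code
    a = alias;
    return a.at(2);
}
```

A move assignment written with std::swap, as in Function 4 of the first listing, leaves the object unchanged when it is moved into itself, so it needs no such check.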


The following are self-descriptive optimizations. I have included these to attempt completeness: (Please mention to me any I have missed.)

// In the copy assignment operator and copy constructor (2 functions)
if(other.my_data_length == my_data_length)
{
    // There is no reason to change the value of my_data_length or change the size of allocated memory
    memcpy(my_data, other.my_data, my_data_length * sizeof(double));
}
else
{
    my_data_length = other.my_data_length;
    delete [] my_data; // delete [] on a null pointer is a no-op, so no check is needed
    my_data = new double[my_data_length];
    memcpy(my_data, other.my_data, my_data_length * sizeof(double));
}
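A sketch of how the reuse idea might look in a complete copy assignment operator. It assumes an extra capacity member, which the original class does not have, and it simplifies the "not more than twice as large" rule to "fits in the current block". The class reusing and the function shrinking_assignment_keeps_block are hypothetical names. The check is only meaningful in the assignment operator, because a copy constructor has no existing block to compare against.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

class reusing
{
public:
    reusing(std::size_t n, double value)
        : data_{new double[n]}, length_{n}, capacity_{n}
    {
        std::fill(data_, data_ + n, value);
    }
    reusing(const reusing& other)
        : data_{new double[other.length_]}, length_{other.length_}, capacity_{other.length_}
    {
        std::copy(other.data_, other.data_ + length_, data_);
    }
    ~reusing() { delete[] data_; }

    reusing& operator=(const reusing& other)
    {
        if (this == &other) return *this;

        if (other.length_ > capacity_)           // the current block is too small
        {
            double* fresh = new double[other.length_];   // if new throws, *this is unchanged
            delete[] data_;
            data_ = fresh;
            capacity_ = other.length_;
        }
        // Otherwise keep the block that is already allocated.
        length_ = other.length_;
        std::copy(other.data_, other.data_ + length_, data_);
        return *this;
    }

    const double* data() const { return data_; }
    std::size_t length() const { return length_; }

private:
    double* data_;
    std::size_t length_;     // number of elements in use
    std::size_t capacity_;   // number of elements allocated (assumed extra member)
};

bool shrinking_assignment_keeps_block()
{
    reusing big(8, 1.0), small(3, 2.0);
    const double* before = big.data();
    big = small;                                  // fits, so no delete[] or new
    return big.data() == before && big.length() == 3 && big.data()[2] == 2.0;
}
```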
