samedi 26 août 2017

Does FBString's small string optimization rely on undefined behavior?

Facebook's fbstring_core class uses the "Small String Optimization" described in this talk wherein the storage for the class' data members -- a Char*, size and capacity -- will be repurposed to store character data if the string is sufficiently small. The flag bits used to distinguish between these cases are located in the "rightmost char of the storage". My question is whether accessing these bits through the bytes_ union member, which is never actually written, constitutes undefined behavior per the C++11 standard? The answer to this question suggests it is.

The following excerpt contains the declaration of these members and the category() member function that is used to determine whether this optimization is in effect.

    typedef uint8_t category_type;

    enum class Category : category_type {
      isSmall = 0,
      isMedium = kIsLittleEndian ? 0x80 : 0x2,
      isLarge = kIsLittleEndian ? 0x40 : 0x1,
    };

    Category category() const {
      // works for both big-endian and little-endian
      return static_cast<Category>(bytes_[lastChar] & categoryExtractMask);
    }

    struct MediumLarge {
      Char * data_;
      size_t size_;
      size_t capacity_;

      size_t capacity() const {
        return kIsLittleEndian
          ? capacity_ & capacityExtractMask
          : capacity_ >> 2;
      }

      void setCapacity(size_t cap, Category cat) {
        capacity_ = kIsLittleEndian
            ? cap | (static_cast<size_t>(cat) << kCategoryShift)
            : (cap << 2) | static_cast<size_t>(cat);
      }
    };

    union {
      uint8_t bytes_[sizeof(MediumLarge)]; // For accessing the last byte.
      Char small_[sizeof(MediumLarge) / sizeof(Char)];
      MediumLarge ml_;
    };

It seems that this implementation relies on using "type punning" to access a byte that might actually be part of the size_t capacity_ member. From the answer to the question linked above, I gather that this is defined behavior in C99, but not in C++11?

Aucun commentaire:

Enregistrer un commentaire