How is sizeof implemented in C++?

01-13

Environment: Win7 x86 + VC++ 6
I've read plenty of "deep understanding of sizeof" articles online and I still don't get it.
int a, b, c, d;

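a = sizeof("123456789"); // a is 10
b = sizeof("123456789"+1); // b is also 10
whereas
c = strlen("123456789"); // c is 9
d = strlen("123456789"+1); // d is 8
If I'm not mistaken, a string literal used in an expression yields the address of its first character. So why is b 10 here? I don't understand.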
------------------------------------------------------------------------------------------------------------------
Consider the following snippet:
int i = 10;
cout << i << endl;

cout << sizeof(++i) << endl;
cout << i << endl;
The second output of i is still 10. Why wasn't "++i" executed? I don't understand.
--------------------------------------
Given these two points, I'd like to know roughly how C++ implements sizeof. Why do the two behaviours above occur?

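Whatever you put in sizeof is replaced directly by the compiler; even in the assembly you only see a constant. That is why, as others below point out, reading the disassembly won't help: the substitution has already happened inside the compiler (strictly speaking, VLAs are a special case, which the code walkthrough below touches on). Let's look at how Clang handles sizeof.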
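A minimal sketch of what "replaced by a constant" means in practice: because sizeof yields a constant expression, it can appear wherever the language demands a compile-time value. The names kIntSize, Buffer, element_count and demo_element_count are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// sizeof yields a constant expression of type std::size_t.
constexpr std::size_t kIntSize = sizeof(int);   // hypothetical name

template <std::size_t N>
struct Buffer {                                 // hypothetical helper type
    unsigned char bytes[N];
};

// Used here as a template argument, which must be known at compile time.
using IntBuffer = Buffer<sizeof(int)>;
static_assert(sizeof(IntBuffer) == sizeof(int), "N chars occupy N bytes");

// Element count of an array, computed purely from types.
template <typename T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) {
    return sizeof(T[N]) / sizeof(T);
}

inline std::size_t demo_element_count() {
    double values[7] = {};
    return element_count(values);               // no run-time size lookup
}
```

Calling demo_element_count() returns 7; nothing about the array is inspected at run time.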

In Clang's implementation, lib/AST/ExprConstant.cpp has this method:

bool IntExprEvaluator::VisitUnaryExprOrTypeTraitExpr

It is implemented like this:

  switch(E->getKind()) {
  case UETT_AlignOf: {
    if (E->isArgumentType())
      return Success(GetAlignOfType(Info, E->getArgumentType()), E);
    else
      return Success(GetAlignOfExpr(Info, E->getArgumentExpr()), E);
  }

  case UETT_VecStep: {
    QualType Ty = E->getTypeOfArgument();

    if (Ty->isVectorType()) {
      unsigned n = Ty->castAs<VectorType>()->getNumElements();

      // The vec_step built-in functions that take a 3-component
      // vector return 4. (OpenCL 1.1 spec 6.11.12)
      if (n == 3)
        n = 4;

      return Success(n, E);
    } else
      return Success(1, E);
  }

  case UETT_SizeOf: {
    QualType SrcTy = E->getTypeOfArgument();
    // C++ [expr.sizeof]p2: "When applied to a reference or a reference type,
    //   the result is the size of the referenced type."
    if (const ReferenceType *Ref = SrcTy->getAs<ReferenceType>())
      SrcTy = Ref->getPointeeType();

    CharUnits Sizeof;
    if (!HandleSizeof(Info, E->getExprLoc(), SrcTy, Sizeof))
      return false;
    return Success(Sizeof, E);
  }
  }

  llvm_unreachable("unknown expr/type trait");
}

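Following this method, we find that sizeof is actually handled inside HandleSizeof, and the result is stored in Sizeof, which is a CharUnits. CharUnits is an internal Clang representation; quoting Clang's own comment: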

/// CharUnits - This is an opaque type for sizes expressed in character units.
/// Instances of this type represent a quantity as a multiple of the size
/// of the standard C type, char, on the target architecture. As an opaque
/// type, CharUnits protects you from accidentally combining operations on
/// quantities in bit units and character units.
///
/// In both C and C++, an object of type "char", "signed char", or "unsigned
/// char" occupies exactly one byte, so "character unit" and "byte" refer to
/// the same quantity of storage. However, we use the term "character unit"
/// rather than "byte" to avoid an implication that a character unit is
/// exactly 8 bits.
///
/// For portability, never assume that a target character is 8 bits wide. Use
/// CharUnit values wherever you calculate sizes, offsets, or alignments
/// in character units.
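The distinction the comment draws between character units and bits is visible from plain C++ as well. A small sketch, where bit_width_of is a hypothetical helper and not anything from Clang:

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// sizeof counts in units of char, so sizeof(char) is 1 by definition.
static_assert(sizeof(char) == 1, "guaranteed by the language");
// A char has at least 8 bits, but the language allows more.
static_assert(CHAR_BIT >= 8, "minimum width of a char");

// Hypothetical helper: size in char units times bits per char.
template <typename T>
constexpr std::size_t bit_width_of() {
    return sizeof(T) * CHAR_BIT;
}
```

On a typical desktop target bit_width_of<int>() gives 32, but only the relation to CHAR_BIT is guaranteed.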

Next, we look for HandleSizeof:

/// Get the size of the given type in char units.
static bool HandleSizeof(EvalInfo &Info, SourceLocation Loc,
                         QualType Type, CharUnits &Size) {
  // sizeof(void), __alignof__(void), sizeof(function) = 1 as a gcc
  // extension.
  if (Type->isVoidType() || Type->isFunctionType()) {
    Size = CharUnits::One();
    return true;
  }

  if (!Type->isConstantSizeType()) {
    // sizeof(vla) is not a constantexpr: C99 6.5.3.4p2.
    // FIXME: Better diagnostic.
    Info.Diag(Loc);
    return false;
  }

  Size = Info.Ctx.getTypeSizeInChars(Type);
  return true;
}

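At this point we know why the expression gets replaced. If the type here is void or a function type, the compiler substitutes the constant CharUnits::One() (that is, the size of one char) directly. That is why the assembly only shows a constant: assembly is produced by CodeGen, which runs later, whereas this happens before CodeGen. The code also checks whether Type is a ConstantSizeType, because the size has to be computable at compile time; the comment there refers to VLAs, and interested readers can look up the C99 clause it cites. After that, Type is passed on to getTypeSizeInChars.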

OK, let's keep going step by step and see what getTypeSizeInChars does.

/// getTypeSizeInChars - Return the size of the specified type, in characters.
/// This method does not work on incomplete types.
CharUnits ASTContext::getTypeSizeInChars(QualType T) const {
  return getTypeInfoInChars(T).first;
}

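Even without going further we can already tell that this method returns the size of the given type, but let's get to the bottom of it and see how it is actually implemented. So we continue into getTypeInfoInChars().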

std::pair<CharUnits, CharUnits>
ASTContext::getTypeInfoInChars(QualType T) const {
  return getTypeInfoInChars(T.getTypePtr());
}

Now we also see where first comes from: the method returns a std::pair. The call goes to getTypeInfoInChars again, but this time the argument is a Type pointer, so we look for that overload:

std::pair<CharUnits, CharUnits>
ASTContext::getTypeInfoInChars(const Type *T) const {
  if (const ConstantArrayType *CAT = dyn_cast<ConstantArrayType>(T))
    return getConstantArrayInfoInChars(*this, CAT);
  TypeInfo Info = getTypeInfo(T);
  return std::make_pair(toCharUnitsFromBits(Info.Width),
                        toCharUnitsFromBits(Info.Align));
}

Next we see that it calls getTypeInfo, so we find the corresponding code:

TypeInfo ASTContext::getTypeInfo(const Type *T) const {
  TypeInfoMap::iterator I = MemoizedTypeInfo.find(T);
  if (I != MemoizedTypeInfo.end())
    return I->second;

  // This call can invalidate MemoizedTypeInfo[T], so we need a second lookup.
  TypeInfo TI = getTypeInfoImpl(T);
  MemoizedTypeInfo[T] = TI;
  return TI;
}

So we've found this. We don't need to worry about MemoizedTypeInfo for now; what we are after is actually inside getTypeInfoImpl:

/// getTypeInfoImpl - Return the size of the specified type, in bits.  This
/// method does not work on incomplete types.
///
/// FIXME: Pointers into different addr spaces could have different sizes and
/// alignment requirements: getPointerInfo should take an AddrSpace, this
/// should take a QualType, &c.
TypeInfo ASTContext::getTypeInfoImpl(const Type *T) const {
  uint64_t Width = 0;
  unsigned Align = 8;
  bool AlignIsRequired = false;
  switch (T->getTypeClass()) {
#define TYPE(Class, Base)
#define ABSTRACT_TYPE(Class, Base)
#define NON_CANONICAL_TYPE(Class, Base)
#define DEPENDENT_TYPE(Class, Base) case Type::Class:
#define NON_CANONICAL_UNLESS_DEPENDENT_TYPE(Class, Base)                       \
  case Type::Class:                                                            \
  assert(!T->isDependentType() && "should not see dependent types here");      \
  return getTypeInfo(cast<Class##Type>(T)->desugar().getTypePtr());
#include "clang/AST/TypeNodes.def"
    llvm_unreachable("Should not see dependent types");

  case Type::FunctionNoProto:
  case Type::FunctionProto:
    // GCC extension: alignof(function) = 32 bits
    Width = 0;
    Align = 32;
    break;

  case Type::IncompleteArray:
  case Type::VariableArray:
    Width = 0;
    Align = getTypeAlign(cast<ArrayType>(T)->getElementType());
    break;
  case Type::ConstantArray: {
    const ConstantArrayType *CAT = cast<ConstantArrayType>(T);

    TypeInfo EltInfo = getTypeInfo(CAT->getElementType());
    uint64_t Size = CAT->getSize().getZExtValue();
    assert((Size == 0 || EltInfo.Width <= (uint64_t)(-1) / Size) &&
           "Overflow in array type bit size evaluation");
    Width = EltInfo.Width * Size;
    Align = EltInfo.Align;
    if (!getTargetInfo().getCXXABI().isMicrosoft() ||
        getTargetInfo().getPointerWidth(0) == 64)
      Width = llvm::RoundUpToAlignment(Width, Align);
    break;
  }
  case Type::ExtVector:
  case Type::Vector: {
    const VectorType *VT = cast<VectorType>(T);
    TypeInfo EltInfo = getTypeInfo(VT->getElementType());
    Width = EltInfo.Width * VT->getNumElements();
    Align = Width;
    // If the alignment is not a power of 2, round up to the next power of 2.
    // This happens for non-power-of-2 length vectors.
    if (Align & (Align-1)) {
      Align = llvm::NextPowerOf2(Align);
      Width = llvm::RoundUpToAlignment(Width, Align);
    }
    // Adjust the alignment based on the target max.
    uint64_t TargetVectorAlign = Target->getMaxVectorAlign();
    if (TargetVectorAlign && TargetVectorAlign < Align)
      Align = TargetVectorAlign;
    break;
  }

  case Type::Builtin:
    switch (cast<BuiltinType>(T)->getKind()) {
    default: llvm_unreachable("Unknown builtin type!");
    case BuiltinType::Void:
      // GCC extension: alignof(void) = 8 bits.
      Width = 0;
      Align = 8;
      break;

    case BuiltinType::Bool:
      Width = Target->getBoolWidth();
      Align = Target->getBoolAlign();
      break;
    case BuiltinType::Char_S:
    case BuiltinType::Char_U:
    case BuiltinType::UChar:
    case BuiltinType::SChar:
      Width = Target->getCharWidth();
      Align = Target->getCharAlign();
      break;
    case BuiltinType::WChar_S:
    case BuiltinType::WChar_U:
      Width = Target->getWCharWidth();
      Align = Target->getWCharAlign();
      break;
    case BuiltinType::Char16:
      Width = Target->getChar16Width();
      Align = Target->getChar16Align();
      break;
    case BuiltinType::Char32:
      Width = Target->getChar32Width();
      Align = Target->getChar32Align();
      break;
    case BuiltinType::UShort:
    case BuiltinType::Short:
      Width = Target->getShortWidth();
      Align = Target->getShortAlign();
      break;
    case BuiltinType::UInt:
    case BuiltinType::Int:
      Width = Target->getIntWidth();
      Align = Target->getIntAlign();
      break;
    case BuiltinType::ULong:
    case BuiltinType::Long:
      Width = Target->getLongWidth();
      Align = Target->getLongAlign();
      break;
    case BuiltinType::ULongLong:
    case BuiltinType::LongLong:
      Width = Target->getLongLongWidth();
      Align = Target->getLongLongAlign();
      break;
    case BuiltinType::Int128:
    case BuiltinType::UInt128:
      Width = 128;
      Align = 128; // int128_t is 128-bit aligned on all targets.
      break;
    case BuiltinType::Half:
      Width = Target->getHalfWidth();
      Align = Target->getHalfAlign();
      break;
    case BuiltinType::Float:
      Width = Target->getFloatWidth();
      Align = Target->getFloatAlign();
      break;
    case BuiltinType::Double:
      Width = Target->getDoubleWidth();
      Align = Target->getDoubleAlign();
      break;
    case BuiltinType::LongDouble:
      Width = Target->getLongDoubleWidth();
      Align = Target->getLongDoubleAlign();
      break;
    case BuiltinType::NullPtr:
      Width = Target->getPointerWidth(0); // C++ 3.9.1p11: sizeof(nullptr_t)
      Align = Target->getPointerAlign(0); //   == sizeof(void*)
      break;
    case BuiltinType::ObjCId:
    case BuiltinType::ObjCClass:
    case BuiltinType::ObjCSel:
      Width = Target->getPointerWidth(0);
      Align = Target->getPointerAlign(0);
      break;
    case BuiltinType::OCLSampler:
      // Samplers are modeled as integers.
      Width = Target->getIntWidth();
      Align = Target->getIntAlign();
      break;
    case BuiltinType::OCLEvent:
    case BuiltinType::OCLImage1d:
    case BuiltinType::OCLImage1dArray:
    case BuiltinType::OCLImage1dBuffer:
    case BuiltinType::OCLImage2d:
    case BuiltinType::OCLImage2dArray:
    case BuiltinType::OCLImage3d:
      // Currently these types are pointers to opaque types.
      Width = Target->getPointerWidth(0);
      Align = Target->getPointerAlign(0);
      break;
    }
    break;

  case Type::ObjCObjectPointer:
    Width = Target->getPointerWidth(0);
    Align = Target->getPointerAlign(0);
    break;
  case Type::BlockPointer: {
    unsigned AS = getTargetAddressSpace(
        cast<BlockPointerType>(T)->getPointeeType());
    Width = Target->getPointerWidth(AS);
    Align = Target->getPointerAlign(AS);
    break;
  }
  case Type::LValueReference:
  case Type::RValueReference: {
    // alignof and sizeof should never enter this code path here, so we go
    // the pointer route.
    unsigned AS = getTargetAddressSpace(
        cast<ReferenceType>(T)->getPointeeType());
    Width = Target->getPointerWidth(AS);
    Align = Target->getPointerAlign(AS);
    break;
  }
  case Type::Pointer: {
    unsigned AS = getTargetAddressSpace(cast<PointerType>(T)->getPointeeType());
    Width = Target->getPointerWidth(AS);
    Align = Target->getPointerAlign(AS);
    break;
  }
  case Type::MemberPointer: {
    const MemberPointerType *MPT = cast<MemberPointerType>(T);
    std::tie(Width, Align) = ABI->getMemberPointerWidthAndAlign(MPT);
    break;
  }
  case Type::Complex: {
    // Complex types have the same alignment as their elements, but twice the
    // size.
    TypeInfo EltInfo = getTypeInfo(cast<ComplexType>(T)->getElementType());
    Width = EltInfo.Width * 2;
    Align = EltInfo.Align;
    break;
  }

  case Type::ObjCObject:
    return getTypeInfo(cast<ObjCObjectType>(T)->getBaseType().getTypePtr());
  case Type::Adjusted:
  case Type::Decayed:
    return getTypeInfo(cast<AdjustedType>(T)->getAdjustedType().getTypePtr());
  case Type::ObjCInterface: {
    const ObjCInterfaceType *ObjCI = cast<ObjCInterfaceType>(T);
    const ASTRecordLayout &Layout = getASTObjCInterfaceLayout(ObjCI->getDecl());
    Width = toBits(Layout.getSize());
    Align = toBits(Layout.getAlignment());
    break;
  }
  case Type::Record:
  case Type::Enum: {
    const TagType *TT = cast<TagType>(T);

    if (TT->getDecl()->isInvalidDecl()) {
      Width = 8;
      Align = 8;
      break;
    }

    if (const EnumType *ET = dyn_cast<EnumType>(TT)) {
      const EnumDecl *ED = ET->getDecl();
      TypeInfo Info =
          getTypeInfo(ED->getIntegerType()->getUnqualifiedDesugaredType());
      if (unsigned AttrAlign = ED->getMaxAlignment()) {
        Info.Align = AttrAlign;
        Info.AlignIsRequired = true;
      }
      return Info;
    }

    const RecordType *RT = cast<RecordType>(TT);
    const RecordDecl *RD = RT->getDecl();
    const ASTRecordLayout &Layout = getASTRecordLayout(RD);
    Width = toBits(Layout.getSize());
    Align = toBits(Layout.getAlignment());
    AlignIsRequired = RD->hasAttr<AlignedAttr>();
    break;
  }
  case Type::SubstTemplateTypeParm:
    return getTypeInfo(cast<SubstTemplateTypeParmType>(T)->
                       getReplacementType().getTypePtr());

  case Type::Auto: {
    const AutoType *A = cast<AutoType>(T);
    assert(!A->getDeducedType().isNull() &&
           "cannot request the size of an undeduced or dependent auto type");
    return getTypeInfo(A->getDeducedType().getTypePtr());
  }

  case Type::Paren:
    return getTypeInfo(cast<ParenType>(T)->getInnerType().getTypePtr());

  case Type::Typedef: {
    const TypedefNameDecl *Typedef = cast<TypedefType>(T)->getDecl();
    TypeInfo Info = getTypeInfo(Typedef->getUnderlyingType().getTypePtr());
    // If the typedef has an aligned attribute on it, it overrides any computed
    // alignment we have.  This violates the GCC documentation (which says that
    // attribute(aligned) can only round up) but matches its implementation.
    if (unsigned AttrAlign = Typedef->getMaxAlignment()) {
      Align = AttrAlign;
      AlignIsRequired = true;
    } else {
      Align = Info.Align;
      AlignIsRequired = Info.AlignIsRequired;
    }
    Width = Info.Width;
    break;
  }

  case Type::Elaborated:
    return getTypeInfo(cast<ElaboratedType>(T)->getNamedType().getTypePtr());

  case Type::Attributed:
    return getTypeInfo(
                  cast<AttributedType>(T)->getEquivalentType().getTypePtr());

  case Type::Atomic: {
    // Start with the base type information.
    TypeInfo Info = getTypeInfo(cast<AtomicType>(T)->getValueType());
    Width = Info.Width;
    Align = Info.Align;

    // If the size of the type doesn't exceed the platform's max
    // atomic promotion width, make the size and alignment more
    // favorable to atomic operations:
    if (Width != 0 && Width <= Target->getMaxAtomicPromoteWidth()) {
      // Round the size up to a power of 2.
      if (!llvm::isPowerOf2_64(Width))
        Width = llvm::NextPowerOf2(Width);

      // Set the alignment equal to the size.
      Align = static_cast<unsigned>(Width);
    }
  }
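The shape of that lookup chain (a per-type cache in front of a switch that asks the target for primitive widths and recurses for arrays) can be condensed into a toy model. Everything below is invented for illustration: ToyType, ToyTarget, ToyContext and the assumed widths are not Clang's real data structures or values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

enum class Kind { Char, Int, Pointer, ConstantArray };

struct ToyType {                  // hypothetical stand-in for clang::Type
    Kind kind;
    const ToyType *element;       // element type for arrays, otherwise null
    std::uint64_t count;          // element count for arrays, otherwise 0
};

struct ToyTarget {                // assumed widths, in bits
    std::uint64_t charWidth = 8, intWidth = 32, pointerWidth = 64;
};

class ToyContext {
public:
    // Width in bits, cached per type (compare MemoizedTypeInfo).
    std::uint64_t widthInBits(const ToyType *t) {
        auto it = memo_.find(t);
        if (it != memo_.end()) return it->second;
        std::uint64_t w = widthImpl(t);
        memo_[t] = w;
        return w;
    }
    // Size in char units, which is what sizeof reports.
    std::uint64_t sizeInChars(const ToyType *t) {
        return widthInBits(t) / target_.charWidth;
    }
    std::size_t cachedEntries() const { return memo_.size(); }

private:
    std::uint64_t widthImpl(const ToyType *t) {
        switch (t->kind) {
        case Kind::Char:          return target_.charWidth;
        case Kind::Int:           return target_.intWidth;
        case Kind::Pointer:       return target_.pointerWidth;
        case Kind::ConstantArray: return widthInBits(t->element) * t->count;
        }
        return 0;
    }
    ToyTarget target_;
    std::map<const ToyType *, std::uint64_t> memo_;
};

// Models sizeof(char[10]) under the assumed target.
inline std::uint64_t toy_sizeof_char10() {
    ToyType charTy{Kind::Char, nullptr, 0};
    ToyType arrTy{Kind::ConstantArray, &charTy, 10};
    ToyContext ctx;
    return ctx.sizeInChars(&arrTy);
}
```

Under these assumed widths toy_sizeof_char10() gives 10 while a Pointer type gives 8, which mirrors the difference between an array of 10 chars and a pointer into it.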

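And with that everything is out in the open; no further explanation needed.

The type of "abcde" is actually an array of 6 chars (const char[6] in C++, char[6] in C); it's just that string literals have slightly odd conversion rules. An ordinary char[6] converts to char* for free, but "abcde" should only be converted to const char*. Once you see this, the rest is easy to understand.

Also, once sizeof has been compiled into a compile-time constant, the code inside it is simply dropped. So it's perfectly normal that the ++i in sizeof(++i) doesn't run; whatever you write in there never executes. The same goes for decltype.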
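A short sketch of the point about sizeof(++i) and decltype: the operand is only inspected for its type, so the increment never happens. The function name unevaluated_increment_demo is made up for this example; some compilers warn about the side effect in an unevaluated operand, which is exactly the behaviour being shown.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

inline int unevaluated_increment_demo() {
    int i = 10;
    std::size_t n = sizeof(++i);   // same as sizeof(int); i is untouched
    decltype(++i) ref = i;         // ++i is an lvalue of type int, so int&
    static_assert(std::is_same<decltype(++i), int &>::value,
                  "pre-increment yields an lvalue");
    assert(n == sizeof(int));
    (void)ref;
    return i;                      // still 10
}
```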

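If I'm not mistaken, a string literal used in an expression yields the address of its first character. So why is b 10 here? I don't understand.

sizeof operates on the size of a type.

The second output of i is still 10. Why wasn't "++i" executed? I don't understand.

sizeof is resolved at compile time. The whole sizeof expression is evaluated to that size during compilation; what's inside it doesn't exist at run time.

Turns out Baidu Baike had the answer all along. I distrusted Baidu Baike so much that I skipped right past it. My bad!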

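Like the sizeof operator, decltype does not evaluate its operand. Roughly speaking, decltype(e) deduces its result as follows:

If the expression e names a local variable, a namespace-scope variable, a static member variable, or a function parameter, the result is the "declared type" of that variable (or parameter);
If e is an lvalue (i.e. an "addressable value"), decltype(e) yields T&, where T is the type of e;
If e is an xvalue, the result is T&&;
If e is a prvalue, the result is T.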
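The four rules can be checked directly with static_assert. This is only a sketch; global_value and make_value are hypothetical names, and make_value is deliberately never defined because it is never called.

```cpp
#include <type_traits>
#include <utility>

int global_value = 0;
int make_value();   // declaration only: used solely inside decltype

// Rule 1: an unparenthesized variable name yields its declared type.
static_assert(std::is_same<decltype(global_value), int>::value, "");
// Rule 2: any other lvalue expression yields T&.
static_assert(std::is_same<decltype((global_value)), int &>::value, "");
// Rule 3: an xvalue yields T&&.
static_assert(std::is_same<decltype(std::move(global_value)), int &&>::value, "");
// Rule 4: a prvalue yields T.
static_assert(std::is_same<decltype(make_value()), int>::value, "");
static_assert(std::is_same<decltype(global_value + 1), int>::value, "");
```

The program links even though make_value has no definition, which is another sign that the operand is never evaluated.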


----------------------------------------------------------------------------------

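Or, to put it another way:

An expression inside sizeof or decltype is not executed (it is handled entirely at compile time); it only means "if this were executed, apply sizeof/decltype to the result".

"If it were executed":

A type is something that can be known without actually computing anything (only a value requires execution to obtain; the type of that value does not). In fact, C++ doesn't execute the expression at all.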

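Seeing "deep understanding of sizeof", all I can do is chuckle. Does sizeof really need depth? If anything deserves a deep dive it's the C/C++ type system; sizeof itself is about as short a topic as it gets.

sizeof is an operator that, at compile time, yields the size of a type, as in sizeof(int), or of the type of an expression, as in sizeof(++i)!

Key points: (1) it's an operator, not a function; (2) it's evaluated at compile time, so the result of sizeof is a constant; (3) it measures the size of a type, so the expression need not be evaluated!

sizeof(4) == sizeof(i) == sizeof(++i) == 4. Why 4? Because (on this particular compiler) int was defined as 4 bytes from the start, damn it!

Addendum: (4) In C/C++, identifiers must be declared before use, and typing is static. Together these mean that the type of any expression such as (*p++)[0]->a is known from the declarations and parsing alone; no result needs to be computed. Good grief!!

So, back to the questions in the post:

Question 2: why didn't i increase by 1 in sizeof(++i)? Because the expression was never executed! sizeof(++i), sizeof(i) and sizeof(int) are no different!

Question 1: why is sizeof("123456789") 10? Because the type of "123456789" is const char[10]! (Note the unassuming '\0' character after "123456789".) It is an array type of 10 chars, and sizeof(const char[10]) == sizeof(char)*10, hence 10! As for why the type of "123456789" is const char[10], that's not sizeof's doing; it's simply how the C/C++ type system defines it!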
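The claim about the literal's type can be verified at compile time. A minimal sketch; literal_length is a made-up helper name:

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// A string literal is an lvalue of array type; the array includes the '\0'.
static_assert(std::is_same<decltype("123456789"), const char (&)[10]>::value,
              "nine digits plus the terminating null character");
static_assert(sizeof("123456789") == 10, "sizeof counts the terminator");

// strlen runs at run time and stops at the terminator.
inline std::size_t literal_length() {
    return std::strlen("123456789");
}
```

literal_length() returns 9, one less than the sizeof result, because strlen does not count the terminator.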

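sizeof("123456789"+1) == 10 ???!!!! If the result is 10, the compiler must be at fault! The type of "123456789"+1 is const char*, so sizeof("123456789"+1) == sizeof(const char*) == sizeof(int*) on mainstream platforms == 4 or 8 !!! (10 is not what the standard prescribes, so it looks like a VC6 quirk. As an aside, sizeof(char) is always 1 by definition, although a byte need not be 8 bits.)

This is where C/C++ type conversions come in. Except when it is the operand of sizeof, as in sizeof(char[5]), and in a few similar contexts such as unary &, an array of type T[N] is implicitly converted to T*. So the type of "123456789"+1 is const char*!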
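On a conforming compiler the array-to-pointer conversion described above can be observed directly. A sketch, with length_after_offset and array_keeps_size_under_sizeof as hypothetical helper names:

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Adding 1 forces array-to-pointer conversion, so the operand is a pointer.
static_assert(std::is_same<decltype("123456789" + 1), const char *>::value, "");
static_assert(sizeof("123456789" + 1) == sizeof(const char *), "");

// At run time the pointer skips the first character.
inline std::size_t length_after_offset() {
    return std::strlen("123456789" + 1);
}

// As the direct operand of sizeof, an array does not decay.
inline bool array_keeps_size_under_sizeof() {
    char buf[5] = {};
    char *p = buf;                 // decays here
    return sizeof(buf) == 5 && sizeof(p) == sizeof(char *);
}
```

length_after_offset() returns 8, matching the value of d in the question.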

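C/C++ types and type conversions would take a whole essay, so I'll skip that here %>_<%

sizeof is implemented by the compiler; strlen is implemented by the standard library. You can write int a[sizeof(int)]; but you can't do that with strlen. Try it yourself.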

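1. b should equal sizeof(const char*) (the same value as sizeof(std::ptrdiff_t) on common platforms), which is usually 4 or 8;

2. Implementation aside, the C++ standard requires that expressions used as operands of sizeof, alignof, decltype, typeid, and noexcept are not evaluated (typeid is the exception when its operand is a glvalue of polymorphic class type);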
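A sketch of point 2 using a function with a visible side effect. The names call_count, bump and count_calls_in_unevaluated_operands are invented for this example; alignof is left out because standard C++ only accepts a type there.

```cpp
#include <cassert>
#include <cstddef>
#include <typeinfo>

int call_count = 0;                // hypothetical counter

inline int bump() {                // would change call_count if ever called
    ++call_count;
    return call_count;
}

inline int count_calls_in_unevaluated_operands() {
    std::size_t size = sizeof(bump());           // only the type int matters
    bool nothrow = noexcept(bump());             // noexcept operator
    decltype(bump()) copy = 0;                   // declares an int
    const std::type_info &info = typeid(bump()); // int is not polymorphic
    (void)size; (void)nothrow; (void)copy; (void)info;
    return call_count;                           // bump() never ran above
}
```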

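The noexcept I just mentioned is not the noexcept specification but the noexcept operator;

The purpose of sizeof, a compile-time operator, is to obtain the number of bytes occupied by a variable or type. It is evaluated entirely by the compiler, with no run-time logic. A string literal like "123456789" has type const char[10], because a C string needs a '\0' at the end to mark its termination. So a result of 10 for b is most likely a bug.

strlen is a C library function and is evaluated at run time; that should be easy to understand.

You're doing your own experiments with VC6, of all things?

I lost interest after the first line.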