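How is sizeof implemented in C++?

Environment: Win7 x86 + VC++ 6

I have read plenty of "understanding sizeof in depth" articles online, and I still don't get it.

int a, b, c, d;

a = sizeof("123456789"); // a is 10

b = sizeof("123456789"+1); // b is also 10

c = strlen("123456789"); // c is 9

d = strlen("123456789"+1); // d is 8

If I'm not mistaken, a string literal used in an expression yields the address of its first character. So why is b 10 here? I don't understand.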

------------------------------------------------------------------------------------------------------------------

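Consider the following snippet:

int i = 10;

cout << i << endl;

cout << sizeof(++i) << endl;

cout << i << endl;

The second output of i is still 10. Why was "++i" not executed? I don't understand.

--------------------------------------

Given these two points, I would like to know roughly how sizeof is implemented in C++, and why the two phenomena above occur.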


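Whatever is inside sizeof gets replaced directly by the compiler; even in the assembly you only see a constant. So the suggestion further down to read the disassembly won't help, because the replacement has already happened inside the compiler. (Strictly speaking, VLAs are a special case, which comes up in the code walkthrough below.) Let's look at how Clang handles sizeof.

In Clang's implementation, lib/AST/ExprConstant.cpp contains this method:

bool IntExprEvaluator::VisitUnaryExprOrTypeTraitExpr

It is implemented as follows: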

  switch(E->getKind()) {
  case UETT_AlignOf: {
    if (E->isArgumentType())
      return Success(GetAlignOfType(Info, E->getArgumentType()), E);
    else
      return Success(GetAlignOfExpr(Info, E->getArgumentExpr()), E);
  }

  case UETT_VecStep: {
    QualType Ty = E->getTypeOfArgument();

    if (Ty->isVectorType()) {
      unsigned n = Ty->castAs<VectorType>()->getNumElements();

      // The vec_step built-in functions that take a 3-component
      // vector return 4. (OpenCL 1.1 spec 6.11.12)
      if (n == 3)
        n = 4;

      return Success(n, E);
    } else
      return Success(1, E);
  }

  case UETT_SizeOf: {
    QualType SrcTy = E->getTypeOfArgument();
    // C++ [expr.sizeof]p2: "When applied to a reference or a reference type,
    //   the result is the size of the referenced type."
    if (const ReferenceType *Ref = SrcTy->getAs<ReferenceType>())
      SrcTy = Ref->getPointeeType();

    CharUnits Sizeof;
    if (!HandleSizeof(Info, E->getExprLoc(), SrcTy, Sizeof))
      return false;
    return Success(Sizeof, E);
  }
  }

  llvm_unreachable("unknown expr/type trait");
}
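The UETT_SizeOf branch above strips a reference type before asking for the size, which matches the rule quoted in its comment. As a minimal language-level sketch of that rule (the struct Wide and the function size_through_reference are names made up for illustration, not anything from Clang):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical struct, used only to have a type with a recognizable size.
struct Wide {
    char bytes[64];
};

// sizeof applied to a reference type yields the size of the referenced type,
// not the size of whatever the compiler uses internally to pass a reference.
static_assert(sizeof(Wide&) == sizeof(Wide), "the reference is looked through");

// The same holds for an expression whose declared type is a reference.
std::size_t size_through_reference(const Wide& w) {
    return sizeof(w);  // same as sizeof(Wide); w itself is never read
}
```

Calling size_through_reference on any Wide object returns sizeof(Wide), even though the parameter is typically passed as an address.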

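Following this method, we find that sizeof is really handled inside HandleSizeof, and that the result is stored in Sizeof, which is a CharUnits. CharUnits is one of Clang's internal representations; quoting Clang's own comment: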

/// CharUnits - This is an opaque type for sizes expressed in character units.
/// Instances of this type represent a quantity as a multiple of the size
/// of the standard C type, char, on the target architecture. As an opaque
/// type, CharUnits protects you from accidentally combining operations on
/// quantities in bit units and character units.
///
/// In both C and C++, an object of type "char", "signed char", or "unsigned
/// char" occupies exactly one byte, so "character unit" and "byte" refer to
/// the same quantity of storage. However, we use the term "character unit"
/// rather than "byte" to avoid an implication that a character unit is
/// exactly 8 bits.
///
/// For portability, never assume that a target character is 8 bits wide. Use
/// CharUnit values wherever you calculate sizes, offsets, or alignments
/// in character units.
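To make the idea in that comment concrete, here is a small sketch of an opaque "count of chars" type. It is not Clang's CharUnits class; the name CharCount and all of its members are assumptions chosen for illustration. The point is only that a private constructor plus named factory functions keep bit quantities and char quantities from being mixed by accident.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical miniature of the CharUnits idea; not Clang's actual class.
class CharCount {
public:
    // Build from a number of chars.
    static CharCount fromQuantity(std::int64_t n) { return CharCount(n); }

    // Build from a number of bits, which must be a whole number of chars.
    static CharCount fromBits(std::int64_t bits) {
        assert(bits % CHAR_BIT == 0 && "bit count must be a multiple of CHAR_BIT");
        return CharCount(bits / CHAR_BIT);
    }

    std::int64_t quantity() const { return n_; }        // size in chars
    std::int64_t bits() const { return n_ * CHAR_BIT; } // size in bits

    CharCount operator+(CharCount other) const { return CharCount(n_ + other.n_); }
    bool operator==(CharCount other) const { return n_ == other.n_; }

private:
    // Private, so a raw integer never silently becomes a CharCount.
    explicit CharCount(std::int64_t n) : n_(n) {}
    std::int64_t n_;
};
```

Using CHAR_BIT rather than a literal 8 follows the comment's advice not to assume 8-bit chars.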

Next, we look for the HandleSizeof method:

/// Get the size of the given type in char units.
static bool HandleSizeof(EvalInfo &Info, SourceLocation Loc,
                         QualType Type, CharUnits &Size) {
  // sizeof(void), __alignof__(void), sizeof(function) = 1 as a gcc
  // extension.
  if (Type->isVoidType() || Type->isFunctionType()) {
    Size = CharUnits::One();
    return true;
  }

  if (!Type->isConstantSizeType()) {
    // sizeof(vla) is not a constantexpr: C99 6.5.3.4p2.
    // FIXME: Better diagnostic.
    Info.Diag(Loc);
    return false;
  }

  Size = Info.Ctx.getTypeSizeInChars(Type);
  return true;
}

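At this point we know why the expression gets replaced. If the operand is void or a function type, the compiler substitutes the constant CharUnits::One() (the size of one char). That is why the assembly shows only a constant: assembly is produced later, by CodeGen, while this happens before CodeGen. The code also checks whether the type is a constant-size type, because the size has to be computed at compile time; the comment there is about VLAs, and interested readers can look up the C99 clause it cites. After that, the type is handed to getTypeSizeInChars.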
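The control flow of HandleSizeof can be imitated with a toy model. Everything below (Kind, ToyType, toySizeof) is hypothetical and only mirrors the shape of the function: two special cases that yield 1, one case that refuses to produce a constant, and a table lookup otherwise.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-ins for Clang's type queries; every name here is hypothetical.
enum class Kind { Void, Function, VariableArray, Sized };

struct ToyType {
    Kind kind;
    std::int64_t sizeInChars;  // only meaningful when kind == Kind::Sized
};

// Returns false when no compile-time constant exists; otherwise writes the
// size through the out-parameter, as HandleSizeof does with its Size argument.
bool toySizeof(const ToyType& type, std::int64_t& size) {
    if (type.kind == Kind::Void || type.kind == Kind::Function) {
        size = 1;  // modeled on the GCC extension mentioned in the comment
        return true;
    }
    if (type.kind == Kind::VariableArray) {
        return false;  // size is only known at run time
    }
    size = type.sizeInChars;
    return true;
}
```

The out-parameter plus bool return is the reason the Size parameter has to be a reference in the real function.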

OK, let's keep going step by step and see what getTypeSizeInChars does.

/// getTypeSizeInChars - Return the size of the specified type, in characters.
/// This method does not work on incomplete types.
CharUnits ASTContext::getTypeSizeInChars(QualType T) const {
  return getTypeInfoInChars(T).first;
}

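Even without going any further we can tell that this method returns the size of the given type, but let's get to the bottom of it and see how it is actually implemented. So we continue into getTypeInfoInChars().

std::pair<CharUnits, CharUnits>
ASTContext::getTypeInfoInChars(QualType T) const {
  return getTypeInfoInChars(T.getTypePtr());
}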

Now we also know where the .first comes from: the method returns a std::pair. It in turn calls getTypeInfoInChars again, this time with a Type pointer as the argument, so we look for that overload:

std::pair<CharUnits, CharUnits>
ASTContext::getTypeInfoInChars(const Type *T) const {
  if (const ConstantArrayType *CAT = dyn_cast<ConstantArrayType>(T))
    return getConstantArrayInfoInChars(*this, CAT);
  TypeInfo Info = getTypeInfo(T);
  return std::make_pair(toCharUnitsFromBits(Info.Width),
                        toCharUnitsFromBits(Info.Align));
}

Next we see that the work is done by getTypeInfo, so we find the corresponding code:

TypeInfo ASTContext::getTypeInfo(const Type *T) const {
  TypeInfoMap::iterator I = MemoizedTypeInfo.find(T);
  if (I != MemoizedTypeInfo.end())
    return I->second;

  // This call can invalidate MemoizedTypeInfo[T], so we need a second lookup.
  TypeInfo TI = getTypeInfoImpl(T);
  MemoizedTypeInfo[T] = TI;
  return TI;
}
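The lookup, compute, store pattern in getTypeInfo is ordinary memoization. A minimal sketch of the same pattern, using a std::map keyed by a type name instead of Clang's real data structures (ToyInfo, ToyContext and the sizes in compute are all assumptions for illustration):

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical result record: width and alignment in bits.
struct ToyInfo {
    unsigned widthBits;
    unsigned alignBits;
};

class ToyContext {
public:
    // Look in the cache first; compute and remember the answer on a miss.
    ToyInfo getInfo(const std::string& name) {
        auto it = memo_.find(name);
        if (it != memo_.end())
            return it->second;
        ToyInfo info = compute(name);
        memo_[name] = info;  // second lookup, as in the excerpt above
        return info;
    }

    int computeCalls() const { return calls_; }

private:
    // Stand-in for getTypeInfoImpl; the table here is made up.
    ToyInfo compute(const std::string& name) {
        ++calls_;
        if (name == "char")  return {8, 8};
        if (name == "int32") return {32, 32};
        return {0, 8};
    }

    std::map<std::string, ToyInfo> memo_;
    int calls_ = 0;
};
```

Asking for the same name twice runs compute only once, which is the whole purpose of the cache.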

So we have found this. We don't need to worry about MemoizedTypeInfo for now; what we are after is inside getTypeInfoImpl:

/// getTypeInfoImpl - Return the size of the specified type, in bits. This
/// method does not work on incomplete types.
///
/// FIXME: Pointers into different addr spaces could have different sizes and
/// alignment requirements: getPointerInfo should take an AddrSpace, this
/// should take a QualType, &c.
TypeInfo ASTContext::getTypeInfoImpl(const Type *T) const {
  uint64_t Width = 0;
  unsigned Align = 8;
  bool AlignIsRequired = false;
  switch (T->getTypeClass()) {
#define TYPE(Class, Base)
#define ABSTRACT_TYPE(Class, Base)
#define NON_CANONICAL_TYPE(Class, Base)
#define DEPENDENT_TYPE(Class, Base) case Type::Class:
#define NON_CANONICAL_UNLESS_DEPENDENT_TYPE(Class, Base)                       \
  case Type::Class:                                                            \
  assert(!T->isDependentType() && "should not see dependent types here");      \
  return getTypeInfo(cast<Class##Type>(T)->desugar().getTypePtr());
#include "clang/AST/TypeNodes.def"
    llvm_unreachable("Should not see dependent types");

  case Type::FunctionNoProto:
  case Type::FunctionProto:
    // GCC extension: alignof(function) = 32 bits
    Width = 0;
    Align = 32;
    break;

  case Type::IncompleteArray:
  case Type::VariableArray:
    Width = 0;
    Align = getTypeAlign(cast<ArrayType>(T)->getElementType());
    break;

  case Type::ConstantArray: {
    const ConstantArrayType *CAT = cast<ConstantArrayType>(T);

    TypeInfo EltInfo = getTypeInfo(CAT->getElementType());
    uint64_t Size = CAT->getSize().getZExtValue();
    assert((Size == 0 || EltInfo.Width <= (uint64_t)(-1) / Size) &&
           "Overflow in array type bit size evaluation");
    Width = EltInfo.Width * Size;
    Align = EltInfo.Align;
    if (!getTargetInfo().getCXXABI().isMicrosoft() ||
        getTargetInfo().getPointerWidth(0) == 64)
      Width = llvm::RoundUpToAlignment(Width, Align);
    break;
  }
  case Type::ExtVector:
  case Type::Vector: {
    const VectorType *VT = cast<VectorType>(T);
    TypeInfo EltInfo = getTypeInfo(VT->getElementType());
    Width = EltInfo.Width * VT->getNumElements();
    Align = Width;
    // If the alignment is not a power of 2, round up to the next power of 2.
    // This happens for non-power-of-2 length vectors.
    if (Align & (Align-1)) {
      Align = llvm::NextPowerOf2(Align);
      Width = llvm::RoundUpToAlignment(Width, Align);
    }
    // Adjust the alignment based on the target max.
    uint64_t TargetVectorAlign = Target->getMaxVectorAlign();
    if (TargetVectorAlign && TargetVectorAlign < Align)
      Align = TargetVectorAlign;
    break;
  }

  case Type::Builtin:
    switch (cast<BuiltinType>(T)->getKind()) {
    default: llvm_unreachable("Unknown builtin type!");
    case BuiltinType::Void:
      // GCC extension: alignof(void) = 8 bits.
      Width = 0;
      Align = 8;
      break;

    case BuiltinType::Bool:
      Width = Target->getBoolWidth();
      Align = Target->getBoolAlign();
      break;
    case BuiltinType::Char_S:
    case BuiltinType::Char_U:
    case BuiltinType::UChar:
    case BuiltinType::SChar:
      Width = Target->getCharWidth();
      Align = Target->getCharAlign();
      break;
    case BuiltinType::WChar_S:
    case BuiltinType::WChar_U:
      Width = Target->getWCharWidth();
      Align = Target->getWCharAlign();
      break;
    case BuiltinType::Char16:
      Width = Target->getChar16Width();
      Align = Target->getChar16Align();
      break;
    case BuiltinType::Char32:
      Width = Target->getChar32Width();
      Align = Target->getChar32Align();
      break;
    case BuiltinType::UShort:
    case BuiltinType::Short:
      Width = Target->getShortWidth();
      Align = Target->getShortAlign();
      break;
    case BuiltinType::UInt:
    case BuiltinType::Int:
      Width = Target->getIntWidth();
      Align = Target->getIntAlign();
      break;
    case BuiltinType::ULong:
    case BuiltinType::Long:
      Width = Target->getLongWidth();
      Align = Target->getLongAlign();
      break;
    case BuiltinType::ULongLong:
    case BuiltinType::LongLong:
      Width = Target->getLongLongWidth();
      Align = Target->getLongLongAlign();
      break;
    case BuiltinType::Int128:
    case BuiltinType::UInt128:
      Width = 128;
      Align = 128; // int128_t is 128-bit aligned on all targets.
      break;
    case BuiltinType::Half:
      Width = Target->getHalfWidth();
      Align = Target->getHalfAlign();
      break;
    case BuiltinType::Float:
      Width = Target->getFloatWidth();
      Align = Target->getFloatAlign();
      break;
    case BuiltinType::Double:
      Width = Target->getDoubleWidth();
      Align = Target->getDoubleAlign();
      break;
    case BuiltinType::LongDouble:
      Width = Target->getLongDoubleWidth();
      Align = Target->getLongDoubleAlign();
      break;
    case BuiltinType::NullPtr:
      Width = Target->getPointerWidth(0); // C++ 3.9.1p11: sizeof(nullptr_t)
      Align = Target->getPointerAlign(0); //   == sizeof(void*)
      break;
    case BuiltinType::ObjCId:
    case BuiltinType::ObjCClass:
    case BuiltinType::ObjCSel:
      Width = Target->getPointerWidth(0);
      Align = Target->getPointerAlign(0);
      break;
    case BuiltinType::OCLSampler:
      // Samplers are modeled as integers.
      Width = Target->getIntWidth();
      Align = Target->getIntAlign();
      break;
    case BuiltinType::OCLEvent:
    case BuiltinType::OCLImage1d:
    case BuiltinType::OCLImage1dArray:
    case BuiltinType::OCLImage1dBuffer:
    case BuiltinType::OCLImage2d:
    case BuiltinType::OCLImage2dArray:
    case BuiltinType::OCLImage3d:
      // Currently these types are pointers to opaque types.
      Width = Target->getPointerWidth(0);
      Align = Target->getPointerAlign(0);
      break;
    }
    break;
  case Type::ObjCObjectPointer:
    Width = Target->getPointerWidth(0);
    Align = Target->getPointerAlign(0);
    break;
  case Type::BlockPointer: {
    unsigned AS = getTargetAddressSpace(
        cast<BlockPointerType>(T)->getPointeeType());
    Width = Target->getPointerWidth(AS);
    Align = Target->getPointerAlign(AS);
    break;
  }
  case Type::LValueReference:
  case Type::RValueReference: {
    // alignof and sizeof should never enter this code path here, so we go
    // the pointer route.
    unsigned AS = getTargetAddressSpace(
        cast<ReferenceType>(T)->getPointeeType());
    Width = Target->getPointerWidth(AS);
    Align = Target->getPointerAlign(AS);
    break;
  }
  case Type::Pointer: {
    unsigned AS = getTargetAddressSpace(cast<PointerType>(T)->getPointeeType());
    Width = Target->getPointerWidth(AS);
    Align = Target->getPointerAlign(AS);
    break;
  }
  case Type::MemberPointer: {
    const MemberPointerType *MPT = cast<MemberPointerType>(T);
    std::tie(Width, Align) = ABI->getMemberPointerWidthAndAlign(MPT);
    break;
  }
  case Type::Complex: {
    // Complex types have the same alignment as their elements, but twice the
    // size.
    TypeInfo EltInfo = getTypeInfo(cast<ComplexType>(T)->getElementType());
    Width = EltInfo.Width * 2;
    Align = EltInfo.Align;
    break;
  }
  case Type::ObjCObject:
    return getTypeInfo(cast<ObjCObjectType>(T)->getBaseType().getTypePtr());
  case Type::Adjusted:
  case Type::Decayed:
    return getTypeInfo(cast<AdjustedType>(T)->getAdjustedType().getTypePtr());
  case Type::ObjCInterface: {
    const ObjCInterfaceType *ObjCI = cast<ObjCInterfaceType>(T);
    const ASTRecordLayout &Layout = getASTObjCInterfaceLayout(ObjCI->getDecl());
    Width = toBits(Layout.getSize());
    Align = toBits(Layout.getAlignment());
    break;
  }
  case Type::Record:
  case Type::Enum: {
    const TagType *TT = cast<TagType>(T);

    if (TT->getDecl()->isInvalidDecl()) {
      Width = 8;
      Align = 8;
      break;
    }

    if (const EnumType *ET = dyn_cast<EnumType>(TT)) {
      const EnumDecl *ED = ET->getDecl();
      TypeInfo Info =
          getTypeInfo(ED->getIntegerType()->getUnqualifiedDesugaredType());
      if (unsigned AttrAlign = ED->getMaxAlignment()) {
        Info.Align = AttrAlign;
        Info.AlignIsRequired = true;
      }
      return Info;
    }

    const RecordType *RT = cast<RecordType>(TT);
    const RecordDecl *RD = RT->getDecl();
    const ASTRecordLayout &Layout = getASTRecordLayout(RD);
    Width = toBits(Layout.getSize());
    Align = toBits(Layout.getAlignment());
    AlignIsRequired = RD->hasAttr<AlignedAttr>();
    break;
  }

  case Type::SubstTemplateTypeParm:
    return getTypeInfo(cast<SubstTemplateTypeParmType>(T)->
                       getReplacementType().getTypePtr());

  case Type::Auto: {
    const AutoType *A = cast<AutoType>(T);
    assert(!A->getDeducedType().isNull() &&
           "cannot request the size of an undeduced or dependent auto type");
    return getTypeInfo(A->getDeducedType().getTypePtr());
  }

  case Type::Paren:
    return getTypeInfo(cast<ParenType>(T)->getInnerType().getTypePtr());

  case Type::Typedef: {
    const TypedefNameDecl *Typedef = cast<TypedefType>(T)->getDecl();
    TypeInfo Info = getTypeInfo(Typedef->getUnderlyingType().getTypePtr());
    // If the typedef has an aligned attribute on it, it overrides any computed
    // alignment we have. This violates the GCC documentation (which says that
    // attribute(aligned) can only round up) but matches its implementation.
    if (unsigned AttrAlign = Typedef->getMaxAlignment()) {
      Align = AttrAlign;
      AlignIsRequired = true;
    } else {
      Align = Info.Align;
      AlignIsRequired = Info.AlignIsRequired;
    }
    Width = Info.Width;
    break;
  }

  case Type::Elaborated:
    return getTypeInfo(cast<ElaboratedType>(T)->getNamedType().getTypePtr());

  case Type::Attributed:
    return getTypeInfo(
        cast<AttributedType>(T)->getEquivalentType().getTypePtr());

  case Type::Atomic: {
    // Start with the base type information.
    TypeInfo Info = getTypeInfo(cast<AtomicType>(T)->getValueType());
    Width = Info.Width;
    Align = Info.Align;

    // If the size of the type doesn't exceed the platform's max
    // atomic promotion width, make the size and alignment more
    // favorable to atomic operations:
    if (Width != 0 && Width <= Target->getMaxAtomicPromoteWidth()) {
      // Round the size up to a power of 2.
      if (!llvm::isPowerOf2_64(Width))
        Width = llvm::NextPowerOf2(Width);

      // Set the alignment equal to the size.
      Align = static_cast<unsigned>(Width);
    }
  }
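The vector branch in the excerpt is a good example of the arithmetic involved: width is element width times element count, and when that is not a power of two both alignment and width are rounded up. Below is a standalone sketch of just that arithmetic. The helper names (nextPowerOfTwoAbove, roundUpTo, vectorLayout, VecLayout) are hypothetical replacements for the LLVM utilities, and the clamp to the target's maximum vector alignment is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Smallest power of two strictly greater than v.
// Assumes v is small enough that the result fits in 64 bits.
std::uint64_t nextPowerOfTwoAbove(std::uint64_t v) {
    std::uint64_t p = 1;
    while (p <= v)
        p <<= 1;
    return p;
}

// Round value up to the nearest multiple of align (align must be nonzero).
std::uint64_t roundUpTo(std::uint64_t value, std::uint64_t align) {
    assert(align != 0);
    return (value + align - 1) / align * align;
}

struct VecLayout {
    std::uint64_t widthBits;
    std::uint64_t alignBits;
};

// Width and alignment, in bits, of a vector of n elements of eltBits each.
VecLayout vectorLayout(std::uint64_t eltBits, unsigned n) {
    std::uint64_t width = eltBits * n;
    std::uint64_t align = width;
    if (align & (align - 1)) {  // nonzero exactly when align is not a power of two
        align = nextPowerOfTwoAbove(align);
        width = roundUpTo(width, align);
    }
    return {width, align};
}
```

For a 3-element vector of 32-bit elements this gives 128 bits for both width and alignment, so under these assumptions such a vector occupies as much storage as a 4-element one.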

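With that, everything is out in the open and no further explanation is needed :-)


The type of "abcde" is really an array type, char[6] (const char[6] in standard C++); it is just that, for a string literal, some of the conversions are a little odd. An ordinary char[6] converts to char* for free, whereas "abcde" should only convert to const char*. Once you see that, the rest is easy to understand.

Also, once sizeof has been turned into a compile-time constant, the code inside it is simply discarded. So it is perfectly normal that the ++i in sizeof(++i) is not executed; whatever you write in there is never executed. The same goes for decltype.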
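A minimal sketch of that behavior, assuming nothing beyond standard C++11 (the function name value_after_unevaluated_uses is made up for this example):

```cpp
#include <cassert>
#include <cstddef>

// Returns the value of i after it has appeared inside sizeof and decltype.
int value_after_unevaluated_uses() {
    int i = 10;
    std::size_t n = sizeof(++i);  // operand is never evaluated; n == sizeof(int)
    decltype(i++) j = 0;          // likewise: j is declared as int, i is untouched
    (void)n;
    (void)j;
    return i;                     // still 10
}
```

Compilers may warn that the operand has no effect, which is exactly the point: the increment only contributes its type.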


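If I'm not mistaken, a string literal used in an expression yields the address of its first character. So why is b 10 here? I don't understand.

sizeof is about the size of a type.

The second output of i is still 10. Why was "++i" not executed? I don't understand.

sizeof is something completed at compile time. The whole sizeof expression is evaluated to that size during compilation. Whatever is inside it does not exist at run time.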


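It turns out Baidu Baike already has the answer. I distrusted Baidu Baike so much that I skipped right past it. My bad!

"

Like the sizeof operator, decltype does not evaluate its operand. Roughly speaking, decltype(e) determines its result type as follows:

  1. If the expression e names a local variable, a namespace-scope variable, a static member variable, or a function parameter, the result is the declared type of that variable (or parameter);

  2. If e is an lvalue (that is, an addressable value), decltype(e) yields T&, where T is the type of e;

  3. If e is an xvalue, the result is T&&;

  4. If e is a prvalue, the result is T.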

----------------------------------------------------------------------------------
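The four decltype rules quoted above can be checked at compile time. The sketch below uses made-up names (global, asXvalue, asPrvalue); the two functions are only declared, never defined, which is fine because decltype does not evaluate or call anything.

```cpp
#include <type_traits>

int global = 0;
int&& asXvalue();   // declaration only; a call expression of this is an xvalue
int asPrvalue();    // declaration only; a call expression of this is a prvalue

// Rule 1: a plain variable name gives the declared type.
static_assert(std::is_same<decltype(global), int>::value, "declared type");
// Rule 2: a parenthesized name is an ordinary lvalue expression, giving T&.
static_assert(std::is_same<decltype((global)), int&>::value, "lvalue gives T&");
// Rule 3: an xvalue gives T&&.
static_assert(std::is_same<decltype(asXvalue()), int&&>::value, "xvalue gives T&&");
// Rule 4: a prvalue gives T.
static_assert(std::is_same<decltype(asPrvalue()), int>::value, "prvalue gives T");
```

If any rule were different, the translation unit would fail to compile, so no run-time check is needed.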

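Or, to put it another way:

An expression placed inside sizeof or decltype is not executed (it is handled entirely at compile time); it only means "if this were executed, apply sizeof/decltype to what it would return".

"If it were executed":

A type is something that can be known without any actual computation (only a value needs execution to be obtained; the type of that value does not). In fact, C++ performs no actual execution here at all.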


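Whenever I see "understanding sizeof in depth", all I can do is chuckle. Does sizeof need depth? If anything deserves depth it is the C/C++ type system; sizeof itself is about as short as it gets.

sizeof is an operator that, at compile time, yields the size of a type, as in sizeof(int), or of the type of an expression, as in sizeof(++i)!

Key points: (1) it is an operator, not a function; (2) it is evaluated at compile time, so the result of sizeof is a constant; (3) what is measured is the size of a type, so the expression never needs to be evaluated!

sizeof(4) == sizeof(i) == sizeof(++i) == 4. Why 4? Because (for a given compiler) int was defined to be 4 bytes from the very start, damn it!

Addendum: (4) in C/C++, identifiers must be declared before use, and typing is static. Together these mean that the type of any expression, say (*p++)[0]->a, is known from declarations and parsing alone; there is no need to compute a result first, good grief!!

So, back to the questions:

Question 2: why doesn't i increase by 1 in sizeof(++i)? The expression is never executed! sizeof(++i), sizeof(i) and sizeof(int) are all the same thing!

Question 1: why is sizeof("123456789") equal to 10? Because the type of "123456789" is const char[10]! (Note the unsung "\0" character at the end of "123456789".) That is an array type of 10 characters, and sizeof(const char[10]) == sizeof(char)*10, hence 10! As for why the type of "123456789" is const char[10], that is not sizeof's business; the C/C++ type system simply says so!

sizeof("123456789"+1) == 10 ???!!!! If the result really is 10, I can only say the compiler is at fault! The type of "123456789"+1 is const char*, so sizeof("123456789"+1) == sizeof(const char*), which on common platforms equals sizeof(int*), that is 4 or 8 !!! (A result of 10 is decidedly odd. For the record, sizeof(char) is always 1 in C and C++, although a byte need not be 8 bits.)

Here I have to mention type conversion in C/C++. Except when it is the operand of sizeof, as in sizeof(char[5]), or of the address-of operator &, an array type T[N] is implicitly converted to T* everywhere else. So the type of "123456789"+1 is const char*!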
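The array-versus-pointer distinction described above can be verified directly. This is a small sketch in standard C++11; tail_length is a name invented here, and the expected values follow from the literal being const char[10].

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// The literal itself is an array of 10 chars: 9 digits plus the terminator.
static_assert(sizeof("123456789") == 10, "array of 10 chars");

// Adding 1 forces array-to-pointer conversion, so sizeof sees a pointer.
static_assert(sizeof("123456789" + 1) == sizeof(const char*), "pointer size");
static_assert(std::is_same<decltype("123456789" + 1), const char*>::value,
              "the sum has type const char*");

// strlen, by contrast, walks the characters at run time until the terminator.
std::size_t tail_length() {
    return std::strlen("123456789" + 1);  // counts "23456789"
}
```

On a conforming compiler the second assertion holds, so a result of 10 for b points at the old compiler rather than at the language.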

I'll spare you the essay-length discussion of C/C++ types and type conversions here, %>_<%


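sizeof is implemented by the compiler; strlen is implemented by the standard library. You can write int a[sizeof(int)]; but you cannot do the same with strlen. Try it yourself.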
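A short sketch of that experiment, in standard C++ (the function names elements_in_a and runtime_length are invented for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Fine: sizeof(int) is a constant expression, so it can be an array bound.
int a[sizeof(int)];

// Not valid standard C++: std::strlen is an ordinary function called at run
// time, so its result is not a constant expression.
// int b[std::strlen("abc")];

// Number of elements in a, itself computed entirely at compile time.
std::size_t elements_in_a() {
    return sizeof(a) / sizeof(a[0]);
}

// strlen has to inspect the bytes it is given when the program runs.
std::size_t runtime_length(const char* s) {
    return std::strlen(s);
}
```

Some compilers accept the commented-out line as an extension or fold the call internally, so treat that comment as a statement about the standard, not about every compiler.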


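1. b should equal sizeof(const char*), which on common platforms is the same as sizeof(std::ptrdiff_t); this value is usually 4 or 8;

2. Leaving the implementation aside, the C++ standard requires that expressions used as operands of sizeof, alignof, decltype, typeid and noexcept are not evaluated (typeid is the exception when its operand is a glvalue of polymorphic class type);

The noexcept I just mentioned is the noexcept operator, not the noexcept specification;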
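The noexcept operator makes the "operand is not evaluated" rule easy to observe, because the operand can be a function call with a visible side effect. A minimal sketch (calls, bump, quiet and probe are names made up for this example):

```cpp
#include <cassert>

int calls = 0;  // counts how many times either function actually runs

int bump() { ++calls; return calls; }            // could throw, as far as the type says
int quiet() noexcept { ++calls; return calls; }  // promises not to throw

// Applies the noexcept operator to both calls and reports whether the
// results are as expected and neither function was executed.
bool probe() {
    bool bumpIsNoexcept = noexcept(bump());    // false; bump() is not called
    bool quietIsNoexcept = noexcept(quiet());  // true; quiet() is not called
    return !bumpIsNoexcept && quietIsNoexcept && calls == 0;
}
```

The operator only inspects the declared exception specification of the expression, which is why the counter stays at zero.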


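sizeof is a compile-time operator (not a function) whose purpose is to give the number of bytes occupied by a variable or a type. It is evaluated by the compiler and has no run-time logic. A string literal such as "123456789" has type char[10] (const char[10] in C++), because a C string needs a trailing '\0' to mark its end. So b coming out as 10 is most likely a bug.

strlen is a C library function and is evaluated at run time. That part should be easy to understand.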


Doing your own experiments, and with VC6 of all things?

I didn't want to read past the first line.

