What is the apparently unused stack memory allocated in the assembly code for, and why is it generated?

01-08

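What are lines 4 and 15 for, and why does the compiler generate these instructions?

Also, why doesn't it use the caller-saved registers ecx and edx? Wouldn't that avoid several push/pop operations on the stack? Is it because the first line declares the two variables up front?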

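Ah, so this is the fib_rec example from Chapter 3 of CS:APP1e. It has been so long that I no longer remember how this edition describes it...

This edition of CS:APP used GCC 2.95.3 with gcc -O2 to compile the C code whose assembly it shows.

If you pass -mpreferred-stack-boundary=2 (meaning the stack frame is aligned to 2^2 == 4 bytes; the default is equivalent to setting the option to 4, that is, 2^4 == 16-byte alignment), this example most likely would not have the "extra stack space" issue. I don't have an environment at hand to test the behavior of GCC 2.95.3 on Linux/i386 directly, but on GCC 4.4.7 the option visibly has an effect.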

@aoumeior's answer reminded me that page 226 of the original CS:APP2e actually explains this:

Aside Why does gcc allocate space that never gets used?
We see that the code generated by gcc for caller allocates 24 bytes on the stack even though it only makes use of 16 of them. We will see many examples of this apparent wastefulness. gcc adheres to an x86 programming guideline that the total stack space used by the function should be a multiple of 16 bytes. Including the 4 bytes for the saved value of %ebp and the 4 bytes for the return address, caller uses a total of 32 bytes. The motivation for this convention is to ensure a proper alignment for accessing data. We will explain the reason for having alignment conventions and how they are implemented in Section 3.9.3.

And then on page 249:

Aside A case of mandatory alignment
For most IA32 instructions, keeping data aligned improves efficiency, but it does not affect program behavior. On the other hand, some of the SSE instructions for implementing multimedia operations will not work correctly with unaligned data. These instructions operate on 16-byte blocks of data, and the instructions that transfer data between the SSE unit and memory require the memory addresses to be multiples of 16. Any attempt to access memory with an address that does not satisfy this alignment will lead to an exception, with the default behavior for the program to terminate.
This is the motivation behind the IA32 convention of making sure that every stack frame is a multiple of 16 bytes long (see the aside of page 226). The compiler can allocate storage within a stack frame in such a way that a block can be stored with a 16-byte alignment.
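The arithmetic in the first aside can be checked with a few lines of C. This is only a sketch of the bookkeeping described in the quote; the function and constant names are made up for illustration and do not come from the book or from gcc.

```c
#include <assert.h>

enum {
    SAVED_EBP_BYTES = 4,       /* saved value of %ebp */
    RETURN_ADDRESS_BYTES = 4   /* return address pushed by call */
};

/* Total stack space used by a function: the bytes it allocates itself
   plus the saved %ebp and the return address (hypothetical helper). */
static unsigned total_stack_bytes(unsigned allocated)
{
    assert(allocated % 4 == 0);  /* IA32 stack slots are 4 bytes wide */
    return allocated + SAVED_EBP_BYTES + RETURN_ADDRESS_BYTES;
}

/* Nonzero if the total satisfies the 16-byte guideline from the aside. */
static int is_multiple_of_16(unsigned bytes)
{
    return bytes % 16 == 0;
}
```

With the 24 bytes gcc allocates for caller, total_stack_bytes(24) is 32, which is a multiple of 16. Allocating only the 16 bytes that are actually used would give a total of 24, which is not.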

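To demonstrate on GCC 4.4.7: without the option, sub esp, 24 shows that this function's stack frame is 24 bytes, which comprises the save area for the callee-saved registers (4 bytes * 2 == 8 bytes), the space for passing an argument (outgoing argument, 4 bytes * 1 == 4 bytes), and extra padding (4 bytes * 3 == 12 bytes):

The stack frame layout at this point is: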

esp -> [ outgoing argument 0 ]  esp+0 / ebp-24
       [ padding 2           ]  ebp-20
       [ padding 1           ]  ebp-16
       [ padding 0           ]  ebp-12
       [ callee saved ebx    ]  ebp-8
       [ callee saved esi    ]  ebp-4
ebp -> [ old frame pointer   ]  ebp+0
       [ return address      ]  ebp+4
       [ incoming argument 0 ]  ebp+8

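This is still not exactly the stack frame layout of the example in CS:APP1e, but it is similar in the sense that padding is present. Note that on GCC 4.4.7 this example has to be compiled with -O1 for the generated code to resemble what GCC 2.95.3 produces with -O2... orz

With -mpreferred-stack-boundary=2, sub esp, 12 shows that all of the padding present before adding the option is gone. That matches exactly the 12 bytes of padding allocated on lines 10 and 15 of the asker's example:

The stack frame layout is now: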

esp -> [ outgoing argument 0 ]  esp+0 / ebp-12
       [ callee saved ebx    ]  ebp-8
       [ callee saved esi    ]  ebp-4
ebp -> [ old frame pointer   ]  ebp+0
       [ return address      ]  ebp+4
       [ incoming argument 0 ]  ebp+8
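The two frame sizes above (24 and 12 bytes) are consistent with a simple model: take the bytes the function really needs, add the saved ebp and the return address, round that total up to the preferred boundary, and subtract the two fixed slots again. The sketch below is that model only. It is an assumption that happens to reproduce the numbers in this answer, not GCC's actual frame-size algorithm, and round_up and frame_bytes are hypothetical names.

```c
#include <assert.h>

/* Round n up to the next multiple of align (align must be a power of two). */
static unsigned round_up(unsigned n, unsigned align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Hypothetical model, NOT GCC's code: size of the sub esp, N allocation
   for a function that needs `needed` bytes, given a preferred stack
   boundary of `boundary` bytes. */
static unsigned frame_bytes(unsigned needed, unsigned boundary)
{
    const unsigned fixed = 4 + 4;  /* saved ebp + return address */
    return round_up(needed + fixed, boundary) - fixed;
}
```

Here the function needs 8 bytes for ebx and esi plus 4 bytes for the outgoing argument, so needed is 12. With a 16-byte boundary frame_bytes(12, 16) is 24 (12 bytes of padding), and with a 4-byte boundary frame_bytes(12, 4) is 12 (no padding).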

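I'll come back and update this once I get a chance to verify the actual behavior of GCC 2.95.3 in a suitable environment. The GCC 2.95 series is an old version with fairly weak optimization... I'm not sure what exactly the sub esp, 16 at the start is, or whether it can be controlled via the preferred-stack-boundary option.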

=====================================

In the GCC 2.95.3 source, a small piece of the relevant handling is in:

gcc/config/i386/i386.c

    /* Validate -mpreferred_stack_boundary= value, or provide default.
       The default of 128 bits is for Pentium III's SSE __m128. */
    i386_preferred_stack_boundary = 128;
    if (i386_preferred_stack_boundary_string)
      {
        i = atoi (i386_preferred_stack_boundary_string);
        if (i < 2 || i > 31)
          fatal ("-mpreferred_stack_boundary=%d is not between 2 and 31", i);
        i386_preferred_stack_boundary = (1 << i) * BITS_PER_UNIT;
      }

Then ix86_compute_frame_size() in the same file uses this parameter to compute the stack frame size.

For the meaning of this option, see the documentation:

i386 and x86-64 Options

-mpreferred-stack-boundary=num
Attempt to keep the stack boundary aligned to a 2 raised to num byte boundary. If -mpreferred-stack-boundary is not specified, the default is 4 (16 bytes or 128 bits).
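To make the relationship between the option value, bytes, and bits concrete, here is a standalone sketch modeled on the i386.c excerpt quoted earlier. It is not the GCC code itself: the function name is made up, BITS_PER_UNIT is assumed to be 8, and an out-of-range value returns 0 where the excerpt calls fatal().

```c
#include <assert.h>
#include <stdlib.h>

enum { BITS_PER_UNIT = 8 };  /* assumption: 8-bit bytes, as on i386 */

/* Preferred stack boundary in bits for a given option string.
   option == NULL means the option was not given on the command line.
   Returns 0 for an out-of-range value (the excerpt calls fatal() there). */
static unsigned long long preferred_stack_boundary_bits(const char *option)
{
    int i;

    if (option == NULL)
        return 128;              /* default: 16 bytes == 128 bits */
    i = atoi(option);
    if (i < 2 || i > 31)
        return 0;
    assert(i >= 2 && i <= 31);   /* the shift below is well-defined here */
    return (1ULL << i) * BITS_PER_UNIT;
}
```

Passing "2" gives 32 bits (4 bytes), and passing "4" gives 128 bits (16 bytes), the same as the default.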

That's all.

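I'd like to point out that the 16-byte stack alignment described by RednaxelaFX above is not necessarily the real cause. At the very least, the same theory cannot explain other cases. @RednaxelaFX

Let's first assume that i386 uses 16-byte alignment and reason further from there.

First, "stack alignment" is a concept that many people never state clearly: if 16-byte alignment is required, does that (on i386) mean that esp is a multiple of 16 at function entry, or that esp must be a multiple of 16 after the push ebp inside the function? Since a function on i386 always does a push ebp at entry, I was thoroughly confused by these descriptions back in the day (the business background is somewhat confidential, so I have removed that sentence).

Here I'll use actual disassembly to explain.

Code: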

    int z = 0;

    __attribute__((noinline)) void func(long long x, int y) { z = 2; }

    int main() {
        func(0, 0);
        return 0;
    }

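First compile two builds, a1 and a2, and show the gdb and gcc versions.

The two builds differ only in the optimization option (one with -O2, one without), so they should undoubtedly follow the same stack alignment rule. Moreover, if the rule is "esp is a multiple of 16 at function entry", the esp value printed at function entry should end in the hexadecimal digit 0; if it is "esp must be a multiple of 16 after push ebp", the value should end in 4. Either way, a1 and a2 should show the same case (surely a binary compiled with -O2 can still be linked with one compiled with -O0?).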
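The two hypotheses can be written down as predicates on the esp value seen at function entry. This is a minimal sketch with made-up function names; the addresses in the note after it are hypothetical examples, not values from the gdb sessions.

```c
#include <assert.h>
#include <stdint.h>

/* esp modulo 16, which is the last hexadecimal digit of the address. */
static unsigned esp_mod_16(uintptr_t esp)
{
    return (unsigned)(esp & 0xF);
}

/* Hypothesis 1: esp is a multiple of 16 at function entry. */
static int aligned_at_entry(uintptr_t esp_at_entry)
{
    return esp_mod_16(esp_at_entry) == 0;
}

/* Hypothesis 2: esp is a multiple of 16 after push ebp,
   which moves esp down by 4 bytes. */
static int aligned_after_push_ebp(uintptr_t esp_at_entry)
{
    assert(esp_at_entry >= 4);
    return esp_mod_16(esp_at_entry - 4) == 0;
}
```

An entry value such as 0xffffd4a0 satisfies the first hypothesis and 0xffffd4a4 satisfies the second, while a value ending in 8 satisfies neither.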

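The result:

One ends in 4 and the other in 8? So... is it not 16-byte aligned after all? Wait, could this be a GCC optimization caused by the caller and the callee living in the same file (GCC sees that the function is only called within its own file and so ignores the alignment rule)? Or did gcc apply a leaf-function optimization?

Splitting the code into t.c and t1.c and adding a printf call gives results that further support the claim that gcc on i386 does not align to 16 bytes by default.

4, 8, and c all show up. Without a doubt, the function call stack under Linux gcc on i386 is 4-byte aligned (there is no need to worry about whether the alignment applies before or after push ebp; with 4-byte alignment the question does not arise). This follows the System V i386 ABI; see the page on sco.com.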

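Apple's i386 is an exception and uses 16-byte alignment. My guess is that the main reasons are:

1. Apple moved to x86 late, so it had no need to inherit legacy baggage.

2. To make sizeof(long double) == 16 convenient.

This is also discussed in IA-32 Function Calling Conventions.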

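So, as RednaxelaFX above asks, how can SSE be supported without 16-byte alignment?

That one is easy, and anyone who often disassembles main will be even more familiar with it (because i386 executables used to be launchable on other systems, the stack at the entry to main was at risk of being unaligned):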

and $0xfffffff0,%esp

Just perform the alignment manually.
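The effect of that and instruction is to clear the low four bits of esp, which rounds it down to a multiple of 16. Because the stack grows downward, rounding down only ever reserves a few extra bytes. A minimal C sketch of the same mask, using a made-up function name and a plain 32-bit integer in place of the register:

```c
#include <assert.h>
#include <stdint.h>

/* Same mask as and $0xfffffff0,%esp: clear the low 4 bits so the
   result is the nearest multiple of 16 at or below the input. */
static uint32_t align_down_16(uint32_t esp)
{
    uint32_t aligned = esp & 0xfffffff0u;
    assert(aligned % 16 == 0 && aligned <= esp);
    return aligned;
}
```

For example, a hypothetical esp of 0xffffd4a8 becomes 0xffffd4a0, and a value that is already aligned is left unchanged.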

--------------------------------------------------------------------------------------------------------------------------------------

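My apologies to the asker: my answer is a response to RednaxelaFX's answer (I originally meant to comment on it, but I decided to post some screenshots as evidence). As for your question, I don't have your source code at hand, so I can't draw any reliable conclusions (I'm not going to explain your case with nothing to back it up), and I'm very sorry that I can't answer it.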

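In unoptimized 32-bit code, temporaries all live on the stack; registers are used for temporaries only after the code is optimized. With few temporaries the optimizer can keep them in registers, but with many it has to use stack space. Try it with an array, int a[100];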
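A small function to try this with might look like the sketch below; the function name and its body are invented for illustration. The 100-element array is 400 bytes on i386, far more than the general-purpose registers can hold, though an optimizer may still remove the array entirely if it can compute the result another way.

```c
#include <assert.h>

/* Fill the first n slots of a local array with squares and sum them.
   The array is the "many temporaries" case: it normally needs stack space. */
static int sum_of_squares(int n)
{
    int a[100];
    int sum = 0;

    assert(n >= 0 && n <= 100);
    for (int i = 0; i < n; i++)
        a[i] = i * i;
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```

Comparing the assembly from gcc -m32 -O0 -S with that from gcc -m32 -O2 -S should show how the stack allocation changes with the optimization level.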

Because every stack frame has an agreed alignment rule... I finished reading this book not long ago...