How do GCC's -O1, -O2, and -O3 optimizations work?

01-11

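While doing OI (competitive programming), I noticed that once -O2 is enabled for C++, I can use the STL freely without exceeding the time limit. I'm very curious how this is achieved.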

I'll just chime in briefly. Whoever expands on this topic in detail later will have done a great service.

@vczh covered the C++ language and STL side. That only covers what gets optimized, not how it is optimized.

@空明流轉 said that compiler optimization is an enormous topic. It is enormous, but there is always a place to start.

If the OP just wants a rough idea of the terminology without going too deep, GCC's documentation is a good starting point: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

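That page explains which specific optimizations each -O level enables. Each individual optimization flag has the form "-f + optimization name"; for example, -fdce corresponds to DCE, i.e. dead code elimination.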
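As a rough, hand-written illustration of what DCE removes (not actual GCC output; the function names are hypothetical): in the first function, `unused` is computed but never affects the result and the `if (false)` branch can never run, so a compiler performing dead code elimination may delete both without changing what the function returns.

```cpp
#include <cassert>  // assert, used when checking that both versions agree

// Before DCE: `unused` is computed but never contributes to the result,
// and the `if (false)` branch is unreachable.
int before_dce(int x) {
    int unused = x * 42 + 7;  // dead computation
    (void)unused;             // the cast discards the value; it still has no effect
    int result = x + 1;
    if (false) {              // unreachable, also removable
        result = 0;
    }
    return result;
}

// What the function is equivalent to once the dead code is gone (written by hand).
int after_dce(int x) {
    return x + 1;
}
```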

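Once you know an optimization's name, you can search for material on it. The only catch is that GCC's names may differ from the textbook terms; for example, what GCC calls -fmove-loop-invariants is more often called loop-invariant code motion (LICM) elsewhere.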
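A minimal source-level sketch of what loop-invariant code motion does (written by hand, not GCC output; the function names are hypothetical): `scale * offset` does not change inside the loop, so it can be computed once before the loop instead of on every iteration.

```cpp
#include <cassert>  // assert, used when checking that both versions agree
#include <cstddef>
#include <vector>

// Before LICM: the invariant product `scale * offset` sits inside the loop body.
void add_invariant_before(std::vector<int>& v, int scale, int offset) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] = v[i] + scale * offset;
    }
}

// After LICM (done by hand): the invariant is hoisted and computed once.
void add_invariant_after(std::vector<int>& v, int scale, int offset) {
    const int delta = scale * offset;
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] = v[i] + delta;
    }
}
```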

At a higher level, the documentation also describes GCC's overall pass pipeline: https://gcc.gnu.org/onlinedocs/gccint/Passes.html

The following is excerpted from Using the GNU Compiler Collection (GCC):

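In general, if no optimization flag is given, gcc produces debuggable code in which statements are independent: you can set a breakpoint between statements, inspect a variable with gdb's p command, change a variable's value, and so on. Its goal is also to compile as fast as possible.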

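When optimization flags are enabled, gcc tries to restructure the program (while guaranteeing that the transformed program is semantically equivalent to the source) to meet goals such as the smallest code size or the fastest execution. These two goals usually conflict, so you generally cannot have both.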

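The set of optimizations a given flag enables differs across gcc configurations and target platforms; use -Q --help=optimizers to list which optimization options each level turns on.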

Each -f flag listed below is explained at the link above.

1. -O, -O1:

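These two are equivalent. Their aim is to reduce both code size and execution time using optimizations that do not noticeably slow down compilation. They enable the following optimization options: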

-fauto-inc-dec -fbranch-count-reg -fcombine-stack-adjustments -fcompare-elim -fcprop-registers -fdce -fdefer-pop -fdelayed-branch -fdse -fforward-propagate -fguess-branch-probability -fif-conversion2 -fif-conversion -finline-functions-called-once -fipa-pure-const -fipa-profile -fipa-reference -fmerge-constants -fmove-loop-invariants -freorder-blocks -fshrink-wrap -fshrink-wrap-separate -fsplit-wide-types -fssa-backprop -fssa-phiopt -fstore-merging -ftree-bit-ccp -ftree-ccp -ftree-ch -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre -ftree-phiprop -ftree-sink -ftree-slsr -ftree-sra -ftree-pta -ftree-ter -funit-at-a-time
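Among the -O1 flags, -fif-conversion and -fif-conversion2 try to turn short conditional branches into branch-free code (for example conditional-move instructions on x86). The idea can be sketched at the source level as below; this is a hand-written illustration with hypothetical function names, and what GCC actually emits depends on the target.

```cpp
#include <cassert>  // assert, used when checking that both versions agree

// Branchy version: typically compiled to a conditional jump without optimization.
int max_branchy(int a, int b) {
    if (a > b) {
        return a;
    }
    return b;
}

// Branch-free equivalent: select with a bit mask instead of a jump,
// mirroring what a conditional-move instruction does in hardware.
int max_branchless(int a, int b) {
    int take_a = -(a > b);  // all bits set if a > b, otherwise 0
    return (a & take_a) | (b & ~take_a);
}
```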

2. -O2

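This level sacrifices some compilation speed. In addition to everything -O1 does, it performs nearly all supported optimizations that do not involve a space-speed tradeoff, in order to make the generated code run faster: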

-fthread-jumps -falign-functions -falign-jumps -falign-loops -falign-labels -fcaller-saves -fcrossjumping -fcse-follow-jumps -fcse-skip-blocks -fdelete-null-pointer-checks -fdevirtualize -fdevirtualize-speculatively -fexpensive-optimizations -fgcse -fgcse-lm -fhoist-adjacent-loads -finline-small-functions -findirect-inlining -fipa-cp -fipa-cp-alignment -fipa-bit-cp -fipa-sra -fipa-icf -fisolate-erroneous-paths-dereference -flra-remat -foptimize-sibling-calls -foptimize-strlen -fpartial-inlining -fpeephole2 -freorder-blocks-algorithm=stc -freorder-blocks-and-partition -freorder-functions -frerun-cse-after-loop -fsched-interblock -fsched-spec -fschedule-insns -fschedule-insns2 -fstrict-aliasing -fstrict-overflow -ftree-builtin-call-dce -ftree-switch-conversion -ftree-tail-merge -fcode-hoisting -ftree-pre -ftree-vrp -fipa-ra
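-fstrict-aliasing in the list above lets GCC assume that pointers to different types never refer to the same object. A hedged sketch of what this means in practice: reading a float's bits through a `reinterpret_cast`ed integer pointer violates that assumption (undefined behavior) and can be miscompiled at -O2, whereas copying the bytes with `std::memcpy` is well-defined. The function name `float_bits` is hypothetical, and the checks assume IEEE-754 single-precision floats.

```cpp
#include <cassert>  // assert, used in the checks
#include <cstdint>
#include <cstring>

// Returns the raw bit pattern of a float without violating strict aliasing.
// The UB alternative would be: *reinterpret_cast<std::uint32_t*>(&f)
// At -O2 the memcpy is usually reduced to a single register move.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

In C++20, `std::bit_cast` from `<bit>` expresses the same intent more directly.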

3. -O3

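In addition to all -O2 optimizations, this level mainly adds vectorization and other loop transformations that increase the parallelism of the generated code, to make better use of modern CPU features such as pipelines and caches: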

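-finline-functions -funswitch-loops -fpredictive-commoning -fgcse-after-reload -ftree-loop-vectorize -ftree-loop-distribute-patterns -fsplit-paths -ftree-slp-vectorize -fvect-cost-model -ftree-partial-pre -fpeel-loops -fipa-cp-clone

Among these, -finline-functions inlines functions based on heuristics, -funswitch-loops performs loop unswitching (moving branches with loop-invariant conditions out of the loop), -fgcse-after-reload runs a redundant-load elimination pass after reload (register allocation), and -ftree-loop-vectorize performs loop vectorization.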

This level increases the size of the generated code, usually in exchange for shorter execution time.
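Loop unswitching (-funswitch-loops) also shows why -O3 tends to grow code. A hand-written, source-level sketch with hypothetical function names: the loop-invariant test on `negate` is moved out of the loop and the loop is duplicated, one copy per branch. Each copy is simpler and easier to vectorize, but there are now two loops.

```cpp
#include <cassert>  // assert, used when checking that both versions agree
#include <cstddef>
#include <vector>

// Before unswitching: the invariant flag is tested on every iteration.
long sum_before(const std::vector<int>& v, bool negate) {
    long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (negate) {
            total -= v[i];
        } else {
            total += v[i];
        }
    }
    return total;
}

// After unswitching (by hand): one test outside, one specialized loop per branch.
long sum_after(const std::vector<int>& v, bool negate) {
    long total = 0;
    if (negate) {
        for (std::size_t i = 0; i < v.size(); ++i) total -= v[i];
    } else {
        for (std::size_t i = 0; i < v.size(); ++i) total += v[i];
    }
    return total;
}
```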

4. -Os

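This flag is similar in spirit to -O3, but the two have different goals: -O3 will gladly increase code size to squeeze out as much speed as possible, whereas -Os starts from -O2 and tries to make the generated code as small as possible, which matters a great deal for devices with very little storage.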

To reduce code size, it disables the following optimizations, most of which would otherwise insert alignment padding:

-falign-functions -falign-jumps -falign-loops -falign-labels -freorder-blocks -freorder-blocks-algorithm=stc -freorder-blocks-and-partition -fprefetch-loop-arrays

5. -Ofast:

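This level no longer strictly follows the language standard. Besides enabling all -O3 optimizations, it also enables some optimizations that are not valid for all standard-compliant programs, such as -ffast-math; for Fortran it additionally enables: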

-fno-protect-parens -fstack-arrays
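Why -ffast-math breaks strict compliance can be seen with plain IEEE-754 arithmetic: floating-point addition is not associative, so a standard-respecting compiler may not rewrite (a + b) + c as a + (b + c), while -ffast-math allows such reassociation. A small sketch (function names hypothetical; it should be compiled without -ffast-math for the two results to differ as shown):

```cpp
#include <cassert>  // assert, used in the checks

// Same three operands, different grouping.
double sum_left(double a, double b, double c) { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }
```

With a = 1e20, b = -1e20, c = 1.0, the left grouping gives 1.0, but in the right grouping the 1.0 is lost when added to -1e20 (it is far smaller than the spacing between doubles near 1e20), so the result is 0.0.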

6. -Og:

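This level carefully selects optimizations that do not conflict with -g, so it provides a reasonable level of optimization while still producing good debugging information and conforming to the language standard.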

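-On (n = 0, 1, 2, 3, or another word such as s, g, or fast) is a preset gcc provides for convenience. Depending on n, it includes a predefined set of optimizations, from fewer to more, each of the form -fxxx (where xxx is the optimization name). Note, however, that even the highest level, -O3, does not include every -f option; these presets are merely convenient defaults for most users.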

-O0: This level (that is the letter "O" followed by a zero) turns off optimization entirely and is the default if no -O level is specified in CFLAGS or CXXFLAGS. This reduces compilation time and can improve debugging info, but some applications will not work properly without optimization enabled. This option is not recommended except for debugging purposes.

-O1: the most basic optimization level. The compiler will try to produce faster, smaller code without taking much compilation time. It is basic, but it should get the job done all the time.

-O2: A step up from -O1. The recommended level of optimization unless the system has special needs. -O2 will activate a few more flags in addition to the ones activated by -O1. With -O2, the compiler will attempt to increase code performance without compromising on size, and without taking too much compilation time.

-O3: the highest level of optimization possible. It enables optimizations that are expensive in terms of compile time and memory usage. Compiling with -O3 is not a guaranteed way to improve performance, and in fact, in many cases, can slow down a system due to larger binaries and increased memory usage. -O3 is also known to break several packages. Using -O3 is not recommended.

-Os: optimizes code for size. It activates all -O2 options that do not increase the size of the generated code. It can be useful for machines that have extremely limited disk storage space and/or CPUs with small cache sizes.

-Og: In GCC 4.8, a new general optimization level, -Og, has been introduced. It addresses the need for fast compilation and a superior debugging experience while providing a reasonable level of runtime performance. Overall experience for development should be better than the default optimization level -O0. Note that -Og does not imply -g, it simply disables optimizations that may interfere with debugging.

-Ofast: New in GCC 4.7, consists of -O3 plus -ffast-math, -fno-protect-parens, and -fstack-arrays. This option breaks strict standards compliance, and is not recommended for use.

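With -O2 on, a large number of STL functions get inlined, which is why it is so much faster than before. Make no mistake: without inlining, the STL loses almost all of its power. Remember that.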
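A common way to see why inlining matters for the STL is to compare `std::sort` with C's `qsort`. `std::sort` is a template instantiated with the comparator's type, so at -O2 a lambda comparator can be inlined into the sorting loop; `qsort` calls its comparator through a function pointer on every comparison, which usually cannot be inlined. A minimal sketch with hypothetical function names; both produce the same ordering, and any speed difference should be measured on your own machine.

```cpp
#include <algorithm>
#include <cassert>  // assert, used when checking that both versions agree
#include <cstdlib>
#include <vector>

// C-style comparator: qsort reaches it only through a function pointer.
int compare_ints(const void* a, const void* b) {
    int x = *static_cast<const int*>(a);
    int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);  // avoids the overflow that x - y could cause
}

void sort_with_qsort(std::vector<int>& v) {
    std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
}

// Template-based sort: the lambda's type is part of the instantiation,
// so with optimization enabled the comparison can be inlined.
void sort_with_std(std::vector<int>& v) {
    std::sort(v.begin(), v.end(), [](int x, int y) { return x < y; });
}
```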

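Although -O3 is not recommended as widely as -O2, I personally recommend -O3 for release builds, because it enables the important -finline-functions. When you really do care about code size, combine it with noinline. That is still simpler than painstakingly profiling and then hand-writing inline functions.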
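In GCC (and Clang), the noinline mentioned above is spelled `__attribute__((noinline))`. A typical use is keeping a rarely executed path, such as error reporting, out of line so that aggressive inlining under -O3 does not copy it into every caller. A minimal sketch; `report_bad_index` and `checked_get` are hypothetical names.

```cpp
#include <cassert>  // assert, used in the checks
#include <cstdio>

// Rarely executed error path: never inlined, so callers stay small.
__attribute__((noinline)) void report_bad_index(int index) {
    std::fprintf(stderr, "bad index: %d\n", index);
}

// Hot path: small, so the compiler is free to inline it into callers.
inline int checked_get(const int* data, int size, int index) {
    if (index < 0 || index >= size) {
        report_bad_index(index);
        return 0;
    }
    return data[index];
}
```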

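No, no, you have it completely backwards. Building with -O2 or -O3 is the intended way to produce the final product; no optimization is for debugging. C++ standard containers are supposed to cost almost nothing extra (when optimization is enabled).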

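C++ templates are designed so that, once you enable optimization, the intermediate layers introduced by encapsulation can be removed. So the STL abstracts aggressively, and once you turn on O2 these abstractions "take up no space" after compilation; the result is as if you had carefully written a dedicated container in C for each type. Of course, C++ is far more convenient. But if you do not turn on O2, you will feel the overhead of all those abstractions.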
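To make the "abstractions take up no space" point concrete, here is a hypothetical template wrapper, a fixed-capacity stack built on `std::array`, next to a hand-written C-style stack for `int`. With optimization on, the member functions are trivially inlinable, so the generated code is typically comparable to the C version; without optimization each member call stays a real call. This is a sketch only; how close the two really are should be checked with a disassembler or a benchmark.

```cpp
#include <array>
#include <cassert>  // assert, used when checking that both versions agree
#include <cstddef>

// Generic fixed-capacity stack: one definition serves every element type.
template <typename T, std::size_t N>
class FixedStack {
public:
    bool push(const T& value) {
        if (size_ == N) return false;  // full
        data_[size_++] = value;
        return true;
    }
    T pop() { return data_[--size_]; }  // precondition: !empty()
    bool empty() const { return size_ == 0; }
    std::size_t size() const { return size_; }

private:
    std::array<T, N> data_{};
    std::size_t size_ = 0;
};

// Hand-written C-style equivalent, specialized for int by hand.
struct IntStack16 {
    int data[16];
    std::size_t size;
};

bool int_stack_push(IntStack16* s, int value) {
    if (s->size == 16) return false;  // full
    s->data[s->size++] = value;
    return true;
}

int int_stack_pop(IntStack16* s) { return s->data[--s->size]; }  // precondition: size > 0
```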

If you want to know what differs, just try them one by one, for example:

gcc -Q -O1 --help=optimizers

All I know is gcc -O4.

Send the code to Jeff Dean and have him rewrite it.

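Overall, the main goals of optimization are smaller object files and shorter execution time. In addition, the memory and CPU resources that compilation itself needs must also be taken into account.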

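Unfortunately, these goals often conflict with each other, hence the O[s123] options. For example, O3 performs more aggressive loop unrolling and function inlining, more expensive pointer alias analysis, more complex instruction pattern matching (e.g. vectorization), and more expensive instruction scheduling and register allocation. Some passes even run more than once: instructions are scheduled once before register allocation and again afterward.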
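Loop unrolling, the first transformation named above, can be illustrated with a hand-written, source-level sketch (not GCC's actual output; function names are hypothetical): the body is replicated four times so the loop condition is checked less often, and a remainder loop handles the leftover elements, at the cost of a larger loop body.

```cpp
#include <cassert>  // assert, used when checking that both versions agree
#include <cstddef>
#include <vector>

// Straightforward loop: one add and one condition check per element.
long sum_simple(const std::vector<int>& v) {
    long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) total += v[i];
    return total;
}

// Unrolled by 4 (by hand), plus a remainder loop for sizes
// that are not a multiple of 4.
long sum_unrolled4(const std::vector<int>& v) {
    long total = 0;
    std::size_t i = 0;
    const std::size_t n = v.size();
    for (; i + 4 <= n; i += 4) {
        total += v[i];
        total += v[i + 1];
        total += v[i + 2];
        total += v[i + 3];
    }
    for (; i < n; ++i) total += v[i];  // leftover 0 to 3 elements
    return total;
}
```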

The most basic level, O0, is plain compilation without optimization, normally used for Debug builds.

More advanced still are link-time (whole-program) optimization and PGO (profile-guided optimization).

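To use an imperfect analogy, a compiler is like a water purifier: it contains many code-optimization modules, but so that you don't have to choose which modules to switch on (and even if you could choose, you might not understand the options), it usually offers just a few settings: tap water, mineral water, purified water, distilled water, Evergrande Spring, Evian. That is enough.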