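In C, is there any difference in efficiency, or in the structure of the compiled output, between code written with if and with else if as shown below (see mainly the details in the supplementary description)?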

int compare(int a, int b)
{
    if (a < b) return -1;
    else if (a > b) return 1;
    else return 0;
}

int compare(int a, int b)
{
    if (a < b) return -1;
    if (a > b) return 1;
    return 0;
}


Related thread: "What is the difference between conditions and returns in program functions?" - RednaxelaFX's answer


I wrote three more functions and include them here as well:

int f1(int a, int b)
{
    return (a > b) | -(b > a);
}

int f2(int a, int b)
{
    return (a > b) - (a < b);
}

/* x86 only, GCC inline assembly */
int f3(int a, int b)
{
    __asm__ __volatile__ (
        "sub %1, %0 \n\t"
        "jno 1f \n\t"
        "cmc \n\t"
        "rcr %0 \n\t"
        "1: "
        : "+r"(a)
        : "r"(b)
        : "cc");
    return a;
}

int compare1(int a, int b)
{
    if (a < b) return -1;
    else if (a > b) return 1;
    else return 0;
}

int compare2(int a, int b)
{
    if (a < b) return -1;
    if (a > b) return 1;
    return 0;
}

Compile command:

gcc -g -O2 test.c -o test -lrt

Tested on randomly generated inputs; the results are as follows (Ubuntu 12.04):

f1: diff:10269370, start:655642, end:10925012, ret:0
f2: diff:10187843, start:10975174, end:21163017, ret:0
f3: diff:15027995, start:21183528, end:36211523, ret:0
compare1: diff:11095015, start:36236973, end:47331988, ret:0
compare2: diff:10858906, start:47362289, end:58221195, ret:0

Note: diff is the elapsed time (end minus start).

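f1 and f2 are the fastest, but their lead over compare1 / compare2 is under 10% (only the inline-assembly f3 is clearly slower). After all, routines this simple cannot differ by much.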
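For readers who want to reproduce this kind of measurement, here is a minimal sketch of a timing harness. It is not the harness used for the numbers above: the `-lrt` flag suggests the original relied on a POSIX clock, whereas this sketch uses the standard `clock()` to stay portable. The names `fill_random`, `time_compare` and `cmp_fn` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper type: any two-int comparison function. */
typedef int (*cmp_fn)(int, int);

/* The question's second version, used here as the function under test. */
int compare2(int a, int b)
{
    if (a < b) return -1;
    if (a > b) return 1;
    return 0;
}

/* Fill xs and ys with pseudo-random values from a fixed seed, so that
   every candidate function is timed on exactly the same inputs. */
void fill_random(int *xs, int *ys, size_t n, unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < n; i++) {
        xs[i] = rand();
        ys[i] = rand();
    }
}

/* Call fn on every pair and return the elapsed clock() ticks.
   The results are summed into *sink so the calls have a visible effect. */
clock_t time_compare(cmp_fn fn, const int *xs, const int *ys, size_t n,
                     long *sink)
{
    assert(fn != NULL && sink != NULL);
    long acc = 0;
    clock_t start = clock();
    for (size_t i = 0; i < n; i++)
        acc += fn(xs[i], ys[i]);
    clock_t end = clock();
    *sink = acc;
    return end - start;
}
```

Calling `time_compare` once per candidate on the same arrays yields one tick count per function, analogous to the diff column above. The unit of the original figures is not stated, so the absolute values are not directly comparable.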

The assembly for compare1 and compare2 is exactly the same: both use five instructions, so there is no difference in efficiency.

0x4005a0  xor    %edx,%edx
0x4005a2  cmp    %esi,%edi
0x4005a4  mov    $0xffffffff,%eax
0x4005a9  setg   %dl
0x4005ac  cmovge %edx,%eax

0x4005b0  xor    %edx,%edx
0x4005b2  cmp    %esi,%edi
0x4005b4  mov    $0xffffffff,%eax
0x4005b9  setg   %dl
0x4005bc  cmovge %edx,%eax

f1 takes eight instructions, yet surprisingly it performs well here, probably because they are all simple instructions:

xorl %eax, %eax
cmpl %edi, %esi
setg %al
xorl %edx, %edx
negl %eax
cmpl %esi, %edi
setg %dl
orl %edx, %eax

f2 takes six instructions and also performs well:

xorl %eax, %eax
cmpl %esi, %edi
setl %dl
setg %al
movzbl %dl, %edx
subl %edx, %eax

Note that there is one more very common implementation:

return (a < b) ? -1 : (a > b);

Its assembly is the same as that of compare1 / compare2, so it is not listed separately.
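Since all of these variants are meant to compute the same three-way result, it is worth checking that they agree before comparing their speed. The sketch below does so on a few boundary values. The names `cmp_branch`, `cmp_or`, `cmp_sub`, `cmp_ternary` and `check_variants_agree` are hypothetical stand-ins for compare2, f1, f2 and the ternary form.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

int cmp_branch(int a, int b)            /* same logic as compare2 */
{
    if (a < b) return -1;
    if (a > b) return 1;
    return 0;
}

int cmp_or(int a, int b)      { return (a > b) | -(b > a); }      /* f1 */
int cmp_sub(int a, int b)     { return (a > b) - (a < b); }       /* f2 */
int cmp_ternary(int a, int b) { return (a < b) ? -1 : (a > b); }

/* Assert that every variant matches the branching version on all pairs
   drawn from a small set of boundary values. */
void check_variants_agree(void)
{
    const int samples[] = { INT_MIN, -2, -1, 0, 1, 2, INT_MAX };
    const size_t n = sizeof samples / sizeof samples[0];

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            int a = samples[i], b = samples[j];
            int expected = cmp_branch(a, b);
            assert(cmp_or(a, b) == expected);
            assert(cmp_sub(a, b) == expected);
            assert(cmp_ternary(a, b) == expected);
        }
    }
}
```

All four rely only on relational operators, which yield 0 or 1 in C, so none of them can overflow even at INT_MIN and INT_MAX.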

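In summary:

compare1 / compare2 compile to the same code and need only five instructions, yet they are slightly slower than f1 / f2, probably because one of those instructions is cmovge, and conditional instructions are relatively expensive.

f1 and f2 are both neat implementations and perform better, so I recommend them in practice.

The complete repository, including source, assembly and data: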

https://github.com/geekan/c-algorithm/tree/master/integer_comparison

Reference:

Efficient integer compare function


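Conclusion first: without compiler optimization there is a slight difference; with optimization turned on, the two are exactly the same.

Test environment: Win7 + gcc 4.9.2

Test code: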

int compare1(int a, int b)
{
    if (a < b) return -1;
    else if (a > b) return 1;
    else return 0;
}

int compare2(int a, int b)
{
    if (a < b) return -1;
    if (a > b) return 1;
    return 0;
}

After gcc -c, the output of objdump -d:

Disassembly of section .text:

00000000 <_compare1>:
   0: 55                   push   %ebp
   1: 89 e5                mov    %esp,%ebp
   3: 8b 45 08             mov    0x8(%ebp),%eax
   6: 3b 45 0c             cmp    0xc(%ebp),%eax
   9: 7d 07                jge    12 <_compare1+0x12>
   b: b8 ff ff ff ff       mov    $0xffffffff,%eax
  10: eb 14                jmp    26 <_compare1+0x26>
  12: 8b 45 08             mov    0x8(%ebp),%eax
  15: 3b 45 0c             cmp    0xc(%ebp),%eax
  18: 7e 07                jle    21 <_compare1+0x21>
  1a: b8 01 00 00 00       mov    $0x1,%eax
  1f: eb 05                jmp    26 <_compare1+0x26>
  21: b8 00 00 00 00       mov    $0x0,%eax
  26: 5d                   pop    %ebp
  27: c3                   ret

00000028 <_compare2>:
  28: 55                   push   %ebp
  29: 89 e5                mov    %esp,%ebp
  2b: 8b 45 08             mov    0x8(%ebp),%eax
  2e: 3b 45 0c             cmp    0xc(%ebp),%eax
  31: 7d 07                jge    3a <_compare2+0x12>
  33: b8 ff ff ff ff       mov    $0xffffffff,%eax
  38: eb 14                jmp    4e <_compare2+0x26>
  3a: 8b 45 08             mov    0x8(%ebp),%eax
  3d: 3b 45 0c             cmp    0xc(%ebp),%eax
  40: 7e 07                jle    49 <_compare2+0x21>
  42: b8 01 00 00 00       mov    $0x1,%eax
  47: eb 05                jmp    4e <_compare2+0x26>
  49: b8 00 00 00 00       mov    $0x0,%eax
  4e: 5d                   pop    %ebp
  4f: c3                   ret

After optimizing with -O2:

Disassembly of section .text:

00000000 <_compare1>:
   0: 8b 44 24 08          mov    0x8(%esp),%eax
   4: 39 44 24 04          cmp    %eax,0x4(%esp)
   8: ba ff ff ff ff       mov    $0xffffffff,%edx
   d: 0f 9f c0             setg   %al
  10: 0f b6 c0             movzbl %al,%eax
  13: 0f 4c c2             cmovl  %edx,%eax
  16: c3                   ret
  17: 89 f6                mov    %esi,%esi
  19: 8d bc 27 00 00 00 00 lea    0x0(%edi,%eiz,1),%edi

00000020 <_compare2>:
  20: 8b 44 24 08          mov    0x8(%esp),%eax
  24: 39 44 24 04          cmp    %eax,0x4(%esp)
  28: ba ff ff ff ff       mov    $0xffffffff,%edx
  2d: 0f 9f c0             setg   %al
  30: 0f b6 c0             movzbl %al,%eax
  33: 0f 4c c2             cmovl  %edx,%eax
  36: c3                   ret


Suppose the source code is as shown in Figure 1:

int compare1(int a, int b) {
    if (a < b) return -1;
    else if (a > b) return 1;
    else return 0;
}

int compare2(int a, int b) {
    if (a < b) return -1;
    if (a > b) return 1;
    return 0;
}

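Below is the LLVM IR that clang 3.7 produces without any optimization flag; it is a literal translation of the original semantics (Figure 2). As soon as -O1 or higher is used, however, the two functions produce exactly the same IR (Figure 3). Once the IR is identical, the assembly generated from it will be identical too, barring a bug in the backend.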

; Function Attrs: nounwind ssp uwtable
define i32 @compare1(i32 %a, i32 %b) #0 {
entry:
%retval = alloca i32, align 4
%a.addr = alloca i32, align 4
%b.addr = alloca i32, align 4
store i32 %a, i32* %a.addr, align 4
store i32 %b, i32* %b.addr, align 4
%0 = load i32, i32* %a.addr, align 4
%1 = load i32, i32* %b.addr, align 4
%cmp = icmp slt i32 %0, %1
br i1 %cmp, label %if.then, label %if.else

if.then: ; preds = %entry
store i32 -1, i32* %retval
br label %return

if.else: ; preds = %entry
%2 = load i32, i32* %a.addr, align 4
%3 = load i32, i32* %b.addr, align 4
%cmp1 = icmp sgt i32 %2, %3
br i1 %cmp1, label %if.then.2, label %if.else.3

if.then.2: ; preds = %if.else
store i32 1, i32* %retval
br label %return

if.else.3: ; preds = %if.else
store i32 0, i32* %retval
br label %return

return: ; preds = %if.else.3, %if.then.2, %if.then
%4 = load i32, i32* %retval
ret i32 %4
}

; Function Attrs: nounwind ssp uwtable
define i32 @compare2(i32 %a, i32 %b) #0 {
entry:
%retval = alloca i32, align 4
%a.addr = alloca i32, align 4
%b.addr = alloca i32, align 4
store i32 %a, i32* %a.addr, align 4
store i32 %b, i32* %b.addr, align 4
%0 = load i32, i32* %a.addr, align 4
%1 = load i32, i32* %b.addr, align 4
%cmp = icmp slt i32 %0, %1
br i1 %cmp, label %if.then, label %if.end

if.then: ; preds = %entry
store i32 -1, i32* %retval
br label %return

if.end: ; preds = %entry
%2 = load i32, i32* %a.addr, align 4
%3 = load i32, i32* %b.addr, align 4
%cmp1 = icmp sgt i32 %2, %3
br i1 %cmp1, label %if.then.2, label %if.end.3

if.then.2: ; preds = %if.end
store i32 1, i32* %retval
br label %return

if.end.3: ; preds = %if.end
store i32 0, i32* %retval
br label %return

return: ; preds = %if.end.3, %if.then.2, %if.then
%4 = load i32, i32* %retval
ret i32 %4
}

; Function Attrs: nounwind readnone ssp uwtable
define i32 @compare1(i32 %a, i32 %b) #0 {
entry:
%cmp = icmp slt i32 %a, %b
%cmp1 = icmp sgt i32 %a, %b
%. = zext i1 %cmp1 to i32
%retval.0 = select i1 %cmp, i32 -1, i32 %.
ret i32 %retval.0
}

; Function Attrs: nounwind readnone ssp uwtable
define i32 @compare2(i32 %a, i32 %b) #0 {
entry:
%cmp = icmp slt i32 %a, %b
%cmp1 = icmp sgt i32 %a, %b
%. = zext i1 %cmp1 to i32
%retval.0 = select i1 %cmp, i32 -1, i32 %.
ret i32 %retval.0
}
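Read back into C, the optimized IR amounts to roughly the sketch below, with one statement per IR instruction. The names `compare_select` and `check_compare_select` are hypothetical; this is only a hand translation to show what the optimizer reduced both versions to, not compiler output.

```c
#include <assert.h>

/* Hand translation of the optimized IR into C. */
int compare_select(int a, int b)
{
    int cmp  = a < b;        /* %cmp  = icmp slt i32 %a, %b */
    int cmp1 = a > b;        /* %cmp1 = icmp sgt i32 %a, %b, then zext to i32 */
    return cmp ? -1 : cmp1;  /* select i1 %cmp, i32 -1, i32 %. */
}

/* Small self-check covering the three possible outcomes. */
void check_compare_select(void)
{
    assert(compare_select(1, 2) == -1);
    assert(compare_select(2, 1) == 1);
    assert(compare_select(2, 2) == 0);
}
```

Both comparisons are computed unconditionally and a select picks the result, which is why no branch is left in the optimized code.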


With cygwin gcc and default options, the two come out the same:

00401190 <_compare1>:
  401190: 55                   push   %ebp
  401191: 89 e5                mov    %esp,%ebp
  401193: 8b 45 08             mov    0x8(%ebp),%eax
  401196: 3b 45 0c             cmp    0xc(%ebp),%eax
  401199: 7d 07                jge    4011a2 <_compare1+0x12>
  40119b: b8 ff ff ff ff       mov    $0xffffffff,%eax
  4011a0: eb 14                jmp    4011b6 <_compare1+0x26>
  4011a2: 8b 45 08             mov    0x8(%ebp),%eax
  4011a5: 3b 45 0c             cmp    0xc(%ebp),%eax
  4011a8: 7e 07                jle    4011b1 <_compare1+0x21>
  4011aa: b8 01 00 00 00       mov    $0x1,%eax
  4011af: eb 05                jmp    4011b6 <_compare1+0x26>
  4011b1: b8 00 00 00 00       mov    $0x0,%eax
  4011b6: 5d                   pop    %ebp
  4011b7: c3                   ret

004011b8 <_compare2>:
  4011b8: 55                   push   %ebp
  4011b9: 89 e5                mov    %esp,%ebp
  4011bb: 8b 45 08             mov    0x8(%ebp),%eax
  4011be: 3b 45 0c             cmp    0xc(%ebp),%eax
  4011c1: 7d 07                jge    4011ca <_compare2+0x12>
  4011c3: b8 ff ff ff ff       mov    $0xffffffff,%eax
  4011c8: eb 14                jmp    4011de <_compare2+0x26>
  4011ca: 8b 45 08             mov    0x8(%ebp),%eax
  4011cd: 3b 45 0c             cmp    0xc(%ebp),%eax
  4011d0: 7e 07                jle    4011d9 <_compare2+0x21>
  4011d2: b8 01 00 00 00       mov    $0x1,%eax
  4011d7: eb 05                jmp    4011de <_compare2+0x26>
  4011d9: b8 00 00 00 00       mov    $0x0,%eax
  4011de: 5d                   pop    %ebp
  4011df: c3                   ret

But why doesn't the asker simply write it like this:

int compare(int a, int b)
{
    // Update: this may overflow (signed overflow is undefined behavior in C).
    // For example, when int is 32-bit, compare(2147483647, -1)
    // can return a negative number (-2147483648).
    return (a - b);
}
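To make the overflow caveat concrete without actually triggering it, the sketch below computes the difference in a wider type and checks whether it still fits in an int. It assumes `long long` is wider than `int` (true on common platforms with 32-bit int); `difference_fits_in_int`, `compare_safe` and `check_overflow_case` are hypothetical names.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 when a - b is representable as an int. The subtraction is done
   in long long (assumed wider than int), so the check itself cannot overflow. */
int difference_fits_in_int(int a, int b)
{
    long long d = (long long)a - (long long)b;
    return d >= INT_MIN && d <= INT_MAX;
}

/* Overflow-free alternative: uses only comparisons, never a subtraction
   of the operands themselves. */
int compare_safe(int a, int b)
{
    return (a > b) - (a < b);
}

/* The case from the comment above: INT_MAX - (-1) does not fit in an int,
   while the comparison-based version still gives the right sign. */
void check_overflow_case(void)
{
    assert(!difference_fits_in_int(INT_MAX, -1));
    assert(compare_safe(INT_MAX, -1) == 1);
}
```

Returning `a - b` is therefore only safe when the inputs are known to lie in a range where the difference cannot leave the int range, for example when both are non-negative.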


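Why not just look at the assembly?

if...else if...else is a parallel arrangement: once one branch matches, the rest are skipped.

if...if is a series arrangement: every condition is tested in turn.

When the conditions carry equal weight, prefer the parallel form to avoid unnecessary overhead.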
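The two forms in the question behave identically only because every branch returns. When the branches fall through instead, the series form re-tests later conditions and can give a different result, as this small sketch shows (`adjust_chain`, `adjust_series` and `check_chain_vs_series` are hypothetical names chosen for the illustration):

```c
#include <assert.h>

/* Parallel form: at most one branch runs. */
int adjust_chain(int x)
{
    if (x < 0) x = 10;
    else if (x > 5) x = 5;
    return x;
}

/* Series form: the second test also sees the value written by the first. */
int adjust_series(int x)
{
    if (x < 0) x = 10;
    if (x > 5) x = 5;
    return x;
}

/* For a negative input the two forms disagree; otherwise they match. */
void check_chain_vs_series(void)
{
    assert(adjust_chain(-1) == 10);
    assert(adjust_series(-1) == 5);
    assert(adjust_chain(7) == adjust_series(7));
    assert(adjust_chain(3) == adjust_series(3));
}
```

So the choice between the two is first a question of semantics; performance only enters once both forms are known to mean the same thing, as they do in the compare functions above.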

