關於 C++ 浮點數計算誤差問題？

02-17

int main(int argc, char* argv[]) { int a = 68280;
int result = static_cast&(a * 0.3); printf("%d ", result); double result1 = a * 0.3; printf("%f ", result1); float result2 = static_cast&(a * 0.3); printf("%f ", result2); int result3 = static_cast&(68280 * 0.3); printf("%d ", result3);
int result4 = static_cast&(a * 0.3f); printf("%d ", result4); }
gcc4.9.2 (32位) 編譯的程序運行結果是：
20483 20484.000000 20484.000000 20484 20484
為什麼只第一種情況會有誤差？

浮點數0.3的確是會比整數0.3少一點，所以，我們產生彙編文件的話，我們會看到類似這樣的話

.LCPI0_0: .quad 4599075939470750515 # double 0.29999999999999999 .text .globl main .align 16, 0x90 .type main,@function

然而，若我們使用clang或者上面一位知友說的Visual C++是不會產生20483結果的，而是20484。

而且你使用gcc使用-m64編譯這段程序，或者在Mac下面使用-m32編譯，都會出現20484，而不是20483，唯獨在Linux下面使用-m32會出現這樣的結果。

那麼讓我們看gcc -m32的彙編文件，不過我這裡把程序代碼精簡了一下

#include & int main(int argc, char *argv[]) { int a = 68280;


  int result = a * 0.3;

  printf("%d

", result);

}

那麼產生的彙編文件是

main: .LFB0: .cfi_startproc pushl %ebp .cfi_def_cfa_offset 8 .cfi_offset 5, -8 movl %esp, %ebp .cfi_def_cfa_register 5 andl $-16, %esp subl $32, %esp movl $68280, 24(%esp) fildl 24(%esp) fldl .LC0 fmulp %st, %st(1) fnstcw 14(%esp) movzwl 14(%esp), %eax movb $12, %ah movw %ax, 12(%esp) fldcw 12(%esp) fistpl 28(%esp) fldcw 14(%esp) movl 28(%esp), %eax movl %eax, 4(%esp) movl $.LC1, (%esp) call printf

一切都是我們想像的進行，a 從int 轉為 double , 進行相乘等等，所以你要得到20484, 你需要進行round等操作。

那麼，為什麼clang / VC++, gcc -m64, Mac下面的gcc -m32/-m64可以得到20484呢？

讓我們看看它們的彙編文件

main: # @main # BB#0: pushl %ebp movl %esp, %ebp subl $40, %esp movl 12(%ebp), %eax movl 8(%ebp), %ecx leal .L.str, %edx movsd .LCPI0_0, %xmm0 movl %ecx, -4(%ebp) movl %eax, -8(%ebp) movl $68280, -12(%ebp) # imm = 0x10AB8 cvtsi2sdl -12(%ebp), %xmm1 mulsd %xmm0, %xmm1 cvttsd2si %xmm1, %eax movl %eax, -16(%ebp) movl -16(%ebp), %eax movl %edx, (%esp) movl %eax, 4(%esp) calll printf xorl %ecx, %ecx movl %eax, -20(%ebp) # 4-byte Spill movl %ecx, %eax addl $40, %esp popl %ebp

這裡，你可以注意到它們有一條指令是cvttsd2si, 這是SSE2引入的指令，可以參考這裡描述http://x86.renejeschke.de/html/file_module_x86_id_66.html

Converts a double-precision floating-point value in the source operand (second operand) to a signed doubleword integer in the destination operand (first operand). The source operand can be an XMM register or a 64-bit memory location. The destination operand is a general-purpose register. When the source operand is an XMM register, the double-precision floating-point value is contained in the low quadword of the register. When a conversion is inexact, a truncated (round toward zero) result is returned. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.

同時可以看這裡的注釋：CS 301 Lecture

Convert to ("2", get it?) Single Integer (si, stored in register like eax) or four DWORDs (dq, stored in xmm register). "cvtt" versions do truncation (round down); "cvt" versions round to nearest.

剩下的就不用解釋了。

那麼，為什麼GCC 32位要這樣做呢？原因就是為了兼容老的架構。而Clang則並未這麼做。Mac下面的Intel晶元則是一開始就有SSE。

那麼，如果在Linux下面，使用gcc -m32 a.c -msse2, 你也會得到20484了，而非20483了。