Docker 17.06.1 released: two important bug fixes

02-12

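On August 15 (Docker sure knows how to pick a date), Docker CE published the first patch release of the 17.06 line. The release does not seem to have been loudly promoted (though what exactly counts as "loudly promoted"?). It does fix two important bugs, though, and they are worth noting here.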

Fix copy --from conflict with force pull

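So what was this bug about? Docker 17.06 shipped the very handy "Multi-Stage Build" feature, which freed the many Docker users who compile first and package afterwards. I updated my project's Dockerfile right away, merging the separate compile and package Dockerfiles into one. But when I invoked my CI tool to build as usual, the build failed with an error.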

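What caused the error? It turned out that my CI tool forces the Docker engine to pull the base image on every build, to make sure the base image is up to date. In 17.06, however, this option conflicted with the "Multi-Stage Build" feature.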

This release fixes the problem. From my own testing so far, the two now play nicely together.
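To make the combination concrete, here is a minimal sketch of what such a CI step might look like. The Dockerfile contents, the image names and the `build_command` helper are all hypothetical, invented for illustration; the only real pieces are the multi-stage syntax (`FROM ... AS`, `COPY --from=`) and the `--pull` flag of `docker build`. It only assembles the command line and does not call Docker.

```python
# Hypothetical two-stage Dockerfile: compile in a full toolchain image,
# then copy only the resulting binary into a small runtime image.
DOCKERFILE = """\
FROM golang:1.8 AS builder
WORKDIR /go/src/app
COPY . .
RUN go build -o /out/app .

FROM alpine:3.6
COPY --from=builder /out/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]
"""


def build_command(tag, context=".", force_pull=True):
    """Return the argv a CI job might run to build the image (illustrative helper)."""
    cmd = ["docker", "build", "-t", tag]
    if force_pull:
        # --pull asks the engine to always try to pull newer base images.
        cmd.append("--pull")
    cmd.append(context)
    return cmd


if __name__ == "__main__":
    print(DOCKERFILE)
    print(" ".join(build_command("example/app:latest")))
```

On 17.06.0 it was this pairing of a `COPY --from` stage with a forced pull that failed for me; on 17.06.1 the same command builds fine.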

Fixed issue with overlay network IP address reuse

For the related PR, see: [17.06] vndr libnetwork to bring in fix for overlay network ip reuse.

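A quick aside first: Docker really is unreliable. Copying the change log of a single release from GitHub into the release notes, they still managed to drop an entry (clearly not cut out for cheating on exams). The entry that got dropped happens to be this very fix for IP address reuse on overlay networks.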

For what exactly the problem is, see the PR description on GitHub, quoted here:

libnetwork IPAM recycles the IP address when a task goes down on a node and brought up in another node. For remote tasks overlay network namespace has one static fdb entry programmed by the driver and one dynamic entry learned by the bridge from the data path when a packet is received from the remote container. The dynamic entry ages out after 300 seconds. If a task on a remote node goes down and gets scheduled on a node the dynamic fdb entry still remains. Unless the container generates some data traffic it wont be updated. This can lead to unpredictability in accessing the container; sometimes it will work pretty quickly if there is some traffic from the container and the mac entry gets updated. If the container is completely silent it can lead to upto 300 seconds of traffic loss.

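Roughly speaking: because of how the mechanism is designed, when a service task container on an overlay network goes a long time without generating any network traffic, the forwarding entries for that task on other nodes are not updated in time. And when the task dies, if the newly started task happens to land on another node where those entries have not been updated, access to that task container can fail in unpredictable ways.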
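The timing described in the PR can be illustrated with a toy model. This is only a sketch under stated assumptions, not libnetwork's actual code: the `ToyFdb` class, the node names and the MAC address are made up, and the rule that an unexpired dynamic entry wins over the static one is a simplification chosen to reproduce the symptom. Only the 300-second ageing time comes from the PR description.

```python
AGEING_SECONDS = 300  # dynamic entry lifetime quoted in the PR description


class ToyFdb:
    """Toy forwarding table: one static and one dynamic entry per MAC."""

    def __init__(self):
        self.static = {}   # mac -> node, programmed by the driver
        self.dynamic = {}  # mac -> (node, learned_at), learned from traffic

    def program_static(self, mac, node):
        self.static[mac] = node

    def learn(self, mac, node, now):
        """Record that a packet from `mac` arrived from `node` at time `now`."""
        self.dynamic[mac] = (node, now)

    def lookup(self, mac, now):
        entry = self.dynamic.get(mac)
        if entry is not None:
            node, learned_at = entry
            if now - learned_at < AGEING_SECONDS:
                # Assumption of this toy model: a live dynamic entry wins.
                return node
            del self.dynamic[mac]  # the dynamic entry has aged out
        return self.static.get(mac)


def demo():
    mac = "02:42:0a:00:00:05"  # hypothetical MAC of the task's reused IP

    # Silent container: it never sends traffic after being rescheduled.
    silent = ToyFdb()
    silent.program_static(mac, "node-b")
    silent.learn(mac, "node-b", now=0)
    silent.program_static(mac, "node-c")  # task moves to node-c at t=10
    silent_view = [silent.lookup(mac, now=t) for t in (10, 299, 300)]

    # Chatty container: it sends a packet shortly after being rescheduled.
    chatty = ToyFdb()
    chatty.program_static(mac, "node-b")
    chatty.learn(mac, "node-b", now=0)
    chatty.program_static(mac, "node-c")
    chatty.learn(mac, "node-c", now=12)
    chatty_view = chatty.lookup(mac, now=13)

    return silent_view, chatty_view


if __name__ == "__main__":
    print(demo())
```

In this model the silent container keeps resolving to the old node until the stale dynamic entry ages out, while the chatty one is corrected as soon as it sends a packet, which matches the "sometimes it works quickly, sometimes up to 300 seconds of loss" behaviour the PR describes.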

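Put even more simply: a service container on an overlay network has some probability of being unreachable, or of requests even landing on the wrong container. Of course, not everything in this simplified description is caused by the issue discussed here. This problem once left my entire Docker swarm cluster nearly crippled. Earlier I had found that nginx's DNS caching was one of the causes, but I later discovered that Docker itself had more serious problems. So I will be cautious before concluding that this problem has been completely resolved.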