AnandTech's Analysis of GPU Turbo Technology (Part 3)


For Part 2, see: AnandTech's Analysis of GPU Turbo Technology (Part 2)

Original article: Huawei's GPU Turbo: Valid Technology with Overzealous Marketing

Original section title: The Detailed Explanation of GPU Turbo

Translation of the section title: A Detailed Explanation of GPU Turbo

Under the hood, Huawei uses TensorFlow neural network models that are pre-trained by the company on a title-by-title basis. By examining the title in detail, over many thousands of hours (real or simulated), the neural network can build its own internal model of how the game runs and its power/performance requirements. The end result can be put into one dense sentence:

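Translator's summary: Huawei uses TensorFlow neural network models that it trains one game at a time. Over thousands of hours of real or simulated play, the network builds a model of how the game's power and performance requirements change as it runs. The result can be summed up in one sentence: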

Optimized Per-Device Per-Game DVFS Control using Neural Networks

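Translator's summary: Neural networks are used to control DVFS (dynamic voltage and frequency scaling), tuned for each game on each device.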

In the training phase, the network analyzes and adjusts the SoC's DVFS parameters in order to achieve the best possible performance while minimizing power consumption. This entails trying its best to hit the nearest DVFS states on the CPUs, GPU, and memory controllers that still allow for hitting 60fps, yet without going to any higher state than is necessary (in other words, minimizing performance headroom). The end result is that for every unit of work that the CPU/GPU/DRAM has to do or manage, the corresponding hardware block has the perfectly optimized amount of power needed. This has a knock-on effect on both performance and power consumption, but mostly on the latter.

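Translator's summary: During training, the neural network analyzes and adjusts the SoC's DVFS parameters to find the best balance of performance and power, so that the game reaches 60fps without spending power on headroom it does not need.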
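To make the "lowest state that still hits 60fps" idea concrete, here is a minimal Python sketch. The frequency table and the per-frame work estimates (in millions of GPU cycles) are hypothetical values chosen for illustration, not Kirin figures, and GPU Turbo makes this choice with a trained network rather than with the simple search shown here.

```python
FRAME_BUDGET_MS = 1000.0 / 60.0  # time available per frame at 60fps

# Hypothetical GPU frequency states in MHz (assumed, not real Kirin values).
GPU_FREQS_MHZ = [200, 400, 600, 800]


def frame_time_ms(work_mcycles, freq_mhz):
    """Estimated frame time: millions of cycles divided by MHz gives seconds."""
    return work_mcycles / freq_mhz * 1000.0


def lowest_sufficient_state(work_mcycles, freqs_mhz=GPU_FREQS_MHZ):
    """Pick the lowest frequency that still fits the 60fps budget.

    Falls back to the highest state when no state is fast enough.
    """
    for freq in sorted(freqs_mhz):
        if frame_time_ms(work_mcycles, freq) <= FRAME_BUDGET_MS:
            return freq
    return max(freqs_mhz)


if __name__ == "__main__":
    for work in (3.0, 8.0, 20.0):
        print(work, "Mcycles ->", lowest_sufficient_state(work), "MHz")
```

Any state above the one returned would still hit 60fps, but the extra headroom would only cost power, which is the trade-off the paragraph describes.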

The resulting model is then included in the firmware for devices that support GPU Turbo. Each title has a specific network model for each smartphone, as the workload varies with the title and the resources available vary with the phone model. As far as we understand the technology, on the device itself there appears to be an interception layer between the application and GPU driver which monitors render calls. These serve as inputs to the neural network model. Because the network model was trained to output the optimal DVFS settings for a given scene, the GPU Turbo mechanism can apply this immediately to the hardware and adjust the DVFS accordingly.

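Translator's summary: The trained model is built into the firmware of devices that support GPU Turbo. Every game has its own neural network model for each phone, so that frequencies can be adjusted dynamically. As far as we understand it, an interception layer sits between the application and the GPU driver and monitors render calls, which serve as the model's inputs. The network outputs the best DVFS settings for the scene, so GPU Turbo can react quickly to changing conditions.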
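The data flow described here (render calls in, DVFS settings out) can be sketched in Python as follows. Everything in this sketch is an assumption made for illustration: the `InterceptionLayer` class, the `toy_model` thresholds, and the frequency values are hypothetical, and Huawei has not published how its layer or its model inputs are actually structured.

```python
class InterceptionLayer:
    """Hypothetical shim sitting between the application and the GPU driver."""

    def __init__(self, driver_draw, model, apply_dvfs):
        self.driver_draw = driver_draw  # driver entry point (assumed callback)
        self.model = model              # stand-in for the trained network
        self.apply_dvfs = apply_dvfs    # callback that sets hardware states
        self.draw_calls = 0
        self.vertices = 0

    def draw(self, vertex_count):
        """Record the render call, then forward it to the driver unchanged."""
        self.draw_calls += 1
        self.vertices += vertex_count
        self.driver_draw(vertex_count)

    def end_frame(self):
        """Feed the frame's call statistics to the model and apply its output."""
        state = self.model(self.draw_calls, self.vertices)
        self.apply_dvfs(state)
        self.draw_calls = 0
        self.vertices = 0
        return state


def toy_model(draw_calls, vertices):
    """Placeholder for the network: maps call statistics to a DVFS state."""
    load = draw_calls * 1_000 + vertices
    if load < 50_000:
        return {"gpu_mhz": 400, "ddr_mhz": 933}
    return {"gpu_mhz": 800, "ddr_mhz": 1866}


if __name__ == "__main__":
    layer = InterceptionLayer(lambda n: None, toy_model, print)
    layer.draw(10_000)
    layer.draw(5_000)
    layer.end_frame()
```

The application's calls reach the driver untouched; the layer only observes them, which is why this kind of design can be added in firmware without changes to the game.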

For SoCs that have them, the inferencing (execution) of the network model is accelerated by the SoC's own NPU. Where GPU Turbo is introduced in SoCs that don't sport an NPU, a CPU software fall-back is used. This allows for extremely fast prediction. One thing that I do have to wonder is just how much rendering latency this induces; it can't be that much, however, and Huawei says they focus a lot on this area of the implementation. Huawei confirmed that these models are all 16-bit floating point (FP16), which means that for future devices like the Kirin 980, further optimization might occur through using INT8 models based on the new NPU support.

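Translator's summary: On SoCs that have an NPU, the NPU accelerates execution of the network model; on SoCs without one, a CPU software fall-back is used. This allows very fast prediction. The rendering latency this adds is worth keeping in mind, and Huawei says it pays close attention to it. Huawei also confirmed that the models all use 16-bit floating point, so devices such as the Kirin 980, with its new NPU, could move to INT8 models.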
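For readers unfamiliar with what moving a model to INT8 involves, the sketch below shows symmetric per-tensor quantization in plain Python, one common generic scheme. It is offered only as an illustration of the idea: the article does not say which quantization method Huawei would use, and the function names and sample weights here are made up.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the 8-bit integers."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.003, 1.0]  # made-up example weights
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    for w, r in zip(weights, restored):
        print(f"{w:+.4f} -> {r:+.4f} (error {abs(w - r):.5f})")
```

Each weight is stored in 8 bits instead of 16, at the cost of a rounding error of at most half the scale step per weight; whether that loss is acceptable depends on the model.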

Essentially, because GPU Turbo is in effect a DVFS mechanism that works in conjunction with the rendering pipeline and with a much finer granularity, it's able to predict the hardware requirements for the coming frame and adjust accordingly. This is how GPU Turbo in particular is able to make claims of much reduced performance jitter versus more conventional "reactive" DVFS drivers, which just monitor GPU utilization rate via hardware counters and adapt after-the-fact.

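Translator's summary: In essence, GPU Turbo is a DVFS mechanism that works together with the rendering pipeline at a much finer granularity, so it can predict the hardware requirements of the next frame and adjust in time. Compared with conventional reactive DVFS drivers, which adapt only after the fact, this greatly reduces performance jitter.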
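The difference between a reactive and a predictive governor can be shown with a small Python simulation. This is a toy model under stated assumptions: the frequency table and per-frame workloads are invented, and the "prediction" is a perfect forecast standing in for the neural network, which in practice would be less accurate.

```python
FRAME_BUDGET_MS = 1000.0 / 60.0
FREQS_MHZ = [200, 400, 600, 800]  # hypothetical DVFS states


def frame_time_ms(work_mcycles, freq_mhz):
    return work_mcycles / freq_mhz * 1000.0


def pick_state(work_mcycles):
    """Lowest frequency that renders the given work within the 60fps budget."""
    for freq in FREQS_MHZ:
        if frame_time_ms(work_mcycles, freq) <= FRAME_BUDGET_MS:
            return freq
    return FREQS_MHZ[-1]


def missed_frames_reactive(workloads):
    """Set the frequency from the frame just rendered (after the fact)."""
    freq = FREQS_MHZ[0]
    missed = 0
    for work in workloads:
        if frame_time_ms(work, freq) > FRAME_BUDGET_MS:
            missed += 1
        freq = pick_state(work)  # adapts only once the load has been observed
    return missed


def missed_frames_predictive(workloads, predict):
    """Set the frequency from a forecast of the coming frame."""
    missed = 0
    for index, work in enumerate(workloads):
        freq = pick_state(predict(index))
        if frame_time_ms(work, freq) > FRAME_BUDGET_MS:
            missed += 1
    return missed


if __name__ == "__main__":
    workloads = [3, 3, 9, 9, 3, 12, 3]  # made-up work per frame, in Mcycles
    print("reactive misses:", missed_frames_reactive(workloads))
    print("predictive misses:",
          missed_frames_predictive(workloads, lambda i: workloads[i]))
```

With this sample workload the reactive loop misses the budget on both frames where the load jumps, while the predictive loop, given a perfect forecast, misses none. Those late frames are the performance jitter the paragraph refers to.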

Thoughts After A More Detailed Explanation

Translation of the heading: Our Thoughts After the More Detailed Explanation

What Huawei has done here is certainly an interesting approach with the clear potential for real-world benefits. We can see how distributing resources optimally across available hardware within a limited power budget will help the performance, the efficiency, and the power consumption, all of which is already a careful balancing act in smartphones. So the detailed explanation makes a lot of technical sense, and we have no issues with this at all. It's a very impressive feat that could have ramifications in a much wider technology space, eventually including PCs.

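Translator's summary: Huawei's approach is an interesting one with clear potential for real-world benefit: distributing resources optimally across the available hardware within a limited power budget helps performance per watt. We consider it a very impressive feat, one that could extend to a wider range of hardware such as PCs.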

The downside to the technology is the per-device & per-game nature of it. Huawei did not go into detail about how long it took to train a single game: the first version of GPU Turbo supports PUBG and a Chinese game called Mobile Legends: Bang Bang. The second version, coming with the Mate 20, includes NBA 2K18, Rules of Survival, Arena of Valor, and Vainglory.

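Translator's summary: The drawback of the technology is that it has to be trained for every device and every game. Huawei did not tell us how long it takes to train a single game.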

Technically the granularity is per-SoC rather than per-device, although different devices will have different limits in thermal performance or memory performance. But it is obvious that while Huawei is very proud of the technology, it is a slow per-game roll out. There is no silver bullet here – while an ideal goal would be a single optimized network to deal with every game in the market, we have to rely on default mechanisms to get the job done.

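Translator's summary: Technically, the optimization is per SoC rather than per device, although devices differ in thermal and memory performance. Still, while Huawei is clearly very proud of the technology, the rollout to individual games is slow. There is no silver bullet here: the ideal would be a single network that handles every game, but for now we have to rely on the default mechanisms.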

Huawei is going after its core gaming market first with GPU Turbo, which means plenty of Battle Royale and MOBA action, like PUBG and Arena of Valor, as well as tie-ins with companies like EA/Tencent for NBA 2K18. I suspect on the back of this realization, some companies will want to get in contact with Huawei to add their title to the list of games to be optimized. Our only request is that you also include tools so we can benchmark the game and output frame-time data, please!

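Translator's summary: Huawei will continue rolling out GPU Turbo to more games. Our only request is for tools that let us benchmark those games!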

On the next page, we go into our analysis on GPU Turbo with devices on hand. We also come across an issue with how Arm's Mali GPU (used in Huawei Kirin SoCs) renders games differently to Huawei's competitor devices.

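Translator's summary: In the next part we analyze GPU Turbo based on how it performs on actual devices. We also ran into an issue: Arm's Mali GPU appears to render games differently from the GPUs in competing devices.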

For Part 4, see: AnandTech's Analysis of GPU Turbo Technology (Part 4)
