What exactly is the principle or algorithm behind the Multi-Shot technology used in the OPPO Find 7?

12-09

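I would really like someone to explain how multi-shot works (the feature that expands the Oppo find 7 camera's native 13-megapixel output into 50 megapixels), mainly a simple account of the algorithm. I understand multi-frame noise reduction of the Lucky Imaging kind and multi-frame HDR compositing well enough, but what is going on with resolution expansion? Most articles mention something called "pixel shift", which sounds fairly plausible. What kind of algorithm is that?

A set of sample photos published by BBK (步步高) shows that the 50-megapixel output really does beat the 13-megapixel output in detail, and this is what I cannot figure out. A 13-megapixel image sensor only captures so much information, so how can compositing several photos produce more of it? Is the algorithm just guessing?

Added 2015.03.30: a PPT
圖像超解析度技術 (Image Super-Resolution Technology)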

--------------------------------------------------------------------------------------------------------------------------------------------
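This answer is only my personal opinion, has not been rigorously verified, and contains long passages in English.
Before answering, let's look at the review of the Find7 by the Taiwan-based site Mobile01; the original photos can be viewed in its camera section. 拍照有趣閃充犀利！OPPO Find7輕裝版搶先試玩
This answer quotes sample photos from that review.
13-megapixel sample, enlarged

50-megapixel sample, enlarged

13-megapixel sample, enlarged

50-megapixel sample, enlarged

As you can see, the 50-megapixel photos contain considerably more detail than the 13-megapixel ones.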
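-----------------------------------------------Divider--------------------------------------------------
The Find7 has a 13-megapixel camera. To output 50-megapixel photos, it could be using one of the following techniques:
1. Interpolation
2. Moving the color filter in front of the CMOS
3. Some new algorithm
-----------------------------------------------Divider------------------------------------------------------
Below I go through the characteristics and plausibility of each one.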
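1. Interpolation
In a high-pixel-count photo produced by interpolation, every extra pixel is computed, and computation cannot create detail out of nothing. As an example, take a camera with 1.3 physical megapixels that produces 3- and 5-megapixel photos by interpolation. Images from: 一看嚇一跳!插值像素真相大揭秘
1.3 megapixels, 100% crop:

3 megapixels, 100% crop:

5 megapixels, 100% crop:

Look at the character 誕 in 聖誕節 (Christmas). Limited by its physical resolution, the 1.3-megapixel image cannot record all the detail of 誕, and in the interpolated photos, whether at 3 or 5 megapixels, the character stays blurry. So we can tell whether the Find7 uses computational interpolation by checking whether its 50-megapixel output shows more detail than its 13-megapixel output.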
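To make the point about interpolation concrete, here is a minimal 1-D sketch. It assumes a sensor pixel simply averages the fine detail falling on it and that the enlargement is plain linear interpolation; the function names and the stripe scenes are made up for illustration. Two different fine-grained scenes produce the same low-resolution photo, so no interpolation can tell them apart.

```python
import numpy as np

def downsample(signal, factor):
    """Average each block of `factor` samples into one low-res pixel."""
    return signal.reshape(-1, factor).mean(axis=1)

def upsample_linear(low, factor):
    """Linearly interpolate low-res samples back onto the fine grid."""
    n = low.size * factor
    # Centre of each low-res pixel, expressed in fine-grid coordinates.
    centres = (np.arange(low.size) + 0.5) * factor - 0.5
    return np.interp(np.arange(n), centres, low)

# Two different scenes: fine stripes with opposite phase (hypothetical data).
scene_a = np.tile([1.0, 0.0], 8)
scene_b = np.tile([0.0, 1.0], 8)

low_a = downsample(scene_a, 2)   # every pixel averages to 0.5
low_b = downsample(scene_b, 2)   # identical to low_a

up_a = upsample_linear(low_a, 2)
up_b = upsample_linear(low_b, 2)

print(np.allclose(up_a, up_b))      # the two enlargements are identical
print(np.allclose(up_a, scene_a))   # and neither matches its original scene
```

The interpolated result is a function of the low-resolution samples only, which is why the character 誕 stays blurry at any interpolated size.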
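2. Moving the color filter in front of the CMOS
The color filter in front of a CMOS sensor is laid out like this:

Apart from Sigma's Foveon X3 chip, the vast majority of camera sensors on the market use this mosaic-style filter layout. Suppose our camera has 20 megapixels; the actual color filter sites are then
Red: 5 million  Green: 10 million  Blue: 5 million
So how does a 20-megapixel photo come out? By computing (guessing)! With a Bayer filter layout, the color of each pixel is computed (guessed) from the light recorded by the surrounding pixels. This makes the edges of objects less sharp. To counter this weakness, each pixel can be shifted and four photos taken, so that every point gets true RGB information, as in the figure below:

As far as I know, only Hasselblad uses this technique in a camera. The H4D-200MS has 50 physical megapixels and, by moving the color filter, can output 200-megapixel photos. But! This shooting mode requires a tripod, and a single shot takes as long as 30 minutes! The Find7 is a handheld device, so producing high-pixel output this way is impossible. Neither the official marketing nor hands-on reviews require the phone to be held rigidly in place. So the Find7 must be using some other method.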
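The Bayer arithmetic and the four-shot idea can be checked with a small sketch. It assumes an RGGB layout and models the shift as moving the mosaic by exactly one pixel between exposures; `bayer_mask` and the 4x4 tile are illustrative names and sizes, not anything from a real camera pipeline.

```python
import numpy as np

def bayer_mask(h, w):
    """Channel index per pixel for an RGGB mosaic (0 = R, 1 = G, 2 = B)."""
    mask = np.ones((h, w), dtype=int)   # green everywhere by default
    mask[0::2, 0::2] = 0                # red on even rows, even columns
    mask[1::2, 1::2] = 2                # blue on odd rows, odd columns
    return mask

mask = bayer_mask(4, 4)

# Fraction of sites per channel, then scaled to a 20-megapixel sensor.
fractions = np.bincount(mask.ravel(), minlength=3) / mask.size
sites = (fractions * 20_000_000).astype(int)
print(sites)   # red, green, blue site counts

# Four exposures, each with the mosaic shifted by one pixel.
seen = np.zeros((4, 4, 3), dtype=bool)
for dy, dx in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    shifted = np.roll(mask, (dy, dx), axis=(0, 1))
    for channel in range(3):
        seen[..., channel] |= shifted == channel

print(seen.all())   # every pixel has now been measured through R, G and B
```

After the four exposures no color has to be guessed from neighbours, which is the benefit described above. The price is that the shifts must be exact, hence the tripod.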
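3. Other methods
To output a 50-megapixel photo, the Find7 shoots a rapid burst of ten frames and then composites them. This way of generating a high-pixel photo is the same as that of a program called Photoacute, and the results in practice are the same too! Interested readers can download the software and try it themselves; I won't go into it here.
Photoacute result

So by looking at how Photoacute processes images, we can understand what technique the Find7 uses. The software's official website has a Q&A on this resolution-enhancement technique (super-resolution).
Everything below is quoted from the Superresolution FAQ.
It is reproduced in the original English.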

Q: What is super-resolution?

A: Super-resolution is a technique to enhance the resolution of an imaging system. In this FAQ we will refer to the particular type of super-resolution which can improve resolution of digital imaging systems beyond their sensor and optics limits.

Q: So, is it real?

A: It looks like science fiction, but there are solid physical concepts behind the process. To be sure, there are limits to what you can achieve with super-resolution processing, which depend on numerous factors (see "What levels of increased resolution are realistic?" for an in-depth discussion on limits).

Q: Why does it work?

A: For a concise answer on all types of super-resolution, please consult Wikipedia. There you will find a deeper explanation for any particular case of multi-image digital super-resolution: There are two key components in every digital imaging system: the sensor and the lens. There are two different types of image degradation introduced by these two components individually:

Optical blur.
Limit on the highest spatial frequency the given sensor can record.

Optical blur is simply a reduction in amplitude of high spatial-frequency components of the image. It should have been possible to reconstruct a perfect, high-resolution image after optical blur by applying an inverse sharpening. Unfortunately, this is followed by degradations caused by the sensor and simple sharpening is not going to work. The key to super-resolution is the presence of so-called aliased components in the sensor output. These are present due to the fact that the sensor is constructed from a finite number of discrete pixels. These are higher spatial-frequency components than the sensor can handle that should not normally be present in the sensor output. Fortunately, due to imperfect anti-aliasing filters in the imaging system (or the complete lack of them) and due to lower than 100% fill-factor (the percentage of the area that is sensitive to light in each sensor pixel) the aliased components remain in the image. Even the best anti-aliasing filter can only lower these components by some amount but cannot eliminate them completely. Aliased components are typically unwanted in the normal image since they might manifest themselves in a form of moire effect or other unwanted artefacts. Another, photography-specific reason why super-resolution works is that real sensors are composed of Color Filter Arrays (CFAs). CFA can record only a single color at each pixel location. This lowers the upper spatial frequency that can be recorded by the sensor even more. But having multiple, slightly shifted images makes it possible to reconstruct full color at each pixel site.

Q: Aliasing components? Do they really exist?

A: This is a long one. Let us model an "ideal" camera - with ideal lens (no blur, no distortions) and a sensor completely covered by an array of pixels. Every pixel registers a signal proportional to the amount of light it received.

How would such a camera image a target of black-and-white lines, if the width of a line were exactly the same as the dimension of a pixel? The image will be quite different in case all the lines fall exactly to the pixels and in case the lines fall between the pixels:

Luckily, real scenes usually do not have exactly the same structure as the sensor has. To make our model more realistic, we will tilt the lines - so if in some part of the picture the edges of the lines match the edges of the pixels in the sensor, they will not match in other parts. This is how the tilted lines will be imaged by our ideal camera:

The contrast between black and white lines differs from 100% of the original contrast to none. Looks strange already, doesn't it?

What happens if we try to image line pairs of higher frequency? See the pictures below: the lines are visible, but they have different directions, and, moreover, thicker width - that is, lower frequency than in the original!

This is caused by so-called aliasing. The sensor, which is not able to image a pattern of frequency higher than 0.5 cycles/pixel, delivers not only lower contrast but completely wrong pictures. If the scene being imaged has a regular pattern, the artifacts are known as Moiré pattern.

Digital cameras usually have anti-aliasing filters in front of the sensors. Such filters prevent the appearance of aliasing artifacts, simply blurring high-frequency patterns. With the ideal anti-aliasing filter, the patterns shown above would have been imaged as a completely uniform grey field. Fortunately for us, no ideal anti-aliasing filter exists and in a real camera the aliased components are just attenuated to some degree.
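A quick numerical check of the 0.5 cycles/pixel limit described above. This is only a sketch: it assumes idealised point sampling with no lens blur and no anti-aliasing filter, and `dominant_frequency` is a helper written for this example. A pattern at 0.7 cycles/pixel does not disappear; it shows up as a false pattern at 0.3 cycles/pixel.

```python
import numpy as np

def dominant_frequency(samples):
    """Strongest non-DC frequency in the samples, in cycles per pixel."""
    spectrum = np.abs(np.fft.rfft(samples - samples.mean()))
    return np.argmax(spectrum) / samples.size

n_pixels = 100
pixel_index = np.arange(n_pixels)

true_freq = 0.7                                     # above the 0.5 limit
samples = np.cos(2 * np.pi * true_freq * pixel_index)

apparent_freq = dominant_frequency(samples)
print(true_freq, apparent_freq)                     # the sensor "sees" 0.3
```

The aliased 0.3 cycles/pixel component is exactly the kind of residue that multi-frame super-resolution tries to unfold back to its true frequency by using several slightly shifted frames.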

Q: How does it work?

A: The first step is to accurately align individual low-resolution images with sub-pixel precision.

After the images are aligned, a number of techniques are possible, both iterative and non-iterative, complex or simple, slow or fast. What is common in all of the techniques is that information encapsulated in the aliased components is used to recover spatial frequencies beyond sensor resolution and a de-blurring is used to reverse degradation caused by the optical system.

Of course, the real reconstruction process is much more complex due to the presence of at least the following phenomena:

Sensor noise. The noise itself degrades the image quality, but most importantly it reduces the ability to recover and separate aliased components that are low in amplitude and typically buried under noise floor.
Uncertainty in real registration offsets of individual images. Since the precise camera position and orientation in space is not known during super-resolution processing, it has to be estimated from the low resolution scenes themselves, which introduces errors.
Diffraction limit. It is said that the optical system has fundamental limits on resolution where two close subjects cannot be resolved one from another. There are methods that allow breaking this limit as well under certain assumptions (see Wikipedia).

Q: What levels of increased resolution are realistic?

A: It is highly variable depending on the optical system, exposure conditions and what post-processing is applied. As a rule of thumb, you can expect an increase of 2x effective resolution from a real-life average system (see MTF measurements) using our methods. We've seen up to a 4x increase in some cases. You can get even higher results under controlled laboratory conditions, but that's only of theoretical interest.

Q: Any suggestions for more scientific reading?

A: There are lots of good papers available on the internet; here are just two of them to start:

One of the first papers on super-resolution which seemed to inspire some of the modern methods:

Michal Irani and Shmuel Peleg, "Super Resolution From Image Sequences", ICPR, 2:115--120, June 1990.

A paper from Microsoft Research that attempts to estimate the practical limits of super-resolution. The scope of this paper is limited to a particular subclass of linear-only, reconstruction-based super-resolution algorithms. In any case, the obtained bounds do correlate well with the practical results (top limit is ~5x under ideal conditions, ~2x in real life).

Zhouchen Lin and Heung-Yeung Shum, "Fundamental Limits of Reconstruction-Based Superresolution Algorithms under Local Translation"

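Warning: lots of images.

Let me first answer the "pixel shift" part of the question directly.

Pixel shift refers to the displacement of the image signal on the sensor that results from physical displacement of the device while shooting the same scene.

OK, let's look at the figures:

Call the figure above the target image, i.e. the high resolution image. The yellow grid is how the high-resolution image ideally divides up the scene. To obtain this particular high-resolution image, there should be a measurement at every intersection of the grid. The blue dots on it are those measurements. An image generated from these measurements meets the requirements of the high-resolution image, i.e. the target image.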

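But!!! Things are rarely that simple!
For all sorts of reasons, our device cannot reach the measurement precision that the grid above demands. (There are many possible reasons: not enough money for a high-precision sensor, or simply a willful desire to go beyond the sensor's design limit and get a higher-resolution image. The phone in the question harbors exactly this bold ambition.)

What our device actually measures usually looks like this:

Call the figure above Measurement 1. It comes from our actual sensor. Because the sensor's maximum precision falls short of the grid's, its measurements are as shown: only 1/4 of the grid intersections have a measurement.

This, is absolutely, not, what, we want.
Our goal: a measurement at every intersection of the yellow grid!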

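So, under ideal conditions, you hold the phone in the same pose and at the same angle, point it at the same scene, take another photo, and get:

Call this photo Measurement 2. It turns out human muscles are not as steady as we imagine; even with a tripod, the measurements in this photo are no longer at the earlier positions (compare the green measurements in Measurement 2 with the blue measurements in Measurement 1). The green measurements still have the same measurement precision as the blue ones in Measurement 1.

Then you think: why not, take, a few more.
(That's just what you thought, no reason needed. There's no film to pay for, so go wild.)

So, under ideal conditions, you once again hold the phone in the same pose and at the same angle, point it at the same scene, take a photo, and get:

Right, call it Measurement 3. The dark-blue measurements in Measurement 3 sit at positions different from those in both Measurement 1 and Measurement 2.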

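Then you take one more:

Call it Measurement 4. The black measurements in Measurement 4 sit at positions different from those in Measurements 1, 2 and 3.

Then, facing these four photos, we sink into deep thought.

In a flash!
Tada!

We stack Measurement 1, Measurement 2, Measurement 3 and Measurement 4 on top of one another, and find that together they cover every intersection of the yellow grid.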

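So, with a camera that has a low-resolution sensor, we shoot the same scene four times:
the second shot is shifted one pixel to the right relative to the first;
the third shot is shifted one pixel down relative to the first;
the fourth shot is shifted one pixel to the right and one pixel down relative to the first.
(Here "one pixel" means one cell of the yellow high-resolution grid, i.e. half the spacing of the sensor's own samples.)
Stacking these four images gives us the high-resolution target image.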
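The four-shot stacking above can be sketched in a few lines. This is the idealised case only: it assumes each sensor sample is a point measurement on the fine grid and that the four offsets are known and exact. `capture` and `interleave` are names invented for this sketch.

```python
import numpy as np

def capture(scene, dy, dx):
    """One low-res shot: every second fine-grid point, starting at (dy, dx)."""
    return scene[dy::2, dx::2]

def interleave(shots):
    """Place four shots, keyed by their (dy, dx) offset, back on the fine grid."""
    h, w = shots[(0, 0)].shape
    full = np.empty((2 * h, 2 * w), dtype=shots[(0, 0)].dtype)
    for (dy, dx), image in shots.items():
        full[dy::2, dx::2] = image
    return full

# Hypothetical 8x8 "target image" on the yellow grid.
scene = np.arange(64, dtype=float).reshape(8, 8)

# Measurements 1 to 4: no shift, right, down, right and down.
offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]
shots = {offset: capture(scene, *offset) for offset in offsets}

recovered = interleave(shots)
print(shots[(0, 0)].shape, recovered.shape)   # (4, 4) (8, 8)
print(np.array_equal(recovered, scene))       # every grid point is covered
```

Each shot alone holds a quarter of the grid points; only the combination of all four covers the grid.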

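OK, that is the core idea of the conventional approach to super-resolution: use multiple low-resolution images of the same scene to generate a high-resolution image.

As the question puts it:
"A 13-megapixel image sensor only captures so much information, so how can compositing several photos produce more of it? Is the algorithm just guessing?"

Precisely because a low-resolution sensor can measure only so much information, the way to get more is to measure several times.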

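But!! How do you shift by exactly one pixel?!

If we shoot with the phone in hand, what we get will undoubtedly look like this. In this example the four sets of measurements are mixed together; the resolution of each individual measurement is unchanged, but the samples are not neatly lined up on the grid.

In this situation, if we ignore blur, noise and geometric transformation, image interpolation can be used to estimate the values at the intersections of the yellow grid.

There are many kinds of interpolation, so I won't go into them here.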
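Here is a minimal 1-D sketch of that idea, under strong assumptions: the scene is a made-up periodic signal, the hand-shake offsets are known exactly (in practice they must be estimated), there is no blur or noise, and the interpolation is plain linear interpolation via `np.interp`. It compares enlarging one frame against pooling four irregularly shifted frames.

```python
import numpy as np

def scene(t):
    """Hypothetical continuous 1-D scene with coarse and fine detail (period 1)."""
    return np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 11 * t)

n_low = 16                           # samples per low-res frame
fine_t = np.arange(64) / 64          # the "yellow grid", 4x denser
offsets = [0.0, 0.21, 0.47, 0.78]    # assumed shake, in low-res pixels

# Each frame samples the scene at its own shifted positions.
frames = []
for offset in offsets:
    t = (np.arange(n_low) + offset) / n_low
    frames.append((t, scene(t)))

def rms_error(estimate):
    return float(np.sqrt(np.mean((estimate - scene(fine_t)) ** 2)))

# Enlarging a single frame by interpolation.
t0, y0 = frames[0]
single = np.interp(fine_t, t0, y0, period=1.0)

# Pooling all frames, then interpolating onto the fine grid.
all_t = np.concatenate([t for t, _ in frames])
all_y = np.concatenate([y for _, y in frames])
merged = np.interp(fine_t, all_t, all_y, period=1.0)

err_single = rms_error(single)
err_merged = rms_error(merged)
print(err_single, err_merged)        # the pooled estimate is much closer
```

The fine 11-cycle detail is beyond what 16 samples can represent, so the single frame gets it wrong, while the pooled samples are dense enough to follow it.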

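Back to the question: this phone uses multi-shot capture, combined with "pixel shift" (quite an apt name), to obtain digital images whose resolution is a multiple of the physical limit of its image sensor.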

－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－
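Next.... please allow me.... to take this chance to classify super-resolution techniques. [Don't blame me if I classify them wrongly]

1 Conventional methods: use multiple low-resolution images of the same scene and reconstruct the image according to a known or measured observation model. Put simply, measure several times and then reconstruct. But reconstruction is not as simple as in the example above: one has to consider whether the images have been blurred, had noise added, or undergone a geometric transformation.

2 Image interpolation: simply enlarging the image. Resizing an image in Word is simple interpolation.

3 Machine-learning-based: reconstruct a high-resolution image from a single low-resolution image, combined with a dictionary obtained in advance. This direction has remained an active research area.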

－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－
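Now for some formal definitions.

I put these together myself; if anything is wrong, please do correct me. Thanks~

Super resolution is an image reconstruction technique that can compensate for insufficient sensor resolution. Put simply, it means "taking the image signal received by a low-resolution sensor and, through a series of algorithms, obtaining a high-resolution image signal". (Referred to below as "this process".)

The "series of algorithms" mentioned above is what super resolution research focuses on.

To understand this "series of algorithms", we first need to pin down the mathematical model of "this process".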

Y = DHF*X + V

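Y: the low-resolution image, i.e. the measurement
D: the downsampling operation (downsample)
H: the blur operation (optical blur)
F: the geometric transformation operation (geometric transformation)
X: the high-resolution image, i.e. the target
V: noise

The "series of algorithms" exists to solve the equation above.
Known: Y, D (the size ratio between the measurement and the target image)
Goal: X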
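To make the model Y = DHF*X + V tangible, here is a small forward simulation. It is only a sketch under simplifying assumptions that are mine, not part of the model: F is an integer translation with wrap-around, H is a 3x3 box blur, D keeps every second sample, and V is Gaussian noise. All function names are made up for the example.

```python
import numpy as np

def warp(x, dy, dx):
    """F: geometric transformation, here just an integer translation."""
    return np.roll(x, (dy, dx), axis=(0, 1))

def blur(x):
    """H: optical blur, modelled as a 3x3 box filter with wrap-around edges."""
    acc = np.zeros(x.shape, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            acc += np.roll(x, (dy, dx), axis=(0, 1))
    return acc / 9.0

def downsample(x, factor):
    """D: keep every `factor`-th sample in each direction."""
    return x[::factor, ::factor]

def observe(x, dy, dx, factor, noise_sigma, rng):
    """One low-resolution frame: Y = D H F X + V."""
    clean = downsample(blur(warp(x, dy, dx)), factor)
    noise = rng.normal(0.0, noise_sigma, size=clean.shape)   # V
    return clean + noise

rng = np.random.default_rng(0)
X = rng.random((8, 8))                      # hypothetical high-res scene
Y = observe(X, dy=1, dx=0, factor=2, noise_sigma=0.01, rng=rng)
print(X.shape, Y.shape)                     # (8, 8) (4, 4)
```

Super-resolution runs this in reverse: given several frames Y taken with different F, it looks for the X that best explains all of them.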

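D, H, F and V are known in some applications and unknown in others.
H:
Known: the optical blur model is known from the physical characteristics of the device; some devices ship with a blur model.
Unknown: the blur model can be computed from the point spread function.

F:
Known: the geometric relationship between the device and the scene is known, for example the model of a space telescope's position relative to a celestial body, whether that involves rotation, translation or perspective.
Unknown: it can be estimated with an image registration algorithm (this overlaps with computer vision).

V:
Known: the model of the noise superimposed on the observations is known, for example when the measured signal suffers interference from noise at a specific frequency.
Unknown: it can be estimated using image-processing denoising techniques.
On noise: typical super-resolution techniques assume the measurements are noise-free, i.e. they ignore the V term, or they first denoise the noisy low-resolution images and only then apply super-resolution. This shortcoming has been addressed in several of the newer type-3 super-resolution techniques.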
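For the case where F is unknown, one standard registration method is phase correlation. The sketch below is my own choice of method for illustration, not something any of the sources above prescribe. It assumes F is a pure translation by whole pixels with wrap-around; real super-resolution needs sub-pixel accuracy, which this sketch does not attempt. `estimate_shift` is a name made up for the example.

```python
import numpy as np

def estimate_shift(reference, moved):
    """Integer (dy, dx) such that moved == np.roll(reference, (dy, dx))."""
    cross = np.fft.fft2(moved) * np.conj(np.fft.fft2(reference))
    cross /= np.abs(cross) + 1e-12          # keep only the phase
    corr = np.fft.ifft2(cross).real         # peak sits at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    # Indices past the midpoint correspond to negative shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)

rng = np.random.default_rng(1)
reference = rng.random((32, 32))                     # hypothetical frame
moved = np.roll(reference, (3, -5), axis=(0, 1))     # same scene, shifted

shift = estimate_shift(reference, moved)
print(shift)                                         # (3, -5)
```

Once every frame's shift relative to the first is estimated, the frames can be placed on a common fine grid as in the earlier examples.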
－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－
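It just occurred to me!
In the movies, the police officer keeps zooming into the surveillance image, and those pixels magically become sharper and sharper. Then the sidekicks standing around say: "Officer! That's him! I know him!!"

This example vividly shows how important super-resolution is (not really..

The third type of technique illustrates this example nicely. Here's a figure:

[Jianchao Yang, J. Wright, T.S. Huang, Yi Ma, "Image Super-Resolution Via Sparse Representation", IEEE Transactions on Image Processing, vol. 19, no. 11, 2010]

On the left is the low-resolution input, a single image. On the right is the output obtained with a type-3 super-resolution technique.

Marvelous (tsk tsk tsk tsk.....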

－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－
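[Please message me before reposting]
[Been slacking off for so long]
[First time answering]
[Still can't calm down]
[Mwah~]
[Images from the course slides Image Super-Resolution @ PSU]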

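I'm also quite interested in this technology and think it is a direction that truly suits phone imaging systems. In the [科技美學] oppo find7 video review I ran experiments on how real and how practical it is.

In the video's own words: "After experimenting, we believe OPPO is not doing what we had previously guessed, namely compositing ten photos of the same pixel count. In fact, its capture process takes the first photo at full pixel count and then supplements it with pixels from the subsequent photos to composite the complete picture. That is the secret of the 50 megapixels."

The video is rather long, so I won't repeat it here. In short, I think this kind of imaging technology built on post-processing has great potential.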

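My humble opinion:
1. Not long after I learned about the find7's multi-frame compositing, I found that Olympus digital cameras have a similar technique, which makes me wonder whether the camera world is influencing the phone world or the other way round.
2. Soon afterwards I saw a similar technique in a book about DSLR cameras. It is the algorithm DSLRs used many years ago for night scenes: shoot 5 photos in a rapid burst, then let the ISP compute at high speed, combine the best parts of each photo, and output the best result.
3. The principle is a bit like HDR compositing, except that HDR deliberately sets three different exposure values before compositing. In both cases the aim is to pick, from each photo, the parts that best meet the user's needs.
4. OPPO's approach, as Mr. 那岩 of 科技美學 said, takes the first photo at full pixel count and fills it in from the following 9 photos wherever a part is better than in the first. As the experiment in the video showed, if the shake is too large the following 9 photos are all discarded, unlike HDR compositing, which has to take some part from each frame.
5. In fact the secret of the 50 megapixels is whether the original 13 megapixels can be fully recovered. Resolving power at the edges matters a lot, so most of the time the following 9 photos are probably restoring the areas near the edge of the lens, where detail loss is worst. In other words, achieving the kind of true 8 megapixels that Apple delivers.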

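It really is computed by an algorithm.

At the very least it has nothing to do with pixel shift.

Lately I've been playing with the Olympus em5 II, which is a typical case of pixel-shift multi-frame compositing into 40 megapixels.

But no matter how fast your shutter speed is, it takes at least 5 seconds from the end of shooting to the end of compositing, and if the subject moves even slightly the whole thing is wasted.

That 50-megapixel mode of oppo's is really just a toy. The improvement in detail is 10 to 20 percent at most.

Take a look at Section 10.3 of Szeliski's CV book? This answer is already very detailed!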