How should one understand compressive sensing?

For example, in image restoration: does compressed sensing mean sampling an image with a random pattern and then recovering the original image with an algorithm? Are there any requirements on the random pattern? Also, could one randomly sample the image several times and then recover it with something like multi-frame blind deconvolution?

Hoping an expert can give a primer~


I don't work on compressed sensing myself, but this semester I have been auditing a related seminar in the math department, so let me throw out a few thoughts to get the discussion going. For that reason I find it hard to explain things in very simple language, so I will assume the asker and other readers have studied Fourier/wavelet analysis and the basics of L1 recovery. An answer that a layperson could follow would be very welcome too!

=====================================

Often a signal is too complex and too "long", so we want to acquire only part of the information by sampling and then reconstruct the signal as faithfully as possible from that partial information. Mathematically, a sampling matrix is applied to the unknown signal, and we try to recover the signal from the sampled result. Compressed sensing targets sparse signals in particular (more often, signals that are sparse after some transform) and uses L1 recovery to reconstruct them. Several questions need answering: how many samples (i.e. how many matrix rows) do we need at a minimum? What properties must the matrix have? How do we design better matrices?

The first general results relied on random matrices:

1. Sub-gaussian matrices satisfy the RIP (which you can think of as the map roughly preserving the length of sparse vectors, with very little "stretching");

2. If I want an m-by-N matrix satisfying the RIP, to be used for L1 recovery, and I want to recover arbitrary sparse signals with high probability, then m is roughly of order O(u·N·log N). Of course I am omitting many qualifications and constants.
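To make this concrete, here is a minimal sketch of the kind of experiment these results describe (my own illustration, not part of the original answer), assuming Python with numpy and scipy; the sizes N, m, k are arbitrary. Basis pursuit, min ||x||_1 subject to Ax = y, is rewritten as a linear program with x = u - v and u, v >= 0.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N, m, k = 200, 60, 8                            # signal length, number of samples, sparsity

x_true = np.zeros(N)                            # a k-sparse "signal"
x_true[rng.choice(N, k, replace=False)] = rng.standard_normal(k)

A = rng.standard_normal((m, N)) / np.sqrt(m)    # sub-gaussian (here Gaussian) sensing matrix
y = A @ x_true                                  # the m samples

# Basis pursuit: min ||x||_1  s.t.  A x = y, as an LP over x = u - v with u, v >= 0
c = np.ones(2 * N)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y,
              bounds=[(0, None)] * (2 * N), method="highs")
x_hat = res.x[:N] - res.x[N:]
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))   # essentially zero
```

For Gaussian matrices, m on the order of k·log(N/k) samples suffices for this kind of exact recovery, which is the flavor of the bound quoted above.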

You may have noticed some important points in the description above.

One is "arbitrary sparse signals": in practice many such signals are physically meaningless, and we do not need a matrix this strong, one that guarantees recovery of every possible signal.

The other is the factor u in that sample-count estimate (in papers it is written μ; I changed the notation for convenience). It is called the (in)coherence and describes how correlated the sampling matrix is with the sparsity basis. Roughly, the larger this coherence factor is, the more samples m you need. The problem is that for some sub-gaussian matrices the coherence coefficient equals 1, which would mean we need more samples than the full signal length, and that is obviously impractical.

Finally, many of the real-world signals we care about (images, video, and so on) do not look sparse at all.

How are these issues handled in engineering practice?

First, change the representation space. Images are usually sparse under wavelet / curvelet / shearlet representations. If we can reconstruct those sparse wavelet coefficients, recovering the image is just an inverse transform. In practice (especially in MRI), the raw signal you get has already passed through a Fourier transform, so you can think of the overall sampling matrix as a genuine row-sampling matrix, right-multiplied by the Fourier transform matrix, and right-multiplied again by the inverse wavelet transform.
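As a rough illustration of that composition (row sampling on the left, Fourier transform, then inverse wavelet transform), here is a sketch assuming a one-level Haar transform as the sparsifying step and a unitary DFT standing in for the scanner's Fourier encoding; the function names and sizes are my own and purely illustrative.

```python
import numpy as np

def haar_inverse(coeffs):
    """One-level inverse 1-D Haar transform: coeffs = [approximation, detail] -> signal."""
    n = coeffs.size // 2
    a, d = coeffs[:n], coeffs[n:]
    x = np.empty(2 * n)
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def measure(coeffs, sample_idx):
    """Row-sampled Fourier measurements of a signal given by its wavelet coefficients:
    y = S F W^{-1} a, with S = row selection, F = DFT, W^{-1} = inverse wavelet."""
    x = haar_inverse(coeffs)                    # back to the image/signal domain
    X = np.fft.fft(x) / np.sqrt(x.size)         # unitary DFT: the "raw signal" an MRI scanner sees
    return X[sample_idx]                        # keep only the sampled Fourier rows

# toy usage: a signal that is sparse in the Haar domain, observed through 20 Fourier rows
N = 64
a = np.zeros(N); a[3] = 1.0; a[40] = -0.5
idx = np.sort(np.random.default_rng(1).choice(N, 20, replace=False))
y = measure(a, idx)
print(y.shape)                                  # (20,) complex measurements
```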

Why wavelets? Because the images you see carry strong structural information. If you ignore that structure and sample arbitrarily, you naturally need many samples. Mathematically this shows up in the incoherence: note that in the result above, the incoherence is measured on the whole sampling matrix. To address this, one asks: can we tailor the sampling strategy to the structural information?

The answer is yes, and it is hidden in the properties of the wavelet coefficients; in the literature this is called multi-level sampling. Note that this is not the repeated sampling you asked about; a single pass is enough.

Statistics over large collections of images show that wavelet coefficients are asymptotically sparse: the first portion is dense, and the coefficients get sparser and sparser further along! In other words, you can cut the wavelet coefficient vector into several levels, and each successive segment is sparser than the previous one. You can therefore design a block-diagonal transform matrix matching these levels and, when you finally sample (multiplying on the left by a genuine row-sampling matrix), tailor the strategy to each level. Mathematically, the incoherence is then measured block by block, and the original theorem, suitably generalized, applies to each block (with different constants, of course). To summarize the strategy: you put a large share of the samples on the dense early part, and because the later parts are sparser and sparser, you need fewer and fewer samples there. Compared with an unstructured scheme using the same number of samples, this deliberately targets the most important part, so the reconstruction is naturally much better.
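A toy sketch of the level-by-level budget just described (my own illustration; the level sizes and sampling fractions below are made-up numbers, not values from the multi-level sampling literature):

```python
import numpy as np

def multilevel_mask(level_sizes, level_fractions, rng):
    """Build a 0/1 sampling mask over wavelet coefficients, level by level.
    level_sizes:     number of coefficients in each level (coarse -> fine)
    level_fractions: fraction of each level to sample (dense early, sparse late)"""
    mask = []
    for size, frac in zip(level_sizes, level_fractions):
        m = max(1, int(round(frac * size)))
        picked = np.zeros(size, dtype=bool)
        picked[rng.choice(size, m, replace=False)] = True
        mask.append(picked)
    return np.concatenate(mask)

rng = np.random.default_rng(0)
sizes     = [64, 64, 128, 256, 512]              # a 1024-coefficient vector cut into 5 levels
fractions = [1.0, 0.8, 0.4, 0.15, 0.05]          # asymptotic sparsity: sample less at finer levels
mask = multilevel_mask(sizes, fractions, rng)
print(int(mask.sum()), "samples out of", mask.size)   # about 230 out of 1024
```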

If you can roughly follow the above, you will see that random matrices are not actually that essential. Random matrices exist to guarantee recovery of arbitrary sparse signals, or arbitrary signals that are sparse within each level, and in practice we do not need such a strong condition. For example, if you know the structure of each level, then even equispaced sampling within each level works reasonably well on most images. Of course, each individual image has its own optimal sampling strategy, but since we do not know the image in advance, we can only pick a near-optimal scheme based on statistics gathered over many past images.

All in all, compressed sensing is a process of designing a compression scheme. Although its theory is built on random matrices, in engineering we generally use deterministic designs (in the MRI case I mentioned, for instance, the sampling pattern is usually a set of radial lines). As for why so much of the theory is based on random matrices: at least in my view, the early general results needed a matrix that satisfies the RIP and for which the RIP is relatively easy to prove. Sparse recovery itself is also an interesting research direction; L1 methods have been studied extensively, and there is now work on other approaches, for example ones based on total variation.

========================

I hope this answer helps.


To reduce the number of "triple-zero" Zhihu accounts by one, let me take a crack at this question. Haha!!!

First, we must recognize that the "compressed" in CS (Compressed Sensing) is not the "compression" of classical information theory and rate-distortion theory. In CS, "compressed" more accurately describes a dimensionality-reducing sampling process, not compression in the source-coding sense. In CS we have no information at all about the original signal in the pixel domain; we only have information in the measurement domain, a point that is often overlooked.

Second, we should understand what CS theory is actually good for; its practical significance is often overlooked. The most important value of CS is that it can greatly reduce the complexity of the acquisition side: with a low sampling rate at the acquisition end, the receiver can still reconstruct the original signal with fairly high probability. As for applications: in medical imaging, CS lets us take fewer measurements of the affected region and thus reduce the side effect of examinations such as CT and X-ray, namely radiation; or, if an acquisition device sits in a remote, harsh environment, its power supply becomes a problem, and lowering the complexity of the acquisition hardware reduces its energy consumption. In such settings CS theory has very real practical value.

Compressed sensing rests on a few seemingly mundane but crucial theoretical points:

  • When compressed sensing was first proposed, it was for a sparse signal x: given the measurement model y = Φ*x, what kind of Φ is needed, and by what means can x be recovered from y? (PS: a sparse signal is a signal x in which the number of nonzero entries is far smaller than the number of zero entries.)

  • However, many signals are not sparse in themselves, image signals for example. In that case the signal can be projected into another space by an orthonormal transform Ψ, and in that space a = Ψ*x (the analysis model) becomes sparse. We can then recover the original signal x from the model y = Φ*a, i.e. y = Φ*Ψ*x.
  • Later, people realized that sparsity need not come from an orthonormal transform: one can also use a dictionary D, with x = D*a (the synthesis model), where a is sparse; to strengthen the sparsity of the transformed signal, D is usually overcomplete. The model becomes y = Φ*x = Φ*D*a, and A^{CS} = Φ*D is called the sensing matrix. This is the model most commonly used today.

When D is invertible, the last two models are equivalent. (The commonly used DCT, wavelet, and dual-tree wavelet transforms all satisfy this condition.)

The first model only requires the RIP of Φ. The second model is usually of theoretical interest only, because we normally do not apply any sparsifying transform at the sampling end. The third model requires not only the RIP of Φ but also that Φ and D be highly incoherent.
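For concreteness, a small sketch of the three models, assuming a DCT as the orthonormal transform Ψ and a spike-plus-DCT concatenation as the overcomplete dictionary D (both are my own illustrative choices):

```python
import numpy as np
from scipy.fft import dct

rng = np.random.default_rng(0)
N, m = 128, 48
Phi = rng.standard_normal((m, N)) / np.sqrt(m)      # measurement matrix

# Model 1: x itself is sparse, y = Phi x
x1 = np.zeros(N); x1[[5, 70, 99]] = [1.0, -2.0, 0.5]
y1 = Phi @ x1

# Model 2 (analysis): a = Psi x is sparse under an orthonormal transform Psi (here a DCT)
Psi = dct(np.eye(N), norm="ortho", axis=0)          # Psi @ v == dct(v, norm="ortho")
x2 = Psi.T @ x1                                     # a signal whose DCT coefficients are x1 (sparse)
y2 = Phi @ (Psi @ x2)                               # = Phi a: measuring the coefficients themselves

# Model 3 (synthesis): x = D a with an overcomplete dictionary D; sensing matrix A_CS = Phi D
D = np.hstack([np.eye(N), Psi.T])                   # spike + DCT atoms, 2N columns
a3 = np.zeros(2 * N); a3[[10, 200]] = [1.0, -1.0]
x3 = D @ a3
A_CS = Phi @ D
y3 = A_CS @ a3                                      # identical to Phi @ x3
print(y1.shape, y2.shape, y3.shape, A_CS.shape)     # (48,) (48,) (48,) (48, 256)
```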

Current reconstruction algorithms for compressed sensing fall into the following classes:

  1. The BP and BPDN models, solved efficiently with linear and second-order cone programming; such convex programs, however, are computationally expensive for high-dimensional images and video. Gradient-descent-based solvers for BPDN are faster in practice than interior-point methods; examples include iterative splitting and thresholding (IST), sparse reconstruction via separable approximation (SpaRSA), and gradient projection for sparse reconstruction (GPSR);

  2. Greedy algorithms, including matching pursuit, orthogonal matching pursuit (OMP), and compressive sampling matching pursuit (CoSaMP). In practice these reduce the computational cost considerably, at the price of lower reconstruction quality (a minimal OMP sketch follows this list);

  3. Iterative thresholding is an alternative to the greedy algorithms; it builds the reconstruction through successive projections and thresholding steps. This is a special case of the projected Landweber (PL) algorithm; it not only lowers the computational cost but can also incorporate additional optimization criteria, such as Wiener filtering, to improve the recovered image quality;

  4. Bayesian algorithms that exploit image structure; I know rather little about this class of reconstruction algorithms...
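As promised in item 2, here is a minimal OMP sketch (my own simplified version, assuming numpy; practical implementations add stopping rules and numerical safeguards):

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: greedily select k columns of A to explain y."""
    residual = y.copy()
    support = []
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))   # column most correlated with the residual
        if j not in support:
            support.append(j)
        # least-squares fit of y on the selected columns, then update the residual
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = coef
    return x_hat

# toy usage
rng = np.random.default_rng(0)
m, N, k = 40, 120, 5
A = rng.standard_normal((m, N)) / np.sqrt(m)
x = np.zeros(N); x[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
print(np.linalg.norm(omp(A, A @ x, k) - x))          # typically ~0 in this regime
```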

The general compressed-sensing pipeline looks like this.

Since image applications come up a lot in compressed sensing, I will explain with X being an image signal.

Because an image contains a large amount of data, it is usually sampled and reconstructed column by column or block by block. The model above assumes column-wise processing.

  1. Raster-scan to obtain one column of the image signal;
  2. Apply the measurement matrix to obtain the measurement-domain signal;
  3. Transmit it to the reconstruction end;
  4. Run a reconstruction algorithm to recover the sparse signal;
  5. Project the recovered sparse column back (inverse raster) into its position;
  6. Once all columns are recovered, apply the inverse sparsifying transform to obtain the original signal.

The Rice University single-pixel camera is a classic application. It works like this:

  1. The image is projected onto the DMD through a lens;
  2. All the micromirrors on the DMD are put into pseudo-random state 1; their states form the first row of the measurement matrix;
  3. The reflected light converges on the single-point sensor, i.e. it adds up, and this observation gives the first entry of the measurement vector;
  4. Change the mirror states and repeat the steps above M times; this builds the measurement matrix Φ, equivalent to y = Φ*x;
  5. Reconstruction is then a purely mathematical problem: the algorithms listed above recover the original image (a toy simulation of this measurement loop is sketched below).
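A toy simulation of that measurement loop (my own illustration, with made-up sizes and Bernoulli 0/1 patterns standing in for the mirror states; recovery would then use any of the algorithms listed earlier):

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 16, 16                        # a tiny "image"
x = rng.random(H * W)                # the flattened scene hitting the DMD
M = 100                              # number of mirror patterns, i.e. measurements

Phi = np.empty((M, H * W))
y = np.empty(M)
for i in range(M):
    pattern = rng.integers(0, 2, H * W)   # pseudo-random 0/1 mirror states = one row of Phi
    Phi[i] = pattern
    # all reflected light adds up on the single photodiode: one inner product per exposure
    y[i] = pattern @ x

assert np.allclose(y, Phi @ x)       # the whole loop is equivalent to y = Phi x
```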

Sample randomly multiple times? I don't see the need. First, the reconstruction quality achievable from a single random sampling is already determined; second, repeated random sampling increases the complexity of the acquisition end; third, how would you even implement repeated sampling in practice?

PS: I am not familiar with multi-frame blind deconvolution.

However, at the reconstruction end one can use a multi-hypothesis, overlapping block-based reconstruction algorithm, which noticeably improves the quality of the recovered image.

That's all.


The answers above either dwell too much on details or are not entirely accurate, so let me give a short overview. Although I do not work in this area myself, quite a few of my colleagues do and I have heard many talks on it, so I believe my explanation is fairly reliable.

Compressed sensing rests on one big premise: the signal to be recovered admits a sparse representation, and we may even assume that it is the sparsest among all possible representations.

Under this premise, we can measure the data with fewer measurements than the usual theoretical requirement; concretely, this amounts to solving Ax = b, where A has more, possibly far more, columns than rows.

To pin down a unique x, we use the premise and perform l_0 minimization over all feasible x. This is itself an NP-hard problem, but it was observed that in many cases l_1 minimization gives the same result as l_0 minimization, and the former can at least be computed by some methods.

Terence Tao and his collaborators characterized when the two coincide, for instance when A satisfies the RIP or the NSP. These properties are themselves impossible or very hard to verify, so the current approach is to construct a random matrix and prove that, as long as it has enough rows, it has these properties with high probability. How to construct such matrices efficiently is also a core question in compressed sensing.

As for its advantages, there are mainly two: it can (greatly) reduce the number of measurements, and it can produce solutions that are more accurate than the l_2-sense reconstruction, which for images and signals means sharper results.


I don't work on imaging, but I'd still like to share my own understanding of compressed sensing, based on taking Donoho's course. Compressed sensing asks how to solve an underdetermined system. Suppose we have a linear system y = Ax with y ∈ R^n, x ∈ R^N, A ∈ R^{n×N}. The system is underdetermined because n < N: for a given y there are many x that satisfy it. The main goal of compressed sensing is to identify conditions under which x can be uniquely solved for. There are three basic assumptions (the fundamental theorem of sparse solutions): 1. the columns of A are in general position (any 2k of them are linearly independent); 2. x has only k nonzero entries; 3. n ≥ 2k + 1. If these three assumptions hold, the unique solution can be found by solving the l_0 problem, i.e. min ||x||_0 s.t. y = Ax.
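Because the sizes in this statement can be made tiny, the l_0 criterion can be checked by brute force on a toy problem (my own illustration; this only works for very small N and k, which is exactly why practical methods replace l_0 by l_1 or greedy search):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
N, k = 10, 2
n = 2 * k + 1                                  # 5 measurements

A = rng.standard_normal((n, N))                # generic A: any 2k columns are linearly independent
x = np.zeros(N); x[[2, 7]] = [1.5, -0.7]
y = A @ x

# brute-force l0: try every support of size <= k and keep the exact fits
solutions = []
for size in range(1, k + 1):
    for S in combinations(range(N), size):
        cols = list(S)
        coef, *_ = np.linalg.lstsq(A[:, cols], y, rcond=None)
        if np.linalg.norm(A[:, cols] @ coef - y) < 1e-8:
            solutions.append((cols, coef))

print(solutions)                               # only the true support [2, 7] survives
```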

What I described is only the earliest and most basic form of compressed sensing. Many branches have since grown out of it, with different solvers and different constraints; some work focuses on how the solutions differ under different sparsity levels. On the application side it has also spread from images and signals to many other areas, such as networks.

A classic application of compressed sensing is MRI. An MRI scan can take a long time because of the repeated scanning, so people proposed reconstructing the full MRI image from a small number of measurements. Statistically this application already has many good results, but in medical schools many people still resist the technique. Many doctors argue that an image reconstructed by compressed sensing may contain errors: what if a patient who actually has a brain tumor is misdiagnosed because of that, or a healthy patient shows a shadow caused by a reconstruction error? What this discussion impressed on me most is that, as an engineer, one should not only focus on the technology itself but also consider, when promoting it, its impact on other groups and their concerns (which seems to be off topic...).


This is my first long post.

Disclaimer: please credit the source if you repost!!!!! (As if anyone would repost it ╮(╯3╰)╭)

Quite a few answers above seem to get the CS theory somewhat wrong, especially the part about incoherent measurement.

First, a little theory. Compressed Sensing (CS) rests on two main principles: sparse representation and incoherent measurement.

Sparse representation is the simpler one, and many answers above have mentioned it. The idea is that if the signal is sparse, most of its values are zero, so we can throw those away and record only the nonzero values. In other words, the sparse-representation requirement in compressed sensing means that the target signal must be compressible. What if the signal is not sparse in the time or spatial domain? Try transforms such as the DFT or the DWT.

Sparse representation is great, of course; the idea behind many mainstream compression methods is precisely to sparsify the signal with some transform and then record the nonzero values and their positions, which makes recovery easy on the other end. But there is a big problem: those compression methods assume the complete original signal is known, and in CS we do not know it. So how do we know which positions carry nonzero values? The natural thought is: just sample, and then we'll know, right? Well, first look at a figure.

It showed the DFT magnitudes of a sinusoid, 256 points long: only 2 of the 256 values are nonzero. Without knowing the original signal, we would like to acquire those two values and their positions. How can we do that? The answer is... we! can! not! If we sample directly in this sparse basis without any processing, we will inevitably pick up lots of zeros, which is exactly what we do not want: those samples carry no information, and there is no way to recover the signal from a pile of zeros. What now? This is where incoherent measurement comes in.

Incoherent measurement means that the sampling matrix Φ we use to acquire the signal is incoherent with the basis Ψ used to represent the signal sparsely; that is, Φ cannot itself sparsely represent the target signal in Ψ. Then Φ spreads those two nonzero components across the measurements, so every value we acquire contains a bit of information about the whole signal, and we then have a chance of finding the sparse representation consistent with this pile of numbers and reconstructing the signal. People then found that a randomly generated matrix, such as a Gaussian or Bernoulli matrix, is with high probability incoherent with the commonly used bases, and that is why we speak of random sampling in CS.
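A quick numerical sketch of this idea, using the common definition μ(Φ, Ψ) = √N · max|⟨φᵢ, ψⱼ⟩| over normalized rows of Φ and columns of Ψ (the spike basis, the DFT basis, and Gaussian masks below are standard examples; the exact numbers are only indicative):

```python
import numpy as np

def mutual_coherence(Phi, Psi):
    """mu(Phi, Psi) = sqrt(N) * max |<row of Phi, column of Psi>| after normalization;
    roughly 1 means incoherent, sqrt(N) means fully coherent."""
    N = Psi.shape[0]
    Phi_n = Phi / np.linalg.norm(Phi, axis=1, keepdims=True)
    Psi_n = Psi / np.linalg.norm(Psi, axis=0, keepdims=True)
    return np.sqrt(N) * np.max(np.abs(Phi_n @ Psi_n))

N = 256
I = np.eye(N)                                          # spike (time/pixel-domain) basis
F = np.fft.fft(np.eye(N)) / np.sqrt(N)                 # unitary DFT basis
G = np.random.default_rng(0).standard_normal((N, N))   # random Gaussian "masks"

print(mutual_coherence(I, I))   # sqrt(N) = 16: sampling in the sparse basis itself, the failure case above
print(mutual_coherence(I, F))   # 1: spikes vs. Fourier, the maximally incoherent pair
print(mutual_coherence(G, F))   # a few units, O(sqrt(log N)): random masks are incoherent with fixed bases
```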

-----------------------------------------------------------------------------------------------------------------------------------------

Having covered the theory (not that theoretical really; I didn't write a single formula~~), let me say something about how incoherent measurement is done in practice. In a Matlab simulation we can simply build a random matrix and multiply it with the signal, and one sampling step is done. In real hardware that is impossible: we cannot first acquire the complete signal and then multiply it by a matrix, because if we already had the complete signal there would be no point in using CS. So in practice the sampling matrix corresponds more to an analog, or at least not fully digital, sampling process. Let me again use the single-pixel camera someone already mentioned above.

A modern digital camera uses an array of pixel sensors: when the scene is imaged onto this array, the circuitry records the voltage at each pixel, quantizes and interpolates the values, and finally converts them to a color space (RGB, HSI, etc.) for storage. The single-pixel camera has no such sensor array; the image is instead projected onto a DMD, an array of tiny mirrors, each mirror playing the role of one conventional pixel. We first build a Bernoulli matrix of 0s and 1s as the sampling matrix; this matrix drives the DMD so that some of the light is reflected toward the photodetector and some is not. The signal the detector collects is therefore a combination of different pixels; an ADC quantizes it, and one measurement is complete. We repeat this process until we have enough measurements. The whole procedure is exactly the matrix multiplication we do in simulation: elementwise products followed by sums, yielding one set of measurement data.

Then comes signal reconstruction. There are many algorithms: BP, OMP and its variants, IHT, MMSE, and so on; they have been mentioned above, so I won't repeat them.

That's about it.


Come on, the answers above are all too technical. Let me give an example; I guarantee you'll get it, and if you don't, no refund~

I happen to have given a talk on compressed sensing, and to keep it accessible I thought up a fairly fitting analogy. In this answer I'll avoid technical terms and rigorous derivations (they might not be any easier to follow, and I certainly wouldn't write them better than the Cambridge book Compressed Sensing: Theory and Applications). Treat this answer as something to help build intuition.

------------------------------------------------------------------------------------------------

The term "compressed sensing" contains the word "compressed", so it must have something to do with compression. So what is the difference between compressed sensing and traditional compression?

First, picture a pile of tempting lemons~

If today you really feel like drinking lemon juice (not Sprite~), there are roughly two ways to get there:

The first: buy freshly squeezed lemon juice and take it home to drink~ Let me give this a fancier name: "front-end compression".

The second: buy lemons and squeeze the juice yourself at home~ Fancier name: "back-end compression".

These two correspond to compressed sensing and traditional compression respectively. Put a bit more technically, the difference is:

  1. Compressed sensing knows in advance what it needs and what it does not, knows the corresponding structure and how to acquire it, and directly acquires the "most useful information" (the lemon juice).
  2. Traditional compression does not know in advance what it needs: it first acquires everything at full size and full resolution (brings the lemons home), then looks for structure, and then compresses the data according to that structure (runs the juicer) to get a good approximation in much less storage (the lemon juice).

To keep things respectable, here is a more "professional"-sounding explanation in signal-processing terms, with the fruit version in brackets to help~

  • As long as the signal is sparse in some transform domain (in my current world view, only the lemon juice counts; everything else is 0),

  • it can be projected from that sparse, high-dimensional transform domain down to a low-dimensional space using a measurement matrix that is incoherent with the transform basis (in my eyes a lemon is just a biologically packaged bag of lemon juice),

  • and the original signal can then be reconstructed from the low-dimensional space, with high probability, by solving an optimization problem (while drinking the juice I can guess what the lemon looked like, because apart from the juice all lemons are pretty much alike).

  • This greatly reduces storage and computational cost (no need to explain: hauling lemons home and squeezing them is more work than just buying the juice).

Come to think of it, many abstract, hard-to-grasp concepts have simple everyday models~~~



I've only watched a few lectures on this; here are my takeaways.

1: The sparse-signal assumption: a finite-energy (i.e. L2) signal can always be expanded in some basis; if most of the coefficients are zero, the signal admits a sparse representation, i.e. it is compressible. See the wavelet transform.

2: The measurement: y = A·M·x, where A is the measurement matrix, M is the basis of the representation space, and x is the sparse representation of the signal. Suppose x is an array of 1000 entries, most of them zero because of sparsity, while the measurement output y has only 20 numbers. We design a good measurement matrix A so that the 20 entries of y let us recover the 1000 entries of x; the signal has thus been compressed.

3: How do we "decompress"? Solving for 1000 unknowns from 20 equations is obviously hopeless in general. But here is the remarkable part: if the signal is sparse (most entries of x are zero) and the RIP condition holds, then the minimizer under the (l1) norm is exactly the solution we want, and solving for x becomes a convex optimization problem.

Finally, with this method compression is extremely fast (a matrix multiplication), while decompression is very slow (convex optimization).
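To illustrate the slow "decompression" step, here is a minimal iterative soft-thresholding (ISTA) sketch for the l1-regularized problem; this is only one of many solvers, and the sizes here are more forgiving than the 20-out-of-1000 example above so that a plain, unaccelerated loop converges in a couple of thousand iterations (λ and the iteration count are illustrative choices):

```python
import numpy as np

def ista(A, y, lam, n_iter=2000):
    """Iterative soft-thresholding for min_x 0.5*||A x - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2                     # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - y) / L                 # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold
    return x

rng = np.random.default_rng(0)
m, N = 80, 256
A = rng.standard_normal((m, N)) / np.sqrt(m)          # "compression" is just this matrix multiply
x_true = np.zeros(N)
x_true[rng.choice(N, 5, replace=False)] = [1.0, -0.8, 0.6, 1.3, -1.1]
y = A @ x_true

lam = 0.01 * np.max(np.abs(A.T @ y))                  # small penalty relative to the data scale
x_hat = ista(A, y, lam)
# the largest entries of x_hat sit on the true support, slightly shrunk by the l1 penalty
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```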


In the conventional signal-processing pipeline, a signal is sampled, compressed, and then transmitted, and the receiver decompresses it to recover the original signal. Sampling must obey the Nyquist sampling theorem: the sampling rate cannot be lower than twice the highest frequency of the signal, otherwise the samples cannot completely determine the original signal. In practice the sampling rate is often well above twice the highest frequency, typically five to ten times. The high sampling rate means more data, more storage, and greater hardware complexity and computation, so the samples are compressed afterwards. Compression essentially transforms the data into some transform domain, keeps the coefficients with large values, and discards the majority, which are small. As a result, most of the sampled data is thrown away during compression, and only the retained part is transmitted.

In compressed sensing, sampling and compression can be viewed as happening at the same time, and this combined step does not need to obey the Nyquist theorem. The receiver recovers the original signal with a suitable reconstruction algorithm. Compressed sensing therefore avoids the problems of the conventional pipeline: the large data volume, storage, hardware complexity and computation caused by oversampling, and the waste of data and resources during compression.

So how is compressed sensing actually carried out?

The main pipeline of compressed sensing is as follows.

The compressed sampling step can be written as:

y = Φx = ΦΨs

where x is the original signal, an n×1 vector, and y is the observation (i.e. the samples) obtained by compressed sampling, an m×1 vector (in general m ≪ n). As for the recovery algorithms, I don't know them very well... I'll come back and add more once I do.


When searching for papers, the keyword "sparse" is more likely to turn up relevant literature.


Compressed sensing and single-pixel cameras

by Terence Tao

(Normally there is no need to copy an article when a link is given, but given the GFW, I have reproduced it here. Also, the example link in the original post is broken; the new link is here: l1-Magic.)

I』ve had a number of people ask me (especially in light of some recent publicity) exactly what 「compressed sensing」 means, and how a 「single pixel camera」 could possibly work (and how it might be advantageous over traditional cameras in certain circumstances). There is a large literature on the subject, but as the field is relatively recent, there does not yet appear to be a good non-technical introduction to the subject. So here』s my stab at the topic, which should hopefully be accessible to a non-mathematical audience.

For sake of concreteness I』ll primarily discuss the camera application, although compressed sensing is a more general measurement paradigm which is applicable to other contexts than imaging (e.g. astronomy, MRI, statistical selection, etc.), as I』ll briefly remark upon at the end of this post.

The purpose of a camera is, of course, to record images. To simplify the discussion, let us think of an image as a rectangular array, e.g. a 1024 x 2048 array of pixels (thus there are 2 megapixels in all). To ignore the (minor) issue of colour, let us assume that we are just taking a black-and-white picture, so that each pixel is measured in grayscale as an integer (e.g. an 8-bit integer from 0 to 255, or a 16-bit integer from 0 to 65535) which signifies the intensity of each pixel.

Now, to oversimplify things quite a bit, a traditional digital camera would take one measurement of intensity for each of its pixels (so, about 2 million measurements in the above example), resulting in a relatively large image file (2MB if one uses 8-bit grayscale, or 4MB if one uses 16-bit grayscale). Mathematically, this file can be represented by a very high-dimensional vector of numbers (in this example, the dimension is about 2 million).

Before I get to the new story of 「compressed sensing」, I have to first quickly review the somewhat older story of plain old 「compression」. (Those who already know how image compression works can skip forward a few paragraphs.)

The images described above can take up a lot of disk space on the camera (or on some computer where the images are later uploaded), and also take a non-trivial amount of time (and energy) to transfer from one medium to another. So, it is common practice to get the camera to compress the image, from an initial large size (e.g. 2MB) to a much smaller size (e.g. 200KB, which is 10% of the size). The thing is that while the space of all images has 2MB worth of 「degrees of freedom」 or 「entropy」, the space of all interesting images is much smaller, and can be stored using much less space, especially if one is willing to throw away some of the quality of the image. (Indeed, if one generates an image at random, one will almost certainly not get an interesting image; instead, one will just get random noise looking much like the static one can get on TV screens.)

How can one compress an image? There are many ways, some of which are rather technical, but let me try to give a non-technical (and slightly inaccurate) sketch of how it is done. It is quite typical for an image to have a large featureless component – for instance, in a landscape, up to half of the picture might be taken up by a monochromatic sky background. Suppose for instance that we locate a large square, say 100 pixels x 100 pixels, which are all exactly the same colour – e.g. all white. Without compression, this square would take 10,000 bytes to store (using 8-bit grayscale); however, instead, one can simply record the dimensions and location of the square, and note a single colour with which to paint the entire square; this will require only four or five bytes in all to record, leading to a massive space saving. Now in practice, we don't get such an impressive gain in compression, because even apparently featureless regions have some small colour variation between them. So, given a featureless square, what one can do is record the average colour of that square, and then subtract that average off from the image, leaving a small residual error. One can then locate more squares where the average colour is significant, and subtract those off as well. If one does this a couple times, eventually the only stuff left will be very small in magnitude (intensity), and not noticeable to the human eye. So we can throw away the rest of the image and record only the size, location, and intensity of the "significant" squares of the image. We can then reverse this process later and reconstruct a slightly lower-quality replica of the original image, which uses much less space.

Now, the above algorithm is not all that effective in practice, as it does not cope well with sharp transitions from one colour to another. It turns out to be better to work not with average colours in squares, but rather with average colour imbalances in squares – the extent to which the intensity on (say) the right half of the square is higher on average than the intensity on the left. One can formalise this by using the (two-dimensional) Haar wavelet system. It then turns out that one can work with "smoother" wavelet systems which are less susceptible to artefacts, but this is a technicality which we will not discuss here. But all of these systems lead to similar schemes: one represents the original image as a linear superposition of various "wavelets" (the analogues of the coloured squares in the preceding paragraph), stores all the significant (large magnitude) wavelet coefficients, and throws away (or "thresholds") all the rest. This type of "hard wavelet coefficient thresholding" compression algorithm is not nearly as sophisticated as the ones actually used in practice (for instance in the JPEG 2000 standard) but it is somewhat illustrative of the general principles in compression.

To summarise (and to oversimplify somewhat), the original image may have two million degrees of freedom, and in particular if one wants to express this image in terms of wavelets then one would thus need two million different wavelets in order to reconstruct all images perfectly. However, the typical interesting image is very sparse or compressible in the wavelet basis: perhaps only a hundred thousand of the wavelets already capture all the notable features of the image, with the remaining 1.9 million wavelets only contributing a very small amount of "random noise" which is largely invisible to most observers. (This is not always the case: heavily textured images – e.g. images containing hair, fur, etc. – are not particularly compressible in the wavelet basis, and pose a challenge for image compression algorithms. But that is another story.)

Now, if we (or the camera) knew in advance which hundred thousand of the 2 million wavelet coefficients are going to be the important ones, then the camera could just measure those coefficients and not even bother trying to measure the rest. (It is possible to measure a single coefficient by applying a suitable 「filter」 or 「mask」 to the image, and making a single intensity measurement to what comes out.) However, the camera does not know which of the coefficients are going to be the key ones, so it must instead measure all 2 million pixels, convert the image to a wavelet basis, locate the hundred thousand dominant wavelet coefficients to keep, and throw away the rest. (This is of course only a caricature of how the image compression algorithm really works, but we will use it for sake of discussion.)

Now, of course, modern digital cameras work pretty well, and why should we try to improve on something which isn』t obviously broken? Indeed, the above algorithm, in which one collects an enormous amount of data but only saves a fraction of it, works just fine for consumer photography. Furthermore, with data storage becoming quite cheap, it is now often feasible to use modern cameras to take many images with no compression whatsoever. Also, the computing power required to perform the compression is manageable, even if it does contribute to the notoriously battery-draining energy consumption level of these cameras. However, there are non-consumer imaging applications in which this type of data collection paradigm is infeasible, most notably in sensor networks. If one wants to collect data using thousands of sensors, which each need to stay in situ for long periods of time such as months, then it becomes necessary to make the sensors as cheap and as low-power as possible – which in particular rules out the use of devices which require heavy computer processing power at the sensor end (although – and this is important – we are still allowed the luxury of all the computer power that modern technology affords us at the receiver end, where all the data is collected and processed). For these types of applications, one needs a data collection paradigm which is as 「dumb」 as possible (and which is also robust with respect to, say, the loss of 10% of the sensors, or with respect to various types of noise or data corruption).

This is where compressed sensing comes in. The main philosophy is this: if one only needs a 100,000 components to recover most of the image, why not just take a 100,000 measurements instead of 2 million? (In practice, we would allow a safety margin, e.g. taking 300,000 measurements, to allow for all sorts of issues, ranging from noise to aliasing to breakdown of the recovery algorithm.) In principle, this could lead to a power consumption saving of up to an order of magnitude, which may not mean much for consumer photography but can be of real importance in sensor networks.

But, as I said before, the camera does not know in advance which hundred thousand of the two million wavelet coefficients are the important ones that one needs to save. What if the camera selects a completely different set of 100,000 (or 300,000) wavelets, and thus loses all the interesting information in the image?

The solution to this problem is both simple and unintuitive. It is to make 300,000 measurements which are totally unrelated to the wavelet basis – despite all that I have said above regarding how this is the best basis in which to view and compress images. In fact, the best types of measurements to make are (pseudo-)random measurements – generating, say, 300,000 random 「mask」 images and measuring the extent to which the actual image resembles each of the masks. Now, these measurements (or 「correlations」) between the image and the masks are likely to be all very small, and very random. But – and this is the key point – each one of the 2 million possible wavelets which comprise the image will generate their own distinctive 「signature」 inside these random measurements, as they will correlate positively against some of the masks, negatively against others, and be uncorrelated with yet more masks. But (with overwhelming probability) each of the 2 million signatures will be distinct; furthermore, it turns out that arbitrary linear combinations of up to a hundred thousand of these signatures will still be distinct from each other (from a linear algebra perspective, this is because two randomly chosen 100,000-dimensional subspaces of a 300,000 dimensional ambient space will be almost certainly disjoint from each other). Because of this, it is possible in principle to recover the image (or at least the 100,000 most important components of the image) from these 300,000 random measurements. In short, we are constructing a linear algebra analogue of a hash function.

There are however two technical problems with this approach. Firstly, there is the issue of noise: an image is not perfectly the sum of 100,000 wavelet coefficients, but also has small contributions from the other 1.9 million coefficients also. These small contributions could conceivably disguise the contribution of the 100,000 wavelet signatures as coming from a completely unrelated set of 100,000 wavelet signatures; this is a type of 「aliasing」 problem. The second problem is how to use the 300,000 measurements obtained to recover the image.

Let us focus on the latter problem first. If we knew which 100,000 of the 2 million wavelets were involved, then we could use standard linear algebra methods (Gaussian elimination, least squares, etc.) to recover the signal. (Indeed, this is one of the great advantages of linear encodings – they are much easier to invert than nonlinear ones. Most hash functions are practically impossible to invert – which is an advantage in cryptography, but not in signal recovery.) However, as stated before, we don't know in advance which wavelets are involved. How can we find out? A naive least-squares approach gives horrible results which involve all 2 million coefficients and thus lead to very noisy and grainy images. One could perform a brute-force search instead, applying linear algebra once for each of the possible set of 100,000 key coefficients, but this turns out to take an insanely impractical amount of time (there are roughly 2,000,000-choose-100,000 combinations to consider!) and in any case this type of brute-force search turns out to be NP-complete in general (it contains problems such as subset-sum as a special case). Fortunately, however, there are two much more feasible ways to recover the data:

  • Matching pursuit: locate a wavelet whose signature seems to correlate with the data collected; remove all traces of that signature from the data; and repeat until we have totally 「explained」 the data collected in terms of wavelet signatures.
  • Basis pursuit (or ℓ¹ minimisation): Out of all the possible combinations of wavelets which would fit the data collected, find the one which is "sparsest" in the sense that the total sum of the magnitudes of all the coefficients is as small as possible. (It turns out that this particular minimisation tends to force most of the coefficients to vanish.) This type of minimisation can be computed in reasonable time via convex optimisation methods such as the simplex method.

Note that these image recovery algorithms do require a non-trivial (though not ridiculous) amount of computer processing power, but this is not a problem for applications such as sensor networks since this recovery is done on the receiver end (which has access to powerful computers) rather than the sensor end (which does not).

There are now rigorous results which show that these approaches can reconstruct the original signals perfectly or almost-perfectly with very high probability of success, given various compressibility or sparsity hypotheses on the original image. The matching pursuit algorithm tends to be somewhat faster, but the basis pursuit algorithm seems to be more robust with respect to noise. Exploring the exact range of applicability of these methods is still a highly active current area of research. (Sadly, there does not seem to be an application to P ≠ NP; the type of sparse recovery problems which are NP-complete are the total opposite (as far as the measurement matrix is concerned) with the type of sparse recovery problems which can be treated by the above methods.)

As compressed sensing is still a fairly new field (especially regarding the rigorous mathematical results), it is still a bit premature to expect developments here to appear in actual sensors. However, there are proof-of-concept prototypes already, most notably the single-pixel camera developed at Rice.

Finally, I should remark that compressed sensing, being an abstract mathematical idea rather than a specific concrete recipe, can be applied to many other types of contexts than just imaging. Some examples include:

  • Magnetic resonance imaging (MRI). In medicine, MRI attempts to recover an image (in this case, the water density distribution in a human body) by taking a large but finite number of measurements (basically taking a discretised Radon transform (or x-ray transform) of the body), and then reprocessing the data. Because of the large number of measurements needed, the procedure is lengthy for the patient. Compressed sensing techniques can reduce the number of measurements required significantly, leading to faster imaging (possibly even to real-time imaging, i.e. MRI videos rather than static MRI). Furthermore, one can trade off the number of measurements against the quality of the image, so that by using the same number of measurements as one traditionally does, one may be able to get much finer scales of resolution.
  • Astronomy. Many astronomical phenomena (e.g. pulsars) have various frequency oscillation behaviours which make them very sparse or compressible in the frequency domain. Compressed sensing techniques then allow one to measure these phenomena in the time domain (i.e. by recording telescope data) and being able to reconstruct the original signal accurately even from incomplete and noisy data (e.g. if weather, lack of telescope time, or simply the rotation of the earth prevents a complete time-series of data).
  • Linear coding. Compressed sensing also gives a simple way for multiple transmitters to combine their output in an error-correcting way, so that even if a significant fraction of the output is lost or corrupted, the original transmission can still be recovered. For instance, one can transmit 1000 bits of information by encoding them using a random linear code into a stream of 3000 bits; and then it will turn out that even if, say, 300 of the bits (chosen adversarially) are then corrupted, the original message can be reconstructed perfectly with essentially no chance of error. The relationship with compressed sensing arises by viewing the corruption itself as the sparse signal (it is only concentrated on 300 of the 3000 bits).

Many of these applications are still only theoretical, but nevertheless the potential of these algorithms to impact so many types of measurement and signal processing is rather exciting. From a personal viewpoint, it is particularly satisfying to see work arising from pure mathematics (e.g. estimates on the determinant or singular values of Fourier minors) end up having potential application to the real world.


A video of compressed-sensing pioneer Terence Tao explaining the subject in a lecture at UCLA; link: 陶哲軒介紹壓縮感知 (Terence Tao introduces compressed sensing).


Compressed sensing is a new way of sampling signals. Conventional sampling must follow the Nyquist theorem, and the sampled signal then goes through source coding that throws away most of the redundancy. Compressed sensing can greatly reduce the amount of sampled data, so we do not waste resources acquiring data only to discard it.

Compressed sensing mainly concerns how to sample and how to recover. Sampling is a linear measurement with a measurement matrix, whose properties determine whether the original signal can be recovered. As you said, the measurement matrix needs randomness; its rows (or columns) must have very low correlation, which is what RIP-type conditions require.

As for recovery algorithms, the existing ones have mostly been covered by 劉文文 above, but there is also a newer family: Bayesian methods based on graphical-model inference, namely the AMP algorithms. They have very low computational complexity and converge quickly, which is a big advantage when the signal dimension is large. Convergence and optimality of AMP under i.i.d. Gaussian matrices have been proven, and there has been a lot of follow-up work, including new algorithms that extend from i.i.d. matrices to orthogonal matrices; these can be implemented with fast transforms and shine when the signal dimension is very large.
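For reference, a heavily simplified AMP sketch with soft thresholding for i.i.d. Gaussian A, the setting mentioned above; the threshold rule (α times the residual's empirical standard deviation) is just one common heuristic, and all sizes are illustrative:

```python
import numpy as np

def soft(u, t):
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def amp(A, y, n_iter=30, alpha=1.5):
    """Approximate Message Passing with soft thresholding, for i.i.d. Gaussian A."""
    m, N = A.shape
    x = np.zeros(N)
    z = y.copy()
    for _ in range(n_iter):
        theta = alpha * np.linalg.norm(z) / np.sqrt(m)   # threshold ~ current effective noise level
        x_new = soft(x + A.T @ z, theta)
        onsager = (np.count_nonzero(x_new) / m) * z      # (N/m) * mean(eta') * previous residual
        z = y - A @ x_new + onsager                      # residual with the Onsager correction
        x = x_new
    return x

rng = np.random.default_rng(0)
m, N, k = 250, 1000, 30
A = rng.standard_normal((m, N)) / np.sqrt(m)
x_true = np.zeros(N); x_true[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
y = A @ x_true
print(np.linalg.norm(amp(A, y) - x_true) / np.linalg.norm(x_true))   # small if AMP converges in this regime
```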

One last gripe: why on earth are "compressed sensing" and "greedy algorithms" lumped together as a single topic??? They are completely different things; Zhihu folks need to raise their game.

Finally, here is a link to my answer to another compressed-sensing question, which may also be worth a look: 壓縮感知(compressed sensing)只適合稀疏數據,不適合大數據嗎? - 薛志鵬的回答 ("Is compressed sensing only suitable for sparse data, not big data?"). If you have any questions about compressed sensing, feel free to contact me directly.


Here is a simple example. A sponge fish (what the heck, does such a fish even exist?!) wants to swim from downstream to upstream. It has to carry a lot of water (the data volume), which is exhausting, and the bigger the fish the harder it gets.

Now change the approach: take the fish out of the water and wring it dry (let's say it transforms, biu biu biu, that's just how cool it is). This is moving from the water domain (the original space) to the air domain (a transform domain), and the fish's body now contains much less, which is the sparse representation.

In the air, the fish can travel from downstream to upstream by boat or by car; that is the data transmission. If it had stayed in the water, your 20M broadband could not handle this fish, whereas in the air it arrives easily even while you are playing LoL.

Once the fish arrives, it has to be restored to its in-water form; whether you use a shower or a bucket, the way you restore it is the reconstruction algorithm.

.

.

That is roughly the whole pipeline....... And it actually breaks through the Nyquist sampling theorem, which is pretty damn cool. If you're interested, go look up Terence Tao.


Compressed sensing in one sentence:

Compressed sensing aims to recover an original sparse signal from incomplete linear measurements.

Everything else, all the hazy explanations, is about applications of compressed sensing; whether an application is reasonable or valuable depends on whether some company is making money from it. So far, I have not seen a company that makes money from compressed sensing.


I work on wideband spectrum sensing for cognitive radio and am still getting started. I have read some papers on compressed sensing; here is my simple understanding.

We know that the classical Nyquist sampling theorem requires the sampling rate to be at least twice the bandwidth, but if a signal transmitted over a wireless channel has a very large bandwidth, this demands a very high sampling rate, and such ADCs are both hard to build and expensive. So people proposed applying compressed sensing theory here.

Traditional compression first takes a large number of samples according to the Nyquist theorem and then compresses, i.e. keeps an appropriate subset of the data and discards most of it. Compressed sensing instead performs sampling and compression in a single step: sub-Nyquist schemes such as the AIC and the MWC extract only a small amount of data, s, directly from the analog signal. Then, using the sensing matrix and the basis matrix, we can recover the discrete form of the original signal; this step is called signal reconstruction, and the most common methods are convex optimization and greedy algorithms. In effect, the hardware burden is reduced at the cost of more complex software.

Nowadays compressed sensing is used in many areas: wireless communication, radar, medical instruments, image processing, and so on~~~ I'd welcome pointers from the experts here.

