[Paper Notes] HDR

07-27

From the column zls的日常碎碎念: http://static.googleusercontent.com/media/www.hdrplusdata.org/en//hdrplus.pdf?

abstract

problems with cell phone cameras:

1. small apertures -> noisy images in low light

2. small sensor pixels -> limited dynamic range

goal:

a computational photography pipeline that captures, aligns and merges a burst of frames to reduce noise and increase dynamic range

work:

1. capture frames of constant exposure, which makes alignment more robust, and set this exposure low enough to avoid blowing out highlights

bracketed exposures

alignment

blowing out highlights

HDR tone mapping method

2. begin from Bayer raw frames rather than the demosaicked frames produced by the ISP, which gives us more bits per pixel and allows us to circumvent the ISP's tone mapping and spatial denoising

3. use an FFT-based alignment algorithm and a hybrid 2D/3D Wiener filter to denoise and merge the frames in a burst

introduction

lack of light → either apply analog or digital gain, which amplifies noise, or lengthen the exposure time, which causes motion blur due to camera shake and subject motion

problem range: indoor or night-time shots, and daytime shots with high dynamic range

to gather more light: a larger-aperture lens, optical image stabilization, exposure bracketing, or flash, but each method involves a tradeoff

camera system → captures a burst of images and combines them with dynamic range compression

design principles for the camera system

be immediate: produce a photograph within a few seconds and display it on the camera
be automatic: the method must be parameter-free and fully automatic
be natural: be faithful to the appearance of the scene, limit the amount of local tone mapping, and in very low-light scenes do not brighten the image too much
be conservative

low constant exposure--align and merge multiple frames

capture each image in the burst with the same exposure time; don't bracket

HDR fusion methods that bracket have to cope with the varying exposure through sophisticated alignment and inpainting

choose a low enough exposure to avoid clipping for the given scene, i.e. deliberately down-expose to capture more dynamic range

choose shorter-than-typical exposure times to mitigate camera-shake blur

though using lower exposures leads to worse noise, offset this effect by capturing and merging multiple frames

select one of the images in the burst as a reference frame, then align and merge into this frame

to reduce computational complexity, merge only a single patch from each alternate frame

by aligning and merging multiple frames, produce an intermediate image with higher bit depth, higher dynamic range, and reduced noise compared to our input frames
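The noise benefit of merging can be checked with a small simulation. This is a minimal sketch, not the paper's merge: it assumes a static, perfectly aligned flat patch with Gaussian noise, and it uses a plain per-pixel average as a stand-in for the Wiener-filter merge. `simulate_burst_noise` and its parameters are hypothetical names chosen for this illustration.

```python
import random
import statistics

def simulate_burst_noise(num_frames, num_pixels=5000, signal=0.1, sigma=0.02, seed=0):
    """Return (single-frame noise std, merged noise std) for a flat, static patch.

    Assumptions: Gaussian noise, perfect alignment, plain averaging as the merge.
    """
    rng = random.Random(seed)
    frames = [
        [signal + rng.gauss(0.0, sigma) for _ in range(num_pixels)]
        for _ in range(num_frames)
    ]
    # Per-pixel mean across the burst (stand-in for the real merge).
    merged = [sum(samples) / num_frames for samples in zip(*frames)]
    return statistics.pstdev(frames[0]), statistics.pstdev(merged)

single, merged = simulate_burst_noise(8)
print(f"single: {single:.4f}  merged: {merged:.4f}  ratio: {single / merged:.2f}")
```

Under these assumptions the ratio should land near sqrt(8) ≈ 2.83: averaging N frames reduces the noise standard deviation by roughly sqrt(N), which is what makes deliberate underexposure affordable.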

HDR tone mapping--boost shadows, preserving local contrast while sacrificing global contrast

overview of capture and processing

two pipelines

the input to both pipelines is a stream of Bayer images at full sensor resolution

when the app is launched, only the viewfinder is active; this pipeline converts raw images into low-resolution images for display on the mobile phone screen

when the shutter is pressed, a burst of frames is captured at constant exposure and stored in main memory, and the software pipeline is activated. It aligns and merges the frames in the burst, producing a single intermediate image of high bit depth, then applies color and tone mapping (white balance, demosaic, chroma denoise, exposure fusion, global tone map, sharpen, hue and saturation) to produce a full-resolution 8-bit output photograph for compression and storage

the former (viewfinder) pipeline is computed by a hardware Image Signal Processor (ISP)

while the latter is computed in software running on the application processor

advantages of using raw images:

increased dynamic range: the pixels in raw images are 10 bits, whereas the RGB (YUV) pixels produced by mobile ISPs are 8 bits, but the actual advantage is less than 2 bits, because raw is linear and YUV has a gamma curve
linearity: ISP output includes nonlinear tone mapping, whereas raw images are linear, which lets us model sensor noise accurately, makes alignment and merging more reliable, and also makes auto-exposure easier
portability:

auto-exposure

reuse the capture settings from a recent viewfinder frame when requesting our constant-exposure burst

it is good for scenes with moderate dynamic range but for scenes with high dynamic range, the captured images may include blown highlights or underexposed subjects

develop an auto-exposure algorithm that determines not only the overall exposure but also the dynamic range compression to come. Handling HDR scenes consists of 3 steps:

deliberately underexpose so that fewer pixels saturate
capture multiple frames to reduce noise in the shadows
compress the dynamic range using local tone mapping

capture a burst to reduce noise so that we can underexpose

how much to underexpose, how much to compress the dynamic range, how many frames to capture

underexposure as dynamic range compression

underexposure at capture is tightly coupled with the dynamic range compression applied in processing

fuse 2 gamma-corrected images: an underexposed input frame and a brighter version of the same frame in which digital gain compensates for the underexposure, i.e. a short exposure that captures the highlights of the scene and a synthetic long exposure for the shadows, both used in HDR tone mapping. 8 in a proper range
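The two synthetic exposures can be sketched per pixel as follows. This is only an illustration under loud assumptions: a pure power-law gamma of 2.2, a hypothetical digital gain of 4, and a fixed equal-weight blend standing in for real exposure fusion, which weights pixels adaptively. `gamma_encode` and `fuse_synthetic_exposures` are names invented here.

```python
def gamma_encode(x, gamma=2.2):
    """Clip a linear value to [0, 1] and apply a power-law gamma (assumed 2.2)."""
    clipped = max(0.0, min(1.0, x))
    return clipped ** (1.0 / gamma)

def fuse_synthetic_exposures(linear_pixels, gain, w_long=0.5):
    """Blend a short exposure with a synthetic long exposure made by digital gain.

    The fixed weight w_long is a simplification; it is not the paper's weighting.
    """
    fused = []
    for x in linear_pixels:
        short = gamma_encode(x)          # underexposed frame, highlights intact
        long_ = gamma_encode(x * gain)   # brightened copy, highlights clip at 1.0
        fused.append(w_long * long_ + (1.0 - w_long) * short)
    return fused

pixels = [0.001, 0.01, 0.1, 0.5, 1.0]
for x, y in zip(pixels, fuse_synthetic_exposures(pixels, gain=4.0)):
    print(f"linear {x:.3f} -> fused {y:.3f}")
```

Shadows are lifted by the gained copy while the brightest values are still taken from the unclipped short exposure, which is the sense in which underexposure and tone mapping work together to compress dynamic range.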

auto-exposure by example

exposure factorization

factorize the target exposure into exposure time and gain, using a fixed schedule to balance motion blur against noise

for the brightest scenes, hold gain at its minimum level, allowing exposure times to increase up to 8 ms

as scenes become darker, hold exposure time at 8 ms and increase gain up to $4\times$
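The schedule above can be written as a small piecewise function. A minimal sketch, assuming the total exposure is expressed as exposure time in ms multiplied by gain and that the minimum gain is 1. The notes do not say what happens beyond 8 ms at 4x gain, so the last branch (raising time and gain together by the same factor) is purely an assumption for illustration, as is the name `factorize_exposure`.

```python
import math

MAX_BRIGHT_TIME_MS = 8.0  # from the notes: time grows to 8 ms at minimum gain
MAX_MID_GAIN = 4.0        # from the notes: gain then grows to 4x at 8 ms
MIN_GAIN = 1.0            # assumption: minimum gain is 1x

def factorize_exposure(total):
    """Split a total exposure (time_ms * gain) into (time_ms, gain)."""
    if total <= 0:
        raise ValueError("total exposure must be positive")
    if total <= MAX_BRIGHT_TIME_MS * MIN_GAIN:
        # Brightest scenes: minimum gain, vary exposure time only.
        return total / MIN_GAIN, MIN_GAIN
    if total <= MAX_BRIGHT_TIME_MS * MAX_MID_GAIN:
        # Darker scenes: hold 8 ms, vary gain only.
        return MAX_BRIGHT_TIME_MS, total / MAX_BRIGHT_TIME_MS
    # Assumption (not from the notes): scale time and gain by the same factor.
    extra = math.sqrt(total / (MAX_BRIGHT_TIME_MS * MAX_MID_GAIN))
    return MAX_BRIGHT_TIME_MS * extra, MAX_MID_GAIN * extra

for total in (4.0, 16.0, 128.0):
    time_ms, gain = factorize_exposure(total)
    print(f"total {total:6.1f} -> {time_ms:5.1f} ms at {gain:.1f}x gain")
```

Preferring exposure time first keeps noise low when light is plentiful, and capping it at 8 ms before raising gain limits motion blur as the scene darkens.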

burst size

limit bursts to 2-8 images: low-light and high-dynamic-range scenes need more frames, while in bright scenes 1-2 images are sufficient

viewfinder integration

to reduce latency and save power, only run auto-exposure once in every 4 frames

aligning frames

alignment consists of finding a dense correspondence from each alternate frame of our burst to a chosen reference frame

because the merging procedure is robust to both small and gross alignment errors, a simple alignment algorithm is enough to meet our requirements; it is accelerated with a frequency-domain method

reference frame selection

to address blur induced by hand and scene motion, choose the sharpest frame as the reference, using a simple metric based on gradients in the green channel of the raw input

to minimize perceived shutter lag, choose the reference frame from the first 3 frames in the burst
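Reference selection can be sketched as below. The notes only say the metric is based on green-channel gradients, so the specific score used here (sum of absolute horizontal and vertical differences on one green plane of an RGGB mosaic) is an assumption, and all function names are hypothetical.

```python
def green_plane(raw):
    """Extract one green plane of an RGGB mosaic (even rows, odd columns)."""
    return [row[1::2] for row in raw[0::2]]

def sharpness(plane):
    """Sum of absolute differences between horizontal and vertical neighbours."""
    height, width = len(plane), len(plane[0])
    total = 0
    for y in range(height):
        for x in range(width):
            if x + 1 < width:
                total += abs(plane[y][x + 1] - plane[y][x])
            if y + 1 < height:
                total += abs(plane[y + 1][x] - plane[y][x])
    return total

def pick_reference(burst, candidates=3):
    """Index of the sharpest frame among the first `candidates` frames."""
    scores = [sharpness(green_plane(frame)) for frame in burst[:candidates]]
    return scores.index(max(scores))

flat = [[10] * 4 for _ in range(4)]
edge = [[10, 0, 10, 100], [10] * 4, [10, 0, 10, 100], [10] * 4]
print(pick_reference([flat, edge, flat]))
```

Restricting the search to the first few frames trades a little sharpness for responsiveness: a sharper frame late in the burst is ignored so that the photo matches the moment the shutter was pressed.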

handling raw images

the input consists of Bayer raw images. The four color planes of a raw image are each undersampled, which makes alignment an ill-posed problem.

to solve this problem, estimate displacement only up to a multiple of 2 pixels

implement it by averaging $2\times 2$ blocks of Bayer RGGB samples, so that we align downsampled 3 Mpix grayscale images instead of 12 Mpix raw images
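The downsampling step is simple enough to show directly. A minimal sketch, assuming the raw frame is a list of rows with even width and height; `bayer_to_gray` is a hypothetical name.

```python
def bayer_to_gray(raw):
    """Average each 2x2 RGGB block into one grayscale sample.

    The output has half the width and half the height of the input,
    so a 12 Mpix mosaic becomes a 3 Mpix grayscale image.
    """
    height, width = len(raw), len(raw[0])
    return [
        [
            (raw[y][x] + raw[y][x + 1] + raw[y + 1][x] + raw[y + 1][x + 1]) / 4.0
            for x in range(0, width - 1, 2)
        ]
        for y in range(0, height - 1, 2)
    ]

raw = [
    [1, 2, 5, 6],
    [3, 4, 7, 8],
]
print(bayer_to_gray(raw))  # [[2.5, 6.5]]
```

One pixel of displacement in the grayscale image corresponds to 2 pixels in the raw mosaic, which is why displacements are only estimated up to a multiple of 2 pixels.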

Hierarchical alignment

perform a coarse-to-fine alignment on four-level Gaussian pyramids of the downsampled-to-gray raw input

each reference tile's alignment is the offset that minimizes the following distance measure relating it to candidate tiles in the alternate image

$D_p(u,v)=\sum_{y=0}^{n-1}\sum_{x=0}^{n-1}\left| T(x,y)-I(x+u+u_0,y+v+v_0)\right|^p$

where T is a tile of the reference image, I is a larger search area of the alternate image, p is the power of the norm used for alignment (1 or 2), n is the size of the tile (8 or 16), and $(u_0, v_0)$ is the initial alignment inherited from the next coarser level of the pyramid
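The distance measure can be evaluated by brute force over a small search window. This sketch is for illustration only: it searches exhaustively at a single pyramid level rather than using the frequency-domain acceleration, and it indexes the full alternate image with an explicit tile origin `(tx, ty)`, which is a convention added here. `tile_distance` and `best_offset` are hypothetical names.

```python
import random

def tile_distance(T, I, tx, ty, u, v, u0=0, v0=0, p=2):
    """D_p(u, v) for an n x n tile T whose origin in image I is (tx, ty)."""
    n = len(T)
    return sum(
        abs(T[y][x] - I[ty + y + v + v0][tx + x + u + u0]) ** p
        for y in range(n)
        for x in range(n)
    )

def best_offset(T, I, tx, ty, radius, u0=0, v0=0, p=2):
    """Return (u, v, distance) minimizing D_p over |u|, |v| <= radius."""
    n = len(T)
    height, width = len(I), len(I[0])
    best = None
    for v in range(-radius, radius + 1):
        for u in range(-radius, radius + 1):
            x0, y0 = tx + u + u0, ty + v + v0
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue  # candidate tile falls outside the alternate image
            d = tile_distance(T, I, tx, ty, u, v, u0, v0, p)
            if best is None or d < best[2]:
                best = (u, v, d)
    return best

# Synthetic test data: the alternate frame is the reference content
# moved 2 px right and 1 px down (with wraparound at the borders).
rng = random.Random(0)
ref = [[rng.randrange(256) for _ in range(12)] for _ in range(12)]
alt = [[ref[(y - 1) % 12][(x - 2) % 12] for x in range(12)] for y in range(12)]
T = [row[4:8] for row in ref[4:8]]
print(best_offset(T, alt, tx=4, ty=4, radius=3))
```

In a coarse-to-fine scheme, the offset found at one level would be scaled up and passed as `(u0, v0)` to the next finer level, so each level only needs a small search radius.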