[Paper Notes] HDR
From the column: zls的日常碎碎念 (zls's Daily Ramblings)
http://static.googleusercontent.com/media/www.hdrplusdata.org/en//hdrplus.pdf
abstract
problems with cell phone cameras:
1. small apertures -> noisy images in low light
2. small sensor pixels -> limited dynamic range
goal: a computational photography pipeline that captures, aligns, and merges a burst of frames to reduce noise and increase dynamic range
work:
1. capture frames of constant exposure, which makes alignment more robust, and set this exposure low enough to avoid blowing out highlights
2. begin from Bayer raw frames, which gives more bits per pixel and lets us circumvent the ISP's tone mapping and spatial denoising
3. an FFT-based alignment algorithm and a hybrid 2D/3D Wiener filter to denoise and merge the frames in a burst
key terms: bracketed exposures, alignment, blowing out highlights, HDR tone mapping method
introduction
lack of light → either apply analog or digital gain, which amplifies noise, or lengthen the exposure time, which causes motion blur due to camera shake and subject motion
problem range: indoor or night-time shots, and daytime shots with high dynamic range
to gather more light: a larger-aperture lens, optical image stabilization, exposure bracketing, or flash, but each method is a tradeoff
camera system → capture a burst of images and combine them with dynamic range compression
design principles for the camera system
- be immediate: produce a photograph within a few seconds and display it on the camera
- be automatic: the method must be parameter-free and fully automatic
- be natural: be faithful to the appearance of the scene; limit the amount of local tone mapping, and in very low-light scenes do not brighten the image so much that it changes the apparent illumination or reveals excessive noise
- be conservative
low constant exposure--align and merge multiple frames--
capture each image in the burst with the same exposure time; don't bracket
HDR fusion methods have to handle the varying exposures with sophisticated alignment and inpainting
choose an exposure low enough to avoid clipping for the given scene, i.e. deliberately underexpose to capture more dynamic range
choose shorter-than-typical exposure times to mitigate camera shake blur
lower exposures lead to worse noise, but this is offset by capturing and merging multiple frames
select one of the images in the burst as a reference frame, then align and merge the other frames into it
to reduce computational complexity, merge only a single patch from each alternate frame
aligning and merging multiple frames produces an intermediate image with higher bit depth, higher dynamic range, and reduced noise compared to the input frames
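The claim that merging offsets the extra noise of a low exposure can be checked with a small simulation. The sketch below is not the paper's merge (which uses a Wiener filter, per the abstract); it assumes perfectly aligned frames, independent Gaussian noise, and plain averaging, and the names `simulate_burst` and `merge_by_average` are made up for illustration.

```python
import random
import statistics

def simulate_burst(true_value, noise_sigma, num_frames, num_pixels, seed=0):
    """Return num_frames noisy observations of a flat patch (one list per frame)."""
    rng = random.Random(seed)
    return [[true_value + rng.gauss(0.0, noise_sigma) for _ in range(num_pixels)]
            for _ in range(num_frames)]

def merge_by_average(frames):
    """Average already-aligned frames pixel by pixel (assumed, simplified merge)."""
    count = len(frames)
    return [sum(pixels) / count for pixels in zip(*frames)]

# A dark, underexposed patch observed in an 8-frame burst.
frames = simulate_burst(true_value=0.1, noise_sigma=0.02,
                        num_frames=8, num_pixels=20000)
single_noise = statistics.pstdev(frames[0])
merged_noise = statistics.pstdev(merge_by_average(frames))
ratio = single_noise / merged_noise
print(ratio)  # for independent noise this should be near sqrt(8)
```

Under these assumptions the noise standard deviation falls roughly with the square root of the number of frames, which is why an 8-frame burst can tolerate a noticeably lower exposure than a single shot.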
HDR tone mapping--boost shadows, preserving local contrast while sacrificing global contrast
overview of capture and processing
two pipelines
the input to both pipelines is a stream of Bayer images at full sensor resolution
when the app is launched, only the viewfinder is active; this pipeline converts raw images into low-resolution images for display on the mobile phone screen
when the shutter is pressed, a burst of frames is captured at constant exposure and stored in main memory, and the software pipeline is activated. It aligns and merges the frames in the burst, producing a single intermediate image of high bit depth, then applies color and tone mapping (white balance, demosaic, chroma denoise, exposure fusion, global tone map, sharpen, hue and saturation) to produce a full-resolution 8-bit output photograph for compression and storage
the former pipeline runs on a hardware Image Signal Processor (ISP),
while the latter runs in software on the application processor
advantages of using raw images:
- increased dynamic range: the pixels in raw images are 10 bits, whereas the RGB (YUV) pixels produced by mobile ISPs are 8 bits; the actual advantage is less than 2 bits, because raw is linear and YUV has a gamma curve
- linearity: ISP output includes nonlinear tone mapping, while raw images are linear, which lets us model sensor noise accurately, makes alignment and merging more reliable, and also makes auto-exposure easier
- portability:
auto-exposure
reuse the capture settings from a recent viewfinder frame when requesting our constant-exposure burst
this works well for scenes with moderate dynamic range, but for scenes with high dynamic range the captured images may include blown highlights or underexposed subjects
develop an auto-exposure algorithm that determines not only the overall exposure but also the dynamic range compression to come; the approach consists of 3 steps:
- deliberately underexpose so that fewer pixels saturate
- capture multiple frames to reduce noise in the shadows
- compress the dynamic range using local tone mapping
capturing a burst reduces noise, which is what allows us to underexpose
questions to answer: how much to underexpose, how much to compress the dynamic range, how many frames to capture
underexposure as dynamic range compression
underexposure at capture is tightly coupled with the dynamic range compression applied in processing
fuse 2 gamma-corrected images: an underexposed input frame and a brighter version of the same frame in which digital gain compensates for the underexposure, i.e. a short exposure that captures the highlights of the scene and a synthetic long exposure for the shadows, both used in HDR tone mapping. The compression is kept within a proper range (a factor of 8)
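The idea of fusing a short exposure with a synthetic long exposure can be sketched per pixel. This is only an illustration under stated assumptions: the gamma of 2.2, the Gaussian "well-exposedness" weight, its `sigma`, and all function names are hypothetical choices, and a real exposure fusion blends across multiple scales rather than pixel by pixel.

```python
import math

def gamma_encode(linear, gamma=2.2):
    """Clip a linear value to [0, 1] and apply an assumed display gamma."""
    clipped = min(max(linear, 0.0), 1.0)
    return clipped ** (1.0 / gamma)

def well_exposedness(value, sigma=0.2):
    """Assumed weight that favors values close to mid-gray (0.5)."""
    return math.exp(-((value - 0.5) ** 2) / (2.0 * sigma ** 2))

def fuse_pixel(linear, gain):
    """Blend the underexposed pixel with its digitally gained version."""
    short = gamma_encode(linear)            # underexposed input frame
    long_ = gamma_encode(linear * gain)     # synthetic long exposure
    w_short = well_exposedness(short)
    w_long = well_exposedness(long_)
    return (w_short * short + w_long * long_) / (w_short + w_long)

for linear in (0.01, 0.1, 0.9):
    print(linear, gamma_encode(linear), fuse_pixel(linear, gain=8.0))
```

In this sketch shadows are pulled toward the brighter synthetic exposure, while highlights, which clip in the gained copy, stay close to the short exposure.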
auto-exposure by example
exposure factorization
factorize the total exposure into exposure time and gain, using a fixed schedule that balances motion blur against noise
for the brightest scenes, hold gain at its minimum level and let the exposure time increase up to 8 ms
as scenes become darker, hold the exposure time at 8 ms and increase gain up to 4
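The two stages of the schedule above can be written as a small function. This is a minimal sketch: it assumes the total exposure is expressed as exposure time in milliseconds multiplied by gain, and that the minimum gain is 1. The fallback for scenes darker than the two stages cover is a placeholder, not something these notes specify.

```python
MIN_GAIN = 1.0      # assumed minimum gain
MAX_TIME_MS = 8.0   # exposure time cap from the schedule above
MAX_GAIN = 4.0      # gain cap from the schedule above

def factorize_exposure(total_exposure):
    """Split total exposure (time_ms * gain) into a (time_ms, gain) pair."""
    if total_exposure <= 0:
        raise ValueError("total exposure must be positive")
    if total_exposure <= MAX_TIME_MS * MIN_GAIN:
        # Brightest scenes: minimum gain, exposure time does all the work.
        return total_exposure / MIN_GAIN, MIN_GAIN
    if total_exposure <= MAX_TIME_MS * MAX_GAIN:
        # Darker scenes: exposure time pinned at 8 ms, gain rises toward 4.
        return MAX_TIME_MS, total_exposure / MAX_TIME_MS
    # Placeholder beyond the stages covered here: keep gain at its cap
    # and lengthen the exposure time instead.
    return total_exposure / MAX_GAIN, MAX_GAIN

print(factorize_exposure(4.0))
print(factorize_exposure(16.0))
```

Keeping gain low for as long as possible limits noise, and capping the exposure time limits motion blur; the schedule simply decides which of the two to give up first.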
burst size
limit bursts to 2-8 images; low-light and high-dynamic-range scenes need more frames, while in bright scenes 1-2 images are sufficient
viewfinder integration
to reduce latency and save power, run auto-exposure only once every 4 frames
aligning frames
alignment consists of finding a dense correspondence from each alternate frame of the burst to a chosen reference frame
because the merging procedure is robust to both small and gross alignment errors, a simple algorithm is enough to meet our requirements; it is accelerated with a frequency-domain method
reference frame selection
to address blur induced by hand and scene motion, choose the sharpest frame as the reference, using a simple metric based on gradients in the green channel of the raw input
to minimize perceived shutter lag, choose the reference frame from the first 3 frames in the burst
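A minimal sketch of this selection rule follows. The exact metric is not given in these notes, so the sum of absolute horizontal and vertical differences used here is an assumption, as are the function names; each frame is represented only by its green plane as a list of rows.

```python
def gradient_sharpness(green):
    """Sum of absolute neighbor differences in a green plane (assumed metric)."""
    height, width = len(green), len(green[0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            if x + 1 < width:
                total += abs(green[y][x + 1] - green[y][x])
            if y + 1 < height:
                total += abs(green[y + 1][x] - green[y][x])
    return total

def select_reference(green_planes, max_candidates=3):
    """Index of the sharpest frame among the first max_candidates frames."""
    candidates = green_planes[:max_candidates]
    scores = [gradient_sharpness(plane) for plane in candidates]
    return scores.index(max(scores))

blurry = [[0.4, 0.5, 0.5, 0.6] for _ in range(4)]
sharp = [[0.0, 1.0, 0.0, 1.0] for _ in range(4)]
sharper_but_late = [[0.0, 2.0, 0.0, 2.0] for _ in range(4)]
burst = [blurry, sharp, blurry, sharper_but_late]
print(select_reference(burst))
```

The fourth frame is ignored even though it scores highest, which mirrors the restriction to the first 3 frames for shutter lag.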
handling raw images
the input consists of Bayer raw images; the four color planes of a raw image are undersampled, making alignment an ill-posed problem
to sidestep this, estimate displacements only up to a multiple of 2 pixels
implement this by averaging 2×2 blocks of Bayer RGGB samples, so that we align downsampled 3 Mpix grayscale images instead of 12 Mpix raw images
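The 2×2 averaging can be sketched as follows, assuming the raw frame is a list of rows with even height and width and each 2×2 block holds one R, two G, and one B sample. The function name `bayer_to_gray` is made up for illustration.

```python
def bayer_to_gray(raw):
    """Average each 2x2 Bayer block into one grayscale pixel."""
    height, width = len(raw), len(raw[0])
    if height % 2 or width % 2:
        raise ValueError("raw frame must have even height and width")
    return [[(raw[y][x] + raw[y][x + 1] +
              raw[y + 1][x] + raw[y + 1][x + 1]) / 4.0
             for x in range(0, width, 2)]
            for y in range(0, height, 2)]

raw = [[10, 20, 30, 40],
       [30, 40, 50, 60],
       [1, 2, 3, 4],
       [5, 6, 7, 8]]
gray = bayer_to_gray(raw)
print(gray)  # [[25.0, 45.0], [3.5, 5.5]]
```

Each grayscale pixel covers 2 raw pixels in each direction, so a displacement of one pixel found on the grayscale image corresponds to 2 raw pixels, and the pixel count drops by a factor of 4.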
hierarchical alignment
perform a coarse-to-fine alignment on four-level Gaussian pyramids of the downsampled-to-gray raw input
each reference tile's alignment is the offset that minimizes the following distance measure relating it to candidate tiles in the alternate image
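The distance measure itself did not survive in these notes. Reconstructed from the symbol definitions that follow, and therefore an assumption rather than a quotation, it presumably has the form below, where (u, v) is the candidate offset and (u_0, v_0), a symbol introduced here, is the initial offset inherited from the coarser pyramid level:

```latex
D_p(u, v) = \sum_{y=0}^{n-1} \sum_{x=0}^{n-1}
  \left| T(x, y) - I(x + u + u_0,\; y + v + v_0) \right|^{p}
```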
where T is a tile of the reference image, I is a larger search area of the alternate image, p is the power of the norm used for alignment (1 or 2), and n is the size of the tile (8 or 16)
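For a single pyramid level, minimizing such a distance can be sketched as a brute-force search. This is an illustration only: it does not use the frequency-domain acceleration mentioned earlier, it starts from a zero initial offset, and the search `radius` and the function names are hypothetical.

```python
import random

def tile_distance(ref, alt, ty, tx, n, dy, dx, p):
    """Sum of |T - I|^p between the n x n reference tile at (ty, tx)
    and the alternate-frame tile displaced by (dy, dx)."""
    total = 0.0
    for y in range(n):
        for x in range(n):
            diff = ref[ty + y][tx + x] - alt[ty + y + dy][tx + x + dx]
            total += abs(diff) ** p
    return total

def align_tile(ref, alt, ty, tx, n=8, radius=3, p=2):
    """Return the (dy, dx) within +/- radius that minimizes the distance."""
    height, width = len(alt), len(alt[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            inside = (0 <= ty + dy and ty + dy + n <= height and
                      0 <= tx + dx and tx + dx + n <= width)
            if not inside:
                continue  # skip candidates that fall off the image
            d = tile_distance(ref, alt, ty, tx, n, dy, dx, p)
            if best is None or d < best[0]:
                best = (d, dy, dx)
    return best[1], best[2]

# Toy data: a random 16 x 16 frame and a copy shifted down 1 and right 2.
rng = random.Random(0)
ref = [[rng.random() for _ in range(16)] for _ in range(16)]
alt = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0.0 for x in range(16)]
       for y in range(16)]
offset = align_tile(ref, alt, ty=4, tx=4)
print(offset)  # (1, 2)
```

In a coarse-to-fine scheme, the offset found at one level would be scaled up and used as the starting point for a small search at the next finer level, so the radius at each level can stay small.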