[Paper Notes] SID

08-13

From the column zls的日常碎碎念

https://arxiv.org/pdf/1805.01934.pdf?

abstract

imaging in low light

a pipeline for processing low-light images, based on end-to-end training of a fully convolutional network

operate directly on raw sensor data

analyze factors that affect performance and highlight opportunities for future work

introduction

noise makes imaging particularly challenging in low light

high ISO (sensor sensitivity) can be used to increase brightness, but it also amplifies noise

scaling or histogram stretching can be applied, but they don't resolve the low signal-to-noise ratio (SNR) caused by low photon counts
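A short worked equation may make this point concrete. It is only a sketch under a stated assumption that is not in the notes: photon (shot) noise dominates, so the photon count at a pixel is Poisson-distributed with mean $N$, and $g$ is a hypothetical digital gain applied after capture.

$$
\mathrm{SNR} = \frac{N}{\sqrt{N}} = \sqrt{N},
\qquad
\mathrm{SNR}_{\text{scaled}} = \frac{gN}{g\sqrt{N}} = \sqrt{N}
$$

Under this assumption the gain multiplies signal and noise alike, so the image gets brighter while the SNR stays fixed by the photon count.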

increasing the exposure time can introduce blur due to camera shake or object motion

focus: extreme low-light imaging with severely limited illumination (e.g., moonlight) and short exposure (at video rate)

train a deep neural network to learn the image processing pipeline for low-light raw data, including color transformations, demosaicing, noise reduction, and image enhancement

the pipeline is trained end-to-end to avoid noise amplification and error accumulation

related work

image denoising
BM3D
deep networks applied to denoising, including stacked sparse denoising auto-encoders (SSDA) and trainable nonlinear reaction diffusion (TNRD)
mostly evaluated on synthetic Gaussian or salt-and-pepper noise

multiple-image denoising can achieve better results since more information is collected from the scene
e.g., denoising a burst of images of the same scene
low-light image enhancement
histogram equalization to balance the histogram of the entire image
gamma correction, which increases the brightness of dark regions while compressing bright pixels
noisy image datasets
raw short-exposure images, each with a reference long-exposure image
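The two classical enhancement operations listed under low-light image enhancement can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from the paper: the function names, the gamma value of 2.2, and the 256-bin histogram are assumptions, and images are assumed to be floats in [0, 1].

```python
import numpy as np

def gamma_correct(image, gamma=2.2):
    """Brighten dark regions and compress bright ones: out = in ** (1 / gamma)."""
    image = np.clip(np.asarray(image, dtype=np.float64), 0.0, 1.0)
    return image ** (1.0 / gamma)

def equalize_histogram(image, bins=256):
    """Global histogram equalization of a grayscale image in [0, 1]."""
    image = np.clip(np.asarray(image, dtype=np.float64), 0.0, 1.0)
    # Bin index of every pixel; the value 1.0 falls into the last bin.
    idx = np.minimum((image * bins).astype(int), bins - 1)
    hist = np.bincount(idx.ravel(), minlength=bins)
    cdf = np.cumsum(hist) / idx.size
    # Each pixel is replaced by the cumulative fraction of pixels at or below its bin.
    return cdf[idx]

dark = np.array([[0.01, 0.02], [0.04, 0.25]])
bright = gamma_correct(dark)
equalized = equalize_histogram(dark)
```

Both operations only remap intensities, so they brighten the image without removing the noise that is already present in it.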

See-in-the-Dark Dataset

multiple short-exposure images can correspond to the same long-exposure reference image

Method

1.pipeline

L3: a large collection of local, linear and learned filters to approximate the complex nonlinear pipelines

single image processing of fast low-light imaging, operate on raw sensor data

for Bayer arrays, pack the input into 4 channels and reduce the spatial resolution by a factor of two in each dimension

$H \times W \rightarrow \frac{H}{2} \times \frac{W}{2} \times 4$
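The packing step for Bayer data can be sketched with NumPy slicing. This is a minimal sketch, not the authors' code: `pack_bayer` is a hypothetical name, and the channel order (which position in each 2x2 tile goes to which channel) is an assumption that depends on the sensor's color filter layout.

```python
import numpy as np

def pack_bayer(raw):
    """Pack an H x W Bayer mosaic into an (H/2) x (W/2) x 4 array.

    Each 2x2 tile of the mosaic becomes one spatial position with 4 channels.
    The channel order used here is an assumption for illustration.
    """
    raw = np.asarray(raw)
    h, w = raw.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    return np.stack(
        [
            raw[0::2, 0::2],  # top-left sample of every tile
            raw[0::2, 1::2],  # top-right
            raw[1::2, 1::2],  # bottom-right
            raw[1::2, 0::2],  # bottom-left
        ],
        axis=-1,
    )

mosaic = np.arange(16).reshape(4, 4)
packed = pack_bayer(mosaic)
```

For the 4x4 example, `packed` has shape (2, 2, 4), and its first position holds the four samples of the top-left tile.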

for X-Trans arrays, the raw data is arranged in $6 \times 6$ blocks; pack it into 9 channels

subtract the black level and scale the data by the desired amplification ratio
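Black-level subtraction and amplification can be sketched as below. The function name, the default black and white levels (chosen to resemble a 14-bit sensor), and the final clip to 1.0 are assumptions for illustration; the real values depend on the camera, and `ratio` stands for the externally chosen amplification ratio.

```python
import numpy as np

def subtract_black_and_amplify(raw, ratio, black_level=512, white_level=16383):
    """Normalize raw values to [0, 1], then scale by the amplification ratio.

    black_level and white_level are assumed example values for a 14-bit sensor.
    """
    raw = np.asarray(raw, dtype=np.float32)
    # Remove the sensor offset and map the remaining range to [0, 1].
    normalized = np.maximum(raw - black_level, 0.0) / (white_level - black_level)
    # Brighten by the chosen ratio; clipping to 1.0 is an assumption of this sketch.
    return np.minimum(normalized * ratio, 1.0)

amplified = subtract_black_and_amplify(
    [0, 10, 20, 110], ratio=5, black_level=10, white_level=110
)
```

With the toy levels above, the value 20 maps to (20 - 10) / 100 * 5 = 0.5, and the saturated value 110 is clipped to 1.0.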

then feed it into the FCN; the output is a 12-channel image at half the spatial resolution, which a sub-pixel layer rearranges to recover the original resolution
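The sub-pixel step is a pure rearrangement: 12 channels at half resolution become 3 channels at full resolution. The sketch below implements that rearrangement (often called depth-to-space) in NumPy; the function name and the channel ordering convention are assumptions, and a deep learning framework's built-in layer would normally be used instead.

```python
import numpy as np

def depth_to_space(x, block=2):
    """Rearrange an (h, w, c * block**2) array into (h * block, w * block, c)."""
    h, w, channels = x.shape
    if channels % (block * block):
        raise ValueError("channel count must be divisible by block**2")
    c = channels // (block * block)
    # Split the channel axis into (row offset, column offset, output channel).
    x = x.reshape(h, w, block, block, c)
    # Interleave the offsets with the spatial axes: (h, row, w, col, c).
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(h * block, w * block, c)

features = np.arange(12).reshape(1, 1, 12)
rgb = depth_to_space(features)
```

Here a single 12-channel position expands into a 2x2 patch of 3-channel pixels, so an (H/2) x (W/2) x 12 output becomes an H x W x 3 image.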

pack -> subtract black level -> amplify -> feed into FCN -> sub-pixel layer to recover the resolution

U-net is the default architecture

residual connections aren't beneficial in this setting because the input and output are represented in different color spaces

the amplification ratio determines the brightness of the output

training

from scratch using the $L_1$ loss and the Adam optimizer

in each iteration, randomly crop a $512 \times 512$ patch for training and apply random flipping and rotation for data augmentation
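The cropping, augmentation, and $L_1$ loss can be sketched as follows. This is a sketch under assumptions, not the authors' training code: the names are hypothetical, the crop is taken in the packed (half-resolution) input so the reference patch is twice as large per side, and rotation is limited to multiples of 90 degrees.

```python
import numpy as np

def random_patch_pair(packed, target, patch=512, rng=None):
    """Crop matching patches from the packed input and the full-resolution
    reference, then apply the same random flips and rotation to both."""
    rng = np.random.default_rng() if rng is None else rng
    h, w, _ = packed.shape
    y = int(rng.integers(0, h - patch + 1))
    x = int(rng.integers(0, w - patch + 1))
    inp = packed[y:y + patch, x:x + patch]
    # The reference has twice the resolution of the packed input.
    ref = target[2 * y:2 * (y + patch), 2 * x:2 * (x + patch)]
    if rng.random() < 0.5:  # horizontal flip
        inp, ref = inp[:, ::-1], ref[:, ::-1]
    if rng.random() < 0.5:  # vertical flip
        inp, ref = inp[::-1], ref[::-1]
    k = int(rng.integers(0, 4))  # rotate by k * 90 degrees
    inp, ref = np.rot90(inp, k), np.rot90(ref, k)
    return np.ascontiguousarray(inp), np.ascontiguousarray(ref)

def l1_loss(prediction, reference):
    """Mean absolute error between the network output and the reference."""
    return float(np.mean(np.abs(prediction - reference)))
```

Applying identical flips and rotations to both arrays keeps the input and the reference aligned, which the per-pixel $L_1$ loss relies on.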

u-net

https://arxiv.org/pdf/1505.04597.pdf?

Experiment

baseline: BM3D, a non-blind denoising method

in contrast, this pipeline performs blind noise suppression that can locally adapt to the data

a dedicated network which is trained for a specific camera sensor may not always be necessary

U-net performs better than CAN; colors are sometimes not recovered correctly by the CAN

operating(denoising) directly on raw sensor data is much more effective in extreme low-light conditions

for Bayer data, packing the color values into different channels at correspondingly lower spatial resolution yields better results than duplicating and masking the different colors

it's hard for the network to learn histogram stretching: accuracy drops significantly when histogram stretching is applied to the reference images, so it is excluded from the pipeline and optionally applied as postprocessing
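Histogram stretching as an optional postprocessing step can be sketched as a percentile-based linear rescale. The function name and the 1st/99th percentile defaults are assumptions for illustration; the notes do not specify how the stretch is computed.

```python
import numpy as np

def stretch_histogram(image, low_pct=1.0, high_pct=99.0):
    """Linearly rescale intensities so the chosen percentiles map to 0 and 1."""
    image = np.asarray(image, dtype=np.float64)
    lo, hi = np.percentile(image, [low_pct, high_pct])
    if hi <= lo:
        # Flat image: nothing to stretch, just keep values in range.
        return np.clip(image, 0.0, 1.0)
    return np.clip((image - lo) / (hi - lo), 0.0, 1.0)

network_output = np.linspace(0.2, 0.4, 101)
stretched = stretch_histogram(network_output, low_pct=0.0, high_pct=100.0)
```

Because this runs after the network, the training targets stay unstretched, in line with the observation in the notes.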

the pipeline delivers noise suppression and color transformation

limitations and future work:

HDR tone mapping is not addressed

the dataset does not contain humans or dynamic objects

the amplification ratio must be chosen externally

runtime optimization remains open