[Paper Notes] SID (Learning to See in the Dark)
From the column: zls's Daily Ramblings
https://arxiv.org/pdf/1805.01934.pdf
abstract
imaging in low light
a pipeline for processing low-light images, based on end-to-end training of a fully-convolutional network
operates directly on raw sensor data
analyzes factors that affect performance and highlights opportunities for future work
introduction
noise makes imaging particularly challenging in low light
a high ISO (sensor sensitivity) can be used to increase brightness, but it also amplifies noise
scaling or histogram stretching can be applied, but they don't resolve the low signal-to-noise ratio (SNR) caused by low photon counts
increasing the exposure time can introduce blur due to camera shake or object motion
the focus is extreme low-light imaging with severely limited illumination (e.g., moonlight) and short exposure (at video rate)
train a DNN to learn the image processing pipeline for low-light raw data, including color transformations, demosaicing, noise reduction, and image enhancement
the pipeline is trained end-to-end to avoid the noise amplification and error accumulation of traditional camera processing pipelines
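To make the SNR point above concrete, here is a minimal sketch (not from the paper) that simulates a flat patch with Poisson photon noise. The photon counts (4 and 400 per pixel) and the gain of 100 are made-up values for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def snr(pixels):
    """SNR of a flat patch: mean signal over noise standard deviation."""
    return pixels.mean() / pixels.std()

# Hypothetical flat patch: about 4 photons per pixel (very dark) versus
# about 400 photons per pixel (well exposed). Photon arrival is Poisson.
dark = rng.poisson(lam=4.0, size=100_000).astype(np.float64)
bright = rng.poisson(lam=400.0, size=100_000).astype(np.float64)

# Digital scaling brings the dark patch to the same mean brightness,
# but signal and noise are multiplied by the same factor.
dark_scaled = dark * 100.0

snr_dark = snr(dark)
snr_dark_scaled = snr(dark_scaled)
snr_bright = snr(bright)

print(f"dark: {snr_dark:.2f}, dark x100: {snr_dark_scaled:.2f}, bright: {snr_bright:.2f}")
```

The scaled patch is as bright as the well-exposed one, yet its SNR is unchanged: for Poisson noise the SNR grows with the square root of the photon count, so only collecting more light (or denoising) helps.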
related work
- image denoising
BM3D
the application of deep networks to denoising, including stacked sparse denoising auto-encoders (SSDA) and trainable nonlinear reaction diffusion (TNRD)
Gaussian or salt-and-pepper noise
multiple-image denoising can achieve better results since more information is collected from the scene
to denoise a burst of images from the same scene
- low-light image enhancement
histogram equalization to balance the histogram of the entire image
gamma correction, which increases the brightness of dark regions while compressing bright pixels
- noisy image datasets
raw short-exposure images, each with a reference long-exposure image
See-in-the-Dark Dataset
multiple short-exposure images can correspond to the same long-exposure reference image
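As a small illustration of the gamma correction mentioned under low-light image enhancement above, here is a minimal sketch. The function name `gamma_correct` and the sample pixel values are hypothetical; the image is assumed to be normalized to [0, 1].

```python
import numpy as np

def gamma_correct(image, gamma=2.2):
    """Apply out = in ** (1 / gamma) to an image with values in [0, 1]."""
    image = np.clip(np.asarray(image, dtype=np.float64), 0.0, 1.0)
    return image ** (1.0 / gamma)

# Sample intensities from very dark to fully bright.
pixels = np.array([0.0, 0.01, 0.25, 0.81, 1.0])

# gamma=2 is a square root: approximately [0, 0.1, 0.5, 0.9, 1].
corrected = gamma_correct(pixels, gamma=2.0)
print(corrected)
```

The dark value 0.01 is boosted tenfold to about 0.1, while the bright value 0.81 only moves to about 0.9, which is the "brighten dark regions, compress bright pixels" behavior. Note that this brightens the noise along with the signal.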
Method
1.pipeline
L3: a large collection of local, linear and learned filters to approximate the complex nonlinear pipelines
this work: single-image processing for fast low-light imaging, operating directly on raw sensor data
for Bayer arrays, pack the input into 4 channels and reduce the spatial resolution by a factor of two in each dimension
for X-Trans arrays, the raw data is arranged in 6x6 blocks; pack it into 9 channels
subtract the black level and scale the data by the desired amplification ratio
then feed it into the FCN; the output is a 12-channel image at half the spatial resolution, which a sub-pixel layer rearranges to recover the original resolution
pack -> subtract black level -> amplify -> feed into FCN -> sub-pixel layer to recover resolution
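The steps around the network can be sketched in NumPy as below, for the Bayer case only. This is a rough sketch under assumptions, not the authors' code: the channel order in `pack_bayer`, the clamping at zero, the normalization by a `white_level`, and the example values 512 / 16383 / 100 are all illustrative choices, and the network itself is replaced by a zero-filled stand-in.

```python
import numpy as np

def pack_bayer(raw):
    """Pack an (H, W) Bayer mosaic into (H/2, W/2, 4): one channel per 2x2 position."""
    h, w = raw.shape
    assert h % 2 == 0 and w % 2 == 0, "expects even height and width"
    # Which color lands in which channel depends on the sensor's CFA layout.
    return np.stack(
        [raw[0::2, 0::2], raw[0::2, 1::2], raw[1::2, 1::2], raw[1::2, 0::2]],
        axis=-1,
    )

def preprocess(raw, black_level, white_level, ratio):
    """Pack, subtract the black level, then scale by the amplification ratio."""
    packed = pack_bayer(raw.astype(np.float32))
    # Assumption: clamp at zero and normalize by the usable sensor range.
    packed = np.maximum(packed - black_level, 0.0) / (white_level - black_level)
    return packed * ratio

def depth_to_space(x, block=2):
    """Sub-pixel rearrangement: (h, w, C * block**2) -> (h * block, w * block, C)."""
    h, w, c = x.shape
    out_c = c // (block * block)
    x = x.reshape(h, w, block, block, out_c)
    x = x.transpose(0, 2, 1, 3, 4)  # interleave block rows/cols with pixels
    return x.reshape(h * block, w * block, out_c)

# Toy run with hypothetical sensor levels and amplification ratio.
raw = np.full((8, 8), 600, dtype=np.uint16)
net_input = preprocess(raw, black_level=512, white_level=16383, ratio=100)

# Stand-in for the FCN output: 12 channels at half resolution.
fake_output = np.zeros(net_input.shape[:2] + (12,), dtype=np.float32)
rgb = depth_to_space(fake_output)

print(net_input.shape, rgb.shape)  # (4, 4, 4) (8, 8, 3)
```

Because the 12 output channels hold a 2x2 block of RGB values per half-resolution pixel, the rearrangement needs no learned upsampling to get back to the full-resolution 3-channel image.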
U-net is the default architecture
a residual connection isn't beneficial in this setting because the input and output are represented in different color spaces
the amplification ratio determines the brightness of the output
train
from scratch using the L1 loss and the Adam optimizer
in each iteration, randomly crop a patch for training and apply random flipping and rotation for data augmentation
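A possible way to write the crop and augmentation step is sketched below, together with an L1 loss. It assumes a Bayer-packed input at half the resolution of the reference image, so the reference crop uses doubled coordinates; the function names, the `patch` parameter and the toy sizes are hypothetical, and the same flip and rotation must be applied to both images.

```python
import numpy as np

def random_paired_patch(packed, reference, patch, rng=None):
    """Crop aligned patches, then apply the same random flips and rotation to both.

    packed:    (h, w, 4) packed raw input at half resolution
    reference: (2h, 2w, 3) long-exposure reference at full resolution
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w, _ = packed.shape
    y = int(rng.integers(0, h - patch + 1))
    x = int(rng.integers(0, w - patch + 1))
    inp = packed[y:y + patch, x:x + patch]
    ref = reference[2 * y:2 * (y + patch), 2 * x:2 * (x + patch)]

    if rng.random() < 0.5:  # horizontal flip
        inp, ref = inp[:, ::-1], ref[:, ::-1]
    if rng.random() < 0.5:  # vertical flip
        inp, ref = inp[::-1], ref[::-1]
    k = int(rng.integers(0, 4))  # rotate by k * 90 degrees
    inp, ref = np.rot90(inp, k), np.rot90(ref, k)
    return np.ascontiguousarray(inp), np.ascontiguousarray(ref)

def l1_loss(output, reference):
    """Mean absolute error between the network output and the reference."""
    return np.mean(np.abs(output - reference))

# Toy usage with random data and a small patch size.
rng = np.random.default_rng(0)
packed = rng.random((8, 8, 4))
reference = rng.random((16, 16, 3))
inp, ref = random_paired_patch(packed, reference, patch=4, rng=rng)
print(inp.shape, ref.shape)  # (4, 4, 4) (8, 8, 3)
```

Square patches keep their shape under 90-degree rotations, which is why a single `patch` size is enough here.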
u-net
https://arxiv.org/pdf/1505.04597.pdf
Experiment
BM3D is a non-blind denoising method (the noise level has to be specified externally)
this pipeline performs blind noise suppression that can locally adapt to the data
a dedicated network trained for a specific camera sensor may not always be necessary
U-net performs better than CAN; the CAN sometimes fails to recover colors correctly
operating (denoising) directly on raw sensor data is much more effective in extreme low-light conditions
for Bayer data, packing the color values into different channels at correspondingly lower spatial resolution yields better results than duplicating and masking the different colors
it's hard for the network to learn histogram stretching: accuracy drops significantly when histogram stretching is applied to the reference images, so it is excluded from the pipeline and can optionally be applied as postprocessing
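If histogram stretching is applied as postprocessing, one simple variant is a percentile-based linear stretch, sketched below. This is only one way to do it and not necessarily what the authors use; the function name and the 1% / 99% cut-offs are assumptions.

```python
import numpy as np

def stretch_histogram(image, low_pct=1.0, high_pct=99.0):
    """Linearly map the [low_pct, high_pct] percentile range to [0, 1]."""
    image = np.asarray(image, dtype=np.float64)
    lo, hi = np.percentile(image, [low_pct, high_pct])
    if hi <= lo:
        return image.copy()  # flat image: nothing to stretch
    return np.clip((image - lo) / (hi - lo), 0.0, 1.0)

# A low-contrast output confined to [0.2, 0.4] is spread over the full range.
output = np.linspace(0.2, 0.4, 101)
stretched = stretch_histogram(output)
print(stretched.min(), stretched.max())
```

Keeping this step outside the network means the reference images used for training stay untouched, in line with the observation above.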
the pipeline handles noise suppression and color transformation
limitations:
it doesn't address HDR tone mapping
the dataset doesn't contain humans or dynamic objects
the amplification ratio must be chosen externally
runtime optimization is left for future work