Learning a Hierarchical Latent-Variable Model of 3D Shapes: Notes
From the column V.DeepLearning. Paper: https://arxiv.org/pdf/1705.05994.pdf
Stylized processing of 2D images has already found its way into everyday applications. This post covers VSL (the variational shape learner), a latent-variable model for unsupervised learning on multi-object 3D voxel data, trained and evaluated on ModelNet40 (a 3D CAD benchmark).
Earlier neural approaches mostly learn simple latent representations, e.g. deep belief networks, deep auto-encoders, and 3D CNNs. All of these rely on a single vector representation of the 3D shape, so the generative model is limited to a single layer of latent variables.
1. Principle & Architecture
VSL instead builds on a multi-level latent structure: low-level latent variables (e.g. the shape or placement of edges) are composed to abstractly describe the properties captured by the higher-level latents, and a variational Bayesian bound serves as the loss function. Local layers close to the input carry mostly low-level features, each local layer describing one level of feature abstraction; skip-connections link the local layers in a top-down direction, so local layers far from the input carry more higher-level feature information. The global layer aggregates the features of every local layer together with their relative positions (a minimal code sketch follows the list below).
This structure brings two benefits:
1) a straightforward parametrization of the generative model;
2) reduced overfitting.
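As a concrete reference, here is a minimal PyTorch sketch of such a latent hierarchy: each local latent is inferred from the shared encoder feature plus the local latent above it (the top-down skip connections), and the global latent summarizes all local codes. All dimensions and layer choices are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class HierarchicalLatents(nn.Module):
    """Sketch of a VSL-style latent hierarchy. Each local latent is
    inferred from the shared encoder feature plus the local latent above
    it (top-down skip connections); the global latent summarizes all
    local codes. All sizes here are illustrative, not the paper's."""

    def __init__(self, feat_dim=100, local_dim=20, n_local=3, global_dim=20):
        super().__init__()
        # Topmost local latent sees only the encoder feature; deeper ones
        # also receive the previous local latent via a skip connection.
        self.local_heads = nn.ModuleList(
            [nn.Linear(feat_dim, 2 * local_dim)]
            + [nn.Linear(feat_dim + local_dim, 2 * local_dim)
               for _ in range(n_local - 1)])
        # Global latent is inferred from the concatenated local codes.
        self.global_head = nn.Linear(n_local * local_dim, 2 * global_dim)

    @staticmethod
    def sample(stats):
        # Reparametrization trick: stats holds (mu, log-variance).
        mu, logvar = stats.chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

    def forward(self, feat):
        local_codes, prev = [], None
        for head in self.local_heads:
            inp = feat if prev is None else torch.cat([feat, prev], dim=-1)
            prev = self.sample(head(inp))
            local_codes.append(prev)
        z_global = self.sample(self.global_head(torch.cat(local_codes, dim=-1)))
        return z_global, local_codes
```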
1.1 Latent Loss
The loss is the variational Bayesian bound mentioned in section 1, with one KL term for the global latent and one for each local latent.
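The formula itself is missing from the post; as a reference, a generic VAE-style bound with one KL term per latent level (the paper's exact factorization may differ) reads:

```latex
\mathcal{L}(\theta,\phi)
  = \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
  - \mathrm{KL}\!\left(q_\phi(z_g \mid x)\,\|\,p(z_g)\right)
  - \sum_{i=1}^{n} \mathrm{KL}\!\left(q_\phi(z_i \mid x, z_{i-1})\,\|\,p(z_i)\right)
```

where z_g is the global latent, z_1, ..., z_n are the local latents (with no z_{i-1} conditioning for i = 1), and the priors are taken to be standard normals.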
1.2 Encoder: 3D-Conv + Skip-Connections
Three 3D convolution layers with kernel sizes {6,5,4}, strides {2,2,1} and channels {32,64,128}, followed by two fully-connected layers of 100 neurons each.
(Figure: approximate network structure for a single voxel input.)
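A minimal PyTorch sketch of this encoder, assuming a 30x30x30 binary voxel grid (the post does not state the input resolution) and ELU activations (also an assumption):

```python
import torch.nn as nn

class VoxelEncoder(nn.Module):
    """Sketch of the 3D-conv encoder: kernels {6,5,4}, strides {2,2,1},
    channels {32,64,128}, then two 100-unit fully-connected layers.
    The 30x30x30 input and the ELU activations are assumptions."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=6, stride=2), nn.ELU(),
            nn.Conv3d(32, 64, kernel_size=5, stride=2), nn.ELU(),
            nn.Conv3d(64, 128, kernel_size=4, stride=1), nn.ELU(),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 2 * 2 * 2, 100), nn.ELU(),  # 30^3 input -> 2^3 maps
            nn.Linear(100, 100),
        )

    def forward(self, x):  # x: (batch, 1, 30, 30, 30)
        return self.fc(self.conv(x))
```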
1.3 Decoder: 3D-DeConvNet
Like the single latent vector marked by the blue dashed line on the right of the figure, the 3D-DeConvNet here mirrors the encoder, and the output layer uses an element-wise logistic sigmoid.
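A matching sketch of the decoder, mirroring the encoder above; the latent size, activations and 30^3 output resolution are the same assumptions as before:

```python
import torch.nn as nn

class VoxelDecoder(nn.Module):
    """Sketch of the 3D-DeConvNet decoder, symmetric to the encoder and
    ending in an element-wise logistic sigmoid. Latent size and
    activations are assumptions."""

    def __init__(self, latent_dim=100):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 128 * 2 * 2 * 2)
        self.deconv = nn.Sequential(
            nn.ConvTranspose3d(128, 64, kernel_size=4, stride=1), nn.ELU(),
            nn.ConvTranspose3d(64, 32, kernel_size=5, stride=2), nn.ELU(),
            nn.ConvTranspose3d(32, 1, kernel_size=6, stride=2),
            nn.Sigmoid(),  # per-voxel occupancy probability
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 128, 2, 2, 2)
        return self.deconv(h)  # (batch, 1, 30, 30, 30)
```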
1.4 Image Regressor: 2D-ConvNet
Four fully-convolutional layers with kernels {32,15,5,3}, strides {2,2,2,1} and channels {16,32,64,128}; the last conv layer is special in that it is flattened and fed into two fully-connected layers with 200 and 100 neurons each. Dropout is applied before the last fully-connected layer.
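A sketch of this regressor, assuming 128x128 single-channel input images (the post does not give the image size) and ELU activations:

```python
import torch.nn as nn

class ImageRegressor(nn.Module):
    """Sketch of the 2D-ConvNet image regressor: four conv layers with
    kernels {32,15,5,3}, strides {2,2,2,1}, channels {16,32,64,128},
    flattened into FC layers of 200 and 100 units, with dropout before
    the last FC layer. The 128x128 grayscale input is an assumption."""

    def __init__(self, latent_dim=100):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=32, stride=2), nn.ELU(),
            nn.Conv2d(16, 32, kernel_size=15, stride=2), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=1), nn.ELU(),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 5 * 5, 200), nn.ELU(),  # 128x128 input -> 5x5 maps
            nn.Dropout(0.5),                        # before the last FC layer
            nn.Linear(200, latent_dim),             # regress to the shape latent
        )

    def forward(self, img):  # img: (batch, 1, 128, 128)
        return self.fc(self.conv(img))
```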
2. Experimental Results
2.1 Shape Classification
2.2 Single Image 3D Model Retrieval
2.3 Shape Arithmetic
3. Summary
1. Shape arithmetic: there is no need to match against actual 3D shapes from the original dataset.
2. In image reconstruction, a warming-up schedule improves performance; the loss approaches 10^-1.
3. The model produces smooth transitions between two objects (see the sketch below).
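Points 1 and 3 both come down to simple vector operations in the learned latent space; a tiny illustration with stand-in latent codes (random here, normally produced by the encoder):

```python
import torch

latent_dim = 120  # stand-in size for the concatenated global+local code
z_a, z_b, z_c = (torch.randn(1, latent_dim) for _ in range(3))

# Shape arithmetic: combine codes directly; no lookup of actual 3D
# shapes from the original dataset is needed.
z_new = z_a - z_b + z_c

# Smooth transition: linearly interpolate between two codes; each step
# would be fed to the decoder sketched in section 1.3 to get a voxel grid.
steps = [(1 - t) * z_a + t * z_b for t in torch.linspace(0, 1, 8)]
```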