Learning a Hierarchical Latent-Variable Model of 3D Shapes: Notes


Paper: https://arxiv.org/pdf/1705.05994.pdf

Stylized processing of 2D images has already made its way into everyday applications. This post introduces VSL (Variational Shape Learner), a latent-variable model for unsupervised learning on multi-object 3D voxel data, trained on the ModelNet40 database (a 3D CAD benchmark).

Earlier neural networks mostly learned simple latent-variable representations, e.g. deep belief networks, deep auto-encoders, and 3D CNNs. All of these, however, rely on a single-vector representation of the 3D shape, so the generative model is limited to a single layer of latent variables.

1. Principle & Architecture

VSL, by contrast, is built on a multi-level latent-variable structure: lower-level latent variables (e.g. the shape or placement of edges) are abstracted to describe the characteristics of higher-level latent variables, and a variational Bayesian bound serves as the loss function.

Local latent layers close to the input hold mostly low-level features; each layer captures one level of feature abstraction, and skip-connections link the local layers in a top-down direction, so local layers farther from the input hold more high-level feature information. The global layer gathers the features of every local layer together with their corresponding positions (see the code sketch after the list below).

This structure brings two benefits:

1) Straightforward parametrization of the generative model

2) Curbing overfitting
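
To make the structure above concrete, here is a minimal PyTorch sketch of a top-down chain of local latents linked by skip-connections, plus a global latent that aggregates them. The wiring details, dimensions, and all class/variable names are my assumptions, not details taken from the paper:

```python
import torch
import torch.nn as nn

class HierarchicalLatent(nn.Module):
    """Sketch: local latents inferred top-down via skip-connections,
    plus a global latent aggregating all local codes (dims assumed)."""

    def __init__(self, feat_dim=100, local_dim=10, n_local=3, global_dim=20):
        super().__init__()
        # each local code sees the shared encoder features plus, via a
        # skip-connection, the previous (higher-level) local code
        self.local_heads = nn.ModuleList(
            nn.Linear(feat_dim + (local_dim if i > 0 else 0), 2 * local_dim)
            for i in range(n_local)
        )
        # the global code aggregates every local code
        self.global_head = nn.Linear(n_local * local_dim, 2 * global_dim)

    @staticmethod
    def reparam(stats):
        # split into mean / log-variance, then draw z = mu + sigma * eps
        mu, logvar = stats.chunk(2, dim=-1)
        return mu + (0.5 * logvar).exp() * torch.randn_like(mu)

    def forward(self, feat):
        zs, prev = [], None
        for head in self.local_heads:
            inp = feat if prev is None else torch.cat([feat, prev], dim=-1)
            prev = self.reparam(head(inp))
            zs.append(prev)
        z_global = self.reparam(self.global_head(torch.cat(zs, dim=-1)))
        return torch.cat([z_global] + zs, dim=-1)

z = HierarchicalLatent()(torch.rand(8, 100))  # -> (8, 20 + 3*10) = (8, 50)
```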

1.1 Latent Loss

Loss function: the objective is a variational Bayesian lower bound on the data likelihood; the latent loss is its KL term.
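
As a sketch, the standard variational lower bound (ELBO) that such a model maximizes, with the latent z collecting the global and local codes under a unit-Gaussian prior (the exact per-level weighting used by VSL is an assumption not given in these notes):

```latex
\mathcal{L}(\theta,\phi;\mathbf{x})
  = \underbrace{\mathbb{E}_{q_\phi(\mathbf{z}\mid\mathbf{x})}
      \left[\log p_\theta(\mathbf{x}\mid\mathbf{z})\right]}_{\text{reconstruction}}
  - \underbrace{D_{\mathrm{KL}}\left(q_\phi(\mathbf{z}\mid\mathbf{x})\,\middle\|\,
      \mathcal{N}(\mathbf{0},\mathbf{I})\right)}_{\text{latent loss}},
\qquad
\mathbf{z} = [\mathbf{z}_{\text{global}},\,\mathbf{z}_1,\dots,\mathbf{z}_n]
```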

1.2 Encoder: 3D-Conv + Skip-Connections

Three 3D conv layers with kernel sizes {6,5,4}, strides {2,2,1}, and output channels {32,64,128}, followed by two fully-connected layers of 100 neurons each.

The structure for a single voxel input is approximately as shown in the figure.
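
A hedged PyTorch sketch of this encoder, assuming 30x30x30 binary voxel inputs (a common ModelNet resolution), ReLU activations, and no padding, none of which are stated above:

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv3d(1, 32, kernel_size=6, stride=2), nn.ReLU(),    # 30^3 -> 13^3
    nn.Conv3d(32, 64, kernel_size=5, stride=2), nn.ReLU(),   # 13^3 -> 5^3
    nn.Conv3d(64, 128, kernel_size=4, stride=1), nn.ReLU(),  # 5^3  -> 2^3
    nn.Flatten(),                                            # 128 * 2^3 = 1024
    nn.Linear(1024, 100), nn.ReLU(),
    nn.Linear(100, 100),
)
feat = encoder(torch.rand(8, 1, 30, 30, 30))  # -> (8, 100)
```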

1.3 Decoder: 3D-DeConvNet

As indicated by the single vector on the blue dashed line at the right of the figure, the 3D-DeConvNet here mirrors the encoder, and the output layer uses an element-wise logistic sigmoid.
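
A matching hedged sketch of the mirrored decoder; the 100-d latent size and the transposed-conv layout are assumptions chosen to invert the encoder sketch above:

```python
import torch
import torch.nn as nn

decoder = nn.Sequential(
    nn.Linear(100, 1024), nn.ReLU(),  # latent vector -> 128 * 2^3
    nn.Unflatten(1, (128, 2, 2, 2)),
    nn.ConvTranspose3d(128, 64, kernel_size=4, stride=1), nn.ReLU(),  # 2^3  -> 5^3
    nn.ConvTranspose3d(64, 32, kernel_size=5, stride=2), nn.ReLU(),   # 5^3  -> 13^3
    nn.ConvTranspose3d(32, 1, kernel_size=6, stride=2),               # 13^3 -> 30^3
    nn.Sigmoid(),  # element-wise logistic sigmoid -> per-voxel occupancy probability
)
voxels = decoder(torch.rand(8, 100))  # -> (8, 1, 30, 30, 30)
```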

1.4 Image Regressor: 2D-ConvNet

Four fully-convolutional layers with kernels {32,15,5,3}, strides {2,2,2,1}, and channels {16,32,64,128}. The last conv layer is handled specially: its output is flattened and fed into two fully-connected layers with 200 and 100 neurons, respectively.

Dropout is applied before the last fully-connected layer.
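
A hedged sketch of the regressor, assuming 128x128 RGB inputs, ReLU activations, and a 0.5 dropout rate (all assumptions); the kernels, strides, channels, FC widths, and dropout placement follow the description above:

```python
import torch
import torch.nn as nn

regressor = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=32, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=15, stride=2), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=3, stride=1), nn.ReLU(),
    nn.Flatten(),
    nn.LazyLinear(200), nn.ReLU(),  # infers the flattened size at first call
    nn.Dropout(0.5),                # dropout before the last FC layer (rate assumed)
    nn.Linear(200, 100),            # regress to the 100-d shape representation
)
z_hat = regressor(torch.rand(8, 3, 128, 128))  # -> (8, 100)
```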

2. Experimental Results

2.1 Shape Classification

2.2 Single Image 3D Model Retrieval

2.3 Shape Arithmetic

3. Takeaways

1. Shape arithmetic operates in latent space; the results need not match actual 3D shapes from the original dataset.

2. In image reconstruction, warm-up (gradually ramping up the KL term) improves performance; the loss approaches 10^-1.

3. Interpolating between two objects' latent codes gives a smooth transition (both latent-space operations are sketched after this list).
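
A minimal sketch of the two latent-space operations above; the helper names are hypothetical:

```python
import torch

def shape_arithmetic(z_a, z_b, z_c):
    # compose latent codes, e.g. z(armchair) - z(arms) + z(sofa);
    # the decoded result need not exist in the training set
    return z_a - z_b + z_c

def interpolate(z_a, z_b, steps=10):
    # linear interpolation in latent space -> smooth transition when decoded
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)
    return (1 - alphas) * z_a + alphas * z_b
```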

