
Gluon Source Code Analysis 1


How to use it, and an analysis of the forward and backward results

from mxnet import nd
from mxnet import gluon
from mxnet import autograd
from mxnet.gluon import nn


class Net(nn.Block):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        self.dense0 = nn.Dense(4, use_bias=False)
        self.dense1 = nn.Dense(2, use_bias=False)

    def forward(self, x):
        return self.dense1(self.dense0(x))


def train():
    net = Net()
    net.initialize()
    w = net.dense0.weight
    # Shapes are only partially known before the first forward pass
    # (deferred initialization), so print just the shape here.
    print('weight shape after initialize', w.shape)
    trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 1})
    data = nd.ones(shape=(1, 1, 28, 28))
    label = nd.ones(shape=(1, 2))  # must match the net's output shape
    loss = gluon.loss.L2Loss()
    with autograd.record():
        res = net(data)
        w = net.dense0.weight
        print('net[0] name', net.dense0.name, 'weight shape', w.shape,
              'params', w.data(), 'grad', w.grad())
        L = loss(res, label)
    L.backward()
    trainer.step(batch_size=1)
    print('net[0] name', net.dense0.name, 'weight shape', w.shape,
          'params', w.data(), 'grad', w.grad())


if __name__ == '__main__':
    train()

Output of running the script:

('weight shape after initialize', (4, 0))
('net[0] name', 'dense0', 'weight shape', (4L, 784L), 'params',
[[ 0.04118239  0.05352169 -0.04762455 ...,  0.03089482 -0.00140258  0.01266012]
 [-0.00697319 -0.00986735 -0.03128323 ...,  0.02195714 -0.04105704  0.01050965]
 [ 0.02380178 -0.04182156  0.04908523 ..., -0.05005977 -0.0463761   0.0436078 ]
 [-0.04813539 -0.03545294 -0.01216894 ...,  0.06526501 -0.00576673 -0.02751607]]
<NDArray 4x784 @cpu(0)>, 'grad',
[[ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]]
<NDArray 4x784 @cpu(0)>)
('net[0] name', 'dense0', 'weight shape', (4L, 784L), 'params',
[[ 0.02016377  0.03250307 -0.06864318 ...,  0.00987619 -0.0224212  -0.00835851]
 [-0.05362909 -0.05652324 -0.07793912 ..., -0.02469876 -0.08771293 -0.03614624]
 [ 0.0333778  -0.03224555  0.05866124 ..., -0.04048375 -0.03680009  0.05318382]
 [-0.03410936 -0.02142691  0.00185709 ...,  0.07929104  0.00825929 -0.01349004]]
<NDArray 4x784 @cpu(0)>, 'grad',
[[ 0.02101862  0.02101862  0.02101862 ...,  0.02101862  0.02101862  0.02101862]
 [ 0.04665589  0.04665589  0.04665589 ...,  0.04665589  0.04665589  0.04665589]
 [-0.00957601 -0.00957601 -0.00957601 ..., -0.00957601 -0.00957601 -0.00957601]
 [-0.01402603 -0.01402603 -0.01402603 ..., -0.01402603 -0.01402603 -0.01402603]]
<NDArray 4x784 @cpu(0)>)

The code above contains the typical structure of a neural-network program (in a real script the training steps repeat over mini-batches; see the loop sketched after this list):

  • define the network (here, a small MLP)
  • initialize the network
  • train the network:
    • forward pass
    • compute the loss
    • backward pass to obtain the gradients
    • update the weight parameters
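
A minimal sketch of that outer training loop, assuming a hypothetical train_data (a gluon.data.DataLoader yielding (data, label) pairs) and num_epochs, plus the net, loss, and trainer objects defined above:

for epoch in range(num_epochs):
    for data, label in train_data:               # train_data is hypothetical
        with autograd.record():                  # forward pass
            output = net(data)
            L = loss(output, label)              # compute the loss
        L.backward()                             # backward pass: fills the gradients
        trainer.step(batch_size=data.shape[0])   # update the weight parameters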

The code above demonstrates two things (both are checked in the standalone sketch after this list):

  • Right after the network is defined and initialized, the second dimension of the weight shape is 0. This is MXNet's deferred parameter initialization: before seeing an input, it cannot infer the second dimension. Compared with PyTorch, the advantage is that you do not have to declare the input size of every layer; the drawback is that parameter shapes are unknown until the first forward pass.
  • weight = weight - lr * grad. Taking the first parameter as an example: during the forward pass its value is 0.04118239 and its gradient is still 0; after one backward pass and trainer.step, the gradient is 0.02101862 and the updated value is 0.02016377 = 0.04118239 - 1 × 0.02101862 (the learning rate is 1).
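
As a quick check of both points, here is a minimal standalone sketch (the layer size 4, the input size 3, and the all-ones input are made up for illustration):

from mxnet import nd, autograd, gluon
from mxnet.gluon import nn

layer = nn.Dense(4, use_bias=False)
layer.initialize()
print(layer.weight.shape)   # (4, 0): the input dimension is still unknown

trainer = gluon.Trainer(layer.collect_params(), 'sgd', {'learning_rate': 1})
x = nd.ones(shape=(1, 3))
with autograd.record():
    L = layer(x).sum()      # the first forward pass infers the input dimension
print(layer.weight.shape)   # (4, 3): the shape is now fully known

w_before = layer.weight.data().copy()
L.backward()
trainer.step(batch_size=1)
# Verify weight = weight - lr * grad (lr is 1 here):
w_expected = w_before - 1 * layer.weight.grad()
print((layer.weight.data() - w_expected).abs().sum())  # prints ~0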

The code above touches Gluon's key components (a Sequential version of the same MLP is sketched after this list):

  • gluon.nn.Block, the parent class of Sequential, HybridBlock, and HybridSequential
  • gluon.loss
  • gluon.Trainer, a helper class for updating model parameters
  • mxnet.optimizer
  • mxnet.nd
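
To show how the Block family fits together, the same two-layer MLP can also be built from the ready-made Sequential container instead of a custom Block subclass; a minimal sketch:

from mxnet.gluon import nn

# nn.Sequential is itself a Block subclass that simply chains its children
# in order; nn.HybridSequential is the HybridBlock counterpart.
net = nn.Sequential()
with net.name_scope():
    net.add(nn.Dense(4, use_bias=False))
    net.add(nn.Dense(2, use_bias=False))
net.initialize()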
