Which feature of PyTorch 0.2.0, released on August 6, 2017, appeals to you most?

01-13

https://github.com/pytorch/pytorch/releases/tag/v0.2.0
Which newly introduced feature in this release do you like best, and why?

Higher-order autodiff.

(Although it has long been usable on the auto-diff branch...)
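For anyone who hasn't tried it, here is a minimal sketch of what higher-order autodiff looks like in practice. It is written against the current tensor API (in 0.2.0 the input had to be wrapped in a Variable), and the function x**3 is just an illustrative choice, not something from the release notes.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3

# First derivative. create_graph=True records the backward pass itself,
# so the result can be differentiated a second time.
(dy_dx,) = torch.autograd.grad(y, x, create_graph=True)   # 3 * x**2

# Second derivative, obtained by differentiating the first one.
(d2y_dx2,) = torch.autograd.grad(dy_dx, x)                # 6 * x

first, second = dy_dx.item(), d2y_dx2.item()
```

Without create_graph=True the second call would fail, because the first gradient would carry no graph to differentiate through.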

Personally I don't think broadcasting is such a great feature; an explicit expand reads more clearly to me.
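To make the comparison concrete, here is a small sketch of the two styles side by side. The shapes and values are made up for illustration, and the code uses the current tensor API rather than 0.2.0 Variables.

```python
import torch

a = torch.arange(6.0).view(2, 3)       # shape (2, 3)
b = torch.tensor([10.0, 20.0, 30.0])   # shape (3,)

# Implicit: b is broadcast across the rows of a.
implicit = a + b

# Explicit: spell out the shape change before adding, as was needed before broadcasting.
explicit = a + b.unsqueeze(0).expand_as(a)

same = torch.equal(implicit, explicit)
```

Both give the same result; the explicit version just makes the intended shape visible at the call site, which is the readability argument above.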

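I skimmed the other answers that mention DyNet's auto-batching. The Eastern European guy who sits across from me in the office told me he implemented manual batching in PyTorch, and on GPU it ran 3x faster than DyNet...

PyTorch's NLP helper functions feel pretty lame. While working on another project I ended up writing a lot of handy utils myself (for example a dataloader and synonym-based data augmentation)...

He shared a manual-packing code snippet with me; take a look if you're interested (it seems to be the attention forward part of a hierarchical attention RNN):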

import numpy as np
import torch
from torch import nn
from torch.autograd import Variable
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Not shown in this snippet (defined elsewhere in the original project):
#   Constants.PAD  - the padding index
#   pad_docs       - presumably the document-level counterpart of pad_sentences


def pad_sentences(doc):
    """Pad a list of 1-D index tensors into one Variable, sorted longest-first.

    Returns the padded batch, the sorted lengths, and the index that undoes the sort.
    """
    n_sents = len(doc)
    # stable sort by decreasing length, as pack_padded_sequence requires
    len_ix = np.argsort([-len(sent) for sent in doc], kind="mergesort")
    doc = [doc[ix] for ix in len_ix]
    lens = [len(sent) for sent in doc]

    out = doc[0].new(n_sents, lens[0]).fill_(Constants.PAD)
    for i in range(n_sents):
        sent_length = doc[i].size(0)
        out[i].narrow(0, 0, sent_length).copy_(doc[i])

    # inverse permutation: inv_ix[original position] = sorted position
    inv_ix = np.zeros_like(len_ix)
    inv_ix[len_ix] = np.arange(len(len_ix))
    inv_ix = Variable(torch.LongTensor(inv_ix),
                      requires_grad=False).detach()

    out = Variable(out, requires_grad=False)
    if out.is_cuda:
        inv_ix = inv_ix.cuda()
    return out, lens, inv_ix


def _make_detached_long(x, cuda=False):
    y = Variable(torch.LongTensor(x), requires_grad=False).detach()
    if cuda:
        y = y.cuda()
    return y


# method of the model class (the class definition is not part of the snippet)
def forward(self, X_train):
    # first, collect all sentences in the batch
    all_sents = [sent for doc in X_train for sent in doc]
    lengths = [len(sent) for sent in all_sents]
    offsets = np.cumsum([len(doc) for doc in X_train])

    all_sents_padded, sorted_lens, inv_ix = pad_sentences(all_sents)
    all_sents_embed = self.embed(all_sents_padded)

    all_sents_pack = pack_padded_sequence(all_sents_embed, sorted_lens,
                                          batch_first=True)
    all_sents_enc_pack, _ = self.word_encoder(all_sents_pack)
    all_sents_enc, _ = pad_packed_sequence(all_sents_enc_pack,
                                           batch_first=True)

    # undo sorting
    all_sents_enc = all_sents_enc.index_select(0, inv_ix)

    # test unsorting
    # for sent, sent_enc in zip(all_sents, all_sents_enc):
    #     assert len(sent) == torch.prod(sent_enc != 0, 1).sum()

    # word-level attention
    _, sents_repr = self.word_attn(all_sents_enc, lengths)

    # group sentences by doc
    doc_ixs = np.split(np.arange(len(all_sents)), offsets[:-1])
    doc_ixs = [_make_detached_long(ix, cuda=sents_repr.is_cuda)
               for ix in doc_ixs]
    sents_by_doc = [sents_repr.index_select(0, ix) for ix in doc_ixs]

    # check correctness
    #  assert len(sents_by_doc) == len(X_train)
    #  for doc, sents_ in zip(X_train, sents_by_doc):
    #      assert(len(doc) == len(sents_))

    # pad, pack and apply sentence-level RNN
    doc_lens = [len(doc) for doc in X_train]
    docs_padded, doc_lens_sorted, doc_inv_ix = pad_docs(sents_by_doc)
    docs_pack = pack_padded_sequence(docs_padded, doc_lens_sorted,
                                     batch_first=True)
    docs_enc_pack, _ = self.sent_encoder(docs_pack)
    docs_enc, _ = pad_packed_sequence(docs_enc_pack, batch_first=True)

    # undo sorting
    docs_enc = docs_enc.index_select(0, doc_inv_ix)

    # sentence-level attention
    _, docs_repr = self.sent_attn(docs_enc, doc_lens)

    out_score = self.out(docs_repr)
    return out_score
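Since the snippet depends on pieces that aren't shown, here is a self-contained sketch of just the core trick: sort by length, pack, run the RNN, unpack, then undo the sort with the inverse permutation. Everything in it (PAD = 0, the toy sentences, the Embedding and GRU sizes) is my own assumption for illustration, and it uses the current tensor API rather than 0.2.0 Variables.

```python
import numpy as np
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

PAD = 0  # assumed padding index, stands in for Constants.PAD

sents = [torch.tensor([5, 6]), torch.tensor([1, 2, 3, 4]), torch.tensor([7])]

# Stable sort, longest first, plus the permutation that undoes it.
order = np.argsort([-len(s) for s in sents], kind="mergesort")
inv_order = np.zeros_like(order)
inv_order[order] = np.arange(len(order))

sorted_sents = [sents[i] for i in order]
lens = [len(s) for s in sorted_sents]

# Pad every sentence to the length of the longest one.
padded = torch.full((len(sents), lens[0]), PAD, dtype=torch.long)
for i, s in enumerate(sorted_sents):
    padded[i, :len(s)] = s

embed = nn.Embedding(10, 4, padding_idx=PAD)   # toy sizes
rnn = nn.GRU(4, 3, batch_first=True)

packed = pack_padded_sequence(embed(padded), lens, batch_first=True)
enc_packed, _ = rnn(packed)
enc, _ = pad_packed_sequence(enc_packed, batch_first=True)

# Undo the sort so that row i lines up with sents[i] again.
enc = enc.index_select(0, torch.as_tensor(inv_order, dtype=torch.long))
```

After the last line, the rows are back in input order and the padded time steps are all zeros, which is the property the commented-out "test unsorting" assert in the snippet checks.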

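Broadcasting and advanced indexing, of course.

I used to port NumPy code to PyTorch, and because PyTorch didn't support these, I fiddled for a long time without finding a clean solution. At first I thought I hadn't read the docs thoroughly, or that PyTorch simply approached these problems differently from NumPy. Then it turned out the update had added both features. Tears of joy.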
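As a rough illustration of the kind of NumPy code that now ports over almost line for line, here is a sketch with paired integer indices and a boolean mask. The array contents are arbitrary.

```python
import numpy as np
import torch

x_np = np.arange(12).reshape(3, 4)
x_t = torch.arange(12).view(3, 4)

rows = [0, 2, 1]
cols = [3, 0, 2]

# Pick elements (0, 3), (2, 0), (1, 2) with paired integer indices.
picked_np = x_np[rows, cols]
picked_t = x_t[torch.tensor(rows), torch.tensor(cols)]

# Boolean masks carry over as well.
even_np = x_np[x_np % 2 == 0]
even_t = x_t[x_t % 2 == 0]
```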

I also hope the next version adds NumPy-style support for multiple axes in the relevant functions, such as sum and max.
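In the meantime, one possible workaround is to chain single-axis reductions. The helper below, sum_over, is a hypothetical name of my own and not a PyTorch function; it is only a sketch of the idea. (Later releases did add tuple support for the dim argument of torch.sum.)

```python
import torch

def sum_over(x, dims):
    """Sum over several dims by chaining single-dim sums (hypothetical helper)."""
    # Reduce the highest dim first so the remaining dim indices stay valid.
    for d in sorted(dims, reverse=True):
        x = x.sum(d)
    return x

x = torch.arange(24.0).view(2, 3, 4)
out = sum_over(x, (0, 2))   # same as np.sum(x, axis=(0, 2)), shape (3,)
```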

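torch.autograd.grad is also very important.

The official team did a little humble work.

Advanced indexing, of course~~

After all, I don't have the resources to use distributed yet...

Hehe.

LR scheduler author quietly drifting by: finally there's no need to spend a bunch of lines writing learning rate decay by hand like in the examples.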
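For reference, a minimal sketch of what this looks like with torch.optim.lr_scheduler. StepLR, the toy model and the hyperparameters are my own picks for illustration, and the call order (optimizer.step() before scheduler.step()) follows current PyTorch conventions.

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import StepLR

model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = StepLR(optimizer, step_size=2, gamma=0.5)   # halve the lr every 2 epochs

lrs = []
for epoch in range(6):
    lrs.append(optimizer.param_groups[0]["lr"])
    # ... one epoch of training would go here ...
    optimizer.step()
    scheduler.step()
```

The decay logic that used to be a hand-written function in the training script shrinks to the single scheduler.step() call per epoch.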

Higher-order gradients.

WGAN-GP can now be implemented conveniently.
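A sketch of why higher-order gradients matter here: the WGAN-GP penalty is a function of the critic's gradient with respect to its input, and that penalty itself has to be backpropagated. The critic architecture, the 2-D (batch, features) inputs and the helper name gradient_penalty are assumptions for illustration, written against the current tensor API.

```python
import torch
from torch import nn

def gradient_penalty(critic, real, fake):
    """WGAN-GP penalty: mean over the batch of (||grad critic(x_hat)||_2 - 1)^2."""
    eps = torch.rand(real.size(0), 1)   # one mixing weight per sample
    x_hat = (eps * real + (1 - eps) * fake).detach().requires_grad_(True)
    scores = critic(x_hat)
    (grads,) = torch.autograd.grad(
        outputs=scores,
        inputs=x_hat,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,   # keep the graph so the penalty can be backpropagated
    )
    return ((grads.norm(2, dim=1) - 1) ** 2).mean()

critic = nn.Sequential(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 1))
real, fake = torch.randn(5, 8), torch.randn(5, 8)

gp = gradient_penalty(critic, real, fake)
gp.backward()   # differentiates through the gradient computation itself
```

In a real training loop this term would be scaled by a coefficient and added to the critic loss before calling backward.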

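Quite a few things aren't mentioned in the release notes, though some of them had been sitting on the master branch for a while and just weren't in the previous release. For example AlphaDropout and SELU...

Many basic torch operations, such as sort, now support an out argument, so you can reuse Tensors instead of getting a new one back on every call.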
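A small sketch of the out argument with torch.sort; the buffer names and values are just for illustration.

```python
import torch

x = torch.tensor([3.0, 1.0, 2.0])

# Preallocate the result buffers once...
values = torch.empty(3)
indices = torch.empty(3, dtype=torch.long)

# ...and have sort write into them instead of allocating new tensors.
torch.sort(x, out=(values, indices))
```

Inside a loop, the same two buffers can be passed on every iteration.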

GPU support in Distributed doesn't seem very good, though. See: Distributed communication package - torch.distributed

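Distributed support: the torch.distributed package lets users exchange tensors between multiple machines, which in turn lets neural network training scale out to multiple machines and to larger mini-batch sizes. The release notes point to Facebook's "training ImageNet in 1 hour" work as an example of what these primitives let you implement.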

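Higher-order derivatives, plus distributed (at the very least I can see PyTorch further replacing TensorFlow; distributed was one of the few things I thought TF was genuinely strong at). Other things such as broadcasting and the fun indexing tricks are thoughtful touches too. The new layers are also exciting, for example SELU, whose paper is one to marvel at.

If I had to pick the single most attractive one: distributed. I don't want to touch TensorFlow any more. I suspect that sooner or later it will be the TPU supporting PyTorch rather than PyTorch supporting the TPU.

I hope PyTorch stays elegant!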

Affine transform and sampler
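For context, these are the spatial-transformer building blocks F.affine_grid and F.grid_sample. Below is a minimal sketch with an identity transform, so the output should just reproduce the input. The image contents are arbitrary, and the align_corners argument belongs to the current API; it did not exist in 0.2.0.

```python
import torch
import torch.nn.functional as F

img = torch.arange(16.0).view(1, 1, 4, 4)   # (N, C, H, W)

# One 2x3 affine matrix per sample; this one is the identity transform.
theta = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]]])

# Sampling coordinates in [-1, 1], shape (N, H, W, 2).
grid = F.affine_grid(theta, img.size(), align_corners=False)

# Bilinear resampling of img at those coordinates.
out = F.grid_sample(img, grid, align_corners=False)
```

Replacing theta with a rotation or scaling matrix (or with the output of a small localization network) gives the usual spatial transformer setup.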

The bug fixes, of course! There were too many bugs before...

The learning rate scheduler is nice too (∩_∩)

Damn, expand is no longer backward compatible. Thumbs down!!!

---------------------------------------------

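Advanced indexing is really friendly for writing samplers; as for broadcasting, using expand before was acceptable enough.

WGAN-GP is now easy to write.

The new indexing is also nice. Moving closer to NumPy is what's programmer-friendly; static computation graphs are downright anti-programmer.

Broadcast! PyTorch finally fully supports my machine learning skill tree.

Broadcast is very important.

(0.2 still has no autobatch; on this point DyNet is still far ahead.)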