A First Look at Transfer Learning with PyTorch

01-23

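Introduction:

What is transfer learning? Put simply, it means taking a model that has already been trained and applying it to a new, similar problem, rather than building a new model and training and optimizing its parameters from scratch. A trained model comes with its optimized parameters, so only a few small adjustments are needed before it can be applied to the new problem.

This article uses a transferred VGG16 model to build a CNN (convolutional neural network) that can tell pictures of cats from pictures of dogs. A test set is used to check whether the model works well.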

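Building a transfer learning model with PyTorch:

VGG is a CNN (convolutional neural network) model introduced by K. Simonyan and A. Zisserman in the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition". The model achieved outstanding results in the ImageNet challenge (a competition on classifying millions of images).

The structure of the VGG16 model is shown in the figure below:

Figure: VGG16 model structure

As the figure shows, the model consists of multiple convolutional layers, pooling layers, and fully connected layers. The input is a 224*224*3 image (224*224 is the resolution and 3 is the number of RGB channels), and the output is a classification over 1000 classes (this article only needs two classes, so the last layer has to be rewritten). Downloading the model and its parameters with PyTorch is easy: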

    from torchvision import models

    model = models.vgg16(pretrained=True)

With pretrained set to True, the program automatically downloads the trained parameters.

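This article uses transfer learning to classify cat and dog images. The dataset comes from a Kaggle competition: Dogs vs. Cats Redux: Kernels Edition.

First, load and preview the images. The code is as follows: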

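    import os

    import torch
    from torchvision import datasets, transforms

    path = "dog_vs_cat"

    # Crop the central 224x224 region, convert to a tensor,
    # and scale each channel from [0, 1] to [-1, 1].
    transform = transforms.Compose([
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
    ])

    # One ImageFolder dataset and one DataLoader per split.
    data_image = {
        x: datasets.ImageFolder(root=os.path.join(path, x), transform=transform)
        for x in ["train", "val"]
    }
    data_loader_image = {
        x: torch.utils.data.DataLoader(dataset=data_image[x], batch_size=4, shuffle=True)
        for x in ["train", "val"]
    }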

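Because the input images must have a resolution of 224*224, transforms.CenterCrop(224) is used to crop the original images. The loaded training set has 20000 images and the validation set has 5000 (the original images are all training images, so you need to split off part of them as a validation set yourself). In the output labels, 1 means dog and 0 means cat.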
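The split itself is not shown in the article, so here is a minimal sketch of one way to do it. It assumes the Kaggle training files are named like `cat.0.jpg` and `dog.0.jpg` and all sit in one flat folder. `split_dataset` and its parameters are hypothetical names. `ImageFolder` assigns class indices by sorted folder name, which is why `cat` becomes 0 and `dog` becomes 1.

```python
import os
import random
import shutil


def split_dataset(src_dir, dst_dir, val_fraction=0.2, seed=0):
    """Copy images named like 'cat.0.jpg' into dst_dir/{train,val}/{cat,dog}."""
    rng = random.Random(seed)
    counts = {}
    for label in ("cat", "dog"):
        files = sorted(f for f in os.listdir(src_dir) if f.startswith(label + "."))
        rng.shuffle(files)  # reproducible shuffle thanks to the fixed seed
        n_val = int(len(files) * val_fraction)
        parts = {"val": files[:n_val], "train": files[n_val:]}
        for split, names in parts.items():
            out_dir = os.path.join(dst_dir, split, label)
            os.makedirs(out_dir, exist_ok=True)
            for name in names:
                shutil.copy(os.path.join(src_dir, name), os.path.join(out_dir, name))
            counts[(split, label)] = len(names)
    return counts
```

With the 25000 original images and `val_fraction=0.2`, this would give the 20000/5000 split described above, for example `split_dataset("train", "dog_vs_cat")` if the raw images were unpacked into a folder called `train`.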

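    import matplotlib.pyplot as plt
    import torchvision

    X_train, y_train = next(iter(data_loader_image["train"]))
    classes = data_image["train"].classes  # class names taken from the folder names
    mean = [0.5, 0.5, 0.5]
    std = [0.5, 0.5, 0.5]

    img = torchvision.utils.make_grid(X_train)
    img = img.numpy().transpose((1, 2, 0))  # C x H x W -> H x W x C
    img = img * std + mean  # undo the normalization for display
    print([classes[i] for i in y_train])
    plt.imshow(img)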

["cat", "dog", "cat", "dog"]

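Figure: preview of the cat and dog images

As the figure above shows, the images to be trained on are all 224*224*3.

Load the pretrained model and print its structure: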
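The preview code undoes the normalization with `img*std+mean` before plotting. The small NumPy sketch below, with made-up pixel values, shows why that works: `Normalize` computes `(x - mean) / std` per channel, so with mean and std both 0.5 the range [0, 1] maps to [-1, 1] and back.

```python
import numpy as np

mean = np.array([0.5, 0.5, 0.5])
std = np.array([0.5, 0.5, 0.5])


def normalize(img):
    """Per-channel (x - mean) / std on an H x W x C array."""
    return (img - mean) / std


def denormalize(img):
    """Inverse of normalize: x * std + mean."""
    return img * std + mean


# A single illustrative pixel with channel values 0.0, 0.5 and 1.0.
pixels = np.array([[[0.0, 0.5, 1.0]]])
normalized = normalize(pixels)
restored = denormalize(normalized)
```

The per-channel broadcasting relies on the channel axis being last, which is why the preview transposes the grid to H x W x C first.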

    model = models.vgg16(pretrained=True)
    print(model)

    VGG (
      (features): Sequential (
        (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU (inplace)
        (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (3): ReLU (inplace)
        (4): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
        (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (6): ReLU (inplace)
        (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (8): ReLU (inplace)
        (9): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
        (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (11): ReLU (inplace)
        (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (13): ReLU (inplace)
        (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (15): ReLU (inplace)
        (16): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
        (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (18): ReLU (inplace)
        (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (20): ReLU (inplace)
        (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (22): ReLU (inplace)
        (23): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
        (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (25): ReLU (inplace)
        (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (27): ReLU (inplace)
        (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (29): ReLU (inplace)
        (30): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
      )
      (classifier): Sequential (
        (0): Linear (25088 -> 4096)
        (1): ReLU (inplace)
        (2): Dropout (p = 0.5)
        (3): Linear (4096 -> 4096)
        (4): ReLU (inplace)
        (5): Dropout (p = 0.5)
        (6): Linear (4096 -> 1000)
      )
    )
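The 25088 input features of the first Linear layer follow directly from the printed structure. The 3x3 convolutions with padding 1 keep the spatial size, while each of the five 2x2, stride-2 max-pool layers halves it. A small arithmetic sketch (the function name is hypothetical):

```python
def vgg16_flat_features(input_size=224, num_pools=5, channels=512):
    """Number of values fed to the classifier after the last pooling layer."""
    size = input_size
    for _ in range(num_pools):
        size //= 2  # each 2x2, stride-2 max pool halves height and width
    return channels * size * size


flat = vgg16_flat_features()  # 224 -> 112 -> 56 -> 28 -> 14 -> 7, then 512 * 7 * 7
```

This is also why the input must be 224*224: a different input size would change the flattened feature count and no longer match Linear (25088 -> 4096).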

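The structure matches the VGG16 figure shown at the beginning, except that the printout also lists the actual parameters each layer takes. To adapt the transferred VGG16 model to the new task and get good recognition of cat and dog images, the final part of VGG16's fully connected layers has to be rewritten and retrained. The original intent was to train only the last fully connected layer, because even training all the fully connected layers takes a long time on an ordinary computer. Note, however, that the code below rebuilds the entire classifier, and newly created layers are trainable by default, so in practice all three Linear layers of the classifier are trained while the convolutional layers stay frozen. This is enough to get good results: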

    # Freeze every pretrained parameter.
    for param in model.parameters():
        param.requires_grad = False

    # Rebuild the classifier with a two-class output layer. Newly created layers
    # default to requires_grad=True, so the whole classifier is trainable.
    model.classifier = torch.nn.Sequential(
        torch.nn.Linear(25088, 4096),
        torch.nn.ReLU(),
        torch.nn.Dropout(p=0.5),
        torch.nn.Linear(4096, 4096),
        torch.nn.ReLU(),
        torch.nn.Dropout(p=0.5),
        torch.nn.Linear(4096, 2),
    )

    use_gpu = torch.cuda.is_available()
    if use_gpu:
        model = model.cuda()

    cost = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.classifier.parameters())

Setting requires_grad = False on a parameter freezes it: even when further training takes place, that parameter is not updated.
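To see the effect of freezing, it helps to count which parameters remain trainable. The sketch below uses a tiny stand-in network rather than VGG16 so that it runs instantly. The layer sizes and the `count_parameters` helper are illustrative assumptions.

```python
import torch

# A tiny stand-in for a pretrained network: two Linear layers.
model = torch.nn.Sequential(
    torch.nn.Linear(8, 4),
    torch.nn.ReLU(),
    torch.nn.Linear(4, 2),
)

# Freeze everything, as done for VGG16 above.
for param in model.parameters():
    param.requires_grad = False

# Replace the last layer; a freshly created layer is trainable by default.
model[2] = torch.nn.Linear(4, 2)


def count_parameters(module):
    """Return (trainable, total) parameter counts for a module."""
    trainable = sum(p.numel() for p in module.parameters() if p.requires_grad)
    total = sum(p.numel() for p in module.parameters())
    return trainable, total


trainable, total = count_parameters(model)
```

Here only the 10 parameters of the new last layer (8 weights plus 2 biases) are trainable out of 46 in total. Running the same helper on the modified VGG16 shows how much of the network is actually being trained.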

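The last fully connected layer is also rewritten: torch.nn.Linear(4096, 2) reduces the output to two values (we only need to tell cats from dogs).

optimizer = torch.optim.Adam(model.classifier.parameters()) updates only the parameters of the fully connected layers. The loss is still computed with cross-entropy.

Inspect the rewritten model: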
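A quick way to convince yourself that the optimizer leaves frozen layers alone is to take one step and compare weights before and after. This is a minimal sketch on a toy model with random data. `backbone` and `head` are hypothetical stand-ins for VGG16's `features` and `classifier`.

```python
import torch

torch.manual_seed(0)

backbone = torch.nn.Linear(8, 4)  # stands in for the frozen feature layers
head = torch.nn.Linear(4, 2)      # stands in for the trainable classifier
for param in backbone.parameters():
    param.requires_grad = False

model = torch.nn.Sequential(backbone, torch.nn.ReLU(), head)

# Hand only the trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable)
cost = torch.nn.CrossEntropyLoss()

backbone_before = backbone.weight.detach().clone()
head_bias_before = head.bias.detach().clone()

# One training step on random inputs and labels.
inputs = torch.randn(16, 8)
labels = torch.randint(0, 2, (16,))
optimizer.zero_grad()
loss = cost(model(inputs), labels)
loss.backward()
optimizer.step()

backbone_unchanged = torch.equal(backbone.weight, backbone_before)
head_changed = not torch.equal(head.bias, head_bias_before)
```

After the step, `backbone_unchanged` is True because frozen parameters never receive gradients, while the head has moved.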

    VGG (
      (features): Sequential (
        (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU (inplace)
        (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (3): ReLU (inplace)
        (4): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
        (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (6): ReLU (inplace)
        (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (8): ReLU (inplace)
        (9): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
        (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (11): ReLU (inplace)
        (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (13): ReLU (inplace)
        (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (15): ReLU (inplace)
        (16): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
        (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (18): ReLU (inplace)
        (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (20): ReLU (inplace)
        (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (22): ReLU (inplace)
        (23): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
        (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (25): ReLU (inplace)
        (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (27): ReLU (inplace)
        (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (29): ReLU (inplace)
        (30): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
      )
      (classifier): Sequential (
        (0): Linear (25088 -> 4096)
        (1): ReLU ()
        (2): Dropout (p = 0.5)
        (3): Linear (4096 -> 4096)
        (4): ReLU ()
        (5): Dropout (p = 0.5)
        (6): Linear (4096 -> 2)
      )
    )

Then train for one epoch and look at the results:

    Epoch0/1
    ----------
    Batch 500, Train Loss:0.8073, Train ACC:88.4500
    Batch 1000, Train Loss:1.0141, Train ACC:89.9500
    Batch 1500, Train Loss:0.8976, Train ACC:91.2333
    Batch 2000, Train Loss:0.8154, Train ACC:91.9500
    Batch 2500, Train Loss:0.7552, Train ACC:92.3500
    Batch 3000, Train Loss:0.6801, Train ACC:92.8083
    Batch 3500, Train Loss:0.6457, Train ACC:93.0500
    Batch 4000, Train Loss:0.6467, Train ACC:93.1875
    Batch 4500, Train Loss:0.6263, Train ACC:93.3722
    Batch 5000, Train Loss:0.5983, Train ACC:93.4950
    train Loss:0.5983, Correct93.4950
    val Loss:0.4096, Correct95.8400
    Training time is:32m 11s

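The training loss is 0.5983 with an accuracy of 93.495%, and the validation loss is 0.4096 with an accuracy of 95.84%. This is after a single epoch (one epoch takes about 32 minutes); training for more epochs might give a better result.

Feed a random batch from the test set and check the predictions: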
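The training loop itself is not reproduced in this article. The following is a generic sketch of one epoch that tracks running loss and accuracy, not necessarily the loop that produced the log above. It is demonstrated on a toy linear model with synthetic data, and `train_one_epoch` is a hypothetical name.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, cost, optimizer):
    """Run one pass over loader; return (mean loss, accuracy in percent)."""
    model.train()
    running_loss, running_correct, seen = 0.0, 0, 0
    for inputs, labels in loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = cost(outputs, labels)
        loss.backward()
        optimizer.step()
        # Accumulate per-sample loss and the number of correct predictions.
        running_loss += loss.item() * inputs.size(0)
        running_correct += (outputs.argmax(dim=1) == labels).sum().item()
        seen += inputs.size(0)
    return running_loss / seen, 100.0 * running_correct / seen


# Toy data: the label is 1 when the first feature is positive.
torch.manual_seed(0)
features = torch.randn(64, 2)
labels = (features[:, 0] > 0).long()
loader = DataLoader(TensorDataset(features, labels), batch_size=4, shuffle=True)

toy_model = torch.nn.Linear(2, 2)
cost = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(toy_model.parameters(), lr=0.05)
history = [train_one_epoch(toy_model, loader, cost, optimizer) for _ in range(10)]
```

For the cat and dog model, the same function could be called with `model`, `data_loader_image["train"]`, `cost`, and `optimizer` from the earlier code, moving each batch to the GPU first when one is used.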

Pred Label: ["dog", "cat", "cat", "dog", "dog", "cat", "cat", "dog", "cat", "cat", "cat", "cat", "cat", "cat", "cat", "dog"]
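Predicted labels like these come from taking the index of the largest output per image and mapping it to a class name. A minimal sketch follows. The tiny hand-set model stands in for the trained VGG16, and `predict_labels` is a hypothetical helper.

```python
import torch

classes = ["cat", "dog"]


def predict_labels(model, images, classes):
    """Return the predicted class name for each item in the batch."""
    model.eval()  # disable Dropout so predictions are deterministic
    with torch.no_grad():
        outputs = model(images)
    pred = outputs.argmax(dim=1)
    return [classes[i] for i in pred.tolist()]


# Stand-in model: negative inputs score higher for "cat", positive for "dog".
toy_model = torch.nn.Linear(1, 2)
with torch.no_grad():
    toy_model.weight.copy_(torch.tensor([[-1.0], [1.0]]))
    toy_model.bias.zero_()

labels = predict_labels(toy_model, torch.tensor([[-2.0], [3.0]]), classes)
```

Calling `model.eval()` matters for VGG16 because its classifier contains Dropout layers, which behave differently during training and inference.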

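None of the predictions is wrong, but there is still room for improvement (this article crops the input images with a center crop; scaling the original images instead might improve prediction accuracy, as might more training epochs and data augmentation).

Full code: JaimeTang/PyTorch-and-TransferLearning

Summary:

Transfer learning has the advantage of solving similar problems quickly: there is no need to optimize and train a model's parameters from scratch for each related problem. Training and tuning the parameters of a complex model can take weeks, so this approach saves a great deal of time. If the results are unsatisfactory, you can freeze fewer layers and train more of them, rather than blindly training from scratch at the outset. These advantages are probably why transfer learning is so widely used in practice.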
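Freezing fewer layers can be sketched as a small helper that unfreezes everything from a given index onward in a Sequential container. `unfreeze_from` is a hypothetical name, and the toy stack below stands in for `model.features`.

```python
import torch


def unfreeze_from(sequential, start_index):
    """Freeze layers before start_index and make the rest trainable."""
    for index, layer in enumerate(sequential):
        for param in layer.parameters():
            param.requires_grad = index >= start_index


# Toy feature extractor with two convolutional layers.
toy_features = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv2d(8, 8, kernel_size=3, padding=1),
    torch.nn.ReLU(),
)
unfreeze_from(toy_features, 2)  # train only the second convolution
frozen_flags = [p.requires_grad for p in toy_features.parameters()]
```

Going by the printed VGG16 structure, `unfreeze_from(model.features, 24)` would make the last convolutional block (indices 24 to 30) trainable. The optimizer would then also need to receive those parameters, not just `model.classifier.parameters()`.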