How much faster is PyTorch on a GPU than on a CPU, really?

A small contribution for those of you who are still hesitating over whether to buy a GPU:

Run the same script, with the same amount of data and the same neural network configuration, once on the CPU and once on the GPU, and see how long each run takes.

If you can't be bothered to read the details, here is the conclusion up front:

For the same money, a GPU gives you about 15 times the computing power of a CPU.

Fifteen times.

Neural network configuration:

5 hidden layers, 500 nodes per layer, 500 epochs, for the first experiment;

5 hidden layers, 1000 nodes per layer, 1000 epochs, for the second experiment (a minimal PyTorch sketch of such a network follows).
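For reference, here is a minimal sketch of what such a network could look like in PyTorch. The plain fully connected layers, the ReLU activation, and the input/output sizes are my assumptions for illustration; the exact script used for the benchmark is in the appendix and is organized differently.

import torch.nn as nn

# Hypothetical sketch of a 5-hidden-layer, 500-node fully connected network.
def make_mlp(input_size, hidden_size=500, num_hidden=5, output_size=1):
    layers = [nn.Linear(input_size, hidden_size), nn.ReLU()]
    for _ in range(num_hidden - 1):
        layers += [nn.Linear(hidden_size, hidden_size), nn.ReLU()]
    layers.append(nn.Linear(hidden_size, output_size))
    return nn.Sequential(*layers)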

CPU details:

4 Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz

A quad-core Intel i5-6600K running at 3.50GHz, currently around $250 on the market.

As shown in the figure, all four CPU cores were indeed in use and doing work.
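If you want to check the same thing from inside PyTorch rather than from a system monitor, the intra-op CPU thread count can be queried and set; these are standard torch calls, not something from the benchmark script itself.

import torch

# Report how many threads PyTorch uses for intra-op CPU parallelism.
print("PyTorch CPU threads:", torch.get_num_threads())
# Optionally pin it to the physical core count (4 on the i5-6600K).
torch.set_num_threads(4)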

GPU details:

NVIDIA GeForce GTX 1070 8GB

A single GTX 1070, which cost me $550. A 1080 Ti, the mining star of the industry, should be even faster; it currently goes for about $950. I bought the 1070 because it was cheaper. According to the userbenchmark statistics in the figure below, the 1080 Ti is about 56% faster than the 1070 but costs nearly twice as much, so I decided the 1080 Ti was not worth it and went with the 1070. (Don't be misled by the prices shown in that figure: those are the lowest prices seen anywhere over the past three months and you cannot actually buy at them; the prices I quote here are average market prices.)
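Before running the benchmark it is worth confirming that PyTorch can actually see the card; the calls below are standard torch.cuda APIs, shown here as a quick sanity check rather than part of the original post.

import torch

# Confirm that CUDA and the GTX 1070 are visible to PyTorch.
if torch.cuda.is_available():
    print("Using GPU:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device found; the benchmark would fall back to the CPU.")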

Results:

For the 500-node, 500-epoch case:

CPU time: 2 minutes 30 seconds;

GPU time: 4 seconds;

The GPU is about 37 times faster than the CPU.

Since the GPU run was so fast that a 4-second measurement makes a poor sample (too much room for noise), let's run the experiment again at a larger scale.

For the 1000-node, 1000-epoch case:

CPU time: 11 minutes 18 seconds;

GPU time: 21 seconds;

The GPU is about 32 times faster than the CPU.

So overall, the GPU is roughly 32 to 37 times faster.
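One caveat about wall-clock timings like these: CUDA kernels launch asynchronously, so on the GPU you should synchronize before reading the clock, otherwise the measured time can look misleadingly short. Below is a minimal timing sketch; the model and the random input are placeholders, not the benchmark's actual network or data.

import time
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(1000, 1000).to(device)    # placeholder model
x = torch.randn(4096, 1000, device=device)  # placeholder input

if device.type == "cuda":
    torch.cuda.synchronize()  # make sure setup work has finished
start = time.time()
for _ in range(1000):
    y = model(x)
if device.type == "cuda":
    torch.cuda.synchronize()  # wait for all queued kernels to complete
print("Elapsed: %.2f s" % (time.time() - start))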

Comparing prices:

CPU: $250;

GPU: $550;

Price-performance (compute per dollar):

32 × 250 / 550 ≈ 14.5

37 × 250 / 550 ≈ 16.8
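The same ratio spelled out in Python, as a quick check: compute per dollar is just the measured speedup scaled by the price ratio.

cpu_price, gpu_price = 250.0, 550.0
for speedup in (32, 37):
    # speedup x (CPU price / GPU price) = compute per dollar relative to the CPU
    print("%dx speedup -> %.1fx compute per dollar" % (speedup, speedup * cpu_price / gpu_price))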

Conclusion:

For a 3.50GHz CPU and an 8GB GPU, the raw speed difference is roughly 32 to 37 times;

In terms of price-performance, the same money spent on a GPU rather than a CPU buys you roughly 14.5 to 16.8 times the speed for neural network training.

Comparison with other people's findings:

"GPUs Are Only Up to 14 Times Faster than CPUs," Says Intel | The Official NVIDIA Blog

blogs.nvidia.com

NVIDIA's official blog cites this Intel study claiming a 14x gap, which is not far from our result.

Appendix:

Script:

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt
from torch.autograd import Variable
import time

# print start time
print("Start time = " + time.ctime())

# read data
inp = np.loadtxt("input", dtype=np.float32)
oup = np.loadtxt("output", dtype=np.float32)
#inp = inp*[4,100,1,4,0.04,1]
oup = oup * 500
inp = inp.astype(np.float32)
oup = oup.astype(np.float32)

# Hyper Parameters
input_size = inp.shape[1]
hidden_size = 1000
output_size = 1
num_epochs = 1000
learning_rate = 0.001

# Toy Dataset
x_train = inp
y_train = oup

# Model: fully connected network with tanh/ReLU hidden layers
class Net(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.l1 = nn.ReLU()
        self.l2 = nn.Sigmoid()
        self.l3 = nn.Tanh()
        self.l4 = nn.ELU()
        self.l5 = nn.Hardshrink()
        self.ln = nn.Linear(hidden_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out = self.fc1(x)
        out = self.l3(out)
        out = self.ln(out)
        out = self.l1(out)
        out = self.fc2(out)
        return out

model = Net(input_size, hidden_size, output_size)

# Loss and Optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

###### GPU
if torch.cuda.is_available():
    print("We are using GPU now!!!")
    model = model.cuda()

# Train the Model
for epoch in range(num_epochs):
    # Convert numpy arrays to torch Variables (on the GPU if available)
    if torch.cuda.is_available():
        inputs = Variable(torch.from_numpy(x_train).cuda())
        targets = Variable(torch.from_numpy(y_train).cuda())
    else:
        inputs = Variable(torch.from_numpy(x_train))
        targets = Variable(torch.from_numpy(y_train))

    # Forward + Backward + Optimize
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 5 == 0:
        print("Epoch [%d/%d], Loss: %.4f" % (epoch + 1, num_epochs, loss.data[0]))

# print end time
print("End time = " + time.ctime())

# Plot the graph
if torch.cuda.is_available():
    predicted = model(Variable(torch.from_numpy(x_train).cuda())).data.cpu().numpy()
else:
    predicted = model(Variable(torch.from_numpy(x_train))).data.numpy()

plt.plot(y_train / 500, "r-", label="Original data")
plt.plot(predicted / 500, "-", label="Fitted line")
#plt.plot(y_train/500, predicted/500, ".", label="Fitted line")
plt.legend()
plt.show()

# Save the Model
torch.save(model.state_dict(), "model.pkl")
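To get the CPU timing from the same script without editing it, one option (not shown in the post) is to hide the GPU from PyTorch via the standard CUDA_VISIBLE_DEVICES environment variable, so that torch.cuda.is_available() returns False and the script takes its CPU branch.

import os
# Must be set before CUDA is initialized; hides all GPUs from this process.
os.environ["CUDA_VISIBLE_DEVICES"] = ""
import torch
print(torch.cuda.is_available())  # prints False, so the script above runs on the CPU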
