Building a PyTorch Project from Scratch: Implementing YOLO v3 Object Detection (Part 2)

05-23

From the column 機器之心

Selected from Medium. Author: Ayoosh Kathuria. Compiled by 機器之心.

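A few days ago, 機器之心 published a compiled version of the first three parts of "Building a PyTorch Project from Scratch: Implementing YOLO v3 Object Detection", which covered how YOLO works, how to create the layers of the YOLO network, and how to implement the network's forward pass. This article contains the last two parts of the tutorial: "Confidence thresholding and non-maximum suppression" and "Designing the input and output pipelines". Overall, the goal of the tutorial is to use PyTorch to implement an object detector based on YOLO v3, a fast object detection algorithm.

The code in this tutorial is designed to run on Python 3.5 and PyTorch 0.3. All of the code can be found at: https://github.com/ayooshkathuria/YOLO_v3_tutorial_from_scratch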

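Prerequisites

1. Parts 1-3 of this tutorial

2. A basic working knowledge of PyTorch, including how to create custom architectures with the nn.Module, nn.Sequential and torch.nn.parameter classes

3. Basic knowledge of NumPy

4. Basic knowledge of OpenCV

If you are missing any of this background, see the Further Reading section at the end of the article.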

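Confidence thresholding and non-maximum suppression

In the previous three parts, we built a model that outputs several object detections for a given input image. To be precise, our output is a tensor of shape B x 10647 x 85, where B is the number of images in a batch, 10647 is the number of bounding boxes predicted per image, and 85 is the number of bounding box attributes.

However, as described in Part 1, we must subject our output to objectness score thresholding and non-maximum suppression (NMS) to obtain what the rest of this article calls the "true" detections. To do that, we will create a function called write_results in the file util.py.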

def write_results(prediction, confidence, num_classes, nms_conf = 0.4):

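The function takes as inputs the prediction, confidence (the objectness score threshold), num_classes (80 in our case) and nms_conf (the NMS IoU threshold).

Object confidence thresholding

Our prediction tensor contains information about B x 10647 bounding boxes. For every bounding box whose objectness score is below the threshold, we set the value of each of its attributes (the entire row representing that box) to zero.

conf_mask = (prediction[:,:,4] > confidence).float().unsqueeze(2)
prediction = prediction*conf_mask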
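To make the masking step concrete, here is a minimal plain-Python sketch of the same idea, using nested lists in place of tensors. The function name zero_low_objectness and the toy rows are illustrative assumptions and are not part of the tutorial's code.

```python
def zero_low_objectness(rows, confidence):
    """Zero out every row whose objectness score (index 4) is not above the threshold."""
    masked = []
    for row in rows:
        # 1.0 keeps the row unchanged, 0.0 wipes all of its attributes.
        keep = 1.0 if row[4] > confidence else 0.0
        masked.append([value * keep for value in row])
    return masked


# Toy rows: (x, y, w, h, objectness, one class score)
rows = [
    [50.0, 60.0, 20.0, 10.0, 0.9, 0.8],
    [30.0, 40.0, 12.0, 14.0, 0.2, 0.7],
]
masked = zero_low_objectness(rows, confidence=0.5)
```

The second row falls below the threshold, so all of its attributes become zero, which is what lets the later torch.nonzero call on the objectness column discard it.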

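Performing non-maximum suppression

Note: I assume you already understand what IoU (intersection over union) and non-maximum suppression are. If you do not, refer to the links at the end of the article.

The bounding box attributes we have now are described by the center coordinates together with the width and height of the box. However, it is easier to compute the IoU of two boxes from the coordinates of a pair of diagonal corners of each box. So we transform the (center x, center y, width, height) attributes of our boxes into (top-left x, top-left y, bottom-right x, bottom-right y).

box_corner = prediction.new(prediction.shape)
box_corner[:,:,0] = (prediction[:,:,0] - prediction[:,:,2]/2)
box_corner[:,:,1] = (prediction[:,:,1] - prediction[:,:,3]/2)
box_corner[:,:,2] = (prediction[:,:,0] + prediction[:,:,2]/2)
box_corner[:,:,3] = (prediction[:,:,1] + prediction[:,:,3]/2)
prediction[:,:,:4] = box_corner[:,:,:4]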
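The conversion is simple arithmetic, and a scalar version may make it easier to check by hand. The helper below is a sketch with a hypothetical name (center_to_corners); it assumes the attribute at index 2 is the width and the one at index 3 is the height, which is how the tensor code above uses them.

```python
def center_to_corners(cx, cy, w, h):
    """Convert (center x, center y, width, height) to (x1, y1, x2, y2)."""
    x1 = cx - w / 2   # top-left x
    y1 = cy - h / 2   # top-left y
    x2 = cx + w / 2   # bottom-right x
    y2 = cy + h / 2   # bottom-right y
    return x1, y1, x2, y2


# A 40 x 20 box centered at (100, 80)
corners = center_to_corners(100.0, 80.0, 40.0, 20.0)
```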

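The number of "true" detections can differ from image to image. For example, a batch of size 3 containing images 1, 2 and 3 might have 5, 2 and 4 "true" detections respectively. Confidence thresholding and NMS therefore have to be done one image at a time. This means we cannot vectorize the operations involved and must loop over the first dimension of prediction (which holds the index of each image in the batch).

batch_size = prediction.size(0)
write = False

for ind in range(batch_size):
    image_pred = prediction[ind]          #image Tensor
    #confidence thresholding
    #NMS

As in the earlier parts of this tutorial, the write flag indicates that we have not yet initialized output, a tensor we will use to collect the "true" detections across the entire batch.

Once inside the loop, let us tidy things up a bit. Notice that each bounding box row has 85 attributes, of which 80 are class scores. At this point we only care about the class score with the maximum value. So we remove the 80 class scores from each row and instead add the index of the class with the maximum value along with the score of that class. (Note that torch.max returns the values first and the indices second, so max_conf holds the score and max_conf_score holds the class index.)

max_conf, max_conf_score = torch.max(image_pred[:,5:5+ num_classes], 1)
max_conf = max_conf.float().unsqueeze(1)
max_conf_score = max_conf_score.float().unsqueeze(1)
seq = (image_pred[:,:5], max_conf, max_conf_score)
image_pred = torch.cat(seq, 1)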
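As a rough illustration of what this step does to a single row, the following plain-Python sketch collapses the class scores into a (best score, best index) pair. The function name and the three-class toy row are assumptions made for brevity; the tutorial's code works on whole tensors with torch.max.

```python
def collapse_class_scores(row, num_classes):
    """Return the first 5 attributes plus (best class score, best class index)."""
    class_scores = row[5:5 + num_classes]
    best_index = max(range(len(class_scores)), key=lambda k: class_scores[k])
    return row[:5] + [class_scores[best_index], float(best_index)]


# 4 corner coordinates, objectness, then 3 class scores (instead of 80)
row = [80.0, 70.0, 120.0, 90.0, 0.9, 0.1, 0.7, 0.2]
collapsed = collapse_class_scores(row, num_classes=3)
```

The result has 7 entries per row, which is why the later code reshapes with view(-1, 7).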

Remember that we set the bounding box rows with an object confidence below the threshold to zero? Let us get rid of them.

non_zero_ind = (torch.nonzero(image_pred[:,4]))
try:
    image_pred_ = image_pred[non_zero_ind.squeeze(),:].view(-1,7)
except:
    continue

#For PyTorch 0.4 compatibility
#Since the above code will not raise an exception for no detection
#as scalars are supported in PyTorch 0.4
if image_pred_.shape[0] == 0:
    continue

The try-except block is there to handle the case where there are no detections. In that case, we use continue to skip the rest of the loop body for this image.

Now, let us get the classes detected in an image.

#Get the various classes detected in the image
img_classes = unique(image_pred_[:,-1])   # -1 index holds the class index

Since there can be multiple "true" detections of the same class, we use a function called unique to get the classes present in any given image.

def unique(tensor):
    tensor_np = tensor.cpu().numpy()
    unique_np = np.unique(tensor_np)
    unique_tensor = torch.from_numpy(unique_np)

    tensor_res = tensor.new(unique_tensor.shape)
    tensor_res.copy_(unique_tensor)
    return tensor_res

Then, we perform NMS class by class.

for cls in img_classes:
    #perform NMS

Once inside the loop, the first thing we do is extract the detections of a particular class (denoted by the variable cls).

Note that the following code is indented three levels in the original source file, but it is shown without indentation here because of limited page space.

#get the detections with one particular class
cls_mask = image_pred_*(image_pred_[:,-1] == cls).float().unsqueeze(1)
class_mask_ind = torch.nonzero(cls_mask[:,-2]).squeeze()
image_pred_class = image_pred_[class_mask_ind].view(-1,7)

#sort the detections such that the entry with the maximum objectness
#confidence is at the top
conf_sort_index = torch.sort(image_pred_class[:,4], descending = True )[1]
image_pred_class = image_pred_class[conf_sort_index]
idx = image_pred_class.size(0)   #Number of detections

Now, we perform NMS.

for i in range(idx):
    #Get the IOUs of all boxes that come after the one we are looking at
    #in the loop
    try:
        ious = bbox_iou(image_pred_class[i].unsqueeze(0), image_pred_class[i+1:])
    except ValueError:
        break

    except IndexError:
        break

    #Zero out all the detections that have IoU > threshold
    iou_mask = (ious < nms_conf).float().unsqueeze(1)
    image_pred_class[i+1:] *= iou_mask

    #Keep only the non-zero entries
    non_zero_ind = torch.nonzero(image_pred_class[:,4]).squeeze()
    image_pred_class = image_pred_class[non_zero_ind].view(-1,7)

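Here, we use a function called bbox_iou. Its first input is the bounding box row indexed by the loop variable i. Its second input is a tensor made up of multiple bounding box rows. The output of bbox_iou is a tensor containing the IoU of the box represented by the first input with each of the boxes in the second input.

If two bounding boxes of the same class have an IoU larger than a threshold, the one with the lower confidence is eliminated. We have already sorted the bounding boxes by objectness score, so the ones with higher confidence are at the top.

In the body of the loop, the following line gives the IoU of the box indexed by i with all the bounding boxes whose index is greater than i.

ious = bbox_iou(image_pred_class[i].unsqueeze(0), image_pred_class[i+1:])

On every iteration, if any bounding box with an index greater than i has an IoU (with the box indexed by i) larger than the threshold nms_conf, that particular box is eliminated.

#Zero out all the detections that have IoU > threshold
iou_mask = (ious < nms_conf).float().unsqueeze(1)
image_pred_class[i+1:] *= iou_mask

#Keep only the non-zero entries
non_zero_ind = torch.nonzero(image_pred_class[:,4]).squeeze()
image_pred_class = image_pred_class[non_zero_ind].view(-1,7)

Also notice that we have put the line that computes ious inside a try-except block. This is because the loop is designed to run idx iterations (the number of rows in image_pred_class). However, as the loop proceeds, some bounding boxes may be removed from image_pred_class. This means that as soon as even one row has been removed from image_pred_class, we can no longer complete idx iterations. We might therefore try to index a value that is out of bounds (IndexError), or the slice image_pred_class[i+1:] may return an empty tensor, and operating on it triggers a ValueError. At that point we can be sure that NMS cannot remove any more bounding boxes, and we break out of the loop.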
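For readers who prefer to see the whole per-class procedure in one place, here is a minimal plain-Python sketch of greedy NMS. It is not the tutorial's tensor implementation: the names iou and nms, the tuple layout (x1, y1, x2, y2, score) and the use of plain areas without a pixel offset are all assumptions made to keep the example short. Because it consumes a shrinking list rather than indexing a fixed range, it needs no try-except.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2, ...); plain area, no +1 offset."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(detections, nms_conf):
    """Greedy NMS over detections of ONE class.

    Each detection is (x1, y1, x2, y2, score). Returns the kept detections,
    highest score first.
    """
    remaining = sorted(detections, key=lambda d: d[4], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Drop every lower-scored box that overlaps the kept box too much.
        remaining = [d for d in remaining if iou(best, d) < nms_conf]
    return kept


detections = [
    (10.0, 10.0, 50.0, 50.0, 0.9),
    (12.0, 12.0, 52.0, 52.0, 0.8),      # heavy overlap with the first box
    (100.0, 100.0, 140.0, 140.0, 0.7),  # far away from both
]
kept = nms(detections, nms_conf=0.4)
```

In this toy input the second box overlaps the first with an IoU of roughly 0.82, so it is suppressed, while the distant third box survives.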

Calculating the IoU

Here is the bbox_iou function.

def bbox_iou(box1, box2):
    """
    Returns the IoU of two bounding boxes
    """
    #Get the coordinates of bounding boxes
    b1_x1, b1_y1, b1_x2, b1_y2 = box1[:,0], box1[:,1], box1[:,2], box1[:,3]
    b2_x1, b2_y1, b2_x2, b2_y2 = box2[:,0], box2[:,1], box2[:,2], box2[:,3]

    #get the coordinates of the intersection rectangle
    inter_rect_x1 = torch.max(b1_x1, b2_x1)
    inter_rect_y1 = torch.max(b1_y1, b2_y1)
    inter_rect_x2 = torch.min(b1_x2, b2_x2)
    inter_rect_y2 = torch.min(b1_y2, b2_y2)

    #Intersection area (clamped at zero so that non-overlapping boxes
    #do not produce a spurious positive area)
    inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp(inter_rect_y2 - inter_rect_y1 + 1, min=0)

    #Union Area
    b1_area = (b1_x2 - b1_x1 + 1)*(b1_y2 - b1_y1 + 1)
    b2_area = (b2_x2 - b2_x1 + 1)*(b2_y2 - b2_y1 + 1)

    iou = inter_area / (b1_area + b2_area - inter_area)

    return iou
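One detail worth checking in any IoU routine is what happens when two boxes do not overlap at all: both side lengths of the "intersection" come out negative, and their product is positive. The scalar sketch below illustrates this with the same +1 pixel convention used in bbox_iou. The helper name intersection_area and its clamp switch are hypothetical and exist only for this demonstration.

```python
def intersection_area(b1, b2, clamp=True):
    """Intersection area of two (x1, y1, x2, y2) boxes using the +1 pixel convention."""
    w = min(b1[2], b2[2]) - max(b1[0], b2[0]) + 1
    h = min(b1[3], b2[3]) - max(b1[1], b2[1]) + 1
    if clamp:
        # A negative side length means "no overlap", so treat it as zero.
        w, h = max(w, 0), max(h, 0)
    return w * h


box_a = (0, 0, 10, 10)
box_b = (20, 20, 30, 30)   # no overlap with box_a

unclamped = intersection_area(box_a, box_b, clamp=False)  # (-9) * (-9)
clamped = intersection_area(box_a, box_b, clamp=True)
```

Without clamping, the two disjoint boxes appear to share an area of 81 pixels; with clamping, the area is 0 and the IoU is 0 as expected.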

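Writing the predictions

The write_results function outputs a tensor of shape D x 8, where D is the number of "true" detections across all images, each represented by a row. Each detection has 8 attributes: the index of the image in the batch to which the detection belongs, the 4 corner coordinates, the objectness score, the score of the class with maximum confidence, and the index of that class.

As before, we do not initialize our output tensor until we have a detection to assign to it. Once it has been initialized, we concatenate subsequent detections to it. We use the write flag to indicate whether the tensor has been initialized. At the end of the loop that iterates over classes, we add the resulting detections to the tensor output.

batch_ind = image_pred_class.new(image_pred_class.size(0), 1).fill_(ind)
#Repeat the batch_id for as many detections of the class cls in the image
seq = batch_ind, image_pred_class

if not write:
    output = torch.cat(seq,1)
    write = True
else:
    out = torch.cat(seq,1)
    output = torch.cat((output,out))

At the end of the function, we check whether output has been initialized at all. If it has not, there is not a single detection in any image of the batch. In that case, we return 0.

try:
    return output
except:
    return 0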

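That is it for this part. At the end of it, we finally have a prediction in the form of a tensor that lists each prediction as a row. What remains is to create an input pipeline that reads images from disk, computes the predictions, draws bounding boxes on the images, and then displays or writes those images. That is what the next part covers.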

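Designing the input and output pipelines

In this part, we will build the input and output pipelines of our detector. This involves reading images from disk, making predictions, using the predictions to draw bounding boxes on the images, and then saving them to disk. We will also cover how to make the detector work in real time on a camera feed or a video. We will introduce some command line flags to allow some experimentation with the various hyperparameters of the network. Let us begin.

Note: you need to install OpenCV 3 for this part.

Create a file detect.py in the detector folder, and add the necessary imports at the top of it.

from __future__ import division
import time
import torch
import torch.nn as nn
from torch.autograd import Variable
import numpy as np
import cv2
from util import *
import argparse
import os
import os.path as osp
from darknet import Darknet
import pickle as pkl
import pandas as pd
import random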

Creating command line arguments

Since detect.py is the file we run to use our detector, it is nice to have command line arguments that we can pass to it. I have used Python's argparse module to do that.

def arg_parse():
    """
    Parse arguments to the detect module
    """
    parser = argparse.ArgumentParser(description="YOLO v3 Detection Module")

    parser.add_argument("--images", dest = "images", help =
                        "Image / Directory containing images to perform detection upon",
                        default = "imgs", type = str)
    parser.add_argument("--det", dest = "det", help =
                        "Image / Directory to store detections to",
                        default = "det", type = str)
    parser.add_argument("--bs", dest = "bs", help = "Batch size", default = 1)
    parser.add_argument("--confidence", dest = "confidence", help = "Object Confidence to filter predictions", default = 0.5)
    parser.add_argument("--nms_thresh", dest = "nms_thresh", help = "NMS Threshold", default = 0.4)
    parser.add_argument("--cfg", dest = "cfgfile", help =
                        "Config file",
                        default = "cfg/yolov3.cfg", type = str)
    parser.add_argument("--weights", dest = "weightsfile", help =
                        "weightsfile",
                        default = "yolov3.weights", type = str)
    parser.add_argument("--reso", dest = "reso", help =
                        "Input resolution of the network. Increase to increase accuracy. Decrease to increase speed",
                        default = "416", type = str)

    return parser.parse_args()

args = arg_parse()
images = args.images
batch_size = int(args.bs)
confidence = float(args.confidence)
nms_thesh = float(args.nms_thresh)
start = 0
CUDA = torch.cuda.is_available()

Among these, the important flags are images (specifies the input image or directory of images), det (the directory to save detections to), reso (the resolution of the input image, which can be used to trade off speed against accuracy), cfg (an alternative configuration file) and weights (the weights file).
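When experimenting with flags like these, it can be convenient to parse an explicit argument list rather than sys.argv, for instance in a quick test. The sketch below shows this with a reduced parser. The function name build_parser and the use of type=int / type=float (instead of casting afterwards, as the tutorial does) are assumptions made for illustration.

```python
import argparse


def build_parser():
    """A reduced version of the detector's parser, for demonstration only."""
    parser = argparse.ArgumentParser(description="YOLO v3 Detection Module")
    parser.add_argument("--images", dest="images", default="imgs", type=str)
    parser.add_argument("--bs", dest="bs", default=1, type=int)
    parser.add_argument("--confidence", dest="confidence", default=0.5, type=float)
    parser.add_argument("--nms_thresh", dest="nms_thresh", default=0.4, type=float)
    return parser


# Passing a list to parse_args bypasses sys.argv entirely.
args = build_parser().parse_args(["--images", "dog-cycle-car.png", "--bs", "4"])
```

Flags that are not supplied fall back to their defaults, so args.confidence is 0.5 and args.nms_thresh is 0.4 here.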

Loading the network

Download the file coco.names from here: https://raw.githubusercontent.com/ayooshkathuria/YOLO_v3_tutorial_from_scratch/master/data/coco.names. This file contains the names of the objects in the COCO dataset. Create a folder named data in your detector directory. If you are on Linux, you can do this with the following commands:

mkdir data
cd data
wget https://raw.githubusercontent.com/ayooshkathuria/YOLO_v3_tutorial_from_scratch/master/data/coco.names

Then, we load the class file into our program.

num_classes = 80    #For COCO
classes = load_classes("data/coco.names")

load_classes is a function defined in util.py. It returns a list in which the index of each class maps to the string of its name.

def load_classes(namesfile):
    fp = open(namesfile, "r")
    names = fp.read().split("\n")[:-1]
    return names
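The following self-contained sketch shows the same idea on a tiny, made-up names file, so the index-to-name mapping can be checked without downloading anything. It is a variant rather than the tutorial's exact function: it uses a with block and skips blank lines, and the three class names written to the temporary file are placeholders.

```python
import os
import tempfile


def load_classes(namesfile):
    """Read one class name per line and return them as a list."""
    with open(namesfile, "r") as fp:
        return [line.strip() for line in fp if line.strip()]


# Hypothetical three-class names file, written to a temporary location.
with tempfile.NamedTemporaryFile("w", suffix=".names", delete=False) as tmp:
    tmp.write("person\nbicycle\ncar\n")
    path = tmp.name

classes = load_classes(path)
os.remove(path)
```

The class index produced by the detector can then be used directly as a list index, for example classes[1] for "bicycle" in this toy file.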

Initialize the network and load the weights.

#Set up the neural network
print("Loading network.....")
model = Darknet(args.cfgfile)
model.load_weights(args.weightsfile)
print("Network successfully loaded")

model.net_info["height"] = args.reso
inp_dim = int(model.net_info["height"])
assert inp_dim % 32 == 0
assert inp_dim > 32

#If there's a GPU available, put the model on GPU
if CUDA:
    model.cuda()

#Set the model in evaluation mode
model.eval()

Reading the input images

Read the image from disk, or the images from a directory. The paths of the images are stored in a list called imlist.

read_dir = time.time()
#Detection phase
try:
    imlist = [osp.join(osp.realpath('.'), images, img) for img in os.listdir(images)]
except NotADirectoryError:
    imlist = []
    imlist.append(osp.join(osp.realpath('.'), images))
except FileNotFoundError:
    print ("No file or directory with the name {}".format(images))
    exit()

read_dir is a checkpoint used to measure time. (We will encounter several of these.)

If the directory to save the detections to (defined by the det flag) does not exist, create it.

if not os.path.exists(args.det):
    os.makedirs(args.det)

We will use OpenCV to load the images.

load_batch = time.time()
loaded_ims = [cv2.imread(x) for x in imlist]

load_batch is yet another checkpoint.

OpenCV loads an image as a numpy array, with BGR as the order of the color channels. PyTorch's image input format is (batch x channels x height x width), with the channel order being RGB. We therefore write a function prep_image in util.py to transform the numpy array into PyTorch's input format.

def prep_image(img, inp_dim):
    """
    Prepare image for inputting to the neural network.

    Returns a Variable
    """

    img = cv2.resize(img, (inp_dim, inp_dim))
    img = img[:,:,::-1].transpose((2,0,1)).copy()
    img = torch.from_numpy(img).float().div(255.0).unsqueeze(0)
    return img
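The layout change inside prep_image (BGR to RGB, height x width x channels to channels x height x width, and scaling to the 0-1 range) can be hard to picture from the slicing alone. Here is a small plain-Python sketch of just that part on a nested-list "image"; it omits the resize and the batch dimension, and the function name hwc_bgr_to_chw_rgb is an assumption for this example.

```python
def hwc_bgr_to_chw_rgb(image):
    """image[y][x] is a (B, G, R) triple of 0-255 ints.

    Returns three planes in R, G, B order, each indexed [y][x], scaled to 0-1.
    """
    height, width = len(image), len(image[0])
    planes = []
    for channel in (2, 1, 0):  # positions of R, G, B inside a BGR pixel
        planes.append([[image[y][x][channel] / 255.0 for x in range(width)]
                       for y in range(height)])
    return planes


# A 1 x 2 image: a pure blue pixel followed by a pure green pixel (BGR order)
tiny = [[(255, 0, 0), (0, 255, 0)]]
chw = hwc_bgr_to_chw_rgb(tiny)
```

After the conversion, chw[0] is the red plane, chw[1] the green plane and chw[2] the blue plane, so the blue pixel shows up as 1.0 in the last plane.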

In addition to the transformed images, we also maintain a list of the original images, and im_dim_list, a list containing the dimensions of the original images.

#PyTorch Variables for images
im_batches = list(map(prep_image, loaded_ims, [inp_dim for x in range(len(imlist))]))

#List containing dimensions of original images
im_dim_list = [(x.shape[1], x.shape[0]) for x in loaded_ims]
im_dim_list = torch.FloatTensor(im_dim_list).repeat(1,2)

if CUDA:
    im_dim_list = im_dim_list.cuda()

Creating the batches

leftover = 0
if (len(im_dim_list) % batch_size):
    leftover = 1

if batch_size != 1:
    num_batches = len(imlist) // batch_size + leftover
    im_batches = [torch.cat((im_batches[i*batch_size : min((i + 1)*batch_size,
                        len(im_batches))])) for i in range(num_batches)]
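The batching arithmetic above can be sketched on a plain list of file names. The helper name make_batches and the file names are illustrative assumptions; the tutorial concatenates tensors with torch.cat where this sketch simply slices a list.

```python
def make_batches(items, batch_size):
    """Split items into consecutive chunks; the last chunk may be smaller."""
    leftover = 1 if len(items) % batch_size else 0
    num_batches = len(items) // batch_size + leftover
    # Python slices stop at the end of the list, so no explicit min() is needed.
    return [items[i * batch_size:(i + 1) * batch_size] for i in range(num_batches)]


imlist = ["a.png", "b.png", "c.png", "d.png", "e.png"]
batches = make_batches(imlist, batch_size=2)
```

With five images and a batch size of 2, this yields three batches, the last of which holds the single leftover image.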

The detection loop

We iterate over the batches, generate the predictions, and concatenate the prediction tensors (of shape D x 8, the output of the write_results function) of all the images we have to perform detection on.

For each batch, we measure the time taken for detection, that is, the time between taking the input and producing the output of the write_results function. In the output returned by write_results, one of the attributes is the index of the image in the batch. We transform that particular attribute so that it now represents the index of the image in imlist, the list containing the paths of all the images.

After that, we print the time taken for each detection as well as the objects detected in each image.

If the output of the write_results function for a batch is an int (0), meaning there is no detection, we use continue to skip the rest of the loop.

write = 0
start_det_loop = time.time()
for i, batch in enumerate(im_batches):
    #load the image
    start = time.time()
    if CUDA:
        batch = batch.cuda()

    prediction = model(Variable(batch, volatile = True), CUDA)

    prediction = write_results(prediction, confidence, num_classes, nms_conf = nms_thesh)

    end = time.time()

    if type(prediction) == int:

        for im_num, image in enumerate(imlist[i*batch_size: min((i + 1)*batch_size, len(imlist))]):
            im_id = i*batch_size + im_num
            print("{0:20s} predicted in {1:6.3f} seconds".format(image.split("/")[-1], (end - start)/batch_size))
            print("{0:20s} {1:s}".format("Objects Detected:", ""))
            print("----------------------------------------------------------")
        continue

    prediction[:,0] += i*batch_size    #transform the attribute from index in batch to index in imlist

    if not write:                      #If we haven't initialised output
        output = prediction
        write = 1
    else:
        output = torch.cat((output,prediction))

    for im_num, image in enumerate(imlist[i*batch_size: min((i + 1)*batch_size, len(imlist))]):
        im_id = i*batch_size + im_num
        objs = [classes[int(x[-1])] for x in output if int(x[0]) == im_id]
        print("{0:20s} predicted in {1:6.3f} seconds".format(image.split("/")[-1], (end - start)/batch_size))
        print("{0:20s} {1:s}".format("Objects Detected:", " ".join(objs)))
        print("----------------------------------------------------------")

    if CUDA:
        torch.cuda.synchronize()

The line torch.cuda.synchronize makes sure that the CUDA kernel is synchronized with the CPU. Otherwise, the CUDA kernel returns control to the CPU as soon as the GPU job is queued, well before the GPU job has completed (an asynchronous call). This can lead to misleading timings if end = time.time() is executed before the GPU job actually finishes.
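The index transformation performed in the loop (prediction[:,0] += i*batch_size) is just an offset by the number of images in all earlier batches. A minimal sketch with plain integers, using a hypothetical helper name, looks like this:

```python
def to_global_indices(batch_local_indices, batch_number, batch_size):
    """Shift per-batch image indices so they index into the full image list."""
    return [idx + batch_number * batch_size for idx in batch_local_indices]


# Third batch (i = 2) with batch_size 4: local images 0, 0 and 3
global_ids = to_global_indices([0, 0, 3], batch_number=2, batch_size=4)
```

Two detections in the first image of that batch and one in its last image end up pointing at entries 8, 8 and 11 of imlist.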

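Now, the detections for all the images are in the tensor output. Let us draw the bounding boxes on the images.

Drawing bounding boxes on images

We use a try-except block to check whether even a single detection has been made. If not, we exit the program.

try:
    output
except NameError:
    print ("No detections were made")
    exit()

At this point, the predictions in our output tensor conform to the input size of the network, not to the original sizes of the images. So before we can draw the bounding boxes, let us transform the corner attributes of each bounding box to the original dimensions of the images.

output_recast = time.time()
output[:,1:5] = torch.clamp(output[:,1:5], 0.0, float(inp_dim))

im_dim_list = torch.index_select(im_dim_list, 0, output[:,0].long())/inp_dim
output[:,1:5] *= im_dim_list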
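The rescaling amounts to clamping each coordinate to the network input size and then multiplying x coordinates by (original width / inp_dim) and y coordinates by (original height / inp_dim). The scalar sketch below mirrors that arithmetic for one box; the function name rescale_box and the example sizes are assumptions.

```python
def rescale_box(box, inp_dim, orig_w, orig_h):
    """Map (x1, y1, x2, y2) from network-input coordinates to original-image coordinates."""
    # Clamp to [0, inp_dim], as torch.clamp does in the tensor code.
    x1, y1, x2, y2 = (min(max(v, 0.0), float(inp_dim)) for v in box)
    sx = orig_w / inp_dim   # horizontal scale factor
    sy = orig_h / inp_dim   # vertical scale factor
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)


# A box predicted on a 416 x 416 input, for an 832 x 624 original image.
# The y2 value of 450 lies outside the input and is clamped to 416 first.
box = rescale_box((104.0, 208.0, 312.0, 450.0), inp_dim=416, orig_w=832, orig_h=624)
```

Here the horizontal scale is 2.0 and the vertical scale is 1.5, so the box becomes (208.0, 312.0, 624.0, 624.0).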

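If there are too many bounding boxes in an image, drawing them all in one color may not look good. Download this file into your detector folder: https://github.com/ayooshkathuria/YOLO_v3_tutorial_from_scratch/raw/master/pallete. It is a pickled file that contains many colors to choose from at random.

class_load = time.time()
colors = pkl.load(open("pallete", "rb"))

Now, let us write a function to draw the boxes.

draw = time.time()

def write(x, results):
    c1 = tuple(x[1:3].int())
    c2 = tuple(x[3:5].int())
    img = results[int(x[0])]
    cls = int(x[-1])
    color = random.choice(colors)   #pick a random color for this box
    label = "{0}".format(classes[cls])
    cv2.rectangle(img, c1, c2,color, 1)
    t_size = cv2.getTextSize(label, cv2.FONT_HERSHEY_PLAIN, 1 , 1)[0]
    c2 = c1[0] + t_size[0] + 3, c1[1] + t_size[1] + 4
    cv2.rectangle(img, c1, c2,color, -1)
    cv2.putText(img, label, (c1[0], c1[1] + t_size[1] + 4), cv2.FONT_HERSHEY_PLAIN, 1, [225,255,255], 1);
    return img

The function above draws a rectangle in a color chosen at random from colors. It also creates a filled rectangle at the top-left corner of the bounding box and writes into it the class of the object detected at that location. The -1 argument of the cv2.rectangle function is what creates a filled rectangle.

We define the write function locally so that it can access the colors list. We could also have passed the color in as an argument, but that would have allowed us to use only one color per image, which defeats the purpose.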

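Now that we have defined this function, let us draw the bounding boxes on the images.

list(map(lambda x: write(x, loaded_ims), output))

The snippet above modifies the images inside loaded_ims in place.

Each image is saved with "det_" prefixed to the image name. We create a list of paths to which we will save our detection images.

det_names = pd.Series(imlist).apply(lambda x: "{}/det_{}".format(args.det,x.split("/")[-1]))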
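The naming scheme can be tried out without pandas. The sketch below builds one output path using os.path.basename instead of splitting on "/"; the function name detection_path and the input path are hypothetical.

```python
import os.path as osp


def detection_path(det_dir, image_path):
    """Build '<det_dir>/det_<image file name>' for a given input image path."""
    return "{}/det_{}".format(det_dir, osp.basename(image_path))


name = detection_path("det", "/home/user/imgs/dog-cycle-car.png")
```

For this made-up input path the result is det/det_dog-cycle-car.png, matching the file name that appears in the test run later in the article.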

Finally, write the images with detections to the paths in det_names.

list(map(cv2.imwrite, det_names, loaded_ims))
end = time.time()

Printing the time summary

At the end of the detector run, we print a summary showing which part of the code took how long to execute. This is useful when we need to compare how different hyperparameters affect the speed of the detector. Hyperparameters such as the batch size, the objectness confidence and the NMS threshold (passed with the bs, confidence and nms_thresh flags respectively) can be set when executing the detect.py script on the command line.

print("SUMMARY")
print("----------------------------------------------------------")
print("{:25s}: {}".format("Task", "Time Taken (in seconds)"))
print()
print("{:25s}: {:2.3f}".format("Reading addresses", load_batch - read_dir))
print("{:25s}: {:2.3f}".format("Loading batch", start_det_loop - load_batch))
print("{:25s}: {:2.3f}".format("Detection (" + str(len(imlist)) + " images)", output_recast - start_det_loop))
print("{:25s}: {:2.3f}".format("Output Processing", class_load - output_recast))
print("{:25s}: {:2.3f}".format("Drawing Boxes", end - draw))
print("{:25s}: {:2.3f}".format("Average time_per_img", (end - load_batch)/len(imlist)))
print("----------------------------------------------------------")

torch.cuda.empty_cache()

Testing the object detector

For example, run the following in a terminal:

python detect.py --images dog-cycle-car.png --det det

This produces the output below.

Note: the results below were obtained by running the code on a CPU. Detection is expected to be much faster on a GPU. On a Tesla K80 it takes about 0.1 seconds per image.

Loading network.....
Network successfully loaded
dog-cycle-car.png predicted in 2.456 seconds
Objects Detected: bicycle truck dog
----------------------------------------------------------
SUMMARY
----------------------------------------------------------
Task : Time Taken (in seconds)

Reading addresses : 0.002
Loading batch : 0.120
Detection (1 images) : 2.457
Output Processing : 0.002
Drawing Boxes : 0.076
Average time_per_img : 2.657
----------------------------------------------------------

An image named det_dog-cycle-car.png is saved in the det directory.

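Running the detector on video or a webcam

To run the detector on a video or a webcam, the code stays almost the same, except that instead of iterating over batches we iterate over the frames of the video.

The code for running the detector on video can be found in the file video.py in the GitHub repository. It is very similar to the code in detect.py, with only a few differences.

First, we open the video or camera feed with OpenCV.

videofile = "video.avi" #or path to the video file.

cap = cv2.VideoCapture(videofile)

#cap = cv2.VideoCapture(0)  for webcam

assert cap.isOpened(), 'Cannot capture source'

frames = 0

Then, we iterate over the frames in much the same way as we iterated over images.

Because we no longer have to deal with batches and process only one frame at a time, the code is simplified in many places. This includes using a tuple in place of the tensor for im_dim_list, and a small change to the write function.

On every iteration, we keep track of the number of frames captured in a variable called frames. We then divide this number by the time elapsed since the first frame to get the frame rate of the video.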
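The frame rate bookkeeping described above can be isolated into a small helper. This is only a sketch: the class name FPSCounter and the injectable clock are assumptions introduced so the example can be run deterministically, whereas the tutorial's code simply calls time.time() inline.

```python
import time


class FPSCounter:
    """Tracks frames processed and reports frames per second since creation."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._start = clock()
        self.frames = 0

    def tick(self):
        """Count one frame and return the average FPS so far."""
        self.frames += 1
        elapsed = self._clock() - self._start
        return self.frames / elapsed if elapsed > 0 else float("inf")


# Deterministic demo with a fake clock that advances 0.5 s per call.
ticks = iter([0.0, 0.5, 1.0, 1.5])
counter = FPSCounter(clock=lambda: next(ticks))
fps_values = [counter.tick() for _ in range(3)]
```

With one frame every 0.5 seconds, each call reports an average of 2.0 frames per second. In a real loop the default clock would be used and tick() called once per processed frame.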

Instead of writing the detection images to disk with cv2.imwrite, we use cv2.imshow to display the frame with the bounding boxes drawn on it. If the user presses the Q key, the code breaks out of the loop and the video ends.

frames = 0
start = time.time()

while cap.isOpened():
    ret, frame = cap.read()

    if ret:
        img = prep_image(frame, inp_dim)
#        cv2.imshow("a", frame)
        im_dim = frame.shape[1], frame.shape[0]
        im_dim = torch.FloatTensor(im_dim).repeat(1,2)

        if CUDA:
            im_dim = im_dim.cuda()
            img = img.cuda()

        output = model(Variable(img, volatile = True), CUDA)
        output = write_results(output, confidence, num_classes, nms_conf = nms_thesh)

        if type(output) == int:
            frames += 1
            print("FPS of the video is {:5.4f}".format( frames / (time.time() - start)))
            cv2.imshow("frame", frame)
            key = cv2.waitKey(1)
            if key & 0xFF == ord('q'):
                break
            continue
        output[:,1:5] = torch.clamp(output[:,1:5], 0.0, float(inp_dim))

        im_dim = im_dim.repeat(output.size(0), 1)/inp_dim
        output[:,1:5] *= im_dim

        classes = load_classes('data/coco.names')
        colors = pkl.load(open("pallete", "rb"))

        list(map(lambda x: write(x, frame), output))

        cv2.imshow("frame", frame)
        key = cv2.waitKey(1)
        if key & 0xFF == ord('q'):
            break
        frames += 1
        print(time.time() - start)
        print("FPS of the video is {:5.2f}".format( frames / (time.time() - start)))
    else:
        break

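Conclusion

In this tutorial series, we implemented an object detector from scratch. I also believe that being able to write efficient code is one of the most underrated skills a deep learning practitioner can have. However revolutionary your idea may be, it is of no use unless you can test it, and for that you need strong coding skills.

I have also learned that the best way to learn about a topic in deep learning is to implement the code. It forces you to look at the subtle but fundamental details of a topic that you might miss if you only read the paper. I hope this tutorial series helps you sharpen your practical deep learning skills.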

Further Reading

PyTorch tutorial: http://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html

Andrew Ng explains IoU: https://youtu.be/DNEm4fJ-rto

Andrew Ng explains non-maximum suppression: https://youtu.be/A46HZGR5fMw

OpenCV basics: https://pythonprogramming.net/loading-images-python-opencv-tutorial/

Python ArgParse: https://docs.python.org/3/library/argparse.html

Original article: https://blog.paperspace.com/how-to-implement-a-yolo-v3-object-detector-from-scratch-in-pytorch-part-5/