OpenCV video-reading performance test: Python vs. C++ API
[Important update]
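On December 23, 2017, opencv-python 3.3.1.11 was released. For the first time, it brought video reading to the Linux platform. For details, see Gemfield's column article:
opencv-python 3.3.1.11: the era of reading video with cv2 on Linux has arrived
The OpenCV library in the opencv_python package installed via pip is statically compiled. Compared with the dynamically compiled OpenCV library used in this article, it reads video several times faster!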
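Whether a given cv2 build can decode video depends on the video I/O backends (for example FFmpeg) it was compiled with. The following minimal sketch shows one way to check. It assumes the indented section layout that `cv2.getBuildInformation()` prints in OpenCV 3.x/4.x, which may differ between builds. `build_info_section` is a hypothetical helper, not an OpenCV API.

```python
# Sketch: extract one section (by default "Video I/O:") from the text that
# cv2.getBuildInformation() returns. The layout (a header line followed by
# more deeply indented entries) is an assumption and may vary by build.

def build_info_section(info: str, header: str = "Video I/O:") -> list:
    """Return the stripped entry lines under `header`, stopping at the next
    line that is indented no deeper than the header itself."""
    result = []
    header_indent = None
    for line in info.splitlines():
        if header_indent is None:
            if line.strip() == header:
                header_indent = len(line) - len(line.lstrip())
            continue
        if not line.strip():
            continue  # skip blank separator lines
        indent = len(line) - len(line.lstrip())
        if indent <= header_indent:
            break  # next section at the same level starts here
        result.append(line.strip())
    return result


# Usage with opencv-python installed:
#   import cv2
#   print(cv2.__version__)
#   print("\n".join(build_info_section(cv2.getBuildInformation())))
```

An entry such as `FFMPEG: YES` suggests that the build can open MP4/H.264 files. Note that the report does not say whether the library was linked statically or dynamically.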
Background
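When using the Caffe library to analyze features in a video, the usual approach is to read the video frame by frame (or take one frame out of every n). Each frame is then fed to a Caffe Net for classification or object detection. Our project is mostly written in Python, and in Python, frames are usually extracted with either ffmpeg or OpenCV. In this project, Gemfield chose OpenCV's Python module: import cv2.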
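The "one frame out of every n" pattern does not need any seeking. `cv2.VideoCapture.grab()` advances to the next frame, and `retrieve()` returns the grabbed frame as an image. The following minimal sketch assumes only those two methods. `sample_every_n` is a hypothetical helper. As far as I know, with the FFmpeg backend `grab()` still decodes every frame, because later frames depend on it. Skipping `retrieve()` therefore saves the colour conversion and copy, not the decoding.

```python
# Sketch: keep one frame out of every n while reading the stream in order.
# `cap` is anything with OpenCV-style grab()/retrieve() methods, e.g. a
# cv2.VideoCapture. sample_every_n is a hypothetical helper, not an OpenCV API.

def sample_every_n(cap, n):
    """Yield (frame_index, frame) for frames 0, n, 2n, ... in stream order."""
    if n < 1:
        raise ValueError("n must be >= 1")
    index = 0
    while True:
        if not cap.grab():               # advance one frame; False at end of stream
            break
        if index % n == 0:
            ok, frame = cap.retrieve()   # convert/copy only the frames we keep
            if ok:
                yield index, frame
        index += 1


# Usage:
#   import cv2
#   for idx, frame in sample_every_n(cv2.VideoCapture(path), 5):
#       ...  # feed frame to the network
```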
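Note: the tests below were run on an X99 hardware platform. OpenCV was compiled as described in the article "Encoding issues when reading video with OpenCV".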
Problem
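In this project, the performance of frame reading became a serious concern for Gemfield. We were reading frames from a video (MP4 container, H.264 codec), and each frame took about 0.139 s on average (Intel Xeon CPU E5-2620 v3 @ 2.40GHz, 16 GB RAM). By comparison, the Caffe model's Net needs only about 0.1 s to analyze one frame...
The Python code that reads the frames (gemfield_seek.py) is as follows: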
```python
#!/usr/bin/env python
import sys
import time

import cv2

time_start_dict = {}
time_cost_dict = {}
time_times = 0


def logTime(op, loc='B'):
    """loc='B' starts the timer for op; loc='A' stops it and prints the cost."""
    global time_times
    if loc == 'B':
        if op == 'Whole':
            time_times += 1  # one 'Whole' per loop iteration
        if op not in time_cost_dict:
            time_cost_dict[op] = 0
        time_start_dict[op] = time.time()
    elif loc == 'A':
        cost = time.time() - time_start_dict[op]
        time_cost_dict[op] += cost
        average = time_cost_dict[op] / time_times
        print('{}: {} - {}'.format(op, cost, average))
        print('-----------------------')
    else:
        raise Exception('Unsupported loc: {}'.format(loc))


def cvPt(local_filename):
    video_cap = cv2.VideoCapture(str(local_filename))
    fps = int(video_cap.get(cv2.CAP_PROP_FPS))
    frame_cnt = int(video_cap.get(cv2.CAP_PROP_FRAME_COUNT))
    sample_frames = range(0, frame_cnt)
    ERROR_FRAMES = 10  # give up after this many failed reads
    for frame_index in sample_frames:
        logTime('Whole')
        # seek: move the read position to frame_index
        logTime('setFrame')
        video_cap.set(cv2.CAP_PROP_POS_FRAMES, frame_index)
        logTime('setFrame', 'A')
        # read: decode the frame at the current position
        logTime('readFrame')
        status, frame = video_cap.read()
        logTime('readFrame', 'A')
        if not status:
            ERROR_FRAMES -= 1
            print('[CVWARNING] fetch frame {} failed, total frames is {}'.format(frame_index, frame_cnt))
            if ERROR_FRAMES == 0:
                raise Exception('fetch frame {} failed, total frames is {}'.format(frame_index, frame_cnt))
            continue
    video_cap.release()


if __name__ == '__main__':
    abs_video_f = '/home/gemfield/test/test222.mp4'
    if len(sys.argv) == 2:
        abs_video_f = sys.argv[1]
    cvPt(abs_video_f)
```
The output is as follows:
```
gemfield@scene:/home/gemfield/test/opencv# python3 gemfield_seek.py ../test/test222.mp4
......(first frames)
setFrame: 0.051221370697021484 - 0.04456290602684021
-----------------------
readFrame: 0.008270025253295898 - 0.012707263231277466
-----------------------
setFrame: 0.0626213550567627 - 0.04562516773448271
-----------------------
readFrame: 0.01091456413269043 - 0.012601810343125287
......(after reading for a while)
setFrame: 0.0783698558807373 - 0.13035893788972755
-----------------------
readFrame: 0.009439706802368164 - 0.009669762560732095
-----------------------
setFrame: 0.08253788948059082 - 0.13032215246787437
-----------------------
readFrame: 0.00872659683227539 - 0.009669037048633282
```
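Two things stand out. First, seeking to a frame with OpenCV costs about 0.13 s on average, while reading that frame costs only about 0.01 s. Second, seeking is relatively fast at first, averaging about 0.045 s. After roughly 100 frames it slows down, and the average settles at about 0.13 s. So, surprisingly, most of the cost goes into seeking rather than reading.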
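The `logTime()` helper above keys its start times by operation name. It divides every operation's total by the number of `'Whole'` iterations. A more compact alternative is a context manager built on `time.perf_counter()`, which is monotonic and has higher resolution than `time.time()`. This is a sketch, not the author's code, and `Timer` and `measure` are hypothetical names.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class Timer:
    """Accumulates elapsed time per named operation (hypothetical helper)."""

    def __init__(self):
        self.total = defaultdict(float)  # seconds spent in each operation
        self.count = defaultdict(int)    # number of measurements per operation

    @contextmanager
    def measure(self, op):
        start = time.perf_counter()      # monotonic, high-resolution clock
        try:
            yield
        finally:
            self.total[op] += time.perf_counter() - start
            self.count[op] += 1

    def average(self, op):
        return self.total[op] / self.count[op] if self.count[op] else 0.0


# Usage inside the seek loop (needs cv2 and an open video_cap):
#   timer = Timer()
#   with timer.measure("setFrame"):
#       video_cap.set(cv2.CAP_PROP_POS_FRAMES, frame_index)
#   with timer.measure("readFrame"):
#       status, frame = video_cap.read()
#   print(timer.average("setFrame"), timer.average("readFrame"))
```

Each operation keeps its own count, so its average does not depend on a separate `'Whole'` counter being incremented in the right place.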
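Hypotheses
This left two candidate explanations. The first is the difference between seeking then reading and reading the stream in order. The second is the difference between the Python API and the C++ API. Is the Python API inherently slower, and would switching to the C++ API help?
So let's compare all four combinations.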
1. Python API: Seek + Read
See the Problem section above.
2. Python API: sequential read
The code (gemfield_read.py):
```python
#!/usr/bin/env python
import sys
import time

import cv2

time_start_dict = {}
time_cost_dict = {}
time_times = 0


def logTime(op, loc='B'):
    """loc='B' starts the timer for op; loc='A' stops it and prints the cost."""
    global time_times
    if loc == 'B':
        if op == 'Whole':
            time_times += 1  # one 'Whole' per loop iteration
        if op not in time_cost_dict:
            time_cost_dict[op] = 0
        time_start_dict[op] = time.time()
    elif loc == 'A':
        cost = time.time() - time_start_dict[op]
        time_cost_dict[op] += cost
        average = time_cost_dict[op] / time_times
        print('{} | {}: {} - {}'.format(time_times, op, cost, average))
    else:
        raise Exception('Unsupported loc: {}'.format(loc))


def cvPt(local_filename):
    video_cap = cv2.VideoCapture(str(local_filename))
    fps = int(video_cap.get(cv2.CAP_PROP_FPS))
    frame_cnt = int(video_cap.get(cv2.CAP_PROP_FRAME_COUNT))
    success = True
    # read frames in stream order, without seeking
    while success:
        logTime('Whole')
        logTime('readFrame')
        success, frame = video_cap.read()
        logTime('readFrame', 'A')
    video_cap.release()


if __name__ == '__main__':
    abs_video_f = '/home/gemfield/test/test222.mp4'
    if len(sys.argv) == 2:
        abs_video_f = sys.argv[1]
    cvPt(abs_video_f)
```
The program prints:
```
gemfield@scene:/home/gemfield/test# python3 gemfield_read.py ../test/test222.mp4
1 | readFrame: 0.053015947341918945 - 0.053015947341918945
2 | readFrame: 0.008069515228271484 - 0.030542731285095215
3 | readFrame: 0.0064580440521240234 - 0.022514502207438152
4 | readFrame: 0.006206035614013672 - 0.01843738555908203
5 | readFrame: 0.006482124328613281 - 0.01604633331298828
6 | readFrame: 0.009226799011230469 - 0.014909744262695312
7 | readFrame: 0.005541324615478516 - 0.013571398598807198
8 | readFrame: 0.005561351776123047 - 0.01257014274597168
9 | readFrame: 0.005191326141357422 - 0.011750274234347872
10 | readFrame: 0.005190134048461914 - 0.011094260215759277
......
2748 | readFrame: 0.0060291290283203125 - 0.006067558146980652
2749 | readFrame: 0.006089687347412109 - 0.006067566196889867
2750 | readFrame: 0.005376338958740234 - 0.0060673148415305396
2751 | readFrame: 0.0051801204681396484 - 0.0060669923426670755
2752 | readFrame: 0.0056912899017333984 - 0.006066855822884759
2753 | readFrame: 0.009527206420898438 - 0.006068112760987924
2754 | readFrame: 0.007679939270019531 - 0.006068698028420398
2755 | readFrame: 0.005699634552001953 - 0.006068564067085945
2756 | readFrame: 0.0061647891998291016 - 0.006068598981865605
2757 | readFrame: 0.0057332515716552734 - 0.006068477346968902
2758 | readFrame: 0.005670785903930664 - 0.006068333151376792
2759 | readFrame: 0.0054531097412109375 - 0.006068110163551433
......
3710 | readFrame: 0.007841825485229492 - 0.005905142434523755
3711 | readFrame: 0.007093906402587891 - 0.005905462769734767
3712 | readFrame: 0.0071887969970703125 - 0.00590580849555032
3713 | readFrame: 0.006438016891479492 - 0.005905951832042625
3714 | readFrame: 0.005589008331298828 - 0.005905866494535694
3715 | readFrame: 0.005753993988037109 - 0.005905825613645653
3716 | readFrame: 0.005844593048095703 - 0.005905809135560199
3717 | readFrame: 0.005541324615478516 - 0.005905711076770831
3718 | readFrame: 0.005610942840576172 - 0.005905631795373253
3719 | readFrame: 0.0050907135009765625 - 0.005905412672411597
3720 | readFrame: 0.005651712417602539 - 0.0059053444734183695
```
When the file is read as a stream, each frame takes about 0.006 s to read.
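The sequential loop in gemfield_read.py can also be written as a generator that yields frames in stream order and always releases the capture. The following minimal sketch assumes only the `read()`/`release()` methods that `cv2.VideoCapture` provides. `iter_frames` is a hypothetical name.

```python
# Sketch: sequential frame reading as a generator. `cap` is anything with
# OpenCV-style read()/release() methods (e.g. cv2.VideoCapture);
# iter_frames is a hypothetical helper name.

def iter_frames(cap):
    """Yield (frame_index, frame) in stream order until read() fails."""
    index = 0
    try:
        while True:
            ok, frame = cap.read()   # decode the next frame, no seeking
            if not ok:
                break
            yield index, frame
            index += 1
    finally:
        cap.release()                # runs even if the caller stops early


# Usage:
#   import cv2
#   for idx, frame in iter_frames(cv2.VideoCapture(path)):
#       ...
```

Unlike the `while success:` loop above, this version never hands the caller the result of the final failed `read()`.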
3. C++ API: Seek + Read
The code (gemfield_seek.cpp):
```cpp
#include <opencv2/core.hpp>
#include <opencv2/videoio.hpp>
#include <chrono>
#include <iostream>
#include <string>

using namespace cv;
using namespace std;

int main(int argc, char** argv)
{
    string filename = "/home/gemfield/test/test222.mp4";
    if (argc == 2) {
        filename = argv[1];
    }
    VideoCapture capture(filename);
    Mat frame;
    if (!capture.isOpened()) {
        cerr << "Error when reading mp4" << endl;
        return 1;
    }
    std::chrono::time_point<std::chrono::system_clock> start, middle, end;
    int frame_index = 0;
    double total_time = 0;
    for (;;) {
        frame_index++;  // incremented before the seek, so the first frame sought is 1
        start = std::chrono::system_clock::now();
        /* set pointer to frame_index (C++ enum instead of the legacy CV_CAP_PROP_POS_FRAMES) */
        capture.set(CAP_PROP_POS_FRAMES, frame_index);
        middle = std::chrono::system_clock::now();
        /* capture the frame and do something with it */
        capture >> frame;
        if (frame.empty()) {
            std::cout << "Finished." << std::endl;
            break;
        }
        end = std::chrono::system_clock::now();
        std::chrono::duration<double> seconds_total = end - start;
        std::chrono::duration<double> seconds_seek = middle - start;
        std::chrono::duration<double> seconds_read = end - middle;
        total_time += seconds_total.count();
        std::cout << "Index: " << frame_index
                  << " | seek: " << seconds_seek.count() << "s"
                  << " | read: " << seconds_read.count() << "s"
                  << " | total: " << seconds_total.count() << "s"
                  << " | aver: " << total_time / frame_index << std::endl;
    }
    return 0;
}
```
Compile:
g++ -std=c++11 gemfield_seek.cpp -lopencv_core -lopencv_videoio -o gemfield_seek
Run:
```
gemfield@scene:/home/gemfield/test# ./gemfield_seek ../test/test222.mp4
Index: 1 | seek: 0.0995976s | read: 0.019335s | total: 0.118933s | aver: 0.118933
Index: 2 | seek: 0.0330427s | read: 0.00800924s | total: 0.0410519s | aver: 0.0799923
Index: 3 | seek: 0.0464318s | read: 0.00794659s | total: 0.0543784s | aver: 0.0714543
Index: 4 | seek: 0.0431533s | read: 0.0111978s | total: 0.0543512s | aver: 0.0671785
Index: 5 | seek: 0.0481438s | read: 0.00774463s | total: 0.0558885s | aver: 0.0649205
Index: 6 | seek: 0.0494663s | read: 0.0102679s | total: 0.0597342s | aver: 0.0640561
Index: 7 | seek: 0.0495316s | read: 0.00924286s | total: 0.0587744s | aver: 0.0633016
Index: 8 | seek: 0.0511648s | read: 0.00957251s | total: 0.0607373s | aver: 0.0629811
Index: 9 | seek: 0.0507133s | read: 0.00756003s | total: 0.0582734s | aver: 0.062458
Index: 10 | seek: 0.0383379s | read: 0.00898075s | total: 0.0473187s | aver: 0.0609441
Index: 11 | seek: 0.0623336s | read: 0.0105759s | total: 0.0729096s | aver: 0.0620318
......
Index: 650 | seek: 0.153053s | read: 0.00679515s | total: 0.159848s | aver: 0.138271
Index: 651 | seek: 0.13733s | read: 0.0124941s | total: 0.149824s | aver: 0.138289
Index: 652 | seek: 0.139874s | read: 0.00714733s | total: 0.147022s | aver: 0.138302
Index: 653 | seek: 0.155352s | read: 0.0135669s | total: 0.168919s | aver: 0.138349
Index: 654 | seek: 0.151458s | read: 0.00796778s | total: 0.159426s | aver: 0.138381
Index: 655 | seek: 0.13896s | read: 0.0128386s | total: 0.151799s | aver: 0.138402
Index: 656 | seek: 0.142042s | read: 0.00912114s | total: 0.151163s | aver: 0.138421
Index: 657 | seek: 0.151733s | read: 0.00909946s | total: 0.160833s | aver: 0.138455
Index: 658 | seek: 0.152458s | read: 0.00739748s | total: 0.159856s | aver: 0.138488
Index: 659 | seek: 0.152881s | read: 0.0110096s | total: 0.163891s | aver: 0.138526
Index: 660 | seek: 0.151401s | read: 0.00618463s | total: 0.157586s | aver: 0.13855
......
Index: 1260 | seek: 0.135438s | read: 0.00870509s | total: 0.144144s | aver: 0.139064
Index: 1261 | seek: 0.150076s | read: 0.0107371s | total: 0.160813s | aver: 0.139081
Index: 1262 | seek: 0.136285s | read: 0.00718818s | total: 0.143473s | aver: 0.139084
Index: 1263 | seek: 0.168572s | read: 0.00631354s | total: 0.174885s | aver: 0.139113
Index: 1264 | seek: 0.165383s | read: 0.00813947s | total: 0.173522s | aver: 0.13914
Index: 1265 | seek: 0.156136s | read: 0.0109173s | total: 0.167054s | aver: 0.139162
Index: 1266 | seek: 0.175051s | read: 0.00858275s | total: 0.183634s | aver: 0.139197
Index: 1267 | seek: 0.163603s | read: 0.00805248s | total: 0.171655s | aver: 0.139223
```
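Seeking takes just as long here as it does in Python: the total per frame averages about 0.139 s, almost all of it spent in the seek.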
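The expensive step is `set()` itself rather than the language binding. So when frames are needed at scattered indices, one practical workaround is to seek only for large jumps and otherwise step forward with `grab()`. The following minimal sketch works under these assumptions:

* `cap` only needs `grab()`/`retrieve()`/`set()` as in `cv2.VideoCapture`.
* `pos_prop` would be `cv2.CAP_PROP_POS_FRAMES`.
* `read_at` and the `max_gap` threshold of 250 frames are hypothetical; tune `max_gap` to roughly the video's keyframe interval.

```python
# Sketch: read frames at arbitrary indices, calling set() only for large
# forward jumps and stepping forward with grab() otherwise. `cap` has
# OpenCV-style grab()/retrieve()/set(); `pos_prop` would be
# cv2.CAP_PROP_POS_FRAMES. read_at and max_gap are hypothetical.

def read_at(cap, indices, pos_prop, max_gap=250):
    """Yield (index, frame) for each wanted index, in ascending order."""
    next_pos = 0                        # index of the frame grab() returns next
    for target in sorted(set(indices)):
        if target - next_pos > max_gap:
            cap.set(pos_prop, target)   # expensive: seek + decode from a keyframe
            next_pos = target
        while next_pos < target:        # cheap: decode forward, skipping frames
            if not cap.grab():
                return
            next_pos += 1
        if not cap.grab():              # grab the wanted frame itself
            return
        next_pos += 1
        ok, frame = cap.retrieve()
        if ok:
            yield target, frame
```

With consecutive indices, as in gemfield_seek.py, this never calls `set()` at all and reduces to sequential reading.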
4. C++ API: sequential read
The code (gemfield_read.cpp):
```cpp
#include <opencv2/core.hpp>
#include <opencv2/videoio.hpp>
#include <chrono>
#include <iostream>
#include <string>

using namespace cv;
using namespace std;

int main(int argc, char** argv)
{
    string filename = "/home/gemfield/test/test222.mp4";
    if (argc == 2) {
        filename = argv[1];
    }
    VideoCapture capture(filename);
    Mat frame;
    std::chrono::time_point<std::chrono::system_clock> start, end;
    if (!capture.isOpened()) {
        cerr << "Error when reading mp4" << endl;
        return 1;
    }
    int frame_index = 0;
    double total_time = 0;
    for (;;) {
        start = std::chrono::system_clock::now();
        frame_index++;
        capture >> frame;  // decode the next frame in stream order, no seek
        if (frame.empty())
            break;
        end = std::chrono::system_clock::now();
        std::chrono::duration<double> seconds_total = end - start;
        total_time += seconds_total.count();
        std::cout << "Index: " << frame_index
                  << " | readFrame: " << seconds_total.count() << "s"
                  << " | average: " << total_time / frame_index << std::endl;
    }
    // VideoCapture and Mat release their resources in their destructors
    return 0;
}
```
Compile:
g++ -std=c++11 gemfield_read.cpp -lopencv_core -lopencv_videoio -o gemfield_read
Run:
```
gemfield@scene:/home/gemfield/test# ./gemfield_read ../test/test222.mp4
Index: 1 | readFrame: 0.0663819s | average: 0.066382
Index: 2 | readFrame: 0.00535328s | average: 0.0358676
Index: 3 | readFrame: 0.00565319s | average: 0.0257961
Index: 4 | readFrame: 0.00526596s | average: 0.0206636
Index: 5 | readFrame: 0.00532343s | average: 0.0175956
Index: 6 | readFrame: 0.0084277s | average: 0.0160676
Index: 7 | readFrame: 0.00700944s | average: 0.0147736
Index: 8 | readFrame: 0.0078517s | average: 0.0139083
Index: 9 | readFrame: 0.00649891s | average: 0.0130851
......
Index: 9130 | readFrame: 0.00512566s | average: 0.00578302
Index: 9131 | readFrame: 0.00461669s | average: 0.00578289
Index: 9132 | readFrame: 0.00452525s | average: 0.00578276
Index: 9133 | readFrame: 0.00463961s | average: 0.00578263
Index: 9134 | readFrame: 0.00446446s | average: 0.00578249
Index: 9135 | readFrame: 0.00693344s | average: 0.00578261
Index: 9136 | readFrame: 0.00604796s | average: 0.00578264
Index: 9137 | readFrame: 0.00522689s | average: 0.00578258
Index: 9138 | readFrame: 0.00446979s | average: 0.00578244
Index: 9139 | readFrame: 0.00439828s | average: 0.00578229
```
Reading frames here is slightly faster than in Python (about 0.0058 s vs. 0.0059 s per frame), but only marginally.
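Putting the measured averages side by side makes the gap concrete. The following figures are the last running averages printed in the four runs above. The budget is the roughly 0.1 s per frame quoted for the Caffe Net. `summarize` is a hypothetical helper name.

```python
# Last running averages printed by the four runs above (seconds per frame).
PER_FRAME = {
    "python_seek_read": 0.13032215246787437 + 0.009669037048633282,  # setFrame + readFrame
    "python_sequential": 0.0059053444734183695,
    "cpp_seek_read": 0.139223,
    "cpp_sequential": 0.00578229,
}
CAFFE_PER_FRAME = 0.1  # approximate Net time per frame quoted in the Problem section


def summarize(per_frame, budget):
    """Map each case to (frames per second, cost as a fraction of `budget`)."""
    return {name: (1.0 / t, t / budget) for name, t in per_frame.items()}


for name, (fps, share) in summarize(PER_FRAME, CAFFE_PER_FRAME).items():
    print("{:18s} {:7.1f} fps  {:6.1%} of the Caffe time".format(name, fps, share))
```

With these inputs:

* Seek+read costs about 1.4 times the Caffe time per frame in both languages.
* Sequential reading costs about 6% of the Caffe time.
* C++ seek+read is about 24 times slower than C++ sequential reading.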
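Summary
Performance-wise, then, the choice between OpenCV's Python API and C++ API is not the problem. The real question is why seeking is so much slower than reading in order: more than an order of magnitude, about 0.139 s vs. 0.006 s per frame. The reason is that modern video codecs reach higher compression ratios by making a frame depend on the frames before and after it. To produce an arbitrary frame, the decoder has to start from the nearest preceding keyframe (I-frame) and decode every frame up to the target. So seek+read is not like adding an offset to an array's base address; it involves a large amount of computation. This is worth keeping clearly in mind.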
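The effect of inter-frame dependencies can be illustrated with a toy model. This is an assumption, not OpenCV's actual code. The model places a keyframe every `gop` frames. Serving an arbitrary frame k costs decoding every frame from the preceding keyframe up to k. B-frame reordering, decoder caching, and container overhead are ignored. As far as I know, OpenCV's FFmpeg backend implements `set(CAP_PROP_POS_FRAMES)` by seeking back to an earlier keyframe and decoding forward, which this model roughly approximates.

```python
# Toy model (assumption): a keyframe every `gop` frames; decoding frame k
# requires decoding every frame from the preceding keyframe up to k.
# B-frame reordering, caching and container overhead are ignored.

def seek_decode_cost(k, gop):
    """Frames decoded to seek to frame k and decode it from scratch."""
    if gop < 1:
        raise ValueError("gop must be >= 1")
    return k % gop + 1


def total_decodes(num_frames, gop, seek_every_frame):
    """Total frames decoded to visit frames 0..num_frames-1 in order."""
    if not seek_every_frame:
        return num_frames  # sequential read: each frame is decoded exactly once
    return sum(seek_decode_cost(k, gop) for k in range(num_frames))
```

For consecutive frames with a GOP length of G, seeking to every frame decodes about (G + 1) / 2 frames per visited frame instead of 1 (for example 6.5 when G = 12). The model only shows the direction of the effect. The measured gap (about 0.139 s vs. 0.006 s per frame) also depends on the actual GOP structure of test222.mp4 and on the seek overhead in the backend.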