Multithreading Efficiency Test

05-04

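Multithreading affects running time differently depending on the kind of task, so it helps to first distinguish IO-bound tasks from CPU-bound (compute-intensive) tasks.

IO stands for input/output. IO-bound tasks include file reads and writes (disk IO) and web requests (network IO). They involve little computation and spend much of their time waiting, so multithreading can improve their efficiency considerably.
CPU-bound tasks are those in which CPU computation takes up most of the running time. Python multithreading cannot speed up such tasks, because CPython's global interpreter lock (GIL) lets only one thread execute Python bytecode at a time.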
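The difference is easiest to see with a stand-in for an IO wait. The sketch below is only an illustration and not part of the original tests: `fake_io`, `run_sequential`, and `run_threaded` are hypothetical names, and `time.sleep` is assumed to be a fair substitute for a blocking call, since it releases the GIL while waiting.

```python
import time
from threading import Thread


def fake_io(delay=0.2):
    # Hypothetical stand-in for a blocking IO call; time.sleep releases the GIL
    # while it waits, so other threads can run in the meantime.
    time.sleep(delay)


def run_sequential(n=5):
    # Run the n waits one after another and return the elapsed seconds.
    start = time.perf_counter()
    for _ in range(n):
        fake_io()
    return time.perf_counter() - start


def run_threaded(n=5):
    # Run the n waits in n threads so that they overlap, and return the elapsed seconds.
    start = time.perf_counter()
    threads = [Thread(target=fake_io) for _ in range(n)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("sequential: %.2f s" % run_sequential())
    print("threaded:   %.2f s" % run_threaded())
```

Because the waits overlap, the threaded run should take roughly the time of a single wait, while the sequential run takes about n times as long.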

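The tests below cover three cases (a CPU-bound task, file I/O, and network requests) to measure how much multithreading improves running time in each.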

First, import the modules and define the basic functions.

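import time
from threading import Thread

import numpy as np
import requests
from bs4 import BeautifulSoup


# CPU-bound task: sum the integers from 0 to 4,999,999
def cal(a=None):  # a is unused; it only keeps the signature consistent with gettitle
    s = 0
    for i in range(5000000):
        s = s + i


# File I/O task: write to a file 500,000 times
def file(a=None):  # a is unused; it only keeps the signature consistent with gettitle
    with open("try.txt", "w") as f:
        for i in range(500000):
            f.write("abcd ")


# Network task: fetch page a of the Douban Top 250 list (25 movies per page);
# the runners below call it with a = 0..9, so 10 pages are fetched in total
def gettitle(a):
    url = "https://movie.douban.com/top250?start={}&filter=".format(a * 25)
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")
    lis = soup.find("ol", class_="grid_view").find_all("li")
    for li in lis:
        title = li.find("span", class_="title").text


# Call one of the three functions above 10 times in sequence
# and return the total running time without multithreading
def no_thread(func):
    t = time.time()
    for i in range(10):
        func(i)
    duration = time.time() - t
    return duration


# Call one of the three functions above 10 times, each call in its own thread,
# and return the total running time with multithreading
def thread(func):
    t = time.time()
    ths = []
    for i in range(10):
        th = Thread(target=func, args=(i,))
        th.start()
        ths.append(th)
    for th in ths:
        th.join()  # wait for every thread to finish before stopping the clock
    duration = time.time() - t
    return duration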
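As an aside, the manual start/join bookkeeping in `thread` can also be expressed with the standard library's `concurrent.futures.ThreadPoolExecutor`. The sketch below is an alternative formulation, not the method used for the measurements in this post; the name `thread_pool` and its parameters `n_tasks` and `max_workers` are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def thread_pool(func, n_tasks=10, max_workers=10):
    """Run func(0), ..., func(n_tasks - 1) on a thread pool; return the elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() consumes the results, so an exception raised in a worker surfaces here
        list(pool.map(func, range(n_tasks)))
    # Leaving the with-block waits for all workers, much like join() in thread()
    return time.perf_counter() - start
```

It would be called the same way as `thread`, for example `thread_pool(gettitle)`. One design difference: an exception raised inside a worker is re-raised in the caller here, whereas a plain `Thread` only reports it from within the failing thread.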

Each case is run 5 times. The function below returns the average over the 5 runs together with the time of each run.

def get_duration(func_th, func):
    l = []
    for _ in range(5):
        l.append(func_th(func))
    # Format the mean and the individual durations as strings with two decimals
    mean_duration = "%.2f" % np.mean(l)
    all_duration = ["%.2f" % i for i in l]
    return mean_duration, all_duration
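When comparing two averages, it can also help to know how much the individual runs scatter. The following is a minimal sketch under that assumption, not the author's function: `get_duration_stats` and its `repeats` parameter are hypothetical, and it returns floats instead of formatted strings so the values can be used in further calculations.

```python
import statistics


def get_duration_stats(func_th, func, repeats=5):
    """Call func_th(func) `repeats` times; return (mean, stdev, durations) as floats."""
    durations = [func_th(func) for _ in range(repeats)]
    # statistics.stdev is the sample standard deviation and needs at least two runs
    return statistics.mean(durations), statistics.stdev(durations), durations
```

A call such as `get_duration_stats(thread, cal)` would then give the mean, the spread, and the raw timings in one step.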

Call the functions and compare the timings (in seconds)

# CPU-bound task
get_duration(no_thread, cal)
# ('5.67', ['6.72', '5.54', '5.14', '5.15', '5.79'])
get_duration(thread, cal)
# ('5.38', ['6.17', '6.09', '5.29', '4.70', '4.65'])

# File I/O task
get_duration(no_thread, file)
# ('6.26', ['6.24', '5.96', '6.47', '6.07', '6.55'])
get_duration(thread, file)
# ('5.82', ['6.27', '6.12', '5.46', '5.35', '5.90'])

# Network request task
get_duration(no_thread, gettitle)
# ('4.01', ['4.06', '3.82', '4.43', '4.00', '3.74'])
get_duration(thread, gettitle)
# ('1.19', ['1.07', '1.23', '1.33', '0.93', '1.39'])

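The results above show that:

Multithreading improves the network request task very noticeably (mean of 4.01 s down to 1.19 s)
It improves the file I/O task only slightly (6.26 s down to 5.82 s, a gap no larger than the spread between individual runs)
It brings almost no improvement for the CPU-bound task (5.67 s down to 5.38 s, again within the run-to-run spread)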
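To put the three comparisons on a common scale, the speedup can be computed as the mean time without threads divided by the mean time with threads. The sketch below only redoes the arithmetic on the means reported in the results; the names `means` and `speedup` are introduced here for illustration.

```python
# Mean durations in seconds, copied from the reported results:
# task name -> (without threads, with threads)
means = {
    "cal": (5.67, 5.38),
    "file": (6.26, 5.82),
    "gettitle": (4.01, 1.19),
}


def speedup(no_thread_mean, thread_mean):
    # A value above 1 means the threaded version finished faster
    return no_thread_mean / thread_mean


for name, (seq, thr) in means.items():
    print("%-8s speedup: %.2fx" % (name, speedup(seq, thr)))
```

This gives roughly 1.05x for `cal`, 1.08x for `file`, and 3.37x for `gettitle`.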
