旅行銷售員的演化：基於Python實現遺傳演算法

08-21

來自專欄論智17 人贊了文章

作者：Eric Stoltz

編譯：weakish

借鑒自然選擇的靈感，遺傳演算法（GA）是一種迷人的求解搜索和優化問題方法。儘管已經有很多關於GA的文章，這些文章很少演示如何一步一步地使用Python實現GA演算法，解決一個比較複雜的問題。所以我寫了這篇教程！讀完這篇教程後，你將對如何從頭實現GA演算法有一個完整的了解。

導言

問題

這篇教程將使用GA求解旅行推銷員問題（TSP）：

給定一系列城市和每對城市之間的距離，求解訪問每一座城市一次並回到起始城市的最短路由。

TSP示意圖（作者：Xypron；許可：公有領域）

記住兩條重要的規則：

每座城市僅訪問一次（不重不漏）
必須返回起始城市

方法

讓我們從一些定義開始：

基因（gene）： 一座城市（以(x, y)坐標表示）
個體（individual）或染色體（chromosome）： 滿足以上條件的一個路由
種群（population）： 一組可能的路由（即，一組個體）
父母（parents）： 兩個路由相組合，創建一個新路由
交配池（mating pool）： 一組父母，用來創建下一代種群（即創建下一代路由）
適應度（fitness）： 一個告訴我們每個路由有多好的函數（在我們的例子中，也就是距離多短）
突變（mutation）： 在種群中引入變化的方法，隨機交換路由中的兩個城市
精英選擇（Elitism）： 將最好的個體保留至下一代

GA演算法的進行過程如下：

創建種群
確定適應度
選擇交配池
繁殖
變異
重複

現在，讓我們來動手實現。

創建我們的遺傳演算法

導入一些機器學習的標準包：

import numpy as npimport randomimport operatorimport pandas as pdimport matplotlib.pyplot as plt

創建城市和適應度類

我們首先創建City（城市）類，城市不過是(x, y)坐標。City類中包含distance函數（根據勾股定理計算一對城市間的距離）。

class City: def __init__(self, x, y): self.x = x self.y = y def distance(self, city): xDis = abs(self.x - city.x) yDis = abs(self.y - city.y) distance = np.sqrt((xDis ** 2) + (yDis ** 2)) return distance def __repr__(self): return "(" + str(self.x) + "," + str(self.y) + ")"

在Fitness（適應度）類中，我們將適應度（routeFitness）定義為路由距離（routeDistance）的倒數。我們想要最小化路由距離，因此較大的適應度較優。注意routeDistance的定義考慮到了我們需要回到起點。

class Fitness: def __init__(self, route): self.route = route self.distance = 0 self.fitness= 0.0 def routeDistance(self): if self.distance ==0: pathDistance = 0 for i in range(0, len(self.route)): fromCity = self.route[i] toCity = None if i + 1 < len(self.route): toCity = self.route[i + 1] else: toCity = self.route[0] pathDistance += fromCity.distance(toCity) self.distance = pathDistance return self.distance def routeFitness(self): if self.fitness == 0: self.fitness = 1 / float(self.routeDistance()) return self.fitness

創建種群

現在我們將生產初始種群（第一代）。我們需要一個能夠生成滿足條件的路由的函數（我們將在教程末尾創建城市列表）。我們通過隨機選擇訪問城市的順序創建一個個體：

def createRoute(cityList): route = random.sample(cityList, len(cityList)) return route

重複運行createRoute函數，直到我們有了足夠的個體：

def initialPopulation(popSize, cityList): population = [] for i in range(0, popSize): population.append(createRoute(cityList)) return population

確定適應度

接下來到了有意思的演化部分。為了模擬「適者生存」，我們利用Fitness排序種群中的每個個體。我們的輸出是一個有序列表，包括路由ID和相應的適應度分數。

def rankRoutes(population): fitnessResults = {} for i in range(0,len(population)): fitnessResults[i] = Fitness(population[i]).routeFitness() return sorted(fitnessResults.items(), key = operator.itemgetter(1), reverse = True)

選擇交配池

選擇用來產生下一代的父母有若干種方法。其中最常見的兩種是適應度比例選擇（fitness proportionate selection）和錦標賽選擇（tournament selection）。適應度比例選擇又稱為輪盤賭選擇（roulette wheel selection）。

適應度比例選擇 （下面實現的版本）：基於每個個體相對於種群的適應度分配選中的概率。可以看成適應度加權選擇概率。
錦標賽選擇：從種群中隨機選擇一些個體，其中適應度最高的被選為父母的一方。重複這一過程選出父母的另一方。

另外也可以考慮精英選擇。種群中表現最優的個體自動進入下一代，確保留存最成功的個體。

為清晰計，我們將分兩步（selection、matingPool）創建交配池。在selection函數中，我們將根據rankRoutes的輸出判定選擇哪些路由。我們為每個個體計算相對適應度權重（創建輪盤），然後將這些權重與一個隨機數比較，以選擇交配池。同時，我們打算保留最佳的路由，因此引入了精英選擇。最後，selection函數返迴路由ID的列表，供matingPool函數使用。

def selection(popRanked, eliteSize): selectionResults = [] df = pd.DataFrame(np.array(popRanked), columns=["Index","Fitness"]) df[cum_sum] = df.Fitness.cumsum() df[cum_perc] = 100*df.cum_sum/df.Fitness.sum() for i in range(0, eliteSize): selectionResults.append(popRanked[i][0]) for i in range(0, len(popRanked) - eliteSize): pick = 100*random.random() for i in range(0, len(popRanked)): if pick <= df.iat[i,3]: selectionResults.append(popRanked[i][0]) break return selectionResults

有了selection提供的路由ID，我們可以創建交配池了。

def matingPool(population, selectionResults): matingpool = [] for i in range(0, len(selectionResults)): index = selectionResults[i] matingpool.append(population[index]) return matingpool

繁殖

創建好了交配池後，我們可以通過交叉（crossover）過程生產下一代。例如，假設個體是由0和1組成的字元串，我們可以直接選定一個交叉點，鉸接兩個字元串以產生後代。

然而，TSP要求每個位置出現且僅出現一次。為了遵循這條規則，我們將使用一種特殊的繁殖函數有序交叉（ordered crossover）。在有序交叉中，我們隨機選擇父字元串的一個子集，然後用母字元串中的基因填充路由的剩餘空位。填充時，按照基因在母字元串中出現的次序依次填充，跳過選中的父字元串子集中已有的基因。

有序交叉示意圖（來源：Lee Jacobson）

def breed(parent1, parent2): child = [] childP1 = [] childP2 = [] geneA = int(random.random() * len(parent1)) geneB = int(random.random() * len(parent1)) startGene = min(geneA, geneB) endGene = max(geneA, geneB) for i in range(startGene, endGene): childP1.append(parent1[i]) childP2 = [item for item in parent2 if item not in childP1] child = childP1 + childP2 return child

在整個交配池上進行交叉操作（注意我們附加了精英選擇）：

def breedPopulation(matingpool, eliteSize): children = [] length = len(matingpool) - eliteSize pool = random.sample(matingpool, len(matingpool)) for i in range(0,eliteSize): children.append(matingpool[i]) for i in range(0, length): child = breed(pool[i], pool[len(matingpool)-i-1]) children.append(child) return children

變異

變異在GA中起到了重要作用，它通過引入新路由讓我們得以探索解答空間的其他部分，避免收斂到局部最優值。和交叉類似，TSP問題中的變異需要特殊考慮。同樣，如果我們的染色體由0和1組成，變異不過是給基因分配一個較低的由0變為1（或由1變0）的概率。

然而，因為我們需要遵守TSP的規則，我們無法丟棄城市。我們將轉而使用交換變異（swap mutation）。這意味著，我們指定一個較低的概率，兩座城市會在我們的路由中交換位置。

def mutate(individual, mutationRate): for swapped in range(len(individual)): if(random.random() < mutationRate): swapWith = int(random.random() * len(individual)) city1 = individual[swapped] city2 = individual[swapWith] individual[swapped] = city2 individual[swapWith] = city1 return individual

在整個種群上應用mutate函數：

def mutatePopulation(population, mutationRate): mutatedPop = [] for ind in range(0, len(population)): mutatedInd = mutate(population[ind], mutationRate) mutatedPop.append(mutatedInd) return mutatedPop

重複

基本上差不多了。現在讓我們把以上部分整合起來，創建一個產生下一代的函數。首先，我們計算當前種群的適應度。接著，我們選擇潛在的父母，以創建交配池。最後，我們生產（交叉、變異）下一代。

def nextGeneration(currentGen, eliteSize, mutationRate): popRanked = rankRoutes(currentGen) selectionResults = selection(popRanked, eliteSize) matingpool = matingPool(currentGen, selectionResults) children = breedPopulation(matingpool, eliteSize) nextGeneration = mutatePopulation(children, mutationRate) return nextGeneration

演化

我們已經具備了GA所需的一切！現在我們只需創建一個初始種群，然後就可以不斷演化了。按照我們設定的代數演化後，將返回最佳路由。另外，我們將列印初始的路由距離，最終的路由距離，來看看我們提升了多少。

def geneticAlgorithm(population, popSize, eliteSize, mutationRate, generations): pop = initialPopulation(popSize, population) print("Initial distance: " + str(1 / rankRoutes(pop)[0][1])) for i in range(0, generations): pop = nextGeneration(pop, eliteSize, mutationRate) print("Final distance: " + str(1 / rankRoutes(pop)[0][1])) bestRouteIndex = rankRoutes(pop)[0][0] bestRoute = pop[bestRouteIndex] return bestRoute

運行演化演算法

萬事俱備，只欠東風。我們現在只需要創建經過的城市列表。這裡，我們將隨機創建25座城市來演示GA演算法：

cityList = []for i in range(0,25): cityList.append(City(x=int(random.random() * 200), y=int(random.random() * 200)))

只需一行代碼即可運行演化演算法。接下來藝術和科學的交匯之處：判斷哪些假定效果最好。這裡，我們將選擇如下參數：每代100個個體，保留20個精英個體，給定基因的變異率為1%，運行500代：

geneticAlgorithm(population=cityList, popSize=100, eliteSize=20, mutationRate=0.01, generations=500)

附加題: 繪製趨勢

我們希望能查看距離隨時間縮短的趨勢，因此對geneticAlgorithm函數進行了一項小小的改動，保存每代的最短距離，並據此繪製圖形。

def geneticAlgorithmPlot(population, popSize, eliteSize, mutationRate, generations): pop = initialPopulation(popSize, population) progress = [] progress.append(1 / rankRoutes(pop)[0][1]) for i in range(0, generations): pop = nextGeneration(pop, eliteSize, mutationRate) progress.append(1 / rankRoutes(pop)[0][1]) plt.plot(progress) plt.ylabel(Distance) plt.xlabel(Generation) plt.show()

結語

我希望這樣親手學習如何創建自己的GA演算法是一個有意思的過程。請自行嘗試，看看你可以得到多短的路由。或者進一步嘗試在其他問題上實現GA；想像你將如何修改bred和mutate函數來處理其他類型的染色體。我們這裡只涉及一些皮毛！

本文配套的Jupyter Notebook見GitHub：ezstoltz/genetic-algorithm/genetic_algorithm_TSP.ipynb

參考鏈接

http://www.theprojectspot.com/tutorial-post/applying-a-genetic-algorithm-to-the-travelling-salesman-problem/5
https://gist.github.com/turbofart/3428880
https://gist.github.com/NicolleLouis/d4f88d5bd566298d4279bcb69934f51d
https://en.wikipedia.org/wiki/Travelling_salesman_problem