Notes on Implementing Data Structures and Algorithms with Python(三)
Notes on Implementing Data Structures and Algorithms with Python
筆記的內容形式:簡要介紹以及代碼實現,分三部分(有交叉):
第一部分是數據結構和與它們相關的常見問題。內容順序:線性結構(棧,堆,鏈表)、樹、圖(遍歷和最短路徑)。 第二部分是一些重要思想和演算法。內容順序:遞歸、分治、貪心、動態規劃、查找、排序。 第三部分是刷題筆記。主要參考資料:
https://interactivepython.org/courselib/static/pythonds/index.html
http://javayhu.me/python/
Python Algorithms: Mastering Basic Algorithms in the Python Language by Magnus Lie Hetland.
筆記原先是寫在jupyter notebook,導出md格式後在知乎導入。全部更完之後附上ipynb文件,文章增加目錄索引,食用效果更佳。
計劃:(一)線性數據結構;(二)樹數據機構;(三)圖數據結構;(四)演算法設計;(五)查找和排序;(六)其他演算法和刷題;
本篇為筆記的第(三)篇,對圖的數據結構和在python中的表示法,圖的遍歷和幾個相關問題,以及求解拓撲排序和連通分量的演算法的筆記。
Graph
鄰接矩陣(adjacent matrix)
適合稠密圖
鄰接表(adjacent list)
適合稀疏圖
ADT:
- Graph() creates a new, empty graph.
- addVertex(vert) adds an instance of Vertex to the graph.
- addEdge(fromVert, toVert) Adds a new, directed edge to the graph that connects two vertices.
- addEdge(fromVert, toVert, weight) Adds a new, weighted, directed edge to the graph that connects two vertices.
- getVertex(vertKey) finds the vertex in the graph named vertKey.
- getVertices() returns the list of all vertices in the graph.
- in returns True for a statement of the form vertex in graph, if the given vertex is in the graph, False otherwise.
class Vertex: def __init__(self,key): self.id = key self.connectedTo = {} def addNeighbor(self,nbr,weight=0): self.connectedTo[nbr] = weight def __str__(self): return str(self.id) + connectedTo: + str([x.id for x in self.connectedTo]) def getConnections(self): return self.connectedTo.keys() def getId(self): return self.id def getWeight(self,nbr): return self.connectedTo[nbr]class Graph: def __init__(self): self.vertList = {} self.numVertices = 0 def addVertex(self,key): self.numVertices = self.numVertices + 1 newVertex = Vertex(key) self.vertList[key] = newVertex return newVertex def getVertex(self,n): if n in self.vertList: return self.vertList[n] else: return None def __contains__(self,n): return n in self.vertList def addEdge(self,f,t,cost=0): if f not in self.vertList: nv = self.addVertex(f) if t not in self.vertList: nv = self.addVertex(t) self.vertList[f].addNeighbor(self.vertList[t], cost) def getVertices(self): return self.vertList.keys() def __iter__(self): return iter(self.vertList.values())g = Graph()for i in range(6): g.addVertex(i)g.addEdge(0,1,5)g.addEdge(0,5,2)g.addEdge(1,2,4)g.addEdge(2,3,9)g.addEdge(3,4,7)g.addEdge(3,5,3)g.addEdge(4,0,1)g.addEdge(5,2,1)for v in g: for w in v.getConnections(): print("( %s, %s)" %(v.getId(), w.getId()))( 0, 1)( 0, 5)( 1, 2)( 2, 3)( 3, 4)( 3, 5)( 4, 0)( 5, 2)
出於方便,python中可以用內置的數據結構實現樹和圖。
鄰接表:
# A Straightforward Adjacency List Representationa, b, c, d, e, f, g, h = range(8)N = [ [b, c, d, e, f], # a [c, e], # b [d], # c [e], # d [f], # e [c, g, h], # f [f, h], # g [f, g] # h]print(b in N[a]) # Neighborhood membership -> Trueprint(len(N[f])) # Degree -> 3True3
除了用list,也可以用set, dict等,使用不同的數據結構在插入,判斷連接關係,計算節點的度等操作時的方法和性能有所區別。比如稠密圖更適合用set,它判斷連接關係(元素是否存在)是O(1)。用dict可以表示帶權圖。
# A Straightforward Adjacency Set Representationa, b, c, d, e, f, g, h = range(8)N = [ {b, c, d, e, f}, # a {c, e}, # b {d}, # c {e}, # d {f}, # e {c, g, h}, # f {f, h}, # g {f, g} # h]# A Straightforward Adjacency Dict Representationa, b, c, d, e, f, g, h = range(8)N = [ {b:2, c:1, d:3, e:9, f:4}, # a {c:4, e:3}, # b {d:8}, # c {e:7}, # d {f:5}, # e {c:2, g:2, h:2}, # f {f:1, h:6}, # g {f:9, g:8} # h]
鄰接矩陣:
# An Adjacency Matrix, Implemented with Nested Listsa, b, c, d, e, f, g, h = range(8)N = [[0,1,1,1,1,1,0,0], # a [0,0,1,0,1,0,0,0], # b [0,0,0,1,0,0,0,0], # c [0,0,0,0,1,0,0,0], # d [0,0,0,0,0,1,0,0], # e [0,0,1,0,0,0,1,1], # f [0,0,0,0,0,1,0,1], # g [0,0,0,0,0,1,1,0]] # hprint(N[a][b]) # Neighborhood membership -> 1print(sum(N[f])) # Degree -> 3_ = float(inf)W = [[0,2,1,3,9,4,_,_], # a [_,0,4,_,3,_,_,_], # b [_,_,0,8,_,_,_,_], # c [_,_,_,0,7,_,_,_], # d [_,_,_,_,0,5,_,_], # e [_,_,2,_,_,0,2,2], # f [_,_,_,_,_,1,0,6], # g [_,_,_,_,_,9,8,0]] # hprint(W[a][b] < float(inf)) # Neighborhood membershipprint(sum(1 for w in W[a] if w < float(inf)) - 1) # Degree13True5
圖的兩種遍歷方式
深度優先搜索DFS: 遞歸版; 迭代版:輔助S(visited list),Q(plan queue) while Q. O(V+E)
廣度優先搜索BFS: use deque. 輔助P(dict(visited : parents)) or S(visited set),Q(plan deque). while Q:(O(V)), sum of the for loop:O(E), thus O(V+E).def recu_dfs(G, s, S=None): if S is None: S = set() S.add(s) for v in G[s]: if v in S: continue recu_dfs(G, v, S) return Sdef iter_dfs(G, s): Q, S = [s], list() while Q: u = Q.pop() if u in S: continue Q.extend(G[u]) S.append(u) return Sfrom collections import dequedef bfs(G, s): Q, S = deque([s]), set() # to-do queue, visited set while Q: u = Q.popleft() for v in G[u]: if v in S: continue S.add(v) Q.append(v) return Sprint(list(recu_dfs(N, 0))) print(list(iter_dfs(N, 0)))[0, 1, 2, 3, 4, 5, 6, 7][0, 5, 7, 6, 2, 3, 4, 1]
下面是兩個分別運用BFS和DFS的問題。此外,拓撲排序和尋找連通分量都基於DFS,Dijsktra最短路徑演算法則基於BFS。
The World Ladder Problem
(等長單詞變化,一次只能改變一個字母,並且改變前後的單詞要在詞典里。給定源單詞和目標單詞,求最少改變次數)
轉化成兩個問題:1.如果把詞典的單詞用圖表示出來; 2.求解給定兩個單詞之間的最短路徑。
對於問題1,如果兩兩掃描單詞看是否相差一個字母然後連接,時間複雜度是O(n^2)。有一個更快的辦法是先把每個單詞(長度為n)遮去一個字母分成n個待匹配項,然後掃描一遍全部單詞,把它歸入它所屬的那n項中,最後每一項里的單詞都互相連接即可。
對於問題2,最短路徑可以用權重都為1的dijsktra演算法,最優O(ElogV)。這裡問題是單源單終點的無權圖最短路,可以簡單的用BFS(修改增加endword參數表示目標單詞,level變數用於計數)的辦法,O(n)。
FOOL -> SAGE:
def buildWordGraph(wordList): d = {} G = Graph() # create buckets of words that differ by one letter for word in wordList: for i in range(len(word)): bucket = word[:i] + _ + word[i+1:] if bucket in d: d[bucket].append(word) else: d[bucket] = [word] # add vertices and edges for words in the same bucket for bucket in d: for word1 in d[bucket]: for word2 in d[bucket]: if word1 != word2: G.addEdge(word1, word2) return Gdef ladder_bfs(G, startword, endword): Q, S = deque([s,0]), set() # to-do queue, visited set while Q: u, level = Q.popleft() level += 1 for v in G[u]: if v == endword: return level if v in S: continue S.add(v) Q.append((v,level)) return level
The Knights Tour Problem
(騎士巡遊問題,日字型走法,由任何一個位置出發,要求走完所有位置並且每個位置只能走一遍(一條哈密爾頓路徑)。)
類似上題,需要把騎士能走的地方(節點)和路(邊)用圖表示出來,然後遍歷這個圖,這裡需要用到DFS——相比上面給出的通常的DFS可能會創建多顆樹,這裡需要做一些修改,使得它只創建最深的一組樹,沒有任何分支,如此「深度」探索圖形返回的才是「一條」哈密爾頓路徑。做法是當進入死胡同走不下去時,需要進行回溯,由於DFS迭代進行,這容易做到。def knightGraph(bdSize): # makes one pass over the entire board ktGraph = Graph() for row in range(bdSize): for col in range(bdSize): nodeId = posToNodeId(row,col,bdSize) newPositions = genLegalMoves(row,col,bdSize) for e in newPositions: nid = posToNodeId(e[0],e[1],bdSize) ktGraph.addEdge(nodeId,nid) return ktGraphdef posToNodeId(row, column, board_size): # converts a location on the board in terms of a row and a column... return (row * board_size) + column # ...into a linear vertex numberdef genLegalMoves(x,y,bdSize): # create a list of legal moves for that position on the board. newMoves = [] moveOffsets = [(-1,-2),(-1,2),(-2,-1),(-2,1), # eight directions to go ( 1,-2),( 1,2),( 2,-1),( 2,1)] for i in moveOffsets: newX = x + i[0] newY = y + i[1] if legalCoord(newX,bdSize) and legalCoord(newY,bdSize): newMoves.append((newX,newY)) return newMovesdef legalCoord(x,bdSize): if x >= 0 and x < bdSize: return True else: return Falsedef tour_dfs(G, depth, limit, s, S=None): # s: node we want to explore; depth: current search depth; : if S is None: # limit: the number of nodes in path; S: visited vertices, path. S = [] S.append(s) if depth < limit: done = False for v in G[s]: if v in S: continue done = recu_dfs(G, depth+1, limit, v, S) if not done: S.pop() # search fail, backtracking else: done = True return done
上面的演算法是O(n^k)級的,k是一個由n決定的常數(eg.n=8時,k=5.25)。分析其原因,是因為演算法對於選擇哪一個作為下一個訪問的節點非常敏感。可以用一種啟發式的方法來大大加速這個演算法:選擇具有最少可訪問節點的節點作為當前節點的下一個訪問節點,也就是優先選下圖中靠近邊角的節點。這個啟發式規則被稱為Warnsdorff』s algorithm。
def orderByAvail(n): resList = [] for v in n.getConnections(): if v.getColor() == white: c = 0 for w in v.getConnections(): if w.getColor() == white: c = c + 1 resList.append((c,v)) resList.sort(key=lambda x: x[0]) return [y[1] for y in resList]def tour_accelerate_dfs(G, depth, limit, s, S=None): # s: node we want to explore; depth: current search depth; : if S is None: # limit: the number of nodes in path; S: visited vertices, path. S = [] S.append(s) if depth < limit: done = False v = orderByAvail(s) if v not in S: done = recu_dfs(G, depth+1, limit, v, S) if not done: S.pop() else: done = True return done
拓撲排序
DAG經常用於表示一些有不同優先順序事物的處理過程。拓撲排序(topological sort)為有向無環圖(DAG)的節點按照邏輯順序進行排序。
思考:如果找到一個節點, 去掉它之後剩下的仍然是DAG. 這樣的節點應該是入度為0的. 演算法流程:計算各節點入度,採取類似選擇排序的策略:入度為零的放在未排列序列的最前面。每次排序完一個節點,剩餘未排序的節點更新入度。 step1:首先計算每個節點的入度;入度為零的存到輔助棧Q; step2:基於DFS的方法,當Q不空: \節點出棧, append到sorted list S; \然後被出棧的節點的鄰點的入度數減1,如果出現了新的入度為0的節點則入棧Q; \繼續這個過程, 直到所有節點都被刪掉了(棧Q空)。上面這個過程是選擇節點後,插入到未排序序列的第一位(即已排序的最後一位)。
也可以選擇節點,插入到未排序的最後一位:計算未排序的節點的出度數,出度為0的插入到已排序的前面。(上一種好一些,避免了insert的O(n)。)def topsort(G): count, Q, S= dict((u, 0) for u in G), [], [] # count incoming,Q(plan list),S(sorted list). for u in G: for v in G[u]: count[v] += 1 Q.extend([u for u in G if count[u] == 0]) while Q: u = Q.pop() S.append(u) for v in G[u]: count[v] -= 1 if count[v] == 0: Q.append(v) return SG = {a:set(bf), b:set(cdf), c: set(d), d: set(ef), e: set(f), f: set()}print(topsort(G))[a, b, c, d, e, f]
連通分量:
圖的一個最大子圖,在這個子圖中任何兩個節點之間都是相互可達的。不考慮邊的方向。連通圖的連通分量是連通圖本身。非連通圖可能有多個連通分量。
尋找連通分量的演算法:從一個頂點開始,沿著它的邊找到其他的節點,然後就是不斷地向已有的連通分量中添加節點,使得連通分量內部依然滿足連通性質。仍然是DFS。 遍歷函數walk,其中參數S暫時沒有用,它在後面求強連通分量時需要,表示的是一個「禁區」(forbidden zone),也就是不要去訪問這些節點。只適用於無向圖,而且只能找到一個從參數s出發的連通分量,要想得到全部的連通分量需要修改下。時間複雜度是O(E+V)。def walk(G, s, S=set()): P, Q = dict({s:None}), {s} while Q: u = Q.pop() for v in G[u].difference(P, S): Q.add(v) P[v] = u return Pdef components(G): seen, comp = set(), [] for u in G: if u in seen: continue P = walk(G, u) seen.update(P) comp.append(P) return compG = { 0: set([1, 2]), 1: set([0, 2]), 2: set([0, 1]), 3: set([4, 5]), 4: set([3, 5]), 5: set([3, 4]) }print([list(sorted(C)) for C in components(G)])[[0, 1, 2], [3, 4, 5]]
強連通分量
強連通分量即有向圖的連通分量。演算法的流程有下面四步:
1.對原圖G運行DFS,得到每個節點的完成時間f[v]; 2.得到原圖的轉置圖GT; 3.對GT運行DFS,主循環按照節點的f[v]降序進行訪問; 4.輸出深度優先森林中的每棵樹,也就是一個強連通分支。def tr(G): # Transpose (rev. edges of) G GT = {} for u in G: GT[u] = set() # Get all the nodes in there for u in G: for v in G[u]: GT[v].add(u) # Add all reverse edges return GTdef scc(G): GT = tr(G) # Get the transposed graph sccs, seen = [], set() for u in topsort(G): # DFS starting points if u in seen: continue # Ignore covered nodes C = walk(GT, u, seen) # Dont go "backward" (seen) seen.update(C) # Weve now seen C sccs.append(C) # Another SCC found return sccsfrom string import ascii_lowercasedef parse_graph(s): # print(list(zip(ascii_lowercase, s.split("/")))) # [(a, bc), (b, die), (c, d), (d, ah), (e, f), (f, g), (g, eh), (h, i), (i, h)] G = {} for u, line in zip(ascii_lowercase, s.split("/")): G[u] = set(line) return GG = {a:set(bf), b:set(cdf), c: set(d), d: set(ef), e: set(f), f: set()}print(G)print(list(map(list, scc(G))))#[[a, c, b, d], [e, g, f], [i, h]]{a: {b, f}, b: {c, d, f}, c: {d}, d: {e, f}, e: {f}, f: set()}[[a], [b], [c], [d], [e], [f]]
圖論的最小生成樹問題(貪心法),DAG的單源最短路徑問題(動態規劃),更一般的最短路徑演算法如Bellman-Ford演算法(遍歷),Dijkstra演算法(貪心),Johnson演算法(Bellman-Ford + Dijkstra),Floyd-Warshall演算法(動態規劃),記錄在筆記的第二部分。
Summary
In this chapter we have looked at the graph abstract data type, and some implementations of a graph. A graph enables us to solve many problems provided we can transform the original problem into something that can be represented by a graph. In particular, we have seen that graphs are useful to solve problems in the following general areas.
- Breadth first search for finding the unweighted shortest path.
- Depth first search for graph exploration.
- Strongly connected components for simplifying a graph.
- Topological sort for ordering tasks.
- Minimum weight spanning trees for broadcasting messages.(later in Part II)
- Dijkstra』s algorithm for weighted shortest path.(later in Part II)
推薦閱讀:
※什麼是Python Descriptors
※隊列和棧
※第八章 對象引用、可變性
※簡歷中如何證明自己的編程能力?
※Python GUI教程(十):創建一個複雜的GUI