Notes on Implementing Data Structures and Algorithms with Python(三)

Notes on Implementing Data Structures and Algorithms with Python

筆記的內容形式:簡要介紹以及代碼實現,分三部分(有交叉):

第一部分是數據結構和與它們相關的常見問題。內容順序:線性結構(棧,堆,鏈表)、樹、圖(遍歷和最短路徑)。

第二部分是一些重要思想和演算法。內容順序:遞歸、分治、貪心、動態規劃、查找、排序。

第三部分是刷題筆記。

主要參考資料:

interactivepython.org/c

javayhu.me/python/

Python Algorithms: Mastering Basic Algorithms in the Python Language by Magnus Lie Hetland.

筆記原先是寫在jupyter notebook,導出md格式後在知乎導入。全部更完之後附上ipynb文件,文章增加目錄索引,食用效果更佳。

計劃:(一)線性數據結構;(二)樹數據機構;(三)圖數據結構;(四)演算法設計;(五)查找和排序;(六)其他演算法和刷題

本篇為筆記的第(三)篇,對圖的數據結構和在python中的表示法,圖的遍歷和幾個相關問題,以及求解拓撲排序和連通分量的演算法的筆記。


Graph

鄰接矩陣(adjacent matrix)

適合稠密圖

鄰接表(adjacent list)

適合稀疏圖

ADT:

  • Graph() creates a new, empty graph.
  • addVertex(vert) adds an instance of Vertex to the graph.
  • addEdge(fromVert, toVert) Adds a new, directed edge to the graph that connects two vertices.
  • addEdge(fromVert, toVert, weight) Adds a new, weighted, directed edge to the graph that connects two vertices.
  • getVertex(vertKey) finds the vertex in the graph named vertKey.
  • getVertices() returns the list of all vertices in the graph.
  • in returns True for a statement of the form vertex in graph, if the given vertex is in the graph, False otherwise.

class Vertex: def __init__(self,key): self.id = key self.connectedTo = {} def addNeighbor(self,nbr,weight=0): self.connectedTo[nbr] = weight def __str__(self): return str(self.id) + connectedTo: + str([x.id for x in self.connectedTo]) def getConnections(self): return self.connectedTo.keys() def getId(self): return self.id def getWeight(self,nbr): return self.connectedTo[nbr]class Graph: def __init__(self): self.vertList = {} self.numVertices = 0 def addVertex(self,key): self.numVertices = self.numVertices + 1 newVertex = Vertex(key) self.vertList[key] = newVertex return newVertex def getVertex(self,n): if n in self.vertList: return self.vertList[n] else: return None def __contains__(self,n): return n in self.vertList def addEdge(self,f,t,cost=0): if f not in self.vertList: nv = self.addVertex(f) if t not in self.vertList: nv = self.addVertex(t) self.vertList[f].addNeighbor(self.vertList[t], cost) def getVertices(self): return self.vertList.keys() def __iter__(self): return iter(self.vertList.values())g = Graph()for i in range(6): g.addVertex(i)g.addEdge(0,1,5)g.addEdge(0,5,2)g.addEdge(1,2,4)g.addEdge(2,3,9)g.addEdge(3,4,7)g.addEdge(3,5,3)g.addEdge(4,0,1)g.addEdge(5,2,1)for v in g: for w in v.getConnections(): print("( %s, %s)" %(v.getId(), w.getId()))( 0, 1)( 0, 5)( 1, 2)( 2, 3)( 3, 4)( 3, 5)( 4, 0)( 5, 2)

出於方便,python中可以用內置的數據結構實現樹和圖。

鄰接表:

# A Straightforward Adjacency List Representationa, b, c, d, e, f, g, h = range(8)N = [ [b, c, d, e, f], # a [c, e], # b [d], # c [e], # d [f], # e [c, g, h], # f [f, h], # g [f, g] # h]print(b in N[a]) # Neighborhood membership -> Trueprint(len(N[f])) # Degree -> 3True3

除了用list,也可以用set, dict等,使用不同的數據結構在插入,判斷連接關係,計算節點的度等操作時的方法和性能有所區別。比如稠密圖更適合用set,它判斷連接關係(元素是否存在)是O(1)。用dict可以表示帶權圖。

# A Straightforward Adjacency Set Representationa, b, c, d, e, f, g, h = range(8)N = [ {b, c, d, e, f}, # a {c, e}, # b {d}, # c {e}, # d {f}, # e {c, g, h}, # f {f, h}, # g {f, g} # h]# A Straightforward Adjacency Dict Representationa, b, c, d, e, f, g, h = range(8)N = [ {b:2, c:1, d:3, e:9, f:4}, # a {c:4, e:3}, # b {d:8}, # c {e:7}, # d {f:5}, # e {c:2, g:2, h:2}, # f {f:1, h:6}, # g {f:9, g:8} # h]

鄰接矩陣:

# An Adjacency Matrix, Implemented with Nested Listsa, b, c, d, e, f, g, h = range(8)N = [[0,1,1,1,1,1,0,0], # a [0,0,1,0,1,0,0,0], # b [0,0,0,1,0,0,0,0], # c [0,0,0,0,1,0,0,0], # d [0,0,0,0,0,1,0,0], # e [0,0,1,0,0,0,1,1], # f [0,0,0,0,0,1,0,1], # g [0,0,0,0,0,1,1,0]] # hprint(N[a][b]) # Neighborhood membership -> 1print(sum(N[f])) # Degree -> 3_ = float(inf)W = [[0,2,1,3,9,4,_,_], # a [_,0,4,_,3,_,_,_], # b [_,_,0,8,_,_,_,_], # c [_,_,_,0,7,_,_,_], # d [_,_,_,_,0,5,_,_], # e [_,_,2,_,_,0,2,2], # f [_,_,_,_,_,1,0,6], # g [_,_,_,_,_,9,8,0]] # hprint(W[a][b] < float(inf)) # Neighborhood membershipprint(sum(1 for w in W[a] if w < float(inf)) - 1) # Degree13True5

圖的兩種遍歷方式

深度優先搜索DFS: 遞歸版; 迭代版:輔助S(visited list),Q(plan queue) while Q. O(V+E)

廣度優先搜索BFS: use deque. 輔助P(dict(visited : parents)) or S(visited set),Q(plan deque). while Q:(O(V)), sum of the for loop:O(E), thus O(V+E).

def recu_dfs(G, s, S=None): if S is None: S = set() S.add(s) for v in G[s]: if v in S: continue recu_dfs(G, v, S) return Sdef iter_dfs(G, s): Q, S = [s], list() while Q: u = Q.pop() if u in S: continue Q.extend(G[u]) S.append(u) return Sfrom collections import dequedef bfs(G, s): Q, S = deque([s]), set() # to-do queue, visited set while Q: u = Q.popleft() for v in G[u]: if v in S: continue S.add(v) Q.append(v) return Sprint(list(recu_dfs(N, 0))) print(list(iter_dfs(N, 0)))[0, 1, 2, 3, 4, 5, 6, 7][0, 5, 7, 6, 2, 3, 4, 1]

下面是兩個分別運用BFS和DFS的問題。此外,拓撲排序和尋找連通分量都基於DFS,Dijsktra最短路徑演算法則基於BFS。

The World Ladder Problem

(等長單詞變化,一次只能改變一個字母,並且改變前後的單詞要在詞典里。給定源單詞和目標單詞,求最少改變次數)

轉化成兩個問題:1.如果把詞典的單詞用圖表示出來; 2.求解給定兩個單詞之間的最短路徑。

對於問題1,如果兩兩掃描單詞看是否相差一個字母然後連接,時間複雜度是O(n^2)。有一個更快的辦法是先把每個單詞(長度為n)遮去一個字母分成n個待匹配項,然後掃描一遍全部單詞,把它歸入它所屬的那n項中,最後每一項里的單詞都互相連接即可。

對於問題2,最短路徑可以用權重都為1的dijsktra演算法,最優O(ElogV)。這裡問題是單源單終點的無權圖最短路,可以簡單的用BFS(修改增加endword參數表示目標單詞,level變數用於計數)的辦法,O(n)。

FOOL -> SAGE:

def buildWordGraph(wordList): d = {} G = Graph() # create buckets of words that differ by one letter for word in wordList: for i in range(len(word)): bucket = word[:i] + _ + word[i+1:] if bucket in d: d[bucket].append(word) else: d[bucket] = [word] # add vertices and edges for words in the same bucket for bucket in d: for word1 in d[bucket]: for word2 in d[bucket]: if word1 != word2: G.addEdge(word1, word2) return Gdef ladder_bfs(G, startword, endword): Q, S = deque([s,0]), set() # to-do queue, visited set while Q: u, level = Q.popleft() level += 1 for v in G[u]: if v == endword: return level if v in S: continue S.add(v) Q.append((v,level)) return level

The Knights Tour Problem

(騎士巡遊問題,日字型走法,由任何一個位置出發,要求走完所有位置並且每個位置只能走一遍(一條哈密爾頓路徑)。)

類似上題,需要把騎士能走的地方(節點)和路(邊)用圖表示出來,然後遍歷這個圖,這裡需要用到DFS——相比上面給出的通常的DFS可能會創建多顆樹,這裡需要做一些修改,使得它只創建最深的一組樹,沒有任何分支,如此「深度」探索圖形返回的才是「一條」哈密爾頓路徑。做法是當進入死胡同走不下去時,需要進行回溯,由於DFS迭代進行,這容易做到。

def knightGraph(bdSize): # makes one pass over the entire board ktGraph = Graph() for row in range(bdSize): for col in range(bdSize): nodeId = posToNodeId(row,col,bdSize) newPositions = genLegalMoves(row,col,bdSize) for e in newPositions: nid = posToNodeId(e[0],e[1],bdSize) ktGraph.addEdge(nodeId,nid) return ktGraphdef posToNodeId(row, column, board_size): # converts a location on the board in terms of a row and a column... return (row * board_size) + column # ...into a linear vertex numberdef genLegalMoves(x,y,bdSize): # create a list of legal moves for that position on the board. newMoves = [] moveOffsets = [(-1,-2),(-1,2),(-2,-1),(-2,1), # eight directions to go ( 1,-2),( 1,2),( 2,-1),( 2,1)] for i in moveOffsets: newX = x + i[0] newY = y + i[1] if legalCoord(newX,bdSize) and legalCoord(newY,bdSize): newMoves.append((newX,newY)) return newMovesdef legalCoord(x,bdSize): if x >= 0 and x < bdSize: return True else: return Falsedef tour_dfs(G, depth, limit, s, S=None): # s: node we want to explore; depth: current search depth; : if S is None: # limit: the number of nodes in path; S: visited vertices, path. S = [] S.append(s) if depth < limit: done = False for v in G[s]: if v in S: continue done = recu_dfs(G, depth+1, limit, v, S) if not done: S.pop() # search fail, backtracking else: done = True return done

上面的演算法是O(n^k)級的,k是一個由n決定的常數(eg.n=8時,k=5.25)。分析其原因,是因為演算法對於選擇哪一個作為下一個訪問的節點非常敏感。可以用一種啟發式的方法來大大加速這個演算法:選擇具有最少可訪問節點的節點作為當前節點的下一個訪問節點,也就是優先選下圖中靠近邊角的節點。這個啟發式規則被稱為Warnsdorff』s algorithm。

def orderByAvail(n): resList = [] for v in n.getConnections(): if v.getColor() == white: c = 0 for w in v.getConnections(): if w.getColor() == white: c = c + 1 resList.append((c,v)) resList.sort(key=lambda x: x[0]) return [y[1] for y in resList]def tour_accelerate_dfs(G, depth, limit, s, S=None): # s: node we want to explore; depth: current search depth; : if S is None: # limit: the number of nodes in path; S: visited vertices, path. S = [] S.append(s) if depth < limit: done = False v = orderByAvail(s) if v not in S: done = recu_dfs(G, depth+1, limit, v, S) if not done: S.pop() else: done = True return done

拓撲排序

DAG經常用於表示一些有不同優先順序事物的處理過程。拓撲排序(topological sort)為有向無環圖(DAG)的節點按照邏輯順序進行排序。

思考:如果找到一個節點, 去掉它之後剩下的仍然是DAG. 這樣的節點應該是入度為0的.

演算法流程:計算各節點入度,採取類似選擇排序的策略:入度為零的放在未排列序列的最前面。每次排序完一個節點,剩餘未排序的節點更新入度。

step1:首先計算每個節點的入度;入度為零的存到輔助棧Q;

step2:基於DFS的方法,當Q不空:

\節點出棧, append到sorted list S;

\然後被出棧的節點的鄰點的入度數減1,如果出現了新的入度為0的節點則入棧Q;

\繼續這個過程, 直到所有節點都被刪掉了(棧Q空)。

上面這個過程是選擇節點後,插入到未排序序列的第一位(即已排序的最後一位)。

也可以選擇節點,插入到未排序的最後一位:計算未排序的節點的出度數,出度為0的插入到已排序的前面。(上一種好一些,避免了insert的O(n)。)

def topsort(G): count, Q, S= dict((u, 0) for u in G), [], [] # count incoming,Q(plan list),S(sorted list). for u in G: for v in G[u]: count[v] += 1 Q.extend([u for u in G if count[u] == 0]) while Q: u = Q.pop() S.append(u) for v in G[u]: count[v] -= 1 if count[v] == 0: Q.append(v) return SG = {a:set(bf), b:set(cdf), c: set(d), d: set(ef), e: set(f), f: set()}print(topsort(G))[a, b, c, d, e, f]

連通分量:

圖的一個最大子圖,在這個子圖中任何兩個節點之間都是相互可達的。不考慮邊的方向。連通圖的連通分量是連通圖本身。非連通圖可能有多個連通分量。

尋找連通分量的演算法:從一個頂點開始,沿著它的邊找到其他的節點,然後就是不斷地向已有的連通分量中添加節點,使得連通分量內部依然滿足連通性質。仍然是DFS。

遍歷函數walk,其中參數S暫時沒有用,它在後面求強連通分量時需要,表示的是一個「禁區」(forbidden zone),也就是不要去訪問這些節點。只適用於無向圖,而且只能找到一個從參數s出發的連通分量,要想得到全部的連通分量需要修改下。時間複雜度是O(E+V)。

def walk(G, s, S=set()): P, Q = dict({s:None}), {s} while Q: u = Q.pop() for v in G[u].difference(P, S): Q.add(v) P[v] = u return Pdef components(G): seen, comp = set(), [] for u in G: if u in seen: continue P = walk(G, u) seen.update(P) comp.append(P) return compG = { 0: set([1, 2]), 1: set([0, 2]), 2: set([0, 1]), 3: set([4, 5]), 4: set([3, 5]), 5: set([3, 4]) }print([list(sorted(C)) for C in components(G)])[[0, 1, 2], [3, 4, 5]]

強連通分量

強連通分量即有向圖的連通分量。演算法的流程有下面四步:

1.對原圖G運行DFS,得到每個節點的完成時間f[v];

2.得到原圖的轉置圖GT;

3.對GT運行DFS,主循環按照節點的f[v]降序進行訪問;

4.輸出深度優先森林中的每棵樹,也就是一個強連通分支。

def tr(G): # Transpose (rev. edges of) G GT = {} for u in G: GT[u] = set() # Get all the nodes in there for u in G: for v in G[u]: GT[v].add(u) # Add all reverse edges return GTdef scc(G): GT = tr(G) # Get the transposed graph sccs, seen = [], set() for u in topsort(G): # DFS starting points if u in seen: continue # Ignore covered nodes C = walk(GT, u, seen) # Dont go "backward" (seen) seen.update(C) # Weve now seen C sccs.append(C) # Another SCC found return sccsfrom string import ascii_lowercasedef parse_graph(s): # print(list(zip(ascii_lowercase, s.split("/")))) # [(a, bc), (b, die), (c, d), (d, ah), (e, f), (f, g), (g, eh), (h, i), (i, h)] G = {} for u, line in zip(ascii_lowercase, s.split("/")): G[u] = set(line) return GG = {a:set(bf), b:set(cdf), c: set(d), d: set(ef), e: set(f), f: set()}print(G)print(list(map(list, scc(G))))#[[a, c, b, d], [e, g, f], [i, h]]{a: {b, f}, b: {c, d, f}, c: {d}, d: {e, f}, e: {f}, f: set()}[[a], [b], [c], [d], [e], [f]]

圖論的最小生成樹問題(貪心法),DAG的單源最短路徑問題(動態規劃),更一般的最短路徑演算法如Bellman-Ford演算法(遍歷),Dijkstra演算法(貪心),Johnson演算法(Bellman-Ford + Dijkstra),Floyd-Warshall演算法(動態規劃),記錄在筆記的第二部分。

Summary

In this chapter we have looked at the graph abstract data type, and some implementations of a graph. A graph enables us to solve many problems provided we can transform the original problem into something that can be represented by a graph. In particular, we have seen that graphs are useful to solve problems in the following general areas.

  1. Breadth first search for finding the unweighted shortest path.
  2. Depth first search for graph exploration.
  3. Strongly connected components for simplifying a graph.
  4. Topological sort for ordering tasks.
  5. Minimum weight spanning trees for broadcasting messages.(later in Part II)
  6. Dijkstra』s algorithm for weighted shortest path.(later in Part II)

推薦閱讀:

什麼是Python Descriptors
隊列和棧
第八章 對象引用、可變性
簡歷中如何證明自己的編程能力?
Python GUI教程(十):創建一個複雜的GUI

TAG:Python | 演算法與數據結構 |