用 Python 實現常用演算法和數據結構
I think a lot of new programmers like to use advanced data structures and advanced language features as a way of demonstrating their ability. I call it the lion-tamer syndrome. Such demonstrations are impressive, but unless they actually translate into real wins for the project, avoid them. - Glyn Williams"
最近重新複習下常用演算法和數據結構,下邊是看《Data Structures and Algorithms Using Python》筆記和代碼,純手敲,有興趣的可以參考下,都是基於python3.5 class實現,所以也算是複習了一下OOP。雖然思想都是通用的,但是使用的語言也會影響到我們的思維,我會結合一下使用python的經驗來稍微分析下各種數據結構和其操作的複雜度,以便靈活選用。實現一個數據結構後最好寫一些單元測試用例,否則沒人知道你寫的究竟對不對。坑爹的是本書有很多代碼錯誤甚至實現錯誤,調試花了我很多時間。同時你還會發現很多坑爹的網路演算法教程文章代碼直接拷貝根本不能用,沒有單元測試證明演算法正確性的都是扯淡。(ps:這篇博客可以當做面試 python 演算法的提綱,筆者來知乎的時候直接手寫了裡邊的一些基礎演算法)
___
1章:ADT抽象數據類型,定義數據和其操作
什麼是ADT: 抽象數據類型,學過數據結構的應該都知道。
How to select datastructures for ADT
- Dose the data structure provie for the storage requirements as specified by the domain of the ADT?
- Does the data structure provide the data access and manipulation functionality to fully implement the ADT?
- Effcient implemention? based on complexity analysis.
下邊代碼是個簡單的示例,比如實現一個簡單的Bag類,先定義其具有的操作,然後我們再用類的magic method來實現這些方法:
class Bag: """ constructor: 構造函數 size contains append remove iter """ def __init__(self): self._items = list() def __len__(self): return len(self._items) def __contains__(self, item): return item in self._items def add(self, item): self._items.append(item) def remove(self, item): assert item in self._items, "item must in the bag" return self._items.remove(item) def __iter__(self): return _BagIterator(self._items)class _BagIterator: """ 注意這裡實現了迭代器類 """ def __init__(self, seq): self._bag_items = seq self._cur_item = 0 def __iter__(self): return self def __next__(self): if self._cur_item < len(self._bag_items): item = self._bag_items[self._cur_item] self._cur_item += 1 return item else: raise StopIterationb = Bag()b.add(1)b.add(2)for i in b: # for使用__iter__構建,用__next__迭代 print(i)"""# for 語句等價於i = b.__iter__()while True: try: item = i.__next__() print(item) except StopIteration: break"""
2章:array vs list
array: 定長,操作有限,但是節省內存;貌似我的生涯中還沒用過,不過python3.5中我試了確實有array類,可以用import array直接導入
list: 會預先分配內存,操作豐富,但是耗費內存。我用sys.getsizeof做了實驗。我個人理解很類似C++ STL里的vector,是使用最頻繁的數據結構。
- list.append: 如果之前沒有分配夠內存,會重新開闢新區域,然後複製之前的數據,複雜度退化
- list.insert: 會移動被插入區域後所有元素,O(n)
- list.pop: pop不同位置需要的複雜度不同pop(0)是O(1)複雜度,pop()首位O(n)複雜度
- list[]: slice操作copy數據(預留空間)到另一個list
來實現一個array的ADT:
import ctypesclass Array: def __init__(self, size): assert size > 0, "array size must be > 0" self._size = size PyArrayType = ctypes.py_object * size self._elements = PyArrayType() self.clear(None) def __len__(self): return self._size def __getitem__(self, index): assert index >= 0 and index < len(self), "out of range" return self._elements[index] def __setitem__(self, index, value): assert index >= 0 and index < len(self), "out of range" self._elements[index] = value def clear(self, value): """ 設置每個元素為value """ for i in range(len(self)): self._elements[i] = value def __iter__(self): return _ArrayIterator(self._elements)class _ArrayIterator: def __init__(self, items): self._items = items self._idx = 0 def __iter__(self): return self def __next__(self): if self._idex < len(self._items): val = self._items[self._idx] self._idex += 1 return val else: raise StopIteration
Two-Demensional Arrays
class Array2D: """ 要實現的方法 Array2D(nrows, ncols): constructor numRows() numCols() clear(value) getitem(i, j) setitem(i, j, val) """ def __init__(self, numrows, numcols): self._the_rows = Array(numrows) # 數組的數組 for i in range(numrows): self._the_rows[i] = Array(numcols) @property def numRows(self): return len(self._the_rows) @property def NumCols(self): return len(self._the_rows[0]) def clear(self, value): for row in range(self.numRows): row.clear(value) def __getitem__(self, ndx_tuple): # ndx_tuple: (x, y) assert len(ndx_tuple) == 2 row, col = ndx_tuple[0], ndx_tuple[1] assert (row >= 0 and row < self.numRows and col >= 0 and col < self.NumCols) the_1d_array = self._the_rows[row] return the_1d_array[col] def __setitem__(self, ndx_tuple, value): assert len(ndx_tuple) == 2 row, col = ndx_tuple[0], ndx_tuple[1] assert (row >= 0 and row < self.numRows and col >= 0 and col < self.NumCols) the_1d_array = self._the_rows[row] the_1d_array[col] = value
The Matrix ADT, m行,n列。這個最好用還是用pandas處理矩陣,自己實現比較*疼
class Matrix: """ 最好用pandas的DataFrame Matrix(rows, ncols): constructor numCols() getitem(row, col) setitem(row, col, val) scaleBy(scalar): 每個元素乘scalar transpose(): 返回transpose轉置 add(rhsMatrix): size must be the same subtract(rhsMatrix) multiply(rhsMatrix) """ def __init__(self, numRows, numCols): self._theGrid = Array2D(numRows, numCols) self._theGrid.clear(0) @property def numRows(self): return len(self._theGrid.numRows()) @property def NumCols(self): return len(self._theGrid.numCols()) def __getitem__(self, ndxTuple): return self._theGrid[ndxTuple[0], ndxTuple[1]] def __setitem__(self, ndxTuple, scalar): self._theGrid[ndxTuple[0], ndxTuple[1]] = scalar def scaleBy(self, scalar): for r in range(self.numRows): for c in range(self.numCols): self[r, c] *= scalar def __add__(self, rhsMatrix): assert (rhsMatrix.numRows == self.numRows and rhsMatrix.numCols == self.numCols) newMartrix = Matrix(self.numRows, self.numCols) for r in range(self.numRows): for c in range(self.numCols): newMartrix[r, c] = self[r, c] + rhsMatrix[r, c]
3章:Sets and Maps
除了list之外,最常用的應該就是python內置的set和dict了。
sets ADT
A set is a container that stores a collection of unique values over a given comparable domain in which the stored values have no particular ordering.
class Set: """ 使用list實現set ADT Set() length() contains(element) add(element) remove(element) equals(element) isSubsetOf(setB) union(setB) intersect(setB) difference(setB) iterator() """ def __init__(self): self._theElements = list() def __len__(self): return len(self._theElements) def __contains__(self, element): return element in self._theElements def add(self, element): if element not in self: self._theElements.append(element) def remove(self, element): assert element in self, "The element must be set" self._theElements.remove(element) def __eq__(self, setB): if len(self) != len(setB): return False else: return self.isSubsetOf(setB) def isSubsetOf(self, setB): for element in self: if element not in setB: return False return True def union(self, setB): newSet = Set() newSet._theElements.extend(self._theElements) for element in setB: if element not in self: newSet._theElements.append(element) return newSet
Maps or Dict: 鍵值對,python內部採用hash實現。
class Map: """ Map ADT list implemention Map() length() contains(key) add(key, value) remove(key) valudOf(key) iterator() """ def __init__(self): self._entryList = list() def __len__(self): return len(self._entryList) def __contains__(self, key): ndx = self._findPosition(key) return ndx is not None def add(self, key, value): ndx = self._findPosition(key) if ndx is not None: self._entryList[ndx].value = value return False else: entry = _MapEntry(key, value) self._entryList.append(entry) return True def valueOf(self, key): ndx = self._findPosition(key) assert ndx is not None, "Invalid map key" return self._entryList[ndx].value def remove(self, key): ndx = self._findPosition(key) assert ndx is not None, "Invalid map key" self._entryList.pop(ndx) def __iter__(self): return _MapIterator(self._entryList) def _findPosition(self, key): for i in range(len(self)): if self._entryList[i].key == key: return i return Noneclass _MapEntry: # or use collections.namedtuple("_MapEntry", "key,value") def __init__(self, key, value): self.key = key self.value = value
The multiArray ADT, 多維數組,一般是使用一個一維數組模擬,然後通過計算下標獲取元素
class MultiArray: """ row-major or column-marjor ordering, this is row-major ordering MultiArray(d1, d2, ...dn) dims(): the number of dimensions length(dim): the length of given array dimension clear(value) getitem(i1, i2, ... in), index(i1,i2,i3) = i1*(d2*d3) + i2*d3 + i3 setitem(i1, i2, ... in) 計算下標:index(i1,i2,...in) = i1*f1 + i2*f2 + ... + i(n-1)*f(n-1) + in*1 """ def __init__(self, *dimensions): # Implementation of MultiArray ADT using a 1-D # array,數組的數組的數組。。。 assert len(dimensions) > 1, "The array must have 2 or more dimensions" self._dims = dimensions # Compute to total number of elements in the array size = 1 for d in dimensions: assert d > 0, "Dimensions must be > 0" size *= d # Create the 1-D array to store the elements self._elements = Array(size) # Create a 1-D array to store the equation factors self._factors = Array(len(dimensions)) self._computeFactors() @property def numDims(self): return len(self._dims) def length(self, dim): assert dim > 0 and dim < len(self._dims), "Dimension component out of range" return self._dims[dim-1] def clear(self, value): self._elements.clear(value) def __getitem__(self, ndxTuple): assert len(ndxTuple) == self.numDims, "Invalid # of array subscripts" index = self._computeIndex(ndxTuple) assert index is not None, "Array subscript out of range" return self._elements[index] def __setitem__(self, ndxTuple, value): assert len(ndxTuple) == self.numDims, "Invalid # of array subscripts" index = self._computeIndex(ndxTuple) assert index is not None, "Array subscript out of range" self._elements[index] = value def _computeIndex(self, ndxTuple): # using the equation: i1*f1 + i2*f2 + ... + in*fn offset = 0 for j in range(len(ndxTuple)): if ndxTuple[j] < 0 or ndxTuple[j] >= self._dims[j]: return None else: offset += ndexTuple[j] * self._factors[j] return offset
4章:Algorithm Analysis
一般使用大O標記法來衡量演算法的平均時間複雜度, 1 < log(n) < n < nlog(n) < n^2 < n^3 < a^n。 了解常用數據結構操作的平均時間複雜度有利於使用更高效的數據結構,當然有時候需要在時間和空間上進行衡量,有些操作甚至還會退化,比如list的append操作,如果list空間不夠,會去開闢新的空間,操作複雜度退化到O(n),有時候還需要使用均攤分析(amortized)
5章:Searching and Sorting
排序和查找是最基礎和頻繁的操作,python內置了in操作符和bisect二分操作模塊實現查找,內置了sorted方法來實現排序操作。二分和快排也是面試中經常考到的,本章講的是基本的排序和查找。
def binary_search(sorted_seq, val): """ 實現標準庫中的bisect.bisect_left """ low = 0 high = len(sorted_seq) - 1 while low <= high: mid = (high + low) // 2 if sorted_seq[mid] == val: return mid elif val < sorted_seq[mid]: high = mid - 1 else: low = mid + 1 return lowdef bubble_sort(seq): # O(n^2), n(n-1)/2 = 1/2(n^2 + n) n = len(seq) for i in range(n-1): for j in range(i+n-1): # 每一輪冒泡如果滿足條件交換相鄰的元素 if seq[j] > seq[j+1]: seq[j], seq[j+1] = seq[j+1], seq[j] # swap seq[j], seq[j+1] # 冒泡實際上可以優化,設置一個flag,如果有一輪沒有交換操作就說明已經有序了def select_sort(seq): """可以看作是冒泡的改進,每次找一個最小的元素交換,每一輪只需要交換一次""" n = len(seq) for i in range(n-1): min_idx = i # assume the ith element is the smallest for j in range(i+1, n): if seq[j] < seq[min_idx]: # find the minist element index min_idx = j if min_idx != i: # swap seq[i] = seq[min_idx]def insertion_sort(seq): """ 每次挑選下一個元素插入已經排序的數組中,初始時已排序數組只有???個元素""" n = len(seq) for i in range(1, n): value = seq[i] # save the value to be positioned # find the position where value fits in the ordered part of the list pos = i while pos > 0 and value < seq[pos-1]: # Shift the items to the right during the search seq[pos] = seq[pos-1] pos -= 1 seq[pos] = valuedef merge_sorted_list(listA, listB): """ 歸併兩個有序數組 """ new_list = list() a = b = 0 while a < len(listA) and b < len(listB): if listA[a] < listB[b]: new_list.append(listA[a]) a += 1 else: new_list.append(listB[b]) b += 1 while a < len(listA): new_list.append(listA[a]) a += 1 while b < len(listB): new_list.append(listB[b]) b += 1 return new_list
6章:Linked Structure
list是最常用的數據結構,但是list在中間增減元素的時候效率會很低,這時候linked list會更適合,缺點就是獲取元素的平均時間複雜度變成了O(n)
# 單鏈表實現class ListNode: def __init__(self, data): self.data = data self.next = Nonedef travsersal(head, callback): curNode = head while curNode is not None: callback(curNode.data) curNode = curNode.nextdef unorderdSearch(head, target): curNode = head while curNode is not None and curNode.data != target: curNode = curNode.next return curNode is not None# Given the head pointer, prepend an item to an unsorted linked list.def prepend(head, item): newNode = ListNode(item) newNode.next = head head = newNode# Given the head reference, remove a target from a linked listdef remove(head, target): predNode = None curNode = head while curNode is not None and curNode.data != target: # 尋找目標 predNode = curNode curNode = curNode.data if curNode is not None: if curNode is head: head = curNode.next else: predNode.next = curNode.next
7章:Stacks
棧也是計算機里用得比較多的數據結構,棧是一種後進先出的數據結構,可以理解為往一個桶里放盤子,先放進去的會被壓在地下,拿盤子的時候,後放的會被先拿出來。
class Stack: """ Stack ADT, using a python list Stack() isEmpty() length() pop(): assert not empty peek(): assert not empty, return top of non-empty stack without removing it push(item) """ def __init__(self): self._items = list() def isEmpty(self): return len(self) == 0 def __len__(self): return len(self._items) def peek(self): assert not self.isEmpty() return self._items[-1] def pop(self): assert not self.isEmpty() return self._items.pop() def push(self, item): self._items.append(item)class Stack: """ Stack ADT, use linked list 使用list實現很簡單,但是如果涉及大量push操作,list的空間不夠時複雜度退化到O(n) 而linked list可以保證最壞情況下仍是O(1) """ def __init__(self): self._top = None # top節點, _StackNode or None self._size = 0 # int def isEmpty(self): return self._top is None def __len__(self): return self._size def peek(self): assert not self.isEmpty() return self._top.item def pop(self): assert not self.isEmpty() node = self._top self.top = self._top.next self._size -= 1 return node.item def _push(self, item): self._top = _StackNode(item, self._top) self._size += 1class _StackNode: def __init__(self, item, link): self.item = item self.next = link
8章:Queues
隊列也是經常使用的數據結構,比如發送消息等,celery可以使用redis提供的list實現消息隊列。 本章我們用list和linked list來實現隊列和優先順序隊列。
class Queue: """ Queue ADT, use list。list實現,簡單但是push和pop效率最差是O(n) Queue() isEmpty() length() enqueue(item) dequeue() """ def __init__(self): self._qList = list() def isEmpty(self): return len(self) == 0 def __len__(self): return len(self._qList) def enquue(self, item): self._qList.append(item) def dequeue(self): assert not self.isEmpty() return self._qList.pop(0)from array import Array # Array那一章實現的Array ADTclass Queue: """ circular Array ,通過頭尾指針實現。list內置append和pop複雜度會退化,使用 環數組實現可以使得入隊出隊操作時間複雜度為O(1),缺點是數組長度需要固定。 """ def __init__(self, maxSize): self._count = 0 self._front = 0 self._back = maxSize - 1 self._qArray = Array(maxSize) def isEmpty(self): return self._count == 0 def isFull(self): return self._count == len(self._qArray) def __len__(self): return len(self._count) def enqueue(self, item): assert not self.isFull() maxSize = len(self._qArray) self._back = (self._back + 1) % maxSize # 移動尾指針 self._qArray[self._back] = item self._count += 1 def dequeue(self): assert not self.isFull() item = self._qArray[self._front] maxSize = len(self._qArray) self._front = (self._front + 1) % maxSize self._count -= 1 return itemclass _QueueNode: def __init__(self, item): self.item = itemclass Queue: """ Queue ADT, linked list 實現。為了改進環型數組有最大數量的限制,改用 帶有頭尾節點的linked list實現。 """ def __init__(self): self._qhead = None self._qtail = None self._qsize = 0 def isEmpty(self): return self._qhead is None def __len__(self): return self._count def enqueue(self, item): node = _QueueNode(item) # 創建新的節點並用尾節點指向他 if self.isEmpty(): self._qhead = node else: self._qtail.next = node self._qtail = node self._qcount += 1 def dequeue(self): assert not self.isEmpty(), "Can not dequeue from an empty queue" node = self._qhead if self._qhead is self._qtail: self._qtail = None self._qhead = self._qhead.next # 前移頭節點 self._count -= 1 return node.itemclass UnboundedPriorityQueue: """ PriorityQueue ADT: 給每個item加上優先順序p,高優先順序先dequeue 分為兩種: - bounded PriorityQueue: 限制優先順序在一個區間[0...p) - unbounded PriorityQueue: 不限制優先順序 PriorityQueue() BPriorityQueue(numLevels): create a bounded PriorityQueue with priority in range [0, numLevels-1] isEmpty() length() enqueue(item, priority): 如果是bounded PriorityQueue, priority必須在區間內 dequeue(): 最高優先順序的出隊,同優先順序的按照FIFO順序 - 兩種實現方式: 1.入隊的時候都是到隊尾,出隊操作找到最高優先順序的出隊,出隊操作O(n) 2.始終維持隊列有序,每次入隊都找到該插入的位置,出隊操作是O(1) (注意如果用list實現list.append和pop操作複雜度會因內存分配退化) """ from collections import namedtuple _PriorityQEntry = namedtuple("_PriorityQEntry", "item, priority") # 採用方式1,用內置list實現unbounded PriorityQueue def __init__(self): self._qlist = list() def isEmpty(self): return len(self) == 0 def __len__(self): return len(self._qlist) def enqueue(self, item, priority): entry = UnboundedPriorityQueue._PriorityQEntry(item, priority) self._qlist.append(entry) def deque(self): assert not self.isEmpty(), "can not deque from an empty queue" highest = self._qlist[0].priority for i in range(len(self)): # 出隊操作O(n),遍歷找到最高優先順序 if self._qlist[i].priority < highest: highest = self._qlist[i].priority entry = self._qlist.pop(highest) return entry.itemclass BoundedPriorityQueue: """ BoundedPriorityQueue ADT,用linked list實現。上一個地方提到了 BoundedPriorityQueue 但是為什麼需要 BoundedPriorityQueue呢? BoundedPriorityQueue 的優先順序限制在[0, maxPriority-1] 對於 UnboundedPriorityQueue,出隊操作由於要遍歷尋找優先順序最高的item,所以平均 是O(n)的操作,但是對於 BoundedPriorityQueue,用隊列數組實現可以達到常量時間, 用空間換時間。比如要彈出一個元素,直接找到第一個非空隊列彈出 元素就可以了。 (小數字代表高優先順序,先出隊) qlist [0] -> ["white"] [1] [2] -> ["black", "green"] [3] -> ["purple", "yellow"] """ # Implementation of the bounded Priority Queue ADT using an array of # # queues in which the queues are implemented using a linked list. from array import Array # 第二章定義的ADT def __init__(self, numLevels): self._qSize = 0 self._qLevels = Array(numLevels) for i in range(numLevels): self._qLevels[i] = Queue() # 上一節講到用linked list實現的Queue def isEmpty(self): return len(self) == 0 def __len__(self): return len(self._qSize) def enqueue(self, item, priority): assert priority >= 0 and priority < len(self._qLevels), "invalid priority" self._qLevel[priority].enquue(item) # 直接找到 priority 對應的槽入隊 def deque(self): assert not self.isEmpty(), "can not deque from an empty queue" i = 0 p = len(self._qLevels) while i < p and not self._qLevels[i].isEmpty(): # 找到第一個非空隊列 i += 1 return self._qLevels[i].dequeue()
9章:Advanced Linked Lists
之前曾經介紹過單鏈表,一個鏈表節點只有data和next欄位,本章介紹高級的鏈表。
Doubly Linked List,雙鏈表,每個節點多了個prev指向前一個節點。雙鏈表可以用來編寫文本編輯器的buffer。
class DListNode: def __init__(self, data): self.data = data self.prev = None self.next = Nonedef revTraversa(tail): curNode = tail while cruNode is not None: print(curNode.data) curNode = curNode.prevdef search_sorted_doubly_linked_list(head, tail, probe, target): """ probing technique探查法,改進直接遍歷,不過最壞時間複雜度仍是O(n) searching a sorted doubly linked list using the probing technique Args: head (DListNode obj) tail (DListNode obj) probe (DListNode or None) target (DListNode.data): data to search """ if head is None: # make sure list is not empty return False if probe is None: # if probe is null, initialize it to first node probe = head # if the target comes before the probe node, we traverse backward, otherwise # traverse forward if target < probe.data: while probe is not None and target <= probe.data: if target == probe.dta: return True else: probe = probe.prev else: while probe is not None and target >= probe.data: if target == probe.data: return True else: probe = probe.next return Falsedef insert_node_into_ordered_doubly_linekd_list(value): """ 最好畫個圖看,鏈表操作很容易繞暈,注意賦值順序""" newnode = DListNode(value) if head is None: # empty list head = newnode tail = head elif value < head.data: # insert before head newnode.next = head head.prev = newnode head = newnode elif value > tail.data: # insert after tail newnode.prev = tail tail.next = newnode tail = newnode else: # insert into middle node = head while node is not None and node.data < value: node = node.next newnode.next = node newnode.prev = node.prev node.prev.next = newnode node.prev = newnode
循環鏈表
def travrseCircularList(listRef): curNode = listRef done = listRef is None while not None: curNode = curNode.next print(curNode.data) done = curNode is listRef # 回到遍歷起始點def searchCircularList(listRef, target): curNode = listRef done = listRef is None while not done: curNode = curNode.next if curNode.data == target: return True else: done = curNode is listRef or curNode.data > target return Falsedef add_newnode_into_ordered_circular_linked_list(listRef, value): """ 插入並維持順序 1.插入空鏈表;2.插入頭部;3.插入尾部;4.按順序插入中間 """ newnode = ListNode(value) if listRef is None: # empty list listRef = newnode newnode.next = newnode elif value < listRef.next.data: # insert in front newnode.next = listRef.next listRef.next = newnode elif value > listRef.data: # insert in back newnode.next = listRef.next listRef.next = newnode listRef = newnode else: # insert in the middle preNode = None curNode = listRef done = listRef is None while not done: preNode = curNode preNode = curNode.next done = curNode is listRef or curNode.data > value newnode.next = curNode preNode.next = newnode
10章:Recursion
Recursion is a process for solving problems by subdividing a larger problem into smaller cases of the problem itself and then solving the smaller, more trivial parts.
遞歸函數:調用自己的函數
# 遞歸函數:調用自己的函數,看一個最簡單的遞歸函數,倒序列印一個數def printRev(n): if n > 0: print(n) printRev(n-1)printRev(3) # 從10輸出到1# 稍微改一下,print放在最後就得到了正序列印的函數def printInOrder(n): if n > 0: printInOrder(n-1) print(n) # 之所以最小的先列印是因為函數一直遞歸到n==1時候的最深棧,此時不再 # 遞歸,開始執行print語句,這時候n==1,之後每跳出一層棧,列印更大的值printInOrder(3) # 正序輸出
Properties of Recursion: 使用stack解決的問題都能用遞歸解決
- A recursive solution must contain a base case; 遞歸出口,代表最小子問題(n == 0退出列印)
- A recursive solution must contain a recursive case; 可以分解的子問題
- A recursive solution must make progress toward the base case. 遞減n使得n像遞歸出口靠近
Tail Recursion: occurs when a function includes a single recursive call as the last statement of the function. In this case, a stack is not needed to store values to te used upon the return of the recursive call and thus a solution can be implemented using a iterative loop instead.
# Recursive Binary Searchdef recBinarySearch(target, theSeq, first, last): # 你可以寫寫單元測試來驗證這個函數的正確性 if first > last: # 遞歸出口1 return False else: mid = (first + last) // 2 if theSeq[mid] == target: return True # 遞歸出口2 elif theSeq[mid] > target: return recBinarySearch(target, theSeq, first, mid - 1) else: return recBinarySearch(target, theSeq, mid + 1, last)
11章:Hash Tables
基於比較的搜索(線性搜索,有序數組的二分搜索)最好的時間複雜度只能達到O(logn),利用hash可以實現O(1)查找,python內置dict的實現方式就是hash,你會發現dict的key必須要是實現了__hash__和__eq__方法的。
Hashing: hashing is the process of mapping a search a key to a limited range of array indeices with the goal of providing direct access to the keys.
hash方法有個hash函數用來給key計算一個hash值,作為數組下標,放到該下標對應的槽中。當不同key根據hash函數計算得到的下標相同時,就出現了衝突。解決衝突有很多方式,比如讓每個槽成為鏈表,每次衝突以後放到該槽鏈表的尾部,但是查詢時間就會退化,不再是O(1)。還有一種探查方式,當key的槽衝突時候,就會根據一種計算方式去尋找下一個空的槽存放,探查方式有線性探查,二次方探查法等,cpython解釋器使用的是二次方探查法。還有一個問題就是當python使用的槽數量大於預分配的2/3時候,會重新分配內存並拷貝以前的數據,所以有時候dict的add操作代價還是比較高的,犧牲空間但是可以始終保證O(1)的查詢效率。如果有大量的數據,建議還是使用bloomfilter或者redis提供的HyperLogLog。
如果你感興趣,可以看看這篇文章,介紹c解釋器如何實現的python dict對象:Python dictionary implementation。我們使用Python來實現一個類似的hash結構。
import ctypesclass Array: # 第二章曾經定義過的ADT,這裡當做HashMap的槽數組使用 def __init__(self, size): assert size > 0, "array size must be > 0" self._size = size PyArrayType = ctypes.py_object * size self._elements = PyArrayType() self.clear(None) def __len__(self): return self._size def __getitem__(self, index): assert index >= 0 and index < len(self), "out of range" return self._elements[index] def __setitem__(self, index, value): assert index >= 0 and index < len(self), "out of range" self._elements[index] = value def clear(self, value): """ 設置每個元素為value """ for i in range(len(self)): self._elements[i] = value def __iter__(self): return _ArrayIterator(self._elements)class _ArrayIterator: def __init__(self, items): self._items = items self._idx = 0 def __iter__(self): return self def __next__(self): if self._idx < len(self._items): val = self._items[self._idx] self._idx += 1 return val else: raise StopIterationclass HashMap: """ HashMap ADT實現,類似於python內置的dict 一個槽有三種狀態: 1.從未使用 HashMap.UNUSED。此槽沒有被使用和衝突過,查找時只要找到UNUSEd就不用再繼續探查了 2.使用過但是remove了,此時是 HashMap.EMPTY,該探查點後邊的元素扔可能是有key 3.槽正在使用 _MapEntry節點 """ class _MapEntry: # 槽里存儲的數據 def __init__(self, key, value): self.key = key self.value = value UNUSED = None # 沒被使用過的槽,作為該類變數的一個單例,下邊都是is 判斷 EMPTY = _MapEntry(None, None) # 使用過但是被刪除的槽 def __init__(self): self._table = Array(7) # 初始化7個槽 self._count = 0 # 超過2/3空間被使用就重新分配,load factor = 2/3 self._maxCount = len(self._table) - len(self._table) // 3 def __len__(self): return self._count def __contains__(self, key): slot = self._findSlot(key, False) return slot is not None def add(self, key, value): if key in self: # 覆蓋原有value slot = self._findSlot(key, False) self._table[slot].value = value return False else: slot = self._findSlot(key, True) self._table[slot] = HashMap._MapEntry(key, value) self._count += 1 if self._count == self._maxCount: # 超過2/3使用就rehash self._rehash() return True def valueOf(self, key): slot = self._findSlot(key, False) assert slot is not None, "Invalid map key" return self._table[slot].value def remove(self, key): """ remove操作把槽置為EMPTY""" assert key in self, "Key error %s" % key slot = self._findSlot(key, forInsert=False) value = self._table[slot].value self._count -= 1 self._table[slot] = HashMap.EMPTY return value def __iter__(self): return _HashMapIteraotr(self._table) def _slot_can_insert(self, slot): return (self._table[slot] is HashMap.EMPTY or self._table[slot] is HashMap.UNUSED) def _findSlot(self, key, forInsert=False): """ 注意原書有錯誤,代碼根本不能運行,這裡我自己改寫的 Args: forInsert (bool): if the search is for an insertion Returns: slot or None """ slot = self._hash1(key) step = self._hash2(key) _len = len(self._table) if not forInsert: # 查找是否存在key while self._table[slot] is not HashMap.UNUSED: # 如果一個槽是UNUSED,直接跳出 if self._table[slot] is HashMap.EMPTY: slot = (slot + step) % _len continue elif self._table[slot].key == key: return slot slot = (slot + step) % _len return None else: # 為了插入key while not self._slot_can_insert(slot): # 循環直到找到一個可以插入的槽 slot = (slot + step) % _len return slot def _rehash(self): # 當前使用槽數量大於2/3時候重新創建新的table origTable = self._table newSize = len(self._table) * 2 + 1 # 原來的2*n+1倍 self._table = Array(newSize) self._count = 0 self._maxCount = newSize - newSize // 3 # 將原來的key value添加到新的table for entry in origTable: if entry is not HashMap.UNUSED and entry is not HashMap.EMPTY: slot = self._findSlot(entry.key, True) self._table[slot] = entry self._count += 1 def _hash1(self, key): """ 計算key的hash值""" return abs(hash(key)) % len(self._table) def _hash2(self, key): """ key衝突時候用來計算新槽的位置""" return 1 + abs(hash(key)) % (len(self._table)-2)class _HashMapIteraotr: def __init__(self, array): self._array = array self._idx = 0 def __iter__(self): return self def __next__(self): if self._idx < len(self._array): if self._array[self._idx] is not None and self._array[self._idx].key is not None: key = self._array[self._idx].key self._idx += 1 return key else: self._idx += 1 else: raise StopIterationdef print_h(h): for idx, i in enumerate(h): print(idx, i) print("
")def test_HashMap(): """ 一些簡單的單元測試,不過測試用例覆蓋不是很全面 """ h = HashMap() assert len(h) == 0 h.add("a", "a") assert h.valueOf("a") == "a" assert len(h) == 1 a_v = h.remove("a") assert a_v == "a" assert len(h) == 0 h.add("a", "a") h.add("b", "b") assert len(h) == 2 assert h.valueOf("b") == "b" b_v = h.remove("b") assert b_v == "b" assert len(h) == 1 h.remove("a") assert len(h) == 0 n = 10 for i in range(n): h.add(str(i), i) assert len(h) == n print_h(h) for i in range(n): assert str(i) in h for i in range(n): h.remove(str(i)) assert len(h) == 0
12章:Advanced Sorting
第5章介紹了基本的排序演算法,本章介紹高級排序演算法。
歸併排序(mergesort): 分治法
def merge_sorted_list(listA, listB): """ 歸併兩個有序數組,O(max(m, n)) ,m和n是數組長度""" print("merge left right list", listA, listB, end="") new_list = list() a = b = 0 while a < len(listA) and b < len(listB): if listA[a] < listB[b]: new_list.append(listA[a]) a += 1 else: new_list.append(listB[b]) b += 1 while a < len(listA): new_list.append(listA[a]) a += 1 while b < len(listB): new_list.append(listB[b]) b += 1 print(" ->", new_list) return new_listdef mergesort(theList): """ O(nlogn),log層調用,每層n次操作 mergesort: divided and conquer 分治 1. 把原數組分解成越來越小的子數組 2. 合併子數組來創建一個有序數組 """ print(theList) # 我把關鍵步驟打出來了,你可以運行下看看整個過程 if len(theList) <= 1: # 遞歸出口 return theList else: mid = len(theList) // 2 # 遞歸分解左右兩邊數組 left_half = mergesort(theList[:mid]) right_half = mergesort(theList[mid:]) # 合併兩邊的有序子數組 newList = merge_sorted_list(left_half, right_half) return newList""" 這是我調用一次打出來的排序過程[10, 9, 8, 7, 6, 5, 4, 3, 2, 1][10, 9, 8, 7, 6][10, 9][10][9]merge left right list [10] [9] -> [9, 10][8, 7, 6][8][7, 6][7][6]merge left right list [7] [6] -> [6, 7]merge left right list [8] [6, 7] -> [6, 7, 8]merge left right list [9, 10] [6, 7, 8] -> [6, 7, 8, 9, 10][5, 4, 3, 2, 1][5, 4][5][4]merge left right list [5] [4] -> [4, 5][3, 2, 1][3][2, 1][2][1]merge left right list [2] [1] -> [1, 2]merge left right list [3] [1, 2] -> [1, 2, 3]merge left right list [4, 5] [1, 2, 3] -> [1, 2, 3, 4, 5]"""
快速排序
def quicksort(theSeq, first, last): """ quicksort :也是分而治之,但是和歸併排序不同的是,採用選定主元(pivot)而不是從中間 進行數組劃分 1. 第一步選定pivot用來劃分數組,pivot左邊元素都比它小,右邊元素都大於等於它 2. 對劃分的左右兩邊數組遞歸,直到遞歸出口(數組元素數目小於2) 3. 對pivot和左右劃分的數組合併成一個有序數組 """ if first < last: pos = partitionSeq(theSeq, first, last) # 對劃分的子數組遞歸操作 quicksort(theSeq, first, pos - 1) quicksort(theSeq, pos + 1, last)def partitionSeq(theSeq, first, last): """ 快排中的劃分操作,把比pivot小的挪到左邊,比pivot大的挪到右邊""" pivot = theSeq[first] print("before partitionSeq", theSeq) left = first + 1 right = last while True: # 找到第一個比pivot大的 while left <= right and theSeq[left] < pivot: left += 1 # 從右邊開始找到比pivot小的 while right >= left and theSeq[right] >= pivot: right -= 1 if right < left: break else: theSeq[left], theSeq[right] = theSeq[right], theSeq[left] # 把pivot放到合適的位置 theSeq[first], theSeq[right] = theSeq[right], theSeq[first] print("after partitionSeq {}: {} ".format(theSeq, pivot)) return right # 返回pivot的位置def test_partitionSeq(): l = [0,1,2,3,4] assert partitionSeq(l, 0, len(l)-1) == 0 l = [4,3,2,1,0] assert partitionSeq(l, 0, len(l)-1) == 4 l = [2,3,0,1,4] assert partitionSeq(l, 0, len(l)-1) == 2test_partitionSeq()def test_quicksort(): def _is_sorted(seq): for i in range(len(seq)-1): if seq[i] > seq[i+1]: return False return True from random import randint for i in range(100): _len = randint(1, 100) to_sort = [] for i in range(_len): to_sort.append(randint(0, 100)) quicksort(to_sort, 0, len(to_sort)-1) # 注意這裡用了原地排序,直接更改了數組 print(to_sort) assert _is_sorted(to_sort)test_quicksort()
利用快排中的partitionSeq操作,我們還能實現另一個演算法,nth_element,快速查找一個無序數組中的第k大元素
def nth_element(seq, beg, end, k): if beg == end: return seq[beg] pivot_index = partitionSeq(seq, beg, end) if pivot_index == k: return seq[k] elif pivot_index > k: return nth_element(seq, beg, pivot_index-1, k) else: return nth_element(seq, pivot_index+1, end, k)def test_nth_element(): from random import shuffle n = 10 l = list(range(n)) shuffle(l) print(l) for i in range(len(l)): assert nth_element(l, 0, len(l)-1, i) == itest_nth_element()
13章:Binary Tree
The binary Tree: 二叉樹,每個節點做多只有兩個子節點
class _BinTreeNode: def __init__(self, data): self.data = data self.left = None self.right = None# 三種depth-first遍歷def preorderTrav(subtree): """ 先(根)序遍歷""" if subtree is not None: print(subtree.data) preorderTrav(subtree.left) preorderTrav(subtree.right)def inorderTrav(subtree): """ 中(根)序遍歷""" if subtree is not None: preorderTrav(subtree.left) print(subtree.data) preorderTrav(subtree.right)def postorderTrav(subtree): """ 後(根)序遍歷""" if subtree is not None: preorderTrav(subtree.left) preorderTrav(subtree.right) print(subtree.data)# 寬度優先遍歷(bradth-First Traversal): 一層一層遍歷, 使用queuedef breadthFirstTrav(bintree): from queue import Queue # py3 q = Queue() q.put(bintree) while not q.empty(): node = q.get() print(node.data) if node.left is not None: q.put(node.left) if node.right is not None: q.put(node.right)class _ExpTreeNode: __slots__ = ("element", "left", "right") def __init__(self, data): self.element = data self.left = None self.right = None def __repr__(self): return "<_ExpTreeNode: {} {} {}>".format( self.element, self.left, self.right)from queue import Queueclass ExpressionTree: """ 表達式樹: 操作符存儲在內節點操作數存儲在葉子節點的二叉樹。(符號樹真難打出來) * / + - / / 9 3 8 4 (9+3) * (8-4) Expression Tree Abstract Data Type,可以實現二元操作符 ExpressionTree(expStr): user string as constructor param evaluate(varDict): evaluates the expression and returns the numeric result toString(): constructs and retutns a string represention of the expression Usage: vars = {"a": 5, "b": 12} expTree = ExpressionTree("(a/(b-3))") print("The result = ", expTree.evaluate(vars)) """ def __init__(self, expStr): self._expTree = None self._buildTree(expStr) def evaluate(self, varDict): return self._evalTree(self._expTree, varDict) def __str__(self): return self._buildString(self._expTree) def _buildString(self, treeNode): """ 在一個子樹被遍歷之前添加做括弧,在子樹被遍歷之後添加右括弧 """ # print(treeNode) if treeNode.left is None and treeNode.right is None: return str(treeNode.element) # 葉子節點是操作數直接返回 else: expStr = "(" expStr += self._buildString(treeNode.left) expStr += str(treeNode.element) expStr += self._buildString(treeNode.right) expStr += ")" return expStr def _evalTree(self, subtree, varDict): # 是不是葉子節點, 是的話說明是操作數,直接返回 if subtree.left is None and subtree.right is None: # 操作數是合法數字嗎 if subtree.element >= "0" and subtree.element <= "9": return int(subtree.element) else: # 操作數是個變數 assert subtree.element in varDict, "invalid variable." return varDict[subtree.element] else: # 操作符則計算其子表達式 lvalue = self._evalTree(subtree.left, varDict) rvalue = self._evalTree(subtree.right, varDict) print(subtree.element) return self._computeOp(lvalue, subtree.element, rvalue) def _computeOp(self, left, op, right): assert op op_func = { "+": lambda left, right: left + right, # or import operator, operator.add "-": lambda left, right: left - right, "*": lambda left, right: left * right, "/": lambda left, right: left / right, "%": lambda left, right: left % right, } return op_func[op](left, right) def _buildTree(self, expStr): expQ = Queue() for token in expStr: # 遍歷表達式字元串的每個字元 expQ.put(token) self._expTree = _ExpTreeNode(None) # 創建root節點 self._recBuildTree(self._expTree, expQ) def _recBuildTree(self, curNode, expQ): token = expQ.get() if token == "(": curNode.left = _ExpTreeNode(None) self._recBuildTree(curNode.left, expQ) # next token will be an operator: + = * / % curNode.element = expQ.get() curNode.right = _ExpTreeNode(None) self._recBuildTree(curNode.right, expQ) # the next token will be ")", remmove it expQ.get() else: # the token is a digit that has to be converted to an int. curNode.element = tokenvars = {"a": 5, "b": 12}expTree = ExpressionTree("((2*7)+8)")print(expTree)print("The result = ", expTree.evaluate(vars))
Heap(堆):二叉樹最直接的一個應用就是實現堆。堆就是一顆完全二叉樹,最大堆的非葉子節點的值都比孩子大,最小堆的非葉子結點的值都比孩子小。 python內置了heapq模塊幫助我們實現堆操作,比如用內置的heapq模塊實現個堆排序:
# 使用python內置的heapq實現heap sortdef heapsort(iterable): from heapq import heappush, heappop h = [] for value in iterable: heappush(h, value) return [heappop(h) for i in range(len(h))]
但是一般實現堆的時候實際上並不是用數節點來實現的,而是使用數組實現,效率比較高。為什麼可以用數組實現呢?因為完全二叉樹的性質, 可以用下標之間的關係表示節點之間的關係,MaxHeap的docstring中已經說明了
class MaxHeap: """ Heaps: 完全二叉樹,最大堆的非葉子節點的值都比孩子大,最小堆的非葉子結點的值都比孩子小 Heap包含兩個屬性,order property 和 shape property(a complete binary tree),在插入 一個新節點的時候,始終要保持這兩個屬性 插入操作:保持堆屬性和完全二叉樹屬性, sift-up 操作維持堆屬性 extract操作:只獲取根節點數據,並把樹最底層最右節點copy到根節點後,sift-down操作維持堆屬性 用數組實現heap,從根節點開始,從上往下從左到右給每個節點編號,則根據完全二叉樹的 性質,給定一個節點i, 其父親和孩子節點的編號分別是: parent = (i-1) // 2 left = 2 * i + 1 rgiht = 2 * i + 2 使用數組實現堆一方面效率更高,節省樹節點的內存佔用,一方面還可以避免複雜的指針操作,減少 調試難度。 """ def __init__(self, maxSize): self._elements = Array(maxSize) # 第二章實現的Array ADT self._count = 0 def __len__(self): return self._count def capacity(self): return len(self._elements) def add(self, value): assert self._count < self.capacity(), "can not add to full heap" self._elements[self._count] = value self._count += 1 self._siftUp(self._count - 1) self.assert_keep_heap() # 確定每一步add操作都保持堆屬性 def extract(self): assert self._count > 0, "can not extract from an empty heap" value = self._elements[0] # save root value self._count -= 1 self._elements[0] = self._elements[self._count] # 最右下的節點放到root後siftDown self._siftDown(0) self.assert_keep_heap() return value def _siftUp(self, ndx): if ndx > 0: parent = (ndx - 1) // 2 # print(ndx, parent) if self._elements[ndx] > self._elements[parent]: # swap self._elements[ndx], self._elements[parent] = self._elements[parent], self._elements[ndx] self._siftUp(parent) # 遞歸 def _siftDown(self, ndx): left = 2 * ndx + 1 right = 2 * ndx + 2 # determine which node contains the larger value largest = ndx if (left < self._count and self._elements[left] >= self._elements[largest] and self._elements[left] >= self._elements[right]): # 原書這個地方沒寫實際上找的未必是largest largest = left elif right < self._count and self._elements[right] >= self._elements[largest]: largest = right if largest != ndx: self._elements[ndx], self._elements[largest] = self._elements[largest], self._elements[ndx] self._siftDown(largest) def __repr__(self): return " ".join(map(str, self._elements)) def assert_keep_heap(self): """ 我加了這個函數是用來驗證每次add或者extract之後,仍保持最大堆的性質""" _len = len(self) for i in range(0, int((_len-1)/2)): # 內部節點(非葉子結點) l = 2 * i + 1 r = 2 * i + 2 if l < _len and r < _len: assert self._elements[i] >= self._elements[l] and self._elements[i] >= self._elements[r]def test_MaxHeap(): """ 最大堆實現的單元測試用例 """ _len = 10 h = MaxHeap(_len) for i in range(_len): h.add(i) h.assert_keep_heap() for i in range(_len): # 確定每次出來的都是最大的數字,添加的時候是從小到大添加的 assert h.extract() == _len-i-1test_MaxHeap()def simpleHeapSort(theSeq): """ 用自己實現的MaxHeap實現堆排序,直接修改原數組實現inplace排序""" if not theSeq: return theSeq _len = len(theSeq) heap = MaxHeap(_len) for i in theSeq: heap.add(i) for i in reversed(range(_len)): theSeq[i] = heap.extract() return theSeqdef test_simpleHeapSort(): """ 用一些測試用例證明實現的堆排序是可以工作的 """ def _is_sorted(seq): for i in range(len(seq)-1): if seq[i] > seq[i+1]: return False return True from random import randint assert simpleHeapSort([]) == [] for i in range(1000): _len = randint(1, 100) to_sort = [] for i in range(_len): to_sort.append(randint(0, 100)) simpleHeapSort(to_sort) # 注意這裡用了原地排序,直接更改了數組 assert _is_sorted(to_sort)test_simpleHeapSort()
14章:Search Trees
二叉差找樹性質:對每個內部節點V,
- 所有key小於V.key的存儲在V的左子樹。
- 所有key大於V.key的存儲在V的右子樹 對BST進行中序遍歷會得到升序的key序列
class _BSTMapNode: __slots__ = ("key", "value", "left", "right") def __init__(self, key, value): self.key = key self.value = value self.left = None self.right = None def __repr__(self): return "<{}:{}> left:{}, right:{}".format( self.key, self.value, self.left, self.right) __str__ = __repr__class BSTMap: """ BST,樹節點包含key可payload。用BST來實現之前用hash實現過的Map ADT. 性質:對每個內部節點V, 1.對於節點V,所有key小於V.key的存儲在V的左子樹。 2.所有key大於V.key的存儲在V的右子樹 對BST進行中序遍歷會得到升序的key序列 """ def __init__(self): self._root = None self._size = 0 self._rval = None # 作為remove的返回值 def __len__(self): return self._size def __iter__(self): return _BSTMapIterator(self._root, self._size) def __contains__(self, key): return self._bstSearch(self._root, key) is not None def valueOf(self, key): node = self._bstSearch(self._root, key) assert node is not None, "Invalid map key." return node.value def _bstSearch(self, subtree, target): if subtree is None: # 遞歸出口,遍歷到樹底沒有找到key或是空樹 return None elif target < subtree.key: return self._bstSearch(subtree.left, target) elif target > subtree.key: return self._bstSearch(subtree.right, target) return subtree # 返回引用 def _bstMinumum(self, subtree): """ 順著樹一直往左下角遞歸找就是最小的,向右下角遞歸就是最大的 """ if subtree is None: return None elif subtree.left is None: return subtree else: return subtree._bstMinumum(self, subtree.left) def add(self, key, value): """ 添加或者替代一個key的value, O(N) """ node = self._bstSearch(self._root, key) if node is not None: # if key already exists, update value node.value = value return False else: # insert a new entry self._root = self._bstInsert(self._root, key, value) self._size += 1 return True def _bstInsert(self, subtree, key, value): """ 新的節點總是插入在樹的葉子結點上 """ if subtree is None: subtree = _BSTMapNode(key, value) elif key < subtree.key: subtree.left = self._bstInsert(subtree.left, key, value) elif key > subtree.key: subtree.right = self._bstInsert(subtree.right, key, value) # 注意這裡沒有else語句了,應為在被調用處add函數里先判斷了是否有重複key return subtree def remove(self, key): """ O(N) 被刪除的節點分為三種: 1.葉子結點:直接把其父親指向該節點的指針置None 2.該節點有一個孩子: 刪除該節點後,父親指向一個合適的該節點的孩子 3.該節點有倆孩子: (1)找到要刪除節點N和其後繼S(中序遍歷後該節點下一個) (2)複製S的key到N (3)從N的右子樹中刪除後繼S(即在N的右子樹中最小的) """ assert key in self, "invalid map key" self._root = self._bstRemove(self._root, key) self._size -= 1 return self._rval def _bstRemove(self, subtree, target): # search for the item in the tree if subtree is None: return subtree elif target < subtree.key: subtree.left = self._bstRemove(subtree.left, target) return subtree elif target > subtree.key: subtree.right = self._bstRemove(subtree.right, target) return subtree else: # found the node containing the item self._rval = subtree.value if subtree.left is None and subtree.right is None: # 葉子node return None elif subtree.left is None or subtree.right is None: # 有一個孩子節點 if subtree.left is not None: return subtree.left else: return subtree.right else: # 有倆孩子節點 successor = self._bstMinumum(subtree.right) subtree.key = successor.key subtree.value = successor.value subtree.right = self._bstRemove(subtree.right, successor.key) return subtree def __repr__(self): return "->".join([str(i) for i in self]) def assert_keep_bst_property(self, subtree): """ 寫這個函數為了驗證add和delete操作始終維持了bst的性質 """ if subtree is None: return if subtree.left is not None and subtree.right is not None: assert subtree.left.value <= subtree.value assert subtree.right.value >= subtree.value self.assert_keep_bst_property(subtree.left) self.assert_keep_bst_property(subtree.right) elif subtree.left is None and subtree.right is not None: assert subtree.right.value >= subtree.value self.assert_keep_bst_property(subtree.right) elif subtree.left is not None and subtree.right is None: assert subtree.left.value <= subtree.value self.assert_keep_bst_property(subtree.left)class _BSTMapIterator: def __init__(self, root, size): self._theKeys = Array(size) self._curItem = 0 self._bstTraversal(root) self._curItem = 0 def __iter__(self): return self def __next__(self): if self._curItem < len(self._theKeys): key = self._theKeys[self._curItem] self._curItem += 1 return key else: raise StopIteration def _bstTraversal(self, subtree): if subtree is not None: self._bstTraversal(subtree.left) self._theKeys[self._curItem] = subtree.key self._curItem += 1 self._bstTraversal(subtree.right)def test_BSTMap(): l = [60, 25, 100, 35, 17, 80] bst = BSTMap() for i in l: bst.add(i)def test_HashMap(): """ 之前用來測試用hash實現的map,改為用BST實現的Map測試 """ # h = HashMap() h = BSTMap() assert len(h) == 0 h.add("a", "a") assert h.valueOf("a") == "a" assert len(h) == 1 a_v = h.remove("a") assert a_v == "a" assert len(h) == 0 h.add("a", "a") h.add("b", "b") assert len(h) == 2 assert h.valueOf("b") == "b" b_v = h.remove("b") assert b_v == "b" assert len(h) == 1 h.remove("a") assert len(h) == 0 _len = 10 for i in range(_len): h.add(str(i), i) assert len(h) == _len for i in range(_len): assert str(i) in h for i in range(_len): print(len(h)) print("bef", h) _ = h.remove(str(i)) assert _ == i print("aft", h) print(len(h)) assert len(h) == 0test_HashMap()
推薦閱讀:
※Python練習第一題,在圖片上加入數字
※機器學習之Python基礎(五) --協程,分散式
※幾個提高工作效率的Python內置小工具