A First Step into Deep Learning! Deep Learning from Scratch: Computational Graphs

02-09

Welcome to the official website of 景略集智 (Jizhi): http://jizhi.im

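This is the first chapter of a deep learning tutorial that starts from scratch. It introduces the mathematical and algorithmic foundations of deep neural networks. We will then follow the TensorFlow API and implement a neural network library ourselves in Python.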

(Every tutorial in this series comes with hands-on code, which you can run and modify in the original Jizhi post.)

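This chapter requires no prior knowledge of machine learning or neural networks. You should, however, have some background in undergraduate-level calculus, linear algebra, basic algorithms, and probability. If you get stuck along the way, leave a comment on the original post or reach us through the contact details at the end.

By the end of this chapter, you will have a deep understanding of the math behind neural networks and of the role deep learning libraries play under the hood. (I have kept the code as simple and clear as possible, favoring ease of understanding over runtime efficiency. Because our API follows TensorFlow's, once you finish this chapter you will know how to use the TensorFlow API and how TensorFlow works conceptually, without first having to work through an all-purpose, maximally efficient API.)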

Original article: Deep Learning From Scratch I: Computational Graphs

Translation: @孫一萌

Tutorial editor: @Kaiser

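Computational graphs

We start with the theory of computational graphs, because neural networks are themselves a special form of computational graph.

A computational graph is a directed graph whose nodes correspond to operations (Operation) or variables (Variable).

Variables can feed their value into operations, and operations can feed their output into other operations. This way, every node in the computational graph defines a function of the variables in the graph (in the usual sense of "function": each input maps to exactly one output).

The values that are fed into nodes and come out of nodes are called tensors, which is simply a word for multi-dimensional arrays. The term therefore covers scalars, vectors, and matrices as well as tensors of higher rank.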
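To make the notion of rank concrete, here is a minimal sketch using NumPy arrays, which the tutorial later assumes for all tensors. The values and variable names are illustrative only and do not come from the original text.

```python
import numpy as np

# Illustrative values only; none of these appear in the tutorial.
scalar = np.array(3.0)                # rank 0: a single number
vector = np.array([1.0, 2.0])         # rank 1
matrix = np.array([[1.0, 0.0],
                   [0.0, -1.0]])      # rank 2
cube = np.zeros((2, 3, 4))            # rank 3: a higher-rank tensor

# ndim is the number of axes, i.e. the rank of the tensor
ranks = [t.ndim for t in (scalar, vector, matrix, cube)]
print(ranks)  # → [0, 1, 2, 3]
```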

The computational graph in the following example adds two inputs x and y to compute their sum z.

Here, x and y are input nodes of z, and z is a consumer of x and y. z therefore defines a function:

$z : \mathbb{R}^2 \to \mathbb{R}$ where $z(x, y) = x + y$

The concept of a computational graph becomes more important as computations get more complex. For example, the following computational graph defines an affine transformation:

$z(A,x,b)=Ax+b$
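Before building any graph machinery, it may help to see the affine transformation as a plain function. The sketch below assumes NumPy; the helper name `affine` and the numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

def affine(A, x, b):
    """Return Ax + b for a matrix A and vectors x and b (hypothetical helper)."""
    return A @ x + b

# Illustrative values, not taken from the tutorial
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
x = np.array([1.0, 1.0])
b = np.array([1.0, -1.0])

z = affine(A, x, b)
print(z)  # → [3. 2.]
```

The computational graph expresses the same calculation as two nodes, a matrix multiplication followed by an addition, so that each step can be inspected and reused separately.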

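Operations

Every operation is characterized by three things:

A compute function: computes the operation's output for given input values
Input nodes: there can be several; each is either a variable or another operation
Consumers: there can be several; they use the operation's output as their input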

class Operation:
    """Represents a graph node that performs a computation.

    An `Operation` is a node in a `Graph` that takes zero or more objects as
    input, and produces zero or more objects as output.
    """

    def __init__(self, input_nodes=None):
        """Construct Operation"""
        # Default to None rather than [] so instances never share one mutable list
        self.input_nodes = input_nodes if input_nodes is not None else []

        # Initialize list of consumers (i.e. nodes that receive this operation's output as input)
        self.consumers = []

        # Append this operation to the list of consumers of all input nodes
        for input_node in self.input_nodes:
            input_node.consumers.append(self)

        # Append this operation to the list of operations in the currently active default graph
        _default_graph.operations.append(self)

    def compute(self):
        """Computes the output of this operation.

        Must be implemented by the particular operation.
        """
        pass

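Some simple operations

To get familiar with the Operation class (which we will need later), let's implement some simple operations. In both of them we assume that all tensors are NumPy arrays, so that element-wise addition and matrix multiplication (the `.dot` method) do not have to be implemented by ourselves.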
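The two NumPy behaviors relied on here can be checked in isolation. The following sketch uses made-up 2×2 arrays and also contrasts `.dot` with `*`, which in NumPy is an element-wise product rather than a matrix product.

```python
import numpy as np

# Illustrative arrays, not taken from the tutorial
a = np.array([[1, 2],
              [3, 4]])
b = np.array([[0, 1],
              [1, 0]])

elementwise_sum = a + b       # adds matching entries
matrix_product = a.dot(b)     # rows of a times columns of b
elementwise_product = a * b   # element-wise, NOT a matrix product

print(elementwise_sum)
print(matrix_product)
print(elementwise_product)
```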

Addition

class add(Operation):
    """Returns x + y element-wise."""

    def __init__(self, x, y):
        """Construct add

        Args:
          x: First summand node
          y: Second summand node
        """
        super().__init__([x, y])

    def compute(self, x_value, y_value):
        """Compute the output of the add operation

        Args:
          x_value: First summand value
          y_value: Second summand value
        """
        return x_value + y_value

Matrix multiplication

class matmul(Operation):
    """Multiplies matrix a by matrix b, producing a * b."""

    def __init__(self, a, b):
        """Construct matmul

        Args:
          a: First matrix
          b: Second matrix
        """
        super().__init__([a, b])

    def compute(self, a_value, b_value):
        """Compute the output of the matmul operation

        Args:
          a_value: First matrix value
          b_value: Second matrix value
        """
        return a_value.dot(b_value)

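Placeholders

Not all nodes in a computational graph are operations. In the graph of the affine transformation, for example, A, x, and b are not operations. Rather, they are inputs to the graph, and we must supply a value for each of them if we want to compute the graph's output. To supply such values, we introduce placeholders.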

class placeholder:
    """Represents a placeholder node that has to be provided with a value
    when computing the output of a computational graph
    """

    def __init__(self):
        """Construct placeholder"""
        self.consumers = []

        # Append this placeholder to the list of placeholders in the currently active default graph
        _default_graph.placeholders.append(self)

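Variables

In the graph of the affine transformation, x differs fundamentally from A and b. While x is an input to the operation, A and b are parameters of the operation, i.e. they are intrinsic to the graph. We refer to parameters such as A and b as variables.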

class Variable:
    """Represents a variable (i.e. an intrinsic, changeable parameter of a computational graph)."""

    def __init__(self, initial_value=None):
        """Construct Variable

        Args:
          initial_value: The initial value of this variable
        """
        self.value = initial_value
        self.consumers = []

        # Append this variable to the list of variables in the currently active default graph
        _default_graph.variables.append(self)

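The Graph class

Finally, we need a class that bundles all operations, placeholders, and variables together. After creating a new graph, we can call its as_default method to make it the _default_graph.

This way, we can create operations, placeholders, and variables without having to pass in a reference to the graph every time.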

class Graph:
    """Represents a computational graph"""

    def __init__(self):
        """Construct Graph"""
        self.operations = []
        self.placeholders = []
        self.variables = []

    def as_default(self):
        global _default_graph
        _default_graph = self

Example

Let's now use the classes defined above to create a computational graph for an affine transformation:

# Create a new graph
Graph().as_default()

# Create variables
A = Variable([[1, 0], [0, -1]])
b = Variable([1, 1])

# Create placeholder
x = placeholder()

# Create hidden node y
y = matmul(A, x)

# Create output node z
z = add(y, b)

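Computing the output of an operation

Now that we know how to create computational graphs, we can turn to computing the output of an operation.

We create a Session class that encapsulates one execution of an operation. We would like to be able to call a run method on a session instance, passing in the operation to compute and a dictionary containing the values required by all placeholders.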

session = Session()
output = session.run(z, {x: [1, 2]})

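The computation proceeds as follows:

To compute the function represented by an operation, we have to perform the computations in the right order. For example, we cannot compute z before the intermediate result y has been computed. We must therefore make sure that the operations are executed in the right order, so that the values of all input nodes are available before an operation is computed. This can be achieved with a post-order traversal.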

import numpy as np


class Session:
    """Represents a particular execution of a computational graph."""

    def run(self, operation, feed_dict=None):
        """Computes the output of an operation

        Args:
          operation: The operation whose output we'd like to compute.
          feed_dict: A dictionary that maps placeholders to values for this session
        """
        # Default to None rather than {} to avoid a shared mutable default argument
        feed_dict = feed_dict if feed_dict is not None else {}

        # Perform a post-order traversal of the graph to bring the nodes into the right order
        nodes_postorder = traverse_postorder(operation)

        # Iterate all nodes to determine their value
        for node in nodes_postorder:
            if isinstance(node, placeholder):
                # Set the node value to the placeholder value from feed_dict
                node.output = feed_dict[node]
            elif isinstance(node, Variable):
                # Set the node value to the variable's value attribute
                node.output = node.value
            else:  # Operation
                # Get the input values for this operation from the outputs of its input nodes
                node.inputs = [input_node.output for input_node in node.input_nodes]

                # Compute the output of this operation
                node.output = node.compute(*node.inputs)

            # Convert lists to numpy arrays
            if isinstance(node.output, list):
                node.output = np.array(node.output)

        # Return the requested node value
        return operation.output


def traverse_postorder(operation):
    """Performs a post-order traversal, returning a list of nodes
    in the order in which they have to be computed

    Args:
      operation: The operation to start traversal at
    """
    nodes_postorder = []

    def recurse(node):
        if isinstance(node, Operation):
            for input_node in node.input_nodes:
                recurse(input_node)
        nodes_postorder.append(node)

    recurse(operation)
    return nodes_postorder

Let's test our classes on the example above:

session = Session()
output = session.run(z, {x: [1, 2]})
print(output)
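As a hand check on the printed result, the affine transformation can be evaluated directly with the values used in the example ($A$ and $b$ from the graph, $x = (1, 2)$ from the feed dictionary). Assuming the classes behave as described, the session should return the same vector:

$z = Ax + b = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ -2 \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ -1 \end{pmatrix}$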

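That's the end of this chapter! If you have any questions about the material, feel free to post them in the Jizhi community, and we will arrange for someone to answer them.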