Deep Learning: A Systems Science Perspective

04-30

1. Systems Science

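Systems science [1, 2] is the foundational science of whole emergence. Bertalanffy, the founder of systems science, pointed out that two kinds of wholes must be distinguished. One is the additive whole, that is, a non-system, which has no emergence. The other is the non-additive whole, that is, a system, which exhibits emergence. Systems science does not study every whole and every kind of wholeness; it is concerned only with non-additive wholes, that is, with whole emergence. Qualitatively, emergence can be stated as follows: the whole has new properties or behavioral patterns that neither its parts nor the sum of its parts possess, and the properties and patterns of the whole cannot be fully explained by those of the parts. In short, "the whole is greater than the sum of its parts." Emergence can also be stated quantitatively: let $W$ denote the system as a whole, composed of $n$ parts, and let $p_{i}$ denote the $i$-th part, $i = 1, 2, \cdots, n$; then, formally, $W > \sum_{i=1}^{n} p_{i}$. More concisely, $2 > 1 + 1$.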

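Nature abounds with emergent phenomena, for example ant colonies [3]. The behavior of an individual ant can be summarized by a simple set of rules, such as "follow the scent", "grip objects with the mandibles", and "leave a scent mark where danger is perceived". But when ants gather into a colony, the interactions among its members make the colony's collective behavior astonishingly complex, efficient, and intelligent: colonies build bridges, span deep ditches, and sail across streams on boats made of leaves. This is exactly what emergence looks like: complex things develop out of small, simple things.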

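I believe that emergence not only exists in deep neural networks but is fairly common there, especially in the complex models used to tackle complex problems. Based on the three conditions for identifying emergence proposed by Mill [4], I attempt to give a definition of emergence in deep neural networks, together with criteria for recognizing it.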

2. Emergence in Deep Neural Networks

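Definition (emergence in deep neural networks). A deep learning model is an adaptive, self-organizing network system built from a large number of basic neurons combined according to certain parallel and hierarchical structures. The overall properties the system exhibits are new properties that no individual basic neuron possesses, and they are not a simple superposition of the characteristics of all the basic neurons.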

Criteria

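Criterion 1: An emergent feature of a whole is not the sum of the features of its parts.
No single neuron in a deep learning system can deliver strong performance on its own. Only when a sufficient number of these simple neurons are combined in some clever way and built up into an elaborate structure does the deep learning system begin to acquire its unprecedented classification and feature-learning capabilities. Extensive practical experience shows that these new properties of the whole do not arise from a simple accumulation of the system's parts.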
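Criterion 2: The kinds of emergent features are entirely different from the kinds of features of the components.
In a deep learning system, the capabilities the system exhibits as a whole, such as feature learning (CNNs), adaptability and plasticity (GANs [5]), and associative memory (LSTM [6], Memory Networks [7, 8, 9], NTM [10], DNC [12]), go far beyond the binary classification behavior of a single neuron.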
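Criterion 3: Emergent features cannot be deduced or predicted by examining the behavior of the parts in isolation.
Deep learning practice in recent years in computer vision [13, 14], speech recognition [15], and natural language processing/understanding [16] shows that deep learning is the art of architecture and composition. The whole emergence of deep learning is a structural effect, produced by its components interacting with, complementing, and constraining one another according to a particular structure; it can no longer be derived by examining the behavior of each part independently.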
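The first two criteria can be illustrated with the classic XOR problem. The following is a minimal Python sketch, not taken from the cited works, assuming that a "basic neuron" is a single linear threshold unit. One such neuron can realize a simple binary classification such as AND, but no single threshold neuron can realize XOR, because XOR is not linearly separable; three of the same neurons, wired in two layers, can. The weights, the coarse grid search, and all function names are hypothetical choices made for illustration.

```python
from itertools import product

# Truth-table inputs for two binary variables.
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def step(z):
    """Heaviside threshold: the neuron fires (1) only if its net input is positive."""
    return 1 if z > 0 else 0


def neuron(x, w, b):
    """A single linear threshold neuron (assumed model): step(w . x + b)."""
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)


def single_neuron_can_compute(target):
    """Search a coarse grid of weights and biases for one neuron that reproduces `target`.

    The grid search is only an illustration; the general impossibility for XOR
    follows from XOR not being linearly separable.
    """
    grid = [k / 2 for k in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
    for w1, w2, b in product(grid, repeat=3):
        if all(neuron(x, (w1, w2), b) == target(x) for x in INPUTS):
            return True
    return False


def two_layer_xor(x):
    """Three threshold neurons in two layers, with hand-chosen (hypothetical) weights."""
    h_or = neuron(x, (1, 1), -0.5)   # fires if at least one input is 1
    h_and = neuron(x, (1, 1), -1.5)  # fires only if both inputs are 1
    return neuron((h_or, h_and), (1, -1), -0.5)  # "OR but not AND" is XOR


def and_gate(x):
    return x[0] & x[1]


def xor_gate(x):
    return x[0] ^ x[1]


print(single_neuron_can_compute(and_gate))  # True: one neuron suffices for AND
print(single_neuron_can_compute(xor_gate))  # False: no neuron on the grid computes XOR
print([two_layer_xor(x) for x in INPUTS])   # [0, 1, 1, 0]
```

Each neuron in the two-layer version is individually just a linear binary classifier, yet the composed network computes a function that none of its parts can; the new capability comes from how the parts are connected, not from adding up what each part does alone.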

[1]. Von Bertalanffy, Ludwig. General System Theory. New York, 1968.

[2]. Von Bertalanffy, Ludwig. "The history and status of general systems theory." Academy of Management Journal 15.4 (1972): 407-426.

[3]. Holland, John H. Emergence: From chaos to order. OUP Oxford, 2000.

[4]. Mill, John Stuart. A System of Logic: Ratiocinative and Inductive. Routledge, 1960.

[5]. Goodfellow, Ian, et al. "Generative adversarial nets." Advances in Neural Information Processing Systems. 2014.

[6]. Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural Computation 9.8 (1997): 1735-1780.

[7]. Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014).

[8]. Sukhbaatar, Sainbayar, Jason Weston, and Rob Fergus. "End-to-end memory networks." Advances in Neural Information Processing Systems. 2015.

[9]. Kumar, Ankit, et al. "Ask me anything: Dynamic memory networks for natural language processing." CoRR, abs/1506.07285 (2015).

[10]. Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural turing machines." arXiv preprint arXiv:1410.5401 (2014).

[12]. Graves, Alex, et al. "Hybrid computing using a neural network with dynamic external memory." Nature 538.7626 (2016): 471-476.

[13]. Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

[14]. Huang, Gao, et al. "Densely connected convolutional networks." arXiv preprint arXiv:1608.06993 (2016).

[15]. van den Oord, Aäron, et al. "WaveNet: A generative model for raw audio." CoRR abs/1609.03499 (2016).

[16]. Wu, Yonghui, et al. "Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation." arXiv preprint arXiv:1609.08144 (2016).