Why ReLU gives better results than Sigmoid in many situations

05-30

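First, some groundwork. The curve y = sigmoid(x) looks like this:

y = sigmoid(2x) looks like this; it is sigmoid(x) compressed along the X axis:

y = sigmoid(2*(x - 1)) is shown below; it is sigmoid(2*x) shifted one unit to the right:

From high-school maths, y = f(w*x + b) = f(w*(x + b/w)) = f(w*(x + h)) with h = b/w, which amounts to first scaling by w and then shifting by h. Both w and b can be positive or negative, so we also see that

y = sigmoid(-x) is the mirror image of y = sigmoid(x) along the X direction (that is, reflected about the Y axis):

So once the w and b of our neurons have been initialised, the neuron outputs look roughly like this: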
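The X-axis transformations described above can be checked numerically. The following is a minimal sketch using numpy; the helper names `sigmoid` and `midpoint_slope` are my own and are not part of the author's listing.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-4.0, 4.0, 9)

# f(w*x + b) == f(w*(x + h)) with h = b / w
w, b = 2.0, -2.0
h = b / w  # -1.0, i.e. the curve moves one unit to the right
same = np.allclose(sigmoid(w * x + b), sigmoid(w * (x + h)))

# A negative w mirrors the curve left to right: sigmoid(-x) == 1 - sigmoid(x)
mirrored = np.allclose(sigmoid(-x), 1.0 - sigmoid(x))

def midpoint_slope(w, eps=1e-6):
    """Numerical slope of sigmoid(w*x) at x = 0; analytically this is w / 4."""
    return (sigmoid(w * eps) - sigmoid(-w * eps)) / (2 * eps)

print(same, mirrored, midpoint_slope(1.0), midpoint_slope(2.0))
```

The slope at the midpoint grows in proportion to w, which is why a larger |w| gives a steeper S-curve.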

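Now consider another kind of transformation, y = w*f(x) + b.

For example, y = 2*sigmoid(x) is shown below; the function is scaled along the Y axis. For comparison, the dashed line is sigmoid(x).

y = 2*sigmoid(x) + 1 is then a shift along the Y axis.

So y = w1*f(w0*x + b0) + b1 shifts and scales the function along both the X axis and the Y axis at the same time. This point is essential for understanding how a neural network works!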
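As a small illustration of the combined transformation, the sketch below evaluates one hidden neuron followed by an output weight and reads off where the curve ends up. The function name `unit` and the chosen parameter values are assumptions for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unit(x, w0, b0, w1, b1):
    """One hidden neuron followed by an output weight: w1 * f(w0*x + b0) + b1."""
    return w1 * sigmoid(w0 * x + b0) + b1

x = np.linspace(-10.0, 10.0, 2001)
y = unit(x, w0=2.0, b0=-2.0, w1=2.0, b1=1.0)

# Y axis: the curve runs from b1 up to b1 + w1
low, high = y.min(), y.max()
# X axis: the middle of the slope (y = b1 + w1 / 2) sits at x = -b0 / w0
centre = x[np.argmin(np.abs(y - 2.0))]

print(low, high, centre)
```

Here w0 and b0 decide where the slope sits and how steep it is, while w1 and b1 decide how tall it is and where it starts vertically.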

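Let's first set up some basic parameters in the code. The complete source code is posted at the end of the article.

hideCellNum = 10 # number of neurons in the hidden layer

speed = 0.0001 # do not underestimate speed: if it is too large, gradients explode very easily; try training ReLU with speed = 1, for example
inputLayer = Layer(None,1,None) # first layer: no previous layer, no activation function, 1 input unit
############## Structure of the single-hidden-layer network: one input unit, hideCellNum hidden neurons and one output unit. The output unit is a linear neuron and the loss function is the L2 distance
#        /-- 0 --\
# (x) 0 --- 0 --- 0 (y)
#        \-- 0 --/

#
hideLayer1 = Layer(inputLayer,hideCellNum,ActivationSigmoid(1,1))

outputLayer = Layer(hideLayer1,1,ActivationLiner(1,0))

loss = LossL2(outputLayer) # L2 distance

x = np.linspace(-1, 1, 20) # the input range must match the weight initialisation of the activation function; here, 20 evenly spaced points in [-1, 1]
orig_y = 2* np.sin(3*x) + 1 * (x - 3)*x + 2 # change the coefficient inside sin() to control the period of the output
y = orig_y #1/(1 + np.exp(-orig_y)) # if the last layer is sigmoid, apply sigmoid here as well; if the last layer is Liner, use the raw values directly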
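For readers who want to experiment before reaching the full listing, here is a compact numpy sketch of the same setup: one input, a sigmoid hidden layer, a linear output unit, L2 loss and per-sample gradient descent. It is my own vectorised approximation, not the author's `Layer`/`Cell` code; the names `make_net`, `predict`, `sgd_step` and `mse` are hypothetical, and the learning rate and epoch count are chosen only to keep the run short.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_net(hidden, w_scale, b_range, rng):
    """Hidden weights are +/- w_scale; hidden biases spread over w_scale * [-b_range, b_range]."""
    return {
        "w0": w_scale * rng.choice([-1.0, 1.0], size=hidden),
        "b0": w_scale * b_range * rng.uniform(-1.0, 1.0, size=hidden),
        "w1": rng.choice([-1.0, 1.0], size=hidden),  # linear output unit
        "b1": 0.0,
    }

def predict(net, x):
    hidden_out = sigmoid(net["w0"] * x + net["b0"])
    return hidden_out, float(net["w1"] @ hidden_out + net["b1"])

def sgd_step(net, x, target, speed):
    hidden_out, out = predict(net, x)
    err = 2.0 * (out - target)  # derivative of (out - target)^2 with respect to out
    # Error reaching each hidden neuron, computed before any weight is changed
    grad_hidden = err * net["w1"] * hidden_out * (1.0 - hidden_out)
    net["w1"] -= speed * err * hidden_out
    net["b1"] -= speed * err
    net["w0"] -= speed * grad_hidden * x
    net["b0"] -= speed * grad_hidden

def mse(net, xs, ys):
    return float(np.mean([(predict(net, x)[1] - y) ** 2 for x, y in zip(xs, ys)]))

rng = np.random.default_rng(0)
xs = np.linspace(-1, 1, 20)
ys = 2 * np.sin(3 * xs) + 1 * (xs - 3) * xs + 2

net = make_net(hidden=10, w_scale=15.0, b_range=1.0, rng=rng)
before = mse(net, xs, ys)
for _ in range(200):
    for x, y in zip(xs, ys):
        sgd_step(net, x, y, speed=0.001)
after = mse(net, xs, ys)
print(before, after)
```

Changing `w_scale` and `b_range` in `make_net` plays the role of the two arguments of `ActivationSigmoid(i,j)`.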

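---------------------------------------------------divider----------------------------------------

The only parameters we will tune below are the two arguments of ActivationSigmoid(i,j), so let me explain what they mean. i and j directly determine the initial w and b, as follows:

Written as w*x + b: w = ±i, b = i*j*r, where r is drawn uniformly from [-1, 1]

Written as w*(x + h): w = ±i, h = ±j*r, which lies in [-j, j]

Mathematically, f(x) is scaled along the x axis and its position is then spread evenly over [-j, j]. The figure below gives a rough idea; the parameters are ActivationSigmoid(15,1). Since there are only 10 neurons, the spread is slightly uneven.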
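The initialisation rule can be sketched in a few lines. This mirrors what `initWeight` in the listing at the end does for ordinary cells, but the function name `init_neuron` is my own.

```python
import random

def init_neuron(i, j, rng=random):
    """Return (w, b) with w = +/- i and b = i * j * r, r uniform in [-1, 1]."""
    w = i * rng.choice([1.0, -1.0])
    b = i * j * rng.uniform(-1.0, 1.0)
    return w, b

random.seed(0)
neurons = [init_neuron(15, 1) for _ in range(10)]

# x position at which each S-curve crosses its midpoint: w*x + b = 0
centres = sorted(-b / w for w, b in neurons)
print(centres)
```

Every centre falls inside [-j, j], so with j = 1 the ten slopes are scattered over the same interval as the training inputs.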

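---------------------------------------------------divider----------------------------------------

Let's start with ActivationSigmoid(1,1), where the S-shaped slope of each neuron's output is fairly gentle, and see how well it fits.

After 50 training cycles (each cycle is roughly 30 passes of gradient descent) the fit does not look good:

After 150 cycles not much has changed, and the fit is still poor: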

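---------------------------------------------------divider----------------------------------------

Next we look at ActivationSigmoid(15,1), where the S-shaped slope of each neuron's output is steeper, and see how well it fits.

This result looks quite good. But what actually happened to make the fit improve so suddenly? Look closely at the upper-right and lower-left plots. △w is the change in the hidden-layer weights, and curves of the same colour belong to the same neuron. You will notice that △w is bell-shaped, and that each bell sits only in the narrow region on the slope of the corresponding output. This means w and b change only for inputs that fall on the slope. The bell comes from the derivative of sigmoid(x), which is sigmoid(x)*(1 - sigmoid(x)); working through backpropagation shows how this derivative enters the update. With ActivationSigmoid(1,1), however, △w is not bell-shaped but swings in the same way as the target function, because once the S-slope is gentle the derivative varies little over the input range and has much less influence on the shape. Also note that summing (integrating) △w gives the total change over the whole training cycle.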
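To make the bell shape concrete, the sketch below evaluates the factor sigmoid(z)*(1 - sigmoid(z)) over the training interval [-1, 1] for several slopes. It is an illustration under my own assumptions (a neuron centred at 0, and "active" defined as at least 10% of the peak value 0.25), not a reproduction of the author's plots.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def slope_factor(x, w, centre=0.0):
    """The factor s*(1 - s) that multiplies every hidden-layer gradient."""
    s = sigmoid(w * (x - centre))
    return s * (1.0 - s)

x = np.linspace(-1.0, 1.0, 201)
for w in (1.0, 15.0, 50.0):
    factor = slope_factor(x, w)
    active = np.mean(factor > 0.1 * 0.25)  # share of inputs with at least 10% of the peak
    print(f"w={w:5.1f}  min factor={factor.min():.4f}  active share={active:.2f}")
```

For w = 1 the factor is nearly constant across the whole interval, so the weight change simply follows the error signal. For larger w it is non-negligible only in a narrow band around the neuron's centre, which is the bell seen in the △w plots.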

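---------------------------------------------------divider----------------------------------------

Next we look at ActivationSigmoid(50,1); this time the slope is steeper still.

The fitted curve now looks rather stiff, and if the shifts were not evenly distributed, blind spots would easily form in training. Let's see whether adding neurons helps with the blind spots: increasing the number of neurons beyond the original 10 gives the following result:

The blind spots are gone, but the fitted result is still stiff.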

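---------------------------------------------------divider----------------------------------------

From the results above I have formed a basic intuition. If the S-slope is much gentler than the waveform of the target function, training will quite likely fail. If the S-slope is steeper than the slopes of the target function, the fit comes out stiff. The best results tend to come when the S-slope and the target's slopes are similar.

Now let's look at another set of experiments.

The number of neurons is still 10. We start with ActivationSigmoid(15,0). Earlier, ActivationSigmoid(15,1) trained reasonably well; this time the shift of every curve is fixed at 0. Here is the training result:

After 50 training cycles the curves have shifted somewhat, but only slightly:

After 151 cycles not much has changed. The result is really not good and training has become very, very slow. At the very least we can conclude that if the shifts are not evenly distributed over the whole input range, training suffers badly.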

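The experiments above suggest the following strategies for improving training:

1 In the scaling space, w should be evenly distributed between a lower and an upper bound. The bounds can be estimated by analysing the input samples beforehand; if that is not possible, we have to rely on experiments and experience.

2 In the shift space, b should be evenly distributed between a lower and an upper bound. The bounds can be estimated by analysing the input samples beforehand; if that is not possible, we have to rely on experiments and experience.

To cover both the scaling space and the shift space evenly, roughly N*N neurons are needed (N scales times N shifts).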
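One way to picture the N*N requirement is to build the grid explicitly. The sketch below is purely illustrative: `grid_init` and the chosen ranges for the scales and shifts are assumptions, not values taken from the article.

```python
import itertools
import numpy as np

def grid_init(scales, shifts):
    """Return (w, b) pairs covering every combination of scale w and shift h, with b = w * h."""
    return [(w, w * h) for w, h in itertools.product(scales, shifts)]

scales = np.linspace(5.0, 50.0, 4)   # hypothetical range of slopes
shifts = np.linspace(-1.0, 1.0, 4)   # hypothetical range of shifts
pairs = grid_init(scales, shifts)

print(len(pairs))  # 4 scales * 4 shifts
```

Each extra scale value multiplies the neuron count by the number of shifts, which is where the quadratic growth comes from.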

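--------------------------------------------------divider---analysing ReLU-------------------------------------

We take ActivationRelu(1, 1):

After 150 cycles the fit looks quite good.

This time we take ActivationRelu(15, 1):

Something remarkable happens: with a larger w the learning speed is effectively higher, so the fit completes almost instantly and the difference from the target function is already very small.

After 150 cycles it is nearly perfect; the fit is almost complete!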

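This time we take ActivationRelu(15, 0.1), which confines the shifts to the small range [-0.1, 0.1], and see what happens:

The fit clearly has blind spots (do not look only at the waveform: the maximum Y value in the upper-left plot is 20, whereas in the earlier plots it was 10) and is not very good. From this we draw two conclusions:

1 w can be fixed at 1, because for ReLU a larger scaling coefficient merely amounts to faster training (this can be shown from the propagation formulas), so w does not need to be evenly distributed over the scaling space as it does for Sigmoid

2 The shifts still need to be evenly distributed over the shift space, just as with sigmoid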
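The first conclusion rests on a simple property: for any k > 0, relu(k*z) = k*relu(z). Scaling the hidden weight of a ReLU neuron is therefore the same as scaling its output weight, and the shape of the curve does not change. A minimal numeric check, with variable names of my own choosing:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

x = np.linspace(-1.0, 1.0, 21)
w, h, v = 1.0, 0.3, 0.5   # hidden weight, shift, output weight
k = 15.0                   # positive factor applied to the hidden weight

scaled_hidden = v * relu(k * w * (x + h))    # scale inside the activation
scaled_output = (k * v) * relu(w * (x + h))  # same factor moved to the output weight

print(np.allclose(scaled_hidden, scaled_output))
```

Sigmoid has no such property: sigmoid(k*z) is a differently shaped curve, which is why its scale has to be chosen to suit the target function.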

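So if we want good training results, ReLU needs only N neurons whereas Sigmoid needs NxN. If the choice were yours, which would you pick?

From now on, please stop reflexively citing vanishing sigmoid gradients and cheap ReLU computation as the reason ReLU works better than Sigmoid (even though both points are true). Anyone who neither writes code nor runs experiments deserves a flogging :)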

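--------------------------------------------------divider---do not trust ReLU blindly-------------------------------------

hideCellNum = 50 # more neurons than before

ActivationRelu(1, 1)

orig_y = 2* np.sin(10*x) + 1 * (x - 3)*x + 2 # modified target function

After 160 cycles ReLU seems unable to improve any further:

This time we increase the number of neurons to 500, and the fit completes easily within 10 loops.

But fitting with ActivationSigmoid(35,1) and 50 neurons gives the following result:

With the same number of neurons, sigmoid seems able to produce a better fit, provided you pick the right initialisation parameters. With ActivationSigmoid(1,1) it might never fit in a lifetime.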

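--------------------------------------------------divider---other activation functions work too-------------------------------------

We can use sin(x) as the activation function and training still works:

Then we use the bell-shaped function 1/(1 + e^(x*x)) as the activation function (the code listing uses the Gaussian e^(-x*x) in ActivationNormal), and it can be trained as well:

One more question: can f(x) = x*x serve as an activation function? My code uses the default BP algorithm, and with x*x the gradient explodes immediately. A paper I read earlier also stresses that only "non-polynomial" functions can fit arbitrary functions. I looked into the reason a little: as x grows, y changes extremely fast, and the intuitive feeling is that when you press one end down, the other end pops up.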
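Besides the gradient problem, there is a structural limit that is easy to verify: a single hidden layer with f(x) = x*x can only ever produce a quadratic in x, no matter how many neurons it has, because sum_i v_i*(w_i*x + b_i)^2 expands to a2*x^2 + a1*x + a0. The sketch below checks this with random weights; all names are my own.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 50
w = rng.normal(size=hidden)   # hidden weights
b = rng.normal(size=hidden)   # hidden biases
v = rng.normal(size=hidden)   # output weights

def net(x):
    """Single hidden layer with x*x activation and a linear output."""
    return float(np.sum(v * (w * x + b) ** 2))

# Coefficients obtained by expanding the square
a2 = np.sum(v * w ** 2)
a1 = 2.0 * np.sum(v * w * b)
a0 = np.sum(v * b ** 2)

xs = np.linspace(-3.0, 3.0, 13)
outputs = np.array([net(x) for x in xs])
quadratic = a2 * xs ** 2 + a1 * xs + a0

print(np.allclose(outputs, quadratic))
```

Fifty neurons collapse into three effective parameters, which is consistent with the non-polynomial condition mentioned above.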

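Another point: a smooth function can be expanded into a polynomial, so I naively assumed that a polynomial approximation would also work. Let's first look at the curve of 1/(1 + exp(x*x)):

Then compute its series expansion around x = 0; you can do this yourself with an online Taylor series expansion calculator.

Within [-1.5, 1.5] the approximation is fairly good, but outside this range it shoots up rapidly. Observation suggests that the higher-order terms are in effect there to "press down the edges".

This means that the Taylor expansion of f(x) around x0 approximates f well only near x0, not over the whole real line. That suggests another idea: since a polynomial blows up once x leaves a certain range, can we restrict x to a fixed range? We verified in code that this approach works.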
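The behaviour of the expansion can be reproduced without an online calculator. Writing u = x*x and using 1/(1 + e^u) = (1 - tanh(u/2))/2, the first terms are 1/2 - x^2/4 + x^6/48 - x^10/480. The sketch below compares this truncation (my own choice of where to stop, not necessarily the one plotted in the article) with the exact function.

```python
import numpy as np

def bell(x):
    return 1.0 / (1.0 + np.exp(x * x))

def taylor(x):
    """Truncated expansion around x = 0: 1/2 - x^2/4 + x^6/48 - x^10/480."""
    u = x * x
    return 0.5 - u / 4.0 + u ** 3 / 48.0 - u ** 5 / 480.0

for x in (0.5, 1.0, 1.5, 2.0, 3.0):
    print(f"x={x:.1f}  f={bell(x):.5f}  taylor={taylor(x):.5f}")
```

The two agree closely near 0 and part ways quickly beyond about 1.5. Adding more terms cannot fix this everywhere: the series in u has radius of convergence pi, so in x it converges only for |x| < sqrt(pi), roughly 1.77.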

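The restricted x^2 looks like this:

The fit is in fact very good, and we can reasonably conjecture that polynomials restricted to a bounded range can also fit functions well. We can also see that an important reason x^2 can fit is that both ends are suppressed, which is equally true of sigmoid and the Gaussian function.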
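The restricted square is simple to write down. The following numpy sketch follows the same idea as `ActivationXX` in the listing (the input magnitude is capped at 1 and the gradient is zero outside that range); the names `clipped_square` and `clipped_square_grad` are mine.

```python
import numpy as np

def clipped_square(z, limit=1.0):
    """x*x inside [-limit, limit], flat at limit**2 outside."""
    return np.minimum(np.abs(z), limit) ** 2

def clipped_square_grad(z, limit=1.0):
    """2*x inside the range, 0 where the function is flat."""
    return np.where(np.abs(z) > limit, 0.0, 2.0 * z)

z = np.array([-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0])
values = clipped_square(z)      # [1, 1, 0.25, 0, 0.25, 1, 1]
grads = clipped_square_grad(z)  # [0, -2, -1, 0, 1, 2, 0]
print(values, grads)
```

With both tails flat, each neuron again influences only a bounded region of the input, as with sigmoid and the Gaussian.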

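--------------------------------------------------divider----------------------------------------

Everything above analyses a single hidden layer. Once that analysis has given us an intuitive picture, analysing multiple layers is actually quite simple; I will cover that next time. The code follows; please test it yourselves and point out any mistakes.

--------------------------------------------------code divider----------------------------------------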

# This code is a small DNN implementation with several activation functions and a graphical display,
# which makes it well suited to understanding intuitively how a neural network fits a function.
# It mainly tests sigmoid and relu; sin and the normal-distribution curve were tested as well. All of them
# can fit functions well, but the initial weights must be prepared properly, otherwise training is very hard.
# Original author: 易瑜  email: 296721135@qq.com  Corrections are welcome; if you repost, please credit the author and the source.
# Tested on python3. It depends on only two libraries, numpy and matplotlib,
# which can be installed with `pip install numpy` and `pip install matplotlib`.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the "3d" projection on older matplotlib versions
import math
import random
import os


# from PIL import Image

class Activation:
    # Subclasses must implement activation_fun and activation_deri_fun
    def __init__(self, wRange=1, bRange=1):
        self.wRange = wRange
        self.bRange = bRange

    # Weight initialisation: wx + b = w(x + b/w) = w(x + h) -> h = b/w. w sets the scaling along x and
    # h sets the shift along x after scaling.
    # Initialisation is not purely random. In our tests, when fitting with S-shaped functions the function
    # must be scaled appropriately and the shifts must be spread evenly over the whole input space.
    # For relu-type functions w can simply be +1 or -1; changing the initial shift is enough.
    def initWeight(self, cell):
        for i in range(len(cell.w)):
            cell.w[i] = self.wRange * random.choice([1., -1.])
        cell.b = (self.bRange * self.wRange) * random.uniform(-1, 1)
        if (cell.specialCellType):
            for i in range(len(cell.w)):
                cell.h[i] = (self.bRange) * random.uniform(-1, 1)

    def activation_fun(self, x):  # activation function
        raise NotImplementedError("")

    def activation_deri_fun(self, cell):  # derivative
        raise NotImplementedError("")

    # Weight delta. The derivative we computed is
    #   △loss/△w = deri                      (1)
    # If we set △w = -speed*deri             (2)
    # then substituting (2) into (1) gives
    #   △loss = deri*△w = -speed*deri*deri, so the loss always moves downward.
    # This is not the only possible update rule. In theory any rule that makes △loss negative will do.
    # For example, provided deri is not zero, set
    #   △w = -speed/deri                     (3)
    # Substituting into (1) gives △loss = -speed, so the loss shrinks at a fixed rate per step.
    # Rule (3) has problems of its own: deri is only valid in a small neighbourhood of the current w,
    # so the size of △w must be limited.
    # This is only a starting point; there are many gradient descent strategies, see for example:
    # http://www.360doc.com/content/16/1121/12/22755525_608221032.shtml
    def updateDeltaWeight(self, deri, speed, cell, loss, coefficient):
        return -speed * deri


############################################################### X^2: the gradient explodes easily, but it can fit some functions if x is restricted
class ActivationXX(Activation):
    def activation_fun(self, x):  # activation function
        if (abs(x) > 1):  # restrict the range of x
            x = 1
        return x * x

    def activation_deri_fun(self, cell):  # derivative
        if (abs(cell.sum) > 1):
            return 0
        return 2 * cell.sum


############################################################### V-shaped function
class ActivationAbsolute(Activation):
    def activation_fun(self, x):  # activation function
        return abs(x)

    def activation_deri_fun(self, cell):  # derivative: -1 on the left branch, +1 on the right
        return -1.0 if cell.sum < 0.0 else 1.0


############################################################### Sinc function
class ActivationSinc(Activation):
    def activation_fun(self, x):  # activation function
        return 1.0 if x == 0.0 else math.sin(x) / x

    def activation_deri_fun(self, cell):  # derivative (0 at x = 0, where sinc has its maximum)
        x = cell.sum
        return 0.0 if x == 0.0 else math.cos(x) / x - math.sin(x) / (x * x)


class ActivationTanh(Activation):
    def activation_fun(self, x):  # activation function
        return math.tanh(x)

    def activation_deri_fun(self, cell):  # derivative
        return 1 - cell.out * cell.out


class ActivationRelu(Activation):
    def activation_fun(self, x):  # activation function
        return max(0.0, x)

    def activation_deri_fun(self, cell):  # derivative
        return 0.0 if cell.sum <= 0. else 1.0


class ActivationMyRelu(Activation):  # ____/~~~~~~~ , shifted to the right by 0.5
    def activation_fun(self, x):  # activation function
        return max(0.0, x - 0.5)

    def activation_deri_fun(self, cell):  # derivative: the kink is at 0.5, not 0
        return 0.0 if cell.sum <= 0.5 else 1.0


class ActivationLeakyRelu(Activation):
    def activation_fun(self, x):  # activation function
        return x if x > 0.0 else 0.01 * x

    def activation_deri_fun(self, cell):  # derivative
        return 0.01 if cell.sum <= 0 else 1.0


class ActivationStep(Activation):  # ___|~~~~~~ , 0 to 1
    def activation_fun(self, x):  # activation function
        return 1.0 if x >= 0 else 0

    def activation_deri_fun(self, cell):  # derivative
        return 0


class ActivationSignum(Activation):  # ___|~~~~~~ , -1 to 1
    def activation_fun(self, x):  # activation function
        return 1.0 if x >= 0 else -1.0

    def activation_deri_fun(self, cell):  # derivative
        return 0.0


class ActivationSoftPlus(Activation):  # ln(1 + e^x)
    def activation_fun(self, x):  # activation function
        return math.log(1 + math.exp(x))

    def activation_deri_fun(self, cell):  # derivative
        return 1 / (1 + math.exp(-cell.sum))


class ActivationLecunTanh(Activation):  # LeCun Tanh: 1.7159 * tanh(2x/3)
    def activation_fun(self, x):  # activation function
        return 1.7159 * math.tanh(2 * x / 3)

    def activation_deri_fun(self, cell):  # derivative, expressed through the output
        return 1.7159 * 2 * (1 - cell.out * cell.out / (1.7159 * 1.7159)) / 3


class ActivationHardTanh(Activation):  # ____/~~~~~~~~~
    def activation_fun(self, x):  # activation function
        return 1 if x > 1.0 else (-1 if x < -1.0 else x)

    def activation_deri_fun(self, cell):  # derivative
        return 1 if abs(cell.sum) < 1.0 else 0


class ActivationArcTan(Activation):  # ArcTan
    def activation_fun(self, x):  # activation function
        return math.atan(x)

    def activation_deri_fun(self, cell):  # derivative
        return 1 / (cell.sum * cell.sum + 1)


class ActivationSoftsign(Activation):  # x/(1 + |x|)
    def activation_fun(self, x):  # activation function
        return x / (1 + abs(x))

    def activation_deri_fun(self, cell):  # derivative
        return 1 / ((1 + abs(cell.sum)) * (1 + abs(cell.sum)))


############################################################### sigmoid
class ActivationSigmoid(Activation):
    def activation_fun(self, x):  # activation function
        try:
            return 1 / (1 + math.exp(-x))
        except OverflowError:
            if x < 0.0:
                return 0
            else:
                return 1

    def activation_deri_fun(self, cell):  # derivative
        return cell.out * (1 - cell.out)

    # def updateDeltaWeight(self,deri,speed,cell,loss,coefficient): ## weight delta; this strategy seems a little faster
    #     sigmoidDri = abs(cell.out * (1 - cell.out))
    #     if((sigmoidDri) < 0.1): # gradient too small, skip
    #         return 0.0
    #     coefficient = abs(coefficient)
    #     coefficient = max(coefficient,0.1)
    #     maxDelta = (0.3/coefficient)*sigmoidDri # x must not change too much in one step
    #
    #     if abs(deri) > 0.000001:
    #         delta = (speed/deri) * loss
    #     else:
    #         return 0.0
    #     if abs(delta) > maxDelta:
    #         delta = maxDelta if delta > 0 else -maxDelta
    #     return -delta


############################################################### normal-distribution curve
class ActivationNormal(Activation):
    def activation_fun(self, x):  # activation function
        return math.exp(-x * x)

    def activation_deri_fun(self, cell):  # derivative
        return -cell.out * 2 * cell.sum


############################################################### tanh(x/2); named ActivationTanhHalf so that it does not shadow ActivationTanh above
class ActivationTanhHalf(Activation):
    def activation_fun(self, x):  # activation function
        return (1 - math.exp(-x)) / (1 + math.exp(-x))

    def activation_deri_fun(self, cell):  # derivative
        return 0.5 * (1 - cell.out * cell.out)


############################################################### loglog function
class ActivationLogLog(Activation):
    def activation_fun(self, x):  # activation function
        return 1 - math.exp(-math.exp(x))

    def activation_deri_fun(self, cell):  # derivative: e^x * e^(-e^x) = e^x * (1 - out)
        return math.exp(cell.sum) * (1 - cell.out)


############################################################### cos function
class ActivationCos(Activation):
    def activation_fun(self, x):  # activation function
        return math.cos(x)

    def activation_deri_fun(self, cell):  # derivative
        return -math.sin(cell.sum)


############################################################### sin function
class ActivationSin(Activation):
    def initWeight(self, cell):
        for i in range(len(cell.w)):
            cell.w[i] = self.wRange * random.choice([1., -1.]) * random.uniform(0.01, 1)
        cell.b = (self.bRange * self.wRange) * random.uniform(-1, 1)

    def activation_fun(self, x):  # activation function
        return math.sin(x)

    def activation_deri_fun(self, cell):  # derivative
        return math.cos(cell.sum)


############################################################### linear function
class ActivationLiner(Activation):
    def activation_fun(self, x):  # activation function
        return x

    def activation_deri_fun(self, cell):  # derivative
        return 1
    # def updateDeltaWeight(self,deri,speed,cell,loss,coefficient):
    #     return 0. # temporarily forced to 0 for testing


######################## There are two kinds of Cell: the ordinary one outputs ∑wi*xi + b, the special one outputs ∑(abs(wi*(xi + hi)))
class Cell:
    def __init__(self, activation, specialCellType):
        self._activation = activation
        self.inputCell = None
        self.sum = 0.0
        self.out = 0.0
        self.error = 0.0
        self.specialCellType = specialCellType

    def setInputCells(self, inputCell):
        self.inputCell = inputCell
        self.w = [0 for i in range(len(inputCell))]
        self.delta_w = [0 for i in range(len(inputCell))]
        if (self.specialCellType):
            self.h = [0 for i in range(len(inputCell))]
            self.delta_h = [0 for i in range(len(inputCell))]
        self.b = 0.0
        self.delta_b = 0.0
        if (self._activation):
            self._activation.initWeight(self)

    def caculateOut(self):  # compute the output
        sum = 0.0
        i = 0
        for cell in self.inputCell:
            if self.specialCellType:
                sum += abs(self.w[i] * (cell.out + self.h[i]))
            else:
                sum += self.w[i] * cell.out
            i += 1
        if not self.specialCellType:
            sum += self.b
        self.sum = sum
        self.out = self._activation.activation_fun(sum)

    def updateWeight(self, speed, loss):
        if self.inputCell:
            i = 0
            outDeri = self.error * self._activation.activation_deri_fun(self)
            for cell in self.inputCell:
                if self.specialCellType:
                    # d|w*(x + h)|/dw = sign(w*(x + h)) * (x + h)
                    deri = (cell.out + self.h[i]) * outDeri
                    if self.w[i] * (cell.out + self.h[i]) < 0.:
                        deri = -deri
                else:
                    deri = cell.out * outDeri
                self.delta_w[i] = self._activation.updateDeltaWeight(deri, speed, self, loss, cell.out)
                self.w[i] += self.delta_w[i]
                if self.specialCellType:
                    hDeri = outDeri if self.w[i] > 0 else -outDeri  # self.w[i]*outDeri
                    if (cell.out + self.h[i]) < 0.:  # absolute value, handled specially
                        hDeri = -hDeri
                    self.delta_h[i] = self._activation.updateDeltaWeight(hDeri, speed, self, loss, cell.out)
                    self.h[i] += self.delta_h[i]
                i += 1
            if not self.specialCellType:
                deri = outDeri
                self.delta_b = self._activation.updateDeltaWeight(deri, speed, self, loss, 1)
                self.b += self.delta_b


class Layer:
    def __init__(self, lastLayer=None, cellNum=1, activation=None, specialCellType=False):
        self._lastLayer = lastLayer
        self._cellNum = cellNum
        self.cells = [Cell(activation, specialCellType) for i in range(cellNum)]
        self._nextLayer = None
        if lastLayer:
            lastLayer._nextLayer = self
            for cell in self.cells:
                cell.setInputCells(lastLayer.cells)

    def _forward(self):  # called on the first layer
        nextLayer = self._nextLayer
        while nextLayer:
            for cell in nextLayer.cells:
                cell.caculateOut()
            nextLayer = nextLayer._nextLayer

    def setInputAndForward(self, x):  # called on the first layer only
        for i in range(len(self.cells)):
            self.cells[i].out = x[i]
        self._forward()

    def backPropagation(self, speed, loss):  # called on the last layer, runs backwards
        currLayer = self
        lastLayer = self._lastLayer
        while lastLayer:  # compute all the errors
            for lastLayerCell in lastLayer.cells:
                lastLayerCell.error = 0.0
            for currLayercell in currLayer.cells:
                deri = currLayercell._activation.activation_deri_fun(currLayercell) * currLayercell.error
                for j in range(len(lastLayer.cells)):
                    lastLayerCell = lastLayer.cells[j]
                    lastLayerCell.error += currLayercell.w[j] * deri
            currLayer = lastLayer
            lastLayer = lastLayer._lastLayer
        while currLayer:  # update the weights
            for currLayercell in currLayer.cells:
                currLayercell.updateWeight(speed, loss)
            currLayer = currLayer._nextLayer


class Loss:
    def __init__(self, layer):
        self._layer = layer

    def minimize(self, expect, speed):
        raise NotImplementedError("")


class LossL2(Loss):
    def __init__(self, layer):
        super().__init__(layer)
        if (len(layer.cells) != 1):
            raise (Exception("last layer should have only one cell!"))

    def minimize(self, expect, speed):  # L2 distance is (out - expect)^2, its derivative is 2*(out - expect)
        loss = (self._layer.cells[0].out - expect) * (self._layer.cells[0].out - expect)
        self._layer.cells[0].error = 2 * (self._layer.cells[0].out - expect)
        self._layer.backPropagation(speed, loss)


class LossEntropy(Loss):  # normally paired with a sigmoid in the preceding stage, otherwise it makes little sense
    def __init__(self, layer):
        super().__init__(layer)
        if (len(layer.cells) != 1):
            raise (Exception("last layer should have only one cell!"))

    def minimize(self, expect, speed):
        # The distance is -(expect*ln(out) + (1 - expect)*ln(1 - out)), and its derivative is
        # -(expect/out - (1 - expect)/(1 - out)) = (out - expect)/((1 - out)*out).
        # Because error contains a division, the value easily leaves the floating-point range.
        loss = -(expect * math.log(self._layer.cells[0].out) + (1 - expect) * math.log(1 - self._layer.cells[0].out))
        self._layer.cells[0].error = (self._layer.cells[0].out - expect) / (
            self._layer.cells[0].out * (1 - self._layer.cells[0].out))
        self._layer.backPropagation(speed, loss)


def run3DDraw():
    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    X = np.arange(-8, 8, 0.25)
    Y = np.arange(-8, 8, 0.25)
    X, Y = np.meshgrid(X, Y)
    R = 1 / (1 + np.exp(abs(X) + abs(Y) - 5))
    Z = R
    # Use help(function) for details, e.g. help(ax.plot_surface)
    ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap="rainbow")
    plt.show()


def run2DDraw():
    x = np.linspace(-7, 7, 70)
    y = 1 / (1 + np.exp((abs(x) - 5)))
    ax1 = plt.subplot(111)
    ax1.clear()
    # ax1.set_title("y = sigmoid(-x)")
    ax1.plot(x, y)
    ax1.grid(True)
    plt.pause(10)


def run2D_DNN():
    # run2DDraw()
    hideCellNum = 120  # number of neurons in the hidden layer
    speed = 0.0001  # do not underestimate speed: if it is too large, gradients explode very easily; try training Relu with speed = 1
    inputLayer = Layer(None, 1, None)  # first layer: no previous layer, no activation function, 1 input unit
    ############## Structure of the single-hidden-layer network: one input unit, hideCellNum hidden neurons and
    ############## one output unit. The output unit is a linear neuron and the loss function is the L2 distance
    #        /-- 0 --\
    # (x) 0 --- 0 --- 0 (y)
    #        \-- 0 --/
    #
    # hideLayer1 = Layer(inputLayer, hideCellNum, ActivationXX(15, 1))
    # hideLayer1 = Layer(inputLayer, hideCellNum, ActivationNormal(15, 1))
    # hideLayer1 = Layer(inputLayer, hideCellNum, ActivationSin(35, 1))
    # hideLayer1 = Layer(inputLayer,hideCellNum,ActivationSigmoid(35,1))
    hideLayer1 = Layer(inputLayer, hideCellNum, ActivationRelu(1, 1))
    # hideLayer2 = Layer(hideLayer1, hideCellNum, ActivationRelu()) # multi-layer networks work the same way
    # hideLayer3 = Layer(hideLayer2, hideCellNum, ActivationRelu())
    outputLayer = Layer(hideLayer1, 1, ActivationLiner(1, 0))
    loss = LossL2(outputLayer)
    x = np.linspace(-1, 1, 20)  # the input range must match the weight initialisation of the activation function
    orig_y = 2 * np.sin(3 * x) + 1 * (x - 3) * x + 2  # change the coefficient inside sin() to control the period of the output
    # (x,orig_y) = walk_dir("./PIC",".bmp")
    y = orig_y  # 1/(1 + np.exp(-orig_y)) # if the last layer is sigmoid, apply sigmoid here as well; if it is Liner, use the raw values
    # Never write _y = y: _y and y would share the same storage, so changing _y would change y too.
    # _y = np.array(y) is fine, because then the two arrays are independent.
    _z = np.array([0.0 for i in range(len(y))])
    hideOutZ = [np.array(_z) for i in range(hideCellNum + 1)]
    hideDeltaWeightZ = [np.array(_z) for i in range(hideCellNum)]
    hideDeltaBiasZ = [np.array(_z) for i in range(hideCellNum)]
    outWeightZ = [np.array(_z) for i in range(hideCellNum)]
    outDeltaWeightZ = [np.array(_z) for i in range(hideCellNum)]
    plt.close()  # clf() clears the figure, cla() clears the axes, close() closes the window
    plt.grid(True)  # add a grid
    plt.ion()  # interactive mode on
    plt.figure(1)  # create figure 1
    ax1 = plt.subplot(221)  # create subplot 1 in the figure
    ax2 = plt.subplot(222)  # create subplot 2 in the figure
    ax3 = plt.subplot(223)  # create subplot 3 in the figure
    ax4 = plt.subplot(224)  # create subplot 4 in the figure
    # ax.axis("equal") # set the X/Y aspect ratio for display
    for t in range(len(x)):  # initialise the starting values
        inputLayer.setInputAndForward([x[t]])
        loss.minimize(y[t], speed)
        for j in range(len(hideLayer1.cells)):
            hideOutZ[j][t] = hideLayer1.cells[j].out * outputLayer.cells[0].w[j]
            hideDeltaWeightZ[j][t] = hideLayer1.cells[j].delta_w[0]
            hideDeltaBiasZ[j][t] = hideLayer1.cells[j].delta_b
            outDeltaWeightZ[j][t] = outputLayer.cells[0].delta_w[j]
            outWeightZ[j][t] = outputLayer.cells[0].w[j]
        hideOutZ[hideCellNum][t] = outputLayer.cells[0].b
        _z[t] = outputLayer.cells[0].out
    for loop in range(10000):
        for epoch in range(30):
            # t = int(random.uniform(0,1)*10000000)%len(x)
            for t in range(len(x)):
                inputLayer.setInputAndForward([x[t]])
                loss.minimize(y[t], speed)
                if (epoch == 1):  # True:#True:#
                    # inputLayer.setInputAndForward([x[t]])
                    for j in range(len(hideLayer1.cells)):
                        hideDeltaWeightZ[j][t] = hideLayer1.cells[j].delta_w[0]
                        hideDeltaBiasZ[j][t] = hideLayer1.cells[j].delta_b
                        outDeltaWeightZ[j][t] = outputLayer.cells[0].delta_w[j]
                        outWeightZ[j][t] = outputLayer.cells[0].w[j]
                    for n in range(len(x)):
                        inputLayer.setInputAndForward([x[n]])
                        for j in range(len(hideLayer1.cells)):
                            hideOutZ[j][n] = hideLayer1.cells[j].out * outputLayer.cells[0].w[j]
                        hideOutZ[hideCellNum][n] = outputLayer.cells[0].b
                        _z[n] = outputLayer.cells[0].sum
                    if (t != len(x) - 1):  # comment this out to watch every single training step in real time
                        continue
                    ax1.clear()
                    # target function, network output, and each hidden neuron's output times its output weight
                    ax1.set_title("result loop:" + str(loop) + " Cell:" + str(hideCellNum))
                    ax2.clear()
                    ax2.set_title("hide layer △w")
                    ax3.clear()
                    ax3.set_title("hide layer △b")
                    ax4.clear()
                    ax4.set_title("target layer △w")
                    for j in range(len(hideOutZ)):
                        ax1.plot(x, hideOutZ[j])
                    ax1.plot(x, orig_y)  # ,"-o"
                    ax1.plot(x, _z)
                    ax1.plot([x[t], x[t]], [np.min(_z[t]), np.max(y[t])])
                    for j in range(len(hideDeltaWeightZ)):
                        ax2.plot(x, hideDeltaWeightZ[j])
                        ax3.plot(x, hideDeltaBiasZ[j])
                        # ax4.plot(x, outWeightZ[j])
                        ax4.plot(x, outDeltaWeightZ[j])
                    ax2.plot([x[t], x[t]], [np.min(hideDeltaWeightZ), np.max(hideDeltaWeightZ)])
                    ax3.plot([x[t], x[t]], [np.min(hideDeltaBiasZ), np.max(hideDeltaBiasZ)])
                    plt.pause(0.1)


def run3D_DNN():
    hideCellNum = 5  # number of neurons in each hidden layer
    speed = 0.001  # do not underestimate speed: if it is too large, gradients explode very easily; try training Relu with speed = 1
    inputLayer = Layer(None, 2, None)  # first layer: no previous layer, no activation function, 2 input units
    inputRange = 0.5
    ############## Two input units, hidden layers of hideCellNum neurons each and one output unit.
    ############## The output unit is a linear neuron and the loss function is the L2 distance
    #        /-- 0 --\
    # (x) 0 --- 0 --- 0 (y)
    #        \-- 0 --/
    #
    # hideLayer1 = Layer(inputLayer, hideCellNum, ActivationXX(15, 1))
    # hideLayer1 = Layer(inputLayer, hideCellNum, ActivationNormal(15, 1))
    # hideLayer1 = Layer(inputLayer, hideCellNum, ActivationSin(35, 1))
    # hideLayer1 = Layer(inputLayer,hideCellNum,ActivationNormal(2,0.5),True)
    _hideLayer = Layer(inputLayer, hideCellNum, ActivationRelu(1, 0.2), True)
    hideLayer = Layer(_hideLayer, hideCellNum, ActivationRelu(1, 0.2), True)
    hideLayer1 = Layer(hideLayer, hideCellNum, ActivationRelu(1, 0.2))
    # hideLayer2 = Layer(hideLayer1, hideCellNum, ActivationRelu()) # multi-layer networks work the same way
    # hideLayer3 = Layer(hideLayer2, hideCellNum, ActivationRelu())
    outputLayer = Layer(hideLayer1, 1, ActivationLiner(1, 0))
    loss = LossL2(outputLayer)
    # X = np.arange(-1, 1, 0.4)
    # Y = np.arange(-1, 1, 0.4)
    X = np.arange(-inputRange, inputRange + 0.000001, inputRange / 2)  # training grid points
    Y = np.arange(-inputRange, inputRange + 0.000001, inputRange / 2)
    x, y = np.meshgrid(X, Y)
    subX = np.arange(-inputRange, inputRange, 0.1)  # finer grid, used mainly for plotting
    subY = np.arange(-inputRange, inputRange, 0.1)
    subx, suby = np.meshgrid(subX, subY)
    subz = subx + suby
    orig_z = 2 * np.sin(7 * x) + 1 * (y - 3) * x + 2  # change the coefficient inside sin() to control the period of the output
    orig_z = [[1, 0, 0, 1, 1],
              [1, 1, 0, 1, 0],
              [1, 0, 0, 1, 0],
              [1, 1, 0, 0, 1],
              [1, 0, 0, 1, 0]]
    z = orig_z  # 1/(1 + np.exp(-orig_y)) # if the last layer is sigmoid, apply sigmoid here as well; if it is Liner, use the raw values
    # print(x)
    # print(z)
    # Never write _y = y: _y and y would share the same storage, so changing _y would change y too.
    # _y = np.array(y) is fine, because then the two arrays are independent.
    _z = np.array(subz)
    hideOutZ = [np.array(_z) for i in range(hideCellNum + 1)]
    hideDeltaWeightZ = [np.array(_z) for i in range(hideCellNum)]
    hideDeltaBiasZ = [np.array(_z) for i in range(hideCellNum)]
    outWeightZ = [np.array(_z) for i in range(hideCellNum)]
    outDeltaWeightZ = [np.array(_z) for i in range(hideCellNum)]
    plt.close()  # clf() clears the figure, cla() clears the axes, close() closes the window
    plt.grid(True)  # add a grid
    plt.ion()  # interactive mode on
    fig = plt.figure(1)
    # ax = Axes3D(fig)
    ax1 = plt.axes(projection="3d")
    fig = plt.figure(2)
    # ax = Axes3D(fig)
    ax2 = plt.axes(projection="3d")
    # plt.figure(1) # create figure 1
    # ax1 = plt.subplot(221) # create subplot 1 in the figure
    # ax2 = plt.subplot(222) # create subplot 2 in the figure
    # ax3 = plt.subplot(223) # create subplot 3 in the figure
    # ax4 = plt.subplot(224) # create subplot 4 in the figure
    # # ax.axis("equal") # set the X/Y aspect ratio for display
    for loop in range(10000):
        for epoch in range(30):
            # t = int(random.uniform(0,1)*10000000)%len(x)
            for t in range(len(X)):
                for u in range(len(Y)):
                    inputLayer.setInputAndForward([X[t], Y[u]])
                    loss.minimize(z[t][u], speed)
            if (epoch == 1):  # True:#True:#
                for t in range(len(subX)):
                    for u in range(len(subY)):
                        inputLayer.setInputAndForward([subX[t], subY[u]])
                        for j in range(len(hideLayer1.cells)):
                            hideDeltaWeightZ[j][t] = hideLayer1.cells[j].delta_w[0]
                            hideDeltaBiasZ[j][t] = hideLayer1.cells[j].delta_b
                            outDeltaWeightZ[j][t] = outputLayer.cells[0].delta_w[j]
                            outWeightZ[j][t] = outputLayer.cells[0].w[j]
                            n, m = t, u
                            hideOutZ[j][n][m] = hideLayer1.cells[j].out * outputLayer.cells[0].w[j]
                        _z[n][m] = outputLayer.cells[0].sum
                        hideOutZ[hideCellNum][n][m] = outputLayer.cells[0].b
                ax1.clear()
                ax2.clear()
                # target function, network output, and each hidden neuron's output times its output weight
                ax1.set_title("sub loop:" + str(loop) + " Cell:" + str(hideCellNum))
                ax2.plot_surface(x, y, np.array(orig_z))
                ax2.set_title("result loop:" + str(loop) + " Cell:" + str(hideCellNum))
                ax2.plot_surface(subx, suby, _z)  # , rstride=1, cstride=1, cmap="rainbow"
                for j in range(len(hideOutZ)):
                    ax1.plot_surface(subx, suby, hideOutZ[j])
                # ax1.clear()
                # ax1.set_title("result loop:" + str(loop) + " Cell:" + str(hideCellNum))
                # ax2.clear()
                # ax2.set_title("hide layer △w")
                # ax3.clear()
                # ax3.set_title("hide layer △b")
                # ax4.clear()
                # ax4.set_title("target layer △w")
                # for j in range(len(hideOutZ)):
                #     ax1.plot(x, hideOutZ[j])
                #
                # ax1.plot(x, orig_y)
                # ax1.plot(x, _z)
                # ax1.plot([x[t],x[t]],[np.min(_z[t]),np.max(y[t])])
                # for j in range(len(hideDeltaWeightZ)):
                #     ax2.plot(x, hideDeltaWeightZ[j])
                #     ax3.plot(x, hideDeltaBiasZ[j])
                #     # ax4.plot(x, outWeightZ[j])
                #     ax4.plot(x, outDeltaWeightZ[j])
                #
                # ax2.plot([x[t], x[t]], [np.min(hideDeltaWeightZ), np.max(hideDeltaWeightZ)])
                # ax3.plot([x[t], x[t]], [np.min(hideDeltaBiasZ), np.max(hideDeltaBiasZ)])
                plt.pause(0.1)


def walk_dir(dir, filter=None, topdown=True):
    from PIL import Image  # Pillow is needed only by this helper
    points = []
    number_2 = [[1, 0, 0, 0, 1],
                [1, 1, 1, 0, 1],
                [1, 0, 0, 0, 1],
                [1, 0, 1, 1, 1],
                [1, 0, 0, 0, 1]]
    for root, dirs, files in os.walk(dir, topdown):
        for name in files:
            if not filter or os.path.splitext(name)[1] in filter:
                path = os.path.join(root, name)
                im = Image.open(path)
                im = im.convert("L")
                width = im.width
                height = im.height
                integer = 0
                for i in range(width):
                    for j in range(height):
                        value = 1 if im.getpixel((i, j)) > 128 else 0
                        integer += 0 if value == number_2[j][i] else 1
                # points.append({"x": 1.0*integer/(1<< (width*height)),"y":int(root[-1])})
                points.append({"x": integer, "y": int(root[-1])})
    points.sort(key=lambda d: d["x"])
    x = []
    y = []
    for point in points:
        # x.append(math.log(1 + point["x"],2))
        x.append(point["x"])
        y.append(point["y"])
    return (x, y)


if __name__ == "__main__":
    # run3DDraw()
    # run3DDraw()
    # run3D_DNN()
    # run2D_DNN()
    run2D_DNN()