Deep Learning Fundamentals 3: Quiz Questions and Solutions on Implementing a Logistic Regression Model with a Neural Network
This article is part of my notes on Andrew Ng's Coursera course "Deep Learning"; this installment covers the Week 2 material and was first published in the Zhihu column 「深度學習+自然語言處理(NLP)」.
The goal of this series is to clarify some of the basic concepts in deep learning.
The main text follows:
====================================================================
First, a word about the title: implementing a logistic regression model with a neural network.
This can be understood as using a neural network to solve a logistic regression problem.
To explain further: suppose we have m training examples (x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m)).
Each label y^(i) is a binary judgment (either 0 or 1), which is why this is called a logistic regression problem.
We want to build a neural network model to solve this problem, that is, to use neural network methods to determine the parameters of the logistic regression model.
Below are the solutions to this week's 10 quiz questions:
1. What does a neuron compute?
- A neuron computes the mean of all features before applying the output to an activation function
- A neuron computes a function g that scales the input x linearly (Wx + b)
- A neuron computes an activation function followed by a linear function (z = Wx + b)
- A neuron computes a linear function (z = Wx + b) followed by an activation function
A neuron first computes a linear function, then an activation function.
That is, given an input x, it first computes z = Wx + b, then feeds z into the activation function to get sigmoid(z); here we assume the activation function is the sigmoid.
So the answer is d (the fourth option).
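As a minimal sketch of this two-step computation in NumPy (the shapes and the sigmoid choice here are illustrative assumptions, not part of the quiz):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.random.randn(3, 1)   # input: 3 features as a column vector
W = np.random.randn(1, 3)   # weights
b = 0.5                     # bias
z = np.dot(W, x) + b        # linear step first: z = Wx + b
a = sigmoid(z)              # then the activation step
print(a.shape)              # (1, 1)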
2. Which of these is the "Logistic Loss"?
- L^(i)(ŷ^(i), y^(i)) = |y^(i) - ŷ^(i)|
- L^(i)(ŷ^(i), y^(i)) = -(y^(i) log(ŷ^(i)) + (1 - y^(i)) log(1 - ŷ^(i)))
- L^(i)(ŷ^(i), y^(i)) = |y^(i) - ŷ^(i)|^2
- L^(i)(ŷ^(i), y^(i)) = max(0, y^(i) - ŷ^(i))
This question is about the loss function. For logistic regression, the loss function is L(ŷ, y) = -(y log(ŷ) + (1 - y) log(1 - ŷ)), which gives answer b (the derivation is covered in the optional course video).
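Here is a small sketch of this loss in NumPy (the eps clipping is my own addition for numerical safety, not part of the quiz):

import numpy as np

def logistic_loss(y_hat, y, eps=1e-12):
    # L = -(y*log(y_hat) + (1-y)*log(1-y_hat)); clip to avoid log(0)
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

print(logistic_loss(0.9, 1))  # small loss: confident and correct
print(logistic_loss(0.9, 0))  # large loss: confident and wrong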
3. Suppose img is a (32,32,3) array, representing a 32x32 image with 3 color channels red, green and blue. How do you reshape this into a column vector?
- x = img.reshape((3,32*32))
- x = img.reshape((32*32,3))
- x = img.reshape((32*32*3,1))
- x = img.reshape((1,32*32*3))
This question is about turning an image into an input column vector. Since it is a column vector, the number of columns is 1, and the number of rows equals the total number of pixel values, 32*32*3. The answer is c (the third option).
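A quick check of the shapes, using a random array as a stand-in for a real image:

import numpy as np

img = np.random.randn(32, 32, 3)   # stand-in for a real image
x = img.reshape((32 * 32 * 3, 1))
print(x.shape)                     # (3072, 1): one column, 32*32*3 rows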
4. Consider the two following random arrays "a" and "b":
a = np.random.randn(2, 3)  # a.shape = (2, 3)
b = np.random.randn(2, 1)  # b.shape = (2, 1)
c = a + b
What will be the shape of "c"?
- c.shape = (2, 3)
- c.shape = (3, 2)
- c.shape = (2, 1)
- The computation cannot happen because the sizes don't match. It's going to be "Error"!
This question involves the concept of broadcasting: b is first broadcast (expanded) to a (2, 3) matrix and then added to a element-wise, so c.shape = (2, 3), and the answer is a (the first option).
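Running the snippet confirms the broadcast shape:

import numpy as np

a = np.random.randn(2, 3)
b = np.random.randn(2, 1)
c = a + b       # b's single column is stretched across a's 3 columns
print(c.shape)  # (2, 3)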
5. Consider the two following random arrays "a" and "b":
a = np.random.randn(4, 3)  # a.shape = (4, 3)
b = np.random.randn(3, 2)  # b.shape = (3, 2)
c = a*b
What will be the shape of "c"?
- The computation cannot happen because the sizes don't match. It's going to be "Error"!
- c.shape = (4,2)
- c.shape = (4, 3)
- c.shape = (3, 3)
This is again about broadcasting. Here the shapes of a and b are incompatible, so broadcasting cannot be applied.
To explain further: along each dimension, the sizes of a and b must either be equal, or one of them must be 1.
For example, a has 4 rows, so b must have either 4 rows or 1 row; here b has 3 rows, so the sizes don't match. The answer is a.
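Running the snippet confirms the mismatch; the exact error message may vary by NumPy version:

import numpy as np

a = np.random.randn(4, 3)
b = np.random.randn(3, 2)
try:
    c = a * b
except ValueError as e:
    print(e)  # operands could not be broadcast together with shapes (4,3) (3,2)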
6. Suppose you have n_x input features per example. Recall that X = [x^(1) x^(2) ... x^(m)]. What is the dimension of X?
- (n_x, m)
- (m, n_x)
- (m, 1)
- (1, m)
The answer is a: each example x^(i) is a column vector with n_x rows, and X has m such columns, so its dimension is (n_x, m).
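As an illustration, here is a hypothetical toy setup with n_x = 3 features and m = 5 examples:

import numpy as np

n_x, m = 3, 5
examples = [np.random.randn(n_x, 1) for _ in range(m)]  # m column vectors x^(i)
X = np.hstack(examples)                                 # place them side by side
print(X.shape)                                          # (3, 5), i.e. (n_x, m)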
7。Recall that "np.dot(a,b)" performs a matrix multiplication on a and b, whereas "a*b" performs an element-wise multiplication.
Consider the two following random arrays "a" and "b":
a = np.random.randn(12288, 150)  # a.shape = (12288, 150)
b = np.random.randn(150, 45)  # b.shape = (150, 45)
c = np.dot(a,b)
What is the shape of c?
- The computation cannot happen because the sizes don't match. It's going to be "Error"!
- c.shape = (150,150)
- c.shape = (12288, 45)
- c.shape = (12288, 150)
Note how this question differs from the earlier broadcasting questions: the operators +, -, *, / are applied element-wise and follow the broadcasting rules, whereas functions such as np.dot() follow the rules of matrix multiplication.
Here the number of columns of a equals the number of rows of b, which satisfies the matrix multiplication rule, so c.shape = (12288, 45). The answer is c.
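A side-by-side comparison of the two rules (the array shapes are my own toy choices):

import numpy as np

a = np.random.randn(4, 3)
b = np.random.randn(3, 2)
print(np.dot(a, b).shape)  # (4, 2): matrix product, inner dimensions must match

p = np.random.randn(2, 3)
q = np.random.randn(2, 1)
print((p * q).shape)       # (2, 3): element-wise product with broadcasting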
8. Consider the following code snippet:
# a.shape = (3,4)
# b.shape = (4,1)

for i in range(3):
    for j in range(4):
        c[i][j] = a[i][j] + b[j]
How do you vectorize this?
- c = a + b
- c = a.T + b.T
- c = a + b.T
- c = a.T + b
The question is how to replace the for loops with a vectorized (matrix) operation. The trick is to look at the shape of c, here (3, 4), and work out which two arrays can be added under the broadcasting rules to produce that shape: b.T has shape (1, 4), so a + b.T broadcasts to (3, 4). The answer is c.
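A minimal sketch verifying that the vectorized form matches the loop (b[j, 0] is used instead of b[j] to index the scalar explicitly):

import numpy as np

a = np.random.randn(3, 4)
b = np.random.randn(4, 1)

# Loop version from the question
c_loop = np.zeros((3, 4))
for i in range(3):
    for j in range(4):
        c_loop[i][j] = a[i][j] + b[j, 0]

# Vectorized version: b.T has shape (1, 4) and broadcasts over a's 3 rows
c_vec = a + b.T
print(np.allclose(c_loop, c_vec))  # True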
9. Consider the following code:
a = np.random.randn(3, 3)
b = np.random.randn(3, 1)
c = a*b
What will be c? (If you're not sure, feel free to run this in python to find out).
- This will invoke broadcasting, so b is copied three times to become (3, 3), and * is an element-wise product so c.shape will be (3, 3)
- This will invoke broadcasting, so b is copied three times to become (3, 3), and * invokes a matrix multiplication operation of two 3x3 matrices so c.shape will be (3, 3)
- This will multiply a 3x3 matrix a with a 3x1 vector, thus resulting in a 3x1 vector. That is, c.shape = (3, 1).
- It will lead to an error since you cannot use "*" to operate on these two matrices. You need to instead use np.dot(a,b)
This is again about understanding broadcasting: b is broadcast (expanded) to a (3, 3) matrix, which is then multiplied with a element-wise, not via matrix multiplication. The answer is a.
10. Consider the following computation graph.
What is the output J?
- J = (c - 1)*(b + a)
- J = (a - 1)*(b + c)
- J = a*b + b*c + a*c
- J = (b - 1)*(c + a)
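The graph image is not reproduced in this text. In the corresponding quiz, the graph computes u = a*b, v = a*c, w = b + c and outputs J = u + v - w, which factors as a(b + c) - (b + c) = (a - 1)(b + c), so the answer is b (the second option). A quick numerical check of that factorization:

import numpy as np

# Assumed graph from the original quiz: u = a*b, v = a*c, w = b + c, J = u + v - w
a, b, c = np.random.randn(3)
J_graph = a * b + a * c - (b + c)
J_factored = (a - 1) * (b + c)
print(np.isclose(J_graph, J_factored))  # True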
Welcome to follow the column 「深度學習+自然語言處理(NLP)」 (Deep Learning + NLP), where you can find more articles like this!