Deep Learning Basics 3: Quiz Questions and Solutions for Implementing a Logistic Regression Model with a Neural Network

This column is edited and updated on an ongoing basis. If the article you are reading looks unfinished, the author is most likely in the middle of an update or did not finish the previous one; please be patient. The normal pace is one new article per day.

This article is part of a set of study notes for Andrew Ng's Coursera course series "Deep Learning"; this installment covers the Week 2 quiz. It was first published in the Zhihu column 「深度學習+自然語言處理(NLP)」.

The goal of this series is to clarify some of the basic concepts of deep learning.

The main text follows:

====================================================================

First, an explanation of the title: implementing a logistic regression model with a neural network.

This can be understood as using a neural network to solve a logistic regression problem.

To be more concrete: suppose we have m training examples (x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m)).

Each example's y is a binary judgment (either 0 or 1), which is why this is called a logistic regression problem.

We want to build a neural network model to solve this problem, that is, to use neural network methods to determine the parameters of the logistic regression model.

Below are the solutions to this week's 10 quiz questions:

1. What does a neuron compute?

  • A neuron computes the mean of all features before applying the output to an activation function
  • A neuron computes a function g that scales the input x linearly (Wx + b)
  • A neuron computes an activation function followed by a linear function (z = Wx + b)
  • A neuron computes a linear function (z = Wx + b) followed by an activation function

A neuron first computes a linear function, then an activation function.

That is, given an input x, it first computes z = Wx + b, then feeds z into sigmoid(z); here we are assuming the activation function is the sigmoid.

So the answer is d (the fourth option).
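To make this concrete, here is a minimal NumPy sketch of a single neuron with a sigmoid activation (the shapes and values below are made up for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A single neuron: linear function z = Wx + b, then an activation function.
W = np.random.randn(1, 3)   # weights for 3 input features
b = 0.5                     # bias
x = np.random.randn(3, 1)   # one example with 3 features

z = np.dot(W, x) + b        # linear part, shape (1, 1)
a = sigmoid(z)              # activation part, a value in (0, 1)
print(a)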

2. Which of these is the "Logistic Loss"?

  • L^(i)(ŷ^(i), y^(i)) = |y^(i) - ŷ^(i)|
  • L^(i)(ŷ^(i), y^(i)) = -(y^(i) log(ŷ^(i)) + (1 - y^(i)) log(1 - ŷ^(i)))
  • L^(i)(ŷ^(i), y^(i)) = |y^(i) - ŷ^(i)|^2
  • L^(i)(ŷ^(i), y^(i)) = max(0, y^(i) - ŷ^(i))

This question is about the loss function. For logistic regression, the loss is the cross-entropy L(ŷ, y) = -(y log(ŷ) + (1 - y) log(1 - ŷ)), which is exactly option b (the derivation is given in the optional course video).
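For reference, a minimal sketch of computing this loss in NumPy, vectorized over m examples (the labels and predictions below are made-up values):

import numpy as np

def logistic_loss(y_hat, y):
    # Cross-entropy loss, averaged over the m examples.
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y     = np.array([1, 0, 1, 1])           # true labels
y_hat = np.array([0.9, 0.2, 0.7, 0.99])  # predicted probabilities
print(logistic_loss(y_hat, y))           # small value: good predictions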

3. Suppose img is a (32,32,3) array, representing a 32x32 image with 3 color channels red, green and blue. How do you reshape this into a column vector?

  • x = img.reshape((3,32*32))
  • x = img.reshape((32*32,3))
  • x = img.reshape((32*32*3,1))
  • x = img.reshape((1,32*32*3))

This question asks how to turn an image into an input column vector. Since the result is a column vector, the number of columns is 1 and the number of rows equals the total number of pixel values, 32*32*3. The answer is c (the third option).
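A quick check, with a random array standing in for a real image:

import numpy as np

img = np.random.randn(32, 32, 3)   # stand-in image: 32x32 pixels, 3 channels
x = img.reshape((32 * 32 * 3, 1))  # flatten into a single column vector
print(x.shape)                     # (3072, 1)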

4. Consider the two following random arrays "a" and "b":

a = np.random.randn(2, 3) # a.shape = (2, 3)
b = np.random.randn(2, 1) # b.shape = (2, 1)
c = a + b

What will be the shape of "c"?

  • c.shape = (2, 3)
  • c.shape = (3, 2)
  • c.shape = (2, 1)
  • The computation cannot happen because the sizes don't match. It's going to be "Error"!

This question involves broadcasting: b is first expanded into a (2, 3) matrix and then added to a, so c.shape = (2, 3), the first option.
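A quick check of the result:

import numpy as np

a = np.random.randn(2, 3)
b = np.random.randn(2, 1)
c = a + b       # b's single column is broadcast across a's 3 columns
print(c.shape)  # (2, 3)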

5. Consider the two following random arrays "a" and "b":

a = np.random.randn(4, 3) # a.shape = (4, 3)
b = np.random.randn(3, 2) # b.shape = (3, 2)
c = a*b

What will be the shape of "c"?

  • The computation cannot happen because the sizes don't match. It's going to be "Error"!
  • c.shape = (4,2)
  • c.shape = (4, 3)
  • c.shape = (3, 3)

This is again about broadcasting; here the sizes of a and b are incompatible, so broadcasting cannot be applied.

To elaborate: in each dimension, the sizes of a and b must either be equal, or one of them must be 1.

For example, a has 4 rows, so b would need either 4 rows or 1 row; here b has 3 rows, so the sizes don't match. The answer is a.
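A minimal sketch of the failure, plus a shape that would broadcast:

import numpy as np

a = np.random.randn(4, 3)
b = np.random.randn(3, 2)
try:
    c = a * b           # 4 vs 3 rows, 3 vs 2 columns: incompatible
except ValueError as e:
    print("Error:", e)  # operands could not be broadcast together

ok = a * np.random.randn(4, 1)  # size 1 broadcasts against 3 columns
print(ok.shape)                 # (4, 3)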

6. Suppose you have n_x input features per example. Recall that X = [x^(1) x^(2) ... x^(m)]. What is the dimension of X?

  • (n_x, m)
  • (m, n_x)
  • (m, 1)
  • (1, m)

The answer is a: each example x^(i) is a column vector with n_x rows, and X places these m columns side by side, so its dimension is (n_x, m).
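A minimal sketch of assembling X from individual examples (illustrative sizes; np.column_stack puts each example into its own column):

import numpy as np

nx, m = 5, 4                                 # 5 features, 4 examples
examples = [np.random.randn(nx) for _ in range(m)]
X = np.column_stack(examples)                # one column per example
print(X.shape)                               # (5, 4), i.e. (nx, m)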

7。Recall that "np.dot(a,b)" performs a matrix multiplication on a and b, whereas "a*b" performs an element-wise multiplication.

Consider the two following random arrays "a" and "b":

a = np.random.randn(12288, 150) # a.shape = (12288, 150)
b = np.random.randn(150, 45) # b.shape = (150, 45)
c = np.dot(a,b)

What is the shape of c?

  • The computation cannot happen because the sizes don't match. It's going to be "Error"!
  • c.shape = (150,150)
  • c.shape = (12288, 45)
  • c.shape = (12288, 150)

Note how this question differs from the earlier ones about broadcasting: the element-wise operators +, -, *, / follow the broadcasting rules, whereas np.dot() performs true matrix multiplication and follows the rules of matrix algebra.

Here the number of columns of a equals the number of rows of b, so the matrix product is defined, and c.shape = (12288, 45). The answer is c.
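A small sketch contrasting the two operations (smaller shapes than in the question, for readability):

import numpy as np

a = np.random.randn(4, 3)
b = np.random.randn(3, 2)

c = np.dot(a, b)   # matrix product: (4, 3) x (3, 2) -> (4, 2)
print(c.shape)

try:
    d = a * b      # element-wise: these shapes do not broadcast
except ValueError as e:
    print("Error:", e)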

8. Consider the following code snippet:

# a.shape = (3,4)
# b.shape = (4,1)

for i in range(3):
    for j in range(4):
        c[i][j] = a[i][j] + b[j]

How do you vectorize this?

  • c = a + b
  • c = a.T + b.T
  • c = a + b.T
  • c = a.T + b

The question is how to replace the for loops with a vectorized expression. The trick is to look at the shape of c, which is (3, 4), and then work out which two arrays add up to that shape under the broadcasting rules: a is (3, 4) and b.T is (1, 4), so a + b.T broadcasts to (3, 4). The answer is c.
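A quick check that the vectorized form matches the loop (random illustrative inputs):

import numpy as np

a = np.random.randn(3, 4)
b = np.random.randn(4, 1)

# Loop version from the question.
c_loop = np.zeros((3, 4))
for i in range(3):
    for j in range(4):
        c_loop[i][j] = a[i][j] + b[j]

# Vectorized version: b.T has shape (1, 4) and broadcasts over a's rows.
c_vec = a + b.T
print(np.allclose(c_loop, c_vec))  # True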

9. Consider the following code:

a = np.random.randn(3, 3)
b = np.random.randn(3, 1)
c = a*b

What will be c? (If you're not sure, feel free to run this in python to find out).

  • This will invoke broadcasting, so b is copied three times to become (3, 3), and * is an element-wise product so c.shape will be (3, 3)
  • This will invoke broadcasting, so b is copied three times to become (3, 3), and * invokes a matrix multiplication operation of two 3x3 matrices so c.shape will be (3, 3)
  • This will multiply a 3x3 matrix a with a 3x1 vector, thus resulting in a 3x1 vector. That is, c.shape = (3,1).
  • It will lead to an error since you cannot use "*" to operate on these two matrices. You need to instead use np.dot(a,b)

This again tests the understanding of broadcasting: b is expanded into a (3, 3) matrix, which is then multiplied with a element-wise, not as a matrix product. The answer is a.
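A quick check of the element-wise behavior:

import numpy as np

a = np.random.randn(3, 3)
b = np.random.randn(3, 1)

c = a * b                                      # element-wise, with broadcasting
print(c.shape)                                 # (3, 3)
print(np.allclose(c, a * np.tile(b, (1, 3))))  # True: b's column is repeated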

10. Consider the following computation graph.

What is the output J?

  • J = (c - 1)*(b + a)
  • J = (a - 1) * (b + c)
  • J = a*b + b*c + a*c
  • J = (b - 1) * (c + a)

The graph (its image is not reproduced here) computes u = a*b, v = a*c, w = b + c, and finally J = u + v - w. Expanding gives J = ab + ac - (b + c) = a(b + c) - (b + c) = (a - 1)(b + c), so the answer is b (the second option).
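Assuming the graph structure described above, a quick numerical check of the simplification with random values for a, b, and c:

import numpy as np

a, b, c = np.random.randn(3)

u = a * b
v = a * c
w = b + c
J = u + v - w

print(np.isclose(J, (a - 1) * (b + c)))  # True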

You are welcome to follow the column 「深度學習+自然語言處理(NLP)」, where you will find more worthwhile articles!


Recommended reading:

It's All Fake --- The Ultimate Complete Guide to Generative Adversarial Networks (GAN), Part 1
Why You Need Computational Neuroscience
Andrew NG Deep Learning Course Notes: Neural Networks, Supervised Learning, and Deep Learning
A Brief Look at the Exploding Gradient Problem in Neural Networks
