Deep Learning Basics 3: Quiz Questions and Solutions for Implementing a Logistic Regression Model with a Neural Network

This column is edited and updated on an ongoing basis. If the article you are reading looks unfinished, the author is most likely in the middle of an update or did not finish the previous one; please be patient. The normal pace is one new article per day.

This article is part of a set of study notes for Andrew Ng's Coursera course series "Deep Learning"; this installment covers the Week 2 quiz. It was first published in the Zhihu column 「深度學習+自然語言處理(NLP)」.

The goal of this series is to clarify some of the basic concepts of deep learning.

The main text follows:

====================================================================

First, an explanation of the title: implementing a logistic regression model with a neural network.

This can be understood as using a neural network to solve a logistic regression problem.

To be more concrete: suppose we have m training examples (x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m)).

Each example's y is a binary judgment (either 0 or 1), which is why this is called a logistic regression problem.

We want to build a neural network model to solve this problem, that is, to use neural network methods to determine the parameters of the logistic regression model.

Below are the solutions to this week's 10 quiz questions:

1. What does a neuron compute?

  • A neuron computes the mean of all features before applying the output to an activation function
  • A neuron computes a function g that scales the input x linearly (Wx + b)
  • A neuron computes an activation function followed by a linear function (z = Wx + b)
  • A neuron computes a linear function (z = Wx + b) followed by an activation function

A neuron first computes a linear function, then an activation function.

That is, given an input x, it first computes z = Wx + b, then feeds z into sigmoid(z); here we are assuming the activation function is the sigmoid.

So the answer is d (the fourth option).
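To make this concrete, here is a minimal NumPy sketch of a single neuron with a sigmoid activation (the shapes and values below are made up for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A single neuron: linear function z = Wx + b, then an activation function.
W = np.random.randn(1, 3)   # weights for 3 input features
b = 0.5                     # bias
x = np.random.randn(3, 1)   # one example with 3 features

z = np.dot(W, x) + b        # linear part, shape (1, 1)
a = sigmoid(z)              # activation part, a value in (0, 1)
print(a)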

2. Which of these is the "Logistic Loss"?

  • L^(i)(ŷ^(i), y^(i)) = |y^(i) - ŷ^(i)|
  • L^(i)(ŷ^(i), y^(i)) = -(y^(i) log(ŷ^(i)) + (1 - y^(i)) log(1 - ŷ^(i)))
  • L^(i)(ŷ^(i), y^(i)) = |y^(i) - ŷ^(i)|^2
  • L^(i)(ŷ^(i), y^(i)) = max(0, y^(i) - ŷ^(i))

This question is about the loss function. For logistic regression, the loss is the cross-entropy L(ŷ, y) = -(y log(ŷ) + (1 - y) log(1 - ŷ)), which is exactly option b (the derivation is given in the optional course video).
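For reference, a minimal sketch of computing this loss in NumPy, vectorized over m examples (the labels and predictions below are made-up values):

import numpy as np

def logistic_loss(y_hat, y):
    # Cross-entropy loss, averaged over the m examples.
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y     = np.array([1, 0, 1, 1])           # true labels
y_hat = np.array([0.9, 0.2, 0.7, 0.99])  # predicted probabilities
print(logistic_loss(y_hat, y))           # small value: good predictions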

3. Suppose img is a (32,32,3) array, representing a 32x32 image with 3 color channels red, green and blue. How do you reshape this into a column vector?

  • x = img.reshape((3,32*32))
  • x = img.reshape((32*32,3))
  • x = img.reshape((32*32*3,1))
  • x = img.reshape((1,32*32*3))

This question asks how to turn an image into an input column vector. Since the result is a column vector, the number of columns is 1 and the number of rows equals the total number of pixel values, 32*32*3. The answer is c (the third option).
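A quick check, with a random array standing in for a real image:

import numpy as np

img = np.random.randn(32, 32, 3)   # stand-in image: 32x32 pixels, 3 channels
x = img.reshape((32 * 32 * 3, 1))  # flatten into a single column vector
print(x.shape)                     # (3072, 1)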

4. Consider the two following random arrays "a" and "b":

a = np.random.randn(2, 3) # a.shape = (2, 3)
b = np.random.randn(2, 1) # b.shape = (2, 1)
c = a + b

What will be the shape of "c"?

  • c.shape = (2, 3)
  • c.shape = (3, 2)
  • c.shape = (2, 1)
  • The computation cannot happen because the sizes don't match. It's going to be "Error"!

This question involves broadcasting: b is first expanded into a (2, 3) matrix and then added to a, so c.shape = (2, 3), the first option.
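A quick check of the result:

import numpy as np

a = np.random.randn(2, 3)
b = np.random.randn(2, 1)
c = a + b       # b's single column is broadcast across a's 3 columns
print(c.shape)  # (2, 3)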

5. Consider the two following random arrays "a" and "b":

a = np.random.randn(4, 3) # a.shape = (4, 3)
b = np.random.randn(3, 2) # b.shape = (3, 2)
c = a*b

What will be the shape of "c"?

  • The computation cannot happen because the sizes don't match. It's going to be "Error"!
  • c.shape = (4,2)
  • c.shape = (4, 3)
  • c.shape = (3, 3)

This is again about broadcasting; here the sizes of a and b are incompatible, so broadcasting cannot be applied.

To elaborate: in each dimension, the sizes of a and b must either be equal, or one of them must be 1.

For example, a has 4 rows, so b would need either 4 rows or 1 row; here b has 3 rows, so the sizes don't match. The answer is a.
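A minimal sketch of the failure, plus a shape that would broadcast:

import numpy as np

a = np.random.randn(4, 3)
b = np.random.randn(3, 2)
try:
    c = a * b           # 4 vs 3 rows, 3 vs 2 columns: incompatible
except ValueError as e:
    print("Error:", e)  # operands could not be broadcast together

ok = a * np.random.randn(4, 1)  # size 1 broadcasts against 3 columns
print(ok.shape)                 # (4, 3)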

6. Suppose you have n_x input features per example. Recall that X = [x^(1) x^(2) ... x^(m)]. What is the dimension of X?

  • (n_x, m)
  • (m, n_x)
  • (m, 1)
  • (1, m)

The answer is a: each example x^(i) is a column vector with n_x rows, and X places these m columns side by side, so its dimension is (n_x, m).
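A minimal sketch of assembling X from individual examples (illustrative sizes; np.column_stack puts each example into its own column):

import numpy as np

nx, m = 5, 4                                 # 5 features, 4 examples
examples = [np.random.randn(nx) for _ in range(m)]
X = np.column_stack(examples)                # one column per example
print(X.shape)                               # (5, 4), i.e. (nx, m)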

7。Recall that "np.dot(a,b)" performs a matrix multiplication on a and b, whereas "a*b" performs an element-wise multiplication.

Consider the two following random arrays "a" and "b":

a = np.random.randn(12288, 150) # a.shape = (12288, 150)
b = np.random.randn(150, 45) # b.shape = (150, 45)
c = np.dot(a,b)

What is the shape of c?

  • The computation cannot happen because the sizes don't match. It's going to be "Error"!
  • c.shape = (150,150)
  • c.shape = (12288, 45)
  • c.shape = (12288, 150)

Note how this question differs from the earlier ones about broadcasting: the element-wise operators +, -, *, / follow the broadcasting rules, whereas np.dot() performs true matrix multiplication and follows the rules of matrix algebra.

Here the number of columns of a equals the number of rows of b, so the matrix product is defined, and c.shape = (12288, 45). The answer is c.
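A small sketch contrasting the two operations (smaller shapes than in the question, for readability):

import numpy as np

a = np.random.randn(4, 3)
b = np.random.randn(3, 2)

c = np.dot(a, b)   # matrix product: (4, 3) x (3, 2) -> (4, 2)
print(c.shape)

try:
    d = a * b      # element-wise: these shapes do not broadcast
except ValueError as e:
    print("Error:", e)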

8. Consider the following code snippet:

# a.shape = (3,4)
# b.shape = (4,1)

for i in range(3):
    for j in range(4):
        c[i][j] = a[i][j] + b[j]

How do you vectorize this?

  • c = a + b
  • c = a.T + b.T
  • c = a + b.T
  • c = a.T + b

The question is how to replace the for loops with a vectorized expression. The trick is to look at the shape of c, which is (3, 4), and then work out which two arrays add up to that shape under the broadcasting rules: a is (3, 4) and b.T is (1, 4), so a + b.T broadcasts to (3, 4). The answer is c.
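A quick check that the vectorized form matches the loop (random illustrative inputs):

import numpy as np

a = np.random.randn(3, 4)
b = np.random.randn(4, 1)

# Loop version from the question.
c_loop = np.zeros((3, 4))
for i in range(3):
    for j in range(4):
        c_loop[i][j] = a[i][j] + b[j]

# Vectorized version: b.T has shape (1, 4) and broadcasts over a's rows.
c_vec = a + b.T
print(np.allclose(c_loop, c_vec))  # True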

9. Consider the following code:

a = np.random.randn(3, 3)
b = np.random.randn(3, 1)
c = a*b

What will be c? (If you're not sure, feel free to run this in python to find out).

  • This will invoke broadcasting, so b is copied three times to become (3, 3), and * is an element-wise product so c.shape will be (3, 3)
  • This will invoke broadcasting, so b is copied three times to become (3, 3), and * invokes a matrix multiplication operation of two 3x3 matrices so c.shape will be (3, 3)
  • This will multiply a 3x3 matrix a with a 3x1 vector, thus resulting in a 3x1 vector. That is, c.shape = (3,1).
  • It will lead to an error since you cannot use "*" to operate on these two matrices. You need to instead use np.dot(a,b)

This again tests the understanding of broadcasting: b is expanded into a (3, 3) matrix, which is then multiplied with a element-wise, not as a matrix product. The answer is a.
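A quick check of the element-wise behavior:

import numpy as np

a = np.random.randn(3, 3)
b = np.random.randn(3, 1)

c = a * b                                      # element-wise, with broadcasting
print(c.shape)                                 # (3, 3)
print(np.allclose(c, a * np.tile(b, (1, 3))))  # True: b's column is repeated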

10. Consider the following computation graph.

What is the output J?

  • J = (c - 1)*(b + a)
  • J = (a - 1) * (b + c)
  • J = a*b + b*c + a*c
  • J = (b - 1) * (c + a)

The graph (its image is not reproduced here) computes u = a*b, v = a*c, w = b + c, and finally J = u + v - w. Expanding gives J = ab + ac - (b + c) = a(b + c) - (b + c) = (a - 1)(b + c), so the answer is b (the second option).
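Assuming the graph structure described above, a quick numerical check of the simplification with random values for a, b, and c:

import numpy as np

a, b, c = np.random.randn(3)

u = a * b
v = a * c
w = b + c
J = u + v - w

print(np.isclose(J, (a - 1) * (b + c)))  # True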

You are welcome to follow the column 「深度學習+自然語言處理(NLP)」, where you will find more worthwhile articles!


Recommended reading:

It's All Fake --- The Ultimate Complete Guide to Generative Adversarial Networks (GAN), Part 1
Why You Need Computational Neuroscience
Andrew NG Deep Learning Course Notes: Neural Networks, Supervised Learning, and Deep Learning
A Brief Look at the Exploding Gradient Problem in Neural Networks
