CS224N Lecture 2 Notes
02-12
Main topics:
How is word meaning represented in WordNet?
What is a one-hot representation? What limitations does it have? (illustrated below)
What is the main idea of the skip-gram model (for word2vec)?
What is softmax? Why do we use it?
How do we train the model, i.e., optimize the negative log likelihood?
What is the gradient of the model? How do we interpret the compact form $u_o - \sum_x p(x \mid c)\, u_x$? (derived below)
What are the benefits of using SGD rather than GD? (see the training sketch below)
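
To make the one-hot limitation concrete (a standard illustration, not taken verbatim from the lecture): each word is a standard basis vector in $\mathbb{R}^{|V|}$, so every pair of distinct words is orthogonal and the representation carries no notion of similarity.

$$
w_{\text{motel}} = [0,\,0,\,\dots,\,1,\,\dots,\,0]^\top,\qquad
w_{\text{hotel}} = [0,\,\dots,\,1,\,\dots,\,0,\,0]^\top,\qquad
w_{\text{motel}}^\top w_{\text{hotel}} = 0 .
$$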
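For the softmax and gradient questions, here is the usual derivation, assuming the standard word2vec notation ($v_c$ for the center-word vector, $u_x$ for outside-word vectors, $V$ for the vocabulary, $o$ for the observed outside word):

$$
p(o \mid c) = \frac{\exp(u_o^\top v_c)}{\sum_{w \in V} \exp(u_w^\top v_c)},
\qquad
J = -\log p(o \mid c).
$$

Differentiating with respect to the center vector gives

$$
\frac{\partial J}{\partial v_c} = -\Big(u_o - \sum_{x \in V} p(x \mid c)\, u_x\Big).
$$

So the compact form $u_o - \sum_x p(x \mid c)\, u_x$ is the observed context vector minus the expected context vector under the current model; gradient descent moves $v_c$ toward $u_o$ and away from the model's current average prediction, and the gradient vanishes exactly when the model's expectation matches the observation.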
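A minimal runnable sketch of skip-gram trained with full softmax and per-pair SGD (the toy corpus, dimensions, learning rate, and variable names are all illustrative assumptions, not from the lecture):

```python
import numpy as np

# Toy corpus and vocabulary; real training uses billions of tokens.
corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(corpus))
w2i = {w: i for i, w in enumerate(vocab)}
V, dim, window, lr = len(vocab), 10, 2, 0.1

rng = np.random.default_rng(0)
v = rng.normal(scale=0.1, size=(V, dim))  # center-word vectors v_c
u = rng.normal(scale=0.1, size=(V, dim))  # outside-word vectors u_x

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

for epoch in range(200):
    loss = 0.0
    for i, center in enumerate(corpus):
        c = w2i[center]
        for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
            if j == i:
                continue
            o = w2i[corpus[j]]
            p = softmax(u @ v[c])      # p(x | c) for every x in the vocab
            loss += -np.log(p[o])      # negative log likelihood of the pair
            # grad of -log p(o|c) w.r.t. v_c: sum_x p(x|c) u_x - u_o
            grad_vc = p @ u - u[o]
            grad_u = np.outer(p, v[c])  # w.r.t. each u_x: p(x|c) v_c
            grad_u[o] -= v[c]           # extra -v_c for the observed word u_o
            v[c] -= lr * grad_vc        # SGD: update after each pair
            u -= lr * grad_u
    if epoch % 50 == 0:
        print(f"epoch {epoch}: loss {loss:.3f}")
```

On the GD-versus-SGD question: batch GD would accumulate the gradients over the whole corpus before taking a single step, while the sketch updates after every (center, outside) pair. On real corpora these cheap, noisy updates are what make training feasible, and the noise itself often helps escape poor regions of the loss surface.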