CS224N Lecture 3 Notes
02-12
Main topics:
Why do we need approximations of the original skip-gram formulation?
How is the problem mitigated by using negative sampling? (The two objectives are sketched after this list.)
How is CBOW different from skip-gram?
What are the limitations of using SVD on the co-occurrence matrix to get word vectors?
How does GloVe combine the advantages of count-based models and predictive models? (The GloVe objective is sketched below.)
How do we evaluate word vectors? What are the most commonly used tasks for intrinsic evaluation? (A minimal training-and-evaluation sketch follows the equations.)
Which factors could affect the quality of the learned vectors?
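
For reference on the first two questions, here is a sketch of the standard formulations from Mikolov et al. (2013). The full skip-gram softmax sums over the entire vocabulary for every training pair, which is what makes the original formulation expensive; negative sampling replaces it with one positive logistic term plus K sampled negative terms.

```latex
% Full skip-gram softmax: the denominator sums over the whole vocabulary,
% so each update costs O(|V|), which motivates the approximations.
P(o \mid c) = \frac{\exp(u_o^\top v_c)}{\sum_{w=1}^{|V|} \exp(u_w^\top v_c)}

% Negative sampling: \sigma is the logistic sigmoid and P_n(w) is the
% noise distribution (in practice the unigram distribution raised to 3/4).
J_{\text{neg}}(o, c) = \log \sigma(u_o^\top v_c)
  + \sum_{k=1}^{K} \mathbb{E}_{j \sim P_n(w)} \left[ \log \sigma(-u_j^\top v_c) \right]
```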
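And a sketch of the GloVe objective (Pennington et al., 2014), which is the answer to the count-based vs. predictive question: it is trained directly on the global co-occurrence counts X_ij rather than on a streaming prediction task, with a weighting function f that caps the influence of very frequent pairs (the paper uses x_max = 100 and alpha = 3/4).

```latex
J = \sum_{i,j=1}^{|V|} f(X_{ij})
    \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
\qquad
f(x) = \begin{cases} (x / x_{\max})^{\alpha} & x < x_{\max} \\ 1 & \text{otherwise} \end{cases}
```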
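Finally, a minimal runnable sketch of the practical side of these questions, assuming gensim >= 4.0; the toy corpus and all hyperparameter values are illustrative, not from the lecture. The `sg` flag switches between CBOW and skip-gram, `negative` sets the number of negative samples, and the analogy query at the end is the standard intrinsic evaluation task.

```python
from gensim.models import Word2Vec

# Toy corpus: each sentence is a pre-tokenized list of words.
# A real setup would use a large corpus such as text8 or Wikipedia.
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["a", "man", "walks", "in", "the", "city"],
    ["a", "woman", "walks", "in", "the", "city"],
]

# sg=1 trains skip-gram (predict context words from the center word);
# sg=0 would train CBOW (predict the center word from averaged context).
model = Word2Vec(
    sentences,
    vector_size=50,   # dimensionality of the word vectors
    window=2,         # context window size
    min_count=1,      # keep every word in this tiny corpus
    sg=1,             # 1 = skip-gram, 0 = CBOW
    negative=5,       # number of negative samples (0 disables it)
    epochs=50,        # extra passes, since the corpus is tiny
)

# Intrinsic evaluation by vector arithmetic: king - man + woman ~= queen.
# On a corpus this small the result is not reliable; the call just shows
# the standard analogy-style query interface.
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```

In practice the last question on the list largely comes down to the knobs visible here: the vector dimensionality, the window size, the number of training passes, and above all the size and domain of the training corpus.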