NLP tools: Chinese word2vec open-source projects, tutorials, and datasets
Chinese word2vec
Open-source projects
Chinese word vectors
This project trains Chinese word vectors with the Word2vec and GloVe tools, using a Wikipedia dump as the corpus.
https://github.com/candlewill/Chinsese_word_vectors
wordvectors
Pre-trained word vectors for 30+ languages
https://github.com/Kyubyong/wordvectors
chinese-word2vec
word2vec/GloVe/Swivel binary files trained on a Chinese corpus
https://github.com/to-shimo/chinese-word2vec
Tutorials
Exploring word similarity in a Wikipedia corpus
gensim | 我愛自然語言處理 (I Love Natural Language Processing)
Clustering keywords with word2vec
CSDN Blog
Training Word2Vec Model on English Wikipedia by Gensim
Datasets
wiki
https://dumps.wikimedia.org/zhwiki/latest/zhwiki-latest-pages-articles.xml.bz2
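The dump above is a bz2-compressed XML file, so it can be read as a stream without unpacking it to disk first. The following is a minimal sketch using only the Python standard library. The function name `iter_article_texts` is hypothetical, and the code assumes the usual MediaWiki export layout (a `page` element containing `title` and `revision/text`). The text it yields is raw wiki markup, so it would still need cleaning and Chinese word segmentation before being used to train word vectors.

```python
import bz2
import xml.etree.ElementTree as ET


def iter_article_texts(dump_path):
    """Yield (title, wikitext) pairs from a bz2-compressed MediaWiki XML dump.

    Hypothetical helper for illustration; assumes each <page> holds a <title>
    followed by a <revision> that contains a <text> element.
    """
    with bz2.open(dump_path, "rb") as stream:
        title = None
        # iterparse reads the file incrementally instead of loading it whole.
        for _event, elem in ET.iterparse(stream, events=("end",)):
            # Dump elements carry an XML namespace, so compare local names only.
            tag = elem.tag.rsplit("}", 1)[-1]
            if tag == "title":
                title = elem.text
            elif tag == "text":
                yield title, elem.text or ""
            elif tag == "page":
                # Drop the finished page's content to limit memory use.
                elem.clear()
```

A typical use would be to loop over `iter_article_texts("zhwiki-latest-pages-articles.xml.bz2")` and write the cleaned, segmented text of each article to a corpus file, one article per line.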
sogou
Sogou Labs (搜狗實驗室)
More machine learning resources and tutorials: http://www.tensorflownews.com/