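The coolest collection of deep learning chatbot resources
A curated list of seq2seq / chatbot resources.
Built on the original chatbot-links list, with additional Chinese-language resources, and kept mainly as a personal collection. If you are interested, help build a more complete list so that newcomers can get started easily: send a PR or open an Issue to add resources of any kind. Your own projects are welcome.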
Codes
DeepQA
Framework: Tensorflow
Demo result:
Hi → Hi.
What is your name ? → Laura.
What does that mean ? → I dunno.
How old are you ? → thirty-five.
Who is Laura ? → My brother.
Say goodbye → Alright.
Two plus two → manny...
tf_seq2seq_chatbot
Framework: Tensorflow
No answer randomisation is implemented in this code, so the model answers the same way each time. The only addition is a sparse layer at the end of the model.
hello baby → hello
how old are you ? → twenty .
i am lonely → i am not
nice → you re not going to be okay .
so rude → i m sorry .
are you a robot or human? → no .
are you better than siri? → yes .
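To illustrate what "answer randomisation" means here, the sketch below contrasts greedy decoding (always pick the top-scoring token, so the reply never changes) with temperature sampling (draw a token in proportion to its probability). This is not code from tf_seq2seq_chatbot; the toy vocabulary, the scores, and the function names are all assumptions made for illustration.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_greedy(logits):
    """Deterministic: always return the index of the highest score."""
    return max(range(len(logits)), key=lambda i: logits[i])


def pick_sampled(logits, temperature=1.0, rng=random):
    """Stochastic: draw an index in proportion to its softmax probability."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical scores for one decoding step over a toy vocabulary.
vocab = ["hello", "hi", "hey", "yes"]
logits = [2.0, 1.8, 1.5, -1.0]

print(vocab[pick_greedy(logits)])  # the same word on every call

rng = random.Random(0)  # seeded only so the demo is repeatable
print([vocab[pick_sampled(logits, 0.8, rng)] for _ in range(5)])
```

In a real decoder the same choice is made at every output step, so sampling lets the same prompt produce different replies from one run to the next.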
machine translation model
Framework: Tensorflow
Google's official seq2seq implementation, with an attention mechanism (Luong et al., 2015). Originally built for language translation, it can also be used for simple Q/A.
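For readers new to attention, here is a minimal pure-Python sketch of the dot-product scoring variant described by Luong et al. (2015). It is an illustration only: the linked implementation may use a different scoring function, and the function names and toy vectors below are assumptions.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def luong_dot_attention(decoder_state, encoder_states):
    """Return (weights, context) for one decoder step using dot-product scores."""
    # Score each encoder state against the current decoder state.
    scores = [dot(decoder_state, h) for h in encoder_states]
    # Softmax over the scores gives the attention weights.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The context vector is the weighted sum of the encoder states.
    dim = len(encoder_states[0])
    context = [
        sum(w * h[i] for w, h in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three encoder states of dimension 2 (hypothetical values).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = luong_dot_attention(decoder_state, encoder_states)
print(weights, context)
```

The encoder states that align best with the decoder state receive the largest weights, so the context vector is dominated by the most relevant source positions.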
Neural-Dialogue-Generation
Framework: Torch 6.x
Source code for a bunch of super nice articles by Jiwei Li, one of my favorite researchers in the domain of neural conversational systems. Worth checking out.
tell me ... how long have you had this falling sickness ? → a few months, I guess .
so i had the doctors test sammy 's response to conditioning . → so he took the pills .
what are you going to do ? → i 'm going to get you a new car .
they fear your power your intellect . → i 'm not afraid of your power .
neuralconvo
Framework: Torch 7
One of the earliest implementations of "A Neural Conversational Model". The demo results published at the time were very entertaining.
Hello? → Hi.
How are you? → Im fine.
Whats your name? → Its hard to describe.
How so? → Im not sure.
What color is the sky? → Its blue.
What is your job? → Its not that im a fucking werewolf!
chatbot-zh-torch7
Framework: Torch 7
Adapted from neuralconvo. At the time it was probably the first Chinese RNN-based chatbot demo.
Seq2seq_Chatbot_QA
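Framework: Tensorflow
A Chinese-language demo with fairly detailed documentation. Recommended.
result (translated from Chinese):
Where are you going → Didn't you say you're not mine
Do you like me? → I like you
Have you eaten? → I haven't eaten yet
Do you like drinking? → I don't know
You make no sense when you talk → No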
ChatterBot
Framework: Python
A fairly complete-looking project: easy to install, well documented, and shipped with training material, mostly in English.
Good morning! How are you doing? → I am doing very well, thank you for asking.
Youre welcome. → Do you like hats?
farizrahman4u/seq2seq
nicolas-ivanov/debug_seq2seq
codekansas/keras-language-modeling
Framework: Keras
These are all seq2seq implementations, but none of them has a good chatbot demo result yet.
oswaldoludwig/Seq2seq-Chatbot-for-Keras
A new generative chatbot whose training converges in a few epochs, including a model pre-trained on a small but consistent dataset collected from dialogues of English courses online.
This one takes a generative-model approach; details to be added.
user: Hello! How are you?
computer: i am fine , ok , well , thank you .
user: Whats your name?
computer: janet .
user: What do you like to do now?
computer: let me think a minute .
user: I want to chat about computer science.
computer: i am glad to hear there .
user: Are you a real girl or a computer program?
computer: i am self employed . i would love to be well .
user: Why dont you look for a job?
computer: i do not know . i am the bank .
user: So, you dont need a job!
computer: no . i have to make a lot of money .
Corpus
AlJohri/OpenSubtitles
A large set of raw movie subtitles (~1.2 GB).
Cornell Movie-Dialogs Corpus
~40 MB after clearing out the technical data.
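As a rough sketch of what "clearing out the technical data" can look like, the code below strips the metadata fields from the corpus records and builds (prompt, reply) pairs. It assumes the ` +++$+++ ` field separator used by the corpus files `movie_lines.txt` and `movie_conversations.txt`; the sample records and function names are made up for illustration.

```python
import ast

SEP = " +++$+++ "  # field separator assumed for the corpus files


def parse_lines(raw_lines):
    """Map line ID -> utterance text, dropping character and movie metadata."""
    id_to_text = {}
    for raw in raw_lines:
        parts = raw.rstrip("\n").split(SEP)
        if len(parts) == 5:  # lineID, characterID, movieID, name, text
            id_to_text[parts[0]] = parts[4].strip()
    return id_to_text


def parse_pairs(raw_conversations, id_to_text):
    """Turn each conversation's ordered line IDs into (prompt, reply) pairs."""
    pairs = []
    for raw in raw_conversations:
        parts = raw.rstrip("\n").split(SEP)
        if len(parts) != 4:  # characterID, characterID, movieID, line ID list
            continue
        line_ids = ast.literal_eval(parts[3])  # e.g. "['L1', 'L2']"
        for first, second in zip(line_ids, line_ids[1:]):
            if id_to_text.get(first) and id_to_text.get(second):
                pairs.append((id_to_text[first], id_to_text[second]))
    return pairs


# Illustrative records only, not taken from the real corpus.
sample_lines = [
    "L1 +++$+++ u0 +++$+++ m0 +++$+++ ALICE +++$+++ Where are you going?",
    "L2 +++$+++ u1 +++$+++ m0 +++$+++ BOB +++$+++ Home.",
    "L3 +++$+++ u0 +++$+++ m0 +++$+++ ALICE +++$+++ Already?",
]
sample_conversations = ["u0 +++$+++ u1 +++$+++ m0 +++$+++ ['L1', 'L2', 'L3']"]

id_to_text = parse_lines(sample_lines)
pairs = parse_pairs(sample_conversations, id_to_text)
print(pairs)
```

Every consecutive pair of lines within a conversation becomes one training example, so a conversation of n lines yields n - 1 pairs.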
dgk_lost_conv
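[Chinese] corpus. Most of it is material generated from subtitles, plus a small amount of other dialogue (for example the old Xiaohuangji (小黃雞) material, which I obtained from an online friend; thanks to him). The file results/xiaohuangji50w_fenciA.conv.zip is the training material for the chatbot-zh-torch7 demo above.
[Packaged subtitle collection from the former Shooter.cn (射手網), 17 GB]
The now-closed Shooter.cn offered a bundle containing all of its subtitles. Anyone interested will need to search for it and download it themselves.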
Some English QA Material
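A collection of NLP datasets gathered by someone else, mainly covering three areas: Question Answering, Dialogue Systems, and Goal-Oriented Dialogue Systems. All of the text is in English; it can be machine-translated into Chinese for use in Chinese dialogue systems.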
TODO
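The problem with the subtitle-derived material in dgk_lost_conv is its low quality: subtitle files contain a lot of narration and stretches where one person keeps talking, and none of this was filtered out during processing. Hopefully someone can find a way to fix this, or mine more one-on-one dialogue from Weibo, QQ groups, WeChat groups, and similar sources.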
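One possible starting point for this TODO, offered as a heuristic sketch and not as a tested solution: many subtitle files mark a change of speaker inside a single cue with leading dashes, so keeping only two-line cues where both lines start with `-` discards most narration and monologue. The code assumes SRT-style input; the function names and sample text are hypothetical, and the filter will miss dialogue that is not dash-marked.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,000".
TIME_LINE = re.compile(r"\d{2}:\d{2}:\d{2}[,.]\d{3} --> \d{2}:\d{2}:\d{2}[,.]\d{3}")


def cue_texts(srt_text):
    """Yield the text lines of each cue, skipping index and timing lines."""
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        # Note: a text line made only of digits is also dropped by this filter.
        texts = [ln for ln in lines if not ln.isdigit() and not TIME_LINE.match(ln)]
        if texts:
            yield texts


def dash_dialogue_pairs(srt_text):
    """Keep only two-line cues where both lines start with '-' (two speakers)."""
    pairs = []
    for texts in cue_texts(srt_text):
        if len(texts) == 2 and all(t.startswith("-") for t in texts):
            first, second = (t.lstrip("- ").strip() for t in texts)
            if first and second:
                pairs.append((first, second))
    return pairs


# Made-up sample: one dash-marked exchange, one narration cue, one monologue cue.
sample = """1
00:00:01,000 --> 00:00:03,000
- Where are you going?
- Home.

2
00:00:04,000 --> 00:00:06,000
It was a cold night in the city.

3
00:00:07,000 --> 00:00:09,000
And then everything changed,
just like that.
"""

print(dash_dialogue_pairs(sample))
```

This trades recall for precision: it throws away a great deal of usable dialogue, but what remains is far more likely to be a genuine exchange between two speakers.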
Papers
- [[1] Sequence to Sequence Learning with Neural Networks][1]
- [[2] A Neural Conversational Model][2]
[1]: http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf
[2]: http://arxiv.org/pdf/1506.05869v1.pdf
Contributors
fateleak
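Ad:
To make it easier for Chinese-speaking users interested in chatbots/NLP/deep learning to talk to each other, I have set up a QQ group. You are welcome to join the discussion:
[QQ group](http://qm.qq.com/cgi-bin/qm/qr?k=RDatP5GTRU0sbPt1znatqs68jQtQsPsV)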
fateleak/awesome-chatbot-list