The Coolest Collection of Deep Learning Chatbot Resources

A curated list of Seq2seq / chatbot resources.

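This list builds on the original chatbot-links by adding more Chinese-language resources, and is mainly for my own collection. If you are interested, let's build a more complete resource list together so that newcomers can get started easily. Please send a PR or open an Issue to add more resources at any level; your own work is welcome.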


Codes

DeepQA

Framework: Tensorflow

Demo result:

Hi → Hi.
What is your name ? → Laura.
What does that mean ? → I dunno.
How old are you ? → thirty-five.
Who is Laura ? → My brother.
Say goodbye → Alright.
Two plus two → manny...

tf_seq2seq_chatbot

Framework: Tensorflow

No answer randomisation is implemented in this code, so the model answers with the same phrase each time. The only change is a sparse layer added at the end of the model.

hello baby → hello
how old are you ? → twenty .
i am lonely → i am not
nice → you re not going to be okay .
so rude → i m sorry .
are you a robot or human? → no .
are you better than siri? → yes .
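To illustrate what "answer randomisation" means for this kind of model, here is a minimal sketch in plain Python. It is not code from tf_seq2seq_chatbot: the toy vocabulary, the scores, and the helper names (`softmax`, `pick_greedy`, `pick_sampled`) are all hypothetical. Greedy decoding always picks the highest-scoring token, which is why the same input yields the same reply every time; sampling from the softmax distribution lets the reply vary.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_greedy(logits):
    """Deterministic choice: index of the highest score."""
    return max(range(len(logits)), key=lambda i: logits[i])


def pick_sampled(logits, temperature=1.0, rng=random):
    """Randomised choice: draw an index according to the softmax probabilities."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical vocabulary and scores for a single decoding step.
vocab = ["hello", "hi", "hey"]
logits = [2.0, 1.5, 0.5]

greedy_word = vocab[pick_greedy(logits)]  # always the top-scoring word

rng = random.Random(0)  # seeded only so the sketch is repeatable
sampled_words = [vocab[pick_sampled(logits, 1.0, rng)] for _ in range(5)]

print("greedy :", greedy_word)
print("sampled:", sampled_words)
```

In a real decoder the same choice would be made at every output step, with the scores coming from the network rather than a fixed list.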

machine translation model

Framework: Tensorflow

Google's official seq2seq implementation, with an attention mechanism (Luong et al., 2015). Originally written for translation, it can also be used for simple Q/A.

Neural-Dialogue-Generation

Framework: Torch 6.x

Source code for a bunch of super nice articles by Jiwei Li, one of my favorite researchers in the domain of neural conversational systems. Worth checking out.

tell me ... how long have you had this falling sickness ? → a few months, I guess .
so i had the doctors test sammy 's response to conditioning . → so he took the pills .
what are you going to do ? → i 'm going to get you a new car .
they fear your power your intellect . → i 'm not afraid of your power .

neuralconvo

Framework: Torch 7

One of the earliest implementations of "A Neural Conversational Model". The demo results it showed at the time were very entertaining.

Hello? → Hi.
How are you? → I'm fine.
What's your name? → It's hard to describe.
How so? → I'm not sure.
What color is the sky? → It's blue.
What is your job? → It's not that i'm a fucking werewolf!

chatbot-zh-torch7

Framework: Torch 7

Adapted from neuralconvo. At the time it was probably the first Chinese RNN-based chatbot demo.

Seq2seq_Chatbot_QA

Framework: Tensorflow

A Chinese-language demo with fairly detailed documentation. Recommended.

result:

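你到哪裡去 → 你不是說你不是我的 (Where are you going? → Didn't you say you're not mine)
你喜歡我嗎? → 我喜歡你 (Do you like me? → I like you)
你吃了嗎? → 我還沒吃飯呢 (Have you eaten? → I haven't eaten yet)
你喜歡喝酒嗎? → 我不知道 (Do you like drinking? → I don't know)
你說話沒有邏輯啊 → 沒有 (What you say has no logic → No)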

ChatterBot

Framework: Python

A fairly complete-looking project: easy to install, well documented, and it ships with training material, mostly in English.

Good morning! How are you doing? → I am doing very well, thank you for asking.
You're welcome. → Do you like hats?

farizrahman4u/seq2seq

nicolas-ivanov/debug_seq2seq

codekansas/keras-language-modeling

Framework: Keras

These are all seq2seq implementations, but none of them has a good chatbot demo result yet.

oswaldoludwig/Seq2seq-Chatbot-for-Keras

A new generative chatbot whose training converges in a few epochs, including a model pre-trained on a small but consistent dataset collected from dialogues of English courses online.

This one takes a generative-model approach; more details to be added.

user: Hello! How are you?
computer: i am fine , ok , well , thank you .
user: What's your name?
computer: janet .
user: What do you like to do now?
computer: let me think a minute .
user: I want to chat about computer science.
computer: i am glad to hear there .
user: Are you a real girl or a computer program?
computer: i am self employed . i would love to be well .
user: Why don't you look for a job?
computer: i do not know . i am the bank .
user: So, you don't need a job!
computer: no . i have to make a lot of money .

Corpus

AlJohri/OpenSubtitles

A large set of raw movie subtitles (~1.2 GB).

Cornell Movie-Dialogs Corpus

~40 MB after clearing out the technical data.

dgk_lost_conv

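[Chinese] corpus. Most of it is material generated from subtitles, plus a small amount of other dialogue (for example the old Xiaohuangji (小黃雞) material, which I got from an online friend; thanks to him). The file results/xiaohuangji50w_fenciA.conv.zip is the training material for the chatbot-zh-torch7 demo above.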

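[Packaged subtitle collection from the former Shooter.cn (射手網), 17 GB]

The now-closed Shooter.cn had a single archive containing all of its subtitles. Anyone interested will need to search for it online and download it themselves.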

Some English QA Material

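A collection of natural language processing datasets gathered by someone else. It has three main parts: Question Answering, Dialogue Systems, and Goal-Oriented Dialogue Systems. All of it is English text, which can be machine-translated into Chinese for use in Chinese dialogue systems.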

TODO

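The subtitle-generated material in dgk_lost_conv is of poor quality. Subtitle files contain a lot of narration and stretches where a single person keeps talking, and none of that was filtered out during processing. Hopefully someone can find a way to do this, or can mine more one-on-one dialogue material from places such as Weibo, QQ groups, and WeChat groups.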

Papers

  • [[1] Sequence to Sequence Learning with Neural Networks][1]
  • [[2] A Neural Conversational Model][2]

[1]: papers.nips.cc/paper/53

[2]: arxiv.org/pdf/1506.0586

Contributors

fateleak

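Ad:

To make it easier for Chinese-speaking users interested in chatbot/NLP/DeepLearning to talk with each other, I have set up a QQ group. You are welcome to join the discussion:

[QQ group](qm.qq.com/cgi-bin/qm/qr?)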
