What exactly was Kai-Fu Lee's role in the CMU Sphinx project?
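Kai-Fu Lee's biographies and promotional articles all describe him as a very important developer, or even the founder, of CMU Sphinx. But the recent promotional articles about Hsiao-Wuen Hon and Xuedong Huang mostly talk about their own contributions to CMU Sphinx. The articles about Xuedong Huang at least mention that Kai-Fu Lee invited him into the project team. The articles about Hsiao-Wuen Hon call him a co-founder who created CMU Sphinx together with another classmate. That other classmate seems to be Kai-Fu Lee, so is the name deliberately left out because of the history between Microsoft and Kai-Fu Lee?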
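Sphinx was my PhD thesis. At the time, CMU's mainstream team was called Angel, and when Sphinx started I was working on it alone. Later, Hsiao-Wuen Hon, who had just entered the PhD program, helped me, and some of the code was written by him. The ideas and experiments in the thesis were all my own, but because Hsiao-Wuen Hon had contributed, I added his name alongside mine and my advisor's on some of the papers published afterwards.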
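After I graduated, Sphinx was set up as a formal project and became CMU's mainstream effort. I led the project for two years. Xuedong Huang was a postdoc I hired, and there were three or four other members, including Hsiao-Wuen Hon.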
Two years later, I left CMU to join Apple, and Xuedong Huang became the head of the project.
Don't read too much into this; nothing anyone has said is wrong. Sphinx was both my PhD thesis and the name of the group that was organized later.
Checking the ACM and DBLP records, papers on Sphinx by Kai-Fu Lee and Hsiao-Wuen Hon seem to start appearing in 1989:
Lee, K., Hon, H., and Hwang, M. 1989. Recent progress in the SPHINX Speech Recognition system. In Proceedings of the Workshop on Speech and Natural Language (Philadelphia, Pennsylvania, February 21 - 23, 1989). Association for Computational Linguistics, Stroudsburg, PA, 125-130. DOI= http://dx.doi.org/10.3115/100964.100973
Huang, X. D., Hon, H. W., and Lee, K. F. 1989. Large-vocabulary speaker-independent continuous speech recognition with semi-continuous hidden Markov models. In Proceedings of the Workshop on Speech and Natural Language (Cape Cod, Massachusetts, October 15 - 18, 1989). Association for Computational Linguistics, Stroudsburg, PA, 276-279. DOI= http://dx.doi.org/10.3115/1075434.1075480
Hon, H., Lee, K., and Weide, R. 1989. Towards speech recognition without vocabulary-specific training. In Proceedings of the Workshop on Speech and Natural Language (Cape Cod, Massachusetts, October 15 - 18, 1989). Association for Computational Linguistics, Stroudsburg, PA, 271-275. DOI= http://dx.doi.org/10.3115/1075434.10754
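Kai-Fu Lee probably started working as an assistant professor (AP) at CMU in 1988, and Sphinx was the work of his PhD years, so it seems Hsiao-Wuen Hon cannot really be counted as a co-founder?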
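SPHINX was the subject of the PhD thesis Kai-Fu Lee published in 1988 (http://portal.acm.org/citation.cfm?id=914540), while Hsiao-Wuen Hon only starts to appear on several papers published afterwards, from 1989 on, so it is clear that the original idea was Kai-Fu Lee's. However, Kai-Fu Lee later wrote a book on SPHINX, "Automatic speech recognition: the development of the SPHINX system", and its acknowledgments (http://goo.gl/tPgR8) mention that Hsiao-Wuen Hon worked closely with him from the very beginning. Hon's LinkedIn shows that he entered CMU in 1986, which means they may indeed have started collaborating back then; his name just did not appear on the 1988 SPHINX papers.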
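(In academia, every lab has its own conventions for paper authorship, so it is possible that Hon took part in the implementation but did not contribute many ideas, and was therefore left off the papers.)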
Let me contribute an answer of my own first.
From the CMU Sphinx wiki:
Sphinx
Sphinx is a continuous-speech, speaker-independent recognition system making use of hidden Markov acoustic models (HMMs) and an n-gram statistical language model. It was developed by Kai-Fu Lee.
Sphinx featured feasibility of continuous-speech, speaker-independent large-vocabulary recognition, the possibility of which was in dispute at the time (1986). Sphinx is of historical interest only; it has been superseded in performance by subsequent versions. An archival article describes the system in detail.
The first generation was developed by Kai-Fu Lee.
Sphinx 2
A fast performance-oriented recognizer, originally developed by Xuedong Huang at Carnegie Mellon and released as Open source with a BSD-style license on SourceForge by Kevin Lenzo at LinuxWorld in 2000. Sphinx 2 focuses on real-time recognition suitable for spoken language applications. As such it incorporates functionality such as end-pointing, partial hypothesis generation, dynamic language model switching and so on. It is used in dialog systems and language learning systems. It can be used in computer based PBX systems such as Asterisk.
Sphinx 2 code has also been incorporated into a number of commercial products. It is no longer under active development (other than for routine maintenance). Current real-time decoder development is taking place in the Pocket Sphinx project. An archival article describes the system.
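The second generation was developed by Xuedong Huang, and the project became open source starting with the second generation.
I don't know when Hsiao-Wuen Hon became involved.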