From the Midlife Crisis of Non-Hispanic White Americans to the Rise of Modern Research

Today I will start with the story of one paper, then discuss the digital transformation now under way in scientific research.

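In 2015, a paper appeared in the Proceedings of the National Academy of Sciences (PNAS) under the title "Rising morbidity and mortality in midlife among white non-Hispanic Americans in the 21st century." It compared mortality among 45-to-54-year-olds from 1999 to 2013 for US non-Hispanic whites, US Hispanics, and six other developed countries (France, Germany, the United Kingdom, Canada, Australia, and Sweden), and found that only US non-Hispanic whites showed rising mortality. After digging into causes of death, the authors concluded that drug and alcohol poisoning, suicide, and chronic liver disease and cirrhosis probably drove the increase, and that the less education people had, the faster their mortality rose.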

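The paper was convenient for both left and right. The left blamed the weakening of welfare policy in those years together with the revival of traditional religious values, and since the period covers most of Republican George W. Bush's time in office, the blame naturally fell on the right. Interestingly, the right responded that the cause was precisely blue-collar whites' loss of commitment to work, faith, and family, which is exactly what traditional religious values promote. Enough of the left-right sparring: any explanation can be made to fit, but surely the phenomenon itself is solid?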

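It is not. Soon after publication, Professor Andrew Gelman, a well-known chatterbox and statistician, reanalyzed the study's data on his own blog. "Reanalyzed" is the polite word; what he actually did was correct the data, because he had noticed a problem that could hardly be simpler: the death rates had not been adjusted for the age distribution.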

Raw data:

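Let me explain the adjustment. Suppose I have 100 people aged 45 to 54. For death rates to be comparable across the 15 years of the study, and with other places, the mix of ages inside that band should stay roughly the same from year to year. But this band happens to include the post-World War II baby boom, so the population behind the death rate kept shifting: the average age within the band drifted upward, and older people naturally die at higher rates. The fix is to compute the death rate separately at each age (deaths divided by the number of people of that age) and then combine those rates with the same age weights in every year, in effect treating the age distribution within the band as fixed.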
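The adjustment described above can be sketched in a few lines. This is a minimal illustration, not Gelman's actual analysis: every number below is made up, the age-specific death rates are assumed to be identical in both years, and the names (`TRUE_RATE`, `crude_rate`, `age_adjusted_rate`) are hypothetical. Only the age mix inside the 45-54 band changes, yet the crude rate rises.

```python
AGES = range(45, 55)  # single years of age inside the 45-54 band

# Assumption: made-up death rates per 100,000, the same in both years.
TRUE_RATE = {age: 300 + 30 * (age - 45) for age in AGES}


def simulate_deaths(population):
    """Expected deaths at each age for a given population-by-age table."""
    return {age: population[age] * TRUE_RATE[age] / 100_000 for age in AGES}


def crude_rate(deaths, population):
    """All deaths divided by all people in the band, per 100,000."""
    return sum(deaths.values()) / sum(population.values()) * 100_000


def age_adjusted_rate(deaths, population, weights):
    """Weighted average of age-specific rates using fixed age weights."""
    total_weight = sum(weights.values())
    weighted = sum(
        weights[age] * deaths[age] / population[age] * 100_000 for age in AGES
    )
    return weighted / total_weight


# Hypothetical populations: the band skews young early on and old later,
# mimicking a baby-boom cohort ageing through it.
pop_early = {age: 1000 - 50 * (age - 45) for age in AGES}
pop_late = {age: 550 + 50 * (age - 45) for age in AGES}
equal_weights = {age: 1 for age in AGES}  # treat every age as equally common

deaths_early = simulate_deaths(pop_early)
deaths_late = simulate_deaths(pop_late)

crude_early = crude_rate(deaths_early, pop_early)
crude_late = crude_rate(deaths_late, pop_late)
adjusted_early = age_adjusted_rate(deaths_early, pop_early, equal_weights)
adjusted_late = age_adjusted_rate(deaths_late, pop_late, equal_weights)

print(f"crude:    {crude_early:.1f} -> {crude_late:.1f}")
print(f"adjusted: {adjusted_early:.1f} -> {adjusted_late:.1f}")
```

In this toy setup the crude rate goes up purely because the band got older, while the adjusted rate stays flat. Equal weights are just one choice; any fixed standard age distribution would serve the same purpose.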

Adjusted data:

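Hmm. Judged by this result, the upward trend is no longer obvious. Gelman went on to analyze the other countries' data and found that their age-adjusted death rates for the same age band still fell steadily, so the odd mortality pattern among middle-aged US non-Hispanic whites is real, and the paper's main conclusion holds up. Gelman then wondered whether the sexes differed, and produced the following plot: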

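So it turns out that mortality among white men barely changed, while mortality among women kept rising. Gelman then computed relative death rates, using 1999 as the baseline, and looked at how they were distributed across age bands: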

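It turned out that mortality was rising not only among non-Hispanic white women aged 45-54 but also among those aged 35-44. So why did the paper not mention that age band? Could the original be a case of publication bias, in which the authors compared groups until one finally came out significant? Had the data been split by sex from the start, would the paper's title have been "Mortality is rising among non-Hispanic white women aged 35-54"?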
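The relative death rate used in that comparison is simply each year's rate divided by the rate in the baseline year. A minimal sketch follows, assuming made-up rates for two age bands; the function name `relative_to_baseline` and all numbers are hypothetical and are not taken from Gelman's post.

```python
def relative_to_baseline(rates_by_year, baseline_year=1999):
    """Divide every year's death rate by the baseline year's rate."""
    baseline = rates_by_year[baseline_year]
    if baseline == 0:
        raise ValueError("baseline rate must be non-zero")
    return {year: rate / baseline for year, rate in sorted(rates_by_year.items())}


# Assumption: illustrative rates per 100,000, not real data.
example_rates = {
    "35-44": {1999: 200.0, 2006: 210.0, 2013: 220.0},
    "45-54": {1999: 400.0, 2006: 390.0, 2013: 380.0},
}

relative = {
    band: relative_to_baseline(series) for band, series in example_rates.items()
}

for band, series in relative.items():
    print(band, {year: round(value, 3) for year, value in series.items()})
```

Putting every group on the same 1999 = 1.0 scale makes trends comparable across age bands whose absolute death rates differ widely.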

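I am telling this story less for the paper's findings than because I am curious why commentary like this appeared as a blog post. Traditional academic exchange relies on journal papers and conferences, but do review cycles that routinely take months get in the way of sharing results? Granted, most academics depend on publications for their reputation, yet blog commentary is sometimes no less deep or broad than the reports of three to five referees.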

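Back to this case. One of the paper's authors answered the criticism on another science blog. She said the team had in fact examined the effect of sex during the research, though without using relative death rates, and had left it out of the paper so as not to bury readers under a pile of charts. In response to Gelman's point, she also reanalyzed the data and concluded that smoking contributes substantially to the difference between men's and women's mortality in this age band. But she also said this: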

We spent a year working on this paper, sweating out every number, sweating out over what we were doing, and then to see people blogging about it in real time — that's not the way science really gets done. . . . And so it's a little hard for us to respond to all of the blog posts that are coming out. . . . And if this is all people shooting from the hip, I don't think that's any way to move science forward, to move the research forward.

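In other words: your blog commentary is too grassroots for us to bother with. But whoever produces the data or poses the scientific question need not also be the one who solves it correctly; sometimes contributing a perspective is plenty. Anyone who knows Professor Gelman, though, knows he did not earn his reputation as Columbia University's veteran chatterbox for nothing. He promptly posted on his blog a fictional letter, the one the authors ought to have written, to make the point that blogging is a form of exchange researchers should respect rather than look down on: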

We spent a year working on this paper, sweating out every number, sweating out over what we were doing, and we're happy to see people blogging about it in real time.

We very much appreciate the effort put in by Laudan Aron, Lisa Dubay, Elaine Waxman, and Steven Martin, Philip Cohen, and Andrew Gelman to uncover the aggregation bias in our analysis, to correct for that bias, and to explore subtleties that we did not have a chance to get into in our paper. As Gelman noted, these corrections are in no way a debunking of our work—our comparisons of non-Hispanic American whites to groups in other countries and other ethnic groups still stand.

We think it's great that, after our paper was published in PNAS, it was possible to get rapid feedback. Had it not been for bloggers, we'd still be in the awkward situation of people trying to explain an increase in death rates which isn't actually happening. We join Paul Krugman and Ross Douthat in thanking these bloggers for their unpaid efforts on the behalf of everyone interested in this research. We count ourselves lucky to live in an era in which mistakes can be corrected rapidly, so that we and others do not have to wait months or even years for published corrections which themselves could contain further errors.

As economists, we recognize that research work is always provisional, and that anyone studying the real world of human interactions has to accept that mistakes are part of the process. It is only through the efforts of our entire research community—publishing in journals, publishing in blogs, through informal conversations, whatever—that we move toward the truth. We always considered our PNAS paper to be just a single step in this process and we are glad that others have taken the trouble to correct some of our biases and omissions.

Again, we thank the many researchers who have taken a careful look at our analyses. It's good to know that our main findings are not affected by the corrections, we welcome further research in this area, and we hope that future discussion of our work, both in the scientific literature and in the popular press, make use of the corrected, age-adjusted trends.

– Sincerely, Anne Case and Angus Deaton

P.S. We have heard some people criticize the researchers noted above because they published their work in blogs rather than in peer-reviewed journals. We would never make such a silly, uninformed criticism. Since appearing in print, our work has received a huge amount of publicity. And, to the extent that we made mistakes or did not happen to explain ourselves clearly enough, it is the responsibility of others to publish their corrections and explanations as rapidly as possible. Blogs are a great way to do this. Blogs, unlike newspaper interviews, allow unlimited space to develop arguments and to present graphs of data. And we are of course aware that peer-reviewed journals make mistakes too. We published our paper in the Proceedings of the National Academy of Sciences, a journal that last year published a notorious paper on himmicanes and hurricanes, another discredited paper claiming certain behavior by people whose ages end in 9, and another paper on demographics which neglected to apply a basic age adjustment. So, yes, publication in journals is fine, but we very much welcome researchers who are willing to stick out their necks and correct the record in real time on blogs.

Of course, Professor Gelman's fiery temper is bound to annoy some people, but I think some of his points are well worth taking on board.

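Blogs, as a fast channel for response, deserve respect, because this kind of exchange is what lets science advance quickly. Put plainly, modern research is no longer diligent toil hidden away in a dim little lab; it should be a process of exchange and collision of ideas. In recent years preprint servers have grown rapidly in physics, computer science, and the life sciences, and many science reporters and leading research groups watch them closely. Discussing research on blogs and microblogs (not, of course, the entertainment-flavored kind you know) is also becoming common practice. Data sharing, media outreach, and online academic profiles are increasingly how young scientists build a reputation and find industry partners. Serious academic discussion can happen anywhere; seriousness comes from attitude, not venue. We can gradually see that: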

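  • Large numbers of high-quality Q&A threads, blog posts, and shared slide decks are spontaneously coalescing into up-to-date online textbooks
  • In computer science, a new algorithm is soon followed by a blog post showing how to use it and a matching GitHub repo
  • A species' genome data is uploaded on one side, and on the other an automated script on some research group's cluster generates a report and emails it to the group's members
  • The latest findings circulating on microblogs are quickly explained in plain language and critiqued by anonymous experts on Reddit, who spot new phenomena along the way
  • The procedure for a cutting-edge technique that a company has long been chasing turns up on a live-streaming platform, shown by frontier researchers as part of a published paper
  • A PhD student unexpectedly receives an invitation to give a talk from a prominent group leader, simply because the group leader read the student's blog and found the student's take on a field distinctive
  • ...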

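You can hide in the ivory tower and not notice, but all of this is happening. It may not be "formal" yet, but solving scientific problems should depend on fast, healthy, open exchange rather than on getting a paper published, which is after all only a starting point. A modern way of doing research is on the rise.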

Note: the book in the header image is worth a look if you are interested.

