標籤:

R語言可視化學習筆記之添加p-value和顯著性標記

原創 2017-06-16 taoyan EasyCharts

上篇文章中提了一下如何通過ggpubr包為ggplot圖添加p-value以及顯著性標記,本文將詳細介紹。利用數據集ToothGrowth進行演示

#先載入包library(ggpubr)#載入數據集ToothGrowthdata("ToothGrowth")head(ToothGrowth)

## len supp dose## 1 4.2 VC 0.5## 2 11.5 VC 0.5## 3 7.3 VC 0.5## 4 5.8 VC 0.5## 5 6.4 VC 0.5## 6 10.0 VC 0.5

比較方法

R中常用的比較方法主要有下面幾種:

各種比較方法後續有時間一一講解。

添加p-value

主要利用ggpubr包中的兩個函數:

  • compare_means():可以進行一組或多組間的比較

  • stat_compare_mean():自動添加p-value、顯著性標記到ggplot圖中


    compare_means()函數

    stat_compare_means()函數

    比較獨立的兩組

    繪製箱線圖

    p <- ggboxplot(ToothGrowth, x="supp", y="len", color = "supp", palette = "jco", add = "jitter")#添加p-valuep+stat_compare_means()

    #使用其他統計檢驗方法p+stat_compare_means(method = "t.test")

    p+stat_compare_means(aes(label=..p.signif..), label.x = 1.5, label.y = 40)

    也可以將標籤指定為字元向量,不要映射,只需將p.signif兩端的..去掉即可

p+stat_compare_means(label = "p.signif", label.x = 1.5, label.y = 40)

比較兩個paired sample

compare_means(len~supp, data=ToothGrowth, paired = TRUE)

利用ggpaired()進行可視化

ggpaired(ToothGrowth, x="supp", y="len", color = "supp", line.color = "gray", line.size = 0.4, palette = "jco")+ stat_compare_means(paired = TRUE)

多組比較

Global test

compare_means(len~dose, data=ToothGrowth, method = "anova")

可視化

ggboxplot(ToothGrowth, x="dose", y="len", color = "dose", palette = "jco")+stat_compare_means()

#使用其他的方法ggboxplot(ToothGrowth, x="dose", y="len", color = "dose", palette = "jco")+ stat_compare_means(method = "anova")

Pairwise comparisons:如果分組變數中包含兩個以上的水平,那麼會自動進行pairwise test,默認方法為」wilcox.test」

compare_means(len~dose, data=ToothGrowth)

#可以指定比較哪些組my_comparisons <- list(c("0.5", "1"), c("1", "2"), c("0.5", "2"))ggboxplot(ToothGrowth, x="dose", y="len", color = "dose",palette = "jco")+stat_compare_means(comparisons=my_comparisons)+ # Add pairwise comparisons p-value stat_compare_means(label.y = 50) # Add global p-value

可以通過修改參數label.y來更改標籤的位置

ggboxplot(ToothGrowth, x="dose", y="len", color = "dose",palette = "jco")+stat_compare_means(comparisons=my_comparisons, label.y = c(29, 35, 40))+ # Add pairwise comparisons p-value stat_compare_means(label.y = 45) # Add global p-value

至於通過添加線條來連接比較的兩組,這一功能已由包ggsignif實現

##設定參考組compare_means(len~dose, data=ToothGrowth, ref.group = "0.5", #以dose=0.5組為參考組 method = "t.test" )

#可視化ggboxplot(ToothGrowth, x="dose", y="len", color = "dose", palette = "jco")+ stat_compare_means(method = "anova", label.y = 40)+ # Add global p-valuestat_compare_means(label = "p.signif", method = "t.test", ref.group = "0.5") # Pairwise comparison against reference

參考組也可以設置為.all.即所有的平均值

compare_means(len~dose, data=ToothGrowth, ref.group = ".all.", method = "t.test")

#可視化ggboxplot(ToothGrowth, x="dose", y="len", color = "dose", palette = "jco")+stat_compare_means(method = "anova", label.y = 40)+# Add global p-valuestat_compare_means(label = "p.signif", method = "t.test", ref.group = ".all.")#Pairwise comparison against all

接下來利用survminer包中的數據集myeloma來講解一下為什麼有時候我們需要將ref.group設置為.all.

library(survminer)#沒安裝的先安裝再載入data("myeloma")head(myeloma)

我們將根據患者的分組來繪製DEPDC1基因的表達譜,看不同組之間是否存在顯著性的差異,我們可以在7組之間進行比較,但是這樣的話組間比較的組合就太多了,因此我們可以將7組中每一組與全部平均值進行比較,看看DEPDC1基因在不同的組中是否過表達還是低表達。

compare_means(DEPDC1~molecular_group, data = myeloma, ref.group = ".all.", method = "t.test")

#可視化DEPDC1基因表達譜ggboxplot(myeloma, x="molecular_group", y="DEPDC1", color = "molecular_group", add = "jitter", legend="none")+ rotate_x_text(angle = 45)+ geom_hline(yintercept = mean(myeloma$DEPDC1), linetype=2)+# Add horizontal line at base mean stat_compare_means(method = "anova", label.y = 1600)+ # Add global annova p-value stat_compare_means(label = "p.signif", method = "t.test", ref.group = ".all.")# Pairwise comparison against all

從圖中可以看出,DEPDC1基因在Proliferation組中顯著性地過表達,而在Hyperdiploid和Low bone disease顯著性地低表達

我們也可以將非顯著性標記ns去掉,只需要將參數hide.ns=TRUE

ggboxplot(myeloma, x="molecular_group", y="DEPDC1", color = "molecular_group", add = "jitter", legend="none")+rotate_x_text(angle = 45)+ geom_hline(yintercept = mean(myeloma$DEPDC1), linetype=2)+# Add horizontal line at base mean stat_compare_means(method = "anova", label.y = 1600)+ # Add global annova p-value stat_compare_means(label = "p.signif", method = "t.test", ref.group = ".all.", hide.ns = TRUE)# Pairwise comparison against all

多個分組變數

按另一個變數進行分組之後進行統計檢驗,比如按變數dose進行分組:

compare_means(len~supp, data=ToothGrowth, group.by = "dose")

#可視化p <- ggboxplot(ToothGrowth, x="supp", y="len", color = "supp", palette = "jco", add = "jitter", facet.by = "dose", short.panel.labs = FALSE)#按dose進行分面#label只繪製p-valuep+stat_compare_means(label = "p.format")

#label繪製顯著性水平p+stat_compare_means(label = "p.signif", label.x = 1.5)

#將所有箱線圖繪製在一個panel中p <- ggboxplot(ToothGrowth, x="dose", y="len", color = "supp", palette = "jco", add = "jitter")p+stat_compare_means(aes(group=supp))

#只顯示p-valuep+stat_compare_means(aes(group=supp), label = "p.format")

#顯示顯著性水平p+stat_compare_means(aes(group=supp), label = "p.signif")

進行paired sample檢驗compare_means(len~supp, data=ToothGrowth, group.by = "dose", paired = TRUE)

#可視化p <- ggpaired(ToothGrowth, x="supp", y="len", color = "supp", palette = "jco", line.color="gray", line.size=0.4, facet.by = "dose", short.panel.labs = FALSE)#按dose分面#只顯示p-valuep+stat_compare_means(label = "p.format", paired = TRUE)

其他圖形

條形圖與線圖(一個分組變數)

#有誤差棒的條形圖,實際上我以前的文章里有純粹用ggplot2實現ggbarplot(ToothGrowth, x="dose", y="len", add = "mean_se")+ stat_compare_means()+ stat_compare_means(ref.group = "0.5", label = "p.signif", label.y = c(22, 29))

#有誤差棒的線圖ggline(ToothGrowth, x="dose", y="len", add = "mean_se")+stat_compare_means()+ stat_compare_means(ref.group = "0.5", label = "p.signif", label.y = c(22, 29))

條形圖與線圖(兩個分組變數)

ggbarplot(ToothGrowth, x="dose", y="len", add = "mean_se", color = "supp", palette = "jco", position = position_dodge(0.8))+ stat_compare_means(aes(group=supp), label = "p.signif", label.y = 29)

ggline(ToothGrowth, x="dose", y="len", add = "mean_se", color = "supp", palette = "jco")+ stat_compare_means(aes(group=supp), label = "p.signif", label.y = c(16, 25, 29))

Sessioninfo

sessionInfo()## R version 3.4.0 (2017-04-21)## Platform: x86_64-w64-mingw32/x64 (64-bit)## Running under: Windows 8.1 x64 (build 9600)## ## Matrix products: default## ## locale:## [1] LC_COLLATE=Chinese (Simplified)_China.936 ## [2] LC_CTYPE=Chinese (Simplified)_China.936 ## [3] LC_MONETARY=Chinese (Simplified)_China.936## [4] LC_NUMERIC=C ## [5] LC_TIME=Chinese (Simplified)_China.936 ## ## attached base packages:## [1] stats graphics grDevices utils datasets methods base ## ## other attached packages:## [1] survminer_0.4.0 ggpubr_0.1.3 magrittr_1.5 ggplot2_2.2.1 ## ## loaded via a namespace (and not attached):## [1] Rcpp_0.12.11 compiler_3.4.0 plyr_1.8.4## [4] tools_3.4.0 digest_0.6.12 evaluate_0.10 ## [7] tibble_1.3.3 gtable_0.2.0 nlme_3.1-131 ## [10] lattice_0.20-35 rlang_0.1.1 Matrix_1.2-10 ## [13] psych_1.7.5 ggsci_2.4 DBI_0.6-1 ## [16] cmprsk_2.2-7 yaml_2.1.14 parallel_3.4.0 ## [19] gridExtra_2.2.1 dplyr_0.5.0 stringr_1.2.0 ## [22] knitr_1.16 survMisc_0.5.4 rprojroot_1.2 ## [25] grid_3.4.0 data.table_1.10.4 KMsurv_0.1-5 ## [28] R6_2.2.1 km.ci_0.5-2 survival_2.41-3 ## [31] foreign_0.8-68 rmarkdown_1.5 reshape2_1.4.2 ## [34] tidyr_0.6.3 purrr_0.2.2.2 splines_3.4.0 ## [37] backports_1.1.0 scales_0.4.1 htmltools_0.3.6 ## [40] assertthat_0.2.0 mnormt_1.5-5 xtable_1.8-2 ## [43] colorspace_1.3-2 ggsignif_0.2.0 labeling_0.3 ## [46] stringi_1.1.5 lazyeval_0.2.0 munsell_0.4.3 ## [49] broom_0.4.2 zoo_1.8-0

R語言可視化學習筆記之ggrepel包

【重磅】史上最全的論文圖表基本規範關於學術論文Figures,你不能不知道的秘密優雅的操縱json數據地圖素材——打破地理信息可視化的孤島

用R-Shiny打造一個美美的在線Appshiny動態儀錶盤——360度全空間無死角拖拉換膚功能的旋轉地球

aHR0cDovL3N0dWR5LjE2My5jb20vY291cnNlL2NvdXJzZU1haW4uaHRtP2NvdXJzZUlkPTEwMDM1OTIwMDg= (二維碼自動識別)

如需轉載請聯繫EasyCharts團隊!

EasyCharts團隊出品

帥的人都關注了EasyCharts團隊^..^~

QQ交流群:553270834

微信公眾號:EasyCharts

更多信息敬請查看: easychart.github.io/pos

推薦閱讀:

左手用R右手Python系列——任務進度管理
流量結構分布圖——炫酷和弦圖
[原]海納百川 有容乃大:SparkR與Docker的機器學習實戰
【翻譯】Awesome R資源大全中文版來了,全球最火的R工具包一網打盡,超過300+工具,還在等什麼?
Learn R | 數據降維之因子分析(下)

TAG:R编程语言 |