R Visualization Study Notes: Visualizing Genomic Data

Original post, 2017-06-26, taoyan, EasyCharts

This post uses the ggpubr package to explore genomic data, mainly by visualizing gene expression profiles from TCGA.

library(ggpubr) # load the package

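TCGA (The Cancer Genome Atlas) is a database containing a large amount of cancer data, and the RTCGA packages created by Marcin Kosinski make these data easy to access. Three packages are used here: RTCGA, RTCGA.clinical and RTCGA.mRNA. Install them as follows: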

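# Load the Bioconductor installer (legacy biocLite method, for R < 3.5)
source("http://bioconductor.org/biocLite.R")
# Set a mirror; here we choose the USTC (University of Science and Technology of China) mirror
options(BioC_mirror = "http://ustc.edu.cn/bioc")
# Install the packages
biocLite("RTCGA")
biocLite("RTCGA.clinical")
biocLite("RTCGA.mRNA")
# On R >= 3.5, biocLite() is retired in favour of BiocManager:
# install.packages("BiocManager")
# BiocManager::install(c("RTCGA", "RTCGA.clinical", "RTCGA.mRNA"))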

library(RTCGA)
# View the datasets available for each cancer type
infoTCGA()

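The RTCGA function expressionsTCGA() makes it easy to extract gene expression values from different datasets. Below we extract the expression values of five genes from three datasets: BRCA (breast cancer), OV (ovarian cancer) and LUSC (lung squamous cell carcinoma).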

library(RTCGA)
library(RTCGA.mRNA)
expr <- expressionsTCGA(BRCA.mRNA, OV.mRNA, LUSC.mRNA,
                        extract.cols = c("GATA3", "PTEN", "XBP1", "ESR1", "MUC1"))
expr

Check the number of samples in each dataset:

nb_samples <- table(expr$dataset)
nb_samples
##
## BRCA.mRNA LUSC.mRNA   OV.mRNA
##       590       154       561

For convenience, we shorten the dataset names and give each sample a short ID:

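# Strip the ".mRNA" suffix; fixed = TRUE matches the dot literally instead of as a regex wildcard
expr$dataset <- gsub(pattern = ".mRNA", replacement = "", expr$dataset, fixed = TRUE)
# Give each sample an ID such as BRCA1, OV1, LUSC1.
# This assumes the rows are stacked in the order BRCA (590), OV (561), LUSC (154).
expr$bcr_patient_barcode <- paste0(expr$dataset, c(1:590, 1:561, 1:154))
expr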
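The barcode line hard-codes c(1:590, 1:561, 1:154), so it is only correct when the rows come in the order BRCA, OV, LUSC with exactly those counts. Below is a minimal base-R sketch of an order-independent alternative; the short `dataset` vector is hypothetical and only stands in for expr$dataset.

```r
# Hypothetical stand-in for expr$dataset (not real TCGA rows), deliberately interleaved
dataset <- c("BRCA.mRNA", "BRCA.mRNA", "OV.mRNA", "LUSC.mRNA", "OV.mRNA", "BRCA.mRNA")

# fixed = TRUE removes the literal string ".mRNA"; without it "." is a regex wildcard
short <- gsub(".mRNA", "", dataset, fixed = TRUE)

# Number the rows 1, 2, 3, ... separately within each dataset, whatever the row order
idx <- ave(seq_along(short), short, FUN = seq_along)
barcode <- paste0(short, idx)
print(barcode)
```

Applied to the real object, the same idea would read paste0(expr$dataset, ave(seq_along(expr$dataset), expr$dataset, FUN = seq_along)); I have not run that against the TCGA data.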

Next, let's draw the plots:

1. Box plots

library(ggpubr)
ggboxplot(expr, x = "dataset", y = "GATA3", title = "GATA3", ylab = "Expression",
          color = "dataset", palette = "jco")
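What the box encodes: ggboxplot() draws with ggplot2's geom_boxplot(), whose documentation describes the hinges as the 25th and 75th percentiles and the whiskers as reaching the most extreme points within 1.5 × IQR of the hinges. The base-R sketch below computes those numbers for a made-up vector; that ggplot2 uses R's default quantile type is my assumption.

```r
# Made-up expression values for one group; 30 is an obvious outlier
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 30)

# Lower hinge, median, upper hinge (R's default quantile type 7)
q <- unname(quantile(x, c(0.25, 0.5, 0.75)))
iqr <- q[3] - q[1]

# Whiskers extend to the most extreme points within 1.5 * IQR of the hinges
lower_whisker <- min(x[x >= q[1] - 1.5 * iqr])
upper_whisker <- max(x[x <= q[3] + 1.5 * iqr])

# Points beyond the whiskers are drawn individually as outliers
outliers <- x[x < q[1] - 1.5 * iqr | x > q[3] + 1.5 * iqr]
print(c(q, lower_whisker, upper_whisker))
print(outliers)
```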

We can plot several genes in one call and then inspect the plots one by one, instead of rewriting the code for each gene:

# Create a list of plots, one per gene
p <- ggboxplot(expr, x = "dataset", y = c("GATA3", "PTEN", "XBP1", "ESR1", "MUC1"),
               title = c("GATA3", "PTEN", "XBP1", "ESR1", "MUC1"), ylab = "Expression",
               color = "dataset", palette = "jco")
# Then inspect each plot in turn
p$GATA3

p$PTEN

p$XBP1

p$ESR1

p$MUC1

When plotting several genes at once, xlab, ylab and title can also be vectors of the same length as y. Next, let's add p-values and significance labels:

my_comparisons <- list(c("BRCA", "OV"), c("OV", "LUSC"))
ggboxplot(expr, x = "dataset", y = "GATA3", title = "GATA3", ylab = "Expression",
          color = "dataset", palette = "jco") +
  stat_compare_means(comparisons = my_comparisons)
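stat_compare_means(comparisons = ...) adds one pairwise test per listed pair; to my knowledge its default for such pairs is the Wilcoxon rank-sum test. The sketch below reproduces that idea with base R's wilcox.test() on simulated toy data (group sizes and means are invented, not TCGA values), so its p-values say nothing about GATA3.

```r
set.seed(1)
# Simulated toy data standing in for expr; NOT real expression values
toy <- data.frame(
  dataset = rep(c("BRCA", "OV", "LUSC"), times = c(30, 25, 20)),
  GATA3   = c(rnorm(30, mean = 2), rnorm(25, mean = -1), rnorm(20, mean = -1.5))
)
my_comparisons <- list(c("BRCA", "OV"), c("OV", "LUSC"))

# One two-sided Wilcoxon rank-sum test per pair, using only the rows of that pair
p_values <- sapply(my_comparisons, function(pair) {
  sub <- toy[toy$dataset %in% pair, ]
  wilcox.test(GATA3 ~ dataset, data = sub)$p.value
})
names(p_values) <- sapply(my_comparisons, paste, collapse = " vs ")
print(p_values)
```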

We can also compare every pair of cancer types for several genes at once:

compare_means(c(GATA3, PTEN, XBP1)~dataset, data = expr)
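compare_means() returns a table of p-values for every pair of groups, gene by gene. A rough base-R counterpart is pairwise.wilcox.test(), sketched below on simulated toy data (invented, not TCGA). Holm adjustment is used because I believe it is compare_means()'s default p.adjust.method; treat this as an approximation, not a reimplementation.

```r
set.seed(2)
# Simulated toy data with three "genes"; all values are invented
toy <- data.frame(dataset = rep(c("BRCA", "OV", "LUSC"), each = 20))
for (gene in c("GATA3", "PTEN", "XBP1")) {
  toy[[gene]] <- rnorm(60) + ifelse(toy$dataset == "BRCA", 1, 0)
}

# For each gene: all pairwise Wilcoxon tests between datasets, Holm-adjusted
results <- lapply(c("GATA3", "PTEN", "XBP1"), function(gene) {
  pairwise.wilcox.test(toy[[gene]], toy$dataset, p.adjust.method = "holm")$p.value
})
names(results) <- c("GATA3", "PTEN", "XBP1")
print(results$GATA3)
```

Each element is a lower-triangular matrix of adjusted p-values, with the groups in alphabetical order.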

Use select or remove to choose which types are shown; for example, here we keep only BRCA and OV:

ggboxplot(expr, x="dataset", y="GATA3", title = "GATA3", ylab = "Expression", color = "dataset", palette = "jco", select = c("BRCA", "OV")) # keep only the listed groups

ggboxplot(expr, x="dataset", y="GATA3", title = "GATA3", ylab = "Expression", color = "dataset", palette = "jco", remove = "BRCA") # drop the listed group

Use order to change the order of the types along the x-axis:

ggboxplot(expr, x="dataset", y="GATA3", title = "GATA3", ylab = "Expression", color = "dataset", palette = "jco", order = c("LUSC", "OV", "BRCA"))
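In data terms, select, remove and order amount to subsetting the grouping variable and fixing its factor levels. The base-R sketch below shows that equivalent manipulation on a hypothetical vector; it illustrates the idea and is not how ggpubr handles these arguments internally.

```r
# Hypothetical grouping column
dataset <- c("BRCA", "OV", "LUSC", "OV", "BRCA")

# Like select = c("BRCA", "OV"): keep only the listed groups
kept <- dataset[dataset %in% c("BRCA", "OV")]

# Like remove = "BRCA": drop one group
dropped <- dataset[dataset != "BRCA"]

# Like order = c("LUSC", "OV", "BRCA"): ggplot2 places a factor's levels left to right
ordered_groups <- factor(dataset, levels = c("LUSC", "OV", "BRCA"))
print(levels(ordered_groups))
```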

Use rotate = TRUE to swap the axes and draw the boxes horizontally:

ggboxplot(expr, x="dataset", y="GATA3", title = "GATA3", ylab = "Expression", color = "dataset", palette = "jco", rotate=TRUE)

Use combine = TRUE to draw one panel per gene in a single figure (similar to faceting):

ggboxplot(expr, x="dataset", y=c("GATA3", "PTEN", "XBP1"), ylab = "Expression", color = "dataset", palette = "jco", combine = TRUE)
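Conceptually, drawing several genes as panels (combine) or in one panel (merge) means treating the data in long format: one row per sample and gene instead of one column per gene. Base R's stack() makes that reshaping visible; the tiny data frame is hypothetical, and whether ggpubr reshapes exactly this way internally is an assumption.

```r
# Hypothetical wide data: one column per gene
wide <- data.frame(dataset = c("BRCA", "OV"),
                   GATA3   = c(2.1, -0.5),
                   PTEN    = c(0.3, 1.2))

# stack() turns the gene columns into a 'values' column plus an 'ind' column naming the gene
long <- data.frame(dataset = rep(wide$dataset, times = 2),
                   stack(wide[c("GATA3", "PTEN")]))
names(long)[names(long) == "ind"] <- "gene"
print(long)
```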

Use merge = TRUE (equivalently merge = "asis") to draw all three genes in a single panel:

ggboxplot(expr, x="dataset", y=c("GATA3", "PTEN", "XBP1"), ylab = "Expression",color = "dataset", palette = "jco", merge = TRUE)

Use merge = "flip" to swap the roles: the genes go on the x-axis and the cancer types become the grouping variable:

ggboxplot(expr, x="dataset", y=c("GATA3", "PTEN", "XBP1"), ylab = "Expression", palette = "jco", merge = "flip")

Use add = "jitter" to overlay jittered points:

ggboxplot(expr, x="dataset", y=c("GATA3", "PTEN", "XBP1"), combine = TRUE,color = "dataset", palette = "jco", ylab = "Expression", add = "jitter", add.params = list(size=0.1, jitter=0.2))

Use add = "dotplot" to overlay a dot plot:

ggboxplot(expr, x="dataset", y=c("GATA3", "PTEN", "XBP1"), combine = TRUE, color = "dataset", palette = "jco", ylab = "Expression", add = "dotplot", add.params = list(binwidth=0.1, dotsize=0.2))

Often we want to know which samples lie at the two ends of each box. We can show them with label:

ggboxplot(expr, x="dataset", y=c("GATA3", "PTEN", "XBP1"), combine = TRUE,color = "dataset", palette = "jco", ylab = "Expression", add = "jitter", add.params = list(size=0.1, jitter=0.2), label = "bcr_patient_barcode", label.select = list(top.up=2, top.down=2), font.label = list(size=9, face="italic"), repel = TRUE)
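label.select = list(top.up = 2, top.down = 2) labels the samples with the highest and lowest values. The base-R sketch below finds the two highest and two lowest samples within each dataset on simulated toy data; I have not checked whether ggpubr ranks within each group or over all rows, so read it only as an illustration of the idea.

```r
set.seed(3)
# Simulated toy data: 12 hypothetical samples in two datasets
toy <- data.frame(
  sample  = paste0("S", 1:12),
  dataset = rep(c("BRCA", "OV"), each = 6),
  GATA3   = rnorm(12)
)

# Within each dataset, keep the 2 lowest and the 2 highest GATA3 values
extremes <- do.call(rbind, lapply(split(toy, toy$dataset), function(d) {
  d <- d[order(d$GATA3), ]
  rbind(head(d, 2), tail(d, 2))
}))
print(extremes[, c("dataset", "sample", "GATA3")])
```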

2. Violin plots

ggviolin(expr,x="dataset", y=c("GATA3", "PTEN", "XBP1"), combine = TRUE, color = "dataset", palette = "jco", ylab = "Expression", add = "boxplot")

Change add to choose what is drawn inside the violins, here the median and interquartile range:

ggviolin(expr,x="dataset", y=c("GATA3", "PTEN", "XBP1"), combine = TRUE, color = "dataset", palette = "jco", ylab = "Expression", add = "median_iqr")

add accepts many other options: "mean", "mean_se", "mean_sd", "mean_ci", "mean_range", "median", "median_iqr", "median_mad", "median_range". Try them out if you are interested.
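The add options are named after summary statistics: mean_se is the mean with the standard error (sd / sqrt(n)), mean_sd the mean with the standard deviation, median_iqr the median with the interquartile range, and so on. The sketch computes several of them by hand on a made-up vector; how ggpubr turns each into error-bar ends is not reproduced here.

```r
# Made-up values for one group
x <- c(2, 4, 4, 5, 7, 9)
n <- length(x)

m   <- mean(x)        # centre for the "mean*" options
s   <- sd(x)          # standard deviation, as in "mean_sd"
se  <- s / sqrt(n)    # standard error, as in "mean_se"
med <- median(x)      # centre for the "median*" options
iqr <- IQR(x)         # Q3 - Q1 with R's default quantile type
rng <- range(x)       # min and max, as in the "*_range" options

print(c(mean = m, se = se, median = med, IQR = iqr))
```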

3. Strip charts

ggstripchart(expr, x="dataset", y=c("GATA3", "PTEN", "XBP1"), combine = TRUE,color = "dataset", palette = "jco", size = 0.1, jitter=0.2, ylab = "Expression", add = "median_iqr", add.params = list(color="red"))

4. Dot plots

ggdotplot(expr, x="dataset", y=c("GATA3", "PTEN", "XBP1"), combine = TRUE, color = "dataset", palette = "jco", fill = "white", binwidth = 0.1, ylab = "Expression",add = "median_iqr", add.params = list(size=0.9))

5. Density plots

ggdensity(expr, x=c("GATA3", "PTEN", "XBP1"), y="..density..", combine = TRUE, xlab = "Expression", add = "median", rug = TRUE)

Map dataset to color:

ggdensity(expr, x=c("GATA3", "PTEN", "XBP1"), y="..density..", combine = TRUE,xlab = "Expression", add = "median", rug = TRUE, color = "dataset", fill = "dataset", palette = "jco")

Use ..count.. on the y-axis instead of ..density... Without combine or merge, a separate plot is returned for each gene and printed as a list:

ggdensity(expr, x=c("GATA3", "PTEN", "XBP1"), y="..count..", xlab = "Expression", add = "median", rug = TRUE, palette = "jco")
## $GATA3
##
## $PTEN
##
## $XBP1
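The two y scales differ only by a constant factor: ggplot2 documents the ..count.. computed by stat_density (which I believe ggdensity() draws with) as density × number of points. The sketch checks this on simulated values with base R's density(): the area under the density curve is about 1, the area under the count curve about n.

```r
set.seed(4)
# Simulated values standing in for one gene's expression (invented, not TCGA)
x <- rnorm(200)

d  <- density(x)                 # Gaussian kernel estimate on an evenly spaced grid
dx <- diff(d$x[1:2])             # grid spacing

area_density <- sum(d$y) * dx    # Riemann sum of the density curve, close to 1
count_curve  <- d$y * length(x)  # the "count" scale: density * n
area_count   <- sum(count_curve) * dx  # close to n = 200
print(c(area_density = area_density, area_count = area_count))
```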

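Map the x variables (the genes, referred to as ".x.") to color and fill, and draw all three genes in one panel with merge = TRUE: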

ggdensity(expr, x=c("GATA3", "PTEN", "XBP1"), y="..count..", color = ".x.", fill = ".x.", merge = TRUE, xlab = "Expression", add = "median", rug = TRUE, palette = "jco")

Facet by dataset:

ggdensity(expr, x=c("GATA3", "PTEN", "XBP1"), y="..count..", color = ".x.", fill = ".x.", merge = TRUE, xlab = "Expression", add = "median", rug = TRUE, palette = "jco", facet.by = "dataset")

6. Histograms

gghistogram(expr, x=c("GATA3", "PTEN", "XBP1"), y="..density..", xlab = "Expression", add = "median", rug = TRUE)
## $GATA3
##
## $PTEN
##
## $XBP1

Map dataset to color:

gghistogram(expr, x=c("GATA3", "PTEN", "XBP1"), y="..density..", xlab = "Expression", add = "median", rug = TRUE, color = "dataset", fill = "dataset", palette = "jco")
## $GATA3
##
## $PTEN
##
## $XBP1
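For histograms, density and count are related bin by bin: density = count / (n × bin width), so the bar areas sum to 1. Base R's hist() returns both, which makes the relation easy to check on simulated data (invented values, not TCGA):

```r
set.seed(5)
# Simulated values (invented, not TCGA)
x <- rnorm(200)

h <- hist(x, plot = FALSE)   # compute the bins without drawing
widths <- diff(h$breaks)     # bin widths

manual_density <- h$counts / (sum(h$counts) * widths)
print(all.equal(h$density, manual_density))  # same bars, rescaled
print(sum(h$density * widths))               # total bar area
```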

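There are further variants, such as combining the plots into one panel or faceting, but they work much the same as above, so I'll skip them.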

7. Q-Q plots

ggqqplot(expr, x=c("GATA3", "PTEN", "XBP1"), combine = TRUE, size = 0.5)
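A Q-Q plot pairs the sorted data with quantiles of a reference distribution, here the normal. Base R's qqnorm() can return the coordinates without drawing; the sketch confirms that the theoretical quantiles are qnorm(ppoints(n)), which as far as I know is also what ggplot2's stat_qq (behind ggqqplot()) uses. The data are simulated.

```r
set.seed(6)
# Simulated, roughly normal values (invented, not TCGA)
x <- rnorm(50, mean = 1, sd = 2)

qq <- qqnorm(x, plot.it = FALSE)          # list(x = theoretical, y = observed)
theoretical <- qnorm(ppoints(length(x)))  # normal quantiles at the plotting positions

# qqnorm keeps the data order, so sort by the observed values before comparing
print(all.equal(qq$x[order(x)], theoretical))
# Points close to a straight line suggest approximate normality
print(cor(qq$x, qq$y))
```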

Map dataset to color:

ggqqplot(expr, x=c("GATA3", "PTEN", "XBP1"), combine = TRUE, size = 0.5, color = "dataset", palette = "jco")

sessionInfo

Please follow my personal blog: ytlogos.github.io/

sessionInfo()
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=zh_CN.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=zh_CN.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=zh_CN.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] bindrcpp_0.2 RTCGA.mRNA_1.4.0 RTCGA_1.6.0 ggpubr_0.1.3
## [5] magrittr_1.5 ggplot2_2.2.1
##
## loaded via a namespace (and not attached):
## [1] zoo_1.8-0 reshape2_1.4.2 purrr_0.2.2.2
## [4] splines_3.4.0 ggthemes_3.4.0 lattice_0.20-35
## [7] colorspace_1.3-2 htmltools_0.3.6 viridisLite_0.2.0
## [10] yaml_2.1.14 survival_2.41-3 XML_3.98-1.9
## [13] survMisc_0.5.4 rlang_0.1.1 foreign_0.8-68
## [16] glue_1.1.0 bindr_0.1 plyr_1.8.4
## [19] stringr_1.2.0 ggsignif_0.2.0 munsell_0.4.3
## [22] gtable_0.2.0 ggsci_2.7 rvest_0.3.2
## [25] psych_1.7.5 evaluate_0.10 labeling_0.3
## [28] knitr_1.16 parallel_3.4.0 broom_0.4.2
## [31] Rcpp_0.12.11 xtable_1.8-2 scales_0.4.1
## [34] backports_1.1.0 cmprsk_2.2-7 km.ci_0.5-2
## [37] gridExtra_2.2.1 mnormt_1.5-5 digest_0.6.12
## [40] stringi_1.1.5 ggrepel_0.6.5 dplyr_0.7.0
## [43] KMsurv_0.1-5 grid_3.4.0 rprojroot_1.2
## [46] tools_3.4.0 lazyeval_0.2.0 tibble_1.3.3
## [49] tidyr_0.6.3 Matrix_1.2-10 data.table_1.10.4
## [52] xml2_1.1.1 survminer_0.4.0 assertthat_0.2.0
## [55] rmarkdown_1.6 httr_1.2.1 viridis_0.4.0
## [58] R6_2.2.2 nlme_3.1-131 compiler_3.4.0
