[Pipeline] 在Anaconda下用Snakemake構建ChIP-seq流程(4)
(5)Peak annotation
使用Homer對得到的Peak進行注釋
正常的注釋命令:
annotatePeaks.pl <peak/BED file> <genome> > <output file>
通過對Peak/Bed 文件的注釋,我們能得到這樣的一個表格(使用Homer官網的例子,其實我更喜歡Y叔的ChIPseeker,可視化更好看一些):
annotatePeaks.pl ERpeaks.txt hg18 > outputfile.txt
An example from Homer annotation,from http://homer.ucsd.edu/homer/ngs/index.html
如果你需要一個其他的參考基因組,比如hg38,需要預先進行下載:
perl /anaconda3/envs/chip/share/homer-4.9.1-6/.//configureHomer.pl -install hg38
在我們的pipeline里可以寫成:
rule homer_anno:
input:
"peak_file/{sample}_peaks.narrowPeak"
output:
"peak_file/{sample}_peakanno.xls"
shell:
r
annotatePeaks.pl {input} hg38 > {output}
別忘了添加註釋文件到rule all 下。
rule all:
input:
expand(fastq_file/{sample}.fastq, sample=SAMPLES),
expand(trim_file/{sample}.fastq, sample=SAMPLES),
expand(bam_file/{sample}.bam, sample=SAMPLES),
expand(peak_file/{sample}_peaks.narrowPeak, sample=SAMPLES),
expand(peak_file/{sample}__peakanno.xls, sample=SAMPLES)
所以完整的pipeline 是這樣的(註:未經測試,有問題可以問我):
import glob
SAMPLES = []
for filename in glob.glob(rsra_file/*.sra):
SAMPLES.append(filename.replace(.sra,).replace(sra_file/,))
rule all:
input:
expand(fastq_file/{sample}.fastq, sample=SAMPLES),
expand(trim_file/{sample}.fastq, sample=SAMPLES),
expand(bam_file/{sample}.bam, sample=SAMPLES),
expand(peak_file/{sample}_peaks.narrowPeak, sample=SAMPLES),
expand(peak_file/{sample}__peakanno.xls, sample=SAMPLES)
rule sra_fastq:
input:
sra_file/{sample}.sra
output:
fastq_file/{sample}.fastq
shell:
r
fastq-dump -O fastq_file {input}
rule perform_qc:
input:
fastq_file/{sample}.fastq
params:
out_dir = qc
output:
qc/{sample}_fastqc.html,
shell:
r
fastqc -o qc_file -t 4 -f fastq {input}
rule seq_trim:
input:
fastq_file/{sample}.fastq
output:
trimed_file/{sample}.fastq
shell:
r
seqtk trimfq -l 30 {input} > {output}
rule bowtie2_align:
input:
fastq = "trim_file/{sample}.fastq"
output:
"bam_file/{sample}.bam"
shell:
r
bowtie2 -p 10 --local -x /ref/GRCh38/hg38 -U {input.fastq} | samtools sort -O bam -o {output}
rule call_peak:
input:
"bam_file/{sample}.bam"
output:
"peak_file/{sample}_peaks.narrowPeak"
shell:
r
macs2 callpeak -t {input} -q 0.05 -f BAM -g mm -n {wildcards.sample} --outdir bam_file/
rule homer_anno:
input:
"peak_file/{sample}_peaks.narrowPeak"
output:
"peak_file/{sample}_peakanno.xls"
shell:
r
annotatePeaks.pl {input} hg38 > {output}
如果需要輸出其他格式的文件,比如 .bw,可以使用相應的工具進行轉換,這裡給出bam轉bw文件的代碼(需要deeptools 和samtools):
samtools index {input}
bamCoverage -b {input.bam} -o {output}
差不多就是這樣,以後應該不會給出這樣詳盡的pipeline了,但是也可能會用其他的流程管理工具或者別人寫好的工具講一下
推薦閱讀: