Example: Speeding Up GATK with Multithreading in Python

Speeding up GATK with multithreading

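I'm Xiaobenben (小本本), an editor at the 本际云 server recommendation site, and I wrote this article to explain how to speed up GATK with multithreading. GATK variant calling can be slow on large samples, so we can split the work by chromosome and process the chromosomes in parallel with multiple threads.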

Example multithreaded script

Below is a Python multithreading script I wrote. It is offered only as a reference, and corrections are welcome:

import os
import threading
import time

bam_file = "a.mkdup.bam"
out_file_prefix = "flower"
chr_list = ["CHR01", "CHR02", "CHR03", "CHR04", "CHR05", "CHR06", "CHR07",
            "CHR08", "CHR09", "CHR10", "CHR11", "CHR12", "CHR13"]

# Build one HaplotypeCaller command per chromosome
muthreads = []
for chrom in chr_list:
    threads_comonder_name = (
        "gatk HaplotypeCaller --intervals " + chrom
        + " -R /mnt/j/BSA/02-read-align/Tifrunner2.fasta"
        + " -I " + bam_file
        + " -ERC GVCF"
        + " -O " + out_file_prefix + "-" + chrom + ".erc.g.vcf"
    )
    muthreads.append(threads_comonder_name)

exitFlag = 0  # set to a non-zero value to make threads skip their command


def print_time(threadName, delay, comander):
    if exitFlag:
        return
    time.sleep(delay)
    print(comander)
    os.system(comander)  # run the command through the operating system shell


class myThread(threading.Thread):
    def __init__(self, threadID, name, counter, comander):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.counter = counter
        self.comander = comander

    def run(self):
        print("Starting thread: " + self.name)
        print_time(self.name, self.counter, self.comander)
        print("Exiting thread: " + self.name)


# Create one thread per command
threadlist = []
for i, threadsnu in enumerate(muthreads):
    print(i)
    print(threadsnu)
    threadsnew = myThread(i, "Thread-" + str(i), 2, threadsnu)
    threadlist.append(threadsnew)

# Start all threads
for threads in threadlist:
    threads.start()

# Wait for all threads to finish
for threads in threadlist:
    threads.join()

print("All threads finished; exiting main thread")
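
Starting one thread per chromosome launches every GATK process at once, which may be more than the machine can handle. As a rough sketch of one way to cap the number of concurrent jobs, the same per-chromosome commands can be run through a thread pool from Python's standard library. The helper names build_command and run_all, the runner parameter, and the default of 4 workers are assumptions made for this illustration and are not part of the script above; the GATK arguments are the ones the script already uses.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(chrom, bam_file, reference, out_prefix):
    """Return the HaplotypeCaller argument list for one chromosome."""
    return [
        "gatk", "HaplotypeCaller",
        "--intervals", chrom,
        "-R", reference,
        "-I", bam_file,
        "-ERC", "GVCF",
        "-O", f"{out_prefix}-{chrom}.erc.g.vcf",
    ]


def run_all(commands, max_workers=4, runner=subprocess.run):
    """Run the commands, at most max_workers at a time.

    Returns the exit codes in the same order as the input commands.
    The runner argument is a hypothetical hook so the function can be
    tried out without GATK installed.
    """
    def run_one(cmd):
        return runner(cmd).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, not completion order
        return list(pool.map(run_one, commands))
```

With the values from the script, the command list would be [build_command(c, "a.mkdup.bam", "/mnt/j/BSA/02-read-align/Tifrunner2.fasta", "flower") for c in chr_list], and run_all(commands, max_workers=4) would return one exit code per chromosome, so a non-zero entry points at the chromosome that failed. Passing an argument list to subprocess.run also avoids shell quoting problems with unusual file names.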

Merging per-chromosome VCF files of the same sample

To merge the VCF files of several chromosomes from the same sample into one file, you can use the following command:

# for i in {1..22} X Y; do echo "-I final_chr$i.vcf"; done
# for i in {10..19} {1..9} M X Y; do echo "-I final_chr$i.vcf"; done

module load java/1.8.0_91
GATK=/home/jianmingzeng/biosoft/GATK/gatk-4.0.3.0/gatk
$GATK GatherVcfs \
-I final_chr1.vcf \
-I final_chr2.vcf \
-I final_chr3.vcf \
-I final_chr4.vcf \
-I final_chr5.vcf \
-I final_chr6.vcf \
-I final_chr7.vcf \
-I final_chr8.vcf \
-I final_chr9.vcf \
-I final_chr10.vcf \
-I final_chr11.vcf \
-I final_chr12.vcf \
-I final_chr13.vcf \
-I final_chr14.vcf \
-I final_chr15.vcf \
-I final_chr16.vcf \
-I final_chr17.vcf \
-I final_chr18.vcf \
-I final_chr19.vcf \
-I final_chr20.vcf \
-I final_chr21.vcf \
-I final_chr22.vcf \
-I final_chrX.vcf \
-I final_chrY.vcf \
-O merge.vcf

When merging, note that the VCF files must be listed in the same order as the contigs appear in the header of each VCF file.
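
To illustrate the ordering requirement, here is a minimal Python sketch that reads the contig order from a VCF header and lists the -I arguments in that order. It assumes an uncompressed VCF whose header contains ##contig=<ID=...> lines with ID as the first key; the function names contig_order and gather_command and the vcf_for_contig mapping are hypothetical helpers introduced only for this example.

```python
import re

# Matches header lines such as ##contig=<ID=chr1,length=248956422>
CONTIG_RE = re.compile(r"^##contig=<ID=([^,>]+)")


def contig_order(vcf_path):
    """Return the contig IDs in the order they appear in the VCF header."""
    contigs = []
    with open(vcf_path) as handle:
        for line in handle:
            if not line.startswith("##"):
                break  # the meta-information lines are over
            match = CONTIG_RE.match(line)
            if match:
                contigs.append(match.group(1))
    return contigs


def gather_command(header_vcf, vcf_for_contig, output="merge.vcf", gatk="gatk"):
    """Build a GatherVcfs argument list with inputs in header order.

    vcf_for_contig maps a contig ID to its per-chromosome VCF path;
    contigs without an entry in the mapping are skipped.
    """
    cmd = [gatk, "GatherVcfs"]
    for contig in contig_order(header_vcf):
        if contig in vcf_for_contig:
            cmd += ["-I", vcf_for_contig[contig]]
    cmd += ["-O", output]
    return cmd
```

The resulting list can be passed to subprocess.run, or joined with spaces and compared against a hand-written command to check that the files are in header order.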

That concludes my answer; I hope it helps.

Original article by 小编小本本. If you repost it, please credit the source: https://www.benjiyun.com/yunzhujiyunwei/vps-yunwei/7124.html