How Do I Install and Use BUSCO on Ubuntu 14.04.1 LTS

What is BUSCO

BUSCO stands for Benchmarking Universal Single-Copy Orthologs. It is used to assess the completeness of genome assemblies and annotations.

Why BUSCO

I tried several times to use CEGMA (Core Eukaryotic Genes Mapping Approach) on my Ubuntu 14.04 machine but failed. Then I found that the developers of CEGMA have stopped supporting it and suggest using BUSCO instead.

Requirements

  • Python 3
run this command in your terminal
sudo apt-get install python3
  • NCBI BLAST+
run this command in your terminal
sudo apt-get install ncbi-blast+
  • HMMER (HMMER 3.1b2)
run this command in your terminal
sudo apt-get install hmmer
  • Augustus 3.0.x (genome only)
Download from HERE and install accordingly
  • EMBOSS tools 6.x.x (transcriptome only)
run this command in your terminal
sudo apt-get install emboss
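
To confirm that everything above is installed, a small check like the following may help. It is only a sketch: the executable names (makeblastdb, tblastn, hmmsearch, augustus, transeq) are my assumption about what these packages put on the PATH, so adjust them if your installation differs.

```python
import shutil

# Executable names are assumptions based on the packages listed above;
# change them if your installation uses different names.
REQUIRED_TOOLS = {
    "python3": "Python 3",
    "makeblastdb": "NCBI BLAST+",
    "tblastn": "NCBI BLAST+",
    "hmmsearch": "HMMER",
    "augustus": "Augustus (genome mode only)",
    "transeq": "EMBOSS (transcriptome mode only)",
}


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of executables that are not found on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        for name in missing:
            print("missing: %s (%s)" % (name, REQUIRED_TOOLS[name]))
    else:
        print("all required tools were found on PATH")
```

If you only plan to use one mode, a missing augustus or transeq may be fine.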

Installation

  • Download the latest BUSCO script from HERE (Software & User Guide section) and unzip it. This creates a directory 'busco', which should contain the following files: BUSCO_userguide.pdf, LICENSE, release_notes, BUSCO_v1.1.py, README.txt, sample_data
  • Download the lineage-specific BUSCO data set from HERE (Dataset section) and extract it into the same directory as the script. I downloaded the eukaryote-specific set, which is named eukaryota

Uses

  • Genome assembly assessment:
python3 BUSCO_v1.1.py -o NAME -in ASSEMBLY -l LINEAGE -m genome
  • Gene set assessment:
python3 BUSCO_v1.1.py -o NAME -in GENE_SET -l LINEAGE -m OGS
  • Transcriptome assessment:
python3 BUSCO_v1.1.py -o NAME -in TRANSCRIPTOME -l LINEAGE -m trans
NAME - name to use for the run and all temporary files
ASSEMBLY/GENE_SET/TRANSCRIPTOME - input file in FASTA format
LINEAGE - path to the lineage to be used (-l eukaryota for example)
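
The three commands differ only in the input file and the -m value, so they can be assembled by one small helper. This is a sketch, not part of BUSCO: build_busco_command is a name I made up, the script name defaults to the file I downloaded (BUSCO_v1.1.py), and the optional -f is simply the extra flag that appears in the command I run further down.

```python
# Mode names as they appear in the usage commands above.
MODES = ("genome", "OGS", "trans")


def build_busco_command(name, fasta, lineage, mode,
                        script="BUSCO_v1.1.py", force=False):
    """Return the argument list for one BUSCO run (nothing is executed here)."""
    if mode not in MODES:
        raise ValueError("mode must be one of %s, got %r" % (", ".join(MODES), mode))
    cmd = ["python3", script, "-o", name, "-in", fasta, "-l", lineage, "-m", mode]
    if force:
        cmd.append("-f")
    return cmd


if __name__ == "__main__":
    # Prints the command line so it can be checked before running it.
    print(" ".join(build_busco_command("test", "input", "eukaryota", "trans", force=True)))
```

The returned list can be handed to subprocess.call from the directory that contains the script and the lineage data.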

How to run BUSCO

To test BUSCO, I downloaded the core eukaryotic gene list from CEGMA and chose 9 Arabidopsis genes:
At1g73030                      
At3g60360                    
At5g11900                   
At3g56490                    
At3g25980                    
At5g23900                     
At1g06790                     
At5g49510                     
At5g10780
I ran the BUSCO Python script like this
python3 BUSCO_v1.1.py -o test -in input -l eukaryota -m trans -f
which produced the following in my terminal
*** Running tBlastN ***


Building a new DB, current time: 08/13/2015 09:09:35
New DB name:   test
New DB title:  input
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 9 sequences in 0.000671148 seconds.
*** Getting coordinates for candidate transcripts! ***
*** Extracting candidate transcripts! ***
Translating candidate transcripts !
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
*** Running HMMER to confirm transcript orthology ***
Total complete BUSCOs found in assembly (<2 sigma) :  3 (… duplicated)
Total BUSCOs partially recovered (>2 sigma) :  0
Total groups searched: 429
Total BUSCOs not found:  426
Total running time:   5.930385589599609 seconds
Since I didn't use a plant-specific library, that may be why BUSCO identified only 3 genes in default mode.
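
The number of groups found can be worked out from the last summary lines: 429 searched minus 426 not found leaves 3, which is about 0.7% of the library. A minimal sketch of that arithmetic follows. The function name summarize is my own, and it assumes that "not found" excludes both complete and partially recovered groups.

```python
import re


def summarize(report_text):
    """Pull the two totals out of BUSCO's summary and derive the rest."""
    searched = int(re.search(r"Total groups searched:\s*(\d+)", report_text).group(1))
    not_found = int(re.search(r"Total BUSCOs not found:\s*(\d+)", report_text).group(1))
    found = searched - not_found
    return {
        "searched": searched,
        "not_found": not_found,
        "found": found,  # assumed to be complete + partially recovered
        "percent_found": 100.0 * found / searched,
    }


if __name__ == "__main__":
    report = "Total groups searched: 429\nTotal BUSCOs not found:  426\n"
    result = summarize(report)
    # prints: 3 of 429 groups found (0.7%)
    print("%d of %d groups found (%.1f%%)"
          % (result["found"], result["searched"], result["percent_found"]))
```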

Problems

I noticed a few problems during installation and use of BUSCO
  1. As you may have noticed, I ran the BUSCO script with python3, not python. If you run the script like this
    python BUSCO_v1.1.py -o test -in input -l eukaryota -m trans -f
    you may get an error like this
    Traceback (most recent call last):
      File "BUSCO_v1.1.py", line 32, in <module>
        import queue
    ImportError: No module named queue
    because the python command runs the older Python 2, where this module is named Queue rather than queue.
  2. After a successful run, the BUSCO script produces a directory run_test (the name depends on your '-o' value) and a file 'temp'. If you run the script again without deleting the old file and directory, it gives the following error
    python3 BUSCO_v1.1.py -o test -in input1 -l eukaryota -m trans -f
    *** Running HMMER to confirm transcript orthology ***
    Traceback (most recent call last):
      File "BUSCO_v1.1.py", line 598, in <module>
        name=i[:-7];group=transdic[name]
    KeyError: 'AT1G06790.1'
    
    
        
Here AT1G06790.1 is the fasta header from my previous run
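
Both problems can be caught before BUSCO starts. The sketch below is my own helper, not part of BUSCO: require_python3 stops early when the interpreter is Python 2, and remove_previous_run deletes the run_NAME directory and the 'temp' file described above. It assumes both leftovers sit in the directory you run BUSCO from, and it deletes them without asking, so keep a copy of any results you still need.

```python
import os
import shutil
import sys


def require_python3(version_info=sys.version_info):
    """Stop early instead of failing later on `import queue`."""
    if version_info[0] < 3:
        raise RuntimeError("run this with python3, not python")


def remove_previous_run(name, workdir="."):
    """Delete run_<name> and the 'temp' file left by an earlier run.

    Returns the list of paths that were removed.
    """
    removed = []
    run_dir = os.path.join(workdir, "run_" + name)
    if os.path.isdir(run_dir):
        shutil.rmtree(run_dir)
        removed.append(run_dir)
    temp_file = os.path.join(workdir, "temp")
    if os.path.isfile(temp_file):
        os.remove(temp_file)
        removed.append(temp_file)
    return removed
```

Calling require_python3() and then remove_previous_run("test") before the BUSCO command from my example clears the output of the earlier '-o test' run.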
