How Do I Install and Use BUSCO on Ubuntu 14.04.1 LTS
What is BUSCO
BUSCO stands for Benchmarking Universal Single-Copy Orthologs. It is used to assess the completeness of genome assemblies and annotations.
I tried several times to use CEGMA (Core Eukaryotic Genes Mapping Approach) on my Ubuntu 14.04 machine but failed. Then I found that the developers of CEGMA have stopped supporting it and suggest using BUSCO instead.
- Python 3
sudo apt-get install python3
- NCBI BLAST+
sudo apt-get install ncbi-blast+
- HMMER (HMMER 3.1b2)
sudo apt-get install hmmer
- Augustus 3.0.x (genome only)
- EMBOSS tools 6.x.x (transcriptome only)
sudo apt-get install emboss
- Download the latest BUSCO script from the BUSCO website (Software & User Guide section) and unzip it. This creates a directory 'busco', which should contain the following files: BUSCO_userguide.pdf, LICENSE, release_notes, BUSCO_v1.1.py, README.txt, sample_data
- Download the library of lineage-specific BUSCO data from the BUSCO website (Dataset section) and extract it into the same directory as the script. I downloaded the eukaryote-specific dataset, which is named eukaryota
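Before running BUSCO, it can help to confirm that the tools listed above are actually on the PATH. The following is a minimal Python 3 sketch of such a check, not part of BUSCO itself. The executable names (tblastn, makeblastdb, hmmsearch, augustus, transeq) are my assumption of what the packages above provide, so adjust them if your system differs.

```python
import shutil

# Assumed executable names for the tools listed above; adjust as needed.
REQUIRED_TOOLS = {
    "NCBI BLAST+": ["tblastn", "makeblastdb"],
    "HMMER": ["hmmsearch"],
    "Augustus (genome mode only)": ["augustus"],
    "EMBOSS (transcriptome mode only)": ["transeq"],
}


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return {package: [executables not found on PATH]} for missing tools."""
    missing = {}
    for package, executables in tools.items():
        not_found = [exe for exe in executables if which(exe) is None]
        if not_found:
            missing[package] = not_found
    return missing


if __name__ == "__main__":
    missing = find_missing_tools()
    if not missing:
        print("All listed tools were found on PATH.")
    for package, executables in missing.items():
        print("Missing for {}: {}".format(package, ", ".join(executables)))
```

An empty result only means the executables were found; it does not check their versions.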
- Genome assembly assessment:
python BUSCO_v1.1b.py -o NAME -in ASSEMBLY -l LINEAGE -m genome
- Gene set assessment:
python BUSCO_v1.1b.py -o NAME -in GENE_SET -l LINEAGE -m OGS
- Transcriptome assessment:
python BUSCO_v1.1b.py -o NAME -in TRANSCRIPTOME -l LINEAGE -m trans
NAME - name to use for the run and all temporary files
ASSEMBLY/GENE_SET/TRANSCRIPTOME - input file in FASTA format
LINEAGE - path to the lineage to be used (-l eukaryota, for example)
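The three modes above differ only in the input file and the -m value, so the command line can be assembled by a small helper. This is only a sketch: the function name and defaults are hypothetical, and the script file name is a parameter because this post mentions both BUSCO_v1.1.py and BUSCO_v1.1b.py. Pass whichever file you actually have.

```python
import shlex

# Script name as listed in the unpacked 'busco' directory; change it if yours differs.
BUSCO_SCRIPT = "BUSCO_v1.1.py"
# The three -m values used in the commands above.
VALID_MODES = ("genome", "OGS", "trans")


def build_busco_command(name, fasta, lineage, mode, script=BUSCO_SCRIPT):
    """Return the BUSCO command as an argument list (suitable for subprocess.run)."""
    if mode not in VALID_MODES:
        raise ValueError("mode must be one of: {}".format(", ".join(VALID_MODES)))
    return ["python3", script, "-o", name, "-in", fasta, "-l", lineage, "-m", mode]


if __name__ == "__main__":
    cmd = build_busco_command("test", "input", "eukaryota", "trans")
    # Print a copy-pasteable version of the command.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Building the command as a list rather than a single string avoids quoting problems if a file name contains spaces.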
How to run BUSCO
To test BUSCO, I downloaded the core eukaryotic gene list from CEGMA and chose 9 Arabidopsis genes:
At1g73030 At3g60360 At5g11900 At3g56490 At3g25980 At5g23900 At1g06790 At5g49510 At5g10780
I ran the BUSCO Python script like this:
python3 BUSCO_v1.1.py -o test -in input -l eukaryota -m trans -f
which produced the following in my terminal:
*** Running tBlastN ***
Building a new DB, current time: 08/13/2015 09:09:35
New DB name: test
New DB title: input
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 9 sequences in 0.000671148 seconds.
*** Getting coordinates for candidate transcripts! ***
*** Extracting candidate transcripts! ***
Translating candidate transcripts !
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
*** Running HMMER to confirm transcript orthology ***
Total complete BUSCOs found in assembly (<2 sigma) : 3 (... duplicated)
Total BUSCOs partially recovered (>2 sigma) : 0
Total groups searched: 429
Total BUSCOs not found: 426
Total running time: 5.930385589599609 seconds
Since I didn't use a plant-specific library, that may be why BUSCO identified only 3 genes in default mode.
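The two totals that are clearly legible in the output above (429 groups searched, 426 not found) are enough to work out how many groups were recovered. Here is a small sketch that does the arithmetic from the summary text. The function names are my own, and the regular expressions assume the summary wording shown in this run, which other BUSCO versions may not share.

```python
import re

# The two totals that are legible in the run shown above.
SAMPLE_OUTPUT = """Total groups searched: 429
Total BUSCOs not found: 426"""


def parse_totals(text):
    """Extract the 'searched' and 'not found' counts from BUSCO's summary text."""
    searched = re.search(r"Total groups searched:\s*(\d+)", text)
    not_found = re.search(r"Total BUSCOs not found:\s*(\d+)", text)
    if searched is None or not_found is None:
        raise ValueError("summary lines not found in text")
    return int(searched.group(1)), int(not_found.group(1))


def recovered_fraction(text):
    """Return (recovered, searched, percent), where recovered = searched - not found."""
    searched, not_found = parse_totals(text)
    recovered = searched - not_found
    return recovered, searched, 100.0 * recovered / searched


if __name__ == "__main__":
    recovered, searched, percent = recovered_fraction(SAMPLE_OUTPUT)
    print("{} of {} groups recovered ({:.1f}%)".format(recovered, searched, percent))
```

Note that "recovered" here counts complete and partial hits together, since it is derived only from the "not found" total.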
Problems
I noticed a few problems during installation and use of BUSCO:
- As you may have noticed, I ran the BUSCO script with python3, not python. If you run the BUSCO script like this
python BUSCO_v1.1.py -o test -in input -l eukaryota -m trans -f
you may get an error like this
Traceback (most recent call last):
  File "BUSCO_v1.1.py", line 32, in <module>
    import queue
ImportError: No module named queue
because python runs the older Python 2, in which this module is named Queue rather than queue.
- After a successful run, the BUSCO script produces a directory run_test (the name depends on your '-o' value) and a file 'temp'. If you run the BUSCO script again without deleting the old file and directory, it gives the following error:
python3 BUSCO_v1.1.py -o test -in input1 -l eukaryota -m trans -f
*** Running HMMER to confirm transcript orthology ***
Traceback (most recent call last):
  File "BUSCO_v1.1.py", line 598, in <module>
    name=i[:-7];group=transdic[name]
KeyError: 'AT1G06790.1'
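Both problems can be guarded against before BUSCO is launched. The sketch below is my own helper, not something shipped with BUSCO: it refuses to continue under Python 2 and deletes the run_NAME directory and the 'temp' file left by a previous run. It assumes these are the only leftovers that matter, which is what I observed but have not verified in the BUSCO source.

```python
import os
import shutil
import sys


def require_python3(version_info=sys.version_info):
    """Raise early under Python 2, which has 'Queue' but no 'queue' module."""
    if version_info[0] < 3:
        raise RuntimeError("Run this with python3; Python 2 names the module 'Queue'.")


def remove_previous_run(name, workdir="."):
    """Delete run_<name>/ and the 'temp' file from an earlier run; return removed paths."""
    removed = []
    run_dir = os.path.join(workdir, "run_" + name)
    temp_file = os.path.join(workdir, "temp")
    if os.path.isdir(run_dir):
        shutil.rmtree(run_dir)
        removed.append(run_dir)
    if os.path.isfile(temp_file):
        os.remove(temp_file)
        removed.append(temp_file)
    return removed


if __name__ == "__main__":
    require_python3()
    # "test" matches the -o value used in the commands above.
    for path in remove_previous_run("test"):
        print("Removed", path)
```

Be careful with the working directory: the helper deletes any file called 'temp' there, whether or not BUSCO created it.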
First of all, let me make it clear that I am not a bioinformatician. I am a simple plant biology researcher who faces problems in her daily research life and who bothers to post the solutions to those problems on this webpage. That's it.