How to download only Viridiplantae miRNAs from miRBase

There is no direct way to download organism-specific miRNAs from the miRBase database, so I extracted the Viridiplantae (green plant) miRNAs from miRBase using a few Unix commands. The steps are as follows:
  • Download the organism information file (organisms.txt) from HERE.
  • Download the mature miRNA sequences (mature.fa) from HERE.
  • Extract both files into the same directory.
  • Download the FASTA dereplication Python script (derep.py) from HERE.
  • Run the following bash script from that same directory:
    #!/bin/bash
    # Script to extract plant miRNAs from the miRBase database

    # Convert FASTA to one line per record (header and sequence joined by spaces);
    # NR > 1 skips the empty record before the first ">"
    awk 'BEGIN{RS=">"} NR > 1 {gsub("\n", " ", $0); print ">"$0}' mature.fa > mature.tab

    # Extract the organisms that belong to Viridiplantae. You can extract miRNAs
    # for other organisms too by changing the word "Viridiplantae".
    grep Viridiplantae organisms.txt > plants_mirbase.txt

    # Extract the plant names (genus and species = fields 3 and 4)
    awk '{ print $3 " " $4 }' plants_mirbase.txt > plant_name.txt

    # Extract the miRNAs of those plants (-F matches the names as fixed strings)
    grep -F -f plant_name.txt mature.tab > plant_mirna.tab

    # Convert back to FASTA: all fields but the last form the header,
    # the last field is the sequence
    awk '{seq = $NF; $NF = ""; sub(/ +$/, ""); print $0 "\n" seq}' plant_mirna.tab > plant_mirna.rna

    # Convert RNA to DNA (U -> T) on sequence lines only
    sed '/^[^>]/ y/uU/tT/' plant_mirna.rna > plant_mirna.fasta

    # Dereplicate the miRNA file; the next step reads its output, derep_plant_mirna.fasta
    python derep.py -i plant_mirna.fasta

    # Clean the FASTA headers: keep only the text before the first ";"
    awk -F ';' '{print $1}' derep_plant_mirna.fasta > plant_mature_mirna_unique.fasta

    # Remove the intermediate files
    rm mature.tab plants_mirbase.txt plant_mirna.tab plant_mirna.rna plant_name.txt derep_plant_mirna.fasta

    echo 'Mature miRNAs from all plants are in plant_mirna.fasta!!!'
    echo 'Unique mature miRNAs from all plants are in plant_mature_mirna_unique.fasta!!!'
    echo 'All jobs done!!!'
    
    
Basically, the bash script above extracts the plant miRNAs deposited in the miRBase database and saves them to plant_mirna.fasta. In the second part, it removes duplicate miRNA sequences and saves the unique ones to another file, plant_mature_mirna_unique.fasta.
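Matching on species names relies on every name in organisms.txt being exactly two words (fields 3 and 4). As an alternative to that name-based grep (not the approach used in the script), miRNAs can be selected by the organism code that prefixes every miRBase ID, for example ath in ath-miR156a-5p. This minimal sketch assumes organisms.txt is tab-separated with the code in column 1 and the lineage in column 4; extract_by_taxon is a hypothetical helper and the demo rows are made up:

```bash
#!/bin/bash
# Sketch: keep only the mature miRNAs whose ID prefix (e.g. "ath" in
# "ath-miR156a-5p") belongs to an organism whose lineage contains TAXON.
# ASSUMPTION: organisms.txt is tab-separated as
#   code <TAB> division <TAB> name <TAB> lineage <TAB> taxid
# and every header in mature.fa starts with ">code-".

# extract_by_taxon TAXON ORGANISMS_FILE FASTA_FILE  (FASTA on stdout)
extract_by_taxon() {
    awk -F '\t' -v taxon="$1" '
        # First file: remember the codes whose lineage column mentions TAXON.
        FNR == NR {
            if ($1 !~ /^#/ && index($4, taxon) > 0) keep[$1] = 1
            next
        }
        # Second file: each header decides whether its record is printed.
        /^>/ {
            code = substr($0, 2)
            sub(/-.*/, "", code)      # ">ath-miR156a-5p ..." -> "ath"
            printing = (code in keep)
        }
        printing
    ' "$2" "$3"
}

# Demo on toy records (made up for illustration, not real miRBase rows).
demo=$(mktemp -d)
printf 'ath\tATH\tArabidopsis thaliana\tViridiplantae;Magnoliophyta;\t3702\n' > "$demo/organisms.txt"
printf 'hsa\tHSA\tHomo sapiens\tMetazoa;Chordata;\t9606\n' >> "$demo/organisms.txt"
printf '>ath-miR156a-5p MIMAT_demo1 Arabidopsis thaliana miR156a-5p\nUGACAGAAGAGAGUGAGCAC\n' > "$demo/mature.fa"
printf '>hsa-let-7a-5p MIMAT_demo2 Homo sapiens let-7a-5p\nUGAGGUAGUAGGUUGUAUAGUU\n' >> "$demo/mature.fa"
extract_by_taxon Viridiplantae "$demo/organisms.txt" "$demo/mature.fa"   # prints only the ath- record
rm -r "$demo"
```

On the real files, `extract_by_taxon Viridiplantae organisms.txt mature.fa > plant_mirna.rna` could stand in for the first five steps of the script.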
How to remove duplicate sequences from a FASTA file: HERE
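If derep.py is not at hand, identical sequences can also be collapsed with awk alone. This is a minimal sketch, not the script linked above: it keeps the first header seen for each distinct sequence, compares sequences case-insensitively, and assumes one sequence line per record (as in plant_mirna.fasta). dereplicate_fasta is a hypothetical helper and the demo records are illustrative:

```bash
#!/bin/bash
# Sketch: collapse identical sequences in a FASTA file, keeping the first
# header seen for each distinct sequence.
# ASSUMPTION: each record has exactly one sequence line (as in the
# plant_mirna.fasta written by the script above). Sequences are compared
# case-insensitively; blank lines are ignored.

# dereplicate_fasta FILE  (FASTA on stdout)
dereplicate_fasta() {
    awk '
        /^>/    { header = $0; next }                   # remember the current header
        NF == 0 { next }                                # skip blank lines
        !seen[toupper($0)]++ { print header; print $0 } # first copy only
    ' "$1"
}

# Demo on toy records (illustrative, not copied from miRBase).
demo=$(mktemp)
cat > "$demo" <<'EOF'
>osa-miR156a Oryza sativa miR156a
TGACAGAAGAGAGTGAGCAC
>zma-miR156a Zea mays miR156a
TGACAGAAGAGAGTGAGCAC
>ath-miR159a Arabidopsis thaliana miR159a
TTTGGATTGAAGGGAGCTCTA
EOF
dereplicate_fasta "$demo"   # the zma-miR156a copy is dropped
rm -f "$demo"
```

Used as `dereplicate_fasta plant_mirna.fasta > plant_mature_mirna_unique.fasta`, it could replace both the derep.py call and the header-cleaning step, at the cost of losing any duplicate counts derep.py may record.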

BLAST Database creation error

I was trying to create a BLAST database, but I got this error:
    Building a new DB, current time: 12/03/2015 09:44:18
    New DB name:   plant_protein
    New DB title:  /home/sanjay/bin/Genomes/plant_protein_from_plantgdb.fa
    Sequence type: Protein
    Keep Linkouts: T
    Keep MBits: T
    Maximum file size: 1000000000B

    volume: plant_protein

    file: plant_protein.pin
    file: plant_protein.phr
    file: plant_protein.psq

    BLAST Database creation error: FASTA-Reader: No residues given
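Then I checked whether any FASTA sequence in the file was empty by counting its blank lines with this command:
    grep -c "^$" ~/bin/Genomes/plant_protein_from_plantgdb.fa
I found one sequence that had only a FASTA header and no residues. To remove empty FASTA records, I ran this command, which splits the file into records at ">" (RS), splits each record into lines (FS), and prints a record only if its first sequence line ($2) is non-empty, writing the result to a new .fasta file:
    awk 'BEGIN {RS = ">" ; FS = "\n" ; ORS = ""} $2 {print ">"$0}' ~/bin/Genomes/plant_protein_from_plantgdb.fa >~/bin/Genomes/plant_protein_from_plantgdb.fasta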
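Counting blank lines only flags a header-only record when a blank line happens to follow it; a header followed directly by another header leaves no blank line behind. A minimal sketch of a more direct check, where find_empty_records is a hypothetical helper rather than part of any tool:

```bash
#!/bin/bash
# Sketch: report FASTA headers that have no residue lines before the next
# header or the end of the file. Blank lines do not count as residues.

# find_empty_records FILE  (offending header lines on stdout; "-" reads stdin)
find_empty_records() {
    awk '
        /^>/ {
            if (pending != "") print pending   # previous header got no residues
            pending = $0
            next
        }
        NF > 0 { pending = "" }                # a residue line clears the flag
        END { if (pending != "") print pending }
    ' "$1"
}

# Demo on a toy protein file with two header-only records.
demo=$(mktemp)
printf '>p1\nMKTAYIAK\n>p2 header only\n>p3\nMSEQ\n>p4 header only, end of file\n' > "$demo"
find_empty_records "$demo"   # prints the p2 and p4 headers
rm -f "$demo"
```

Piping the result into `wc -l` gives the number of empty records, which should be 0 for the cleaned .fasta file.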
After rebuilding the database from the cleaned file, I finally got the happy success message:
    Building a new DB, current time: 12/03/2015 09:48:01
    New DB name:   plant_protein
    New DB title:  /home/sanjay/bin/Genomes/plant_protein_from_plantgdb.fasta
    Sequence type: Protein
    Keep Linkouts: T
    Keep MBits: T
    Maximum file size: 1000000000B
    Adding sequences from FASTA; added 980219 sequences in 24.2583 seconds.
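The makeblastdb command itself is not shown in the post. A minimal sketch that chains the cleaning filter above with the database build, assuming makeblastdb from NCBI BLAST+ is on PATH; the flags -in, -dbtype prot and -out are a guess based on the log (a protein database named plant_protein), and clean_then_makeblastdb is a hypothetical helper:

```bash
#!/bin/bash
# Sketch: drop header-only records, then build a protein BLAST database.
# ASSUMPTION: makeblastdb from NCBI BLAST+ is on PATH. The post does not show
# the exact command it ran; -dbtype prot and -out are chosen to match the log.

# clean_then_makeblastdb IN_FASTA CLEAN_FASTA DB_NAME
# Returns 2 if makeblastdb is missing (the cleaned FASTA is still written).
clean_then_makeblastdb() {
    local in_fa=$1 clean_fa=$2 db_name=$3
    # Same filter as in the post: keep records whose first sequence line is non-empty.
    awk 'BEGIN {RS = ">" ; FS = "\n" ; ORS = ""} $2 {print ">"$0}' "$in_fa" > "$clean_fa" || return 1
    if ! command -v makeblastdb > /dev/null 2>&1; then
        echo "makeblastdb not found; cleaned FASTA is in $clean_fa" >&2
        return 2
    fi
    makeblastdb -in "$clean_fa" -dbtype prot -out "$db_name"
}

# Example with the paths from the post (uncomment to run):
# clean_then_makeblastdb ~/bin/Genomes/plant_protein_from_plantgdb.fa \
#     ~/bin/Genomes/plant_protein_from_plantgdb.fasta plant_protein
```

Cleaning before every build means a stray header-only record can no longer abort makeblastdb with "No residues given".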

How to install the NCBI BLAST program on your computer: HERE

eXpress Error : exists in MultiFASTA but not alignment (SAM/BAM) file

How Do I Install and Use BUSCO on Ubuntu 14.04.1 LTS