How to download only Viridiplantae miRNA from miRBase

There is no direct way to download organism-specific miRNAs from the miRBase database, so I extracted the Viridiplantae (plant) miRNAs from miRBase using a few Unix commands. The steps are as follows:
  • Download the organism list (organisms.txt) from HERE.
  • Download the mature miRNA sequences (mature.fa) from HERE.
  • Extract both files into the same directory.
  • Download the FASTA dereplication Python script (derep.py) from HERE.
  • Now run the following bash script from the same directory:
    #!/bin/bash
    # script to extract plant miRNA from the miRBase database
    
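    # flatten the FASTA file: one record (header and sequence) per line;
    # NR>1 skips the empty record before the first ">"
    awk 'BEGIN{RS=">"} NR>1 {gsub("\n"," ",$0); print ">"$0}' mature.fa >mature.tab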
    
    
    # extract the organisms that belong to Viridiplantae. You can extract the miRNAs of
    # another group of organisms by changing the word "Viridiplantae"
    grep Viridiplantae organisms.txt >plants_mirbase.txt
    
    # extract the plant species names (genus and species)
    awk '{ print $3 " " $4 }' plants_mirbase.txt >plant_name.txt
    
    # extract the miRNA records of those species; -F treats each name as a literal string
    grep -F -f plant_name.txt mature.tab >plant_mirna.tab
    
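    # convert the one-line records back to FASTA: the last field is the sequence,
    # everything before it is the header (also works for names longer than two words)
    awk '{hdr=$1; for(i=2;i<NF;i++) hdr=hdr" "$i; print hdr"\n"$NF}' plant_mirna.tab > plant_mirna.rna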
    
    #convert RNA to DNA
    sed '/^[^>]/ y/uU/tT/' plant_mirna.rna  >plant_mirna.fasta
    
    
    #dereplicate mirna file
    python derep.py -i plant_mirna.fasta
    
    # clean the FASTA headers: keep only the text before the first ';'
    awk -F ';' '{print $1}' derep_plant_mirna.fasta >plant_mature_mirna_unique.fasta
    
    
    rm mature.tab
    rm plants_mirbase.txt
    rm plant_mirna.tab
    rm plant_mirna.rna
    rm plant_name.txt
    rm derep_plant_mirna.fasta
    
    echo "Mature miRNAs from all plants are in plant_mirna.fasta"
    echo "Unique mature miRNAs from all plants are in plant_mature_mirna_unique.fasta"
    echo "All jobs done"
    
    
Basically, the above bash script extracts the plant miRNAs deposited in the miRBase database and saves them to the file plant_mirna.fasta. In the second part, it removes the duplicate miRNAs and saves the unique sequences to another file, plant_mature_mirna_unique.fasta.
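As a possible alternative, and only as a sketch under assumptions, the selection and the RNA-to-DNA conversion can be done in a single awk pass. The example builds two tiny hand-made input files (the TOY accessions and the shortened lineage strings are invented for illustration). It assumes that the species name occupies the third and fourth words both in organisms.txt and in every mature.fa header, which is the same layout the script above relies on.

```bash
#!/bin/bash
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Toy inputs (invented rows that only imitate the expected layout).
printf 'ath\tATH\tArabidopsis thaliana\tViridiplantae;Magnoliophyta;\n' >  organisms.txt
printf 'cel\tCEL\tCaenorhabditis elegans\tMetazoa;Nematoda;\n'          >> organisms.txt

cat > mature.fa <<'EOF'
>cel-let-7-5p TOY0000001 Caenorhabditis elegans let-7-5p
UGAGGUAGUAGGUUGUAUAGUU
>ath-miR156a-5p TOY0000002 Arabidopsis thaliana miR156a-5p
UGACAGAAGAGAGUGAGCAC
EOF

# First file (organisms.txt): remember the species names of the chosen clade.
# Second file (mature.fa): keep a record when words 3 and 4 of its header
# form one of those names; in kept sequence lines write U as T and u as t.
awk -v clade="Viridiplantae" '
    NR == FNR { if (index($0, clade)) names[$3 " " $4] = 1; next }
    /^>/      { key = $3 " " $4; keep = (key in names) }
    keep      { if ($0 !~ /^>/) { gsub(/U/, "T"); gsub(/u/, "t") } print }
' organisms.txt mature.fa > plant_mirna.fasta

cat plant_mirna.fasta
```

On real data, drop the toy-input section and run the awk command in the directory that holds organisms.txt and mature.fa. Changing the value of clade selects another group of organisms. Unlike grep -f, this compares the species name only at a fixed position in the header, so it cannot match a name that happens to appear elsewhere on the line.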
How to remove duplicate sequences from FASTA file HERE
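The derep.py script is not shown in this post, so the following is only a sketch of what a dereplication step can look like, not a copy of that script. It assumes that every sequence sits on a single line right after its header (as in plant_mirna.fasta) and that "duplicate" means an identical sequence, ignoring case. The first record of each distinct sequence is kept. The toy records and file names are made up for the example.

```bash
#!/bin/bash
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Toy input (invented records): the first and third share one sequence.
cat > toy_plant_mirna.fasta <<'EOF'
>toyA-miR1 TOY0000001 Genus alpha miR1
TGACAGAAGAGAGTGAGCAC
>toyA-miR2 TOY0000002 Genus alpha miR2
TTGACAGAAGATAGAGAGCAC
>toyB-miR1 TOY0000003 Genus beta miR1
TGACAGAAGAGAGTGAGCAC
EOF

# Header lines are stored; a record is printed only the first time
# its sequence is seen. Blank lines are skipped.
awk '
    /^>/ { header = $0; next }
    NF && !seen[toupper($0)]++ { print header; print $0 }
' toy_plant_mirna.fasta > toy_unique.fasta

cat toy_unique.fasta
```

Because the headers are printed unchanged, this variant would not need the header-cleaning step that follows derep.py in the script above.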

3 comments:

  1. I got a syntax error at line 5 when running the script. How can I fix it?
