How to Extract Multiple Sequences from a Multi-FASTA File with Perl-III

This is another Perl script to extract FASTA sequences from a file by their accession numbers or IDs, which are stored in a second file.
Perl Script

You can download the FASTA sequence extraction Perl scripts from here: Download


perl >result.txt
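
Since the script itself is only linked above, here is a minimal sketch of the general idea in plain Perl. It is not the downloadable script: the function name extract_by_id and the sample records are hypothetical, and it assumes the ID is the first word after '>' in each header.

```perl
use strict;
use warnings;

# Return the FASTA records (header line plus sequence lines) whose ID,
# taken as the first word after '>', appears in the list of wanted IDs.
# Hypothetical helper for illustration, not the script linked in the post.
sub extract_by_id {
    my ($fasta_text, @wanted_ids) = @_;
    my %wanted = map { $_ => 1 } @wanted_ids;
    my $out  = '';
    my $keep = 0;
    for my $line (split /\n/, $fasta_text) {
        if ($line =~ /^>(\S+)/) {
            # A new record starts here: decide whether to keep it.
            $keep = exists $wanted{$1};
        }
        $out .= "$line\n" if $keep;
    }
    return $out;
}

# Made-up input data, for illustration only.
my $fasta = <<'END';
>seq1 first
ACGT
>seq2 second
GGCC
TTAA
>seq3 third
AAAA
END

print extract_by_id($fasta, 'seq2', 'seq3');
```

To use it on real files, you would read the ID file into the list (one ID per line) and read the FASTA file into the string before calling the function.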

Sort FASTA File by Sequence Name


I had a multi-FASTA file containing several isoforms of the same locus, so my task was to get rid of the multiple versions of the same gene and retain only the longest isoform. To do so, I decided to sort the FASTA sequences alphanumerically by their headers and then discard the shorter versions of each locus. Previously I shared a method to sort a FASTA file by sequence length, but in this case I wanted to sort the FASTA file by sequence name.
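
The final goal, keeping the longest isoform per locus, can also be sketched directly in plain Perl. This is a different route from the sort-then-discard approach described here, and it rests on an assumption that may not hold for your data: IDs are taken to look like "locus.isoform" (for example LOC1.2), with the locus being everything before the last dot. The function names and sample records are hypothetical.

```perl
use strict;
use warnings;

# ASSUMPTION: IDs look like "locus.isoform" (e.g. "LOC1.2"); the locus is
# everything before the last dot. Adjust this to your own header format.
sub locus_of {
    my ($id) = @_;
    (my $locus = $id) =~ s/\.[^.]*$//;
    return $locus;
}

# Keep only the longest sequence for each locus.
sub longest_isoforms {
    my ($fasta_text) = @_;

    # Parse into records: a header line and the concatenated sequence.
    my @records;
    for my $line (split /\n/, $fasta_text) {
        if ($line =~ /^>/) {
            push @records, { header => $line, seq => '' };
        } elsif (@records) {
            $line =~ s/\s+//g;
            $records[-1]{seq} .= $line;
        }
    }

    # For each locus, remember the record with the longest sequence.
    my %best;
    for my $rec (@records) {
        my ($id) = $rec->{header} =~ /^>(\S+)/;
        next unless defined $id;
        my $locus = locus_of($id);
        if (!exists $best{$locus}
            || length($rec->{seq}) > length($best{$locus}{seq})) {
            $best{$locus} = $rec;
        }
    }

    # Emit the survivors sorted by locus name (string comparison).
    return join '', map { "$best{$_}{header}\n$best{$_}{seq}\n" } sort keys %best;
}

# Made-up input data, for illustration only.
my $isoforms = <<'END';
>LOC1.1
ACGT
>LOC1.2
ACGTACGT
>LOC2.1
GG
END

print longest_isoforms($isoforms);
```

When two isoforms of a locus have the same length, this sketch keeps the one that appears first in the input.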


I found a Perl script for sorting FASTA files from Dr. Naoki Takebayashi's lab at the University of Alaska. The script depends on two Perl modules: Bio::SeqIO, which is part of BioPerl, and Getopt::Std, which ships with Perl itself.
How to install Bioperl HERE
Perl Script
#!/usr/bin/perl -w

my $usage="\nUsage: $0 [-hrg] [fastaFileName1 ...]\n".
    "  -h: help\n".
    "  -r: reverse\n" .
    "  -g: remove gaps '-' from the sequence\n".
    "Sort FASTA sequences alphabetically by names.  If multiple files are \n".
    "given, sequences in all files are merged before sorting.  If no \n".
    "argument is given, it will take STDIN as the input.\n" .
    "Note that the entire sequence label including spaces is used as\n".
    "the name.\n";

our($opt_h, $opt_g, $opt_r);

use Bio::SeqIO;

use Getopt::Std;
getopts('hgr') || die "$usage\n";
die "$usage\n" if (defined($opt_h));

my $format = "fasta";
my @seqArr = ();

@ARGV = ('-') unless @ARGV;
while (my $file = shift) {
    my $seqio_obj = Bio::SeqIO->new(-file => $file, -format => $format);
    while (my $seq = $seqio_obj->next_seq()) {
        # need to deal with spaces: store "id desc" in desc so that the
        # whole label is used as the sort key
        $seq->desc( $seq->id . " ". $seq->desc);

        push(@seqArr, $seq);
    }
}

if (defined($opt_r)) {
    @seqArr = sort { - ($a->desc() cmp $b->desc()) } @seqArr;
} else {
    @seqArr = sort { $a->desc() cmp $b->desc() } @seqArr;
}

my $seqOut = Bio::SeqIO->new(-fh => \*STDOUT, -format => $format);
foreach my $s (@seqArr) {
    # prints "id desc", and desc was modified, returning it to original
    my $thisDesc = $s->desc;
    $thisDesc =~ s/^\S+ //; # remove the first word.

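    if(defined($opt_g)) {
        my $tmp = $s->seq();
        $tmp =~ s/-//g;       # strip gap characters
        $s->seq($tmp);
    }
    $s->desc($thisDesc);      # restore the original description
    $seqOut->write_seq($s);
}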
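
One thing worth knowing about the sort itself: cmp compares labels as plain strings, so uppercase letters sort before lowercase ones and "seq10" sorts before "seq2". The following small sketch, which uses made-up labels and a hypothetical helper named sort_labels, mimics the forward and reverse (-r) comparisons without needing BioPerl.

```perl
use strict;
use warnings;

# Sort labels with plain string comparison (cmp), as the script's sort
# blocks do. A true first argument gives descending order, like -r.
# Hypothetical helper for illustration only.
sub sort_labels {
    my ($reverse, @labels) = @_;
    my @sorted;
    if ($reverse) {
        @sorted = sort { $b cmp $a } @labels;
    } else {
        @sorted = sort { $a cmp $b } @labels;
    }
    return @sorted;
}

# Made-up labels, for illustration only.
my @labels = ('seq10 isoform B', 'seq2 isoform A', 'Seq1', 'seq1');

print "$_\n" for sort_labels(0, @labels);
```

If your names carry numeric suffixes without zero padding, the order may therefore differ from what a numeric reading of the names would suggest.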


