How to Parse/Edit a FASTA Header

Imagine you have thousands of FASTA sequences in a file and want to shorten or edit each FASTA header rather than keep all the long, unnecessary information it carries.

  1. You can also parse FASTA headers using Perl scripts. Some are given below.
#!/usr/bin/env perl
# Keep only the first word of each header; print sequence lines unchanged.
while (<>) {
  if (/^(>\S+)/) {
    print "$1\n";
  } else {
    print;
  }
}
Uses input.txt

use strict;
use warnings;
use Bio::SeqIO;

=head1 Synopsis

Input header >gi|120419786|gb|EH270482.2|EH270482 Gp_mxAA_21G01_M13R mxA Gammarus pulex cDNA clone Gp_mxAA_21G01 5', mRNA sequence

Output header >gi120419786

=cut

unless (@ARGV == 1) { die "Usage: $0 fastaFileName\n"; }

my $origFile = shift;  
my $newFile=$origFile . ".txt";

my $seq_in  = Bio::SeqIO->new( -format => 'fasta',
                               -file   => $origFile );

my $seq;
my $seq_out = Bio::SeqIO->new( -file   => ">$newFile",
                               -format => 'fasta' );

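while ( $seq = $seq_in->next_seq() ) {
    my $seqName = $seq->id;
    $seqName =~ s/\|/\./g;                 # replace pipe with dot
    $seqName =~ s/(gi)\.(\w*)\..*/$1$2/;   # keep only "gi" plus the number that follows
    $seq->id($seqName);                    # store the shortened ID
    $seq->desc('');                        # drop the description, matching the Output header in the Synopsis
    $seq_out->write_seq($seq);             # write the renamed record to $newFile
}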

print "Your sequences have been renamed and are in the file $newFile\n\n";
Uses input.txt
Dependencies: This FASTA header Perl script requires Bio::SeqIO (part of BioPerl) to function properly.
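If Bio::SeqIO is not installed, the same renaming can be roughly approximated with core Perl alone. The sketch below is not the original script: the helper name shorten_gi_header is made up for illustration, and it assumes NCBI-style headers that begin with ">gi|number|", as in the Synopsis. Headers in any other format, and all sequence lines, are passed through unchanged.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): shorten one
# NCBI-style header line to ">gi<number>". Any other line is returned as is.
sub shorten_gi_header {
    my ($line) = @_;
    # ">gi|120419786|gb|..." becomes ">gi120419786"
    $line =~ s/^>(gi)\|(\w+)\|.*/>$1$2/;
    return $line;
}

# Process the files named on the command line and print the result to STDOUT.
if (@ARGV) {
    while (my $line = <>) {
        chomp $line;
        print shorten_gi_header($line), "\n";
    }
}
```

Saved under a name of your choice (shorten_headers.pl is only an example), it could be run as: perl shorten_headers.pl input.txt > output.txt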
