Extract Part of FASTA Sequences by Position
I have hundreds of protein sequences, and I identified the conserved domain in each of them. Now I have the locations of all those domains and want to extract the exact sequences at those locations. This is easy if I have a single sequence and the location of one or more domains in that protein, but it is much harder to extract the domain sequences from many protein sequences using domain location coordinates. I found an easy Python script for extracting FASTA sequences based on position. I have also shared an online program originally written by Dr Pierre Lindenbaum HERE.
Example FASTA file with protein sequence
>AT1G01250
MSPQRMKLSSPPVTNNEPTATASAVKSCGGGGKETSSSTTRHPVYHGVRKRRWGKWVSEIREPRKKSRIWLGSFPVPEMAAKAYDVAAFCLKGRKAQLNFPEEIEDLPRPSTCTPRDIQVAAAKAANAVKIIKMGDDDVAGIDDGDDFWEGIELPELMMSGGGWSPEPFVAGDDATWLVDGDLYQYQFMACL
>AT1G03800
MTTEKENVTTAVAVKDGGEKSKEVSDKGVKKRKNVTKALAVNDGGEKSKEVRYRGVRRRPWGRYAAEIRDPVKKKRVWLGSFNTGEEAARAYDSAAIRFRGSKATTNFPLIGYYGISSATPVNNNLSETVSDGNANLPLVGDDGNALASPVNNTLSETARDGTLPSDCHDMLSPGVAEAVAGFFLDLPEVIALKEELDRVCPDQFESIDMGLTIGPQTAVEEPETSSAVDCKLRMEPDLDLNASP
Example ID file with domain location
AT1G01250 45 102
AT1G03800 65 109
python domainseq.py input.fasta ids.txt > result.fasta
>AT1G01250:45-102
IREPRKKSRIWLGSFPVPEMAAKAYDVAAFCLKGRKAQLNFPEEIEDLPRPSTCTPR
>AT1G03800:65-109
AEIRDPVKKKRVWLGSFNTGEEAARAYDSAAIRFRGSKATTNFP
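The domainseq.py script itself is not reproduced in this post, so here is a minimal sketch of how such a script could work. It is not the original code. It assumes that the ID file has three whitespace-separated columns (ID, start, end), that the ID is the first word of each FASTA header, and that the coordinates are applied as a plain Python slice `seq[start:end]`, which reproduces the AT1G03800 result shown above. The function names `read_fasta` and `extract_domains` are made up for illustration.

```python
import sys


def read_fasta(lines):
    """Parse FASTA lines into a dict mapping sequence ID to sequence string."""
    chunks = {}
    seq_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the ID is the first word after ">".
            words = line[1:].split()
            seq_id = words[0] if words else ""
            chunks[seq_id] = []
        elif seq_id is not None:
            # Sequences may be wrapped over several lines.
            chunks[seq_id].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def extract_domains(fasta_lines, coord_lines):
    """Return a list of (header, subsequence) for each 'ID start end' line."""
    seqs = read_fasta(fasta_lines)
    records = []
    for line in coord_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        seq_id, start, end = fields[0], int(fields[1]), int(fields[2])
        if seq_id not in seqs:
            print(f"warning: {seq_id} not found in FASTA", file=sys.stderr)
            continue
        # Assumption: coordinates are used as a Python slice
        # (start is 0-based, end is exclusive).
        records.append((f"{seq_id}:{start}-{end}", seqs[seq_id][start:end]))
    return records


def main(fasta_path, coords_path):
    with open(fasta_path) as fasta, open(coords_path) as coords:
        for header, subseq in extract_domains(fasta, coords):
            print(f">{header}")
            print(subseq)


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

If your domain coordinates are 1-based and inclusive, as many domain prediction tools report them, change the slice to `seqs[seq_id][start - 1:end]` so the first residue of the domain is not dropped.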
First of all, let me make it clear that I am not a bioinformatician. I am a simple plant biology researcher who faces problems in her daily research life and who bothers to post the solutions to those problems on this webpage. That's it.