KEGG Sequence Downloader: retrieve gene sequences in FASTA format from the KEGG database

I wanted to download the gene sequences of tobacco from NCBI. Because NCBI also includes isoforms and other unwanted entries, I chose to get them from KEGG instead. KEGGREST is a wonderful R package for retrieving data from KEGG, but it limits how many entries can be retrieved per request. The following bash script downloads thousands of sequences in a single run without that limit. It is a crude solution, and there is probably a more efficient way to do it, but it worked for me. The script works in three steps:
  • Split the IDs into chunks of a given size
  • Download the FASTA sequences as HTML files
  • Clean the HTML files and save the result
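
The three steps can be sketched roughly as follows. This is a minimal illustration, not the downloadable script itself: the function names (chunk_ids, fetch_chunk, clean_html), the output file name sequences.fasta, and the choice of the KEGG REST endpoint (https://rest.kegg.jp/get/<ids>/ntseq, which accepts at most 10 entries per request and returns plain text rather than HTML) are all assumptions made for the example. The clean_html step is kept to show how an HTML response could be reduced to FASTA if a page is fetched instead.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, output file, and endpoint are assumptions,
# not taken from KEGG_sequence_downloader.sh.

# Step 1: read one ID per line on stdin and emit chunks of $1 IDs,
# joined with "+" (the separator KEGG uses for multiple entries).
chunk_ids() {
    awk -v n="$1" '
        NF { printf "%s%s", (c % n ? "+" : ""), $1; c++
             if (c % n == 0) printf "\n" }
        END { if (c % n) printf "\n" }'
}

# Step 2: download the nucleotide sequences for one chunk.
# Assumed endpoint: KEGG REST "get" with the ntseq option (max 10 entries).
fetch_chunk() {
    curl -fsS "https://rest.kegg.jp/get/$1/ntseq"
}

# Step 3: strip HTML tags, decode the common entities, drop blank lines.
# Plain-text input passes through unchanged apart from blank-line removal.
clean_html() {
    sed -e 's/<[^>]*>//g' \
        -e 's/&gt;/>/g' -e 's/&lt;/</g' -e 's/&amp;/\&/g' \
        -e '/^[[:space:]]*$/d'
}

main() {
    query_file="$1"
    chunk_size="${2:-10}"   # assumed default; the REST endpoint caps at 10
    chunk_ids "$chunk_size" < "$query_file" | while IFS= read -r chunk; do
        fetch_chunk "$chunk" | clean_html
        sleep 1             # pause between requests to spare the server
    done > sequences.fasta  # hypothetical output file name
}

# Run only when arguments are given, so the functions can be reused.
if [ "$#" -ge 1 ]; then
    main "$@"
fi
```

Saved under a hypothetical name such as sketch.sh, it would be called as: bash sketch.sh query_file 10, where query_file holds one KEGG gene ID per line.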

Usage

bash KEGG_sequence_downloader.sh query_file number_of_sequence
How to download only Viridiplantae miRNA from miRBase HERE

Script


Script name Download
KEGG_sequence_downloader.sh

Easiest way to find the number of clusters in gene expression data

Gene clustering is a common method for finding groups of genes with similar expression patterns. However, it is not always easy to decide how many clusters a dataset contains. The following R script applies the most popular methods for determining the optimal number of clusters. It reads "TF_average.csv" as input and saves the result as "optial_cluster.png".

How to perform Non-metric multidimensional scaling (NMDS) analysis script HERE

Prerequisites

The script needs the following R libraries:
  • factoextra
  • NbClust

optimal_cluster_finder.R


Easiest way to calculate Ka Ks ratio and divergence time

Draw a heatmap with Custom Symbol in Cell

How to add function descriptions to FASTA sequences