Genome annotation means adding annotations to a genome; more specifically, it means describing the attributes (Qualifiers) of the features identified on the genome sequence.
Feature types are classified by Feature Key. The location of a Feature is recorded as its Position, and from the Position the nucleotide sequence occupied by the Feature can be retrieved from the genome sequence. In addition, a coding region (CDS), which is translated into amino acids, indirectly carries the translated amino acid sequence via the Genetic Code Table. It also matters which strand of the double helix a feature such as a CDS lies on; a feature on the reverse strand is usually written with the position operator Complement.
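The relationship between Position, Complement, and the Genetic Code Table can be illustrated with a minimal Python sketch. This is not IMC's implementation; the function names (`feature_sequence`, `translate`) and the toy sequences are assumptions made for illustration. The sketch assumes 1-based inclusive positions and the standard genetic code (NCBI translation table 1).

```python
# Standard genetic code (NCBI translation table 1), built from the
# conventional T, C, A, G ordering of the three codon positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def feature_sequence(genome, start, end, complement=False):
    """Extract the bases of a feature from its Position.

    start and end are 1-based and inclusive. When complement is True,
    the feature lies on the reverse strand, so the extracted bases are
    reverse-complemented.
    """
    sub = genome[start - 1:end]
    return reverse_complement(sub) if complement else sub


def translate(cds):
    """Translate a CDS codon by codon, stopping at the first stop codon.

    Codons containing characters other than A, C, G, T become 'X'.
    """
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        amino_acid = CODON_TABLE.get(cds[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy example: a CDS at 3..11 on the forward strand.
forward_genome = "CCATGGCTTAAGG"
forward_cds = feature_sequence(forward_genome, 3, 11)

# Toy example: a CDS written as complement(3..11).
reverse_genome = "GGTCATTTCATCC"
reverse_cds = feature_sequence(reverse_genome, 3, 11, complement=True)

print(forward_cds, translate(forward_cds))
print(reverse_cds, translate(reverse_cds))
```

The amino acid sequence is never stored with the feature in this sketch; it is derived on demand from the Position, the strand, and the codon table, which is the sense in which a CDS holds its translation only indirectly.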
To perform annotation, a database of annotated nucleic acid or amino acid sequences must first be created.
IMC can automatically generate a database for BLAST searches simply by loading a genome sequence.
This function can be used to annotate an unknown genome of a closely related species.
A function for generating large databases is also implemented.
The CreateDB function generates a database for local BLAST searches from SuperKingdom files that can be downloaded from NCBI and other sources.
Both nucleic acid and amino acid sequence search databases can be generated.
These files are very large and numerous. When searching, you can join many generated databases and use them under a single database name.
It is also possible to build a database on an external server so that it can be searched over the network.
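For readers who want to see what database generation and joining look like outside IMC, the following Python sketch builds the corresponding NCBI BLAST+ command lines. Whether IMC calls these tools internally is not stated here, so this should be read only as an equivalent under that assumption. The file and database names are hypothetical, and the commands are only printed, not executed.

```python
import shlex


def makeblastdb_command(fasta_path, db_name, dbtype):
    """Build a makeblastdb command for one FASTA file.

    dbtype is "nucl" for nucleic acid sequences or "prot" for
    amino acid sequences.
    """
    if dbtype not in ("nucl", "prot"):
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    return ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype, "-out", db_name]


def alias_command(db_names, alias_name, dbtype):
    """Build a blastdb_aliastool command that joins several databases
    so they can be searched under a single database name."""
    if dbtype not in ("nucl", "prot"):
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    return [
        "blastdb_aliastool",
        "-dblist", " ".join(db_names),
        "-dbtype", dbtype,
        "-out", alias_name,
        "-title", alias_name,
    ]


# Hypothetical input files and database names, for illustration only.
parts = [("bacteria_part1.fasta", "bacteria_1"), ("bacteria_part2.fasta", "bacteria_2")]
build_commands = [makeblastdb_command(path, name, "prot") for path, name in parts]
join_command = alias_command([name for _, name in parts], "bacteria_all", "prot")

for command in build_commands + [join_command]:
    print(shlex.join(command))
```

To actually run the commands, each list could be passed to `subprocess.run(command, check=True)` on a machine where BLAST+ is installed.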
This is a group of tools for performing genome annotation.
The following software is mainly used.
Much of this software runs on Linux, so to use it from Windows, an external server or similar is used.
By using a Linux emulation environment on Windows, the software can also be run locally on a Windows PC.
Because genome annotation is computationally intensive and often takes a long time, it is common practice to run it on an external server and retrieve the results.
In addition, the gene identification software used for genome annotation is often developed for Linux and may not run on Windows.
In that case as well, there is an advantage to installing the software on an external Linux server and running the annotation there.
Because Mac OS X is a Unix-based system, much of this identification software runs directly in the local environment of a Mac.
However, running it locally may slow down manual annotation performed in parallel, so annotating on an external server is advantageous on a Mac as well.
IMC has a function to create an annotation database on an external server, which makes it easy to generate a search database on the server.
This function performs genome annotation fully automatically.
When you register a genome sequence to be annotated, it is annotated fully automatically.
Before execution, you can select the identification software and the sequence database to use.
Manual annotation can be performed using the annotation functions built into IMC.
IMC has the following annotation tools.
During annotation, a lot of information may be consulted, and special features and qualifiers may be used.
In addition, temporary names and sequential numbers often have to be assigned in the middle of annotation.
However, once annotation is finished, much of this information is no longer needed, and leaving it in may cause errors when the annotation is submitted.
IMC has a function to organize and refine the annotation results once annotation is finished.
The main items are listed below.