We have published a number of advanced and unique solutions that can be realized with in silico biology's software. This page mainly describes how to accomplish given tasks with in silico biology products.
There are the following methods to launch IMC.
There are the following methods to launch GT.
Double click the desktop icon
Launch from the start menu (Windows)
Launch from the command prompt (Windows)
Launch from Terminal (Mac)
There are the following ways to terminate IMC.
The following operations are often used in IMC.
Please read this when using IMC for the first time.
Please read the following when using GenomeTraveler (GT) for the first time.
Sample data will be installed automatically when GT is started for the first time.
Using this sample data, you can practice the operation.
If the sample data is not displayed, check the demo directory and set it as the root directory; the data will then be displayed.
Sample data includes the following.
Contains the result of importing the BAM file used for assembly and mapping. Each fragment of the BAM file can be viewed with the Alignment Viewer.
The reference genomic data and the result of mapping NGS reads onto that genome using LAST are stored. You can browse and analyze the reference genome data by starting IMC, and you can view and analyze the mapping by starting the Alignment Viewer. It is also possible to run a new mapping using the same NGS data.
Contains the AFG file of the assembly result of NGS reads using OASES. Convert this to a BAM file and start the Alignment Viewer. It is also possible to run a new assembly using the same reads.
Contains reference genomic data and the results of mapping onto that genome using SLIDESORT. You can browse and analyze the reference genome data by starting IMC, convert the mapping result to a BAM file, and view and analyze it with the Alignment Viewer. It is also possible to run a new mapping using the same NGS data.
Contains the AFG file of the assembly result of NGS reads using the Velvet assembler. Convert it to a BAM file and start the Alignment Viewer. It is also possible to run a new assembly using the same reads.
IMC's workbench consists of the following elements.
Elements marked with * can be docked out (removed) and docked in (incorporated) from the workbench window.
The menu bar launches each function of the IMC.
Many functions have both a menu bar menu and a toolbox icon, both of which operate equally.
Some menus exist outside the menu bar.
Menus on a lane, or on a feature in a feature lane, perform a limited set of functions on the preselected range.
The main tool box launches each function of the IMC.
Many functions have both a menu bar menu and a toolbox icon, both of which operate equally.
The main toolbox is usually docked in at the top of the main window, but you can dock out of the window or minimize the display as needed.
Individual main tool icons can be hidden independently.
Various windows and dialogs other than the main window also have their own tool boxes.
However, the visibility of these tool boxes cannot be changed.
Default feature key
Create, edit, delete feature keys
Template function
Feature key search
Feature Keys
Attributes of Feature Keys
IMC has its own original feature key set.
These original feature keys exist to control the drawing and editing of features.
There is also a function to remove all original feature keys at once.
Types and Roles of Qualifiers are explained here.
This section collects information on how to display and edit the Qualifiers that describe the attributes of a Feature.
The following shows two ways of displaying the value of a Qualifier described in a Feature.
Explains the location of a feature on the genome sequence.
Explains fragmentation of features by restriction enzymes and PCR amplification.
Explains synthesis of features by PCR or ligation.
Feature Fusion is a function that merges two features that belong to the same Feature Key and occupy the same position into one. The Qualifiers recorded in each of the merged features are all carried over, without preference for either side, as Qualifiers of the merged feature.
This feature makes it easy to combine multiple annotations separately attached to the same feature into one.
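The merge described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not IMC's implementation: the dictionary layout of a feature, the function name `fuse_features`, and the sample qualifier values are all hypothetical.

```python
def fuse_features(features):
    """Merge features sharing the same key and position, keeping all qualifiers."""
    merged = {}
    for feat in features:
        ident = (feat["key"], feat["start"], feat["end"], feat["strand"])
        target = merged.setdefault(ident, {
            "key": feat["key"], "start": feat["start"],
            "end": feat["end"], "strand": feat["strand"], "qualifiers": {},
        })
        for name, values in feat["qualifiers"].items():
            kept = target["qualifiers"].setdefault(name, [])
            for value in values:
                if value not in kept:  # do not duplicate identical values
                    kept.append(value)
    return list(merged.values())


# Two hypothetical annotations attached separately to the same CDS.
a = {"key": "CDS", "start": 100, "end": 400, "strand": "+",
     "qualifiers": {"gene": ["abcA"], "product": ["transporter"]}}
b = {"key": "CDS", "start": 100, "end": 400, "strand": "+",
     "qualifiers": {"gene": ["abcA"], "note": ["curated"]}}
fused = fuse_features([a, b])
print(fused[0]["qualifiers"])
```

Qualifiers from both inputs survive, and a value present on both sides is stored once.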
The Feature operators are functions that apply a logical operation to two mutually overlapping features belonging to the same Feature Key and generate a new feature.
In the above figure, the upper and lower arrows are the overlapping features, and the divided arrow in the center is the computed feature that is generated.
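As a rough illustration of logical operations on overlapping features, the sketch below computes an AND (the shared region) and an OR (the combined region) of two intervals. Which operators IMC actually provides is not stated here, so the two operations, the function names, and the `(start, end)` tuple representation are assumptions.

```python
def feature_and(a, b):
    """Region shared by two features given as inclusive (start, end) tuples."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


def feature_or(a, b):
    """Region covered by either feature; defined only when they overlap."""
    if feature_and(a, b) is None:
        return None
    return (min(a[0], b[0]), max(a[1], b[1]))


upper = (100, 500)
lower = (300, 800)
print(feature_and(upper, lower))  # shared region
print(feature_or(upper, lower))   # combined region
```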
You can link to individual features and activate the visualization tool as follows.
Mapping to Individual Features
You can select the base sequence or amino acid sequence of the feature and export it as a file.
The Feature Export function has the following.
In addition, writing the results of a search function to a file effectively works as a feature export.
Add a serial number to any Feature Qualifier.
You can select which Qualifier receives the serial number.
Normally, users cannot register their own qualifiers, but you can create an original qualifier that is used only for sequential numbering.
A sequential number consists of a prefix and a number, and consecutive numbers are generated from a starting number and an increment.
The features subject to sequential numbering are numbered in order of their base position on the genome.
Not only features belonging to one type of feature key, but also features belonging to multiple feature keys can be mixed and added sequentially.
Sequential numbers can also be added for narrowed objects such as search results.
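The numbering scheme described above (prefix plus number, starting number, increment, ordering by genome position) can be sketched as below. The function name, the zero-padding width, and the feature dictionaries are assumptions made for the example and are not taken from IMC.

```python
def sequential_ids(features, prefix, start=1, increment=1, width=4):
    """Return (feature, label) pairs, numbering features by genome position."""
    ordered = sorted(features, key=lambda f: f["start"])
    return [
        (f, f"{prefix}{start + i * increment:0{width}d}")
        for i, f in enumerate(ordered)
    ]


# Features from more than one feature key can be numbered together.
features = [
    {"key": "CDS", "start": 900},
    {"key": "tRNA", "start": 120},
    {"key": "CDS", "start": 450},
]
numbered = sequential_ids(features, "ABC_", start=10, increment=10)
labels = [label for _, label in numbered]
print(labels)
```

Passing a filtered list (for example, search results) numbers only the narrowed set.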
Sequential numbers can be added with the following search function.
Genome feature map creation / drawing function
With feature layout styles, you can apply a pre-registered layout when drawing a feature map that combines multiple functional lanes, which makes it easy to draw maps with complicated feature layouts.
In the sequence lane, the bases of the currently loaded genome sequence file, and the amino acid sequence in coding regions, are displayed for each strand.
A sequence lane can be placed anywhere in the feature map. You can place more than one sequence lane on one feature map if necessary.
Four bases and 20 amino acids can be displayed individually in different colors.
The drawing color and font size are changed in the Sequence Lane tab pane of the feature setting dialog.
You can change at which codon position the one-letter amino acid code is placed.
The font type cannot be changed.
The content profile lane is a lane for displaying numerical values such as base composition, computed by the sliding window method, as a profile.
The registered profiles that can be displayed in the content profile lane are as follows.
Navigation Lane displays the bird's-eye view map of the entire Feature Map, and you can move the display position of the Feature Map with the slider.
The local genome rearrangement map lane (LGRM lane) is a lane for drawing the local genome rearrangement map on the main feature map.
Display mutations in base units between two closely related strains.
The mutation list dialog is linked to this lane.
Implementation Edition
IMCGE, IMCAE, IMCDS, GT
Expression profile lane is a lane for graphically displaying expression data by tiling array or RNA-Seq.
Implementation Edition
IMCAE, IMCDS, GT
Multiple expression profile lanes can be registered and displayed in the main feature map, and the number of lanes that can be registered and displayed on one feature map is not limited.
When the number of display lanes increases, it is necessary to scroll vertically.
The lane can display the expression level depending on the genome position as a bar graph or a line graph.
You can change display and calculation settings for each lane.
Vertical Scroll Bar Lane
Sequence Scale Lane and Map Scale Lane
This is an operation explanation about the sequence data file, feature data file, optional data file, etc. used in IMC.
Load genomic sequence and amino acid sequence file into IMC.
Save genomic sequence and amino acid sequence as file.
Import data from the outside and map it on the sequence.
Extract data from the sequence and export it externally.
Convert data from one format to another.
Describes operations on restriction enzyme data files, primer data files, option data files, etc.
The operations common to various search functions are explained here.
Counts the number of features for each feature key registered in the main current sequence and displays a list.
Several feature keys can be selected as search targets, and the search range can be limited.
The search result screen lists the feature key, number, position on the sequence, base length, gene name, and base sequence, together with a button that opens the annotation description.
For CDS, operations such as selecting the amino acid sequence, restricting to the + strand only, batch deletion, codon table, serial number addition, and Fusion PCR are available.
When you click a line, the main feature map shows the location of that feature.
File output in CSV / FastA / GenBank format is possible.
Simply loading a genomic base sequence into the reference genome map automatically creates a database for Blast search, so a homology (Blast) search against that database can be performed easily.
For a homology search, you need to specify the query sequence, the sequence database, and the search program.
A homology search is performed with the entered sequence against the genome sequence displayed on the main feature map.
Homology search is performed in batch, using all the features present in the designated genomic region as query sequences.
Please refer to restriction enzyme treatment.
Please refer to mutation search function.
IMC has various display windows as follows.
It is a viewer / editor prepared for genome annotation.
It is a window independent from the main window.
Multiple Linear Genome Map Viewer is a viewer that compares annotations, such as genes, of multiple closely related genomes in parallel, genome by genome, and is docked in the IMC main window.
It is also called the linear genome map viewer, reference genome map, or reference map.
Circular Genome Map is a function that compares and draws, on concentric circles, Features and Contents from an arbitrary genomic chromosome or from the genomic chromosomes of closely related species, together with arbitrary numerical data tied to genome position.
You can display the distribution of features such as genes on each genomic chromosome, changes in base composition, etc.
The order of concentric circles of features and content profiles to be placed can be changed arbitrarily.
Drawing parameter setting can be changed for each concentric circle.
It is also possible to change the diameter of the circular genome map and the printing parameters.
For printing a circular genome map, in addition to direct printing, you can output files in PDF, PNG, and EMF formats.
A large number of features drawn on the concentric circle of the circular genome map can be classified and displayed in different colors depending on the annotation type if there are annotations.
Sequence files drawn on the circular genome map are referenced in the order shown in the main current directory. This makes it easy to draw a number of different circular genome sequences in the same format.
By clicking on a part of the circular genome map it is possible to display the area of the corresponding main feature map.
In the circular genome map, the sequence file name and total base length (when single genomic chromosome) are drawn at the center, and the base number scale is displayed around it.
The starting position base of the scale can also be changed.
The map layout can be registered as a layout style, and the registered circular genome layout style can be called later.
It is a viewer / editor used for reading output files (ABI, SCF) from capillary sequencers.
Displays not only the base sequence but also the trace waveform.
The description window is a window for entering and editing values in the feature's Qualifier.
It is launched by right-clicking a feature, and you can edit all the Qualifiers of that feature.
Change the Position of the feature.
Change the Feature Key to which the feature belongs.
This is a viewer that displays profiles such as secondary structure of amino acid sequence in parallel. Total profile design and display, motif display and so on are possible.
Multiple alignment viewer is a window for displaying multiple alignment of DNA sequences and amino acid sequences.
Multiple alignment viewer can be launched from homology search result dialog etc.
Molecular phylogenetic tree viewer is a tool to draw molecular phylogenetic trees from sequences such as genes.
This viewer can be launched from multiple alignment viewer.
Alignment Viewer is a viewer editor that displays the primary analysis result of GT and executes secondary analysis etc.
Alignment Viewer can be launched from GT's main window.
The Alignment Viewer consists of a menu bar, a tool box, and four panes.
Although the menu bar is fixed in the Viewer window, the toolbox and the four panes can be removed from the window and displayed at any position on the desktop.
The toolbox can also be hidden.
The four panes are named as Navigation Pane, Feature Pane, Consensus Pane and Fragment Pane from the top.
Navigation Pane is a special pane for navigating the other three panes and has a two-step scale of genome-wide scale and scale within the selected region.
Feature Pane has functions similar to IMC's main feature map and can perform most functions of IMC.
Consensus Pane displays an Assembly result as a sequence alignment of a Contig and the Reads that constitute it, and a Mapping result as a sequence alignment of the reference genome sequence and the Reads.
Fragment Pane displays the content of the Consensus Pane graphically, giving a more bird's-eye view of assemblies and mappings.
This is a window for drawing a restriction enzyme map.
It also draws a feature map similar to the main feature map.
Zoom, horizontal scroll, and gel electrophoresis diagrams are available.
IMC is characterized by its ability to carry out cloning experiments on a computer. No special data is required: DNA sequence data obtained from a public database such as GenBank or EMBL can be cloned as it is. For cloning, restriction enzyme digestion, PCR primer design, PCR amplification, and ligation can be performed with the sequence annotation preserved. All resulting cloning products are output in GenBank / EMBL format. Because primer information is attached to and stored on the DNA sequence, it is also useful for primer management.
It is a cloning function that can actually cut and ligate sequences.
In silico cloning experiments are possible.
It helps you understand molecular biology experiments that cannot be observed directly, and it can be used to assist and simulate such experiments.
You can construct arbitrary Vector / Plasmid.
It lists the optimal restriction enzymes for a Vector insert check and simulates the gel electrophoresis results.
Using IMC's in silico cloning function, continuous cloning experiment on PC or Mac is possible.
Digestion of the genomic DNA sequence with restriction enzyme, gel electrophoresis of the fragment, ligation to the opened vector, etc. can be carried out in succession.
With the restriction enzyme function of IMC, if annotation (Feature) is added to the original DNA base sequence, it can be fragmented with the annotation kept.
If one feature is divided by a restriction enzyme, its feature itself is inherited, but in some circumstances the feature key may be changed automatically.
The end shape of restriction enzyme digested fragments is preserved correctly. This allows IMC to check whether ligation is possible when fragments are ligated to each other.
In E. coli, it can be checked whether the methylation site affects restriction enzyme digestion.
It is also possible to register frequently used restriction enzymes as a set.
It is also possible to register newly provided restriction enzymes or delete restriction enzymes that are no longer used.
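To make the idea of in silico digestion concrete, the following sketch cuts a linear sequence at every EcoRI site (G^AATTC). It is a simplified model and not IMC's algorithm: it tracks only the top strand, ignores methylation and annotation inheritance, and the function name and `cut_offset` parameter are assumptions.

```python
def digest(seq, site, cut_offset):
    """Cut a linear top-strand sequence at each occurrence of a recognition site.

    cut_offset is the number of bases of the site left of the cut
    (1 for EcoRI, which cuts G^AATTC).
    """
    fragments, last = [], 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[last:cut])
        last = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[last:])
    return fragments


sequence = "AAGAATTCTTGAATTCGG"
fragments = digest(sequence, "GAATTC", 1)
print(fragments)
print(sorted(len(f) for f in fragments))  # fragment sizes, as on a gel
```

Each fragment after the first begins with AATTC, the bases that form the single-stranded overhang after EcoRI cleavage; a fuller model would store the overhang with each fragment so that ligation compatibility can be checked.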
IMC has various PCR Primer design methods.
A function for simply dragging over the nucleotide sequence on the sequence lane and registering it as a primer
A function for selecting a feature on the feature lane and designing primers that amplify the inside or the outside of the selected area
A function for designing primers that amplify a large number of areas at once (with the Iterate Design function, which repeats the design until no undesignable area remains)
A function for designing a group of primers for cloning multiple DNA fragments at once, as in gene cluster design. It covers the steps from design to cloning, and the cloned products can be loaded into IMC.
In the PCR function of IMC, the priming sites of the registered primer set are searched for in the genome base sequence. If both primers have a priming site on the genome, the region between the primers is amplified by PCR.
In this case, a single adenine base can be added as an overhang, which is used for TA cloning.
If the template DNA sequence is annotated (Feature), the annotation is also inherited by the amplified PCR product.
It is also possible to perform PCR on multiple template DNA sequences.
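The priming-site search and amplification described above can be illustrated with a short sketch. It assumes exact primer matches on a linear template and returns only the top strand of the product; the function names are hypothetical, and mismatch tolerance, overhangs, and annotation inheritance are left out.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template, forward, reverse):
    """Return the amplified region (top strand), or None if a primer has no site."""
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = revcomp(reverse)  # where the reverse primer anneals, read on the top strand
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


template = "TTTT" + "ACGTACG" + "GGGGGG" + "CCATTAG" + "TTTT"
product = in_silico_pcr(template, "ACGTACG", "CTAATGG")
print(product, len(product))
```

Running the same function over a list of templates corresponds to performing PCR on multiple template sequences.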
Purpose and overview
Function
Restrictions
Algorithm
The sequences of PCR products are searched against a group of contig sequences; when a PCR product bridges two contigs, the contigs are combined into one sequence.
For Plasmid Map creation, please see Plasmid Map Viewer.
The products obtained when a DNA sequence is digested with restriction enzymes, amplified by PCR, or ligated can be displayed as gel electrophoresis results.
You can modify the ends of DNA fragment sequences.
The terminal modification functions currently provided by IMC are as follows.
If both ends of the DNA fragment to be inserted are cut with the same restriction enzyme, or are blunt-ended, the optimal restriction enzyme candidates for determining the orientation of the insert are shown with a gel electrophoresis diagram.
In the genome design function, a homology arm for homologous recombination is designed from the region to be homologously recombined and the insert sequence.
This function is used to recombine the insert sequence with the designed homology arm into the genome.
Processing of base sequences as information, independent of molecular biology experiments.
Information on large-scale sequencing by NGS and sequencing by conventional capillary sequencer.
Data analysis of NGS can be executed with GenomeTraveler.
Capillary sequencing data analysis can be performed with IMC.
The read sequences from the sequencer are joined to each other based on homology to generate a consensus sequence.
There are two kinds of assembly: assembly of large numbers of reads from NGS, and assembly of reads from a capillary sequencer.
Assembly of NGS reads can be done in GenomeTraveler.
Assembly of reads from a capillary sequencer can be executed with the IMC tool in silico Assembler.
Nanopore related assemblies can be launched from GenomeTraveler.
It is a function to assemble the reads from the next generation sequencer (NGS).
Velvet and OASES are implemented.
This function is implemented on the following edition.
GenomeTraveler
Assembles reads from a Nanopore sequencer.
Canu and miniasm are implemented.
This function can be executed with the following edition.
GenomeTraveler
Assembly function for reads from a capillary sequencer.
This function is implemented in the following edition.
IMCGE, IMCAE, IMCDS, GenomeTraveler
This function controls the quality of the reads from the sequencer.
It trims regions of low accuracy and eliminates reads that are inaccurate overall, which improves assembly accuracy.
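A minimal sketch of this kind of quality control is shown below, assuming each read comes with per-base quality scores. The threshold values, the end-trimming rule, and the function names are assumptions for illustration and do not describe the settings used by IMC or GenomeTraveler.

```python
def trim_read(seq, quals, min_q=20):
    """Trim low-quality bases from both ends of a read."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


def filter_reads(reads, min_q=20, min_len=3):
    """Trim every read and drop those that end up too short."""
    kept = []
    for seq, quals in reads:
        trimmed_seq, _ = trim_read(seq, quals, min_q)
        if len(trimmed_seq) >= min_len:
            kept.append(trimmed_seq)
    return kept


reads = [
    ("ACGTAC", [5, 30, 30, 30, 10, 8]),   # poor ends, good middle
    ("GGGG", [3, 4, 2, 5]),               # poor throughout
]
print(filter_reads(reads))
```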
The finishing function, which is the final phase of genome sequencing, is explained here.
Explains basic composition analysis of genes and genomes, codon analysis, ORF extraction, amino acid translation, amino acid profile analysis, motif analysis, and so on.
IMC analyzes the base composition of the genome sequence.
You can perform this analysis on sequence files loaded in the current sequence directory.
The Feature Statistics function of the Genome Analysis menu outputs the base composition of all the features in the currently loaded nucleotide sequence file.
The content profile lane that can be displayed on the main feature map is also one of the functions of displaying base composition.
Changes in composition are displayed graphically using a moving average.
The content profile lane is also implemented in the circular genome map viewer.
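The moving-average composition profile mentioned above can be illustrated with a GC-content sliding window. The window size, step, and function name are assumptions for the example; IMC's own profile definitions may differ.

```python
def gc_profile(seq, window, step=1):
    """GC fraction in each window of the sequence, moving by `step` bases."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        profile.append((chunk.count("G") + chunk.count("C")) / window)
    return profile


profile = gc_profile("GGCCAATT", window=4, step=2)
print(profile)
```

Plotting these values against the window start positions gives the kind of curve drawn in a content profile lane.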
The "Cluster Design Checker" function evaluates the base composition of the specified cluster and checks whether it is in the preset base composition range or not.
IMC outputs the codon composition and amino acid composition table of coding region (CDS).
You can perform this analysis on annotated sequence files loaded into the current sequence directory.
The "Show Codon Usage ..." function in the Genome Analysis menu outputs not only the amino acid composition and codon composition of each CDS identified in the currently loaded sequence, but also the amino acid composition and codon composition of all CDSs combined.
The Statistics function also outputs the type of start codon for each CDS.
The codon substitution function "Change Codon" allows you to replace the codon composition according to the specified Codon Usage file.
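As an illustration of a codon composition table, the sketch below counts codons in individual coding sequences and sums them over all CDSs. The function name and sample sequences are assumptions; the output format of IMC's Codon Usage function is not reproduced here.

```python
from collections import Counter


def codon_usage(cds):
    """Count codons in a coding sequence, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


cds_list = ["ATGGCCGCCTAA", "ATGAAATAA"]
per_cds = [codon_usage(cds) for cds in cds_list]
total = sum(per_cds, Counter())  # codon composition over all CDSs
print(per_cds[0])
print(total)
```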
IMC provides a simple ORF candidate extraction function.
Across the six reading frames of the current base sequence, it extracts regions running from one stop codon to the next stop codon whose length is equal to or greater than the specified base length.
ORF candidates can be converted to CDS, and amino acid translation can also be performed.
When an ORF candidate region encompasses other ORF candidate regions, the one with the larger base length can be adopted.
In IMC, in addition to this, you can launch the Gene Finding programs Augustus and MetaGenome Annotator as external commands and capture the results.
In addition, you can import the output result of a gene identification program such as Glimmer and map it on the current sequence.
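The stop-to-stop extraction over six frames described above can be sketched as follows. This is an illustration under assumptions, not IMC's implementation: it uses the standard stop codons, reports coordinates relative to the scanned strand (0-based, end exclusive), and ignores regions at the sequence ends that are not bounded by two stop codons.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def stop_to_stop_orfs(seq, min_len):
    """Return (strand, frame, start, end) for regions between consecutive stops."""
    results = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            open_start = None  # set once the first stop in this frame is seen
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOPS:
                    if open_start is not None and i - open_start >= min_len:
                        results.append((strand, frame, open_start, i))
                    open_start = i + 3
    return results


sequence = "TAA" + "ATGGCCGCC" + "TGA"
orfs = stop_to_stop_orfs(sequence, min_len=9)
print(orfs)
```

Candidates found this way could then be filtered (for example, keeping the longest of nested candidates) before conversion to CDS features.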
The base sequence of the CDS feature on the current base sequence is translated into amino acids.
You can specify the translation range for all CDS on current sequence, CDS on selected area, and one CDS.
Each codon is translated according to the specified Genetic Table.
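Translation with a genetic table can be illustrated as below. The sketch builds the standard genetic code (NCBI translation table 1) from its compact string form; the function name is an assumption, and other tables would be substituted in the same way by changing the amino acid string.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds, table=STANDARD_TABLE):
    """Translate a coding sequence codon by codon; unknown codons become 'X'."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(table.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


print(translate("ATGGCCTAA"))
```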
The secondary structure of the amino acid sequence of the gene is shown as a profile.
Structures such as alpha helix, beta sheet, and turn, and profiles such as hydrophilicity index, hydrophobicity index, surface quality, and chain flexibility can be displayed in parallel.
You can also freely create a composite index that linearly combines these indices.
If a motif is registered in the motif list of amino acid profile analysis, the position of the motif can be displayed on the amino acid sequence of the amino acid profile display dialog.
Amino acid motif search can be executed from pattern search function.
Genome annotation is to add an annotation to the genome, more specifically, to describe the attribute (Qualifier) of the feature identified on the genome sequence.
The type of a Feature is classified by its Feature Key. The location of a Feature is recorded as its Position. From the Position on the genome base sequence, the nucleotide sequence occupied by the Feature can be obtained. In addition, a coding region (CDS) that is translated into amino acids indirectly holds the translated amino acid sequence through the Genetic Code Table. It also matters which strand of the double helix a feature such as a CDS lies on; a feature on the complementary strand is usually written using the position operator complement.
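How a Position, including the complement operator, determines the nucleotide sequence of a feature can be shown with a small sketch. It assumes GenBank-style 1-based inclusive coordinates and a single contiguous range; the function name is hypothetical, and compound locations such as join are not handled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def feature_sequence(genome, start, end, complement=False):
    """Sequence of a feature at start..end (1-based, inclusive).

    With complement=True the feature lies on the opposite strand, so the
    reverse complement of the range is returned.
    """
    sub = genome[start - 1:end]
    return sub.translate(COMPLEMENT)[::-1] if complement else sub


genome = "AACCGGTTAC"
print(feature_sequence(genome, 1, 4))                    # 1..4
print(feature_sequence(genome, 1, 4, complement=True))   # complement(1..4)
```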
To execute annotation, it is necessary to create a database of nucleic acid sequences or amino acid sequences with annotations added.
IMC has a function to automatically generate a database for Blast search simply by loading the genome sequence.
This function can be used to annotate unknown genomes of closely related species.
A function of generating a large database is also implemented.
The CreateDB function generates a database for local blast search using a file of SuperKingdom that can be downloaded from NCBI and others.
Search databases can be generated for both nucleic acid sequences and amino acid sequences.
These files are very large and numerous. When searching, you can join many generated databases and use them under one database name.
It is also possible to build a database on an external server so that it can be searched via the net.
Tool group for performing genome annotation.
Mainly, the following software is used.
Much of this software runs on Linux, so when using it from Windows, an external server or similar is used.
By using an emulator on Windows, it can also be used in the local environment of a Windows PC.
Since genome annotation requires a lot of computation and often takes a long time, it is common practice to execute it on an external server and retrieve the results.
In addition, the gene identification software used for genome annotation is in many cases developed to run on Linux and may not run on Windows.
Even in such a case, the software can be installed on an external Linux server and the annotation performed there.
Because Mac OS X is Unix-based, much of this identification software works directly in the local environment of the Mac.
However, even in this case, manual operation may slow down when manual annotation is performed in parallel, so annotating on an external server has merit on the Mac as well.
IMC has a function to create an annotation database on an external server, and it is easy to generate a search database on the server.
It is a function to perform genome annotation fully automatically.
When you register a genome base sequence to be annotated, annotation to that genome sequence is performed fully automatically.
You can select the identification software to use before execution and the sequence database to use.
Manual annotation can be performed using the annotation function installed in the IMC.
IMC has the following annotation tools.
During annotation, you may refer to a lot of information and use special features and qualifiers.
In addition, temporary names and sequential numbers are often unavoidable in the middle of annotation.
However, once annotation is finished, much of this information is unnecessary, and it may cause errors when the annotation is submitted.
IMC has a function to organize and refine the annotation results when the annotation is over.
The main items are listed below.
Make an alignment between multiple nucleic acid or amino acid sequences.
Multiple selections are possible from the sequence loaded in the main current directory.
You can also activate this function from the homology search result screen.
From the result dialog, you can draw a phylogenetic tree, use a simple editor, copy to the clipboard, and print.
You can change the number of characters displayed per line.
A function that draws the features and base composition of multiple related genomes in parallel on concentric circles. Circular lanes can be freely added, manipulated, and deleted from the drawing dialog. Each lane can show features from an annotated chromosome base sequence, optionally restricted to designated features, or content such as skew computed from a chromosome base sequence, and the drawing order can be changed. Colors can be set independently for each concentric circle.
Implemented Editions: IMCGE and IMCAE.
This function is installed in the following software
IMCGE
IMCAE
GenomeTraveler
A Venn diagram is a diagram used in set theory; here it shows the distribution of the number of elements among up to three sets (diagrams of four or more sets can be drawn in general, but IMC draws three sets or fewer).
Elements belonging to each set are shown in the circle, and overlapping parts are shown as common elements.
In biological applications, it is used to illustrate the number of orthologous or unique genes among closely related genomes.
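The numbers shown in each region of a three-set Venn diagram can be computed with plain set operations, as in the sketch below. The gene identifiers and the function name are made up for the example, and grouping genes into orthologs is assumed to have been done beforehand.

```python
def venn3_counts(a, b, c):
    """Number of elements in each of the seven regions of a three-set Venn diagram."""
    return {
        "A only": len(a - b - c),
        "B only": len(b - a - c),
        "C only": len(c - a - b),
        "A&B only": len((a & b) - c),
        "A&C only": len((a & c) - b),
        "B&C only": len((b & c) - a),
        "A&B&C": len(a & b & c),
    }


# Hypothetical ortholog group IDs found in three related genomes.
genome_a = {"og1", "og2", "og3", "og4"}
genome_b = {"og3", "og4", "og5"}
genome_c = {"og4", "og5", "og6"}
counts = venn3_counts(genome_a, genome_b, genome_c)
print(counts)
```

The seven counts always add up to the size of the union of the three sets.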
Drawing window for the global genome rearrangement map. Two closely related genomic sequences are drawn as straight lines at the top and bottom. The many colored line segments connecting the upper and lower lines mark regions that are highly homologous between the two genomes, and the color of each connecting line shows the strength of the homology. Lines that cross at a single point in the center indicate that the region is inverted between the two genomes. A small window in the center is the navigation window, which shows which part of the whole is displayed when you zoom in. Screen display and output can be set with the button tools at the top and bottom.
This feature is implemented in the following software editions
Create and draw a genome rearrangement map across the entire length of related species genomes at the feature level.
Compares the total genome length pairwise between the current genome sequence and the selected (several) genome sequences among the related species genome sequences loaded in the current reference directory.
Comparisons with nucleic acid sequences and amino acid sequences are possible. The criteria for determining homology can be changed.
For comparison by amino acid sequence, both genomes must be annotated with CDS.
From the result screen, you can zoom in and out, scroll, pick up homologous lines, display arbitrary feature keys, display non-homologous regions, and display a navigation window.
Result images can be output in PDF / PNG / EMF format. The homology list can be output as a file in CSV format.
The local genome rearrangement map (Local Genome Rearrangement Map) is one of the functions for comparing closely related species. It displays a comparison of the homologous regions along the maximum homology path between two genomes at the base sequence and amino acid sequence level, together with the annotations on those genomes.
It detects regions on the reference genome that are homologous to a different, closely related genome and draws a local genome rearrangement map in which the nucleotide sequences and the features of each genome are aligned. This function is called the local genome rearrangement map.
It creates and draws a local genome rearrangement map between two closely related species.
The entire genome of the closely related species to be compared is mapped onto the reference genome.
It displays the homologous region at the base sequence level throughout the genome between the two genomes.
In the coding region, corresponding amino acid residues are aligned.
You can move the map to the corresponding position by listing the mismatch points of amino acid residues and clicking on each mismatch point in the list.
A list of coincident parts and residues can be output as a CSV file.
The list will be reproduced and displayed when reloading.
In the homologous region, the features of both genomes and their positions are aligned.
For regions where there is no homology with the reference genome, the features of the comparison genome will not be displayed.
This follows from the principle that the LGRM maps the comparison genome onto the reference genome: a region that is not mapped is treated as a region deleted from the comparison genome.
Analysis results can be saved and re-browsed as GenBank format files of the reference genome.
LGRM display setting can be registered as layout style (LGRM lane).
The LGR Map lane can be placed at any position in the main feature map.
You can customize comparison display of base sequence and amino acid sequence
LGRM reads ordinary array lane for LGRM and uses it.
Displays a comparison table of the EC numbers annotated on the CDSs of each genome, built from multiple annotated genome sequence files.
Gene cluster alignment focuses on one gene in multiple closely related genomes and on its neighborhood (N genes upstream and N genes downstream), and examines whether homologous genes exist among the closely related genomes.
When two genes are bidirectional best hits (BBH), the homologous genes are connected and displayed with bands.
The drawing color of the connecting bands can be varied according to the homology score.
The alignment is displayed using the reference feature map (multiple genome viewer).
Each genome can be scrolled left and right individually.
The whole display can be zoomed.
If you change the display order of the genomes, clustering will end.
To print, use the Print button. You can undock this pane.
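The BBH test used to connect genes can be illustrated with a minimal sketch. It assumes that a homology score is already available for each gene pair in both search directions; the function names and the score tables are hypothetical and are not taken from IMC.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    scores: {(query, subject): homology_score}
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return gene pairs that are each other's best hit in both directions."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Hypothetical scores between genes of genome A and genome B.
a_vs_b = {("a1", "b1"): 950, ("a1", "b2"): 300, ("a2", "b2"): 700, ("a3", "b2"): 650}
b_vs_a = {("b1", "a1"): 940, ("b2", "a2"): 710, ("b2", "a3"): 640}
bbh_pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(bbh_pairs)
```

In this example a3 also hits b2, but b2's best hit is a2, so only a1/b1 and a2/b2 would be connected by a band.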
Extracts the core genome from the pan-genome.
Specifically, the genes common to a group of closely related genomes are extracted.
Analysis results can be saved to a file.
After exiting IMC, you can load this file later and continue the core genome analysis.
The calculation takes a long time if the group of genomes is large.
In that case, we recommend running on a computer with a large number of cores.
Implementation edition: IMCGE, AE, DS
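Conceptually, the core genome is the intersection of the gene sets of all genomes in the group, and the pan-genome is their union. The following is a minimal sketch, assuming that homologous genes have already been given a common identifier in every genome; the strain names and gene identifiers are hypothetical.

```python
def core_genome(gene_sets):
    """Genes present in every genome. gene_sets: {genome_name: set_of_gene_ids}"""
    sets = [set(genes) for genes in gene_sets.values()]
    if not sets:
        return set()
    core = sets[0]
    for genes in sets[1:]:
        core = core & genes
    return core


def pan_genome(gene_sets):
    """Genes present in at least one genome."""
    return set().union(*gene_sets.values())


# Hypothetical gene content of three closely related strains.
strains = {
    "strain_A": {"dnaA", "gyrB", "recA", "lacZ"},
    "strain_B": {"dnaA", "gyrB", "recA"},
    "strain_C": {"dnaA", "recA", "tetR"},
}
print(sorted(core_genome(strains)))
print(len(pan_genome(strains)))
```

Establishing which genes are homologous across all genome pairs is the expensive part for large groups, which this sketch leaves out.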
Mutation point analysis detects mutations between related genomes (pan-genome) at single-base resolution.
Implementation edition: IMCGE, AE, DS
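As an illustration of base-level mutation detection, the sketch below compares two sequences that are assumed to be already aligned, with "-" marking gaps. The function name and the output layout are hypothetical; how IMC computes and reports mutation points is not described here.

```python
def mutation_points(ref_aln, cmp_aln):
    """List differences between two aligned sequences of equal length.

    Returns tuples (reference_position, ref_base, cmp_base, kind), where the
    position is 1-based on the ungapped reference. For an insertion, the
    position is that of the preceding reference base (0 if there is none).
    """
    if len(ref_aln) != len(cmp_aln):
        raise ValueError("aligned sequences must have the same length")
    points = []
    ref_pos = 0
    for r, c in zip(ref_aln.upper(), cmp_aln.upper()):
        if r != "-":
            ref_pos += 1
        if r == c:
            continue
        if r == "-":
            kind = "insertion"
        elif c == "-":
            kind = "deletion"
        else:
            kind = "substitution"
        points.append((ref_pos, r, c, kind))
    return points


points = mutation_points("ACGT-ACGT", "ACCTTAC-T")
for point in points:
    print(point)
```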
Draws a molecular phylogenetic tree of the selected features from the multiple alignment result.
You can choose horizontal display, vertical display, or an unrooted tree. The evolutionary distance can be shown or hidden.
You can select the shape of the nodes from Box, Circle, or None.
The phylogenetic tree can be output as a PDF file.
You can also load and display phylogenetic tree files (dnd format) created by external programs.
Various sequences can be mapped onto genome base sequences and registered as features.
Sequence files that can be mapped
Feature mapping maps exported feature data onto a genome sequence and registers it as features of that sequence.
The mapping position can be determined in two ways: from the position information of the feature, or by a homology search using the base sequence of the feature.
EST (cDNA) mapping maps an EST (cDNA) sequence file onto the reference genome sequence by homology, identifies the genomic position from which each EST is derived, and registers an EST (cDNA) feature at that position.
Depending on the EST library, local redundancy may be high, and many EST features may be mapped to the same site on the genome.
To draw such locally redundant EST mapping results on the feature map, a feature layout called Pack Lane is provided. (For details, please refer to FLS: Feature Layout Style, Pack Lane.)
There are two types of EST mapping.
Trace mapping is a function to map the trace waveform data from the capillary sequencer onto the genomic base sequence from which it is derived.
Using the trace waveform viewer, you can view the aligned trace waveforms.
This function maps mutation data from gbSNP or jSNP onto the corresponding genomic base sequence.
Maps amino acid sequences onto the reference genome base sequence and registers them as new features.
The specified amino acid sequences are mapped to the reference genome currently displayed in the feature map, and each hit is registered as a new feature (such as mRNA).
For the mapping, the tBlastN algorithm is used.
Mapping results are classified into three types.
Results of the first two types can be registered as new features on the reference genome. For example, perfectly matching sequences can be registered as mRNA features and incompletely matching sequences as misc_RNA features.
You can set the same value for any qualifier of the newly registered features. In addition, the qualifier /locus_id can be filled with sequential numbers that have a specified prefix, an arbitrary starting number, and a specified number of digits.
For eukaryotic genomes with introns, the exon and intron regions are identified when the feature is registered.
You can limit the maximum base length of introns.
Before performing array analysis, the tiling array probes must be mapped to the target genome sequence, and the genomic position and information of each probe must be registered as a feature.
Array expression data is obtained per probe, so when an expression file is loaded, each expression level is attached to the feature of the corresponding probe.
In IMC, Blast search results of each CDS feature identified on the genome sequence can be saved as qualifiers of each feature.
Results of Blast searches run elsewhere, for example on an external server, can also be written to the qualifiers of each CDS feature by mapping them onto the current genome sequence with the Blast search result mapping function.
Blast search results can be mapped correctly if you export the Feature Key Search results in advance and use them as the query sequences.
This function maps reads from a next-generation sequencer (NGS) onto the reference genome sequence.
It is implemented in the following software.
GenomeTraveler
This function can be executed with the following edition.
AE, DS, GT
Automatically designs tiling array probes.
You can specify the probe length in bases.
You can specify the distance (in bases) between the first bases of consecutive probes.
You can specify the strand on which probes are designed (Forward Strand, Reverse Strand, Both). Probes can be designed so that they avoid specified annotations (feature keys) such as rRNA.
If an annotation exists at the position where a probe is designed, its contents can be copied to the probe.
Probe names can be generated from an arbitrary prefix and a sequential number with a specified number of digits.
Currently, output is possible in three different formats.
This function can be executed with the following edition.
AE, DS, GT
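The design parameters described above (probe length, spacing between probe start positions, strand, regions to avoid, and name prefix with a fixed number of digits) can be sketched as follows. This is a minimal illustration and not the actual design procedure: the function and parameter names are hypothetical, and real probe design would normally apply further criteria that this sketch ignores.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def design_tiling_probes(genome, probe_len, step, strand="both",
                         avoid=(), prefix="P", digits=5):
    """Place probes of probe_len bases every `step` bases along the genome.

    avoid: (start, end) regions, 1-based inclusive, that probes must not overlap.
    Returns tuples (name, start, end, strand_sign, sequence), 1-based inclusive.
    """
    signs = {"forward": "+", "reverse": "-", "both": "+-"}[strand]
    probes = []
    serial = 1
    for offset in range(0, len(genome) - probe_len + 1, step):
        start, end = offset + 1, offset + probe_len
        if any(start <= a_end and end >= a_start for a_start, a_end in avoid):
            continue  # skip windows overlapping an avoided annotation
        window = genome[offset:offset + probe_len]
        for sign in signs:
            seq = window if sign == "+" else reverse_complement(window)
            probes.append((f"{prefix}{serial:0{digits}d}", start, end, sign, seq))
            serial += 1
    return probes


# Hypothetical 20-base sequence with an avoided region at positions 10-11.
probes = design_tiling_probes("AAAACCCCGGGGTTTTACGT", probe_len=8, step=4,
                              strand="both", avoid=[(10, 11)])
for probe in probes:
    print(probe)
```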
This function maps probes onto an annotated nucleotide sequence file.
You can load a probe file and map each probe to its position of origin on the current genome sequence.
When genomic position information is available, each probe feature is placed at the corresponding genomic position based on that information.
If position information does not exist and only the base sequence of the probe is given, mapping is performed based on the homology of the probe base sequence.
Probes from the following array manufacturers can currently be mapped.
This function can be executed with the following edition.
AE, DS, GT
Import tiling array expression data files and NGS RNA-Seq files so that they can be used for array expression analysis.
There is no limit on the total number of expression data files that can be imported, but only 10 files can be imported in a single operation.
Probe mapping must be performed before this function can be executed.
Loading a genome base sequence file, mapping a probe file, and importing an expression data file can be registered together as a batch process and executed in the background.
The expression level for each probe can be displayed as a bar graph or a line graph on the array profile lane of the main feature map.
Any number of expression profile lanes can be displayed on the main feature map at the same time, but memory consumption increases with the number of lanes.
Each lane can be moved up or down freely.
When a genome sequence with expression profiles is digested, truncated, amplified, ligated, or otherwise processed by a cloning operation, the expression profiles belonging to the affected region are inherited by the resulting sequence.
This function is available in the following software editions.
IMCAE, IMCDS, GenomeTraveler
On the expression profile lane you can perform the following operations.
This function corrects the expression profile data.
Data correction functions include the following.
These corrections can be pipelined so that multiple correction steps are executed in an arbitrary order.
The corrections that have been applied are shown in the list displayed for each expression data file.
Summary statistics of the corrected values can be listed for each expression data file.
The distribution of the expression levels before and after the data correction can be displayed in a graph.
This function can be executed with the following edition.
AE, DS, GT
Tiling array probes are usually shorter than genes, and multiple probes are designed at roughly equal intervals both within and between genes.
The measured primary data is the expression level of each probe.
A single gene is therefore covered by multiple probes, each with its own expression level.
This function converts the expression levels of the probes located on a gene into a single expression level for that gene.
This function can be executed with the following edition.
AE, DS, GT
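The probe-to-gene conversion can be illustrated with a minimal sketch. It assumes that a gene's expression level is a summary (here the mean, optionally the median) of the probes lying entirely inside the gene; the summary statistic actually used by the software is not stated here, and all names and values are hypothetical.

```python
from statistics import mean, median


def gene_expression(genes, probes, summarize=mean):
    """Summarize probe-level expression into one value per gene.

    genes:  {gene_name: (start, end)}, 1-based inclusive
    probes: [(start, end, expression_level)], 1-based inclusive
    Genes without any contained probe get None.
    """
    result = {}
    for name, (g_start, g_end) in genes.items():
        levels = [level for p_start, p_end, level in probes
                  if p_start >= g_start and p_end <= g_end]
        result[name] = summarize(levels) if levels else None
    return result


# Hypothetical genes and 25-base probes with expression levels.
genes = {"geneA": (100, 400), "geneB": (600, 900), "geneC": (1000, 1200)}
probes = [(110, 134, 8.0), (210, 234, 10.0), (310, 334, 12.0),
          (450, 474, 1.0), (610, 634, 4.0), (710, 734, 6.0)]
by_mean = gene_expression(genes, probes)
by_median = gene_expression(genes, probes, summarize=median)
print(by_mean)
```

The probe at 450-474 lies between genes and contributes to neither gene.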
Detects peaks in the expression profile.
Detection results are listed in the Peak List dialog.
The list can be saved as a CSV file.
When you click an entry in the list, the main feature map scrolls automatically so that the peak position is centered.
This function can be executed with the following edition.
AE, DS, GT
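One simple way to detect peaks in a profile is to report local maxima above a threshold. The sketch below uses that rule as an assumption; the detection criteria actually used by the software are not described here, and the function name and data are hypothetical.

```python
def detect_peaks(profile, threshold):
    """Return (position, level) for local maxima at or above the threshold.

    profile: list of (genome_position, expression_level), sorted by position.
    A plateau is reported once, at its first point.
    """
    peaks = []
    for i in range(1, len(profile) - 1):
        position, level = profile[i]
        previous_level = profile[i - 1][1]
        next_level = profile[i + 1][1]
        if level >= threshold and level > previous_level and level >= next_level:
            peaks.append((position, level))
    return peaks


# Hypothetical profile sampled every 100 bases.
levels = [1, 3, 9, 4, 2, 6, 7, 7, 3]
profile = [(100 * (i + 1), level) for i, level in enumerate(levels)]
peaks = detect_peaks(profile, threshold=5)
print(peaks)
```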
Plots the correlation of expression levels between arbitrary arrays.
This function can be executed with the following edition.
AE, DS, GT
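A correlation plot between two arrays is usually summarized by a correlation coefficient. The following is a minimal sketch that computes the Pearson coefficient for two lists of expression levels measured on the same probes; the choice of Pearson correlation is an assumption, and the data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical expression levels of the same four probes on two arrays.
array_1 = [1.0, 2.0, 3.0, 4.0]
array_2 = [2.0, 4.0, 6.0, 8.0]
print(pearson(array_1, array_2))
```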
Clusters gene expression levels.
Clustering results can be displayed as dendrograms and heat maps.
This function can be executed with the following edition.
AE, DS, GT
Lists the expression levels per feature for the currently imported expression data files.
The features to display in the list dialog can be selected (multiple selections are possible).
The list can be sorted by column using up to four sort keys.
This function is implemented in the following edition.
AE, DS, GT
Performs arithmetic operations between arrays and displays the result in a profile lane.
Up to three arrays can be used in one calculation.
This function can be executed with the following edition.
AE, DS, GT
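Arithmetic between arrays can be pictured as a probe-by-probe operation. The sketch below assumes that each array is a mapping from probe name to expression level and that the operation is applied from left to right over two or three arrays; the function name and data layout are hypothetical.

```python
import operator
from functools import reduce

OPERATIONS = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "/": operator.truediv}


def combine_arrays(op, *arrays):
    """Apply an arithmetic operation probe by probe across 2 or 3 arrays.

    arrays: dicts {probe_name: expression_level}. Only probes present in
    every array are kept.
    """
    if not 2 <= len(arrays) <= 3:
        raise ValueError("this sketch accepts two or three arrays")
    shared = set(arrays[0])
    for array in arrays[1:]:
        shared &= set(array)
    func = OPERATIONS[op]
    return {probe: reduce(func, (array[probe] for array in arrays))
            for probe in sorted(shared)}


# Hypothetical expression levels for two probes on three arrays.
a = {"p1": 8.0, "p2": 6.0}
b = {"p1": 2.0, "p2": 3.0}
c = {"p1": 1.0, "p2": 1.0}
print(combine_arrays("-", a, b))
print(combine_arrays("/", a, b))
print(combine_arrays("+", a, b, c))
```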
Local Pathway Builder is a tool that uses the information contained in the loaded genome-scale model to extend metabolic pathways one reaction at a time.
Design of the gene composition for recombination into a heterologous host.
Codon substitution
Automatic composition design
Design at the gene level.
This function checks whether the gene cluster is designed correctly.
Designs the fragmentation of a gene cluster sequence so that the cluster can be reconstructed by the OGAB method.
Designs homology arms for homologous recombination and PCR primers for amplifying the homology arms.
After the insert sequences and homology arms have been designed, the homologous recombination into the genome is carried out.
Change various parameter settings.
Description of Windows and Dialogs.
Dialogs for launching operations such as analyses.
Dialogs for setting optional parameters for execution, display, and so on.
Dialogs for displaying search results and analysis results.
in silico Assembler is a de novo assembler.
You can assemble DNA fragments of 50 bp or more.
It is mainly used to assemble reads from capillary sequencers.
NGS reads can also be assembled (up to 1 million reads; for larger data sets, please use Velvet or a similar assembler).
In preprocessing, you can limit the number of reads processed and specify the minimum QV and the maximum number of N bases.
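The preprocessing limits can be pictured as a simple read filter. The sketch below assumes that the minimum QV is compared with the mean quality value of each read; whether the software uses the mean, the per-base minimum, or trimming is not stated here. All names and default values are hypothetical.

```python
def preprocess_reads(reads, min_qv=20, max_n=2, max_reads=None):
    """Keep reads that pass the quality and N-base limits.

    reads: iterable of (name, sequence, quality_values)
    min_qv: minimum mean quality value (an assumption of this sketch)
    max_n: maximum number of N bases allowed in a read
    max_reads: stop after this many reads have been kept (None = no limit)
    """
    kept = []
    for name, sequence, qualities in reads:
        if max_reads is not None and len(kept) >= max_reads:
            break
        if sequence.upper().count("N") > max_n:
            continue
        if qualities and sum(qualities) / len(qualities) < min_qv:
            continue
        kept.append((name, sequence, qualities))
    return kept


# Hypothetical reads: r2 has too many N bases, r3 has low quality.
reads = [
    ("r1", "ACGTACGT", [30] * 8),
    ("r2", "ACNNNCGT", [30] * 8),
    ("r3", "ACGTACGT", [10] * 8),
    ("r4", "ACGTNCGT", [25] * 8),
]
kept_names = [name for name, _, _ in preprocess_reads(reads)]
print(kept_names)
```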
iSpider is a tool for downloading necessary sequence data files from various external sequence database servers.
By registering the external servers and the sequence data files to be fetched from them in advance, you can easily download the latest sequence data and save it in the specified local directory.
Downloads can be started manually at once, or scheduled to run automatically at a specified date and time.
Because multiple files can be specified with regular expressions, a single job may download a large number of files.
Please follow the usage rules of each external server and avoid placing excessive load on it.
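To see how many files a regular expression would select before a download is started, the pattern can be tested against a file listing. The sketch below is only an illustration: the file names and the pattern are hypothetical, and the regular expression dialect accepted by iSpider may differ from Python's.

```python
import re


def select_files(listing, pattern):
    """Return the file names that match the whole regular expression."""
    regex = re.compile(pattern)
    return [name for name in listing if regex.fullmatch(name)]


# Hypothetical directory listing from a sequence database server.
listing = [
    "bacteria.1.genomic.gbff.gz",
    "bacteria.2.genomic.gbff.gz",
    "bacteria.1.protein.faa.gz",
    "README",
]
selected = select_files(listing, r"bacteria\.\d+\.genomic\.gbff\.gz")
print(len(selected), selected)
```

Checking the number of matches first helps keep the download within the limits set by the server.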
TaxiSpider is a sister tool of iSpider.
It downloads all the indexes of the base sequence files and amino acid sequence files from the specified site, and its distinguishing feature is that it stores those sequence indexes in a single large database structured like a taxonomy tree.
The downloaded sequence data is stored on the branch of the taxonomy tree that corresponds to its taxonomy information.
However, it requires a very large amount of local disk space.
As of December 2018, 47 GB of space is required.
ARM is a metabolic pathway editor / viewer.
There is an atom trace function, which can track carbon, oxygen and sulfur atoms.
You can search for all single pathways from a source compound to a target compound.
However, because the compound and reaction data are old, the latest pathways are not included.
It is possible to add compounds and reactions.
The pathways found can be combined automatically on the canvas to draw a composite pathway.
The following new functions have been added to the original ARM.
You can import quantitative data for each compound obtained from metabolome analysis or similar, and display it as a series (bar graph) on each compound.
In addition, expression level data of enzyme genes obtained from transcriptome analysis can be displayed in series (bar graph) on each reaction.
ARM is software developed by Prof. Arita of the University of Tokyo (currently at the National Institute of Genetics).
This is an interactive interface to ARM.
You can run ARM's single pathway discovery function interactively.
ARM is software developed by Prof. Arita of the University of Tokyo (currently at the National Institute of Genetics).
Converts NGS paired-end reads into fragments.
GenBank GBFF Expander is a tool that expands an NCBI GBFF format file containing a large number of genome sequences into individual genome sequences and places them in a directory tree according to their taxonomy.
It is a prototype of TaxiSpider.
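The idea of expanding a multi-record GBFF file by taxonomy can be sketched as follows. The sketch relies only on two properties of the GenBank flat file format: each record ends with a "//" line, and the ORGANISM line is followed by indented lineage lines. The function names, the directory naming, and the sample records are hypothetical, and organism names that wrap onto a second line are not handled.

```python
from pathlib import PurePosixPath

INDENT = " " * 12  # continuation lines in GenBank records start at column 13


def split_records(text):
    """Yield each record of a multi-record GenBank file as a string."""
    record = []
    for line in text.splitlines():
        record.append(line)
        if line.strip() == "//":
            yield "\n".join(record) + "\n"
            record = []


def taxonomy_path(record):
    """Build a relative directory path from the ORGANISM lineage of a record."""
    lines = record.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("  ORGANISM"):
            organism = line[len("  ORGANISM"):].strip()
            lineage = []
            for following in lines[i + 1:]:
                if not following.startswith(INDENT):
                    break
                lineage.append(following.strip())
            joined = " ".join(lineage).rstrip(".")
            ranks = [rank.strip() for rank in joined.split(";") if rank.strip()]
            return PurePosixPath(*ranks, organism.replace(" ", "_"))
    return PurePosixPath("unclassified")


# Two heavily shortened, hypothetical records.
SAMPLE = "\n".join([
    "LOCUS       TEST1",
    "  ORGANISM  Escherichia coli",
    INDENT + "Bacteria; Pseudomonadota; Gammaproteobacteria;",
    INDENT + "Enterobacterales; Enterobacteriaceae; Escherichia.",
    "ORIGIN",
    "        1 acgt",
    "//",
    "LOCUS       TEST2",
    "  ORGANISM  Bacillus subtilis",
    INDENT + "Bacteria; Bacillota; Bacilli; Bacillales; Bacillaceae; Bacillus.",
    "ORIGIN",
    "        1 ttga",
    "//",
]) + "\n"

records = list(split_records(SAMPLE))
for record in records:
    print(taxonomy_path(record))
```

Each record could then be written to a file under the printed directory.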
This function checks whether GenBank or EMBL format files, the main formats of the genome base sequence files used in IMC, are correct, and automatically corrects any errors found.
If multiple GenBank or EMBL files are stored in one directory, all the files in that directory are checked.
In addition, the inspection result file can be saved in a different directory.
Here we summarize the functions and operations of GenomeTraveler (GT).
Some content may overlap with the descriptions of individual functions.
GT's data management and project management are explained.
Pre-processing of NGS data is explained.
GT's de novo assembly is explained.
GT's NGS read mapping is explained.
GT's Alignment Viewer is explained.
GT's secondary analysis is explained.
GT's scaffolding function is explained.