Venn Diagram Functions and Operations
What is a Venn Diagram?
A Venn Diagram is a diagram used in set theory to show how elements are distributed among sets. Four sets can be drawn in general, but IMC draws at most three. The elements belonging to each set are shown inside a circle, and elements common to several sets are shown in the overlapping regions. In biological applications, it is used to illustrate the number of homologous genes shared between closely related genomes.
Functions and Features
Using GenBank format files of the genome base sequences of two or three closely related species or strains whose genes have already been identified, the function determines for every gene on each genome whether it is a common gene, and draws a Venn diagram between three genomes or two genomes.
To determine common genes, an amino acid sequence search using NCBI Blast is used.
The criteria for determination are the Percent Identity and Overlap Length of the Blast results.
Gene pairs whose Percent Identity and Overlap Length are both equal to or greater than the specified values are defined as common.
The Venn Diagram is drawn in color graphics.
Clicking on each number displays the corresponding gene list.
The drawing color can be changed.
Printing is possible.
Lists of common genes and unique genes can be displayed and output to a file.
Number of genes common to three species and a list of them
Number of genes common to any two of the three species and a list of them
Number of genes that exist only in one genome, with no common gene in the other two genomes, and a list of them
Each list can be output as a CSV file.
Gene feature alignment is performed for common genes or unique genes.
Clicking a common gene or unique gene in a list performs gene feature alignment of each genome on MGV (Reference Genome Map) at that gene's position.
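The determination criteria above can be sketched as a simple filter. This is a minimal illustration, not IMC's implementation: it assumes BLAST results in the standard 12-column tabular format (query ID, subject ID, percent identity, alignment length, and so on) and treats the alignment length as the Overlap Length, which is an assumption. The names `Hit`, `parse_tabular_hits` and `passes_thresholds`, the sample IDs, and the threshold values 50.0 and 100 are all hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    percent_identity: float
    overlap_length: int  # assumption: BLAST alignment length stands in for Overlap Length


def parse_tabular_hits(text: str) -> list[Hit]:
    """Parse BLAST tabular lines (qseqid, sseqid, pident, length, ...)."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hits.append(Hit(fields[0], fields[1], float(fields[2]), int(fields[3])))
    return hits


def passes_thresholds(hit: Hit, min_identity: float, min_overlap: int) -> bool:
    """A hit qualifies only if BOTH values reach the specified thresholds."""
    return hit.percent_identity >= min_identity and hit.overlap_length >= min_overlap


# Made-up sample hits; the gene IDs are placeholders.
sample = "\n".join([
    "geneA1\tgeneB1\t92.5\t300\t22\t1\t1\t300\t1\t300\t1e-150\t520",
    "geneA2\tgeneB7\t41.0\t280\t165\t3\t1\t280\t5\t284\t1e-20\t95",
    "geneA3\tgeneB3\t88.0\t45\t5\t0\t1\t45\t1\t45\t1e-10\t60",
])

kept = [h for h in parse_tabular_hits(sample) if passes_thresholds(h, 50.0, 100)]
print([(h.query, h.subject) for h in kept])
```

In this sample, the second hit fails the identity threshold and the third fails the overlap threshold, so only the first pair remains a candidate common gene.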
Limitations
The Venn Diagram function is implemented in IMC GE/AE. It is not implemented in IMCSE.
All genomes to be drawn in the Venn Diagram must be loaded on MGV.
Operation
Preparation
Load the genome base sequence files of two or more closely related species into MGV (=Reference Map).
The genomes to be analyzed are loaded into MGV so that inter-genome alignment can be performed.
Execution procedure
Click the Venn Diagram button.
The Venn Diagram Setting dialog box will appear.
A list of GenBank files loaded in the Multiple Genome Viewer (MGV) Map will be displayed.
Check up to three target genomes for the Venn Diagram in the list.
Click the Set button.
The Venn Diagram execution progress message will be displayed.
Depending on the size of the genome, it may take several minutes to several tens of minutes until processing is complete.
When processing is complete, the results screen will be displayed.
Tab switching operation
Click the top tab of the result display dialog box.
You can switch the displayed content.
Display of 3-genome common list
This tab lists the common genes and unique genes that make up the Venn Diagram between the three genomes.
It consists of a total of seven lists, which correspond to the seven regions of the Venn Diagram between the three genomes.
List of common genes between 3 genomes (Bi-Directional Hit in the list above) (number of genes)
List of genes unique to genome 1 against genomes 2 and 3 (No Hit NC_000915 in the list above) (number of genes)
List of genes unique to genome 2 against genomes 3 and 1 (No Hit NC_000921 in the list above) (number of genes)
List of genes unique to genome 3 against genomes 1 and 2 (No Hit NC_014560 in the list above) (number of genes)
List of genes common between genome 1 and genome 2, but not common to genome 3 (NC_000915-NC_000921 in the list above) (number of genes)
List of genes common between genome 2 and genome 3, but not common to genome 1 (NC_000921-NC_014560 in the list above) (number of genes)
List of genes common between genome 3 and genome 1, but not common to genome 2 (NC_014560-NC_000915 in the list above) (number of genes)
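The seven regions can be expressed with ordinary set operations. The sketch below assumes each genome has been reduced to a set of shared gene-group identifiers (a hypothetical representation; IMC's internal data structure is not described here). The function name and the sample identifiers are illustrative only.

```python
def venn_regions(g1: set, g2: set, g3: set) -> dict:
    """Split three sets of gene-group IDs into the seven Venn regions."""
    return {
        "common to 1, 2 and 3": g1 & g2 & g3,
        "unique to 1": g1 - g2 - g3,
        "unique to 2": g2 - g3 - g1,
        "unique to 3": g3 - g1 - g2,
        "1 and 2, not 3": (g1 & g2) - g3,
        "2 and 3, not 1": (g2 & g3) - g1,
        "3 and 1, not 2": (g3 & g1) - g2,
    }


# Made-up gene-group IDs for three genomes.
genome1 = {"f1", "f2", "f3", "f4"}
genome2 = {"f1", "f2", "f5", "f6"}
genome3 = {"f1", "f3", "f5", "f7"}

regions = venn_regions(genome1, genome2, genome3)
for name, members in regions.items():
    print(f"{name}: {len(members)} {sorted(members)}")

# The seven regions are disjoint and together cover every gene group.
total = sum(len(members) for members in regions.values())
print(total == len(genome1 | genome2 | genome3))
```

Because the regions do not overlap, the seven counts always add up to the size of the union of the three sets, which is a convenient sanity check on any set of results.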
List of genes common between 3 genomes
No.: Consecutive number of common or unique genes counted from upstream on genome 1.
Genome 1
gene: Name of gene annotated to gene on genome 1.
locus_tag: Locus_tag annotated to the gene.
protein_id: Shown as an example. This display item can be freely selected from the Qualifier list.
Start: Start position of gene on genome 1 that belongs to this common gene
End: End position of gene on genome 1 that belongs to this common gene
Strand: Strand on which gene on genome 1 that belongs to this common gene is located
Identity: Degree of amino acid residue identity between genes on genome 1 and genome 2
Genome 2
gene: Gene name annotated to gene on genome 2.
locus_tag: Locus_tag annotated to gene on genome 2.
protein_id: Shown as an example. This display item can be freely selected from the Qualifier list.
Start: Start position of gene on genome 2 that belongs to this common gene
End: End position of gene on genome 2 that belongs to this common gene
Strand: Strand on which gene on genome 2 that belongs to this common gene is located
Identity: Degree of amino acid residue identity between genes on genome 2 and genome 3
Genome 3
gene: Gene name annotated to gene on genome 3.
locus_tag: Locus_tag annotated to gene on genome 3.
protein_id: Shown as an example. This display item can be freely selected from the Qualifier list.
Start: Start position of the gene on genome 3 that belongs to this common gene
End: End position of the gene on genome 3 that belongs to this common gene
Strand: Strand on which the gene on genome 3 that belongs to this common gene is located
Identity: Degree of amino acid residue identity between genes in genome 3 and genome 1
Display list of common homologous genes between any two genomes
Three lists of common genes between two genomes are created, one for each pair of genomes, and each can be opened by clicking its tab.
No.: Sequential number of common or unique genes counted from upstream on genome 1.
Genome 1
gene: Name of gene annotated to gene on genome 1.
locus_tag: Locus_tag annotated to the gene.
protein_id: Shown as an example. This display item can be freely selected from the Qualifier list.
Start: Start position of gene on genome 1 that belongs to this common gene
End: End position of gene on genome 1 that belongs to this common gene
Strand: Strand on which gene on genome 1 that belongs to this common gene is located
Identity: Degree of amino acid residue identity between genes on genome 1 and genome 2
Genome 2
gene: Name of gene annotated to gene on genome 2.
locus_tag: Locus_tag annotated to gene on genome 2.
protein_id: Shown as an example. This display item can be freely selected from the Qualifier list.
Start: Start position of gene on genome 2 that belongs to this common gene
End: End position of gene on genome 2 that belongs to this common gene
Strand: Strand on which gene on genome 2 that belongs to this common gene is located
Identity: Degree of amino acid residue identity between genes on genome 2 and genome 1
Change drawing color and text color of each circle
The following drawing colors can be changed.
Change drawing color of genome 1
Change drawing color of genome 2
Change drawing color of genome 3
Change text color of numbers
Click each color box.
A color palette will be displayed.
Select the color you want to change to.
The color in the palette will change.
Click "Show".
The drawing color change will be reflected.
Displaying the corresponding position of common genes in the reference genome map and genome alignment at the same position
Click each row in each common gene list.
The corresponding genes in each genome will be aligned and displayed in the Multiple Genome Viewer.
The gene positions on the Multiple Genome Viewer will be displayed and highlighted even for genes common to two genomes and unique genes that exist only in one genome.
Printing Venn Diagram
Use the button at the bottom of the window displaying the Venn Diagram to print.
Change page settings --> Click the Page Setup button.
Print directly to the printer --> Click the Print button.
Output print image as a [[PDF]] file --> Click the PDF button.
Output print image as a [[PNG]] file --> Click the PNG button.
Output print image as an [[EMF]] file --> Click the EMF button.
Output common and unique gene list CSV file
After selecting the tab for each gene list, click the CSV button at the bottom.
The output file specification dialog will be displayed.
Specify the file name and directory (folder) name.
The output file can be viewed and processed in MS Excel.
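As an alternative to MS Excel, an exported list can also be processed with a short script. The sketch below assumes the CSV has a header row using the column names from the list descriptions above (No., gene, locus_tag, Start, End, Strand, Identity). The actual layout written by IMC may differ, and the sample rows are made-up placeholders.

```python
import csv
import io

# Placeholder data standing in for an exported gene list (not real IMC output).
SAMPLE_CSV = """No.,gene,locus_tag,Start,End,Strand,Identity
1,geneA,TAG_0001,100,1300,+,98.5
2,geneB,TAG_0002,1400,2600,-,72.0
3,geneC,TAG_0003,2700,3500,+,91.2
"""


def load_gene_list(handle):
    """Read a gene list CSV and convert the numeric columns to numbers."""
    rows = []
    for row in csv.DictReader(handle):
        row["Start"] = int(row["Start"])
        row["End"] = int(row["End"])
        row["Identity"] = float(row["Identity"])
        rows.append(row)
    return rows


rows = load_gene_list(io.StringIO(SAMPLE_CSV))

# Example processing step: keep only genes with at least 90 % identity.
high_identity = [r["locus_tag"] for r in rows if r["Identity"] >= 90.0]
print(len(rows), "genes; identity >= 90:", high_identity)
```

For a real file, pass `open(path, newline="")` to `load_gene_list` instead of the `io.StringIO` object.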
Loading and displaying the results of a previous Venn Diagram run
Click the Venn Diagram run button.
The Venn Diagram run dialog will be displayed.
Click "Read...".
The File Chooser will be displayed.
Specify the result file of a previous Venn Diagram run.
A confirmation message will be displayed.
Click "Yes (Y)".
The specified results will be loaded and displayed.
Algorithm
The amino acid sequences of the CDSs identified on the genomes to be compared are compared using the NCBI Blast homology search.
When comparing two genomes, each CDS on one genome is used as a query in an amino acid sequence homology search against all CDSs on the other genome. If the top-hit CDS, when searched back against the first genome, returns the original query CDS as its top hit, the pair is defined as a common gene (Bi-Directional Hit).
However, the pair is rejected if its Percent Identity or Overlap Length is below the specified criterion value.
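The common gene determination described above, read as a reciprocal (bi-directional) best-hit test, can be sketched as follows. This is an illustration under stated assumptions, not IMC's actual code: the hits are assumed to have already passed the Percent Identity and Overlap Length criteria, the top hit is taken to be the hit with the highest score (how hits are ranked is not specified here), and all function names and sample IDs are hypothetical.

```python
def best_hits(hits):
    """Return {query: subject}, keeping the highest-scoring hit per query.

    `hits` is an iterable of (query, subject, score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's top hit AND a is b's top hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Made-up hits: genome A CDSs searched against genome B, and the reverse.
a_vs_b = [("a1", "b1", 500), ("a1", "b2", 200), ("a2", "b2", 300), ("a3", "b1", 100)]
b_vs_a = [("b1", "a1", 480), ("b2", "a2", 310), ("b3", "a3", 90)]

common = reciprocal_best_hits(a_vs_b, b_vs_a)
unique_to_a = sorted({q for q, _, _ in a_vs_b} - {a for a, _ in common})
print("common:", common)
print("no bi-directional hit in A:", unique_to_a)
```

Here `a3` hits `b1`, but `b1` prefers `a1`, so `a3` is not counted as a common gene even though it has a hit in the other genome.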