Fig. 1 | Biology Direct

Fig. 1

From: KGCAK: a K-mer based database for genome-wide phylogeny and complexity evaluation

A brief overview of KGCAK functionality. a A tree built from 5-mer protein sequences of nuclear genomes using the kitsch method. b A tree built from 5-mer protein sequences of mitochondrial genomes using the kitsch method. c An example of genomic parameter results from DNA sequences in five nuclear genomes. "A Content", "C Content", "G Content", "T Content", "GC Content", and "Purine Content" represent the percentages of nucleotides A, C, G, T, G + C, and A + G, respectively, in the T + C + A + G total of the genomic sequences; "N Content" is the percentage of nucleotide N in the T + C + A + G + N total of the genomic sequences. d An example of 5-mer K-mer statistics from cDNA sequences in five nuclear genomes. "Information Entropy" is the Shannon information entropy calculated from a K-mer array, H = −∑ P_i log(P_i), where P_i is the frequency of each K-mer. "Distance to Even" is the sum of the squared differences between each element of the K-mer array and the array's global average value. e An example of uniqueness ratios from DNA sequences in five nuclear genomes. f An example of the 8-mer frequency distribution from DNA sequences in five nuclear genomes. g An example of the genome-complexity-3D comparison of entropy-DNA-10mer, genome GC content, and genome size for five nuclear genomes
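
The two K-mer statistics defined in panel d can be illustrated with a minimal Python sketch. This is not KGCAK's implementation. The function names are hypothetical, and several choices are assumptions the caption does not settle: the K-mer array is taken to hold relative frequencies over all 4^k possible DNA K-mers (unobserved K-mers count as zero), K-mers containing characters outside A, C, G, T are skipped, and the logarithm is the natural log.

```python
import math
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k, alphabet="ACGT"):
    """Relative frequency of every possible k-mer over `alphabet`.

    Assumption: k-mers containing other characters (e.g. N) are ignored,
    and k-mers never observed get frequency 0.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[km] for km in kmers)
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    return {km: counts[km] / total for km in kmers}


def information_entropy(freqs):
    """Shannon entropy H = -sum(P_i * log(P_i)), natural log assumed.

    Terms with P_i = 0 contribute nothing and are skipped.
    """
    return -sum(p * math.log(p) for p in freqs.values() if p > 0)


def distance_to_even(freqs):
    """Sum of squared differences between each element and the array mean."""
    values = list(freqs.values())
    mean = sum(values) / len(values)
    return sum((p - mean) ** 2 for p in values)


if __name__ == "__main__":
    freqs = kmer_frequencies("ACGTTGCAACGTAGCTAGCTAGGATCC", k=2)
    print("entropy:", information_entropy(freqs))
    print("distance to even:", distance_to_even(freqs))
```

Under these assumptions, a perfectly even K-mer array gives the maximum entropy, log(4^k), and a "Distance to Even" of zero, while an array dominated by a single K-mer gives an entropy near zero and a larger distance.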
