From: Gene overlapping and size constraints in the viral world

Overlapping rate is negatively correlated with genome length. a Illustration of overlapping scenarios. In this study, two genes are defined as overlapping only if their coding regions overlap; the other parts of the gene are ignored (e.g., 5′ and 3′ UTRs, or intergenic regions). The same applies to the rare cases of viral genes with introns. Only pairs of genes that use different ORFs are considered overlapping genes. It follows that the first example gene (marked S1) overlaps only with Gene 1, while its “overlap” with Gene 2, which shares the same ORF (frame +2), is not counted (the latter is considered a trivial overlap). The second example gene (marked S2) demonstrates that a single gene can participate in multiple overlapping events. The third example gene (marked S3) is not involved in any (non-trivial) overlapping event. Light pink marks only the overlapping segments. For clarity, each ORF is shown in its own color. b A scatter plot demonstrating the negative correlation between genome length and overlapping rate in viral families. Both axes are in log scale. Thirteen families without any overlapping were filtered out to allow the use of a log scale, as was done in the original work by Belshaw et al. [1], which is replicated here. This leaves 80 families out of the complete data set of 93. The families are represented as ellipses, whose width and height correspond to the standard deviation of the genera within them (see Methods). The ellipses are colored according to the partition of the families into viral replication groups (see Background). Spearman’s rank correlation: ρ = −0.59, p-value = 6.97 × 10⁻⁹
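
The overlap rule described for panel a can be illustrated with a minimal sketch. It is not the authors' code: the `Gene` class, the function names, and all coordinates are hypothetical and chosen only to mirror the S1 example, and the frame of Gene 1 (+1) is an assumption, since the caption states only that S1 and Gene 2 share frame +2. A pair counts as a non-trivial overlap when the coding regions intersect and the two genes use different frames.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Gene:
    """Coding region of a gene (hypothetical representation)."""
    name: str
    start: int   # first coding position, 1-based, inclusive
    end: int     # last coding position, inclusive
    frame: str   # reading frame label, e.g. "+1", "+2", "-1"


def overlap_length(a: Gene, b: Gene) -> int:
    """Number of positions shared by the two coding regions (0 if none)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def nontrivial_overlaps(genes):
    """Return (name, name, length) for pairs that overlap in different frames.

    Pairs that intersect but share the same frame are treated as trivial
    overlaps and skipped, following the definition in the caption.
    """
    found = []
    for a, b in combinations(genes, 2):
        shared = overlap_length(a, b)
        if shared > 0 and a.frame != b.frame:
            found.append((a.name, b.name, shared))
    return found


# Hypothetical coordinates mirroring the S1 example: S1 intersects both
# Gene 1 (assumed frame +1) and Gene 2 (frame +2), but shares a frame with Gene 2.
genes = [
    Gene("Gene 1", 1, 300, "+1"),
    Gene("Gene 2", 350, 700, "+2"),
    Gene("S1", 251, 400, "+2"),
]

print(nontrivial_overlaps(genes))  # [('Gene 1', 'S1', 50)]
```

In this toy input, S1 shares 51 positions with Gene 2, but the pair is skipped because both genes are in frame +2, so only the S1 and Gene 1 pair is reported.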

