Reviewer's report 1
Kenneth H. Wolfe, University of Dublin, Trinity College, Dublin 2, Ireland (nominated by Nicolas Galtier, CNRS-Université Montpellier II, France). 21 May 2007
Martin et al. are mistaken. The answer to the question they pose in their title is "Yes". There are several different lines of evidence that all support the WGD hypothesis, and the interleaving pattern is one such line of evidence. I suppose that one could make the philosophical argument that in evolution it is impossible to absolutely prove anything, but the evidence for WGD is so strong that I think that any reasonable scientist would consider it proven at this stage.
Martin et al. place too much faith in the Hannenhalli-Pevzner algorithm. They ignore the fact that, even though their PGD model can explain the observed data using fewer rearrangement steps than the WGD model, the PGD model requires an exceedingly unrealistic pattern of rearrangements to happen. In order for their model to produce the observed pattern of double conserved synteny (DCS) interleaving, a bizarre series of nested inversions of pieces of DNA of progressively increasing size must have occurred, as I describe in Comment 1 below. There is no known evolutionary mechanism that can produce such a pattern, and in my opinion this makes the PGD model untenable, regardless of the results of the Hannenhalli-Pevzner algorithm. Parsimony is not just a question of how many steps are required, but also whether those steps are plausible events.
Author's response
We disagree with Dr. Wolfe's statements. As we will elaborate below, DS patterns of rearrangements under the PGD model are not "unrealistic" or "bizarre". The generation of these patterns under the PGD hypothesis does not require "a complex series of events to happen in an orchestrated manner" as the reviewer suggests. They can be considered the natural consequence of the chromosomal rearrangement process, very much as genome duplication and selective deletion of genes can be considered the engines of the WGD model. Even in cases that involve nested inversions in hot spots of rearrangement, they can be explained by known biological phenomena (see below). In fact, two or more simple steps of nested or overlapping inversions (reversals) will produce DS patterns at two ends of the segments that are being rearranged (Fig. 6). This process is clearly more parsimonious and plausible than a rare WGD and many targeted deletions.
In our study, we use algorithmic operations (steps) of chromosomal rearrangement, duplication and deletion to compare the WGD and PGD hypotheses or to establish in which lineage the WGD occurred. All operations should be considered plausible events, as they all represent outcomes of known biological phenomena (e.g. recombination, mutation processes, etc.). We use these operations to challenge the concept that DS patterns constitute irrefutable proof of the WGD model [11, 12] by using Ockham's razor ('Pluralitas non est ponenda sine necessitate'), a principle of preferring simple explanations in hypotheses to complex ones. This principle is fundamental to scientific inquiry. Because the rearrangement process is a critical and complex problem [42], we use one of the most advanced algorithms known to date to reconstruct rearrangement operations in genome evolution. The algorithm has been used successfully, for example, to study genome rearrangement in mammals [26]. The issue is not to "place faith in the Hannenhalli-Pevzner algorithm", but to test the validity of WGD versus a null hypothesis of no WGD with this modern bioinformatic tool. It turns out that our analysis falsifies the WGD hypothesis and that, in this light, the evidence of paleopolyploidization in yeast may not be as strong as Dr. Wolfe contends. It now behooves WGD supporters to find an improved rearrangement algorithm that does not falsify their model.
Comment 1. The PGD model requires a highly unlikely series of nested inversion events to have occurred in each interleaved region
Let us consider the genomic region used in Martin et al's Figure 1A, corresponding to Ashbya gossypii chromosome I genes AAL118C to AAL087C. Additional File 1 shows a multispecies view of this region using the Yeast Gene Order Browser [48]. My explanation of this pattern is that an ancestor of S. cerevisiae, C. glabrata, S. castellii and K. polysporus had a gene order essentially identical to what is currently seen in A. gossypii, K. waltii and K. lactis. After WGD in the common ancestor of the first four species, many genes were deleted. In S. cerevisiae this left the current paired region between chromosomes XV and XVI. The interleaving pattern was later disrupted slightly by a subsequent inversion of the region between YPL172C and YPL176C on chromosome XVI, as marked in Additional File 1. The left part of Martin et al's Figure 1A describes this scenario accurately, with 24–36 steps (depending on the size of the deletions), though I note that the 4 "terminal rearrangements" are actually reciprocal translocations.
The PGD scenario for the same region is summarized in Martin et al's Figure 1B. They say that it is more parsimonious because the total number of steps is only 23. But the steps are very strange. In Additional File 2 I show the details of the 19 rearrangement steps required in this panel (reconstructed using the GRIMM server, the same method used by Martin et al.). The 19 steps are a series of nested inversions, centered on gene 12 and getting progressively larger. They make a series of flip-flop sorting movements that gradually moves all the red genes to the left, and all the green genes to the right. Look at how gene 18 ends up beside gene 20. At step 7→8, the link between genes 18 and 19 is broken by inverting an eight-gene region (shown by the yellow bar). At the next step, step 8→9, a slightly larger nine-gene region that spans and extends from the original eight, undergoes reinversion so that gene 18 is placed beside gene 20, and gene 19 ends up in the growing red area on the left. In total, in the course of the 19 steps required under the PGD model, gene 12 undergoes inversion 19 times; gene 14 is inverted 17 times, gene 16 is inverted 15 times, and so on (see Additional File 3). I do not know of any evolutionary mechanism that can result in a nested series of inversions of progressively increasing size like this. If one made 19 random inversions in this region, it would be exceedingly unlikely that they would form a nested pattern like this. If the PGD model is correct, it implies that there were 55 (or more) central points, such as gene 12 above, that were continuously inverting and re-inverting, each time carrying a slightly larger region around them, in order to form the 55 interleaved regions that we now see [1]. 
Moreover, each of these whirlpools of inversion managed to perform the flip-flop sorting process without crashing into the next whirlpool further along the chromosome; otherwise we would see triple- or quadruple-interleaving patterns, but we do not.
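The kind of bookkeeping described above (how many reversals each gene takes part in while an interleaved order is sorted into two blocks) can be illustrated with a toy sketch. It is not the Hannenhalli-Pevzner algorithm or the GRIMM server: it uses a naive greedy reversal sort on unsigned genes, and the six-gene order and the function name `sort_by_reversals` are hypothetical, chosen only for illustration.

```python
from collections import Counter


def sort_by_reversals(order, target):
    """Turn `order` into `target` using a naive greedy series of reversals.

    Assumption: this is a toy, unsigned, non-optimal method, NOT the
    Hannenhalli-Pevzner algorithm. At each position i it reverses the
    segment that brings the wanted gene into place.
    Returns the final order, the reversed intervals, and a per-gene count
    of how many reversals each gene took part in.
    """
    order = list(order)
    reversals = []
    flips = Counter()
    for i, gene in enumerate(target):
        j = order.index(gene)      # positions before i are already fixed, so j >= i
        if j == i:
            continue               # gene already in place, no reversal needed
        segment = order[i:j + 1]
        reversals.append((i, j))   # record the reversed interval (inclusive)
        flips.update(segment)      # every gene in the segment is inverted once
        order[i:j + 1] = segment[::-1]
    return order, reversals, flips


# Hypothetical interleaved region: odd genes have homologs on one
# chromosome ("red"), even genes on another ("green").
start = [1, 2, 3, 4, 5, 6]
target = [1, 3, 5, 2, 4, 6]        # reds to the left, greens to the right

final, reversals, flips = sort_by_reversals(start, target)
print(final)
print(reversals)
print(dict(flips))
```

Even in this small example the reversed intervals overlap and some genes are inverted repeatedly while others are never touched, which is the sort of per-gene tally reported in Additional File 3. An optimal signed reversal scenario may of course use different steps.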
Hence I consider the PGD model unparsimonious. It requires a complex series of events to happen in an orchestrated manner, with no obvious mechanism, to produce the synteny relationship between A. gossypii and S. cerevisiae that we now see. In contrast, the WGD model requires only one unlikely event (the WGD itself) followed by a lot of very simple events (gene deletions, and reciprocal translocations at random genomic sites).
In Discussion paragraph 4 (page 8) Martin et al. admit that it could be argued (as I do) that under the PGD model the double-synteny patterns "would ultimately arise as a consequence of a highly fortuitous and unlikely sequence of rearrangements". I do not understand the basis on which they then reject this argument as "frequentist" and fallacious, but they do say that "valid comparisons need to reconstruct the ancestral gene arrangement first". I show in Comment 2 below that they have failed to reconstruct the ancestral arrangement correctly.
Author's response
Dr. Wolfe points out appropriately that under PGD and during rearrangement of A. gossypii DS block 7 (encompassing A. gossypii genes AAL118 to AAL087) a series of nested inversions centered on gene AAL107 (gene 12 of Additional File 3) occurs. For accuracy, we would like to mention that Additional Files 2 and 3 describe several different rearrangement operations besides inversions (reversals). In fact, the optimal transformation of DS block 7 into S. cerevisiae homolog regions involves 6 translocations and one fission, besides the first 12 reversals, and these operations are not all centered on AAL107. Dr. Wolfe then critiques the rearrangement patterns of nested inversions as being unnatural, stating there is no known evolutionary mechanism that can produce them. This is incorrect. Nested inversions are common, have functional roles in programmed gene rearrangement processes in bacteria [49], and occur even in large tracts, for example around bacterial origins of replication [50] or plant chromosomal arms [51]. The existence of 'hot spots' of rearrangements with one end in 'fragile breakage' sites and another in random locations may be common in mammalian genomes and the result of long regulatory regions and inhomogeneity of gene distribution [46, 52]. Besides mammals [26], the existence of fragile sites has been confirmed in other genomes, including Drosophila [53], and our knowledge of why recombination hot spots occur preferentially in certain genomic regions is expanding and suggests common mechanisms for their formation and function in eukaryotes [54]. In particular, hot spots in yeast are not distributed randomly and can be associated with transcriptionally active regions, nucleosome excluding sequences, and GC rich chromosomal regions [54]. Consequently, inversion "whirlpools" captured by the rearrangement algorithm at global scale may represent natural phenomena in yeast. Dr. Wolfe also questions the unlikely nature of having inversion whirlpools occurring in each DS block that could collide with each other to form triple- or quadruple-interleaving patterns. This concern is again unfounded. As we state in Methods, when restricting rearrangement analysis to chromosomal segments, patterns of shuffling (the whirlpools) may appear evident, but such patterns vanish or distribute as the analysis is extended to the entire chromosome. We thank Dr. Wolfe for bringing this subject to discussion because it prompts us to clarify the importance of genomic context. It is unnatural to perform rearrangement operations in a DS block, because the algorithm fails to use other genomic regions in more parsimonious rearrangement scenarios. Consequently, DS block-specific whirlpools are generally replaced by more global rearrangement patterns as one encompasses more genomic sequence around the DS block (culminating with the chromosome containing the block and then the entire genome). To illustrate this point, we analyzed DS block 7 in conjunction with neighboring blocks 6 and 8 (encompassing A. gossypii genes AAL127 to AAL031), together or in isolation (Fig. 7). Original whirlpool patterns of individual DS blocks disappear and are replaced by more global rearrangement processes that involve genes in all three blocks. Rearrangement patterns for genes in A. gossypii chromosome I become even more global (Fig. 8). In conclusion, whirlpools do not collide with each other and the PGD scenario, very much as the WGD model, involves many simple events that are plausible. In fact, two or more inversion steps can generate one or two DS blocks in regions that are terminal to the rearranged fragments (Fig. 6).
Comment 2. The ancestral gene order inferred by Martin et al. does not make sense
In Figures 3, 4, 5, Martin et al. consider the evolution of gene orders in S. cerevisiae, A. gossypii and K. waltii from their common ancestor. To do this, they tried to infer the gene order that existed in the common ancestor, but they did not do this correctly. Consider their Figure 5, which shows the same region that I have shown in Additional File 1. In this region the A. gossypii and K. waltii gene orders are essentially identical (see Additional File 1); they are colinear (the only differences between them are the absence of a homolog of gene YOR264W in A. gossypii, and the absence of a homolog of YOR258W in K. waltii). Therefore, parsimony says that the common ancestor of A. gossypii and K. waltii had virtually the same gene order as each of these species has today. Indeed, the same gene order is also seen in K. lactis (Additional File 1). But Martin et al's Figure 5 shows an ancestor (chromosomal segment alpha) that is 15 steps different from each of K. waltii and A. gossypii under a PGD model. Their scenario implies that an identical series of 15 rearrangement steps happened convergently in both K. waltii and A. gossypii after they diverged from their common ancestor. This scenario is so unrealistic that I cannot believe that any of the results they derive from this ancestral order are accurate.
Author's response
The argument of the reviewer is again faulty and, as discussed above, relates to a disregard for genomic context. The ancestral arrangements of genes belonging to DS blocks described in Figures 4 and 5 (with segments labeled with Greek letters) were reconstructed from an analysis of the entire A. gossypii chromosome I (Fig. 3) and not from analyses of only genes specific to the DS blocks in question. The ancestral gene arrangements for individual DS blocks of A. gossypii chromosome I take into consideration rearrangement operations that are global. Consequently, the steps that better explain evolution of gene order in DS block 7 based on genes in chromosome I (Fig. 5) are quite different when only using information in DS block 7-specific genes, explaining why the ancestor is 15 steps away from both K. waltii and A. gossypii and not fewer. No convergent evolutionary processes are needed to explain the results because optimal rearrangement scenarios occur more naturally at chromosomal and not DS block levels.
Comment 3. Other evidence supports the WGD hypothesis and is incompatible with PGD
Let me mention briefly two other pieces of evidence, apart from the interleaving pattern, that support the WGD hypothesis and show how they are incompatible with the PGD model.
(i) In the S. cerevisiae genome, 551 pairs of duplicated genes are arranged in a pattern where a series of genes in one region of the genome has a series of paralogs in the same order in another region [1, 38]. Under Martin et al's PGD model, these 551 pairs originated as independent tandem duplications. Therefore, if we consider a large duplicated region such as "Block 34" which contains 13 duplicated pairs [1] in the same order on chromosomes VII and XVI, the PGD model requires that these identical orders are the result of convergent evolution. The PGD model requires that after tandem duplication of each of the 13 genes, the 13 extra copies were moved to new locations and placed in the same order (and relative orientations) as their progenitors. How can that happen? Just by chance? It is extremely improbable, and the same pattern is seen in all the duplicated blocks, not just Block 34.
(ii) The conserved orientation of the blocks relative to the centromeres [1] indicates that they were formed from larger duplicated blocks (i.e. duplicated whole chromosomes) by reciprocal translocation. Under the PGD model, conservation of block orientation could only happen if a whole chromosome is subjected to a huge number of nested inversion events of the type described above, centered on one point (the centromere), so that the genes first become sorted into two monotonic subsets of the original order, and then a series of reciprocal translocations occurs. Such a model is not credible. In the last section of Discussion (titled "Gene orientation of duplicate gene components") Martin et al. discuss this issue but consider the orientations of individual genes rather than the orientations of blocks of adjacent genes. The orientation of individual genes is more liable to be affected by species-specific inversion of small segments of DNA after the WGD (e.g. the genes that are labeled in parentheses in Figure 2 of ref. [11]), and so should be less well preserved after WGD than the overall block orientation. Martin et al. state (in the same Section) that for S. cerevisiae homologs of genes on A. gossypii chromosome I, "almost half" of them show non-conserved transcriptional orientation with respect to the centromere. The relatively low level of orientation conservation of homologs of genes on A. gossypii chromosome I is, I believe, a special situation resulting from an unusual series of rearrangements of genes near the MAT locus. I find that when the whole A. gossypii genome is considered (520 syntenic-homolog duplicate gene pairs from Table S3 of ref. [3]), 710 of the 1040 S. cerevisiae genes (68%) are transcribed in the same orientation relative to the centromere as in A. gossypii (Additional File 4). This is significantly more than the 50% expected under the PGD model (P = 10^-17 by Fisher test), so falsifying a model of independent gene duplications and relocations.
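For readers who want to check a test of this kind, here is a minimal sketch of a two-sided Fisher exact test using only the Python standard library. The report does not say how the test was set up, so the 2×2 table used here (observed 710 conserved versus 330 non-conserved, against an even 520:520 split) is an assumption, and the function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed one. Exact
    integer arithmetic is used until the final division.
    """
    row1 = a + b
    col1 = a + c
    total = a + b + c + d
    col2 = total - col1

    def weight(k):
        # Number of ways to obtain k in the top-left cell given the margins.
        return comb(col1, k) * comb(col2, row1 - k)

    lowest = max(0, row1 - col2)
    highest = min(row1, col1)
    observed = weight(a)
    extreme = sum(w for w in map(weight, range(lowest, highest + 1))
                  if w <= observed)
    return extreme / comb(total, row1)


# Assumed layout: row 1 = observed counts (same / different orientation),
# row 2 = the 50:50 split expected if orientation were random.
p_value = fisher_exact_two_sided(710, 330, 520, 520)
print(f"P = {p_value:.2e}")
```

The result can be compared with the P value quoted above. A different table layout, or a one-sided test, would give a different figure.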
Author's response
Dr. Wolfe brings to discussion two additional lines of evidence that are often used in favor of the WGD hypothesis, both of which are related to the orientation (relative to the centromere) of duplicated A. gossypii-homologous genes that exist in S. cerevisiae (the 551 pairs listed in Additional File 4).
One argument, originally presented in the comparison between the S. cerevisiae and K. lactis genomes [1], is that conservation of order and orientation of duplicated genes in S. cerevisiae (defining blocks when using three pre-defined criteria) can only be explained by tetraploidy and reciprocal translocations under a WGD scenario. Under this argument, alternative scenarios (e.g. the PGD hypothesis) will require that gene order and orientation of duplicates, exemplified in genes of block 34 [1], result from convergent evolution. However, definition of blocks, which can be arbitrary, is unnecessary under our PGD model. Syntenic gene-interleaving relationships under the PGD model are the result of genomic rearrangements that occur more parsimoniously and in concert within the larger genomic context. These rearrangements are difficult to visualize without the help of rearrangement algorithms, sometimes involve local nested inversions (Fig. 6), but more generally encompass a gradual rearrangement of genomic segments of different lengths within and between chromosomes. These processes do not need nested centromere-centered inversions followed by sorting in monotonic subsets, as the reviewer contends. These trivial explanations are misleading and do not appropriately portray the complex rearrangement process. When considering gene duplications under PGD (arising by independent tandem or segmental duplications), gene orientation should not be made relative to other genes in a block because gene-interleaving patterns do not delimit the rearrangement process in a chromosome. Instead, gene orientation should be studied on a gene-by-gene basis relative to the centromere and to corresponding homologues.
The other argument is statistical. According to the WGD hypothesis, duplicated genes in S. cerevisiae should preserve their orientation with respect to the centromere. Dr. Wolfe agrees that the low level of conservation of gene orientation of homologs of genes on A. gossypii chromosome I (only 46% of duplicated pairs preserve their orientation) does not support the WGD model, but believes this constitutes a special situation arising from the existence of the MAT locus in chromosome I. He then states that when considering all 551 duplicate pairs of the S. cerevisiae genome, 68% are transcribed in the same orientation relative to the centromere (Additional File 4). Because this is significantly more than the 50% expected under a pure rearrangement model (not the PGD model as Dr. Wolfe contends) (P = 10^-17 by Fisher test), the reviewer correctly states this falsifies a model of independent gene duplications and relocations. However, the argument is misleading because the test relates only to the validity of the WGD scenario and should involve the null hypothesis of no orientation change in gene duplicates. What the reviewer fails to mention is that because 32% of genes do change orientation and given a sample size that will make expectations highly significant, the null hypothesis of no change will be rejected with a more significant P value (P = 10^-113 by Fisher test), therefore rejecting the WGD model.
Reviewer's report 2
Austin L Hughes, Department of Biological Sciences, University of South Carolina, Columbia, SC 29208, USA (nominated by Eugene Koonin, NCBI, NLM, NIH, Bethesda, MD 20894, USA). 29 August 2007
Ohno [37] was the original champion of the hypothesis that whole genome duplication (WGD) has played an important role in evolution. Ohno [37] had some very odd ideas about the mechanism of gene expression that led him to believe that tandem duplication could never lead to anything productive. Today we know that Ohno was wrong about tandem duplication, which in fact is a ubiquitous feature of genomes. Moreover, through functional differentiation of duplicates, tandem duplication is clearly a major source – perhaps the major source – of evolutionary novelty [55]. Polyploid organisms are known, but it is unclear that polyploidization has ever given rise to any important novelty. Thus, the recent obsession with alleged cases of WGD on the part of evolutionary genomicists is puzzling to say the least.
As my colleagues and I have discussed extensively elsewhere, a significant problem with virtually all published claims of WGD is that in these studies the authors show evidence of patterns consistent with WGD but they do not conduct critical hypothesis tests [56]. To my way of thinking, what is distinctive about natural science (as opposed to other forms of human intellectual effort) is the use of a specific method (the "hypothetico-deductive method") that decides among competing hypotheses by formulating falsifiable predictions of each hypothesis [57]. In science, the null or starting hypothesis must always be the hypothesis of no effect. Thus, we must always approach the study of genomic evolution with the null hypothesis that no WGD has taken place. Only if the observed results are highly unlikely under the null hypothesis should we (tentatively) accept the alternative hypothesis of WGD.
The paper of Martin et al. is unusual in that it applies a rigorous hypothesis-testing framework to WGD in the case of the yeast S. cerevisiae. The approach is based on the assumption that the more parsimonious evolutionary scenario is more likely. This seems reasonable, particularly since WGD advocates typically make the claim that WGD is more parsimonious than hypotheses invoking independent events of segmental duplication because the latter duplicates many genes with one event. However, Martin et al. show that in fact a hypothesis of multiple segmental duplications is more parsimonious than that of WGD, given biologically reasonable assumptions.
Of course, a limitation of any such study is that only certain scenarios are compared, since the number of theoretically possible scenarios is very large. WGD advocates will perhaps claim that there may exist an unexplored scenario involving WGD that would be more parsimonious than the scenarios without WGD examined here. But the burden of proof is on the WGD advocates. Unless they can present actual proof that WGD is more parsimonious, the null hypothesis stands. The paper of Martin et al. is important because it provides a solid quantitative demonstration that WGD is not a plausible hypothesis in a species for which the hypothesis of WGD has rarely if ever been questioned.
Author's response
We thank Dr. Hughes for his comments and for placing our work into the right context, the Popperian hypothesis-testing framework. We apply this framework to an important question related to evolutionary change. Is genomic change gradual or saltatory? The question applies in our case to the yeast lineage and to levels of chromosomal duplication, but the theme is recurrent in biology.
We agree that not all possible scenarios are compared and that this limits our study. For example, our analysis proceeds by duplicating, deleting and then rearranging genes, in that order. However, there could be other more parsimonious explanations that would consider steps of duplication, deletion and rearrangement in any possible sequence. We hope a powerful algorithm and a formal distance measure can be devised in the near future that can handle this complicated and computationally intense problem.
Reviewer's report 3
Mikhail S. Gelfand, Institute for Information Transmission Problems, Moscow, Russian Federation.
The reviewer provided no comments for publication
Reviewer's report 4
Mark Gerstein, Yale University, New Haven, CT 06520, USA.
The reviewer provided no comments for publication