Positive selection on the nonhomologous end-joining factor Cernunnos-XLF in the human lineage

Background Cernunnos-XLF is a nonhomologous end-joining factor that is mutated in patients with a rare immunodeficiency with microcephaly. Several other microcephaly-associated genes such as ASPM and microcephalin experienced recent adaptive evolution apparently linked to brain size expansion in humans. In this study we investigated whether Cernunnos-XLF experienced similar positive selection during human evolution. Results We obtained or reconstructed full-length coding sequences of chimpanzee, rhesus macaque, canine, and bovine Cernunnos-XLF orthologs from sequence databases and sequence trace archives. Comparison of coding sequences revealed an excess of nonsynonymous substitutions consistent with positive selection on Cernunnos-XLF in the human lineage. The hotspots of adaptive evolution are concentrated around a specific structural domain, whose analogue in the structurally similar XRCC4 protein is involved in binding of another nonhomologous end-joining factor, DNA ligase IV. Conclusion Cernunnos-XLF is a microcephaly-associated locus newly identified to be under adaptive evolution in humans, and possibly played a role in human brain expansion. We speculate that Cernunnos-XLF may have contributed to the increased number of brain cells in humans by efficient double strand break repair, which helps to prevent frequent apoptosis of neuronal progenitors and aids mitotic cell cycle progression. Reviewers This article was reviewed by Chris Ponting and Richard Emes (nominated by Chris Ponting), Kateryna Makova, Gáspár Jékely and Eugene V. Koonin.


Background
Double-strand breaks (DSBs) are highly cytotoxic DNA lesions caused by ionizing radiation, spontaneous chromosomal breaks, or the activity of cellular endonucleases, or arising during replication of other DNA lesions such as single-strand breaks. If unrepaired, DSBs efficiently trigger arrest of cell cycle progression and cell death by apoptosis [1]. In response to this danger, cells have developed mechanisms that repair DSBs. In eukaryotic cells, there are two major groups of DSB repair pathways [2]: homologous recombination (HR) and nonhomologous end-joining (NHEJ). In contrast to HR, NHEJ does not require a highly identical undamaged partner DNA strand to repair DSBs and, after some processing, can ligate virtually any two DNA ends. This makes NHEJ a very efficient, yet error-prone, DSB repair mechanism.
The lack of mutations in known NHEJ components in a patient with characteristic phenotypic effects of defective NHEJ led to the conclusion that there must be at least one undiscovered component of the NHEJ pathway [3]. The search for this additional element led to the recent discovery of a new NHEJ factor called Cernunnos-XLF [4,5]. Homozygous Cernunnos-XLF mutations are manifested by autosomal recessive immunodeficiency associated with mental retardation and microcephaly [4]. This 2q35 gene encodes a protein that interacts with the core NHEJ ligation complex composed of DNA ligase IV and XRCC4 [5,6]. The Cernunnos-XLF protein shows similarity to XRCC4 [5] and is homologous to the Nej1 NHEJ factor from yeast [6]. The locus seems to be present in all animals and most fungi, but not in plants.
The presence of microcephaly in patients prompted us to look closely at the evolution of Cernunnos-XLF in primates, because several other genes linked to microcephaly-related disorders and brain size are under positive selection in hominoid primates and humans [7-15]. By comparing Cernunnos-XLF genes in five different mammalian species, we discovered strong evidence for adaptive evolution of this locus in the human lineage. Therefore, Cernunnos-XLF can be considered yet another strongly selected factor, potentially contributing to increased skull and brain size in humans.

Conservation of Cernunnos-XLF in mammals
Human (CAI99410), cow (XP_586059), and dog (XP_848099) Cernunnos-XLF proteins and the corresponding coding sequences (CDS) were extracted from GenBank. The macaque and chimpanzee copies were assembled from the GenBank trace archive and genome assembly, respectively (see Methods). Except for dog, all the genes appear to encode 299-aa proteins; the predicted dog coding sequence contains an additional domain at the 5' end. Since this domain is not conserved in other species, it very likely represents an error in automated gene annotation, and we shortened the dog ortholog to the 299-aa segment that is homologous to the remaining mammalian proteins.
Comparison of individual mammalian copies revealed a variable rate of amino acid replacements along Cernunnos-XLF (Fig. 1). While synonymous changes are dispersed relatively uniformly, nonsynonymous changes are clustered in several domains (Fig. 1B,C). The most variable region is the C-terminal part between aa 212 and 281. Another, less pronounced, variable region lies at aa positions 87-99. This profile is similar to the protein conservation in vertebrates [5]. There are five nonsynonymous substitutions between the human and chimpanzee genes (four of them seem to be human-specific) and no synonymous changes. Interestingly, these five changes are unevenly distributed along the protein. Position 124, which changed in the human lineage, and the chimpanzee substitution at aa 127 are located within a conserved linker between the N-terminal globular head domain and the remaining coiled-coil part (Fig. 1D). The three other positions, 216, 223, and 235, changed in humans and cluster within the predicted end of the coiled-coil C-terminal domain [Figure 1SA in ref. 5].

Adaptive evolution of Cernunnos-XLF genes in the human lineage
The analysis of individual branches in the phylogenetic tree (Fig. 2) revealed signs of negative selection (Ka/Ks < 1) on most branches, but the presence of five nonsynonymous and the lack of synonymous substitutions indicate possible positive Darwinian selection in humans and chimpanzees. Indeed, likelihood ratio tests confirm that the human and possibly also chimpanzee lineages evolved under different Ka/Ks rates compared to the rest of the tree (significant; Fig. 2). These results remain robust when each individual change is discarded in turn (not shown). When the human and chimpanzee lineages were combined into one group, the resulting joint Ka/Ks ratio was above 1 (borderline significant), suggesting positive selection. Therefore, we can conclude that the Cernunnos-XLF locus evolved adaptively under positive selection in humans. Whether chimpanzees also experienced positive selection is unclear, but the rate of protein evolution seems to be lower compared to humans. Finally, we were also interested in how Cernunnos-XLF evolves in the recent human population. HapMap data indicate the lack of recent positive selection on Cernunnos-XLF [16]. However, given the presence of two nonsynonymous and no synonymous polymorphic positions in the human population [17] we cannot rule out that some positive selection still operates on this locus.
As mentioned above, the amino acid replacements in the human and chimpanzee lineages are clustered and, as a consequence, adaptive evolution in Cernunnos-XLF appears to be concentrated in very specific regions. One hotspot is located in the region between the predicted N-terminal globular head domain and the long coiled-coil part (Fig. 1D). The second rapidly evolving region is located at the putative C-terminal end of the coiled-coil domain (not shown). The exact structure and function of these regions in Cernunnos-XLF are unknown, but in the case of the structurally similar XRCC4 protein, the head domain seems to interact with DNA and/or proteins while the coiled-coil region binds the linker connecting two BRCT repeats of ligase IV [18-20]. It is tempting to speculate that the adaptive evolution around the coiled-coil region is related to a putative interaction of this region with ligase IV and, by extension, to the proposed Cernunnos-XLF function: promoting the DNA ligation function of the XRCC4-ligase IV complex [4,5].

Cernunnos-XLF - another factor in human brain expansion?
Genome-wide comparisons have revealed that a significant number of protein-coding genes undergo adaptive evolution in humans [17,21,22]. Notably, the dramatic increase in brain size and complexity during human evolution was accompanied by accelerated, often positive, selection on several genes involved in regulation of brain size and the nervous system in general [7-15]. These genes include two primary microcephaly loci under strong positive selection in humans, ASPM (abnormal spindle-like) and microcephalin/MCPH1, and possibly also other microcephaly-associated loci with an increased Ka/Ks rate in primates, PAFAH1B1 (alpha subunit of platelet-activating factor acetylhydrolase 1B) and SHH (sonic hedgehog), although the latter two may be merely under relaxed constraints [12]. Adaptive evolution of Cernunnos-XLF thus fits the general pattern of simultaneous selection acting upon several microcephaly-associated genes in humans.

Figure 1. Structure and evolution of Cernunnos-XLF proteins in mammals. A. [...] [5]. The N-terminal globular head domain is marked in yellow, the remaining coiled-coil structure in blue, and the putative nuclear localization signal in red. The second scheme shows the positions of the coding exons (2-8) in the CDS (odd exons are black, even ones white). B. CDS substitutions during evolution. The expected ancestral coding sequence was estimated using maximum likelihood codon reconstruction implemented in PAML. Nonsynonymous/synonymous (ω = Ka/Ks) ratios were free to vary in all branches. Positions marked in green correspond to synonymous changes in a given lineage. Bars representing nonsynonymous changes are black if conservative, red if nonconservative (see Methods). "MCH-CH" corresponds to the ancestral lineage from the common ancestor of macaque, chimpanzee, and human (MCH) to the common ancestor of human and chimpanzee (CH); "anc-MCH" represents the lineage from the common ancestor of all taxa to MCH (see Fig. 2). C. Conservation at the nucleotide level in primates, and at the protein level in primates and mammals. The Y axis corresponds to the proportion of conserved (identical) positions in the CDS (60-bp overlapping window, 6-bp steps) and in the protein alignment (20-aa window, 2-aa steps). D. Predicted structure of the Cernunnos-XLF protein. The structure for the first 185 aa was predicted by structural alignment to XRCC4 (see Methods). The red parts highlight positions that differ between human and chimpanzee: aa 124 changed in the human lineage, aa 127 in the chimpanzee branch. Three other positions changed in humans: aa 216, 223, and 235 (see Fig. 2B).

How can the Cernunnos-XLF function in nonhomologous end-joining contribute to our brain size? It seems natural to assume that brain expansion should reflect an increased number of cells, and thus cell divisions during brain neurogenesis [15,23]. A direct extrapolation of this assumption is that the increased brain size could be achieved by an increased efficiency of factors involved in cell cycle progression, mitosis, or by preventing apoptosis. Consistent with this hypothesis are cellular functions of two strongly selected primary microcephaly genes ASPM and microcephalin. ASPM is a mitotic spindle protein that may participate in regulation of cell division during neurogenesis [24]. Microcephalin encodes a DNA damage response protein regulating the BRCA1-CHK1 DNA damage response pathway [25,26]. This suggests that microcephalin-linked primary microcephaly is related to cellular checkpoint defects causing increased cellular apoptosis in neural lineages [26]. Therefore, effective repair of DNA damage at cellular checkpoints is a prerequisite for efficient cell proliferation during neurogenesis, and adaptive evolution of microcephalin may reflect this requirement.
It appears that both functional homologous recombination and nonhomologous end-joining (NHEJ) are essential during nervous system development. Inactivation of some NHEJ components, including ligase IV and XRCC4, in mouse causes apoptosis of post-mitotic neurons [27]. As a consequence, positive selection on Cernunnos-XLF may be related to the essential role of this factor in efficient DNA damage repair by NHEJ and, in turn, in preventing apoptosis in neuronal progenitors.
In summary, adaptive evolution of Cernunnos-XLF in humans fits into the broader scheme of microcephaly gene evolution in primates. On the one hand, each positively selected gene operates at a different level: the spindle protein ASPM at the level of cell division, microcephalin by participating in the DNA damage response at cellular checkpoints, and Cernunnos-XLF by direct involvement in NHEJ repair of damaged DNA. On the other hand, the phenotypic effect is similar: an increased number of neurons in the developing brain, achieved either by efficient cell proliferation (presumably in the case of ASPM) or by prevention of apoptosis (microcephalin, Cernunnos-XLF).
While association of Cernunnos-XLF selection with increased brain size is attractive in the context of simultaneous adaptation of several brain size determinants, there are also other possible explanations. Cernunnos-XLF deficiency is manifested by an increased susceptibility to infections due to immunodeficiency caused by impaired renewal of T and B cells [4]. Delayed reproduction in humans may require a highly efficient immune system that is able to fight infections during the prolonged prereproductive period of life. Another possibility is increased pressure on the general tumor suppression function of DSB repair in humans due to differences in reproductive cycle, changes in diet, lifestyle and/or exposure to mutagenic agents. Given its essential role in NHEJ, Cernunnos-XLF deficiencies may be associated with an increased cancer risk [4,5]. Indeed, the tumor suppressor BRCA1 is another well-studied DNA repair factor under positive selection in humans [28,29]. Moreover, tumor suppressor genes in general seem to evolve under a higher Ka/Ks rate in humans [22]. While several possible explanations are possible, it is clear that the complete elucidation of Cernunnos-XLF evolution in humans will require better understanding of Cernunnos-XLF function and its impact on various cellular processes.

Figure 2. Phylogenetic tree and Ka/Ks ratio for Cernunnos-XLF coding sequences. A. For the human and chimpanzee branches we could not calculate the Ka/Ks ratio and instead we list the number of synonymous (S) and nonsynonymous (N) changes in square brackets. The boxes list selected tested hypotheses. The Ka/Ks rate is designated as ω_H for the human lineage, ω_C for the chimpanzee lineage, and ω_0 for all other lineages. A single asterisk indicates P < 5%, χ²₁ = 3.84; a double asterisk indicates P < 1%, χ²₁ = 6.63. The left box tests the hypotheses that the Ka/Ks ratio for the human lineage is the same as for the rest of the tree (rejected at P < 5%), and that both human and chimpanzee lineages have the same Ka/Ks ratio shared with other branches (rejected at P < 1%). The right box shows tests of Ka/Ks ≤ 1 for the human lineage (not significant) and for both human and chimpanzee lineages (rejected at P < 5%). B. Amino acid residues for five critical positions changed between human and chimpanzee. The tree also includes orthologous positions from the orangutan Cernunnos-XLF protein. The figure shows conservation of the critical positions in macaque and orangutan, which represents the most likely ancestral state. The four human changes and one chimpanzee change indicated in the figure represent the most parsimonious scenario of Cernunnos-XLF evolution.

Conclusion
Cernunnos-XLF is a new component of the nonhomologous end-joining machinery mutated in human immunodeficiency with microcephaly [4,5]. Using newly obtained coding sequences in chimpanzee and rhesus macaque as well as dog and cow orthologs, we reconstructed the evolutionary history of Cernunnos-XLF in mammals. We found that Cernunnos-XLF is under positive selection in the human lineage. Hotspots of adaptive evolution are concentrated around the putative DNA ligase IV binding domain. After ASPM and microcephalin, Cernunnos-XLF is the third identified microcephaly-associated locus under strong adaptive evolution in humans and possibly played a role in the expansion of brain size in humans. We speculate that Cernunnos-XLF may contribute to the increased number of brain cells in humans by efficient double-strand break repair, which helps to prevent frequent apoptosis of neuronal progenitors and aids mitotic cell cycle progression.

Reconstruction of the macaque and chimpanzee Cernunnos-XLF coding sequence
We used the human coding sequence (CDS) as a probe for discontiguous MegaBLAST [30] searches against the macaque whole genome shotgun trace archive (Macaca mulatta WGS). For all highly similar hits in the trace archive, the full-length trace sequences were aligned using BLAT [31] to the human Cernunnos-XLF gene, including introns, to ensure proper localization. The consensus sequence obtained from the alignment of individual trace sequences represents the expected macaque Cernunnos-XLF coding sequence. The predicted macaque CDS was covered by two or more sequences from the trace archive along its complete length (Fig. 3). The chimpanzee Cernunnos-XLF gene was obtained from a BLAT [31] alignment of the human copy with the chimpanzee genome assembly, and the coding sequence homologous to the human CDS was extracted.
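The consensus step can be illustrated with a short sketch. The text does not state how the consensus was derived, so the column-wise majority rule, the gap characters, and the function name `consensus` below are assumptions made for illustration only; the reads are toy sequences, already padded to a common coordinate system. Positions covered by fewer than two reads, or with tied base counts, are reported as N.

```python
from collections import Counter

# Characters treated as "no coverage" in a padded read (assumed convention).
GAP_CHARS = "-. "

def consensus(aligned_reads, min_coverage=2):
    """Column-wise majority consensus of reads padded to common coordinates."""
    length = max(len(read) for read in aligned_reads)
    calls = []
    for i in range(length):
        # Collect the bases of all reads that cover this column.
        column = [read[i].upper() for read in aligned_reads
                  if i < len(read) and read[i] not in GAP_CHARS]
        if len(column) < min_coverage:
            calls.append("N")  # not enough independent reads
            continue
        ranked = Counter(column).most_common(2)
        tie = len(ranked) == 2 and ranked[0][1] == ranked[1][1]
        calls.append("N" if tie else ranked[0][0])
    return "".join(calls)

# Toy reads (hypothetical), aligned to the same reference coordinates.
reads = ["ATGCTT---", "ATGCTCGAA", "---CTTGAA"]
print(consensus(reads))
```

Requiring at least two reads per position mirrors the coverage stated for the macaque CDS; a real trace-based reconstruction would also weigh base quality scores.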

Sequence analysis
Mammalian Cernunnos-XLF protein sequences were aligned using Dialign2 [32] and the alignment was visualized in GeneDoc [33]. Synonymous and nonsynonymous substitutions were obtained using SNAP [34]. The Gonnet PAM250 matrix [35] was applied to classify substitutions as conservative or nonconservative. We considered changes to be conservative if the score was > 0.5. We used ancestral sequence reconstruction and the free-ratio codon model in PAML v. 3.13 [36] to reconstruct the phylogeny and estimate the placement of substitutions along individual branches of the phylogenetic tree. The phylogenetic tree was drawn in TREEVIEW [37].
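The distinction between synonymous and nonsynonymous changes can be made concrete with a minimal sketch. It is not the SNAP algorithm, which also estimates the numbers of synonymous and nonsynonymous sites; it only classifies each differing codon in two aligned, in-frame coding sequences using the standard genetic code. The function name `classify_codon_changes` and the toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def classify_codon_changes(cds_a, cds_b):
    """Return (synonymous, nonsynonymous) lists of 1-based codon positions."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and equal in length")
    synonymous, nonsynonymous = [], []
    for start in range(0, len(cds_a), 3):
        codon_a = cds_a[start:start + 3].upper()
        codon_b = cds_b[start:start + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # skip codons with gaps or ambiguous bases
        position = start // 3 + 1
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous.append(position)
        else:
            nonsynonymous.append(position)
    return synonymous, nonsynonymous

# Toy example: codon 2 changes CTT->CTC (Leu->Leu), codon 3 GAA->GAT (Glu->Asp).
print(classify_codon_changes("ATGCTTGAA", "ATGCTCGAT"))
```

Applied to a human-chimpanzee CDS alignment, a count of five nonsynonymous and zero synonymous codons would correspond to the pattern reported in the Results.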

Detection of positive selection
Positive selection along individual branches was detected by likelihood ratio tests as described previously [38]. First, we compared the log-likelihood values for one-ratio and two-ratio models to detect possibly different Ka/Ks ratios in individual lineages. To test whether these lineages evolve with Ka/Ks significantly greater than 1, we compared two-ratio models with the Ka/Ks ratio fixed at 1 and with a free (estimated) Ka/Ks ratio for the lineages under consideration.
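The arithmetic of such a likelihood ratio test can be sketched as follows. This is an illustration rather than the PAML workflow used in the study: the log-likelihood values are placeholders invented for the example, and the function names are hypothetical. Twice the log-likelihood difference between the nested models is compared with a χ² distribution with one degree of freedom, whose upper tail equals erfc(sqrt(x/2)).

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))

def likelihood_ratio_test(lnl_null, lnl_alt):
    """Return (2 * delta lnL, P value) for nested models differing by one parameter."""
    statistic = 2.0 * (lnl_alt - lnl_null)
    return statistic, chi2_sf_1df(statistic)

# Placeholder log-likelihoods (not values from the study):
# null = one-ratio model, alternative = two-ratio model.
statistic, p_value = likelihood_ratio_test(lnl_null=-2105.30, lnl_alt=-2102.80)
print(f"2*dlnL = {statistic:.2f}, P = {p_value:.4f}")

# The conventional critical values correspond to P = 5% and P = 1%.
print(round(chi2_sf_1df(3.84), 3), round(chi2_sf_1df(6.63), 3))
```

The same function serves both comparisons described above; only the pair of models supplying the two log-likelihoods changes.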

Prediction of the protein structure
Structural alignment of the human Cernunnos-XLF protein to the DNA repair protein XRCC4 (1fu1) was performed using the 3D-PSSM [39] and SWISS-MODEL [40] servers, analogously to ref. [5]. The predicted structure of the human Cernunnos-XLF protein was visualized in PyMOL [41].

Figure 3. Reconstruction of rhesus macaque coding sequences.

Reviewers' comments
There is much interest in identifying nucleotide substitutions that might underlie human-specific biology. Pavlicek & Jurka have undertaken an evolutionary analysis of Cernunnos-XLF and propose that this gene has experienced positive selection of one or more nonsynonymous nucleotide substitutions. As this gene is mutated in individuals with microcephaly, the authors propose a causative link between brain enlargement and Cernunnos-XLF adaptive evolution.
Pavlicek & Jurka base their proposal of adaptive evolution in chimpanzee and human Cernunnos-XLF upon 5 inferred nucleotide substitutions, all of which are proposed to have been nonsynonymous. The major issue in the authors' conclusion of positive selection is whether the extremely short branch lengths of chimpanzee and human sequences affect predictions. For example, approximately 20% of chimpanzee/human divergence is due to substitutions that are not fixed. If any one of the 5 substitutions were discarded, would the significance of these findings remain? Similarly, if the chimpanzee sequence were to be discarded would the predictions still hold?
Author response: We cannot discard the chimpanzee sequence, because this sequence is crucial in defining human-specific changes. When we discarded the chimpanzee copy, the Ka/Ks ratio was not significantly different from the rest of the tree. However, in this test we did not analyze Ka/Ks in the human lineage, but in the long lineage from the common ancestor of human and macaque to modern humans. Figure 2 (now 2A) shows that most of the time this lineage was under negative selection.
It is probable that the issue of resolving power at the branch tips could be resolved by additional information, particularly additional ape orthologous sequences.
Author response: We agree, but we could not reconstruct other full-length primate orthologs, as sequence traces are incomplete. The most complete is the orangutan Cernunnos-XLF copy, which lacks only one exon. However, since the only differences between human and chimpanzee are five nonsynonymous changes, we decided to concentrate on these five positions (newly added Fig. 2B).

We would like to stress that all five changes between human and chimpanzee would be nonsynonymous no matter what method we use (supposing that the genomic sequences are correct). The only question is in which lineage they occurred. This additional orangutan data gave us confidence that reconstruction of the human-chimpanzee ancestral sequence was correct and that there is a high probability that four changes occurred in the human lineage and only one in chimpanzees.
Additionally, haplotype analysis of Cernunnos-XLF would be required to investigate whether positive selection has been ongoing in more recent times. Results from these approaches would have been appropriate to bolster the authors' proposal.
Author response: [...] (Voight et al. 2006). The application of haplotype analysis for detection of selection over longer periods is limited. Indeed, in many cases [...] "[...] [16]. However, given the presence of two nonsynonymous and no synonymous polymorphic positions in the human population [17] we cannot rule out that some positive selection still operates on this locus."
Also, it is unclear why the authors have not taken advantage of the mouse and rat genome sequences, or pig and opossum ESTs, or even the unassembled sequences from rabbit, armadillo and elephant, which are provided by the UCSC genome browser site. Would consideration of these sequences provide evidence to support, or otherwise, the authors' prediction?
I would also advise a greater degree of scepticism in the manuscript. Causal relationships between microcephaly genes, their proposed adaptive evolution and brain enlargement cannot yet be accepted without the consideration of other possible explanations. For example, as with other microcephaly genes, Cernunnos-XLF is expressed widely and is not brain-specific (indeed its expression in the brain is not obviously elevated relative to other tissues). The chimpanzee brain appears not to have enlarged greatly since the last common ancestor with humans, and the three-fold enlargement of our brains only occurred in the last 3 million years. Timing initiation of Cernunnos-XLF adaptive evolution relative to physiological innovations would provide greater insights into these causal relationships than this manuscript can yet provide.
Author response: We agree with the reviewers that the role of Cernunnos-XLF in human brain expansion is speculative. In the last paragraph of the Discussion we wrote that: "While association of Cernunnos-XLF selection with increased brain size is attractive in the context of simultaneous adaptation of several brain size determinants, there are also other possible explanations." Two such explanations are mentioned in the same paragraph. In the Abstract, we clearly stated that Cernunnos-XLF "possibly played a role" in human brain expansion. So far, little is known about its function. However, the involvement of another positively selected candidate gene, microcephalin/MCPH1/BRIT1, in the DNA damage response (Lin et al., 2005) indicates that efficient DNA repair can be crucial in early brain development. In this context, the proposed selection on the repair factor Cernunnos-XLF is speculative, but in line with some current proposals on the mechanisms of human brain expansion.
Concerning the selection on chimpanzees, the mode of chimpanzee evolution is unclear. Given the single change encountered after the split from humans, we cannot conclude whether the chimpanzee locus is under selection or not. For this reason we used two tests that include and exclude the chimpanzee branch from the positively selected lineages. Clearly the human lineage was under positive selection, and this result is robust even when we considered the human lineage separately from the rest of the tree including the chimpanzee branch (Figure 2). The fact that the human lineage, not chimpanzee, exhibits a higher nonsynonymous rate is in agreement with human, not chimpanzee, brain enlargement. It seems that our description was confusing; we changed the corresponding paragraph of Results to clearly state that the major part of positive selection happened in humans and is significant by itself. In the Abstract we speak only about adaptive evolution in humans.
Other comments
p3 "Interestingly, these five changes are nonrandomly distributed along the protein." Either apply a statistical test or delete.
Author response: "Nonrandomly" was replaced by the more accurate "unevenly", but we prefer to keep that sentence in the text, because it points out potential hotspots of adaptive evolution and interesting regions for functional studies.
p4 The Dorus et al. findings (11) relevant to SHH and PAFAH1B1 do not conclusively show that positive selection, as opposed to relaxed constraints, for example, has occurred. This should be made clear.
Author response: We agree and the sentence was changed.
p4-5. In the summary, there is no caveat that these genes might not, after all, have evolved adaptively due to brain enlargement.
Author response: We agree. The last paragraph before Conclusions clearly indicates that other explanations are possible (e.g. "While several possible explanations are possible, it is clear that the complete elucidation of Cernunnos-XLF evolution in humans will require better understanding of Cernunnos-XLF function and its impact on various cellular processes"). Also in the Conclusions we clearly state "Cernunnos-XLF ... possibly played a role in the expansion of brain size". A similar sentence is used in the Abstract.
(5) It is unclear whether Figure 3 is required.
Author response: The macaque coding sequence is crucial for estimating the ancestral state before the split of humans and chimpanzees and therefore we prefer to include it in some form in the manuscript. The figure can be moved to a supplement, but since the manuscript is very short (and the journal is electronic), we decided to keep Figure 3 in the main text.

Reviewer's report 2
The authors identified signs of positive selection during human evolution in the Cernunnos protein. It is interesting given that certain mutations in the human gene lead to microcephaly. The authors speculate that positive selection in the gene may have contributed to increased brain size evolution in humans.
However, as discussed in the paper, loss of Cernunnos activity also leads to immunodeficiency in humans and it is equally possible that positive selection acted to modify immune functions during primate evolution. To decide between these two possibilities it may help to check in large-scale comparative expression datasets (e.g. Science 5566:340-3) whether expression levels of components of the XRCC4-Ligase IV complex changed in humans in the brain.