Overall, our data indicate that regions of nucleotide-binding domains enriched in positively charged amino acids can indeed serve as genuine NLSs. These NLSs are integrated into domains, and their evolution might depend on the evolution of the corresponding domains. Such NLSs, even if they are present in prokaryotic proteins, can interact with karyopherins. Karyopherins have many functions in the cell and, in particular, can act as chaperones [26, 27]. The protein domains interacting with karyopherins might have evolved before the origin of the nuclear envelope, with these domains containing sequences that can potentially act as NLSs. The presence of sequences similar to NLSs in DNA-binding domains of prokaryotic proteins might create an advantage for nuclear accumulation of these proteins during evolution of the nuclear-cytoplasmic barrier, influencing which proteins accumulated and became compartmentalized inside the forming nucleus (the content of the nuclear proteome). Proteins that did not harbor such integrated NLSs might have acquired them de novo after nuclear envelope formation, and such NLSs can be considered separate units of genome evolution. Interestingly, sequences that are similar to NLS can also be predicted and experimentally defined as being present in some cytoplasmic proteins of modern organisms. This indicates that during evolution, some proteins, albeit possibly resident inside nuclei due to the presence of an integrated NLS, were excluded from the nucleus via different mechanisms, as discussed elsewhere [28].
Reviewers’ comments.
Reviewer’s report 1.
Sergey Melnikov.
Reviewer comments:
I reviewed this manuscript in detail when it was submitted to Molecular Biology and Evolution. I recommended that the authors make numerous changes, and they addressed every single one of my comments. I therefore have no reason to criticize this work any further. This study is important to the field as it shows that the nuclear localization signals in modern eukaryotic proteins could simply emerge from DNA-/RNA-binding domains of cellular proteins, because having a DNA- or RNA-binding domain is frequently sufficient for a protein to be recognized as a nucleus resident. This is an important finding and I encourage you to publish this work as is.
In this concise and thought-provoking manuscript, Olga Lisitsyna et al. investigate a central evolutionary enigma: the origin of the cell nucleus. The authors convincingly show that, in most instances, all that a protein needs to enter the cell nucleus is a DNA-binding domain. For instance, in their experiments with prokaryotic proteins, they show that, even in the absence of predicted NLS sequences, some DNA-binding prokaryotic proteins are actively transported into the cell nucleus (Fig. 1). This experiment, along with their analysis of NLS overlaps with DNA-binding domains in protein structures, suggests that NLSs initially evolved from (and within) DNA-binding domains of chromatin-binding proteins, a conclusion that makes perfect sense from the point of view of evolutionary contingency. Furthermore, in their supplementary data, the authors have collected a wonderful review of the experimentally identified and predicted nuclear localization signals. This information alone will be very useful for other scientists working in the field of the origin of eukaryotes and origin of the nucleus.
My only suggestion to the authors is to divide their data set of NLSs into two groups, experimentally defined vs in silico-predicted. When they describe their statistics on the percentage of NLSs that overlap with RNA/DNA-binding domains, it seems useful to me to provide it first for the experimentally defined NLSs (as the more reliable data), and then complement these numbers with additional data for in silico-identified NLSs.
Author’s response:
We thank the reviewer for the critical evaluation of our work and the positive feedback. Of course, we agree that results based only on analysis of experimentally defined NLSs should be more robust and reliable than those based on analysis of consolidated datasets (both experimentally defined and in silico-predicted NLSs). Unfortunately, the number of experimentally defined NLSs is too small for an appropriate statistical analysis. Therefore, we used a dataset that included both experimentally defined and in silico-predicted NLSs.
Reviewer’s report 2.
Igor Rogozin.
Reviewer comments:
The authors demonstrated that NLS and NLS-like motifs may be integrated inside nucleotide-binding domains of both eukaryotic and prokaryotic proteins and may co-evolve with these domains. They proposed that there are NLS-like motifs inside prokaryotic proteins that may be functionally important.
The authors need to choose a theoretical framework. If the authors would like to operate within the framework of evolutionary biology, they cannot use sentences like: “We propose that the pre-existence of NLSs inside prokaryotic proteins dictated, at least partially, the nuclear proteome composition.” Prokaryotes do not have a nucleus; thus, they do not have NLSs, and those NLS-like sequences cannot have “... dictated, at least partially, the nuclear proteome composition” (due to the absence of the nucleus). Those NLS-like sequences may have some functional roles; this is possible. As an example, fragments of mobile elements (MEs) may be part of promoter or protein-coding regions. However, I doubt that the “pre-existence” of MEs “dictated” regulatory pathways or functions of protein-coding genes. According to Wojtek Makalowski, it is something like a scrap yard (Makałowski W. Genomic scrap yard: how genomes utilize all that junk. Gene. 2000, 259(1–2):61–7). I think that the authors need to use something like “prokaryotic sequences similar to NLSs or NLS-like signals, etc.” (if they are willing to operate within the framework of evolutionary biology). If the authors would like to operate within the frameworks of alternative hypotheses, they should notify readers of that. Otherwise, a careful correction of logic and language is required.
This structure: “… However, it remains unclear how the proteins were selected for import into the forming nuclei, i.e., how the nuclear proteome evolved.” [The Methods section] [The Results section] “To address this question, we analysed data on NLSs and their localization relative to protein domains. ...” does not look good to me. The question and the attempts to answer it are separated by the Methods section.
Author’s response:
We thank the reviewer for taking the time to review our manuscript and for providing these comments.
We substantially modified the sentence “We propose that the pre-existence of NLSs inside prokaryotic proteins dictated, at least partially, the nuclear proteome composition”. Our logic was based on the data presented as well as on some published results (references [15,16,17,18,19,20,21,22,23]), which indicate that the NLSs in modern eukaryotic proteins might have evolved from the DNA-binding domains of prokaryotic proteins. As a result, some DNA-binding domains are sufficient for interaction with karyopherins, and as a consequence, a protein may have had features of a nuclear protein before the origin of the cell nucleus. Of course, these features would not be useful before the origin of the nuclear envelope. Interestingly, sequences that are similar to NLSs can also be found in some domains of cytoplasmic proteins of modern organisms (Kharitonov A.V., Shubina M.Y., Nosov G.A., Mamontova A.V., Arifulin E.A., Lisitsyna O.M., Nalobin D.S., Musinova Y.R., Sheval E.V. Switching of cardiac troponin I between nuclear and cytoplasmic localization during muscle differentiation. Biochimica et Biophysica Acta – Molecular Cell Research. 2020. 1867(2):118601). We described this as follows: “The presence of sequences similar to NLSs in DNA-binding domains of prokaryotic proteins might create an advantage for nuclear accumulation of these proteins during evolution of the nuclear-cytoplasmic barrier, influencing which proteins accumulated and became compartmentalized inside the forming nucleus (the content of the nuclear proteome). Proteins that did not harbor such integrated NLSs might have acquired them de novo after nuclear envelope formation, and such NLSs can be considered separate units of genome evolution. Interestingly, sequences that are similar to NLS can also be predicted and experimentally defined as being present in some cytoplasmic proteins of modern organisms. This indicates that during evolution, some proteins, albeit possibly resident inside nuclei due to the presence of an integrated NLS, were excluded from the nucleus via different mechanisms, as discussed elsewhere [28]”.
We modified the first sentence of the “Results” section as follows: “To detect possible mechanisms of NLS origin, we analyzed data for NLSs localization relative to protein domains in modern organisms.”
Finally, it should be noted that the manuscript was edited by American Journal Experts to improve phrasing and remove grammar and writing errors.