We appreciate the comments of the reviewers, Prof Michael Gromiha, Prof Narayanaswamy Srinivasan and Prof Thomas Dandekar, and have revised the manuscript accordingly.
Reviewer 1 (First Round): Prof Michael Gromiha, Dept of Biotechnology, IIT Madras
In this work the authors have proposed an accurate homology-based prediction method for identifying host-pathogen interactions. The approach has been tested on H. sapiens-M. tuberculosis PPIs, and the results are promising. Further, the occurrence of charged residues has been discussed. The paper is well written and the results are presented in detail.
1. The definition of homology should be discussed in terms of sequence identity, coverage, etc.
Authors’ response: Because we use the BBH-LS software system to identify homologs between different species, in the manuscript we define homologs by a BBH-LS score threshold of 0.01. As explained in our manuscript, BBH-LS uses a complex combination of sequence identity, coverage, and similarity of the genomic context to determine homology, so it is hard to give a straightforward definition. While it is possible to compute and report the sequence identity of the homologs determined at the BBH-LS score threshold of 0.01, doing so would very likely mislead readers about how the homologs were actually determined.
2. For the analysis of protein sequence-based properties, it would be better to report the statistical significance.
Authors’ response: We have revised the manuscript to include statistical significance, reporting p-values calculated with Student’s t-test.
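For readers who want to reproduce this kind of comparison, the sketch below computes a two-sided p-value for Student’s two-sample t-test (pooled variance) using only the Python standard library. It is an illustration under stated assumptions, not the code used for the manuscript: the function names are hypothetical, and in practice `scipy.stats.ttest_ind` (whose default `equal_var=True` is Student’s test) would normally be used instead.

```python
import math
from statistics import mean, variance

def _nz(v, tiny=1e-300):
    # Guard against division by zero in Lentz's continued-fraction method.
    return v if abs(v) > tiny else tiny

def _betacf(a, b, x, max_iter=300, eps=1e-14):
    # Continued fraction for the regularized incomplete beta function.
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 / _nz(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 / _nz(1.0 + aa * d)
        c = _nz(1.0 + aa / c)
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 / _nz(1.0 + aa * d)
        c = _nz(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    # Regularized incomplete beta function I_x(a, b).
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def t_two_sided_p(t, df):
    # Two-sided p-value of Student's t distribution: I_{df/(df+t^2)}(df/2, 1/2).
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))

def student_t_test(a, b):
    # Two-sample Student's t-test assuming equal variances (pooled variance).
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return t, df, t_two_sided_p(t, df)
```

For example, `student_t_test(lengths_inter, lengths_intra)` (both hypothetical lists of protein lengths) returns the t statistic, the degrees of freedom and the two-sided p-value.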
3. In the title, “Proteint” should be corrected to “Protein”.
Authors’ response: Thanks very much for pointing out the typo. We have corrected it in the revised manuscript. We have also changed the title of this manuscript to “Stringent Homology-based Prediction of H. sapiens-M. tuberculosis H37Rv Protein-Protein Interactions”.
Reviewer 1 (Second Round): Prof Michael Gromiha, Dept of Biotechnology, IIT Madras
The authors addressed my comments.
Authors’ response: Thanks very much for your comments and suggestions that made our work better.
Reviewer 2 (First Round): Prof Narayanaswamy Srinivasan, Indian Institute of Science
The authors aim to predict protein-protein interactions between human and Mycobacterium tuberculosis, primarily using homology to human and pathogen proteins, respectively, in a dataset of host-pathogen protein-protein interactions (PPIs). The use of a database of experimentally derived host-pathogen PPIs as a template to predict human-Mtb PPIs is the main feature that the authors propose as new in this manuscript.
Authors’ response: Thanks very much for the comments.
I would like to draw the attention of the authors to the paper Mulder NJ, Mazandu GK, Rapano HA (2013) Using Host-Pathogen Functional Interactions for Filtering Potential Drug Targets in Mycobacterium tuberculosis. J Mycobac Dis 3:126. doi: 10.4172/2161-1068.1000126. In this paper Mulder et al. have used PATRIC database (which is also used by the authors of current manuscript) to predict human - Mtb PPIs. Mulder et al. have also performed enrichment analysis. It is important that Zhou et al. compare their work with that of Mulder et al. and highlight the new and important points in the manuscript.
Authors’ response: Regarding the comment that Mulder et al. also used PATRIC, this does not appear to be the case. We have read the paper very carefully and found that they predicted the human-mtb PPIs as follows: “Previously generated human and MTB intra-species functional networks were used. These functional networks were constructed by combining protein interaction data from the STRING database and complemented by additional interaction data from sequence and microarray data for the MTB network and by Bossi and Lehner’s interaction data, together with data from the REACTOME database for the human network, as depicted in Figure 1”. Clearly, they made their predictions using different databases, which makes the comparison less meaningful. In addition, we avoid citing papers published by the OMICS Group. One of us (H. Zhou) recently declined to serve as an editor of the journal “J Mycobac Dis” (where Mulder et al. published their work). The OMICS Group has a notorious reputation for producing some 250 journals without content, and all of its journals charge high fees without any peer review. Referring to work in this journal may be harmful to the scientific community. As Wikipedia states, “An investigative report by The Chronicle of Higher Education stated that journal articles published by OMICS may undergo little or no peer review [[59]]. It was also suggested that OMICS provides lists of scientists as journal editors to create the impression of familiarity or scientific legitimacy, even though these are editors in name only and are not involved in the review or editing process [[59]]. Academics and the United States government, have questioned the validity of peer review by OMICS journals, the appropriateness of author fees and marketing, and the apparent advertising of the names of scientists as journal editors or conference speakers without their knowledge or permission. As a result, the U.S. National Institutes of Health no longer accepts OMICS publications for listing in PubMed Central and sent a cease-and-desist letter to OMICS in 2013, demanding that OMICS discontinue false claims of affiliation with U.S. government entities or employees”.
Right from the beginning of the manuscript the authors refer to the proposed approach to host-pathogen PPI prediction as “accurate homology-based”. I appreciate the determination and enthusiasm of the authors to achieve high accuracy in host-pathogen PPI prediction. However, I think that using “accurate” almost as the name of the proposed method is inappropriate, especially before the accuracy of the results has been demonstrated beyond any doubt. The authors might more appropriately refer to their method as the “proposed method” or something similar. However, I leave it to the discretion of the authors.
Authors’ response: Thanks very much for the insightful comments. We have changed the title of this manuscript to avoid the word “accurate”; indeed, it is an excessive claim. We have replaced “accurate” with “stringent” and changed the title to “Stringent Homology-based Prediction of H. sapiens-M. tuberculosis H37Rv Protein-Protein Interactions”. We have made this revision throughout the manuscript.
Introduction section (page 3): The authors state “ii) the differences between prokaryotic and eukaryotic proteins are not considered.” It is not clear what the differences between prokaryotic and eukaryotic proteins are. Are there any general points here? Is there any reference to support the authors’ point?
Authors’ response: The differences between prokaryotic and eukaryotic proteins have been reported in many papers and even in classical textbooks. The major differences, such as post-translational modifications and structures, are already listed in our manuscript. For details of these differences, please also refer to the following works: Nielsen et al. [[60]], Frye et al. [[61]], Chang et al. [[62]], Von et al. [[63]], von et al. [[64]], Kozak et al. [[65]], Hartley et al. [[66]], Springer et al. [[67]], Allfrey et al. [[68]], Neidhardt et al. [[69]], Schwartz et al. [[70]], Pestka et al. [[71]], Wallin et al. [[72]], Hartl et al. [[73]].
On page 4 the authors mention post-translational modifications and structure. While I agree that post-translational modification is a difference between prokaryotic and eukaryotic proteins, it is not clear how recognizing this difference helps in predicting human-prokaryote PPIs. I don’t think that the structures of homologous proteins from prokaryotes and eukaryotes are radically different.
Authors’ response: Differences in post-translational modification, protein structure, cleavage sites, etc., may influence the interacting residues and interaction interfaces, which matter a great deal when interactions are transferred from intra-species PPIs to inter-species PPIs. Our approach addresses this by using experimentally determined eukaryote-prokaryote PPIs as templates, and it showed better performance.
If the authors rely on an experimentally derived host-bacteria PPI database as the template to predict human-pathogen PPIs, then please comment, in the spirit of general applicability of the proposed approach, on: 1. the limited size of the template dataset; 2. the completeness and accuracy of the template dataset; 3. prokaryote-dependent host-pathogen PPIs (i.e., if the prokaryotes in the template and the target are very different, such as Gram-negative and Gram-positive, what is the specific advantage of using host-pathogen PPIs as the template?)
Authors’ response: In the revised manuscript we discuss the limitations in the size, completeness and accuracy of the template datasets. Currently the template datasets are very limited, and we have already done our best to find the most abundant source of human-bacteria PPIs. The major limitation of our stringent homology-based approach is that only a limited number of source eukaryote-prokaryote PPIs is currently available. However, with rapid advances in technology and the community’s increasing interest in host-microbe interaction studies, eukaryote-prokaryote template PPIs will become much more abundant. This should greatly facilitate the application of our stringent prediction approach to many host-pathogen systems in the future. The comment on the differences between pathogens, such as Gram-negative and Gram-positive bacteria, is very insightful. If two pathogens differ drastically in their proteins (primary sequences, tertiary structures, interaction interfaces, interacting residues, etc.), their proteins are less likely to be identified as stringent “homologs” in our approach, because we use the BBH-LS system. BBH-LS takes the origin and phylogenetic distance between two prokaryotes into account, as it computes their gene context similarity when identifying homologs. Therefore, if a Gram-negative prokaryotic protein and a Gram-positive prokaryotic protein differ greatly, they are unlikely to be reported as homologs by our stringent homology-based approach.
Page 11: Paragraph under the section “Analysis of sequence properties of proteins involved in host-pathogen PPIs”. The authors seem to believe that sequence properties such as length, number of domains and degrees of domains will be different for proteins involved in intra-species interactions compared with those involved in inter-species interactions. What is the basis for this assertion? If this is correct, what about proteins involved in both intra-species and inter-species interactions? The authors present some results on this in Tables 9 and 6, but the results critically depend on the accuracy and completeness of both the predicted inter-species PPIs and the experimentally determined intra-species PPIs. The main problem for me here is that I am unable to identify the scientific basis for expecting differences in the sequence features of proteins involved in intra-species and inter-species interactions. I am also under the impression that only a very small proportion of proteins are likely to be involved exclusively in intra- or inter-species interactions. Most proteins (especially in the host) are likely to be involved in both inter- and intra-species interactions.
Authors’ response: We are not assuming that sequence properties such as length, number of domains and degrees of domains differ between proteins involved in intra-species interactions and those involved in inter-species interactions. This analysis was conducted only to explore whether there is anything special about the proteins involved in the inter-species PPIN. We were also surprised by the findings, but there is no assumption or assertion in this section; we simply found that these properties differ for proteins involved in the inter-species PPIN compared with proteins involved in the intra-species PPIN. Sorry for the confusion: the analysis does include proteins involved in both inter-species and intra-species PPINs. Any protein involved in the inter-species PPIN is taken out and labeled as a “protein involved in the inter-species PPIN”, and all remaining proteins in the intra-species PPIN are labeled as “proteins involved in the intra-species PPIN”.
Page 15: The authors use the term “interaction strength” to refer to the number of times the interaction between a host protein and a pathogen protein is predicted. Traditionally the term “interaction strength” refers to how tightly two proteins bind physically. The authors may want to use a more appropriate term such as “measure of reliability” or “consensus score”.
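The labeling rule described in this response can be sketched as follows. This is a minimal illustration, not the code used in the manuscript: the function name, the toy identifiers, and the assumption that each inter-species pair is stored as (host protein, pathogen protein) are all hypothetical.

```python
def label_host_proteins(intra_ppis, inter_ppis):
    """Split host proteins into 'inter' and 'intra' groups.

    intra_ppis: iterable of (protein, protein) pairs from the host intra-species PPIN.
    inter_ppis: iterable of (host_protein, pathogen_protein) pairs (assumed ordering).
    """
    # Any host protein with at least one inter-species interaction is labeled
    # "inter", even if it also has intra-species partners.
    inter_labeled = {host for host, _pathogen in inter_ppis}
    # All remaining proteins of the intra-species PPIN are labeled "intra".
    intra_proteins = {p for pair in intra_ppis for p in pair}
    intra_labeled = intra_proteins - inter_labeled
    return inter_labeled, intra_labeled

# Hypothetical toy networks: H* are host proteins, M* are pathogen proteins.
inter_set, intra_set = label_host_proteins(
    intra_ppis=[("H1", "H2"), ("H2", "H3"), ("H3", "H4")],
    inter_ppis=[("H2", "M1"), ("H4", "M2")],
)
```

Because the two groups are disjoint by construction, a protein with both kinds of partners (H2 here) is counted only once, in the inter-species group.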
Authors’ response: Thanks very much for the comment. To avoid confusion, we have replaced the term “interaction strength” with “consensus score” throughout the manuscript.
On page 15 the authors claim that their proposed approach is more efficient than the conventional approach simply because it predicts a larger number of interactions. I feel this is inappropriate, because unless the accuracy of the interactions predicted by the proposed approach is clearly quite high and better than that of the conventional approach, it is inappropriate to call it “more efficient”. What if many of the predicted interactions are wrong? In that case there is no point in predicting a larger number of interactions.
Authors’ response: Here the term “efficient” only describes the fact that the stringent homology-based approach uses fewer templates yet predicts more inter-species PPIs than the conventional homology-based approach. The evidence supporting our claim that the stringent approach is more accurate than the conventional one is presented in the sections “Cellular compartment distribution”, “Disease-related enrichment analysis”, “Functional enrichment analysis”, and “Pathway enrichment analysis”. All these results show that the human-mtb PPIN predicted by our stringent homology-based approach is more plausible, as it has more functional relevance to this pathogen’s infection.
Reviewer 2 (Second Round): Prof Narayanaswamy Srinivasan, Indian Institute of Science
I do not want to discuss the reputation of a journal or a publishing group on this platform. However, the article by Mulder NJ, Mazandu GK, and Rapano HA is freely available on the internet. Also, a simple PubMed search shows a few other articles in this area by the same or an overlapping set of authors in other journals.
Authors’ response: Thanks for the suggestion. We do not wish to ignore the contribution of those authors to the community, but we also wish to avoid discussing work published in that journal.
While I agree that the proposed method is not very similar to that proposed by Mulder NJ, Mazandu GK and Rapano HA, “a right answer looks right whichever way you approach the problem”, which adds confidence to the predictions made. I still feel it is important to address this point. However, it is only my opinion and I leave it to the discretion of the authors. Regarding the authors’ responses to the other comments, I am OK with most of them. Though I do not entirely agree with the authors on their analysis of sequence features of proteins involved in intra-species and inter-species interactions, I do not see it as a major problem. After all, it is the authors’ paper, not mine!
Authors’ response: Thanks very much for your appreciation of our effort, both in the manuscript and in the revision; we are very grateful for your comments, which have improved our work. The analysis of sequence features of the proteins in inter- and intra-species PPIs is still at a very early stage and, to our knowledge, has not been attempted by other groups before. It still needs considerable improvement, but we believe that reporting it in this work is beneficial for other scientists in the field, who can follow up with similar analyses and improve on it. This may eventually lead to more exciting discoveries.
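As an illustration of how a consensus score of this kind can be tallied, the sketch below counts how many times each host-pathogen pair is predicted, assuming one entry per supporting template interaction. The identifiers and the optional `min_score` cutoff are hypothetical and not taken from the manuscript.

```python
from collections import Counter

def consensus_scores(predicted_pairs, min_score=1):
    """Count how often each (host_protein, pathogen_protein) pair is predicted.

    predicted_pairs: iterable with one entry per template PPI that yields the pair.
    min_score: hypothetical cutoff; pairs predicted fewer times are dropped.
    """
    counts = Counter(predicted_pairs)
    return {pair: n for pair, n in counts.items() if n >= min_score}

# Hypothetical predictions: ("H1", "M1") is supported by three templates.
scores = consensus_scores([("H1", "M1"), ("H1", "M1"), ("H2", "M3"), ("H1", "M1")])
# Rank pairs from most to least supported.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Raising `min_score` keeps only pairs supported by several templates, which trades coverage for stringency.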
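Enrichment analyses of this kind are commonly scored with a one-sided hypergeometric test (equivalent to a one-sided Fisher’s exact test). This response does not state which test the manuscript used, so the sketch below is only an assumption-labeled illustration; the function name and the toy numbers are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail probability P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: proteins in the background set.
    K: background proteins carrying the annotation (e.g., a GO term or pathway).
    n: proteins in the predicted PPIN.
    k: predicted-PPIN proteins carrying the annotation.
    """
    upper = min(K, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```

A small p-value means the annotation appears among the predicted proteins more often than random sampling from the background would suggest; with many annotations tested, a multiple-testing correction would normally be applied as well.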
Reviewer 3 (First Round): Prof Thomas Dandekar, Biocenter, Am Hubland, University of Würzburg, Würzburg, Germany
Hufeng Zhou et al. report on accurate homology-based prediction of H. sapiens-M. tuberculosis H37Rv protein-protein interactions. Summary comments: - The paper presents a lot of data, partly applying techniques developed by the authors themselves, so the reader must again assess the performance of these techniques according to the earlier papers. Furthermore, the quality of the results needs to be assessed. - A major question is, of course, which of these predicted interactions really happen in M. tuberculosis infection. In the view of this reviewer, the paper does not answer these questions with sufficient clarity and certainty, so the results, despite the many tables and interactions, are not yet really useful to the reader. Please revise the paper (major revision) according to the detailed comments below; the power and impact of the paper will then be much higher.
Authors’ response: Thanks very much for the comments. We have revised the manuscript according to the reviewer’s comments and provide a point-by-point reply below. To the best of our knowledge, we have assessed the results as thoroughly as the latest technologies and available data allow, although we bear in mind that our validation is insufficient owing to the lack of a gold-standard H. sapiens-M. tuberculosis PPI dataset; this is a limitation we recognize and are trying to address in future work on this project.
Title: “Accurate” is not what is delivered; we get lots of predictions, the whole approach is bound to produce many over-predictions, and detailed functional analysis of the predictions happens only in very few places in the manuscript. Currently a title such as “Abundant homology-based over-prediction of H.sapiens M.tuberculosis H37Rv potential protein-protein interactions by two different methods” would be more appropriate. Furthermore, there is already a typo in the title: remove the t after “proteint”, otherwise this astonishes the reader even more in the context of “accurate”.
Authors’ response: Thanks very much for the comments. We have corrected the typo in the revised manuscript. We have also changed the title to avoid the word “accurate”; indeed, it is an excessive claim. We replaced “accurate” with “stringent” and changed the title to “Stringent Homology-based Prediction of H. sapiens-M. tuberculosis H37Rv Protein-Protein Interactions”.
Abstract: Should be adapted after revising the whole paper.
Authors’ response: We have revised the abstract accordingly.
Background: An important point, and one that is very useful for getting a reasonable paper out of your study, is to define what you mean by “interaction”. This reviewer first assumed that you primarily wanted to predict direct protein-protein interactions, in other words something that you can later verify directly by experiment, e.g., by immunoprecipitation, crosslinking, etc. If you instead just mean functional interaction, e.g., when you speak about receptor-hormone interactions involved in the infection response, look at early and late gene expression in infection, or consider effects on transcription factors, then it is far more difficult for the reader to see how far your list can help, as such functional interactions can of course be close or distant, and you never define how distant a functional interaction may still be or what level of certainty you assign to your different interactions.
Authors’ response: We greatly appreciate this comment on the types of PPI. The type should be clearly defined at the very beginning of the manuscript. In fact, we predict direct physical interactions in a very stringent way, because the source database contains primarily experimental physical interaction data and we use homology to stringently transfer these interactions to the human-mtb system. In the revised manuscript we state this explicitly: “In this work, we focus only on direct physical protein-protein interactions (PPIs); therefore, all the PPIs mentioned in this work are direct physical protein-protein interactions.”
By the way, the papers you cite (7–11) all come from a bioinformatic “large-scale screen, take it all” corner (Srinivasan group, Wuchty); it will significantly broaden the perspective if you also include some experimental papers that really delineate a host-pathogen interaction and the proteins involved. This also gives you an opportunity to clarify which definition (direct, or indirect and more functional, protein-protein interaction) you want to follow in the rest of your paper.
Authors’ response: Thanks very much for the comments. We cite the works of Srinivasan et al., Wuchty et al. and others (references 7–11) mainly because they are representative works of the conventional homology-based approach. Both the conventional and the stringent homology-based approaches are computational prediction approaches. In this work, we predict direct physical interactions in a very stringent way, because the source database contains primarily experimental physical interaction data and we use homology to stringently transfer these interactions to the human-mtb system. Experimental approaches are beyond the scope of this work.
Methods
Maybe call the first part “overview” so that the reader better understands what happens in the first paragraph.
Authors’ response: The first part is titled “Background”, as required by the journal format. We do not think we have the liberty to change it.
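The transfer step described above can be sketched as follows: each experimentally determined template interaction between a human protein and a bacterial protein is mapped to M. tuberculosis through the bacterial protein’s homologs. This is a simplified illustration under stated assumptions, not the manuscript’s code; the identifiers and the homolog map (which in the described pipeline would come from BBH-LS) are hypothetical.

```python
def transfer_interactions(template_ppis, bacterial_to_mtb):
    """Predict human-Mtb PPIs from human-bacteria template PPIs.

    template_ppis: iterable of (human_protein, bacterial_protein) pairs.
    bacterial_to_mtb: dict mapping a bacterial protein to its Mtb homologs.
    Returns one predicted (human_protein, mtb_protein) pair per supporting
    template, so a pair supported by several templates appears several times.
    """
    predictions = []
    for human, bacterial in template_ppis:
        # The human partner is kept as is; only the bacterial partner is replaced
        # by its M. tuberculosis homolog(s). Templates without a homolog yield nothing.
        for mtb in bacterial_to_mtb.get(bacterial, ()):
            predictions.append((human, mtb))
    return predictions

# Hypothetical data: B1 and B2 (from two different bacteria) share the Mtb homolog M1;
# B3 has no Mtb homolog, so its template interaction is not transferred.
predicted = transfer_interactions(
    template_ppis=[("H1", "B1"), ("H1", "B2"), ("H2", "B3")],
    bacterial_to_mtb={"B1": ["M1"], "B2": ["M1"]},
)
```

Keeping duplicates means the output can be counted directly to obtain the number of times each pair is predicted.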
Prediction of host-pathogen PPI networks
Please remove the term “conventional homology-based prediction”, as this suggests that this is the typical way to badly over-predict protein-protein interactions. Please remove the term “accurate homology-based prediction”, as this suggests that this is the correct way to, again, grossly over-predict physical protein-protein interactions between host and pathogen. Rather, be neutral in both cases and name each according to what has really been done: intraspecies homology-based prediction instead of “conventional” and interspecies homology-based prediction instead of “accurate”. The reader then knows that both are computer-based homology assumptions and that there are many over-predictions.
Authors’ response: That is an insightful suggestion. However, the new terms “intra-species homology-based prediction” and “inter-species homology-based prediction” may not be the best way of naming the two kinds of homology-based approach. In fact, they may make things worse: the names may cause more confusion than clarity, because both homology-based approaches discussed here actually make inter-species PPI predictions. We have, however, removed the word “accurate” and renamed the “accurate homology-based” prediction approach the “stringent homology-based” prediction approach.
It may also be worthwhile to recheck whether a large-scale M. tuberculosis interactome study is available, so that you have a better basis for the first set of homology-based predictions. Similarly, there is a lot of experimental literature describing real and direct interactions during the course of infection with M. tuberculosis, and it is these data that you should really be after if you want to predict the real protein-protein interactions in the infection with higher accuracy.
Authors’ response: At the time of this work, we conducted a comprehensive literature survey and were confident that no large-scale human-mtb inter-species host-pathogen PPIN was available. It is true that large-scale intra-species M. tuberculosis interactomes of unknown quality are available (we conducted a comprehensive analysis of them; please refer to our BMC Genomics paper “Comparative analysis and assessment of M. tuberculosis H37Rv protein-protein interaction datasets”). However, our proposed stringent homology-based prediction approach takes only inter-species eukaryote-prokaryote PPI data as the source PPIs for making predictions. Therefore, we were looking for large-scale human-mtb inter-species host-pathogen interactomes rather than the mtb intra-species interactome.
p.6 BBH-LS is your own algorithm, and it would help the reader if you explained in a few sentences how it works, in particular how sequence similarity is measured, by which algorithm, and how gene context is taken into account. Explaining this will also increase reader confidence in your large-scale data.
Authors’ response: Thanks very much for the comments. It is a very helpful suggestion to explain the BBH-LS algorithm in more detail, as this will help to increase readers’ confidence in our prediction approaches. However, BBH-LS is actually not our algorithm. Its authors developed it independently, without any involvement from us; we learned of and used it through their publication in BMC Systems Biology, “BBH-LS: an algorithm for computing positional homologs using sequence and gene context similarity” by Zhang et al. Nevertheless, we strongly agree that the BBH-LS algorithm should be explained further in our work to increase readers’ confidence. Therefore, we have revised the manuscript accordingly: “To identify the homologs between M. tuberculosis H37Rv and the 10 bacteria (in our stringent approach) and also between M. tuberculosis H37Rv and H. sapiens (in the conventional approach), we use the BBH-LS algorithm, which computes positional homologs using both sequence and gene context similarity [[18]]. BBH-LS is an effective and simple method for identifying positional homologs from the comparative analysis of two genomes; it integrates sequence similarity and gene context similarity in order to identify highly accurate orthologs [[18]]. This method applies the bidirectional best hit heuristic to a combination of sequence similarity and gene context similarity scores [[18]]. When BBH-LS was applied to the human, mouse, and rat genomes, it produced the best results when sequence and gene context information were weighted equally, and compared with other classic algorithms (such as MSOAR2), it identified more homologs with fewer false positives [[18]]. BBH-LS is considered to be a more accurate way of identifying homologs than approaches that do not consider both sequence and gene context similarity. The BBH-LS strength threshold β in this work is set as 0.01.”
p.9 PPIN: please remind the reader that this means protein-protein interaction networks.
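To make the bidirectional best hit idea concrete, the sketch below combines a sequence similarity score and a gene context similarity score with equal weights (alpha = 0.5) and keeps pairs that are each other’s best hit. This is a deliberately simplified illustration, not the BBH-LS implementation: how BBH-LS actually computes gene context similarity and applies its threshold β is not reproduced here, and the score dictionaries and gene names are hypothetical inputs.

```python
def bidirectional_best_hits(seq_sim, ctx_sim, alpha=0.5):
    """Return pairs (a, b) that are reciprocal best hits under a combined score.

    seq_sim, ctx_sim: dicts mapping (gene_in_A, gene_in_B) to a similarity score.
    alpha: weight of sequence similarity; 0.5 weights both sources equally.
    """
    combined = {pair: alpha * s + (1 - alpha) * ctx_sim.get(pair, 0.0)
                for pair, s in seq_sim.items()}
    best_for_a, best_for_b = {}, {}
    for (a, b), score in combined.items():
        # Best partner in genome B for gene a (the first maximum wins on ties).
        if a not in best_for_a or score > combined[(a, best_for_a[a])]:
            best_for_a[a] = b
        # Best partner in genome A for gene b.
        if b not in best_for_b or score > combined[(best_for_b[b], b)]:
            best_for_b[b] = a
    # Keep only reciprocal (bidirectional) best hits.
    return sorted((a, b) for a, b in best_for_a.items() if best_for_b[b] == a)

# Hypothetical scores for two genes in genome A (a1, a2) and two in genome B (b1, b2).
seq = {("a1", "b1"): 0.90, ("a1", "b2"): 0.80, ("a2", "b1"): 0.85, ("a2", "b2"): 0.70}
ctx = {("a1", "b1"): 0.20, ("a1", "b2"): 0.90, ("a2", "b1"): 0.80, ("a2", "b2"): 0.10}
```

With these toy scores, sequence similarity alone (`alpha=1.0`) yields only the pair a1-b1, whereas the equally weighted combination yields a1-b2 and a2-b1, showing how gene context can change which pairs are called homologs.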
Authors’ response: We have revised this part of manuscript accordingly. “Based on primary sequence analysis and topological analysis of the predicted host-pathogen protein-protein interaction network (PPIN).”
p.9 Analysis of sequence properties of proteins involved in the host-pathogen PPIs. Here, as well as in the corresponding parts of the results, the rationale behind this section remains unclear. If I already have a list of over-predicted protein-protein interactions with many proteins only indirectly affected by the infection process, what do I learn from the domain or sequence property distribution in this list? Would it help to understand that M. tuberculosis has a certain percentage of hydrophilic residues? The same applies to the domain lists. In the latter case I agree that they can be interesting, but only if you spend more time explaining and discussing their actual function and functional context for real examples in the PPI networks you found.
Authors’ response: We used the stringent homology-based approach to accurately identify homologs between mtb and other bacteria, and to accurately transfer the experimental physical host-pathogen PPIs to predict host-pathogen PPIs in the human-mtb system. Therefore, our results largely capture the possible direct physical human-mtb host-pathogen PPIs that most experimental approaches can detect. If experiments are performed on the mtb and human system, we believe all of our predicted human-mtb PPIs will be captured by these experiments. Moreover, the human proteins we analyzed in our direct physical human-mtb host-pathogen PPIs are the same human proteins involved in the human-bacteria inter-species PPIN of the source experimental data. Therefore, we are analyzing the sequence and topological properties of an accurate dataset. The sequence and topological properties of the host and pathogen proteins reported in this work will be highly interesting and useful in the field of host-pathogen PPI studies, as these properties can be used to differentiate intra-species and inter-species PPINs, to improve prediction algorithms, and to assess and verify predicted host-pathogen PPIs.
p.10 Calderwood et al. are cited here as “the first” to analyze protein-protein interaction networks. Please explain in what sense you think this is true; very probably the Barabasi group was the first to analyze topological properties of interaction networks, and such a citation should be given here, or your statement better explained.
Authors’ response: Calderwood et al. were the first to analyze the topological properties of the human proteins in a host-pathogen PPIN, as a thorough literature survey shows: they introduced this kind of analysis as early as 2009, and a second study performing the same analysis followed in 2012. Many groups and labs have reported topological analyses of *general* intra-species PPINs, but not of those involved in host-pathogen interactions. This is a subtle point. Nonetheless, we agree that describing the work of Calderwood et al. as “the first” is not appropriate, and we have revised this part of the manuscript accordingly.
Similarly, the statement “this work is the first-ever study that examines the intra-species PPIN topological properties” is not convincing. You can cite work from the Guthke group (HKI Jena, Germany) as one concrete example of a group that did this before, and if you really check the literature, I am sure there are other groups who examined PPIN topologies between host and pathogen well before; for instance, the zig-zag model of Jones and Dangl in plant infection is also a topological description, right?
Authors’ response: Thanks very much for the comments, and sorry for the confusion. We are not claiming to be the first to study intra-species PPIN topological properties; rather, we are the first to examine the intra-species PPIN topological properties of the mtb proteins involved in a host-pathogen PPIN. That is, while many groups have worked on intra-species PPIN topological properties before, no one before us has reported the intra-species PPIN topological properties of mtb proteins involved in a host-pathogen PPIN. To avoid unnecessary confusion, we have revised this part of the manuscript by deleting the word “first”, although we are indeed the first for this small part of the analysis.
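Since charged-residue content is one of the sequence properties discussed (see the first reviewer’s summary), the sketch below shows one simple way to compute per-protein length and charged-residue fraction and to compare group means. It assumes that D, E, K and R count as charged (histidine is excluded, a common but not universal convention); the function names and sequences are hypothetical and not taken from the manuscript.

```python
from statistics import mean

# Assumption: Asp, Glu, Lys and Arg are treated as charged; His is not.
CHARGED = frozenset("DEKR")

def charged_fraction(sequence):
    """Fraction of residues in a protein sequence that are charged."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return sum(residue in CHARGED for residue in seq) / len(seq)

def group_summary(sequences):
    """Mean length and mean charged-residue fraction for a group of proteins."""
    return (mean(len(s) for s in sequences),
            mean(charged_fraction(s) for s in sequences))

# Hypothetical toy sequences for the two protein groups.
inter_summary = group_summary(["MDEKRA", "MKKAAA"])
intra_summary = group_summary(["MAAAAGG", "MAGSTE"])
```

The per-protein values, rather than the group means, would be the inputs to a significance test such as Student’s t-test.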
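As a concrete example of the kind of intra-species topological property meant here, the sketch below computes node degree in an intra-species M. tuberculosis PPIN and compares the mean degree of proteins that also appear in the host-pathogen PPIN with that of the remaining proteins. Degree is only one of several possible topological measures, and the function names and identifiers are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def degrees(edges):
    """Degree of each protein in an undirected PPIN (self-loops and duplicates ignored)."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    return {protein: len(partners) for protein, partners in neighbors.items()}

def mean_degree_by_group(intra_edges, host_pathogen_proteins):
    """Mean intra-species degree of proteins inside vs outside the host-pathogen PPIN."""
    deg = degrees(intra_edges)
    inside = [d for p, d in deg.items() if p in host_pathogen_proteins]
    outside = [d for p, d in deg.items() if p not in host_pathogen_proteins]
    return (mean(inside) if inside else 0.0, mean(outside) if outside else 0.0)

# Hypothetical intra-species Mtb network; only M1 also interacts with host proteins.
edges = [("M1", "M2"), ("M1", "M3"), ("M1", "M4"), ("M2", "M3")]
in_hp, not_hp = mean_degree_by_group(edges, {"M1"})
```

Using neighbor sets rather than raw edge counts keeps repeated edges from inflating the degree.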
Results
Here again it is a good idea to phrase all results throughout more carefully, so that even a more sceptical reader (at the moment my duty as one of the reviewers, my apologies) will be more convinced of the quality of the data presented. p.12 “... then the predicted results are solid.” - rephrase: “... then we can be more certain about the quality of our results”. Unfortunately, that is not at all the case for the poor reader, as you forgot to mention in the Background Section of the paper which type of PPI you want to predict. For instance, if you want to predict direct physical interactions, then all compartments that have nothing to do with direct pathogen interaction in the host cell have to be completely removed from your prediction lists, as they are clearly wrong predictions.
If, on the other hand, you just think of “functional connection”, you should at the very least give some score for your predictions (for instance, you could use the different p-values, the different “interaction strengths” you find in your calculations, or both as such a decision score). A real sceptic would tend to say: leave the study altogether, because looser functional connections of the infection process are already very clear from the accumulated literature, and assembling them in lists only makes it worse, as your approach may even leave some loosely connected functions OUT. This opposite mistake is mentioned here for the first time in this referee’s comments and is easy for you to check: look at the gene expression data, where many other genes change during early or late infection; hence they are loosely connected to infection, but they never turn up in your selected list of homology-based predictions. So please, if you want to push in the second direction, give a score for your different predictions to make them meaningful for the reader.
Authors’ response: Thanks very much for the comments. We have revised the Background Section of the manuscript to define PPI clearly as used in our work, namely “direct physical interaction”. Since we are not predicting functional associations, there is no need to give a score for each association.
The suggestion of removing compartments that have no obvious interaction with the pathogen needs to be discussed more carefully. (1) There may be many cellular compartments that actually interact with pathogens but that we do not yet know about. For example, some research shows that a pilus protein from E. coli may interact with proteins involved in apoptosis; this pilus protein therefore interacts with human proteins that had no direct and obvious functional correlation with it until compelling experiments proved the interaction. Especially at this stage of host-pathogen interaction studies, many things are still unknown; therefore we cannot easily conclude that compartments which do not obviously interact with pathogens are wrong and need to be removed. (2) Cellular compartment annotations are neither complete nor thorough, and some proteins may be annotated with several terms. Moreover, a term annotated to a given human protein does not necessarily mean that the protein is located only in that compartment; it may be located in other cellular compartments as well. We have also moderated some of the sentences in the manuscript. For example, p.12 “... then the predicted results are solid.” has been revised into “... then we can be more certain about the quality of our results”.
p.13: the “transcription factor complex” category is a good example of a loosely connected interaction. Clarify it, and give a score if you want to mention such interactions; clearly remove it if you are after direct physical interactions (this would only be true for a virus, where several of its proteins directly interact with the transcription and translation machinery of the host, e.g. in the HIV example you mention).
Authors’ response: Please see our response to the previous point.
p.13, second part: “proteasome degradation” is believable, I fully agree, and here you also really go after the functional connection of the interaction, bravo! If you removed all over-predictions and only went for newly predicted protein-protein interactions that have a high probability of being direct host-pathogen interactions in infection, or of being functionally implied with a really high score, then you would have achieved what your paper intends to be about: accurate host-pathogen protein-protein interaction prediction!
Authors’ response: Thanks very much for the comments. We do intend to predict direct physical interactions, and we start from direct physical human-bacteria PPIs to accurately infer possible direct physical interactions between human and mtb proteins.
p.14, middle: comparison of your two homology-based approaches: after rephrasing as suggested above, your statement also becomes fairer and makes technical sense.
Authors’ response: Thanks very much for the comments. We have revised this part of the manuscript to make the statement more modest and humble.
p.14, 15, 16, including nuclear hormone receptors: to this reviewer this part seemed a superficial analysis. Here the reader would need a detailed analysis from you to understand which interactions make real functional sense and are worthwhile checking experimentally. If you add this, then the paper becomes really useful and interesting. “nuclear hormone receptors regulate innate immunity response”: here you tend to refer to functional interactions, so give a comparative score, analyze individual interactions, and stress exactly those which are new, have not been reported for this interspecies PPI, and would be worth pursuing experimentally. This applies even more to the intra-species homology-based approach.
Authors’ response: In this part of the manuscript we use functional enrichment analysis of the proteins involved in direct physical host-pathogen PPIs. Although we are predicting direct physical interactions, real direct physical interactions usually have a strong functional basis. In other words, we assess our predicted direct physical interactions from a functional perspective, with the underlying premise that “if the direct physical interactions are real, they will more or less have some functional relevance for the host-pathogen system.” Therefore, where we find evidence supporting the validity of a predicted host-pathogen direct physical interaction, we add further discussion and evidence to explain it. In this section of the manuscript, we use enrichment analysis of the Gene Ontology terms of the targeted human proteins to look for such functional correlations. This analysis does not examine each PPI one by one; it groups all the targeted human proteins into one set of proteins involved in direct physical interactions with the pathogen proteins. If this set is enriched in many functional terms closely related to host-pathogen interaction, and more specifically to the human-mtb interaction, then the predicted direct physical interactions perform well as a whole dataset. Therefore, picking out the individual PPIs responsible for certain terms may not be the correct way to explain the validity of this work, because the enrichment analysis applies a hypergeometric test to the whole set of targeted human proteins rather than examining each PPI individually.
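To make the set-level test described above concrete, here is a minimal sketch of a one-sided hypergeometric over-representation test for a single GO term (or pathway). It assumes the test asks whether the targeted human proteins contain more term-annotated proteins than expected from a background set; the function name, parameter names, and the choice of background are illustrative assumptions, not the authors' implementation.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """Upper-tail p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    Hypothetical parameter meanings (an assumption for illustration):
      N: annotated human proteins in the background set
      K: background proteins annotated with the GO term or pathway
      n: targeted human proteins in the predicted host-pathogen PPIs
      k: targeted proteins annotated with the term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)  # number of ways to draw n proteins from the background
    # Sum the probabilities of drawing k or more annotated proteins.
    # math.comb returns 0 when asked for more items than are available.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total  # exact integer arithmetic until this final division
```

For example, with hypothetical counts N=20000, K=200, n=500 and k=20, only about five annotated proteins (n*K/N) would be expected by chance, so observing 20 gives a small upper-tail p-value. When many terms are tested at once, a multiple-testing correction (such as Benjamini-Hochberg) is commonly applied afterwards; that step is not shown here.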
p.17 “focal adhesion” - this is again a good example, nice result!
Authors’ response: Thanks very much for the comments. Yes, focal adhesion is among the many relevant terms that naturally show up in the final results, which supports the overall quality of our predictions.
Discussion
Again, broaden the discussion and cite papers, also in light of the above points: cite some experimentally verified interaction results and the papers describing them, discuss the different definitions used to define PPI networks, and convince the reader more about the selected interactions you think are probably new and should be there.
Authors’ response: There is limited work available on direct physical human-mtb PPIs, and we have not found any experimentally verified human-mtb PPI, simply because this kind of experimental verification is still rare. In future work, we plan to experimentally verify some of our predicted human-mtb PPIs.
Cancer pathway discussion: A nice point, again in the discussion mention people who alerted the cancer community about this connection before (there are such cancer scientists who suggested this analogy with an infection before, but I agree, this is an exciting connection) and then again give a score to these observed functional interactions so that the reader knows which ones exactly to follow up.
Authors’ response: Thanks very much for the comments. The cancer pathways that show up here are really exciting, as they not only provide a new perspective for understanding the host-pathogen interaction but also, to some extent, support our prediction of direct physical interactions. Here we also perform enrichment analysis, which takes the whole set of targeted human proteins as one set of human proteins involved in the human-mtb interaction and uses a statistical approach to determine which pathways are significantly enriched in this set. It therefore has nothing to do with scoring the functional association of each individual PPI. Moreover, we are predicting direct physical interactions; pathway enrichment here is only a way to assess the predicted direct physical interactions from a functional aspect, with the underlying premise that real direct physical interactions are very likely to be supported from a functional point of view. Therefore, we are not predicting functional associations, but using functional enrichment (whether of Gene Ontology terms or pathways) to assess the validity of the predicted direct physical interactions.
Hub proteins are also more readily involved in inter-species interactions: fine, this makes sense. However, here too it would be nice if you considered some concrete protein examples; then it would become clear why and how these specific hub proteins work within the host or the pathogen, as well as in the interactome between them.
Authors’ response: Thanks very much for the comments. This is a relatively new topic in the field. Around three years ago, one group first identified that human proteins involved in host-pathogen PPIs tend to be hubs of their own intra-species PPIN, and here we are the first to report that this is also the case for a pathogen such as mtb, as mtb proteins involved in the host-pathogen PPIN also tend to be hubs of the mtb intra-species PPIN. Why and how the proteins involved in the host-pathogen PPIN are more likely to be hubs of their own intra-species PPIN is still under active investigation in our group; we are working hard on explaining the theory and supplementing it with solid examples, but this will take a while. An in-depth discussion of this topic is out of the scope of the current manuscript, and we cannot include it as we do not yet have a final answer. However, this finding is certainly worth reporting to the community, so that the many groups who are also interested in this topic can work on it and contribute.
Conclusion
Rewrite in light of all the comments. Key point: what did you find, and where are you so sure that an experiment should confirm the homology-predicted novel interaction between host and pathogen proteins?
Authors’ response: Thanks very much for the comments. We have revised the manuscript according to this review. Given the assessments based on localization, functional enrichment and pathway enrichment, together with the stringent homology transfer and the highly accurate human-bacteria source PPIs, we are confident that our predicted dataset will be a suitable starting point once experimental verification begins.
Tables, Figures: Figure 1: Very nice, that transmits a clear message! Please prune and reorder all results tables and figures, once you have decided whether you want to show direct physical interactions or all sorts of more or less direct functional interactions and how you then would score them.
Authors’ response: We have revised the manuscript to indicate clearly, at the beginning of the Background Section, that we are predicting physical interactions. We have also revised the manuscript throughout according to this reviewer’s comments; thanks very much for helping to make our work better.
Reviewer 3 (Second Round): Prof Thomas Dandekar, Biocenter, Am Hubland, University of Würzburg, Würzburg, Germany
The authors made an effort to improve the manuscript. They kindly responded to the suggestions and points made, thank you; however, they decided not to change much in the substance of the manuscript. Hence, I would ask the editor to reject the manuscript, and in case of publication I strongly recommend publishing my comments so that the community is not misled by overprediction and bad data.
Authors’ response: We realize that, although our prediction performs better, our results may still contain some false positives, which are very hard to avoid. We appreciate that, while our prediction is valuable, it is still a prediction and cannot yet be considered a gold standard. Only when all the predicted PPIs have gone through stringent experimental verification can we claim that our predictions are real and can serve as a gold standard for the field.
In case of publication against my strong advice, the impact factor of Biology Direct may perhaps not be lowered, as this paper may become a good example of how to go wrong with confidence.
I appreciate that the authors did a lot of work and the calculations are technically sound, but the net result will be very misleading for any reader who really wants to know which proteins in M. tuberculosis physically or directly interact with the human host. Unless you do something drastic to your methods (e.g., include a scoring scheme for the quality of interactions, consider the compartments of the proteins and whether they can actually interact during TB infection, or look meticulously at the biological function), this will remain so.
Authors’ response: We did provide a scoring scheme in this manuscript, called the “consensus score”. This score is primarily the number of source PPI matches supporting a predicted human-mtb PPI.
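The consensus score is described above only as primarily the number of source PPI matches supporting a predicted human-mtb PPI. The sketch below shows one way such a count could be computed. It assumes that each source PPI is a (host protein, bacterial protein) pair and that homolog mappings (for example, those passing the BBH-LS score threshold) are already available as dictionaries; the function name, data layout and protein identifiers are hypothetical, not the authors' code.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]  # (host protein ID, pathogen protein ID)


def consensus_scores(
    source_ppis: Iterable[Pair],
    host_homologs: Dict[str, List[str]],
    pathogen_homologs: Dict[str, List[str]],
) -> Counter:
    """Count how many distinct source PPIs support each predicted (human, mtb) pair.

    host_homologs / pathogen_homologs are hypothetical mappings from a source
    protein to its homologs in the target species (human or mtb).
    """
    scores: Counter = Counter()
    for src_host, src_path in set(source_ppis):  # ignore duplicate source PPIs
        # All predicted pairs obtained by transferring this one source PPI.
        predicted = {
            (h, p)
            for h in host_homologs.get(src_host, [])
            for p in pathogen_homologs.get(src_path, [])
        }
        scores.update(predicted)  # each source PPI counts at most once per pair
    return scores
```

A ranked list of predictions is then available from `consensus_scores(...).most_common()`. Collecting the transferred pairs in a set before counting is a design choice that keeps one source PPI from inflating the score of the same predicted pair more than once.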
For instance:
Title: Simply cancel out “Stringent”, “accurate” etc.; then it is clear to the reader what you are doing: overprediction by two different methods of homology prediction.
Similarly, “conventional” is not really a “conventional” method; use the two terms “intraspecies homology-based prediction” and “interspecies homology-based prediction” instead of “stringent” or, previously, “accurate”, so as not to mislead the reader.
Authors’ response: As we discussed in the previous reply, the terms intraspecies/interspecies homology-based prediction are also very confusing to readers, who may confuse what we use to predict (source PPIs) with what we actually predict (predicted PPIs). For example, it can be confusing when we talk about “interspecies homology-based prediction to predict interspecies PPIs”. The proposed naming makes sense, but it may cause more confusion. More importantly, our stringent homology-based prediction means more than interspecies homology-based prediction. For example, we use human-bacteria source PPIs to predict human-mtb PPIs. But if we were predicting human-archaea PPIs, we could not use common interspecies source PPIs such as human-bacteria PPIs; we would have to look for human-archaea source PPIs for the prediction. Our stringent homology-based prediction is therefore stringent both on the type of proteins in the source PPIs and on the homology transfer, so that the prediction can achieve better performance.
Then, carefully concentrate on the big major point of my criticism of your study, already raised in the first round of reviewing: you now write “we only focus on the direct physical protein-protein interaction (PPI)”. This is very well intended, but unfortunately it is exactly not the case: you do a homology-based prediction in both cases.
Authors’ response: Several issues are raised in this comment. First, we are predicting host-pathogen physical PPIs; that is why we only transfer physical source PPIs when predicting the target host-pathogen PPIs. Yes, we do homology-based prediction in both cases, but in both cases we aim to predict physical host-pathogen PPIs, and in both cases we use only physical PPIs as the source PPIs. Therefore we are predicting direct physical interactions in both the stringent and the conventional approaches. We agree that some of the predicted interactions may not be physical interactions. Nevertheless, some false positives are unavoidable, and we believe (and the reviewer also agrees) that our approach has far fewer false positives than earlier homology-based approaches.
To check whether the protein-protein interactions predicted by the (I agree) better method, the interspecies homology-based prediction, are correct, you need either a gold standard of experiments (not yet available for your example), or you have to check meticulously which of the numerous potential interactions really have to do with host-pathogen interaction in these specific organisms: human and M. tuberculosis. You discuss some of the findings in these terms, but there is nothing systematic.
Authors’ response: Because of the limited data currently available, especially gold standard PPI datasets for verification, we use indirect approaches to assess the performance of our prediction approach. We believe our methodological effort will be very beneficial to the community.
Another approach would be to really think about the infection process and then predict in which compartments proteins from the human host DIRECTLY see proteins of M. tuberculosis, as you claim you are really after physical interactions. Currently I would consider most of the interactions to be functional and not direct, connected via pathways: the “transcription category” (p.16) is a good example of such a loosely connected interaction (and my previous criticism was ignored). Clarify it, and give a score if you want to mention such interactions; clearly remove it if you are after direct physical interactions.
Authors’ response: We are predicting direct physical interaction between host and pathogen proteins. We have discussed strong evidence that supports our predictions, as well as our claim that these predictions are more reliable than previous ones. The evidence includes the existence of homologous host-pathogen interactions, the evidence that the proteins involved are found in MTB infection-related pathways and compartments, etc. We have also provided a scoring scheme (the consensus score) to rank the predictions. We acknowledge that we do not have experimental data that directly verify the predicted physical interactions. But we hope that our predictions, which we believe are far more reliable than previous predictions, will be a useful guide to performing new experiments on human-MTB protein interactions.
As already stated in my last round of comments, you need some scoring to be more certain which predictions to trust.
Authors’ response: We do have a scoring mechanism, called the “consensus score”.
Minor point:
Cancer pathway discussion: A nice point, again in the discussion mention people who alerted the cancer community about this connection before, I mean not the BMC Medical Genomics paper of 2009 but the basic concept that cancer is also sort of an infection the human host fights against.
Authors’ response: Thanks for pointing that out. We also believe cancer is a sort of infection that the human host fights against. In their paper, Coussens and Werb discussed a close relationship between infection and cancer [74]. We are also currently studying a virus (EBV), infection with which significantly increases the likelihood of cancer.
There are more points to improve the manuscript, but my time is also limited.
Authors’ response: Thanks for your time and help in improving our manuscript.