Reviewer’s report 1: Susmita Datta
The goal of this paper is to find eigengenes that can serve as potential biomarkers for improving the prognosis of high-risk patients and to give a biological description of these eigengenes. Overall, the authors’ methods and approach are valid (but see major recommendation 1), and their results are promising.
In the methods section, it isn’t clear whether the lmQCM algorithm for determining modules and corresponding eigengenes was applied to the entire dataset or only to the training data. If the former, then the cross validation performed later to assess the performance of the lasso-cox model might be biased. The concern is that, even though the lmQCM is unsupervised (i.e. the survival times aren’t used), if the eigengenes are not stable then using the whole dataset to construct them may lead to underestimation of the error rate during CV (because we are selecting features favorable to both the train and test data). It would be good to check that similar eigengenes are obtained from just the training data alone.
Author’s response: As correctly pointed out by reviewer, lmQCM does not use any information about the survival and thus it is an unsupervised method. As the reviewer suggested that the eigengenes stability is very important. We therefore tested the correlation of the genes in testing sets use concordance index (a metric we developed previously to test the correlation of genes in a co-expressed module) to test their stability. The result shown in Additional file 3: Figure S1 and Additional file 4: Figure S2 below. The co-expression modules were first detected from training set, and then the concordance indices were calculated for each gene module in the testing set. The observation is that the concordance indices are stable between the training and testing sets for all the modules and are significantly higher than randomly selected gene sets, which demonstrated the stability of the modules and our approach.
The primary tool for assessing the prognostic ability of the eigengenes is through Kaplan-Meier (KM) curves and the log-rank test. The KM curve using INSS stage (1, 2, 3, 4, and 4 s) is used as a baseline, however this is not adequate. The stratification of patients into risk groups in practice takes other clinical into variables. For example, MYCN amplification is well known to be highly predictive of high-risk patients. A fair evaluation of the authors’ method would be to use the KM curve constructed using the (clinically evaluated) high-risk indicator that is already provided for each patient. Alternatively, since high-risk patients are of primary interest, the authors can subset on these patients and see whether their method can significantly sub-classify those patients. As it stands, it is not clear if the eigengenes provide any prognostic value beyond that provided by clinical variables currently in use.
Author’s response: The patients of focus are already labeled as high-risk, which are the patients with stage 4 disease more than 18 months at diagnosis and patients of any age and stage with MYCN-amplified tumors. The MYCN cannot make more contribution for classification of the high-risk patients. But our workflow can give a better classification than use the clinical stage with these patients.
This study uses overall survival as the outcome, but how does this approach perform for predicting event-free survival? Are there eigengenes that are associated with this outcome as well? And if so, are they different from the ones associated to overall survival.
Author’s response: We thank the reviewer for this important point. In this paper with the selection of data we focus on overall survival, the event-free survival for events such as relapse and metastasis will require more comprehensive set of data beyond the scope of this paper even though but our methods will be applicable on these data.
Since copy number variation (CNV) data is available for these patients, and the authors suggest (page 3 line 8) that lmQCM can find modules that are association with structural mutations (like CNV). The CNV data provides an opportunity to verify that claim. It was also mentioned (page 5 line 32) that some M36 genes are “co-localized on the same cytoband, which indicates a potential structural variant in NB patients.” the CNV data can be used to investigate this.
Author’s response: We totally agree and the integration/comparison with CNV data is part of our ongoing work.
Page 3, eq. (1): Is this using the Frobenius norm? The norm used is not stated.
Author’s response: Yes, we clarified this in the revision.
Page 3, line 46: Computing p-values is done by “randomly selecting K genes for 1000 times”. Is this sampling done within the given module or among all genes? If the latter, is it sampling with replacement.
Author’s response: This sampling is performed within the given module. We provide a more detailed description in the paper.
Page 4, line 27: “We found that by setting and o be 30, 0.8, 20 respectively, …” contains typos. Consider “We found that by setting the three parameters to 30, 0.8, and 20, respectively”.
Author’s response: We revised the description.
8. Page 5–6: The figure references do not match. Figure 2(a-g) in the text should be changed to Fig. 3(a-g).
Author’s response: We modified the figure captions in the paper.
Reviewer’s report 2: Marco Chierici
The authors state that “based on the clinical data, 259 patients were assigned in low risk group while 239 were assigned to the high risk group”: Unfortunately, this is not correct for two reasons. First, according to the provided clinical characteristics file, the high-risk patients are 176; secondly, the patients not marked as “high-risk” are not “low-risk” but can be either low or intermediate risk, thus they should be considered as “non high-risk”. Based on this classification, there are 13 patients among the non high-risk group that are not alive, differently from what stated in the paper. Please clarify this point and revise the results.
Author’s response: We thank the reviewer’s thoughtful comment. In the original version of the paper, the 239 patients in the high-risk group was labeled based on our classification result from a companion paper using our algorithm. In this revision instead we focused on the 176 high-risk patients which are provided by clinical characteristics labeling from the CAMDA competition. And we recalculated the result showed substantial improvement over clinical staging. We have clarified this in the revision.
About data preprocessing, were the microarray probes summarised at the gene level? If so, how? Parameter tuning in lmQCM was “based on previous work”, but this is unreferenced: Please provide a reference if available.
Author’s response: We provided a reference to our previous paper in this revision.
What about the rationale behind parameter tuning? Was it used in a similar condition? Was cross-validation used?
Author’s response: Based on our extensive previous work, we have empirical knowledge about the range of four the parameters. We compared the different parameter in this range, lmQCM method used these parameters in the paper as they often led to balanced sizes of the gene modules with clear biological interpretations for individual modules.
Regarding the parameter tuning in SNF: Did the authors try a grid search over the three SNF parameters, using cross-validation to evaluate the performance? How were the classification results evaluated in practice?
Author’s response: We applied a grid search over the three SNF parameters.
The references to figures in the main text are out of sync with the actual figure numbers, i.e. there are references up to Fig. 2 but there are 4 figures. Moreover, the caption for Fig. 3 is missing. Figure 5 A-d lacks a legend explaining the colors and is not referenced in the text; moreover, a different type of plot could better vehicle the information in a more compact way.
Author’s response: We modified the figure captions.
Please address minor typos such as missing spaces (as in the title of the methods section about SNF) and missing symbols (as the parameters in the SNF section). Some long sentences may be simplified (e.g., “To test the power of the combination (...) or the clinical stage information.” in conclusions).
Author’s response: We corrected the typos and simplified long sentences.
Reviewer’s report 3: Dimitar Vassilev
Major merit of the study is the originality of the used methodology in the context of the applied procedures and approaches for emerging the dependance between the co-expressed genes and the potential of survival time prediction of the patients studied. All those methodological steps are composed in a workflow which has a potential capacity to be used in another cancer studies
Author’s response: We thank the reviewer for the encouraging comments on this work.
The suggested approaches for data integration based on mining gene co-expression network (GCN) is known and already applied in the studies, but the problem here is related to the selection of features in the context how to build and how to apply such a model (i.e. GCN) my remarks here can be related not to the applied method but again to the “tuning” of initial parameters and the potential of possible validation of them. And finally the method of similarity network fusion (SNF) for merging the eigengenes and to test their potential for functional biomarkers drops in semantics of the results in particular to the poorly explained functional annotation through the gene ontology enrichment. As it was presented and described, the workflow demands some clarification in terms of functionality of each step in it as well the total idea for validation of the functionality of the prognosticated biomarkers concerning the risk assessment for the survival time of the studied patients
Author’s response: We provided more clarification for the functionality of each step in the workflow.
There are also some potential remarks in using “our recently developed wighted network mining algorithm” based on local maximum click optimisation - where is not so clear for the point of view of defining of some initial parameters and their comparability
Author’s response: Based on our extensive previous work, we have empirical knowledge about the range of for the parameters. We compared the different parameter in this range, lmQCM method used these parameters in the paper as they often led to balanced sizes of the gene modules with clear biological interpretations for individual modules.
The submitted material needs of a thorough revision in English - both grammar and morphology which will improve significantly the and semantics of sentences. The illustrations are possibly the most questionable part of the study. I think the authors can renew the design of some of the figures which can be related in quite better manner to the obtained results (Fig. 5a, d)
Author’s response: We checked the grammar and layout of the paper. Since Fig. 5 was confusing to readers, it was removed in the new version of the paper.
The number and inclusion of references are limited and not enough for such an original work
Author’s response: We added more references to support our work.
Conclusions are as well recommended to be corrected in the context of the suggested workflow and the completeness of the work provided by that workflow
Author’s response: We revised the description.
Also avoiding for example such freely hanging phrases having obvious lack of comparability as “...which not only help achieve a more accurate survival prognosis...” will give the work better merit
Author’s response: We revised the text accordingly.
There are some obvious errors in grammar - in particular in the use of complex sentences and verbs with different tenses. The style can be improved also as a result of correction of the text in the context of spelling and grammar.
Author’s response: We checked the spelling and grammar and made revisions accordingly.
The level of the submitted material will be improved significantly by renewing some of the graphics (Fig. 5a, d)
Author’s response: Since Fig. 5 was confusing to readers and was redundant to Fig. 2, it was removed in the new version of the paper.
The data preprocessing and subsequent clusterization: Due to the highly unbalanced nature of the data there might be problems in defining categories as high or low risk. How the authors overcome the unbalancedness and the heterogenity of the data? Do the authors measure in someway the possible errors due to this problem?
Author’s response: We thank the reviewer to point out the unbalanced data problem. If the reviewer refers to the clinical stage and clinical risk. Yes, there is unbalance issue. The number of patients labeled as stage 4 s and high risk are smaller/higher? (check it to be specific). However, we want to find survival-associated features. After we combined the deceased patients, the 105 patients deceased among total of 498 patients (21%), and among them, 92 patients are clinical high-risk in total of 176 clinical high-risk patients (55%). We think the sample sizes and proportions are appropriate for our statistical analysis. Furthermore, we used Regularized Cox proportional hazards model to calculate the risk indices of all patients. The median of risk indices of the training examples was used as a threshold to split patients into low-risk and high-risk groups. The same threshold was applied to classify the single held-out patient into one of the two groups, which means we were not using the same clinical categories as originally curated, which does not incur the unbalanced data issue. At last, we tested if these two groups have distinct survival outcome using Kaplan-Meier estimator and log-rank test. We divided patients into two groups (low and high group) where the median of each feature was used as a cut-off point. By using median as cutoff in the above two steps, we mitigated the unbalanced data issue in our survival association analysis.
The suggested lmQCM approach for the purposes of defining GCN modules is interesting and having in mind some previous publications of the authors - it is a well tested method. However in the submitted material will be worth to explain what are exactly in this study the suggested four parameters Lambda, Alfa, t , and Beta. Definitely the fine tuning of these parameters can influence the final result in a large scale - it will be good to have authors explanation for these problems.
Author’s response: Yes, as the reviewers pointed out, lmQCM has been applied to various types of cancer studies previously, and the meanings of the parameters were discussed in details in the previous publications [10, 16]. To further explain them, we added the following section to the manuscript: There are four parameters for lmQCM: γ, λ, t, and β. Among them, γ controls the threshold for the initiation of each new module, λ and t define the adaptive threshold of the module density to ensure proper stopping criterion for the greedy search for each module, and β is the threshold for overlapping ratio for merging. We used the same settings for our GCN module mining as in  for those parameters, which have been proved to generate functionally meaningful modules from multiple cancer datasets.
The used Lasso-Cox model is a reasonable approach for defining the so-called risk index of the patients as it is given in the submitted material. The problem with such models as lasso regression (also elastic regression) can arose when they are applied to multivariate space parameters. Although the reduced parameter space by the eigengenes give some relaxation of such models it will be worth to explain the options how to control the Lasso-Cox risk index estimates from certain bias and what is the best way to validate this process?
Author’s response: We thank the reviewers to point out this. To address the problem of applying lasso regression to multivariate space, we used a two-level cross validation (CV) strategy. The first level was leave-one-out CV. Namely, a single patient was chosen as test set, with the rest as training set. Then in the training set, we performed 10-fold CV to select the best regularization parameter. Regularized Cox proportional hazards model was built on the training set using the selected parameter, and based on the model, risk indices of all patients were calculated.
The data preprocessing and subsequent clusterization: The Gene Ontology enrichment analysis might be not the major objective of the study but it is presented in a very limited manner. Using only a single tool for enrichment from an external knowledge source provokes a lot of questions about the accuracy of the defining (co)-expressed genes and in particular the accuracy of their annotation. My suggestion is that such an ontology enrichment can be extended at least to the major knowledge sources as Gene Ontology, NCBI, other. This can open some parallel to the study problems but from other view angle can extend and enrich all the suggested workflow by the authors.
Author’s response: The online gene list enrichment tool ToppGene (http://toppgene.cchmc.org) developed by Cincinnati Children’s Hospital Medical Center  was used for all of the module functional enrichment analysis. ToppGene not only carries out enrichment analysis on standard Gene Ontology, it also generates enrichment results from more than 20 different sources including pathway databases, human and mouse phenotypes, NCBI PubMed, transcription factor binding sites, and drug information. We clarified in the revision.
The last two part of the results section “Survival-associated feature selection using lasso-regularized Cox proportional hazard model” and the next one “Prognostic prediction based on integrative analysis” are written mostly as material and methods part. There are some problems again how are selected the features for Lasso-Cox model. The selection and subsequent clusterization of the selected eigengenes for obtaining some confidential biomarkers possibly needs some more methodological work. Nevertheless it would be good to get some explanation by the authors about the methodological solution and the obtained results more clearly: why it was done in this way?
Author’s response: We thank the reviewer’s comment, it helps for us to rethink and better elucidate our purpose of study. To address this, we moved part of contents of the Results section “Survival-associated feature selection using lasso-regularized Cox proportional hazard model” and the “Prognostic prediction based on integrative analysis” to the Materials and methods section. We also added the details of our method and written in a more methodological form to explain our workflow.