We appreciate the comments of the reviewers, Prof Oliviero Carugo, Dr Alistair Forrest and Prof Manju Bansal. We have revised the manuscript accordingly.
Reviewer #1 (First Round): Prof Oliviero Carugo, University of Vienna, Austria
Major comments
The main assumption of this manuscript is that overlapping lincRNAs (those that are found both in the ENCODE dataset and in the PH dataset) have functions different from unique lincRNAs (those that are found only in one dataset). In fact, on the basis of the data shown in the manuscript, this is an epistemological axiom (if not a postulate) that might be questioned. The experimental observation that five (5) unique lincRNAs are less stable than seven (7) overlapping lincRNAs is in fact insufficient to conclude that there is a functional difference.
Authors’ response: We have revised the whole paper to support our ideas. By analyzing differences in the minimum free energy distribution, we showed that a large proportion of overlapping lincRNAs were more stable than the unique lincRNAs. The lincRNA half-life experiments also support this result. Based on the half-lives of overlapping and unique lincRNAs, on a well-studied lincRNA with a known half-life, and on previous studies of RNA stability, we suggest that overlapping lincRNAs (relatively stable lincRNAs) and unique lincRNAs (relatively unstable lincRNAs) could be associated with different functions.
As a consequence, the entire manuscript is highly speculative and this must be clearly pointed out in the discussion.
Authors’ response: We have added evidence to support our ideas and moderated the speculations in the discussion that lacked sufficient evidence.
In the chapter entitled “Obtaining annotated and putative novel lincRNAs”, in the “Methods” section, a more detailed description must be provided to allow one to reproduce the algorithm. If this is too long, a supplementary file may be submitted (with some examples).
Authors’ response: We have re-written the lincRNA analysis workflow in the “Methods” section.
Also the introductory section should be modified. The first part of the “Background” section is a good introduction about the stability of non-coding RNAs and its functional implications. Some additional emphasis should be put on the relationships with cancer and diseases. Otherwise, the mention of medical issues is inappropriate. The last part of the “Background” section, which is a short summary of the experiments and of the results of the manuscript, should be slightly expanded to be more easily understood by the reader. In its present form, it is too short and it becomes inevitably cryptic.
Authors’ response: We thank the reviewer for the comments. We have re-written the introduction and revised the whole article.
I cannot inspect Figure 8 (resolution is too small).
Authors’ response: We have adjusted it in the revised version.
Minor comments
Section “Background” line 43 – I am expecting that the Authors describe briefly which cancers and which diseases are related to lncRNAs. The first sentence in this section is otherwise too vague for a scientific publication.
Authors’ response: We have revised the introduction and deleted the first sentence.
Section “Background” line 46 – Please, check if the expression “… that are proved the importance …” is correct. Perhaps it should be “… that are proven to be important …”
Authors’ response: We have revised the introduction and deleted the sentence.
Section “Background” line 62 – Please, check if the expression “… Clark et al found only a minority …” is correct. Perhaps it should be “…Clark et al found that only a minority …”.
Authors’ response: Thanks, we have corrected it in the revised version.
Section “Background” line 68 – Please, check if the expression “… we improve lincRNA …” is correct. Perhaps it should be “… we improve the lincRNA …”
Authors’ response: We have corrected it in the revised version.
Section “Background” lines 68-71 – These two sentences, which are a sort of summary of the experiments described in this manuscript, are a bit confusing. First, the Authors write that they compared “different RNA-Seq datasets”. Then, they write that they compared “both RNA-Seq datasets”. The question is: are they “several” or just “two”? Moreover, the Authors should describe briefly which are these datasets, how they were assembled, validated, compared, etc. Just a few sentences should be enough to improve the readability of the manuscript.
Authors’ response: We revised the introduction and described the improved lincRNA workflow in the “Methods” section.
Section “Background” lines 72-73 – The expression “through randomly testifying … unique lincRNAs” is not clear. The Authors should re-write it.
Authors’ response: The sentence has been re-written according to the reviewer’s suggestion.
Section “Background” line 74 – The expression “coinciding” might be “in agreement with”.
Authors’ response: We have deleted the sentence in the revised manuscript.
Section “Background” line 70 – The Authors should justify why they selected K562 cells.
Authors’ response: It has been revised according to the reviewer’s suggestion.
Section “Background” lines 76-77 – The sentence “Therefore, we suggest … lincRNA stability” is unclear and should be re-written.
Authors’ response: We rewrote it in the revised manuscript.
Section “Results and discussion” line 85 – A reference to Cufflinks is mandatory. And it is also necessary to mention what is that (briefly).
Authors’ response: We have added the cited paper in the revised manuscript.
Section “Results and discussion” line 86 – Probably it would be better to write simply “by considering the annotations present in XXX” or “by using the software XXX” and “(see Methods for details)”. In its present form and without references, this list of resources is not really readable.
Authors’ response: We added the cited paper in the revised manuscript.
Section “Results and discussion” lines 90-91 – The Authors should justify why their method is more effective than alternative methods based on the integration of several databases. In fact, I understand that it is simpler. But I am not sure it is more effective.
Authors’ response: We have revised it.
Section “Results and discussion” lines 103-104 – The Authors mention “four public database annotations”. They should also indicate their names and where they can be found.
Authors’ response: It has been revised according to the reviewer’s suggestion.
Section “Results and discussion” line 104 – The Authors mention “PH” here for the first time: it is necessary to define it explicitly. If I am not wrong, it is defined only in the Abstract and this is not sufficient.
Authors’ response: PH was defined in the “Methods” section.
Section “Results and discussion” lines 109-113 – The Authors describe the “minimum free energy”. This should be described better and it is necessary to cite the computational procedure used to compute this thermodynamic quantity.
Authors’ response: We have revised it in the “Methods” section according to the reviewer’s suggestion.
Section “Results and discussion” lines 116-120 – These sentences repeat the “Background” section and should be removed (or moved to the introductory section).
Authors’ response: These sentences have been deleted in the revised manuscript according to the reviewer’s suggestion.
Section “Results and discussion” lines 126-127 – The expression “Comparing the both datasets, …” might be inappropriate and might be even removed.
Authors’ response: It has been revised according to the reviewer’s suggestion.
Section “Results and discussion” line 128 – RNAfold can be used to compute something and it cannot be calculated.
Authors’ response: We have deleted the sentence in the revised manuscript.
Section “Results and discussion” line 132 – The verb “testified” might be inappropriate. Why not “determined experimentally”?
Authors’ response: It has been revised according to the reviewer’s suggestion.
Section “Results and discussion” line 134 – The “5 unique lincRNA” were taken from PH or from ENCODE?
Authors’ response: It has been revised according to the reviewer’s suggestion.
Section “Results and discussion” line 136 – A reference to “qPCR after ActD treatment” is necessary.
Authors’ response: We have added the cited paper in the revised manuscript.
Section “Results and discussion” lines 137-138 – The expression “… which is coordinated …” might be inappropriate. Perhaps it might be changed into “… in agreement with …”.
Authors’ response: We have deleted the sentence in the revised manuscript.
Section “Results and discussion” line 143 – The verb “express” might be “may be expressed”. Section “Results and discussion” line 151 – The verb “is” might be inserted between “It” and “highly”.
Authors’ response: It has been revised according to the reviewer’s suggestion.
Section “Results and discussion” lines 150-157 – Caution: this discussion is highly speculative and it should be re-written. It is necessary to clearly indicate that these are mere suppositions of the Authors that are not based on any new results.
Authors’ response: We have re-written it in the revised manuscript.
Section “Results and discussion” line 164 – The expression “FPKM” should be indicated extensively.
Authors’ response: We have re-written the sentence in the revised manuscript.
Section “Methods” line 188 – The verb “were” might be “was”.
Authors’ response: Thanks, we corrected it in the revised version.
Section “Methods” line 196 – The sentence “Total RNA was extracted as described above” might be deleted.
Authors’ response: We have deleted the sentence in the revised manuscript.
Section “Methods” line 231 – “We” might be “we”.
Authors’ response: We are sorry about this mistake, and have corrected it in the revised version.
Reference number 8. The journal name should be abbreviated. The same goes for several other references.
Authors’ response: We have checked the whole manuscript and corrected similar issues.
Reference number 34. Volume and pages are missing. The same happens also in some other references.
Authors’ response: We have checked the whole manuscript and corrected similar issues.
In the right part of Figure 1, the box named “Non-coding” might be modified by listing the four programs in the same order used to describe them in lines 93-100 of the section “Results and discussion”: iseeRNA, CPC, PhyloCSF, CPAT. In the present figure, CPAT is the second program and not the fourth.
Authors’ response: We have modified the description of Figure 9 in the “Methods” section.
The caption of Figure 2 is unclear. It should be re-written.
Authors’ response: We have revised it.
Figure 3 – Which units are used to measure the free energy? Are the data shown in the figure taken from PH or ENCODE? It is also necessary to write what the thick horizontal black lines are and what the error bars indicate. In fact, although most of the readers will understand this figure, it is mandatory to write an exhaustive legend.
Authors’ response: It has been revised according to the reviewer’s suggestion.
In Figure 4, “Encode” should be “ENCODE”.
Authors’ response: We are sorry about this mistake, and have corrected it in the revised version.
Figure 5 – See the observations about Figure 3.
Authors’ response: It has been revised according to the reviewer’s suggestion.
Figures 6 and 7 – These curves have very different shapes. Might the Authors try to describe them and write something about the differences?
Authors’ response: It has been revised according to the reviewer’s suggestion.
Reviewer #1 (Second Round): Prof Oliviero Carugo, University of Vienna, Austria
The revised version of the manuscript is considerably better than the original version. However, in my opinion, there is still a very modest direct and experimental evidence of the difference between overlapping and unique lincRNAs. As a consequence, the conclusions are extremely speculative.
Authors’ response: To explore the stability of the two classes of lincRNAs obtained by comparing both RNA-Seq datasets of K562 cells, we analyzed the minimum free energy distribution and measured the half-lives of selected lincRNAs. The results support the conclusion that the overlapping lincRNAs are more stable than the unique lincRNAs in K562 cells. This conclusion is also plausible on general grounds: overlapping lincRNAs expressed in both datasets should have a higher probability of being detected than unique lincRNAs expressed in only one dataset. Therefore, the lincRNAs present in both datasets are likely more stable than those present in only one dataset.
Reviewer #2 (First Round): Dr Alistair Forrest, Omics Science Center, Japan
Major comments
1. "The method is simpler and more effective than integrating several database annotations by their own scripts [29, 30]." Unless you provide benchmarking this is an unsupported statement. Should be removed.
Authors’ response: We did not compare effectiveness against their scripts; however, our pipeline is simpler and more convenient than integrating several database annotations with such scripts.
2. "our sequenced RNA-Seq dataset (PH)." Why is the dataset called PH?
Authors’ response: We are grateful for the reviewer’s careful reading. A series of RNA-Seq datasets were sequenced in K562 cells treated with PMA or hemin. PH was the RNA-Seq dataset of untreated cells.
3. The main point of the paper seems to be that reproducible lncRNAs have higher free energy and inferred secondary structure than lncRNAs only observed in one dataset. I think this is likely to just reflect abundance. More highly expressed lncRNAs are more likely to be observed in multiple datasets. Weakly expressed lncRNAs are less likely. The authors could perhaps strengthen their story by looking at the relationship between expression level and stability by breaking the lncRNAs into several bins (low, mid, high) and examining the free energy (and half-life) in box plots.
Authors’ response: We thank the reviewer for the helpful suggestions. We have taken this advice and analyzed the relationship between minimum free energy and expression level in the revised manuscript. However, they were not correlated, similar to the lack of correlation between lncRNA (including lincRNA) expression and half-life.
4. "highly expressed in ENCODE, but hardly expressed in PH." For these kind of statements it is very important to explain what RNA-seq protocol was used for ENCODE and PH. If the methods do not match this may explain why you see differences.
Authors’ response: We appreciate the reviewer’s suggestion. The ENCODE and PH datasets were both sequenced following the Illumina manufacturer’s instructions, and we analyzed them using the same method (see Methods).
5. "Unstable lincRNAs are very sensitive, respond rapidly when transcription changes and act almost immediately after transcription without producing a functional gene product in the nucleus." Unsupported statement. You do not demonstrate that unstable lncRNAs are inherently 'rapid responders', neither do you demonstrate 'without producing a functional gene product'. The manuscript should be re-read critically examining whether your statements are supported by your analysis or from a primary reference.
Authors’ response: We added the cited paper and revised the sentence.
6. "Pervasive transcription of lincRNAs in K562 cells". This section doesn't really add anything. "514 lincRNAs (FPKM ≥ 1) in ENCODE (80 for FPKM ≥ 10), whilst there were 312 lincRNAs (FPKM ≥ 1) in PH (30 for FPKM ≥ 10). 89 overlapping lincRNAs of both datasets expressed with FPKM ≥ 1." This in no way suggests pervasive transcription.
Authors’ response: We thank the reviewer for the question. We have re-written the section. The distribution and expression of lincRNAs are shown in the revised Figure 1.
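The binning analysis suggested in major comment 3 above can be sketched as follows. This is only a minimal illustration, not the analysis reported in the manuscript: the FPKM cut-offs (1 and 10, borrowed from the thresholds quoted in comment 6), the function names, and the toy values are all assumptions made for the example.

```python
from statistics import median


def expression_bin(fpkm, low_cut=1.0, high_cut=10.0):
    """Assign a transcript to a low/mid/high bin by FPKM (cut-offs are assumed)."""
    if fpkm < low_cut:
        return "low"
    if fpkm < high_cut:
        return "mid"
    return "high"


def median_mfe_by_bin(records):
    """records: iterable of (fpkm, mfe) pairs; returns {bin name: median MFE}."""
    bins = {}
    for fpkm, mfe in records:
        bins.setdefault(expression_bin(fpkm), []).append(mfe)
    return {name: median(values) for name, values in bins.items()}


# Toy (FPKM, MFE in kcal/mol) pairs, invented purely for illustration.
toy = [(0.2, -120.0), (0.8, -150.0), (3.0, -200.0), (5.0, -180.0), (25.0, -310.0)]
summary = median_mfe_by_bin(toy)
```

The per-bin lists collected inside `median_mfe_by_bin` are what a box plot per bin would be drawn from; comparing the bins then shows whether stability tracks expression level.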
Minor comments
1. "LncRNAs, located and transcribed from intergenic genomic regions that are proved the importance in cancer by genome-wide studies, are named intergenic lncRNAs (lincRNAs)," Do you mean the lncRNAs or the genomic regions where they are found are associated with cancer? The references 5, 6 do not directly correspond to a cancer link.
Authors’ response: We have rewritten the introduction and deleted the sentence.
2. "lincRNA" and "lncRNA" are used interchangeably in the paper. lincRNA corresponds to 'intergenic', whereas lncRNA are just long. You should make a definition and stick to one, probably lncRNA.
Authors’ response: LincRNAs are a subgroup of lncRNAs, so the features of lncRNAs also apply to lincRNAs. Some of the cited papers concern lncRNAs; however, lncRNAs include lincRNAs.
3. "It is found that a large proportion of overlapping lincRNAs are more stable than unique lincRNAs". I suggest the authors not use the term 'overlapping lincRNAs' as this suggests genomic overlap. I think what the authors mean is 'lncRNAs observed in multiple K562 datasets are more stable than lncRNAs unique to one K562 dataset'
Authors’ response: We have defined overlapping lincRNAs and unique lincRNAs in the “Results” section. We compared the PH and ENCODE datasets to obtain overlapping and unique lincRNAs in a Venn diagram (Figure 2). Unique lincRNAs were present in only the PH or the ENCODE dataset, and overlapping lincRNAs were present in both the ENCODE and PH datasets.
4. "In light of this, we acquired a great deal of intergenic transcripts involved possible novel lincRNAs and annotated lincRNAs with Ensembl or Gencode". Unclear what "transcripts involved possible novel" means.
Authors’ response: It has been revised according to the reviewer’s suggestion.
5. "Hexamer usage bais" Bias
Authors’ response: Thanks, we have corrected it in the revised version.
6. "1804 lincRNAs were indentified" Identified
Authors’ response: We are sorry about this mistake, and have corrected it in the revised version.
7. "We randomly testified the half-lives of 7" tested
Authors’ response: We have revised it.
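The distinction discussed in minor comment 3 above (transcripts observed in both K562 datasets versus in only one) amounts to a set comparison, which can be sketched as below. This is a hedged illustration only: matching transcripts by identifier is an assumption (the manuscript may match them differently, for example by genomic coordinates), and the function names and identifiers are hypothetical.

```python
def classify_lincrnas(ph_ids, encode_ids):
    """Split lincRNA identifiers into overlapping and dataset-unique groups."""
    ph, encode = set(ph_ids), set(encode_ids)
    return {
        "overlapping": ph & encode,      # observed in both datasets
        "unique_PH": ph - encode,        # observed only in PH
        "unique_ENCODE": encode - ph,    # observed only in ENCODE
    }


def overlap_fraction(ids, other_ids):
    """Fraction of `ids` that also appear in `other_ids` (0.0 if `ids` is empty)."""
    ids = set(ids)
    return len(ids & set(other_ids)) / len(ids) if ids else 0.0


# Hypothetical identifiers, used only to exercise the functions.
groups = classify_lincrnas(["linc1", "linc2", "linc3"], ["linc2", "linc3", "linc4", "linc5"])
shared = overlap_fraction(["linc1", "linc2", "linc3", "linc6"], ["linc2", "linc3"])
```

The same `overlap_fraction` calculation applies to any transcript class, so it could also express the share of protein-coding RNAs from one dataset that reappear in the other.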
Reviewer #2 (Second Round): Dr Alistair Forrest, Omics Science Center, Japan
Major points
1. Ok. The authors have removed the sentence, still the section (line 132) is entitled "Improved pipeline for lincRNA analysis". I would change to just "Pipeline for lincRNA analysis" as still you have provided no evidence that the pipeline is 'improved'.
Authors’ response: It has been revised according to the reviewer’s suggestion.
2. FIX. The authors must add a sentence at the very first use of PH explaining that PH is (PMA or Hemin treated K562s).
Authors’ response: We have added the annotation at the very first use of PH in the revised manuscript. In this study, PH is only the name of our RNA-Seq dataset of untreated K562 cells.
3. FIX. Figure 4 should not be split into A and B on 1 FPKM. Only one set of boxplots should be shown.
Authors’ response: Figure 4 is split into A and B at 1 FPKM to correspond to the minimum free energy panels (A and B in Figure 3). This clearly illustrates the cases of FPKM ≥ 1 and FPKM < 1.
4. OK. Actually GSM765405 was sequenced using the CSHL RNA-seq protocol not Truseq. However should be comparable.
Authors’ response: In this study, we compared the protein-coding RNAs from both datasets, which were produced with different sequencing library construction methods. We found that 2546 (87.4%) of the protein-coding RNAs in PH were also present in ENCODE, which shows that the results from the CSHL RNA-seq protocol and Truseq are comparable.
5. OK
6. OK
New minor comments
1. line 139 PhloCSF [34]. PhyloCSF
Authors’ response: We have corrected the spelling mistake in the revised version.
2. expressive abundance (FPKM). expression level or transcript abundance
Authors’ response: It has been revised according to the reviewer’s suggestion.
Reviewer #3 (First Round): Prof Manju Bansal, Indian Institute of Science, India
The authors evaluate the stability of long intergenic non-coding RNA (lincRNA) in human K562 cells using RNA-seq data. Two datasets of lincRNA are used in the study. The authors developed a pipeline to enrich lincRNA compared to that seen in the ENCODE dataset. The stability of the lincRNA in these two datasets is compared, using the mfe predicted by the secondary structure program RNAfold. The reason to carry out this comparison is not clearly explained. It is no surprise that non-coding regions have lower stability than protein-coding regions.
Authors’ response: The secondary structure of a lincRNA is important for its stability. We estimated the stabilities of overlapping and unique lincRNAs with a bioinformatic method (minimum free energy). Furthermore, our minimum free energy analysis agreed with the previous report that lincRNAs have lower stability than protein-coding RNAs. That is, it verified that the stability of lincRNAs could be assessed using minimum free energy.
Further, the dataset is divided into overlapping and unique. Stability study shows that overlapping lincRNAs are more stable than unique lincRNAs. The explanation for the observed difference is not clearly stated. I cannot understand why lincRNA identified in two datasets (overlapping) should be more stable.
Authors’ response: We showed that a large proportion of overlapping lincRNAs were more stable than the unique lincRNAs through the analysis of differences in minimum free energy distribution and lincRNA half-lives.
RNA half-life studies are carried out on few lincRNAs under each category. The findings of the RNA half-life studies, using very few samples (5-7), cannot be generalized for > 700 odd lincRNAs in each group. The conclusion that lincRNA stability may be related to function is already shown in Ref [26, 28]. Overall, the conclusions made by the authors are rather weak or already established.
Authors’ response: Because a large number of experiments would require too much time, we have added some further lincRNA half-life experiments. We have revised the conclusions and suggested that overlapping lincRNAs (relatively stable lincRNAs) and unique lincRNAs (relatively unstable lincRNAs) have different functions.
The manuscript has several grammatical mistakes, starting from the very first sentence in the Introduction section. Many paragraphs have loose statements and irrelevant information. E.g. See line 116 to 120. LincRNA and lncRNA are used interchangeably in many places.
Authors’ response: We really appreciate the reviewer’s advice. We have revised the whole paper (including introduction) and deleted some irrelevant information.
The methods section dealing with RNAfold needs to be elaborated. Figures need to have proper units and labelling, e.g. MFE units.
Authors’ response: It has been revised according to the reviewer’s suggestion.
Reviewer #3 (Second Round): Prof Manju Bansal, Indian Institute of Science, India
I had requested the author to explain the purpose of dividing the dataset into Overlapping and Unique. See earlier comment “Further, the dataset is divided into overlapping and unique. Stability study shows that overlapping lincRNAs are more stable than unique lincRNAs. The explanation for the observed difference is not clearly stated. I cannot understand why lincRNAs identified in two datasets (overlapping) should be more stable”. The author’s explanations are not satisfactory and have merely reiterated the aim of the work.
Authors’ response: The lincRNAs are considered to play very important roles in gene regulation, which shows dynamic properties in cellular processes. Therefore, determining lincRNA stability is important for annotating lincRNA functions. In this paper, we classified the lincRNAs into overlapping lincRNAs and unique lincRNAs by comparing two RNA-Seq datasets of the same K562 cell line in a Venn diagram (Figure 2). In general, it is easy to understand that the overlapping lincRNAs are more stable than the unique lincRNAs, because a lincRNA observed in two separate experiments (ENCODE and our own sequencing of K562 cells) should have a high probability of being expressed. That is, lincRNAs expressed in both experiments statistically have a higher probability of expression than lincRNAs expressed in only one experiment. We carried out RNA-Seq of the K562 cell line, compared our RNA-Seq dataset with the corresponding ENCODE dataset, and found that a large proportion of the protein-coding RNAs (86.7%) in our dataset appeared in ENCODE, but only a relatively small proportion of the lincRNAs (44.1%) did (Figure 7). We speculate that this phenomenon arises from the instability of lincRNAs during cellular processes. We separated the overlapping part (overlapping lincRNAs) and the unique part (unique lincRNAs) of the two datasets and compared their stabilities. We have shown that a large proportion of overlapping lincRNAs were more stable than the unique lincRNAs through the analysis of differences in minimum free energy distribution and lincRNA half-lives.
The authors postulate that the unique lincRNAs present in the ENCODE or PH dataset should be functionally different. This implies that the RNAs that are identified and annotated as lincRNAs by both the ENCODE and PH pipelines have similar function, and this is not true.
Authors’ response: We suggested that overlapping lincRNAs (relatively stable lincRNAs) and unique lincRNAs (relatively unstable lincRNAs) can be related to different functions, because they might be related to different cellular processes.
Further, the minimum free energy of RNA is dependent on the GC content. A comment on the GC content of the overlapping and unique lincRNAs can explain the observed difference between the groups (Figure 3). A similar observation is warranted for the half-life based stability analysis. If the GC content of the randomly selected lincRNAs in the two groups (overlapping (10 sequences) and unique (7 sequences)) is different, then the selection of lincRNAs is biased by the sequence property. Moreover, mRNA expression is known to be affected by GC content (Kudla et al., 2006, PLoS Biology). The authors can check this phenomenon by binning the lincRNAs based on GC content and correlating it with the level of expression.
Authors’ response: In previous studies, no correlation was found between lncRNA expression and stability, although a significant correlation has been found for all RNAs (e.g. Clark et al. Genome Research, 2012, 22:885-898).
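The GC-content check proposed in the reviewer's comment above could be sketched roughly as follows. It is a minimal illustration under stated assumptions, not an analysis from the manuscript: the bin edges (40%, 50%, 60%), the bin labels, and the example sequences are hypothetical choices.

```python
from bisect import bisect_right


def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_bin(seq, edges=(0.4, 0.5, 0.6)):
    """Label a sequence by GC-content bin; the edges are assumed, not from the paper."""
    labels = ("<40%", "40-50%", "50-60%", ">=60%")
    return labels[bisect_right(edges, gc_content(seq))]


# Hypothetical sequences used only to exercise the functions.
examples = {name: gc_bin(seq) for name, seq in
            {"at_rich": "ATATATATGC", "balanced": "ATGCATGCAT", "gc_rich": "GGCCGGCCAT"}.items()}
```

Grouping the overlapping and unique lincRNAs by these labels, and then comparing minimum free energy, half-life, or expression within each bin, would show whether the observed differences persist once GC content is held roughly constant.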
PH dataset: full form of the abbreviation is given in the end (Methods section).
But the first mention is in ‘Introduction’ (line 124).
Authors’ response: We have added the abbreviation of PH dataset directly in the introduction section where we first mentioned it.
FPKM: the significance and full form of FPKM are not mentioned (Line 146).
Authors’ response: We have added the full form of FPKM in revised version according to the reviewer’s suggestion.
Line 82: ‘beacause’
Line 140-141: ‘were remained to carry out’
Line 147: ‘annotatied’
Authors’ response: We have corrected the above spelling and grammar mistakes in the revised version according to the reviewer’s instruction.