Direct next-generation sequencing of virus-human mixed samples without pretreatment is favorable to recover virus genome
© Li et al. 2016
Received: 12 August 2015
Accepted: 5 January 2016
Published: 12 January 2016
Next-generation sequencing (NGS) enables the recovery of pathogen genomes from clinical samples without the need for culturing. Depletion of host/microbiota components (e.g., ribosomal RNA and poly-A RNA) and whole DNA/cDNA amplification are routine methods to improve recovery results. Using mixtures of human and influenza A virus (H1N1) RNA as a model, we found that background depletion and whole transcriptome amplification introduced biased distributions of read coverage over the H1N1 genome, thereby hampering genome assembly. Influenza serotyping was also affected by pretreatments. We propose that direct sequencing of noncultured samples without pretreatment is a favorable option for pathogen genome recovery applications.
This article was reviewed by Sebastian Maurer-Stroh.
Keywords: Pathogen genome recovery; Nonculture; Background depletion; Whole transcriptome amplification; Next-generation sequencing
Pathogen identification is a critical clinical application [1–3]. Identification methods based on culture have disadvantages, such as long turnaround time, increased biohazard risks, and culture bias. The high-throughput nature of NGS enables the recovery of pathogen genomes from noncultured samples, and offers the potential for highly accurate pathogen identification and rapid clinical diagnoses [4–12]. Many researchers have reported the NGS-based identification of pathogens from various noncultured samples [13–21], such as Old World arenavirus (brain and other samples), influenza virus (nasopharyngeal aspirate), norovirus (feces), dengue virus, yellow fever virus (serum), Shiga-toxigenic Escherichia coli O104:H4 (feces), and most recently, Ebola virus (serum and other samples) [13–16].
Two major challenges must be overcome when we seek to recover pathogen genomes from noncultured samples: noise from host and/or microbiota cells, and limited availability of DNA/RNA. Consequently, two pretreatments are usually employed before sequencing noncultured samples: background depletion (BD) to increase the signal-to-noise ratio [22, 23], and purportedly unbiased amplification to increase the amount of available nucleic acid to meet the requirements of NGS library preparation [24, 25]. Despite these benefits, how these pretreatments influence pathogen genome recovery during the sequencing of pathogenic DNA/RNA from noncultured samples has not been fully investigated.
Effects of pretreatments on influenza virus identification
We applied different pretreatments (BD with or without Whole Transcriptome Amplification, abbreviated as WTA) to mixtures of human RNA and influenza A (H1N1) virus RNA, as a noncultured model system, and applied NGS to evaluate the effects of pretreatments on influenza genome recovery (Additional file 1: Figure S1). The four sample pretreatments were as follows: (1) BD, (2) WTA, (3) BD + WTA, and (4) no pretreatment. Effects of amplification time (2 or 8 h) and viral ratio (0.55 or 1.5 % viral RNA within RNA mixtures) were examined. NGS libraries were constructed from samples with different pretreatments. We obtained 12 gigabases of sequence data. After quality control and removal of human reads, the remaining reads were aligned to a dataset consisting of 246,715 flu genome sequences (Additional file 2) for influenza read identification and serotyping.
WTA for 8 h, with or without BD, remarkably decreased the influenza ratio (0.05 % or almost 0). For samples with an expected viral proportion of 1.5 %, we observed comparable influenza ratios of about 0.57 % for the no-pretreatment and BD + 2-h WTA pretreatment. As BD increased the influenza ratio while WTA decreased it, we hypothesized that there was a trade-off for viral detection between BD and WTA, and that the effects were in equilibrium when WTA was 2 h.
Next, we examined the effects of different pretreatments on influenza A viral serotyping. Most influenza reads with these pretreatments were aligned to segments from the H1N1 serotype (Fig. 1c and d). Reads aligned to other serotypes could be explained by interstrain sequence homology. However, read distributions over the eight RNA segments were also biased by the four treatments (Fig. 1c and d). Although BD could increase influenza ratios, this benefit came at the cost of biased distributions compared to the distribution of the sample without pretreatment. WTA further exaggerated the bias among different segments. When we focused on HA/NA segments, except for the BD + 8-h WTA pretreatment, which produced almost no influenza reads, pretreatments consistently produced remarkable enrichments of H1N1-aligned reads (Additional file 5: Figure S3). This enrichment was observed even for the 8-h WTA pretreatment (without BD), even though this pretreatment markedly reduced influenza ratios and caused biased segment distribution.
Genome recovery efficiency
With an optimized bioinformatics pipeline, influenza-aligned reads were de novo assembled, and assembly contigs were re-aligned to the whole set of flu genome sequences. The reference genome of H1N1 strain A/Changchun/01/2009(H1N1) was aligned with the highest sequence similarity, with eight single nucleotide variations identified and validated by Sanger sequencing (Additional file 6: Table S2, Additional file 7: Table S3). Nearly complete H1N1 genomes were recovered for all pretreatments except BD + 8-h WTA, and the best alignments were all assigned to H1N1. Thus, at both the NGS read and assembly levels, pretreatments did not affect accurate serotyping under conditions that produced sufficient influenza reads.
Table: Genome de novo assembly (cell values not preserved in this copy)
Columns, repeated for three column groups: Contig total size (bp); Genome coverage (%); H1 + N1 coverage (%)
Rows: BD (0.55 %); No pretreatment (0.55 %); BD + 8-h WTA (0.55 %); 8-h WTA (0.55 %); BD + 2-h WTA (1.50 %); No pretreatment (1.50 %)
Next, we randomly resampled the influenza-aligned reads in subsets of increasing size, and examined how assembly size varied with read number (Fig. 1f). As the read number increased, the samples without pretreatment showed more rapid growth of H1N1 genome coverage than samples with BD and/or WTA pretreatments. Without pretreatment, about 400 reads could produce an 80 % recovery, whereas about 2000 reads were required for BD treatment. Thus, although BD allowed a higher influenza-aligned read ratio, this benefit was offset by decreased assembly efficiency. Pretreatment with WTA (with or without BD) also clearly reduced the H1N1 genome recovery rate.
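The resampling scheme can be sketched in a few lines of Python. This is only an illustration, not the authors' script: the function name subsample_schedule, the fixed seed, and the placeholder read identifiers are assumptions, while the step size of 100 and the 10 repeats follow the description in the methods. Each subset would then be passed to the assembler.

```python
import random

def subsample_schedule(reads, step=100, repeats=10, seed=0):
    """Yield (size, replicate, subset) for random subsets of increasing size.

    Hypothetical helper: 'step' and 'repeats' mirror the step size and the
    number of repetitions described in the text; 'seed' is an assumption
    added here only to make the sketch reproducible.
    """
    rng = random.Random(seed)
    for size in range(step, len(reads) + 1, step):
        for rep in range(repeats):
            # Sample without replacement so that no read is drawn twice.
            yield size, rep, rng.sample(reads, size)

# Placeholder read identifiers standing in for influenza-aligned reads.
reads = [f"read_{i}" for i in range(350)]
plan = list(subsample_schedule(reads, step=100, repeats=10))
```

With 350 placeholder reads and a step of 100, the schedule contains subsets of 100, 200 and 300 reads, each drawn 10 times.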
Taken together, direct sequencing of extracted RNA (no pretreatment) provided the best efficacy in recovering H1N1 genomes. Under clinical conditions, the amount of recovered RNA after host removal (without amplification) could be insufficient for NGS library preparation. Moreover, host BD induced bias of NGS read alignment over the viral genome, and thus affected the assembly. On the other hand, WTA increased the total available cDNA but reduced the viral ratio, resulting in reduced sensitivity to detect viral reads, especially with overamplification (8-h WTA), which significantly depleted the viral fraction. Direct sequencing does not require the extra preprocessing steps needed for BD, WTA and many other available methods [22–33], which means fewer experimental procedures, decreased cost, lower technical error rates, and decreased turnaround time. Thus, we propose that direct sequencing without pretreatment is sometimes the optimal solution. These findings will provide input for further studies and clinical implementation.
All experiments were approved by the Animal Ethics Committee of the Beijing Institute of Radiation Medicine, in accordance with the regulations of the Beijing Administration Office of Laboratory Animals; no patients were involved in the study. Total human RNA was extracted from alveolar adenocarcinoma A549 cells with Invitrogen Trizol Reagent (Life Technologies) and quantified by Qubit 2.0 (Life Technologies). Influenza A virus (A/Changchun/01/2009(H1N1), 13,632 bp) RNA was isolated with the QIAamp Viral RNA Mini Kit (Qiagen) and quantified by quantitative real-time PCR (qRT-PCR) with the ABI 7500 PCR system (Applied Biosystems, Inc.) after reverse transcription. Host RNA background depletion (BD) was performed by using an rRNA-hybridization magnetic bead method with the RiboMinus Eukaryote Kit for RNA-Seq (RiboMinus Concentration Module, Life Technologies), followed by magnetic beads conjugated to oligo(dT) primers (Illumina) to remove poly(A)-tailed transcripts. WTA was performed by using the QuantiTect Whole Transcriptome Kit (Qiagen). For samples not requiring amplification, the first- and second-strand cDNA were generated by using High-Capacity cDNA Reverse Transcription Kits (Applied Biosystems) and the NEBNext mRNA Second-Strand Synthesis Module (New England Biolabs). After purification by the Zymo Purification Kit (Zymo Research), double-stranded DNA (dsDNA) was quantified by Qubit 2.0. DNA inputs of 1 ng were used for multiplex NGS library generation with the Nextera XT DNA Sample Preparation Kit (Illumina). NGS was performed with an Illumina MiSeq platform to generate 150- or 250-bp paired-end reads. All high-quality sequence read data have been submitted to the NCBI Sequence Read Archive (accession number SRP059219). Raw NGS reads were filtered with quality cutoffs of at least 50 % of read bases with quality of Q20 or better, fewer than 10 % N bases, and fewer than 14 continuous N bases.
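The three read-level cutoffs can be expressed as a small predicate. The following is a minimal sketch rather than the filtering tool actually used, which is not named in the text. The function name passes_quality_filter and its parameter names are hypothetical, and quality scores are assumed to be already decoded into integer Phred values so that no FASTQ encoding offset has to be guessed.

```python
import re

def passes_quality_filter(seq, quals, min_q=20, min_q_frac=0.5,
                          max_n_frac=0.10, max_n_run=14):
    """Return True if a read meets the three cutoffs described in the text.

    seq   -- read bases as a string
    quals -- per-base Phred scores as integers, same length as seq
             (assumed to be decoded already; the encoding is not given)
    """
    if not seq or len(seq) != len(quals):
        return False
    n = len(seq)
    bases = seq.upper()
    # Cutoff 1: at least 50 % of bases at Q20 or better.
    if sum(q >= min_q for q in quals) / n < min_q_frac:
        return False
    # Cutoff 2: fewer than 10 % N bases.
    if bases.count("N") / n >= max_n_frac:
        return False
    # Cutoff 3: fewer than 14 consecutive N bases.
    longest_n_run = max((len(run) for run in re.findall(r"N+", bases)), default=0)
    return longest_n_run < max_n_run
```

For example, passes_quality_filter("ACGT" * 10, [30] * 40) accepts the read, while the same bases with all scores at Q10 are rejected.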
Reads were first mapped to the human genome (hg19), and the unaligned reads were then aligned to a dataset including reference genomes of Mycoplasma (313 sequences, NCBI genome database), bacteria (3022 sequences, NCBI genome database), flu (246,715 sequences, EpiFlu, http://platform.gisaid.org and NCBI Nucleotide database, Additional file 2), other viruses (1,757,357 sequences, NCBI genome database), and the whole NCBI nucleotide (nt) database by using Bowtie2 (v2.1.0) in the end-to-end, paired-end mode and BLASTn. Metagenomics analysis was carried out by using the PathSeq pipeline and Kraken. De novo assembly was carried out by using Trinity, IDBA-UD and Velvet (v1.2.10). For Velvet and IDBA-UD assembly in particular, k-mer lengths were scanned from 9 to 123, and the lengths giving the largest N50 were selected. Assembly contigs were aligned to reference segments by using BLASTn with a required E-value of less than 10^-5. With the median site sequencing depth (denoted as D) for a sample as a baseline, regions with sequencing depth between 50–150 % D, < 50 % D and > 150 % D were defined as uniform, missed and over-amplified regions, respectively. Nucleotide motif discovery was performed by using MEME Suite 4.10.2 and FIMO (E-value < 10^-4) on missed and over-amplified regions for each sample with pretreatment. Influenza-aligned reads were randomly sampled at a step size of 100 or 1000 and then assembled by Velvet; the sampling was repeated 10 times.
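The depth-based definition of uniform, missed and over-amplified regions can be illustrated with a short Python sketch. It assumes per-site depths are already available as a list of numbers. The function names classify_depth and uniformity are hypothetical, and the treatment of sites lying exactly on a threshold (counted here as uniform) is an assumption, since the text does not specify it.

```python
from statistics import median

def classify_depth(depths, low=0.5, high=1.5):
    """Label each site relative to the median depth D of the sample.

    Sites below low * D are 'missed', sites above high * D are
    'over-amplified', and everything in between is 'uniform'.
    """
    d = median(depths)
    labels = []
    for depth in depths:
        if depth < low * d:
            labels.append("missed")
        elif depth > high * d:
            labels.append("over-amplified")
        else:
            labels.append("uniform")
    return labels

def uniformity(depths, low=0.5, high=1.5):
    """Fraction of sites whose depth falls in the uniform band."""
    labels = classify_depth(depths, low, high)
    return labels.count("uniform") / len(labels)

# Toy per-site depths; the median D is 10, so the band is 5 to 15.
example = [10, 10, 10, 4, 20, 10, 15, 5]
example_labels = classify_depth(example)
example_uniformity = uniformity(example)
```

Passing other values for low and high (for instance 0.4 and 1.6) gives the alternative bands against which the same classification can be checked.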
Reviewer’s report: Sebastian Maurer-Stroh (Bioinformatics Institute, A*STAR, Singapore)
The big question to me is: if the ratio of influenza reads even after background RNA depletion is so small (<2 %), where are all the other reads from? Incompletely removed host RNA or Bacteria and their phages? Sending these reads through a metagenomics pipeline (e.g. Kraken) may be an interesting idea to follow this up, possibly in future (a word of caution: viral metagenomics remains a challenging task; in my own experience, different methods can find different viruses in supposedly single virus samples).
From the viro-biological point of view, A549 cells, although commonly used to study influenza virus host interactions, are not the best cells to get high viral titres for example compared to MDCK cells but this is not a problem for this study where a challenging setup is anyways appreciated.
From the Bioinformatics software view, Trinity and Velvet for assembly may not be ideal depending on the k-mer length relative to the gap size. I would also try IDBA-UD which simultaneously uses long and short k-mer lengths but in this case there may not be much difference in the conclusions.
Another follow-up or extension of this work would be to statistically analyse both missed and overamplified nucleotide motifs with the different approaches to potentially get ideas how to unbias pretreatment methods better in future.
Author’s response: Thanks for the comment. First, we employed the concept of uniformity to determine missed and over-amplified regions. In detail, with the median site sequencing depth (denoted as D) for a sample as a baseline, we selected the region with sequencing depth between 50–150 % D as the uniform region, whose proportion of the genome was the uniformity. The missed and over-amplified regions were defined by site depth < 50 % D and > 150 % D, respectively. It should be noted that we also examined uniformity with other thresholds (i.e., 40–160 % D or 80–120 % D), and the samples without pretreatment consistently had the highest uniformity compared with those with BD and/or WTA (data not shown). Then, by using the MEME Suite 4.10.2, we performed nucleotide motif discovery separately on missed and over-amplified regions for each sample with pretreatment. The discovered motifs were re-aligned to the H1N1 genome by FIMO (E-value < 10^-4), and their occurrences on the whole genome and in missed/over-amplified regions were both obtained. Finally, we selected 10 motifs significantly enriched in missed or over-amplified regions (Fisher’s exact test, p < 0.05) in the three samples with BD and/or WTA pretreatment, which are shown in Additional file 9. We hope this result could be a hint to improve pretreatment methods in the future.
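As an illustration of the enrichment step, the sketch below computes a one-sided Fisher's exact test as a hypergeometric upper tail. The response does not state how the contingency table was built or whether the test was one- or two-sided, so both choices here are assumptions, as are the function name fisher_enrichment_p and its parameter names.

```python
from math import comb

def fisher_enrichment_p(k_in, n_in, k_total, n_total):
    """One-sided Fisher's exact test for enrichment (hypergeometric upper tail).

    k_in    -- motif occurrences inside the region of interest
    n_in    -- candidate positions inside the region
    k_total -- motif occurrences over the whole genome
    n_total -- candidate positions over the whole genome

    Returns the probability of observing at least k_in occurrences in the
    region if the k_total occurrences were placed at random.
    """
    max_k = min(k_total, n_in)
    denominator = comb(n_total, n_in)
    tail = sum(comb(k_total, k) * comb(n_total - k_total, n_in - k)
               for k in range(k_in, max_k + 1))
    return tail / denominator

# Toy counts: all 4 occurrences fall within a 4-position region of 8 positions.
p_toy = fisher_enrichment_p(4, 4, 4, 8)
```

In the toy case the p-value is 1/70, which would pass a p < 0.05 threshold.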
Many thanks for responding to my comments in detail and adding several further analyses that were needed to interpret the results better. However, with more results available it is now clear that there is a big problem which may be challenging to be resolved. While checking some of the results for the missed motifs after background depletion in new Additional file 6: Table S2 I noticed that the identified sequence motifs appear to match to A/California/07/2009(H1N1) [the H1N1 from the 2009 swine flu pandemic] rather than A/FM/1/47(H1N1) [an old reference H1N1 strain from 1947] which was mentioned to have been used in the method section. As you should know, there are several very different H1N1 strains known. Adding to the confusion, the associated SRA accession at NCBI is annotated taxonomically suggesting the virus is a mouse-adapted version “Influenza A virus (A/Fort Monmouth/1/1947-mouse adapted(H1N1))” for which no complete reference genome exists in the databases (only some segments). To get a clearer picture, I downloaded and reanalyzed your raw data (assembly and metagenomics for the 1.5 % no treatment SRR2054788 and 1.5 % double treated SRR2054787 sample, respectively). The influenza virus in your samples is in fact a recent H1N1 pdm09 virus (it is most similar to A/Changchun/01/2009(H1N1)), so your method description and the taxonomy annotation submitted to NCBI is wrong. Consequently, the coverage results (Table 1, Fig. 1e) etc require to use a matching genome to be accurate (and all database accessions of used references need to be properly listed). Furthermore, metagenomics analysis suggests a clear contamination with Mycoplasma for both reanalyzed samples which makes up the majority of non-host reads (metagenomics was checked with consensus from gottcha, mini-kraken, metaphlan and bwa readmapping to make sure it is not a spurious result, curious that your analysis with PathSeq did not pick this up). 
It may have to be established on clean cells that the effects with and without treatment are not influenced by the dominance of Mycoplasma reads or fully characterize its presence and include and discuss it as additional factor inherent to the existing data and analysis. Obviously, with the wrong strains mentioned, potentially wrong references used for analysis and serious undeclared cell contamination this work is not up to any scientific standards for publication. Nevertheless, the basic idea of the work is still good and the principal conclusions may not be much affected after all but it is of critical importance to provide accurate descriptions of the experiments to ensure correctness and reproducibility of the results.
Author’s response: Thank you for reviewing our manuscript again. We are very grateful that you pointed out the mistakes we failed to notice. According to your comments, we have checked and confirmed the virus strain (A/Changchun/01/2009(H1N1)) by using Sanger sequencing. We have re-performed all calculations in this study with updated reference datasets, and updated the corresponding results. Your question on mycoplasma contamination is important. We had actually found it in our samples through PathSeq analysis, but did not pay enough attention and categorized it as a component of “others”. We apologize for this inappropriate judgment, and have carefully analyzed the presence of the mycoplasma. We have added descriptions in the manuscript and additional files to fully characterize the presence of mycoplasma. The results based on the new calculations and analyses show that the principal conclusions of this study remain unaffected. Please see the detailed report and also review the revised manuscript.
We made a mistake about the information on the H1N1 strain used in this study, and we are very grateful that the reviewer pointed it out. The strain has been confirmed to be A/Changchun/01/2009(H1N1) rather than A/FM/1/47(H1N1). We have designed PCR primers (Additional file 6: Table S2) and sequenced the full genome of the strain we used by Sanger sequencing. The sequences obtained were consistent with the assembly based on NGS results, and we aligned them to the reference genome of strain A/Changchun/01/2009(H1N1) (accession No. JN032403–JN032410, NCBI Nucleotide database) and identified eight single nucleotide variations (Additional file 7: Table S3).
We have corrected the taxonomy annotation of the sequencing data submitted to the NCBI SRA, and re-performed the whole computation of this study. In detail, as the influenza reference dataset downloaded from EpiFlu does not contain the strain A/Changchun/01/2009(H1N1), we first updated the reference dataset with 118,955 more sequences from the NCBI Nucleotide database (Additional file 2). Then, we removed human-aligned reads, and aligned the remaining NGS reads to the new reference dataset. Based on influenza-aligned reads, we re-performed serotyping and statistical analyses as well as de novo assembly, and we found that the results were nearly unchanged, and the conclusions were consistent with those in the previous version of the manuscript. The assemblies were also aligned to the new reference dataset, and we found that the reference genome of highest similarity was from the strain A/Changchun/01/2009(H1N1). (The reference we used in the previous version of the manuscript was A/New York/NHRC0003/2009(H1N1), which has 34 single nucleotide mismatches with the reference of strain A/Changchun/01/2009(H1N1).) Finally, with the reference genome of A/Changchun/01/2009(H1N1), we evaluated the assembly statistics such as coverage and sequencing evenness again, and the results also remained nearly the same. Moreover, in theory, with enough sequencing depth and sufficient reference datasets that contain highly homologous sequences from other strains, bioinformatics analyses and results would not depend on the reference genome. Therefore, we suggest that the wrong strain information and reference genome might not affect the conclusion of this study.
We present here the investigation of how we mistook the strain information. Our laboratory had both of the strains while we performed the experiment. We received an RNA sample extracted from strain A/Changchun/01/2009(H1N1) from our colleagues, but we were informed of the wrong strain name, A/FM/1/47(H1N1). Unfortunately, the flu reference genome dataset we used (EpiFlu, Additional file 2) did not include genome sequences of A/Changchun/01/2009(H1N1) (which are available in the NCBI Nucleotide Database). The most similar strain when we aligned the assembly to the reference dataset was A/New York/NHRC0003/2009(H1N1) (genome similarity: 99.5 to 99.9 % for each segment), and we used it as a reference to evaluate viral genome recovery. While we focused on the efficiency of genome recovery, we did not notice that the reference genome used was not from the alleged A/FM/1/47(H1N1). We apologize for the mistake we made, and thank the reviewer again for pointing it out.
We have re-analyzed the missed and over-amplified nucleotide motifs based on the correct reference genome of A/Changchun/01/2009(H1N1). Compared with the previous result (based on strain A/New York/NHRC0003/2009(H1N1)), the identified motifs exhibited some differences, while three motifs presented in new Additional file 9: Table S5 were the same as the previous. The relevant description in text and Additional file 9 have been revised.
We agree that there is mycoplasma contamination. Actually, we had observed mycoplasma content in the analysis by using PathSeq, but we assigned the mycoplasma to the “others” category (Additional file 3: Figure S2, previous version of revised manuscript). At that time, we thought that mycoplasma was commonly found in cultured cell lines and might not need to be specifically addressed, as the main focus of this study is viral pathogens. We admit that this was an inappropriate judgment, and that we should fully characterize the presence of mycoplasma as the reviewer suggested. We have revised Additional file 3: Figure S2 to show the detailed distributions of species based on NGS read alignments in this study. In particular, ratios of mycoplasma-aligned reads are shown in new Additional file 4. The ratios were obtained both by aligning NGS reads (after removal of the host-aligned reads) to a dataset composed of 313 mycoplasma genome sequences and by metagenomics analyses (Additional file 2). Among these samples, mycoplasma-aligned reads account for 0.14 to 1.8 % of total reads (average 0.97 %), except for the BD + 2-h WTA (1.5 %) sample, whose mycoplasma ratio reached 5.05 %. We speculate that the high Mycoplasma-aligned ratio could be mainly ascribed to the pretreatments.
A recent paper by Anthony O. Olarerin-George and John B. Hogenesch reported a large-scale analysis of RNA-seq data from 9395 rodent and primate samples from 884 series, and found that 11 % of the series with cultured samples were contaminated by mycoplasma (Assessing the prevalence of mycoplasma contamination in cell culture via a survey of NCBI’s RNA-seq archive, Nucleic Acids Research, 2015, 43, 2535). The contamination ratios ranged from 0.01 to 14.43 % (mean = 1.44 %, median = 2.15 %), while the top 20 series with the highest mycoplasma read ratios include series published in top peer-reviewed journals such as Nature, Cell, PNAS, Genome Research, RNA and Nucleic Acids Research. Another important result of their investigation is the identification of 61 host genes significantly associated with mycoplasma-mapped read counts. In our study, we built model samples by mixing RNA from a human cell line and an H1N1 strain, and focused on viral genome recovery. Therefore, instead of gene expression, we care more about valid extraction of viral RNA and the read count occupied by other microorganisms such as mycoplasma. According to the results, we obtained valid H1N1-aligned reads in five samples, and the variations in H1N1 ratio could be mostly attributed to different pretreatments rather than mycoplasma contamination. On the other hand, compared with our mixture model, clinical specimens such as serum and oral swabs would be more complex due to much greater heterogeneity of genomes in total RNA/DNA. Mycoplasma is also prevalent in clinical samples, and the capability to identify viral pathogens by NGS in samples containing mycoplasma or other microorganisms is necessary.
To sum up, we have fully characterized the presence of mycoplasma in our samples in the revised manuscript (Finding sections, highlighted in yellow, Additional files 2, 3 and 4), and we suggest that the contamination of mycoplasma would not affect the genome recovery of the viral genome.
Abbreviations: WTA, whole transcriptome amplification; CV, coefficient of variation
This work was supported by Major Research plan of the National Natural Science Foundation of China (Grant No. U1435222), China Mega-Project on Major Drug Development (No. 2013ZX09304101), and Program of International S & T Cooperation (No. 2014DFB30020).
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Yang X, Charlebois P, Gnerre S, Coole MG, Lennon NJ, Levin JZ, et al. De novo assembly of highly diverse viral populations. BMC Genomics. 2012;13:475. doi:10.1186/1471-2164-13-475.
- Kupferschmidt K. Epidemiology. Outbreak detectives embrace the genome era. Science. 2011;333(6051):1818–9. doi:10.1126/science.333.6051.1818.
- Jin DZ, Wen SY, Chen SH, Lin F, Wang SQ. Detection and identification of intestinal pathogens in clinical specimens using DNA microarrays. Mol Cell Probes. 2006;20(6):337–47. doi:10.1016/j.mcp.2006.03.005.
- Kapgate SS, Barbuddhe SB, Kumanan K. Next generation sequencing technologies: tool to study avian virus diversity. Acta Virol. 2015;59(1):3–13.
- Pallen MJ, Loman NJ, Penn CW. High-throughput sequencing and clinical microbiology: progress, opportunities and challenges. Curr Opin Microbiol. 2010;13(5):625–31. doi:10.1016/j.mib.2010.08.003.
- Gilchrist CA, Turner SD, Riley MF, Petri Jr WA, Hewlett EL. Whole-Genome Sequencing in Outbreak Analysis. Clin Microbiol Rev. 2015;28(3):541–63. doi:10.1128/CMR.00075-13.
- Bexfield N, Kellam P. Metagenomics and the molecular identification of novel viruses. Vet J. 2011;190(2):191–8. doi:10.1016/j.tvjl.2010.10.014.
- Su Z, Ning B, Fang H, Hong H, Perkins R, Tong W, et al. Next-generation sequencing and its applications in molecular diagnostics. Expert Rev Mol Diagn. 2011;11(3):333–43. doi:10.1586/erm.11.3.
- Radford AD, Chapman D, Dixon L, Chantrey J, Darby AC, Hall N. Application of next-generation sequencing technologies in virology. J Gen Virol. 2012;93(Pt 9):1853–68. doi:10.1099/vir.0.043182-0.
- Barzon L, Lavezzo E, Militello V, Toppo S, Palu G. Applications of next-generation sequencing technologies to diagnostic virology. Int J Mol Sci. 2011;12(11):7861–84. doi:10.3390/ijms12117861.
- Capobianchi MR, Giombini E, Rozera G. Next-generation sequencing technology in clinical virology. Clin Microbiol Infect. 2013;19(1):15–22. doi:10.1111/1469-0691.12056.
- Barzon L, Lavezzo E, Costanzi G, Franchin E, Toppo S, Palu G. Next-generation sequencing technologies in diagnostic virology. J Clin Virol. 2013;58(2):346–50. doi:10.1016/j.jcv.2013.03.003.
- Gire SK, Goba A, Andersen KG, Sealfon RS, Park DJ, Kanneh L, et al. Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak. Science. 2014;345(6202):1369–72. doi:10.1126/science.1259657.
- Baize S, Pannetier D, Oestereich L, Rieger T, Koivogui L, Magassouba N, et al. Emergence of Zaire Ebola virus disease in Guinea. N Engl J Med. 2014;371(15):1418–25. doi:10.1056/NEJMoa1404505.
- Maganga GD, Kapetshi J, Berthet N, Kebela Ilunga B, Kabange F, Mbala Kingebeni P, et al. Ebola virus disease in the Democratic Republic of Congo. N Engl J Med. 2014;371(22):2083–91. doi:10.1056/NEJMoa1411099.
- Meyers L, Frawley T, Goss S, Kang C. Ebola virus outbreak 2014: clinical review for emergency physicians. Ann Emerg Med. 2015;65(1):101–8. doi:10.1016/j.annemergmed.2014.10.009.
- Palacios G, Druce J, Du L, Tran T, Birch C, Briese T, et al. A new arenavirus in a cluster of fatal transplant-associated diseases. N Engl J Med. 2008;358(10):991–8. doi:10.1056/NEJMoa073785.
- Nakamura S, Yang CS, Sakon N, Ueda M, Tougan T, Yamashita A, et al. Direct metagenomic detection of viral pathogens in nasal and fecal specimens using an unbiased high-throughput sequencing approach. PLoS One. 2009;4(1):e4219. doi:10.1371/journal.pone.0004219.
- Yozwiak NL, Skewes-Cox P, Stenglein MD, Balmaseda A, Harris E, DeRisi JL. Virus identification in unknown tropical febrile illness cases using deep sequencing. PLoS Negl Trop Dis. 2012;6(2):e1485. doi:10.1371/journal.pntd.0001485.
- McMullan LK, Frace M, Sammons SA, Shoemaker T, Balinandi S, Wamala JF, et al. Using next generation sequencing to identify yellow fever virus in Uganda. Virology. 2012;422(1):1–5. doi:10.1016/j.virol.2011.08.024.
- Loman NJ, Constantinidou C, Christner M, Rohde H, Chan JZ, Quick J, et al. A culture-independent sequence-based metagenomics approach to the investigation of an outbreak of Shiga-toxigenic Escherichia coli O104:H4. JAMA. 2013;309(14):1502–10. doi:10.1001/jama.2013.3231.
- Morlan JD, Qu K, Sinicropi DV. Selective depletion of rRNA enables whole transcriptome profiling of archival fixed tissue. PLoS One. 2012;7(8):e42882. doi:10.1371/journal.pone.0042882.
- Matranga CB, Andersen KG, Winnicki S, Busby M, Gladden AD, Tewhey R, et al. Enhanced methods for unbiased deep sequencing of Lassa and Ebola RNA viruses from clinical and biological samples. Genome Biol. 2014;15(11):519. doi:10.1186/PREACCEPT-1698056557139770.
- Ma Z, Lee RW, Li B, Kenney P, Wang Y, Erikson J, et al. Isothermal amplification method for next-generation sequencing. Proc Natl Acad Sci U S A. 2013;110(35):14320–3. doi:10.1073/pnas.1311334110.
- Hoeijmakers WA, Bartfai R, Francoijs KJ, Stunnenberg HG. Linear amplification for deep sequencing. Nat Protoc. 2011;6(7):1026–36. doi:10.1038/nprot.2011.345.
- Adiconis X, Borges-Rivera D, Satija R, DeLuca DS, Busby MA, Berlin AM, et al. Comparative analysis of RNA sequencing methods for degraded or low-input samples. Nat Methods. 2013;10(7):623–9. doi:10.1038/nmeth.2483.
- Dabney J, Meyer M. Length and GC-biases during sequencing library amplification: a comparison of various polymerase-buffer systems with ancient and modern DNA sequencing libraries. Biotechniques. 2012;52(2):87–94. doi:10.2144/000113809.
- Malboeuf CM, Yang X, Charlebois P, Qu J, Berlin AM, Casali M, et al. Complete viral RNA genome sequencing of ultra-low copy samples by sequence-independent amplification. Nucleic Acids Res. 2013;41(1):e13. doi:10.1093/nar/gks794.
- Pan X, Durrett RE, Zhu H, Tanaka Y, Li Y, Zi X, et al. Two methods for full-length RNA sequencing for low quantities of cells and single cells. Proc Natl Acad Sci U S A. 2013;110(2):594–9. doi:10.1073/pnas.1217322109.
- Batty EM, Wong TH, Trebes A, Argoud K, Attar M, Buck D, et al. A modified RNA-Seq approach for whole genome sequencing of RNA viruses from faecal and blood samples. PLoS One. 2013;8(6):e66129. doi:10.1371/journal.pone.0066129.
- Kozarewa I, Ning Z, Quail MA, Sanders MJ, Berriman M, Turner DJ. Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G + C)-biased genomes. Nat Methods. 2009;6(4):291–5. doi:10.1038/nmeth.1311.
- Oyola SO, Otto TD, Gu Y, Maslen G, Manske M, Campino S, et al. Optimizing Illumina next-generation sequencing library preparation for extremely AT-biased genomes. BMC Genomics. 2012;13:1. doi:10.1186/1471-2164-13-1.
- Kozarewa I, Turner DJ. Amplification-free library preparation for paired-end Illumina sequencing. Methods Mol Biol. 2011;733:257–66. doi:10.1007/978-1-61779-089-8_18.
- Yu Z, Cheng K, Sun W, Zhang X, Li Y, Wang T, et al. A PB1 T296R substitution enhance polymerase activity and confer a virulent phenotype to a 2009 pandemic H1N1 influenza virus in mice. Virology. 2015;486:180–6. doi:10.1016/j.virol.2015.09.014.
- Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357–9. doi:10.1038/nmeth.1923.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–10. doi:10.1016/S0022-2836(05)80360-2.
- Kostic AD, Ojesina AI, Pedamallu CS, Jung J, Verhaak RG, Getz G, et al. PathSeq: software to identify or discover microbes by deep sequencing of human tissue. Nat Biotechnol. 2011;29(5):393–6. doi:10.1038/nbt.1868.
- Wood DE, Salzberg SL. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 2014;15(3):R46. doi:10.1186/gb-2014-15-3-r46.
- Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29(7):644–52. doi:10.1038/nbt.1883.
- Peng Y, Leung HC, Yiu SM, Chin FY. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics. 2012;28(11):1420–8. doi:10.1093/bioinformatics/bts174.
- Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008;18(5):821–9. doi:10.1101/gr.074492.107.
- Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, et al. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 2009;37(Web Server issue):W202–8. doi:10.1093/nar/gkp335.
- Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011;27(7):1017–8. doi:10.1093/bioinformatics/btr064.