Schematic diagram of the procedures used to "bin" the classes of metagenomic data. The query genome, in this case HTCC1062, is represented on the x-axis. A) TBLASTN of protein sequences from the query genome against metagenomic data. "Homologous fragments" were defined as fragments of metagenomic data with expect scores of 1 × 10⁻¹⁰ or better to genes from the query genome. B) "Homologous fragments with synteny" contain homologs in the same gene order as the query genome, with as many as 5 gene gaps (gene deletions) allowed. C) Best-hit test. Fragments of metagenomic data pass the test if the nucleotide sequence of the fragment gene yields the corresponding query gene as the best hit in a BLASTX search of the NCBI nr database. D) The position of the fragment on the vertical axis corresponds to the average amino acid identity score of all the genes on the fragment.
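The binning steps in panels A to D can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `GeneHit` record, the function names, and the input layout are hypothetical, and the BLAST searches themselves are assumed to have been run already. Three further assumptions are made where the caption leaves room for interpretation: "5 gene gaps" is read as up to 5 query genes skipped between neighbouring homologs, a fragment in reverse orientation still counts as syntenic, and a fragment passes the best-hit test only if every homologous gene on it passes.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional

E_VALUE_CUTOFF = 1e-10   # from the caption: 1 x 10^-10 or better
MAX_SKIPPED_GENES = 5    # from the caption: as many as 5 gene gaps


@dataclass
class GeneHit:
    """Hypothetical record for one gene on a metagenomic fragment."""
    query_index: int         # position of the matched gene in the query genome's gene order
    evalue: float            # TBLASTN expect score against that query gene
    identity: float          # percent amino acid identity to that query gene
    query_is_best_hit: bool  # True if the BLASTX search of nr returned the query gene first


def has_synteny(homologs: List[GeneHit]) -> bool:
    """Check gene order for hits listed in their order along the fragment."""
    if len(homologs) < 2:
        return False  # assumption: gene order needs at least two homologs
    steps = [b.query_index - a.query_index for a, b in zip(homologs, homologs[1:])]
    limit = MAX_SKIPPED_GENES + 1  # a step of 1 means adjacent genes, no gap
    forward = all(1 <= s <= limit for s in steps)
    reverse = all(-limit <= s <= -1 for s in steps)  # assumption: either strand is accepted
    return forward or reverse


def classify_fragment(hits: List[GeneHit]) -> dict:
    """Assign one fragment to the bins described in panels A to D."""
    # Panel A: keep only hits at or below the expect-score cutoff.
    homologs = [h for h in hits if h.evalue <= E_VALUE_CUTOFF]
    identity: Optional[float] = (
        mean([h.identity for h in homologs]) if homologs else None
    )
    return {
        "homologous": bool(homologs),                      # panel A
        "synteny": has_synteny(homologs),                  # panel B
        "best_hit": bool(homologs)
        and all(h.query_is_best_hit for h in homologs),    # panel C
        "mean_identity": identity,                         # panel D (vertical axis)
    }


# Made-up fragment: three genes pass the cutoff, the fourth (E = 1e-3) does not.
example = [
    GeneHit(query_index=10, evalue=1e-30, identity=72.0, query_is_best_hit=True),
    GeneHit(query_index=11, evalue=1e-15, identity=64.0, query_is_best_hit=True),
    GeneHit(query_index=14, evalue=1e-12, identity=80.0, query_is_best_hit=True),
    GeneHit(query_index=40, evalue=1e-3, identity=30.0, query_is_best_hit=False),
]
result = classify_fragment(example)
```

In this made-up example the weak fourth hit is discarded before the synteny and best-hit checks, so it affects neither the gene-order test nor the average identity. If the gap rule is instead meant as a total of 5 gaps per fragment, only `has_synteny` would need to change.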