Figure 2 | Biology Direct

From: A new ensemble coevolution system for detecting HIV-1 protein coevolution

Schematic view of the ensemble coevolution system. (A) Workflow of coevolution prediction. Input data: a multiple sequence alignment (MSA) dataset Dj and a phylogenetic tree constructed from Dj. (1) Preprocessing of input datasets. The input datasets are preprocessed into the method-specific input formats and imported into the individual sequence-based methods Mi (i = 1, …, 27). (2) Execution of sequence-based methods. Given a sequence-based method Mi and a sequence dataset Dj, the coevolution scores of coevolving positions are normalized and exported into the matrix C*(Mi, Dj). The normalized coevolution scores are then ranked. (3) Combiner. Given a chosen combination of sequence-based methods, the coevolution scores of predicted coevolving positions are assembled by the combiner, which provides assembly strategies such as majority voting, Borda count and weighted voting.

(B) Workflow of our procedure for optimizing the combination of sequence-based methods. Input data: multiple MSAs are processed by the sequence-based methods (see (A)). Validation datasets (e.g. experimental and clinical data) are also prepared for method evaluation. The coevolution scores of ranked coevolving pairs in C(Mi, Dj) are collected after applying the sequence-based method Mi to the sequence dataset Dj. (1) Linear transformation. Coevolution scores are linearly rescaled to the range [0, 1]. (2) Ensemble learning. A heuristic algorithm identifies a combination of sequence-based methods with improved prediction performance (Additional file 1: Text S1). Each circle represents a single method, and a combination of methods is shown as a group of colored circles. The prediction performance (e.g. AUC) of the ranked statistical couplings assembled from each method combination is evaluated on the validation datasets. When adding a new method no longer improves prediction performance, the learning procedure stops and the optimized method combination is identified. Using this method combination, coevolving pairs are predicted as in (A) and returned as outputs.
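The linear transformation and the three combiner strategies named in the caption can be illustrated with a minimal Python sketch. This is not the system's implementation; it rests on stated assumptions: each method's output is a dict mapping a position pair (i, j) to a raw score where higher means stronger coupling; majority voting counts a method's vote for a pair when that pair is in the method's top k (the `top_k` cutoff is hypothetical); Borda count awards n points to a method's top pair down to 1 point for its last; and the weights for weighted voting are supplied by the caller. All function names are hypothetical.

```python
from collections import defaultdict

Pair = tuple[int, int]  # (position i, position j) in the alignment


def min_max_normalize(scores: dict[Pair, float]) -> dict[Pair, float]:
    """Linearly rescale raw scores into [0, 1]; constant scores all map to 0.0."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {p: (s - lo) / span if span else 0.0 for p, s in scores.items()}


def rank_pairs(scores: dict[Pair, float]) -> list[Pair]:
    """Order pairs from strongest to weakest score; ties are broken by the pair itself."""
    return sorted(scores, key=lambda p: (-scores[p], p))


def majority_vote(method_scores: list[dict[Pair, float]], top_k: int) -> list[Pair]:
    """Assumed rule: keep pairs found in the top_k list of more than half of the methods."""
    votes = defaultdict(int)
    for scores in method_scores:
        for p in rank_pairs(scores)[:top_k]:
            votes[p] += 1
    needed = len(method_scores) / 2
    return sorted(p for p, v in votes.items() if v > needed)


def borda_count(method_scores: list[dict[Pair, float]]) -> list[Pair]:
    """Each method gives n points to its top pair, n - 1 to the next, ...; rank by total."""
    points = defaultdict(int)
    for scores in method_scores:
        ranked = rank_pairs(scores)
        n = len(ranked)
        for rank, p in enumerate(ranked):
            points[p] += n - rank
    return sorted(points, key=lambda p: (-points[p], p))


def weighted_vote(method_scores: list[dict[Pair, float]], weights: list[float]) -> list[Pair]:
    """Rank pairs by the weighted sum of each method's min-max normalized scores."""
    total = defaultdict(float)
    for scores, w in zip(method_scores, weights):
        for p, s in min_max_normalize(scores).items():
            total[p] += w * s
    return sorted(total, key=lambda p: (-total[p], p))
```

Borda count uses only ranks, so it is insensitive to the very different score scales of the individual methods; weighted voting sums score values, which is why the sketch rescales each method to [0, 1] before combining.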
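The ensemble-learning loop in (B), which adds methods until AUC stops improving, can be sketched as greedy forward selection. That choice is an assumption: the caption describes only a heuristic with this stopping rule, and the actual algorithm is given in Additional file 1: Text S1. The sketch further assumes that each method's scores are a dict from position pairs to raw scores (higher means stronger coupling), that a candidate combination is scored by averaging min-max normalized scores (a hypothetical combiner), and that the validation data provide a set of known coevolving pairs treated as positives, with every other scored pair treated as a negative. All function names are hypothetical.

```python
Pair = tuple[int, int]  # (position i, position j) in the alignment


def min_max_normalize(scores: dict[Pair, float]) -> dict[Pair, float]:
    """Linearly rescale raw scores into [0, 1]; constant scores all map to 0.0."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {p: (s - lo) / span if span else 0.0 for p, s in scores.items()}


def auc(scores: dict[Pair, float], positives: set[Pair]) -> float:
    """ROC AUC: probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for p, s in scores.items() if p in positives]
    neg = [s for p, s in scores.items() if p not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def combine(methods: dict[str, dict[Pair, float]], chosen: list[str]) -> dict[Pair, float]:
    """Hypothetical combiner: mean normalized score; a pair a method did not score counts as 0."""
    normed = [min_max_normalize(methods[m]) for m in chosen]
    pairs = set().union(*normed)
    return {p: sum(n.get(p, 0.0) for n in normed) / len(normed) for p in pairs}


def greedy_select(methods: dict[str, dict[Pair, float]],
                  positives: set[Pair]) -> tuple[list[str], float]:
    """Add the method that most improves AUC each round; stop when no addition helps."""
    chosen: list[str] = []
    best = 0.0
    remaining = sorted(methods)  # sorted so ties go to the alphabetically first method
    while remaining:
        candidate_auc = {m: auc(combine(methods, chosen + [m]), positives) for m in remaining}
        m_best = max(remaining, key=lambda m: candidate_auc[m])
        if candidate_auc[m_best] <= best:
            break  # adding any remaining method no longer improves AUC
        chosen.append(m_best)
        best = candidate_auc[m_best]
        remaining.remove(m_best)
    return chosen, best
```

Each round evaluates every remaining method in combination with those already chosen, so a run over all 27 methods costs at most 27 + 26 + … + 1 evaluations rather than an exhaustive search over all subsets. The trade-off is that a greedy search can miss methods that help only in combination with others.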

Back to article page