Approach based on multiple sequence alignment | Approach based on alignment-free methods |
---|---|
Assumes contiguity (with gaps) of homologous regions | Does not assume contiguity of homologous regions |
Based on all possible pairwise comparisons of whole sequences; computationally expensive | Based on occurrences of sub-sequences; computationally inexpensive, can be memory-intensive |
Well-established and well-studied approach in phylogenomics | Application in phylogenomics limited; requires further testing for robustness and scalability |
More dependent on substitution/evolutionary models | Less dependent on substitution/evolutionary models |
More sensitive to stochastic sequence variation, recombination, lateral genetic transfer, rate heterogeneity and sequences of varied lengths, especially when similarity lies in the “twilight zone” | Less sensitive to stochastic sequence variation, recombination, lateral genetic transfer, rate heterogeneity and sequences of varied lengths |
Best practice uses inference algorithms with complexity at least O(n²); less time-efficient | Inference algorithms typically O(n²) or less; more time-efficient |
Heuristic solutions; statistical significance of how alignment scores relate to homology is difficult to assess | Exact solutions; statistical significance of the sequence distances (and degree of similarity) can be readily assessed |
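
To make the "occurrences of sub-sequences" row concrete, the following is a minimal sketch of one common family of alignment-free measures: a distance between the k-mer (word) frequency profiles of two sequences. The function names `kmer_counts`, `kmer_frequencies` and `kmer_distance`, the default `k=3`, and the choice of Euclidean distance are illustrative assumptions, not the method of any particular tool covered by the table.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq (hypothetical helper)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_frequencies(seq, k):
    """Convert raw k-mer counts to relative frequencies.

    Normalising by the total number of words makes sequences of
    different lengths comparable. Returns an empty dict when the
    sequence is shorter than k.
    """
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


def kmer_distance(a, b, k=3):
    """Euclidean distance between the k-mer frequency profiles of a and b.

    Assumption: Euclidean distance is only one of many possible
    word-based measures; it is used here because it is simple.
    """
    fa, fb = kmer_frequencies(a, k), kmer_frequencies(b, k)
    words = set(fa) | set(fb)
    return sqrt(sum((fa.get(w, 0.0) - fb.get(w, 0.0)) ** 2 for w in words))


if __name__ == "__main__":
    # Two sequences whose halves are swapped: no alignment is computed,
    # and the profiles differ only in the words spanning the junction.
    print(kmer_distance("AAAACCCC", "CCCCAAAA", k=2))
    print(kmer_distance("AAAACCCC", "GGGGTTTT", k=2))
```

Because the profiles are built from word occurrences only, swapping the two halves of a sequence changes just the words that span the junction, which illustrates why such measures do not assume contiguity of homologous regions. The memory cost noted in the table arises because the number of possible words grows exponentially with k.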