Abstract
A simulation model of phylogeny, called GENESIS, was developed to evaluate and to estimate the qualities of various numerical taxonomic procedures. The model produces sets of imaginary species with known character state distributions and with known phylogenies. The model can be made to produce these species and their phylogenies under different evolutionary conditions.
Within GENESIS, there are two mathematical models that describe the diversification of the number of taxa. The number of taxa increases either exponentially (the 'radiation' option) or according to a logistic curve (the 'equilibrium' option). As far as character evolution is concerned, GENESIS allows for two options: in the 'gradualistic' version, character state changes occur at equal rates in the two daughter lineages after a speciation event; in the 'punctualistic' version, these rates can be made to differ. Combining these options, GENESIS basically offers four evolutionary scenarios. The exact evolutionary conditions within each of these scenarios can be controlled by the user, who must specify the values of a number of input parameters. GENESIS produces species and their phylogenies in the form of character data sets and corresponding true trees. The output is characterized by a number of tree statistics. Within each of the main evolutionary scenarios, experiments were carried out in which some input parameters were varied while the others were kept constant. For the precise experimental design, see the relevant paragraphs in chapters 4-6.
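The two diversification options can be illustrated with the textbook forms of exponential and logistic growth. This is only a minimal sketch: the abstract does not give the equations GENESIS actually uses, and the function names and the parameters `n0` (initial number of taxa), `r` (diversification rate) and `k` (equilibrium number of taxa) are assumptions introduced here for illustration.

```python
import math

def radiation(n0, r, t):
    """Exponential growth in the number of taxa: N(t) = n0 * exp(r * t).

    Hypothetical stand-in for the 'radiation' option.
    """
    return n0 * math.exp(r * t)

def equilibrium(n0, r, k, t):
    """Logistic growth toward an equilibrium number of taxa k.

    Hypothetical stand-in for the 'equilibrium' option:
    N(t) = k / (1 + ((k - n0) / n0) * exp(-r * t)).
    """
    return k / (1.0 + ((k - n0) / n0) * math.exp(-r * t))

if __name__ == "__main__":
    # Illustrative values only; they are not taken from the thesis.
    for t in range(0, 21, 5):
        print(t, round(radiation(2, 0.3, t), 2), round(equilibrium(2, 0.3, 20, t), 2))
```

Under these assumptions both curves start from the same number of taxa, but the logistic curve levels off near `k` while the exponential curve keeps growing.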
A number of cladistic and phenetic tree-making methods were evaluated. The PAUP, PHYLIP, Wagner78 and Hennig86 programs were used to produce most parsimonious trees. Group-compatibility analysis was performed with CAFCA. Four UPGMA algorithms were used to construct phenograms: UPGMA using squared Euclidean distances of unstandardized characters (UPGMA-1); UPGMA using squared Euclidean distances of unstandardized characters (UPGMA-2); UPGMA using product moment correlations of unstandardized characters (UPGMA-3); and UPGMA using product moment correlations of standardized characters (UPGMA-4).
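The resemblance measures that feed the UPGMA variants can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: it assumes that species are rows and characters are columns, that standardization means scaling each character to zero mean and unit variance, and that the correlation is computed between pairs of species across their characters. All function names and the example data are hypothetical.

```python
import math

def standardize(matrix):
    """Scale each character (column) to zero mean and unit variance."""
    columns = list(zip(*matrix))
    scaled = []
    for col in columns:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        # A constant character carries no information; map it to zeros.
        scaled.append([(x - mean) / sd if sd > 0 else 0.0 for x in col])
    return [list(row) for row in zip(*scaled)]

def squared_euclidean(a, b):
    """Squared Euclidean distance between two species."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def product_moment(a, b):
    """Pearson product moment correlation between two species."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined; treated as 0 in this sketch
    return cov / math.sqrt(var_a * var_b)

if __name__ == "__main__":
    # Three imaginary species scored for four characters (made-up data).
    species = [[0, 1, 2, 3], [0, 2, 4, 6], [3, 2, 1, 0]]
    for name, data in (("unstandardized", species), ("standardized", standardize(species))):
        print(name)
        print("  distance(0, 1):", squared_euclidean(data[0], data[1]))
        print("  correlation(0, 1):", product_moment(data[0], data[1]))
```

Either matrix of pairwise values could then be passed to an average linkage clustering routine to obtain a phenogram.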
Experiments using the simplest evolutionary scenario (combining the 'radiation' and 'gradualistic' options) showed that overall differences in accuracy were small between Wagner parsimony (PAUP, PHYLIP, Hennig86) and UPGMA-3. Parsimony with Wagner78, UPGMA-1, UPGMA-2, UPGMA-4 and especially compatibility analysis with CAFCA were shown to be inferior to these methods. The efficiency with which various methods (viz. Wagner78, PAUP, PHYLIP and CAFCA) recovered the true tree depended on several tree properties, the most important being the consistency indices of both the true tree and the estimated tree.
When more complicated evolutionary scenarios were considered, simulation experiments showed that UPGMA based on product moment correlations of unstandardized characters clearly produced better results than the other phenetic or cladistic methods (Wagner78, Hennig86 and CAFCA). The efficiency now appeared to be affected most strongly by the stemminess of the true tree.
A large number of phenetic procedures, together with parsimony analysis as performed with Hennig86 and Wagner78, were evaluated under a great variety of evolutionary conditions. McQuitty's similarity analysis and the average linkage method, both based on cosine or product moment correlations of unstandardized characters, were found to perform consistently better than maximum parsimony and the other phenetic procedures.
The average accuracy of UPGMA-3 over all experiments described in chapters 4 and 5, as measured by the consensus fork index (CFI), was 0.76. Hennig86, PHYLIP (MIX) and PAUP produced similar results, with an average CFI of 0.68. In chapter 6, the four superior phenetic methods (McQuitty's similarity analysis and the average linkage method, both based on cosine or product moment correlations of unstandardized characters) had an average accuracy of 0.74. In these experiments, the accuracy of maximum parsimony as performed by Hennig86 was 0.64. Other authors have generally observed equally low or even lower accuracy values (chapter 6).
Stemminess, and the congruence of the characters with the tree as measured by the consistency index (CI) of the true tree, were found to be correlated with accuracy (CFI) in some experiments. As other authors have also pointed out, even when an index correlates well with accuracy, the true tree must be known in order to compute that index. These indices therefore cannot really be used as estimators of accuracy, although they can serve to indicate its major determinants. The consistency index of the estimated tree, by contrast, can be calculated in practice, and might therefore be used as a predictor of accuracy, though not a very reliable one. Estimated trees with low CI values, say less than 0.7, are probably not good estimates of the true tree.
In the present study, overall low values of accuracy were obtained. This is in agreement with the findings of a number of other authors. All simulations in this study were run to produce 20 'species'. If we use a cutoff point of CFI = 0.833, where 15 of the 18 subgroups would be correctly obtained, then it can safely be assumed that most published phylogenetic estimations are likely to be quite inaccurate. Therefore I support the view of authors that it is inappropriate to refer to phylogeny estimation methods as methods for phylogeny reconstruction.
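The arithmetic behind the cutoff can be checked with a short sketch. It assumes that the CFI is the number of non-trivial subgroups shared by the true and the estimated tree, divided by the maximum possible number, which for a fully resolved rooted tree of n species is n - 2 (18 for 20 species, consistent with the text). The function name, the representation of trees as sets of subgroups and the five-species example are all hypothetical.

```python
def consensus_fork_index(true_clades, estimated_clades, n_taxa):
    """Shared non-trivial subgroups divided by the maximum possible (n_taxa - 2).

    Each tree is given as a collection of frozensets, one per subgroup,
    excluding single species and the group containing all species.
    """
    shared = set(true_clades) & set(estimated_clades)
    return len(shared) / (n_taxa - 2)

if __name__ == "__main__":
    # True tree ((A,B),(C,(D,E))) versus estimated tree ((A,B),((C,D),E)).
    true_tree = [frozenset("AB"), frozenset("DE"), frozenset("CDE")]
    estimate = [frozenset("AB"), frozenset("CD"), frozenset("CDE")]
    print(round(consensus_fork_index(true_tree, estimate, 5), 3))  # 2 of 3 shared

    # With 20 species, 15 correctly recovered subgroups out of 18:
    print(round(15 / 18, 3))  # 0.833
```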
Original language | English |
---|---|
Qualification | Doctor of Philosophy |
Award date | 27 Sept 1995 |
Place of Publication | Wageningen |
Print ISBNs | 9789054854227 |
Publication status | Published - 27 Sept 1995 |
Keywords
- phylogeny
- origin
- species
- taxa
- phylogenetics
- computer simulation
- simulation
- simulation models
- taxonomy
- classification
- biological nomenclature
- numerical methods