TY - JOUR
T1 - Collision probabilities for AFLP bands, with an application to simple measures of genetic similarity
AU - Gort, G.
AU - Koopman, W.J.M.
AU - Stein, A.
AU - van Eeuwijk, F.A.
PY - 2008
Y1 - 2008
N2 - AFLP (amplified fragment length polymorphism) is a frequently used DNA fingerprinting technique that is popular in the plant sciences. A problem encountered in interpreting and comparing individual plant profiles, which consist of band presence-absence patterns, is that multiple DNA fragments of the same length can be generated and eventually show up as a single band on a gel. The phenomenon of two or more fragments coinciding in a band within an individual profile is a type of homoplasy, which we call collision. Homoplasy biases estimates of genetic similarity. In this study, we show how to calculate collision probabilities for bands as a function of band length, given the fragment count, the band count, or the band lengths. We also determine probabilities of higher-order collisions and estimate the total number of collisions in a profile. Since short fragments occur more often, short bands are more likely to contain collisions. For a typical plant genome and AFLP procedure, the collision probability for the shortest band is 25 times that for the longest. In a profile with 100 bands, a quarter of the bands may contain collisions, concentrated at the shorter band lengths. All calculations require a careful estimate of the monotonically decreasing fragment length distribution. Modifications of the Dice and Jaccard coefficients are proposed. The principles are illustrated on data from a phylogenetic study in lettuce.
KW - fragment length distributions
KW - size homoplasy
KW - markers
KW - lettuce
U2 - 10.1198/108571108X308116
DO - 10.1198/108571108X308116
M3 - Article
VL - 13
SP - 177
EP - 198
JO - Journal of Agricultural, Biological, and Environmental Statistics
JF - Journal of Agricultural, Biological, and Environmental Statistics
SN - 1085-7117
IS - 2
ER -