AFLP is a DNA fingerprinting technique frequently used in plant and animal sciences. A drawback of the technique is the occurrence of multiple DNA fragments of the same length in a single AFLP lane, which we call a collision. In this article we quantify the problem, which is closely related to the well-known birthday problem. Calculating collision probabilities requires a fragment length distribution (fld). We discuss three ways to estimate the fld: from theoretical considerations, by in-silico determination using DNA sequence data from Arabidopsis thaliana, or by direct estimation from AFLP data. In the last case we use a generalized linear model with monotone smoothing of the fragment length probabilities. Collision probabilities are calculated from two perspectives, assuming known fragment counts and assuming known band counts. We compare results for a number of flds, ranging from uniform to highly skewed. The conclusion is that collisions occur often, with higher probabilities for higher numbers of bands, for more skewed distributions and, to a lesser extent, for smaller scoring ranges. For a typical plant genome, an AFLP with 19 bands is likely to already contain a collision. Practical implications of collisions are discussed. AFLP examples from lettuce and chicory are used for illustration.
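The link to the birthday problem can be made concrete with a small calculation. The Python sketch below computes the probability that at least two of n fragments share a length. It assumes, as a simplification that is not necessarily the article's exact model, that fragments are independent draws from a discrete fld `p` over the integer lengths of the scoring range, with `p` summing to 1. This covers only the known-fragment-count perspective; the band-count perspective and the estimation of the fld itself are not shown. The function names are hypothetical.

```python
def no_collision_probability(p, n):
    """Probability that n fragments, drawn independently from the fld p,
    all have distinct lengths.

    This equals n! * e_n(p), with e_n the n-th elementary symmetric
    polynomial of p. We store f[k] = k! * e_k(p) directly, which keeps
    every intermediate value in [0, 1] and avoids huge factorials.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    f = [1.0] + [0.0] * n
    for pi in p:
        # Iterate k downwards so each length is used at most once per subset.
        for k in range(n, 0, -1):
            f[k] += k * f[k - 1] * pi
    return f[n]


def collision_probability(p, n):
    """Probability that at least two of n fragments share a length."""
    return 1.0 - no_collision_probability(p, n)


def first_likely_collision(p, threshold=0.5):
    """Smallest number of fragments whose collision probability exceeds threshold."""
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must lie in [0, 1)")
    n = 1
    # Terminates: once n exceeds the number of possible lengths, the
    # collision probability is exactly 1.
    while collision_probability(p, n) <= threshold:
        n += 1
    return n


if __name__ == "__main__":
    # Classic birthday problem: uniform fld over 365 "lengths".
    uniform = [1 / 365] * 365
    print(first_likely_collision(uniform))  # prints 23

    # A hypothetical skewed fld over the same 365 lengths (weights ~ 1/length rank).
    weights = [1 / (i + 1) for i in range(365)]
    total = sum(weights)
    skewed = [w / total for w in weights]
    print(first_likely_collision(skewed))
```

With a uniform fld the sketch reproduces the textbook birthday result of 23. For a skewed fld over the same lengths, the collision probability is higher for any given number of fragments, because a uniform distribution minimizes the chance of shared values. This agrees with the conclusion that more skewed distributions produce more collisions.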