On statistical selection in plant breeding

C.J. Dourleijn

Research output: Thesis, internal PhD, WU


The ultimate goal of plant breeding is the development of new varieties. An important phase in the development process is testing and selecting potential new varieties. The varieties are tested by means of experiments at various sites, sometimes in several years. The observations from the experiments are usually modelled with a linear model. The best linear unbiased estimators (BLUEs) of certain estimable combinations of parameters (named variety values) in such a model are used to compare varieties with each other and with control varieties, and to make a selection. Selection means deciding whether or not to discard particular varieties. The plant breeder certainly knows that these decisions are subject to uncertainty, but until now he did not have a quantitative measure for this uncertainty.

In this thesis we advocate the use of statistical theory in the selection phase of the breeding process. Here, statistical selection is split up into two components: 1) estimating contrasts between variety values as well as possible, 2) applying statistical selection procedures to these best estimates.

The results of this thesis are not restricted to a particular crop or breeding programme. However, we have studied the sugar beet breeding practice and use data from this field to illustrate our findings. In the sugar beet breeding programmes there are several stages where "varieties" are tested and selected. The plant breeder is interested in the specific varieties included in the variety trials and not in some population of varieties. Therefore, the variety terms in the models for the observations are chosen fixed.

Usually, an experiment at a particular site has incomplete blocks to take account of heterogeneity of the soil. With many varieties to be tested, the design of such an experiment is almost never balanced. Experiments are laid out at various sites (and sometimes in several years), but not every variety is tested at all sites. This results in a variety x site scheme that is not completely filled.

The selection decisions are based on several characters of the crop. This makes selection complicated, because all characters can seldom be reduced to one selection index on which the decisions can be based. In the case of sugar beets it is proposed to use the financial yield as selection index.

As said, we consider estimation of contrasts between variety values to be the first part of statistical selection. The value of a certain variety is defined in this thesis as a weighted average of the expectations of the observations corresponding to this variety. The corresponding linear combinations of model parameters can be estimated best using the least squares method. Although a breeder will base the selection decisions on variety values that include information from various sites (so-called mean performance values), he is also interested in the variety values at the separate sites. For the observations at a single site, either a fixed additive model or a mixed additive model with random block terms is used.

The definition of the mean performance variety value depends on the model used. This model describes the joint observations of the series of experiments performed at different sites. The ranking of the varieties based on the estimates of their variety value can be different for different models. Therefore, the model has to be chosen with care. The model choice concerns questions as to whether model terms have to be considered fixed or random, and whether an additive or an interaction model has to be used.

A procedure to obtain the BLUEs of the mean performance variety values, without analysing the joint observations of all experiments, is proposed in this thesis. First, the experiments at the various sites are analysed individually and contrasts between variety values at these individual sites are estimated. Next, the BLUEs of the mean performance variety values can be calculated for models without fixed variety x site interaction terms as a multivariately weighted average of BLUEs calculated at the individual sites. For models that include fixed interaction terms the BLUEs must be calculated as a univariately weighted average of the 'local' BLUEs, with the weights given by the breeder.
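The weighted-average step can be sketched in Python. This is a minimal illustration, not the procedure of the thesis: it assumes that the same contrasts are estimable at every site, that the per-site estimators are independent with known, nonsingular covariance matrices, and that the model has no fixed variety x site interaction terms. The function name and all numbers are hypothetical.

```python
import numpy as np

def combine_local_estimates(estimates, covariances):
    """Precision-weighted (multivariate) average of per-site estimate vectors.

    Assumption: the site estimators are independent and each covariance
    matrix is known and nonsingular.
    """
    precisions = [np.linalg.inv(np.asarray(v, dtype=float)) for v in covariances]
    total_precision = np.sum(precisions, axis=0)
    weighted_sum = np.sum(
        [p @ np.asarray(e, dtype=float) for p, e in zip(precisions, estimates)],
        axis=0,
    )
    combined_cov = np.linalg.inv(total_precision)
    return combined_cov @ weighted_sum, combined_cov

# Hypothetical contrasts (new variety minus control) for two varieties at two sites.
site_estimates = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
site_covariances = [np.eye(2), 4.0 * np.eye(2)]  # site 2 is less precise

combined, combined_cov = combine_local_estimates(site_estimates, site_covariances)
print(combined)
print(combined_cov)
```

When the covariance matrices are proportional to each other, as in this example, the matrix weights collapse to scalar weights per site, which mirrors the reduction from multivariate to univariate weights described for certain designs.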

For the situation of a fixed additive model for the joint observations of all sites, it is shown that for some experimental designs the multivariate weights reduce to univariate weights. This is e.g. the case when the C matrices (from the reduced normal equations) of the individual sites are proportional to each other. Regardless of the model or design used (as far as investigated), we can say that contrasts between (mean performance) variety values can be estimated (with BLUEs) in two steps of reduced size. Also the variance/covariance matrix of the estimators and the usual estimate of the error variance can be calculated in such a way.

The same principles can be applied to a so-called concatenated trial. Such a trial, located at one site, is subdivided into subtrials that include new varieties not grown in any other subtrial and control varieties grown in all subtrials. Here, meaningful contrasts to estimate are the differences between parameters of a new variety and the average of the parameters of the control varieties. The BLUEs of these contrasts can be calculated by combining local BLUEs from the subtrials. In certain cases the local BLUEs are already the 'overall' BLUEs.

The use of statistical selection procedures is considered to be the second component of statistical selection. In this thesis we pay much attention to subset selection rules. Using subset selection, the breeder selects a randomly sized subset of varieties. The subset size is chosen as small as possible, but large enough to guarantee that the probability of correct selection (i.e. the probability that the desired variety is included in the selected subset) is at least P*, with P* a predefined value. The desired variety can be the best variety (where 'best' must be defined) or a good variety (where 'good' must be defined), or maybe the desired varieties are all varieties better than a control variety. The subset is selected by means of a specified selection rule, which includes estimates of differences between variety values, the estimated variances of the corresponding estimators and so-called selection constants. The selection constants are associated with the experimental design used.
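A rule of this kind can be sketched in Python. The sketch below follows the general shape of a Gupta-type subset rule for selecting the best variety (a variety is kept unless some other variety beats it by more than the selection constant times the standard error of their difference). It is an assumed illustrative form, not necessarily the exact rule of the thesis; the function name, the data and the constant 2.0 are hypothetical.

```python
import numpy as np

def select_subset(estimates, se_diff, constant):
    """Return indices of varieties retained by a Gupta-type subset rule.

    estimates : estimated variety values (larger is better)
    se_diff   : matrix of standard errors of the pairwise differences
    constant  : selection constant (assumed given, e.g. from tables or simulation)
    """
    estimates = np.asarray(estimates, dtype=float)
    se_diff = np.asarray(se_diff, dtype=float)
    k = len(estimates)
    selected = []
    for i in range(k):
        others = [j for j in range(k) if j != i]
        # Keep variety i unless another variety exceeds it by more than
        # constant * (standard error of the difference).
        if all(estimates[j] - estimates[i] <= constant * se_diff[i, j] for j in others):
            selected.append(i)
    return selected

# Hypothetical estimates for four varieties with a common s.e. of difference.
variety_estimates = [10.0, 9.5, 8.0, 9.9]
se_of_differences = np.full((4, 4), 0.3)
selected = select_subset(variety_estimates, se_of_differences, constant=2.0)
print(selected)  # [0, 1, 3]
```

The size of the selected subset is not fixed in advance: it grows when the estimates are close together relative to their standard errors and shrinks when one variety is clearly ahead.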

Often used in practice is the selection of a predetermined number of varieties. However, we have shown that this way the probability of correct selection cannot be controlled. This could mean that the desired variety is lost too often.
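A small simulation can illustrate why selecting a fixed number of varieties does not control the probability of correct selection. The sketch assumes independent, normally distributed estimates with a common known standard error; the function name and the parameter configurations are hypothetical.

```python
import numpy as np

def pcs_top_m(true_values, se, m, n_draws=50_000, seed=0):
    """Estimate the probability that the truly best variety is among the
    m varieties with the largest estimates (assumes independent normal errors)."""
    rng = np.random.default_rng(seed)
    true_values = np.asarray(true_values, dtype=float)
    best = int(np.argmax(true_values))
    estimates = true_values + se * rng.standard_normal((n_draws, len(true_values)))
    # Indices of the m largest estimates in each simulated trial.
    top_m = np.argsort(estimates, axis=1)[:, -m:]
    return float(np.mean((top_m == best).any(axis=1)))

# Same rule (keep the top 2 of 6), two hypothetical parameter configurations.
pcs_wide = pcs_top_m([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], se=0.5, m=2)
pcs_narrow = pcs_top_m([0.1, 0.0, 0.0, 0.0, 0.0, 0.0], se=0.5, m=2)
print(pcs_wide, pcs_narrow)
```

With the number of selected varieties held fixed, the estimated probability depends strongly on how far the best variety is ahead of the others, which is unknown in practice.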

For unbalanced incomplete block designs selection constants had to be calculated for each variety. For practical use this is very inconvenient, because this type of design is often used. Therefore, we have developed selection rules that only need a single selection constant, regardless of the experimental design used. Such rules can only be used if the experiment is randomised, which means that the design has to be randomised and the actual varieties have to be assigned to the design varieties (numbers) by means of a defined randomisation process.

The calculation of the selection constants by numerical integration is only feasible for the situation of variance-balanced designs. In the other cases we can use computer simulation to approximate the desired selection constants. Our computer program SELCON performs this simulation, making it possible to calculate the selection constants for every experimental design. It appears that the simulation results are accurate enough to be used for practical purposes. Simulation can also successfully be used to calculate the probability of correct selection and the expected subset size, given the configuration of variety parameters. This can e.g. be used to compare different selection rules.
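The idea of approximating selection constants by simulation can be sketched in Python. This is not the SELCON program, whose internals are not described here. The sketch assumes a known covariance matrix of the estimated variety values, normally distributed estimators, and the configuration in which all variety parameters are equal; for each variety it takes the P* quantile of the largest standardised difference with the other varieties. All names and numbers are hypothetical.

```python
import numpy as np

def simulate_selection_constants(cov, p_star, n_draws=100_000, seed=0):
    """Monte Carlo approximation of one selection constant per variety.

    cov    : covariance matrix of the estimated variety values (assumed known)
    p_star : required probability of correct selection
    """
    cov = np.asarray(cov, dtype=float)
    k = cov.shape[0]
    rng = np.random.default_rng(seed)
    # Draw estimates under the configuration with all variety parameters equal.
    draws = rng.multivariate_normal(np.zeros(k), cov, size=n_draws)
    var = np.diag(cov)
    # Standard errors of all pairwise differences.
    se_diff = np.sqrt(var[:, None] + var[None, :] - 2.0 * cov)
    constants = np.empty(k)
    for i in range(k):
        others = np.array([j for j in range(k) if j != i])
        standardised = (draws[:, others] - draws[:, [i]]) / se_diff[i, others]
        # Variety i is retained when the largest standardised difference
        # does not exceed the constant, so the constant is its P* quantile.
        constants[i] = np.quantile(standardised.max(axis=1), p_star)
    return constants

# Hypothetical balanced case: four varieties, independent estimates.
constants = simulate_selection_constants(0.5 * np.eye(4), p_star=0.90)
print(constants)
```

For two varieties the standardised difference is a single standard normal variable, so the simulated constant should be close to the corresponding normal quantile, which gives a simple check on the simulation.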

Using the variance/covariance matrix of the BLUEs of contrasts between mean performance variety values, the selection constants can be approximated by simulation. Thus the subset selection rules can also be used for the combined results of a series of experiments. The information that certain varieties are a priori excluded from selection affects the value of the selection constants and is therefore taken into account in the simulation program.

Two modifications of the subset selection procedures are proposed in order to make these procedures more useful in practice. In the first proposed modification it is assumed that the variety parameters represent a sample from a (Normal) superpopulation. This extra assumption leads to smaller selection constants and thus smaller subsets. However, now the expected probability of correct selection is controlled and not the minimum probability of correct selection, as is the case for the ordinary subset procedures. Subset selection procedures with the superpopulation assumption seem very useful for the plant breeding practice. The second proposed modification, namely the use of simultaneous lower bounds of the ranked variety parameter contrasts in order to calculate a confidence lower bound of the probability of correct selection, seems less practical.

To be able to execute the subset selection rules on a routine basis, software is needed. Therefore, we wrote the program SUBSET. This program executes subset selection rules using the selection constants as calculated by the program SELCON. In the output of SUBSET the breeder is informed about the uncertainties corresponding with certain selection decisions, and this enables him to make a well-considered selection. The developed theories and computer programs were successfully tested in a case study.

We finally reach the conclusion that statistical selection procedures, especially subset selection procedures, can successfully be used in the plant breeding practice.

Original language: English
Qualification: Doctor of Philosophy
Awarding Institution
  • van der Laan, P., Promotor, External person
  • Verdooren, L.R., Promotor, External person
Award date: 22 Mar 1993
Place of Publication: S.l.
Print ISBNs: 9789054850960
Publication status: Published - 1993


  • selection
  • selection responses
  • beta vulgaris
  • sugarbeet
  • statistical analysis

