The analysis of dependent count data

J. Engel

Research output: Thesis, external PhD, WU

Abstract

In the literature, methods have been presented for the analysis of count data classified by fixed and crossed factors under the assumption that these data can be modeled by independent binomial or Poisson distributions. In general, the mean values of these distributions depend on the levels of the classifying factors, and a linear model is proposed for the logit transform or the log transform of these mean values.
In practice, many situations depart from these assumptions, such as:

- The counts are independent, but the observed variation in the data is more than can be explained by e.g. the Poisson distribution;

- The counts are dependent: the factors are not fixed but random.

For these situations no general analysis methods are available, and there is a strong need for extensions of the theory. In this thesis, extensions of the theory will be presented to allow for the modeling of such count data.

In chapters 2, 3 and 4 of this thesis the situation of overdispersion with respect to the binomial distribution and the Poisson distribution is considered. In the case of overdispersion we may observe from the data that var(X) = σ²E(X) with σ² > 1, instead of var(X) = E(X) as for the Poisson distribution. In chapter 2 we propose the beta-binomial distribution for modeling the overdispersed data, and limiting results for test statistics will be obtained for a large number of trials at each cell in the design.
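
The relation var(X) = σ²E(X) can be checked informally on replicated counts. The following is a minimal sketch, not the thesis's procedure: it assumes replicate counts within each design cell, estimates each cell mean by the sample mean, and divides a Pearson-type statistic by its degrees of freedom to get a rough estimate of σ². The function name `pearson_dispersion` and the example data are hypothetical.

```python
from statistics import mean


def pearson_dispersion(cells):
    """Rough estimate of sigma^2 in var(X) = sigma^2 * E(X).

    cells: list of lists, one inner list of replicate counts per design cell.
    Assumption: each cell mean is estimated by the sample mean of its replicates.
    """
    x2 = 0.0  # Pearson-type statistic, summed over all cells
    df = 0    # degrees of freedom: (replicates - 1) per cell
    for counts in cells:
        m = mean(counts)
        if m == 0:
            continue  # an all-zero cell carries no information about dispersion
        x2 += sum((x - m) ** 2 / m for x in counts)
        df += len(counts) - 1
    return x2 / df


# Hypothetical data: two cells with four replicates each.
example = [[2, 9, 4, 13], [20, 35, 11, 26]]
print(pearson_dispersion(example))
```

For Poisson counts the ratio should be close to 1; for the hypothetical data above it is well above 1, which is the pattern described as overdispersion.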

A gamma-Poisson or negative binomial model is proposed for modeling overdispersed count data in the third and fourth chapters of this thesis. Here we obtain approximate distributions of test statistics for a large number of replicates and for large counts as well. In chapters 2, 3 and 4 the limiting results are obtained for standard test statistics known from the theory of loglinear and logit-linear models, such as Pearson's X² statistic.
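
As a short worked illustration of how a gamma-Poisson mixture produces overdispersion, consider the standard textbook calculation below. The parametrization (shape $k$, scale $\theta$) is an assumption made here for illustration and is not necessarily the one used in the thesis.

```latex
% Assumed parametrization: X | \lambda ~ Poisson(\lambda), \lambda ~ Gamma(shape k, scale \theta)
\begin{align*}
  E(X)             &= E(\lambda) = k\theta, \\
  \mathrm{var}(X)  &= E\{\mathrm{var}(X \mid \lambda)\} + \mathrm{var}\{E(X \mid \lambda)\}
                    = k\theta + k\theta^{2} \\
                   &= (1 + \theta)\, E(X).
\end{align*}
```

Under this parametrization, with $\theta$ held fixed across cells, the variance is proportional to the mean with factor $\sigma^2 = 1 + \theta > 1$.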

Chapter 5 deals with dependent count data in a split-plot situation. Here a model is proposed that allows for the dependence among the data from a split-plot experiment. Two separate analyses will be performed, namely for the whole-plot and for the sub-plot factors, imitating the general ANOVA approach. The basic models are the gamma-Poisson model and the Dirichlet-multinomial model.

When data are obtained by a dependent classification of objects into two or more ordered classes, testing hypotheses concerning the probabilities corresponding to these classes is a problem met, for example, in the context of questionnaires. In chapter 6 we study the signed rank test of Wilcoxon in the situation of such a dependent classification. It appears that the limiting distribution of this test statistic, under a Dirichlet-multinomial model assumption for the data, is the normal distribution; there is an extra parameter for the dependence of classification.

The two final chapters of this thesis, 7 and 8, deal with random factor problems: chapter 7 treats crossed and nested designs, and chapter 8 treats nested designs using a different method.

The approach in chapter 7 is as follows. Basically, we assume that the process which generates the counts can be modeled by the Poisson process. The intensity of this Poisson process is a random variable instead of a fixed parameter, and the random components for main effects and interactions of the factors are represented by this random intensity. We assume lognormality for the distributions of these random model components and we shall derive a limit theorem to simplify this complicated model. The result is a simple model for situations with large counts.
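
To make the effect of a random intensity concrete, the sketch below computes the mean and variance of a count whose Poisson intensity is lognormal. It uses only the standard moment formulas for a lognormal-Poisson mixture with a single random component; it is not the limit theorem of chapter 7, and the function names and parameters `mu` and `tau` are hypothetical.

```python
import math


def lognormal_poisson_moments(mu, tau):
    """Mean and variance of X where X | L ~ Poisson(L) and log L ~ N(mu, tau^2).

    Assumption: one lognormal intensity, no further model structure.
    """
    mean = math.exp(mu + tau ** 2 / 2)                    # E(X) = E(L)
    var = mean + mean ** 2 * (math.exp(tau ** 2) - 1.0)   # E(L) + var(L)
    return mean, var


def variance_to_mean_ratio(mu, tau):
    """var(X) / E(X); equals 1 only when tau == 0 (plain Poisson)."""
    mean, var = lognormal_poisson_moments(mu, tau)
    return var / mean


# The ratio grows with the mean, so the extra-Poisson part dominates for large counts.
for mu in (1.0, 3.0, 5.0):
    print(mu, variance_to_mean_ratio(mu, tau=0.5))
```

With `tau = 0` the moments reduce to those of a Poisson distribution; for any `tau > 0` the variance exceeds the mean, and increasingly so as the expected count grows.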

The quasi-likelihood approach for nested designs with random factors is the subject of chapter 8. The quasi-likelihood approach was proposed by Wedderburn in 1976 for the analysis of independent data, to be used if distributional assumptions are hard to make. It is an attractive method for the analysis of dependent count data as well, as the exact distribution of such data is rather intractable.
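
For reference, the quasi-likelihood idea for independent observations can be summarized as follows. This is the standard formulation for the independent case only; the notation ($\mu_i$, $V$, $\phi$, $\beta$) is assumed here, and the extension to dependent data developed in chapter 8 is not reproduced.

```latex
% Assumptions: independent y_i with E(y_i) = \mu_i(\beta) and var(y_i) = \phi V(\mu_i)
\begin{align*}
  Q(\mu_i; y_i) &= \int_{y_i}^{\mu_i} \frac{y_i - t}{\phi\, V(t)}\, dt, \\
  \sum_{i} \frac{\partial \mu_i}{\partial \beta_j}\,
           \frac{y_i - \mu_i}{\phi\, V(\mu_i)} &= 0
  \qquad \text{for each parameter } \beta_j .
\end{align*}
```

Only the mean and the variance function enter these estimating equations, which is why the approach remains usable when a full distribution is hard to specify.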

We shall use the quasi-likelihood approach to derive estimators and test statistics for the variance components in the case of a nested design with random factors, starting with a few very simple assumptions with respect to mean and variance of the data.

Interestingly, the data that can be analysed are not restricted to count data. At the end of chapter 8 some topics for further research are mentioned, advocating a further study of quasi-likelihood for the analysis of dependent (count) data for crossed designs with random factors.

Original language: English
Qualification: Doctor of Philosophy
Awarding Institution
Supervisors/Advisors
  • van der Laan, P., Promotor, External person
Award date: 29 May 1987
Place of Publication: Wageningen
Publisher
DOIs
Publication status: Published - 29 May 1987

Keywords

  • statistical analysis
  • statistical inference
  • cum laude
