Stochastic modelling of daily rainfall sequences

T.A. Buishand

Research output: Thesis, internal PhD, WU


Rainfall series of different climatic regions were analysed with the aim of generating daily rainfall sequences. A survey of the data is given in I, 1. When analysing daily rainfall sequences one must be aware of the following points:
a. Seasonality. Because of seasonal variation of features of the rainfall process, the analysis is done for each month or season separately (see III, 2).
b. Non-homogeneity. A rainfall series is called non-homogeneous when it is non-stationary even after elimination of seasonal variation.
c. A large fraction of days with no rain.
d. Dependence between rainfall amounts on successive days (serial correlation).
It is the combination of the last two points which makes the generation of daily rainfall sequences difficult. When dealing with rainfall observations over periods longer than one day this difficulty is mostly obviated because one gets fewer zeroes and evidence for serial correlation usually disappears. For instance, there is no evidence for serial correlation in monthly data of Dutch stations (see II, 3.1). Besides, theoretical distributions can easily be fitted to the marginal distribution (e.g. the 'loi des fuites', see II, 3.2). The generation of these data is therefore not complicated. For annual totals the Gaussian distribution often fits reasonably well (see II, 2 for Dutch series, and V, 2.1 for foreign series). Departures from normality are found for rainfall stations with only a few wet days in a year (New Delhi, Khartoum).

Homogeneity of Dutch rainfall series is discussed in Chapter II. It is assumed that non-homogeneities are man-made, e.g. due to a change in rain gauge installation or a change of observer, and that they therefore usually consist of jumps.
A problem when dealing with Dutch rainfall series is the lowering of the rain gauges (from 1.50 m to 0.40 m) during the period 1946-1954 (see II, 4).
Due to a smaller wind effect it is expected that such a reduction in height results in larger rainfall measurements. To find a jump in the mean, annual totals of Dutch stations were compared with contemporary totals of foreign stations where no change of height took place. For such a comparison two points are important:
a. The distance between the various rainfall stations. In order to obtain a powerful test for a jump, one should choose the stations close together. Therefore Dutch rainfall stations near the Belgian or German border were taken.
b. There are other non-homogeneities, for instance, due to changes of site. The consequence of such non-homogeneities is that the estimates of a jump, caused by a reduction of height, may be biased. Moreover, these non-homogeneities give rise to a smaller correlation between the rainfall series, and the tests for a jump therefore become less powerful. The influence of local changes can be reduced by taking averages of different stations in a certain area.
With regression models and plots of partial sums, a jump in the mean of about 2 per cent is found for stations remote from the coast; for coastal stations the jump can be much larger (even more than 10 per cent), but there is a large variation due to differences in the degree of protection against the wind. The results correspond quite well with those of earlier research by BRAAK (1945).
By comparing monthly data of Dutch and German stations in the northern coastal area (see II, 4.2) it is found that the largest jumps occur in the winter season.

Another point of investigation is the homogeneity of the Zwanenburg-Hoofddorp (1735-1972) series (see II, 5). Since there is no nearby rainfall station whose method of measurement remained unchanged during the period of observation, the analysis of homogeneity was based solely on the series under consideration.
The tests which were considered are less powerful than the ones based on a comparison between changed and unchanged stations. Yet, there is obvious evidence for differences in the means of Zwanenburg (1735-1860) and Hoofddorp (1861-1972). There is no evidence for departures from homogeneity in the Hoofddorp series. Since there is also a poor correlation between the Zwanenburg data and other old rainfall series, these data can be considered useless for present-day hydrological research.

Because of the large number of zeroes in daily rainfall sequences, it is suggested that the occurrence of wet and dry days be generated first and the rainfall amounts on wet days subsequently. Since small rainfall amounts are often registered as zero it is advisable to call a day wet if its rainfall amount exceeds some specified value. For the Netherlands a threshold of 0.8 mm is advisable (see II, 6); for smaller thresholds there are only a few rainfall stations for which the series of wet and dry days (denoted for short as wet-dry series) is homogeneous.

In Chapter III a model is developed for Dutch rainfall series, using daily data from Winterswijk (1908-1973), Hoofddorp (1867-1971) and Hengelo (1908-1973). Theoretical considerations about the model are given in Chapter IV.
With respect to the wet-dry sequences of these series it can be concluded:
a. There is no evidence for correlation between the lengths of successive wet and dry spells (see III, 3.1).
b. Modifications of the negative binomial distribution (the shifted negative binomial distribution, see III, (3.2), and the truncated negative binomial distribution, see III, (3.3)) fit the lengths of weather spells well.
Seasonal dependence of the parameters of the truncated negative binomial distribution was extensively studied. For a particular type of spell it was shown that it is reasonable to keep one of the parameters, r, constant throughout the year.
Further, for dry spells the other parameter, p, can be smoothed according to a moving average scheme (see III, (3.17)); for wet spells seasonal variation of the parameter p can be described by a Fourier series with one harmonic component (see III, (3.12)).

With respect to the behaviour of rainfall amounts on wet days the following remarks can be made.
a. There is no evidence for correlation between the rainfall amount on the first day of a wet spell and the length of the preceding dry spell (see III, 4.1).
b. The first and the last day of a wet spell have smaller means than the other wet days; the smallest mean is found for solitary wet days (see III, 4.2).
c. There is some evidence for serial correlation of successive rainfall amounts within a wet spell (see III, 5.1). It is assumed that this serial correlation can be described by a first order moving average process (see III, 6.1).
The last two points are most evident during the winter season.
A shifted gamma distribution fits the marginal distribution of the rainfall amounts on wet days reasonably well (see III, 5.2). There is no evidence for seasonal variation of the shape parameter; the mean, however, shows an obvious seasonal variation.

Though synthetic sequences resemble the historic series with respect to features contained in the model (such as the marginal distribution of daily rainfall amounts and the lengths of wet and dry spells), this is not necessarily true for other features. As examples the correlogram and features of k-day sums (k = 2, 3, ...) were considered. This was done for both the wet-dry process and the entire rainfall process.
Some features of the rainfall model can be obtained by numerical methods. These features are:
a. The cumulative distribution function (cdf) of the number of wet days in a k-day period.
Under the assumption of iid rainfall amounts within a wet spell it is not difficult to derive an expression for the cdf of k-day rainfall totals (see IV, 3).
b. The correlogram for both the wet-dry process and the entire rainfall process (see IV, 4).
c. The variance-time curve of the wet-dry process and of the entire rainfall process (see IV, 5). For large values of k (k > 10) the variance of the number of wet days in a k-day period can be approximated well by an asymptotic formula (Equation IV, (5.36)) involving only the first three moments of the lengths of wet and dry spells. This approximation can also be made for the variance of k-day rainfall amounts when the rainfall totals within a wet spell are iid.
For the derivation of the formulas underlying these numerical calculations, the following assumptions are made.
a. The process is stationary.
b. The wet-dry process is an alternating renewal process. A definition of this process is given in IV, 2.2.
These assumptions turn out to be reasonable when the rainfall process is examined for a particular month or season.

For the correlogram it can be concluded (see III, 6.1):
a. There is a good correspondence between the estimated first serial correlation coefficient and the theoretical value for both the wet-dry process and the entire rainfall process. This quantity is usually underestimated when simplifying assumptions are made about the behaviour of rainfall amounts within a wet spell.
b. For larger lags the model usually underestimates the serial correlation coefficients of the rainfall process, especially during the winter season. For the wet-dry process the model usually provides a better fit at the higher lags.
Closely related to the last point is the fact that the model underestimates the variances of 30-day rainfall amounts (see III, 6.2).
During winter and autumn, long wet spells with very high intensity sometimes occur (see III, 4.2), which inflate the estimated variances of k-day totals for large values of k.

The following remarks can be made on the cdf of k-day sums.
a. For the number of wet days in a k-day period there is a good correspondence between theoretical and empirical cdfs (see III, 7.1).
b. For the entire rainfall process theoretical cdfs fit well for small values of k; poor fit may occur for larger values of k (e.g. k = 30). This poor fit usually consists of an underestimation of the probabilities of large values (see III, 7.2.2).
Though the cdf was only investigated under the assumption of independent rainfall amounts within a wet spell, it may be expected that the shape of the cdf is hardly influenced when serial correlation between these rainfall amounts is assumed, since the increase in the variance of k-day totals is only very small for a model with serial correlation (see III, 6.2).
For the rainfall process it was investigated how different features of the model affect the shape of the cdf of 30-day totals. The main results are:
a. The shape of the cdf is hardly influenced by the distribution of the lengths of weather spells (see III, 7.1).
b. The shape of the cdf is to some extent insensitive to the marginal distribution of the rainfall amounts on wet days (see III, 7.2).
c. The shape of the cdf is hardly altered when rainfall amounts within a wet spell are assumed to be iid.
For Winterswijk (1908-1973) nearly the same results were obtained when the threshold defining a wet day is taken to be 0.3 mm instead of 0.8 mm.
Though there are many corrections and supplements in the series of Hengelo (1908-1973), the results for this station correspond quite well to those of the adjacent station of Winterswijk.

In Chapter V daily rainfall sequences of stations with a more pronounced seasonal variation than Dutch stations are discussed.
The problems encountered for Dutch stations usually arise here too:
a. In order to get a homogeneous wet-dry series one is often forced to call only those days wet for which the rainfall amount exceeds a rather large threshold (see V, 2.1).
b. Rainfall amounts within a wet spell are often non-identically distributed. Moreover, there usually exists a small serial correlation between rainfall amounts within a wet spell (see V, 2.3).
c. The rainfall model underestimates the variances of k-day totals for large values of k (see V, 2.4).
Besides, for the series analysed in Chapter V there are some problems associated with dry seasons with no or hardly any rainfall:
a. It is often not possible to fit the shifted negative binomial distribution or the truncated negative binomial distribution to lengths of wet spells during the dry season. Since there are no long wet spells during this season, the likelihood equations of these distributions often do not have a solution within the parameter space. In such cases it is possible to fit a one-parameter distribution (geometric, logarithmic series) to the lengths of wet spells (see V, 2.2).
b. Dry spells can be quite long. Modifications of the negative binomial distribution sometimes cannot fit the lengths of these spells (see V, 2.2 and V, 3).
In such cases it might be advisable to use transition probabilities for the generation of the wet-dry series instead of generating lengths of wet and dry spells. For instance, it was shown by simulation that a first order Markov chain describes the right tail of the distribution of the lengths of dry spells well for the station of Alexandria (see V, 3).

The generation of synthetic data for Pasar Minggu (Indonesia) was investigated in more detail (see V, 2.5). Special attention was paid to the beginning of both the wet and the dry monsoon. The model can describe the transitions between these seasons quite well.

A special problem arises for the rainfall series of Khartoum (1902-1940). For this station there is some evidence for serial correlation in the annual totals and in the annual number of wet days (see V, 3). This serial correlation can be explained by persistence in the lengths of successive wet and dry seasons. It is proposed therefore to generate the beginning and the end of the wet season first. Within a wet season the rainfall process can be approximated by a Bernoulli process for the occurrence of wet and dry days and a shifted gamma distribution for the rainfall amounts on wet days. The probability of a day being wet and the mean of the rainfall amount on a wet day show seasonal variation.
The main shortcoming of the daily rainfall model is that it underestimates the variance of k-day totals for large values of k, which may result in a poor fit to the distribution of these totals. It is, however, by no means certain whether this shortcoming is important in practical situations. When dealing with hydrological systems with a long memory one may expect serious problems, but studies on such systems can often be based on a time-scale longer than one day.
Therefore it is necessary to test the model on some real problems to obtain a better insight into its shortcomings.

One may ask whether improvements of the model are possible. For Dutch series the description of the wet-dry process by a seasonally changing alternating renewal process seems reasonable, since the model fits the probability distributions of the annual number of wet days (see III, 7.2.2) and of the number of wet days in a 30-day period (see III, 7.1) well. Therefore one must think of a better model for the behaviour of rainfall amounts on wet days. It is impractical to incorporate serial correlation of higher order between rainfall amounts within a wet spell, as the effect on the variance-time curve of the process is negligible because wet spells usually are of short duration. The model could be improved by:
a. a random slowly changing mean of the rainfall amounts on wet days. This certainly will increase the variance of k-day totals for large values of k. The main problem of this method is the estimation of the parameters. Another problem can be the choice of the type of distribution for the rainfall amounts on wet days.
b. generating a total rainfall amount for a particular period (e.g. a month) and splitting this rainfall amount into the rainfall amounts of the various wet days of that period. Because of the method of generation this model may give a reasonable fit to monthly and annual totals. A disadvantage of this method is that the model contains a large number of parameters.
But before thinking of such improvements one must realize that there are large local differences in the variances of 30-day totals (see III, 6.2).
It is therefore necessary to analyse a large number of daily rainfall sequences of the Netherlands and its neighbouring countries.
For some foreign stations analysed in Chapter V there is also the problem that, for large values of k, an alternating renewal process leads to a serious underestimation of the variance of the number of wet days in a k-day period. Research still has to be done to get a better model for such series.
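
The generation scheme summarised in this abstract for the Dutch series can be illustrated with a minimal Python sketch. It alternates dry and wet spells (an alternating renewal process), draws spell lengths from a zero-truncated negative binomial distribution, and assigns shifted gamma amounts to wet days. The function names, the use of NumPy's (r, p) parameterisation, the choice of a shift equal to the 0.8 mm wet-day threshold, the dry starting state, and all numerical values are assumptions made for illustration and are not taken from the thesis. Seasonal variation of the parameters, the smaller means on the first and last day of a wet spell, and serial correlation within wet spells are deliberately left out.

```python
import numpy as np


def sample_spell_length(rng, r, p):
    """Draw one spell length (>= 1 day) from a zero-truncated negative binomial.

    NumPy's parameterisation is assumed (failures before r successes);
    zeros are rejected so that every spell lasts at least one day.
    """
    while True:
        length = int(rng.negative_binomial(r, p))
        if length >= 1:
            return length


def generate_daily_rainfall(n_days, wet=(2.0, 0.5), dry=(1.0, 0.3),
                            shape=0.7, scale=6.0, shift=0.8, seed=0):
    """Generate n_days of daily rainfall (mm) from alternating dry and wet spells.

    wet, dry : hypothetical (r, p) pairs for wet and dry spell lengths.
    shape, scale, shift : hypothetical shifted gamma parameters for wet-day amounts.
    """
    rng = np.random.default_rng(seed)
    rain = np.zeros(n_days)
    day = 0
    is_wet = False  # assumption: the sequence starts with a dry spell
    while day < n_days:
        r, p = wet if is_wet else dry
        end = min(day + sample_spell_length(rng, r, p), n_days)
        if is_wet:
            # iid amounts within the spell; serial correlation is ignored here
            rain[day:end] = shift + rng.gamma(shape, scale, size=end - day)
        day = end
        is_wet = not is_wet
    return rain
```

Calling generate_daily_rainfall(365, seed=1) returns an array of 365 daily amounts in mm, with zeros on dry days and values of at least the shift on wet days. Sums over k consecutive days of such a sequence could then be compared with observed k-day totals, which is the kind of check on which the abstract reports the underestimated variance for large k.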
Original language: English
Qualification: Doctor of Philosophy
Awarding Institution
  • van der Molen, W.H., Promotor, External person
  • van der Laan, P., Co-promotor, External person
Award date: 6 Apr 1977
Place of Publication: Wageningen
Publication status: Published - 1977


  • analogues
  • hydrology
  • meteorological observations
  • models
  • precipitation
  • statistical analysis
  • cum laude
