Title: Modeling flood elevations of the Limpopo River at Beitbridge border post using extreme value distributions.

In this thesis, extreme value models are applied to the occurrence and magnitude of the Limpopo River floods at the Beitbridge border post.

## Statement of the Problem

A return period is the average length of time, in years, between events (for example, floods or river levels) that match or exceed a given magnitude. A return level is the value that is expected to be matched or exceeded on average once per time interval T, that is, with probability p = 1/T in each unit of time.
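The relationship between a return period T and its return level can be sketched as follows, assuming annual maxima follow a GEV distribution with location `mu`, scale `sigma` and shape `xi`. The function name and the parameter values in the usage note are illustrative assumptions, not estimates from this thesis.

```python
import math


def gev_return_level(mu, sigma, xi, T):
    """Level exceeded on average once every T blocks under GEV(mu, sigma, xi)."""
    if T <= 1:
        raise ValueError("return period T must exceed 1")
    if sigma <= 0:
        raise ValueError("scale parameter sigma must be positive")
    p = 1.0 / T                 # exceedance probability per block (e.g. per year)
    y = -math.log(1.0 - p)      # y_p = -log(1 - p)
    if abs(xi) < 1e-12:         # Gumbel case (xi = 0)
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))
```

For example, `gev_return_level(2.0, 1.0, 0.1, 100)` gives the level with a 1% chance of being exceeded in any one year under those hypothetical parameters.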

## Aim and objectives

Aim

Objectives

Significance of the study

Dissertation structure

## Introduction

The empirical results from this thesis suggest that the Gumbel class of the generalized extreme value distribution (GEVD) provides a good fit. Notably, most researchers have used the GEVD, and only a few have used the generalized Pareto distribution (GPD).

## Concluding remarks

There are two approaches to practical extreme value analysis: the block maxima approach and the peaks-over-threshold approach. When data are limited, we usually resort to the r largest order statistics when analyzing extreme values.
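The three ways of extracting extremes mentioned above can be illustrated with a minimal sketch. The function names and the toy series in the usage note are hypothetical and serve only to show how each approach selects observations.

```python
def block_maxima(values, block_size):
    """Maximum of each complete block of `block_size` consecutive observations."""
    n_blocks = len(values) // block_size
    return [max(values[i * block_size:(i + 1) * block_size])
            for i in range(n_blocks)]


def r_largest(values, block_size, r):
    """The r largest observations in each complete block, in decreasing order."""
    n_blocks = len(values) // block_size
    return [sorted(values[i * block_size:(i + 1) * block_size], reverse=True)[:r]
            for i in range(n_blocks)]


def peaks_over_threshold(values, threshold):
    """Excesses (value minus threshold) of observations strictly above the threshold."""
    return [v - threshold for v in values if v > threshold]
```

With the toy series `[1.2, 0.8, 2.5, 1.9, 3.1, 0.4, 2.2, 2.8]` and blocks of four, the block maxima are `[2.5, 3.1]`, whereas a threshold of 2.0 retains four observations. This shows why threshold and r largest methods make fuller use of a short record.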

## Generalised extreme value distribution for block maxima

*Generalised extreme value distribution for block maxima*

*Properties of the GEVD when ξ = 0*

*Properties of the GEVD when ξ ≠ 0*

*Estimation of parameters for the GEV*

*Stationary and non-stationary models*

*Deviance statistics*

*Augmented Dickey-Fuller test*

*Model diagnostics*

*Modelling the r largest order statistics*

Then the sequence is said to converge weakly to Z if the distribution functions of the normalised maxima (M_n − b_n)/a_n converge to the distribution function of Z. The shape parameter ξ influences the support and tail behavior of the GEVD. The moment generating function of the standard Gumbel distribution can be used to determine the moments of the GEVD.
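How the shape parameter changes the support of the GEVD can be seen in a short sketch of its distribution function. The standard parameterisation with location `mu`, scale `sigma` and shape `xi` is assumed here, and the function name is illustrative.

```python
import math


def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function G(z) with location mu, scale sigma, shape xi."""
    if sigma <= 0:
        raise ValueError("scale parameter sigma must be positive")
    s = (z - mu) / sigma
    if abs(xi) < 1e-12:
        # Gumbel case (xi = 0): support is the whole real line
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0:
        # Outside the support: below the lower endpoint when xi > 0,
        # above the upper endpoint when xi < 0
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))
```

For ξ > 0 the distribution has a finite lower endpoint at mu − sigma/ξ, for ξ < 0 a finite upper endpoint at the same expression, and for ξ = 0 the support is unbounded in both directions.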

The normal expected value given in equation (3.2.21) is used to determine the expectations of the second derivative of the probability function. In general, the expectation given in equation (3.2.21) above can also be used to determine the moments of the generalized extreme value distribution. In this way, the variation through time in the observed process is modeled by a linear trend in the location parameter of the appropriate extreme value model which in this case is the GEV distribution.
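A linear trend in the location parameter, as described above, is commonly written in the following form. The symbols β₀ and β₁ are notation assumed here for the intercept and slope; the thesis may use different symbols.

```latex
Z_t \sim \mathrm{GEV}\bigl(\mu(t), \sigma, \xi\bigr),
\qquad \mu(t) = \beta_0 + \beta_1 t
```

Setting β₁ = 0 recovers the stationary model, so the two models are nested and can be compared with a likelihood-based test.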

In each of the above cases, the null hypothesis is H0: δ = 0, that is, a unit root exists and the time series is non-stationary. The Augmented Dickey-Fuller test is performed by augmenting the equations above with lagged values of the dependent variable ΔY_t.
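The equations referred to are not reproduced in this summary. As a sketch, the most general augmented regression (with intercept, linear trend and k lagged differences) is usually written as below. The symbols α, β, γ_i and the lag order k are assumed notation.

```latex
\Delta Y_t = \alpha + \beta t + \delta Y_{t-1}
           + \sum_{i=1}^{k} \gamma_i \, \Delta Y_{t-i} + \varepsilon_t,
\qquad H_0 : \delta = 0
```

Setting β = 0 gives the case with an intercept only, and setting α = β = 0 gives the case with neither intercept nor trend.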

## Generalised Pareto Distribution

*Introduction*

*Threshold Selection*

*Declustering*

*Parameter Estimation for the Generalised Pareto Distribution*

We exploit another useful property of the extremal index estimator to derive an additional tool for threshold selection. The threshold stability of the extremal index estimator refers to its invariance to changes in the threshold value above a suitably high threshold. To check whether the Ferro and Segers (2003) model fits the estimated extremal index ηu, we use a quantile-quantile (QQ) plot of interexceedance times against standard exponential quantiles.
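The intervals estimator of Ferro and Segers (2003), which is built from the interexceedance times mentioned above, can be sketched as follows. The function names are illustrative, and the series is assumed to be observed at equally spaced times, so interexceedance times are differences of observation indices.

```python
def interexceedance_times(values, threshold):
    """Gaps (in observation steps) between successive exceedances of the threshold."""
    idx = [i for i, v in enumerate(values) if v > threshold]
    return [b - a for a, b in zip(idx, idx[1:])]


def intervals_estimator(times):
    """Intervals estimator of the extremal index from interexceedance times."""
    n_gaps = len(times)  # equals N - 1 for N exceedances
    if n_gaps == 0:
        raise ValueError("at least two exceedances are required")
    if max(times) <= 2:
        num = 2.0 * sum(times) ** 2
        den = n_gaps * sum(t * t for t in times)
    else:
        num = 2.0 * sum(t - 1 for t in times) ** 2
        den = n_gaps * sum((t - 1) * (t - 2) for t in times)
    return min(1.0, num / den)
```

Evaluating `intervals_estimator` over a grid of thresholds and looking for a range where the estimate is roughly constant is one way to use the threshold stability property described above.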

However, there are some irregularities in using the maximum likelihood method to estimate the parameters of extreme value distributions (EVDs). Smith (1985) emphasizes the regularity conditions that apply when maximum likelihood is used to estimate the shape parameter. In our study, we use both maximum likelihood estimation and Bayesian methods for parameter estimation.

According to Coles (2001) and Smith (1987), the regularity assumptions underlying the limiting behavior of the maximum likelihood estimator fail when the shape parameter ξ < −1/2. Several standard statistical model diagnostic plots are used to check model fit and the appropriateness of threshold selection.

## Estimating uncertainty using bootstrapping

### Flood height data

In this thesis, the historical monthly and annual maximum data of the Limpopo River from Beitbridge border post station were used.

Descriptive statistics

Time series plot, density plot, normal QQ plot and Box plot for

## Stationary GEV models

*Fitting the model without trend*

*Profile Likelihood for the shape parameter*

*Diagnostic Plots for the GEV Stationary model*

*Test for Stationarity using Augmented Dickey-Fuller test*

Using a combination of estimates and standard errors, the 95% confidence intervals for ξ, σ and µ are summarized in Table (4.3). This suggests that the flood heights for the Limpopo River at the Beitbridge border station are modeled by the Fréchet class of distributions. However, better precision of the confidence interval bands is achieved by using the profile likelihood for the shape parameter, as shown below.

When plotting the profile likelihood, we can keep adjusting the plotting range until the confidence line is visible. From the profile likelihood plot in Figure (4.2), the confidence intervals for the shape parameter are approximately the same as those obtained from the standard errors. Various diagnostic plots are shown in Figure (4.3) to assess the adequacy of the GEV model fitted to data on the Limpopo River at the Beitbridge border station.

Neither the probability plot nor the quantile-quantile plot gives reason to doubt the validity of the fitted model: each set of plotted points is nearly linear. Moreover, the curve provides a satisfactory representation of the empirical estimates, especially when sampling variability is taken into account.

## Non Stationary GEV models

### Deviance statistics

Therefore, allowing for a linear trend over time does not improve on the model without a linear trend.
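The comparison behind this conclusion can be sketched as a deviance (likelihood ratio) test. The sketch assumes the trend model adds exactly one parameter, so the deviance is compared with a chi-square distribution on one degree of freedom. The function name and the log-likelihood values in the usage note are hypothetical, not values from this thesis.

```python
import math


def deviance_test_1df(loglik_null, loglik_alt, alpha=0.05):
    """Deviance D = 2 * (l1 - l0) and its chi-square(1) p-value for nested models."""
    # Guard against tiny negative values caused by optimiser noise
    d = max(2.0 * (loglik_alt - loglik_null), 0.0)
    # For one degree of freedom, P(chi2_1 > d) = erfc(sqrt(d / 2))
    p_value = math.erfc(math.sqrt(d / 2.0))
    return d, p_value, p_value < alpha
```

For instance, `deviance_test_1df(-120.0, -119.5)` gives a deviance of 1.0, which is below the 5% critical value of about 3.84, so the simpler stationary model would be retained in that hypothetical case.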

### Modelling flood height with inclusion of Southern Oscillation

The parameters obtained after the inclusion of the Southern Oscillation Index (SOI) are summarized in Table 4.5.

Scatter plots

### Correlation coefficient values for different months

From the plots in Figure (4.6), February appears to have been a rainy month: the flood height shows a positive relationship in that month, as seen in the correlation plots and in the value of r in Table (4.6).
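The correlation coefficient r reported for each month can be computed as in the following minimal sketch. The function name is illustrative, and it is an assumption here that the two series are the monthly flood heights and a covariate such as the SOI.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)
```

Applying the function separately to the observations for each calendar month yields one value of r per month, which is the layout of a monthly correlation table.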

### Modelling the r largest order statistics

Therefore, based on the standard errors, Figure (4.7) and the plots in Figure (4.9), we fix r = 2 in our subsequent analysis.

Diagnostic plots for the r largest order statistics with r = 2

Probability plots and quantile plots for r largest order statistics

Introduction

### Declustering

The data are over 1.43 meters simulated by an autoregressive process with extremal index θ = 1.25 and corresponding 1ˆ. Therefore, we can further, using this extremal index estimate of 1.33 meters and a threshold of approximately 1.33 meters. Therefore, out of the 287 threshold exceedances for our flood height data, we obtain 23 clusters, as illustrated in Figure (4.13) below.
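How exceedances are grouped into clusters can be illustrated with runs declustering. This is a sketch under the assumption that a cluster ends once a fixed number of consecutive observations fall at or below the threshold. The thesis does not state which declustering rule was used, and the function name and `run_length` parameter are hypothetical.

```python
def runs_decluster(values, threshold, run_length):
    """Return the maximum of each cluster of exceedances.

    A cluster is closed once `run_length` consecutive observations
    fall at or below the threshold.
    """
    if run_length < 1:
        raise ValueError("run_length must be at least 1")
    cluster_maxima = []
    current = []   # exceedances in the cluster being built
    below = 0      # consecutive non-exceedances since the last exceedance
    for v in values:
        if v > threshold:
            current.append(v)
            below = 0
        elif current:
            below += 1
            if below >= run_length:
                cluster_maxima.append(max(current))
                current = []
                below = 0
    if current:    # close a cluster still open at the end of the series
        cluster_maxima.append(max(current))
    return cluster_maxima
```

The ratio of the number of clusters to the number of exceedances gives a simple estimate of the extremal index, and the cluster maxima are the approximately independent values to which a GPD can then be fitted.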

The tail fraction based on the bulk model is shown in red, while the parameterized tail fraction is indicated in blue.

Use of extremal mixture model to select threshold

Examining the fit of threshold over a range of threshold

Estimates and standard errors for GPD fitting

Model diagnostics of Generalised Pareto Distribution

## Bayesian Inference of the annual maxima of flood heights

Introduction

Parameter estimates

Bayesian Analysis

## Return levels after fitting the GEV and GPD using the Maximum

The 100-year return levels obtained from the generalized extreme value distribution using the maximum likelihood approach are comparable to the flood height for 2013, when the level reached 6.707 meters.

## Parametric bootstrap

### Return level plot for the Generalised extreme value distribution

Figure (4.21) shows the predictive densities of return levels, from which the confidence interval bands can be visualized directly. In this chapter, we identified a suitable model for the Limpopo River flood heights at the Beitbridge border post. Return periods and return levels for the Limpopo River at the Beitbridge border post were also calculated in this chapter.

Descriptive statistics, including the time series plot, boxplot, density plot and normal quantile-quantile plot for the Limpopo River flood height data, are also given in this chapter. Of particular interest is the inclusion of Bayesian EVT and the r largest order statistics in this chapter.

Introduction

## Conclusion

The models developed in this thesis correspond to cumulative (or moving sums) annual maximum flood heights in series and therefore appear to be reliable for flood frequency analysis. Using the identified time-dependent GEV models with a scale parameter trend at the Beitbridge border post would also reduce the sensitivity of flood frequency, which is known to vary with changes in the scale parameter estimate, and therefore lead to more reliable estimates in the frequency of flooding.

Limitations of the dissertation

## Findings and Contributions

i) The use of the identified time-dependent GEV models with a trend in the scale parameter at the Beitbridge border post would reduce the sensitivity of flood frequency estimates, which are known to vary with changes in the estimate of the scale parameter, and thus lead to more reliable estimates of flood frequency.

ii) Accurate estimation of return levels and return periods of extreme flood heights helps in risk mitigation, e.g. bridge design by civil engineers.

iii) It is now possible to quantify the damage caused by the flood height for a given period.

iv) The Fréchet class of distributions is found to best fit the data in all modeling frameworks of this dissertation. This means that the distributions of extreme flood heights for the Limpopo River at the Beitbridge border post are thin-tailed.

## Future research

This implies that the distributions of extreme flood heights for the Limpopo River at the Beitbridge border post are thin-tailed.

iii) The inclusion of more covariates in the EVT models (GEVD and GPD), in the form of cycles and/or a physical variable such as a dummy variable indicating the occurrence of cyclones in the region, will also be considered in future research.

iv) Future research will also investigate the use of scoring rules such as the continuous ranked probability score (CRPS) in assessing the predictive performance of the GEVD and GPD models.

## References

de Waal, D. (2009). Posterior predictions about river discharge. Risk and decision analysis in maintenance optimization and flood management.

(2012). Extreme rainfall distributions: analysis of change in the Western Cape.

(2003). Inference for clusters of extreme values.

(1928). Limiting forms of the frequency distribution of the largest or smallest member of a sample.

Extreme value modeling of dependent series using R.

(2006). A Comprehensive Analysis of Extreme Rainfall, MSc thesis, University of the Witwatersrand, Johannesburg, URL http://wiredspace.wits.ac.za: Available October

Adapting Extreme Value Distributions to Zambezi River Flood Water Levels Recorded at Katima Mulilo in Namibia, MSc thesis, University of Western Cape, URL http://hdl.handle.net accessed 2005.

Estimating high quantiles of extreme floods in the lower Limpopo River Basin of Mozambique using a model-based Bayesian approach.

(2002). Application of extreme value theory and threshold models to hydrologic events, MSc thesis, University of Colorado, URL http://ucdenver.edu: accessed 2002.

Understanding predictive uncertainty in hydrologic modeling: the challenge of identifying input and structural errors.

(2012). Review of Extreme Value Threshold Estimation and Uncertainty Quantification.

Extreme value analysis of environmental time series: an application to the detection of trends in ground-level ozone. Statistical Science, 4 (4), pp:.

Summary for the properties of a Gumbel distribution

Summary of the properties for the GEVD

Table showing descriptive statistics for flood height

Table showing a summary of parameters and standard errors

Summary of estimates after allowing for a linear trend

Summary of estimates with inclusion of SOI

The correlation coefficients r

Maximum log likelihoods parameter estimates, confidence intervals and

Estimates and standard errors for GPD fitting

Bayesian estimates for the GEVD

Calculation of prediction Intervals