
PDF srvubudsp002.uct.ac.za


Academic year: 2023


Specific problems in using the multiple regression model to model spatial data are that the parameters are assumed to be constant over space and the error terms are assumed to be independent. GWR is an extension of the traditional regression model that takes the location of each observation into account.

Statistical description of the GWR model

Estimation of the regression parameters

Weighted Regression

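Consider a fixed regression point at location $s_0$, which may or may not be an observation point. The estimate of $\boldsymbol{\beta}(s_0)$, the parameter vector at the regression point at location $s_0$, is given by

$$\hat{\boldsymbol{\beta}}(s_0) = \left(X^{T} W(s_0) X\right)^{-1} X^{T} W(s_0)\,\mathbf{y},$$

where $W(s_0)$ is the diagonal matrix of spatial weights of the observations relative to $s_0$.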
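The local estimate in GWR is a weighted least-squares fit, $\hat{\boldsymbol{\beta}}(s_0) = (X^{T} W(s_0) X)^{-1} X^{T} W(s_0)\mathbf{y}$. A minimal sketch of that computation in Python with NumPy follows (an assumption; the thesis's own code in Appendix B is not reproduced here). The function name `local_wls` and the synthetic data are hypothetical, and the vector `w` stands in for the diagonal of $W(s_0)$.

```python
import numpy as np

def local_wls(X, y, w):
    """Weighted least squares: (X^T W X)^{-1} X^T W y with W = diag(w)."""
    XtW = X.T * w  # scales column i of X^T by w[i], i.e. X^T W without forming W
    return np.linalg.solve(XtW @ X, XtW @ y)

# Hypothetical data satisfying y = 2 + 3x exactly, so any positive weights
# must recover the coefficients (2, 3).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 50)
X = np.column_stack([np.ones_like(x), x])  # intercept column plus one covariate
y = 2.0 + 3.0 * x
w = rng.uniform(0.1, 1.0, 50)              # stand-in for kernel weights at s0
beta_hat = local_wls(X, y, w)
```

Scaling the columns of $X^T$ by `w` avoids building the $n \times n$ diagonal matrix, and solving the normal equations with `np.linalg.solve` avoids forming an explicit inverse.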

Spatial weighting functions

This ensures that observations closer to the regression point will have more influence on the parameter estimates than observations further away. One method is to rank the data points according to their distance from the regression point.
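Two kernels illustrate how weights can decay with distance from the regression point. The Gaussian kernel is the one used for the bandwidth plots in this work (Figures 4.9, 4.12 and 5.15); the bi-square kernel is included only as a commonly used alternative and is an assumption here. The function names and the three-point layout are hypothetical, and the last line ranks the points by distance, as in the ranking approach mentioned above.

```python
import numpy as np

def gaussian_kernel(d, h):
    """w = exp(-0.5 * (d/h)^2): weights decay smoothly and never reach zero."""
    return np.exp(-0.5 * (d / h) ** 2)

def bisquare_kernel(d, h):
    """w = (1 - (d/h)^2)^2 for d < h, else 0: points beyond h get no weight."""
    return np.where(d < h, (1.0 - (d / h) ** 2) ** 2, 0.0)

coords = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])
s0 = np.array([0.0, 0.0])                  # regression point
d = np.linalg.norm(coords - s0, axis=1)    # distances: 0, 1, 5
g = gaussian_kernel(d, h=2.0)
b = bisquare_kernel(d, h=2.0)
ranks = np.argsort(np.argsort(d))          # rank 0 = nearest point
```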

Estimation of $\sigma^2$

To reduce this problem, spatial kernels can be constructed that adapt their bandwidth to the data density around the regression point, so that the bandwidth is larger where the data points are sparse than where they are dense. In areas where the data are sparse, the kernel expands so that the sum of the weights equals a constant $C$; in areas where the data are dense, it shrinks.
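One common way to make a kernel adapt to data density is to set the bandwidth at each regression point to the distance of its $N$th nearest data point. This is a close relative of, but not identical to, fixing the sum of the weights at $C$: it fixes the number of points that can receive non-zero weight. The sketch below is a hypothetical illustration with a bi-square kernel; `adaptive_bisquare` and the data layout are invented for the example.

```python
import numpy as np

def adaptive_bisquare(coords, s0, n_neighbours):
    """Bi-square weights whose bandwidth is the distance to the N-th nearest point."""
    d = np.linalg.norm(coords - s0, axis=1)
    h = np.sort(d)[n_neighbours - 1]     # bandwidth grows where points are sparse
    w = np.where(d < h, (1.0 - (d / h) ** 2) ** 2, 0.0)
    return w, h

# Hypothetical layout: ten closely spaced points, then five widely spaced ones.
dense = np.column_stack([np.arange(10) * 0.1, np.zeros(10)])
sparse = np.column_stack([np.arange(1, 6) * 10.0, np.zeros(5)])
coords = np.vstack([dense, sparse])

w_dense, h_dense = adaptive_bisquare(coords, np.array([0.45, 0.0]), 5)
w_sparse, h_sparse = adaptive_bisquare(coords, np.array([30.0, 0.0]), 5)
```

With the same $N$, the bandwidth is small among the dense points and large among the sparse ones, which is the behaviour the paragraph describes.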

Choice of bandwidth

Cross-validation

By plotting the CV scores against bandwidths, guidance can be provided on choosing an appropriate bandwidth (Fotheringham, Brunsdon, & Charlton, 2002). If a bandwidth that minimizes the CV score is identified graphically, a more accurate value for that bandwidth can be obtained by using an optimization routine.
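The text does not name the optimization routine, so the sketch below uses golden-section search as one plausible way to refine a bandwidth once the CV curve has been bracketed graphically. It assumes the CV curve is unimodal on the bracket; `toy_cv` is a made-up stand-in for a real CV function, with its minimum placed at 17 purely for illustration. SciPy's `scipy.optimize.minimize_scalar` offers a ready-made alternative.

```python
import math

def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimise a unimodal f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0   # about 0.618
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:                  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                        # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0

# Hypothetical stand-in for a CV curve (not data from this work).
def toy_cv(h):
    return (h - 17.0) ** 2 + 3.0

h_opt = golden_section_min(toy_cv, 1.0, 100.0)
```

Each iteration reuses one of the two interior evaluations, so only one new CV score is computed per step, which matters when each score requires refitting the local model many times.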

Akaike's Information Criterion

Cross-validation, a technique widely used in nonparametric modeling in statistics, involves refitting the model to predict each data point with that data point left out of the fitting process (Hastie, Tibshirani, & Friedman, 2001). The model can be refitted repeatedly with different bandwidth values and the corresponding cross-validation score calculated.
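The cross-validation score can be written as $CV(h) = \sum_{i=1}^{n} \left(y_i - \hat{y}_{\neq i}(h)\right)^2$, where $\hat{y}_{\neq i}(h)$ is the fitted value at location $i$ from a fit with bandwidth $h$ that omits observation $i$. A minimal sketch, assuming a fixed Gaussian kernel and hypothetical synthetic data, computes it by zeroing the weight of each observation in its own local fit; `gwr_cv_score` is an illustrative name.

```python
import numpy as np

def gwr_cv_score(X, y, coords, h):
    """CV(h) = sum_i (y_i - yhat_{!=i}(h))^2 with a fixed Gaussian kernel."""
    score = 0.0
    for i in range(len(y)):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / h) ** 2)
        w[i] = 0.0                      # leave observation i out of its own fit
        XtW = X.T * w
        beta_i = np.linalg.solve(XtW @ X, XtW @ y)
        score += (y[i] - X[i] @ beta_i) ** 2
    return score

# Hypothetical data: a slope that drifts from west to east plus a little noise.
rng = np.random.default_rng(1)
coords = rng.uniform(0.0, 10.0, size=(60, 2))
x = rng.uniform(0.0, 5.0, 60)
X = np.column_stack([np.ones(60), x])
y = 1.0 + (1.0 + 0.3 * coords[:, 0]) * x + rng.normal(0.0, 0.1, 60)

bandwidths = [1.0, 2.0, 4.0, 8.0, 16.0]
scores = [gwr_cv_score(X, y, coords, h) for h in bandwidths]
best_h = bandwidths[int(np.argmin(scores))]
```

Evaluating a coarse grid of bandwidths like this corresponds to the graphical step; a one-dimensional optimizer can then refine the minimum.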

Spatial Non-stationarity

$H_A$: $\beta_k$ is non-stationary over the region of interest. This procedure is repeated for all parameters $\beta_k$, $k = 0, \ldots$

Chapter 3

A Proposed Extension to the GWR Model

The Expansion Method

Development of the LLGWR model

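The estimate of $\boldsymbol{\beta}^*(s_0)$, the parameter vector at the regression point at location $s_0$, is given by $\hat{\boldsymbol{\beta}}^*(s_0) = \left(X^{*T} W(s_0) X^*\right)^{-1} X^{*T} W(s_0)\,\mathbf{y}$, where $X^*$ is the design matrix augmented with the terms of the linear expansion. In the GWR model, the spatial variability of the regression coefficients is accommodated by weighted regression centered at a point of interest, with weights that decrease as the distance of the observations from that point increases.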
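The sketch below assumes that LLGWR expands each coefficient linearly in the coordinates around the regression point, $\beta_k(u, v) \approx \beta_k + \beta_k^{u}(u - u_0) + \beta_k^{v}(v - v_0)$, so that the local weighted regression is run on a design matrix augmented with $x_{ik}(u_i - u_0)$ and $x_{ik}(v_i - v_0)$. This is a reading of the expansion method, not the thesis's Appendix C code; the Gaussian kernel, the function name `llgwr_estimate` and the data are assumptions.

```python
import numpy as np

def llgwr_estimate(X, y, coords, s0, h):
    """Local-linear GWR sketch: returns (beta, beta_u, beta_v) at s0."""
    du = coords[:, 0] - s0[0]
    dv = coords[:, 1] - s0[1]
    # Expanded design: x_ik, x_ik*(u_i - u_0), x_ik*(v_i - v_0) for every covariate k
    X_star = np.hstack([X, X * du[:, None], X * dv[:, None]])
    w = np.exp(-0.5 * (np.hypot(du, dv) / h) ** 2)   # Gaussian kernel weights
    XtW = X_star.T * w
    theta = np.linalg.solve(XtW @ X_star, XtW @ y)
    p = X.shape[1]
    return theta[:p], theta[p:2 * p], theta[2 * p:]

# Hypothetical data whose coefficients vary exactly linearly in space:
# beta_0(u, v) = 1 + 0.5u and beta_1(u, v) = 2 - 0.3v.
rng = np.random.default_rng(2)
coords = rng.uniform(0.0, 10.0, size=(40, 2))
x = rng.uniform(0.0, 5.0, 40)
X = np.column_stack([np.ones(40), x])
y = (1.0 + 0.5 * coords[:, 0]) + (2.0 - 0.3 * coords[:, 1]) * x

s0 = np.array([4.0, 6.0])
beta, beta_u, beta_v = llgwr_estimate(X, y, coords, s0, h=3.0)
```

Because the coefficients here vary exactly linearly, the local-linear fit recovers the true values at $s_0 = (4, 6)$, namely $(3, 0.2)$, together with their spatial gradients.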

Chapter 4

A Small Data Set taken from Soil Science

Data

Exploratory Data Analysis

  • Global model
  • Residuals
  • Global models fitted over quadrants

… where $y_i$ is the $i$th observed value of water content and $\hat{y}_i$ the corresponding fitted value from model (4.1), is shown in Figure 4.4(a), and a plot of the residuals against the fitted values is shown in Figure 4.4(b). However, the spatial distribution of the residuals shown in Figure 4.5 appears to be non-random: large positive residuals are located in the northeastern part of the map, and negative residuals in the south.

In general, standardized residuals greater than 2 in absolute value are considered potential outliers. The impact of these observations on the regression analysis was examined, but removing them made very little difference to the results. The field was divided into four quadrants as shown in Figure 4.7 and simple linear regression models were fitted to the data for each quadrant separately.
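The rule of thumb can be sketched as follows. The thesis does not say which standardization it used, so this assumes internally studentized residuals, $e_i / (s\sqrt{1 - h_{ii}})$, where $h_{ii}$ is the leverage from the hat matrix; the function name and the planted outlier are hypothetical.

```python
import numpy as np

def standardized_residuals(X, y):
    """Internally studentized residuals e_i / (s * sqrt(1 - h_ii)) from an OLS fit."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    s = np.sqrt(e @ e / (n - p))                 # residual standard error
    # Diagonal of the hat matrix X (X^T X)^{-1} X^T
    leverage = np.einsum("ij,ji->i", X, np.linalg.solve(X.T @ X, X.T))
    return e / (s * np.sqrt(1.0 - leverage))

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 10.0, 30)
X = np.column_stack([np.ones(30), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, 30)
y[5] += 5.0                                      # plant one hypothetical outlier
r = standardized_residuals(X, y)
flagged = np.flatnonzero(np.abs(r) > 2.0)        # "greater than 2 in absolute value"
```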

This provides a simple way of checking whether the modeled water/clay content relationship is likely to be stationary in space. The results of the global models fitted separately for each quadrant are presented in Table 4.3, and the scatterplots of water versus clay content with the corresponding fitted regression lines for each quadrant are presented in Figure 4.8.
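A quadrant check of this kind can be sketched by splitting the field at the midpoints of its coordinate ranges and fitting a simple linear regression in each part. The split rule, the convention that larger $v$ is north, and all names and data are assumptions for illustration; the quadrants in Figure 4.7 may have been defined differently.

```python
import numpy as np

def quadrant_fits(x, y, coords):
    """Fit y = b0 + b1*x separately in four quadrants split at the coordinate midpoints."""
    u, v = coords[:, 0], coords[:, 1]
    u_mid = (u.min() + u.max()) / 2.0
    v_mid = (v.min() + v.max()) / 2.0
    masks = {
        "NW": (u < u_mid) & (v >= v_mid),
        "NE": (u >= u_mid) & (v >= v_mid),
        "SW": (u < u_mid) & (v < v_mid),
        "SE": (u >= u_mid) & (v < v_mid),
    }
    fits = {}
    for name, m in masks.items():
        Xq = np.column_stack([np.ones(m.sum()), x[m]])
        beta, *_ = np.linalg.lstsq(Xq, y[m], rcond=None)
        fits[name] = beta                      # (intercept, slope) for this quadrant
    return fits

# Hypothetical field: slope 2 in the western half, 4 in the eastern half.
uu, vv = np.meshgrid(np.arange(10.0), np.arange(10.0))
coords = np.column_stack([uu.ravel(), vv.ravel()])
rng = np.random.default_rng(4)
x = rng.uniform(0.0, 5.0, 100)
y = 1.0 + np.where(coords[:, 0] < 4.5, 2.0, 4.0) * x
fits = quadrant_fits(x, y, coords)
```

Markedly different slopes across quadrants, as here, are the signal that a single global relationship may not hold.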

Table 4.2: Parameter estimates of the simple linear regression (global) model. The regression has an R² value of 71%, and thus the model provides a good fit to the data.

Application of GWR

The parameters were thus estimated at each of the grid points, producing 5000 estimates for each parameter. The intercept coefficient estimates, shown in Figure 4.10(a), show a clear pattern, with higher values in the northwest of the field and lower values in the south. The standard errors of these estimates, shown in Figure 4.10(b), are highest in the corners of the field.

The estimated clay coefficients mapped in Figure 4.11(a) have their highest values in the southwest of the field and their lowest values in the northwest. The standard errors of these estimates are highest in the northwest corner and along the southern edge of the field. A comparison of Figure 4.10(a) with Figure 4.11(a) shows that high intercept values correspond to low clay coefficient values, and low intercept values to high clay coefficient values.

This randomization was repeated 1000 times, and the proportions of the variances $s^2(\hat{\beta}_k)$, for $k = 0, 1$, exceeding the actual variance obtained from the data at the correct sites were calculated; they were found to be 0.001 and 0.022, respectively. These proportions estimate the probability of observing variation in the local parameter estimates at least as extreme as that observed for the actual data if the parameter were globally constant.
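The randomization test can be sketched as follows, assuming local estimates are computed at the observation sites with a fixed Gaussian kernel (the thesis mapped estimates on a grid and may differ in detail). Each randomization reassigns the observations to a random permutation of the sites, recomputes the local estimates, and records whether the sample variance $s^2(\hat{\beta}_k)$ is at least as large as the observed one. All names and the synthetic data are hypothetical.

```python
import numpy as np

def local_coefficients(X, y, coords, h):
    """GWR estimates at every observation location with a fixed Gaussian kernel."""
    B = np.empty((len(y), X.shape[1]))
    for i, s0 in enumerate(coords):
        d = np.linalg.norm(coords - s0, axis=1)
        w = np.exp(-0.5 * (d / h) ** 2)
        XtW = X.T * w
        B[i] = np.linalg.solve(XtW @ X, XtW @ y)
    return B

def monte_carlo_stationarity(X, y, coords, h, n_rand=1000, seed=0):
    """Proportion of random relocations whose s^2(beta_k) reaches the observed one."""
    rng = np.random.default_rng(seed)
    observed = local_coefficients(X, y, coords, h).var(axis=0, ddof=1)
    exceed = np.zeros(X.shape[1])
    for _ in range(n_rand):
        shuffled = coords[rng.permutation(len(y))]   # reassign data to random sites
        exceed += local_coefficients(X, y, shuffled, h).var(axis=0, ddof=1) >= observed
    return exceed / n_rand

# Hypothetical data: the slope increases from west to east, the intercept is constant.
uu, vv = np.meshgrid(np.arange(10.0), np.arange(10.0))
coords = np.column_stack([uu.ravel(), vv.ravel()])
rng = np.random.default_rng(5)
x = rng.uniform(-2.0, 2.0, 100)
X = np.column_stack([np.ones(100), x])
y = 1.0 + (1.0 + coords[:, 0]) * x
p_values = monte_carlo_stationarity(X, y, coords, h=1.5, n_rand=99)
```

A small proportion for a coefficient, like the clearly non-stationary slope here, suggests that its spatial variation is unlikely to have arisen by chance under a globally constant parameter.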

Figure 4.9: Variation in CV score with bandwidth using a Gaussian kernel. The minimum CV score occurs at a bandwidth of approximately 17 m.

Implementation of LLGWR

It was found that the parameter $\beta_0$ was significantly different from zero at all locations and $\beta_1$ at 91% of the locations. Maps of the parameter estimates were produced to illustrate their variation over space; the parameters were therefore estimated at each of the grid points, yielding 5000 estimates for each parameter.
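A proportion such as "significant at 91% of the locations" can be computed from local pseudo t-values. This sketch assumes a fixed Gaussian kernel, estimates at the observation sites, local standard errors from $\operatorname{Var}[\hat{\boldsymbol{\beta}}(s_i)] = \sigma^2 C_i C_i^{T}$ with $C_i = (X^{T} W_i X)^{-1} X^{T} W_i$, $\sigma^2$ estimated as $\mathrm{RSS} / (n - 2\,\mathrm{tr}(S) + \mathrm{tr}(S^{T} S))$ with $S$ the hat matrix, and a cut-off of 1.96. These choices, the function name and the data are assumptions rather than details taken from the thesis.

```python
import numpy as np

def gwr_local_t(X, y, coords, h):
    """Local estimates and pseudo t-values at each observation (fixed Gaussian kernel)."""
    n, p = X.shape
    S = np.empty((n, n))        # hat matrix: yhat = S @ y
    B = np.empty((n, p))
    C_all = []
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / h) ** 2)
        XtW = X.T * w
        C = np.linalg.solve(XtW @ X, XtW)     # p x n matrix with beta_i = C @ y
        B[i] = C @ y
        S[i] = X[i] @ C
        C_all.append(C)
    resid = y - S @ y
    sigma2 = resid @ resid / (n - 2.0 * np.trace(S) + np.trace(S.T @ S))
    se = np.array([np.sqrt(sigma2 * np.diag(C @ C.T)) for C in C_all])
    return B, B / se

# Hypothetical data with a strong, spatially constant relationship.
rng = np.random.default_rng(6)
coords = rng.uniform(0.0, 10.0, size=(40, 2))
x = rng.uniform(0.0, 5.0, 40)
X = np.column_stack([np.ones(40), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, 40)
B, t = gwr_local_t(X, y, coords, h=3.0)
prop_significant = (np.abs(t) > 1.96).mean(axis=0)   # share of sites with |t| > 1.96
```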

1000 randomizations of the data were performed to test the following hypotheses. It can be seen from Figure 4.14(a) that high estimates of the clay coefficient are located in the southwest corner of the field. The standard errors of both the intercept and clay coefficient estimates, shown in Figures 4.13(b) and 4.14(b), respectively, are lowest in the center of the field and highest at the corners.

Low values of the clay coefficient are located in the northwest corner, as well as along the southeast border of the field. Based on the results of the tests of significance of the individual parameters, it was possible to omit one of the coefficients from the model.

Figure 4.12: Variation in CV score with bandwidth for the LLGWR model using a Gaussian kernel

Kriging application

Comparative results

Chapter 5

A Large Data Set taken from Geology

Data

Exploratory Data Analysis

  • Continuous variables
  • Residuals

Argovian rock formations are found at locations furthest north and furthest south of the study area. Sequanian rock formations are mainly found in the west of the region, and Quaternary rock formations are mainly found in the northern half of the region. The sampled sites in the center of the study area are dominated by Kimmeridgian rock formations and there are only 4 sampled sites with Portlandian rock formations in total.

Quaternary and Argovian formations make up about 20% of the sites, and Portlandian only 1.5%. Histograms of the metal concentrations, expressed in parts per million, are shown in Figure 5.4. … 259, where $y_i$ are the observed values of chromium concentration and $\hat{y}_i$ the fitted values from model (5.1), is shown in Figure 5.7(a), and the plot of residuals against the fitted values is shown in Figure 5.7(b).

However, the spatial distribution of the residuals shown in Figure 5.8 appears to be non-random, with a cluster of large negative residuals in the eastern part of the study region and some large positive residuals in the southwestern part. A plot of the standardized residuals, which are useful for outlier detection, is presented in Figure 5.9.

Table 5.2: Frequencies of occurrence of rock types
  • Global model

The area of investigation was … The model (5.1) was fitted separately to the data for each quadrant.

Table 5.6: Results from separate regressions for each quadrant where 72j (j =

Application of GWR

All parameters were found to be significantly different from zero at most locations, except for the coefficient associated with L4i, which was significant at only 18% of the locations. The main outcome of a GWR analysis is a set of local parameter estimates that can be mapped to show how the model parameters vary in space. The parameters were thus estimated at each of the grid points, yielding 5600 estimates for each parameter.

These estimates, as well as the standard errors of the estimates, have been mapped using ArcGIS software and are shown in Figures 5.12 to 5.14. The Monte Carlo method described in Section 2.6 was used to determine whether or not the parameters showed significant non-stationarity.

Table 5.9: Monte Carlo tests for non-stationarity

Implementation of LLGWR

$H_A$: $\beta_k$ is non-stationary throughout the region of interest.
$H_0$: $\beta_k^{u}$ is stationary throughout the region of interest, compared with $H_A$: $\beta_k^{u}$ is non-stationary throughout the region of interest.
$H_0$: $\beta_k^{v}$ is stationary throughout the region of interest, compared with $H_A$: $\beta_k^{v}$ is non-stationary throughout the region of interest.

Table 5.11 shows that some additional parameters were found to be significantly different from zero at more than half of the locations, suggesting that the inclusion of a linear expansion may be worthwhile.

Some parameters were found to be insignificant at most locations, so the model can be refitted excluding these parameters. The parameter estimates that appear to be both significantly non-zero and non-stationary are those of $\beta_0$.

Variable | Proportion of sites where parameter p-values are significantly different from zero | Monte Carlo test

Predicted values of chromium obtained from this model, as well as the standard errors of the predictions, are mapped in Figure 5.19. Figure 5.19(a) shows that high chromium concentrations are predicted in the northeastern and southwestern regions.

Figure 5.15: Variation in CV score with bandwidth using a Gaussian kernel. The minimum CV score occurs at a bandwidth of approximately 1.8 km.

Comparative results for the training data set

Results of the validation data set

Chapter 6 Conclusions

… that the nature of the variability of the regression coefficients in the two data sets favours different models. It is therefore debatable whether extending the GWR model is worthwhile, and further investigation is needed. Local Linear Geographically Weighted Regression (LLGWR) is easy to implement, can add value in the analysis of certain data sets, and can easily be included in the GWR repertoire.

Further investigations involving the analysis of more data sets are required, especially data sets showing strong non-stationarity. An investigation into mixed LLGWR models, in which stationary parameters are modeled globally and non-stationary parameters locally, is also required.

Bibliography

…ing: Monte Carlo Studies and Application to Illicit Drug Market Modelling

Finding a predictive model for Iberian dung beetle species richness based on spatial and environmental variables.

Appendix A

Soil Science Data

Appendix B GWR code

Appendix C LLGWR code

Figure

Table 4.1: Descriptive statistics of water and clay content
Figure 4.3: Histograms of water and clay content
