--- title: "Factorial kriging to extract the varying coefficients of the linear regression" author: "X. Freulon" date: "14 décembre 2021" output: pdf_document: number_sections: true toc: yes toc_float: true fig_heigth: 4 fig_width: 4 --- # Introduction The objective is to illustrate how to use in *RGeostats* the factorial kriging to extract spatially varying coefficients of a linear regression. We assume that the observed variable $Z$ is modeled by a linear model of some explanatory variable $N$, known every where, and a structured residual $\epsilon(s)$ $$ Z(s) = a(s) + b(s) \times N(s) + \epsilon(s). $$ The coefficients of the linear part, $a(s)$ and $b(s)$ varies spatially, and should be recovered from the observation of $Z$ and $N$ at some locations. In this example, we first simulate the maps of the coefficients, the factor, and the residual, according to some known models. Observations of $Z$ and $N$ are extracted at random locations. Then the factorial kriging is used to compute the parts of the drift: $a(s)\times 1$ and $b(s) \times N(s)$. A moving neighborhood is used enabling a local estimate of the coefficients. ```{r init, echo=FALSE, eval=TRUE} knitr::opts_chunk$set(echo = TRUE) library(RGeostats) set.seed(432431) constant.define("asp",1) rm(list = ls()) ``` # Simulation of reference model A regular grid is defined on $[0,1] \times [0,1]$. ```{r model_simulation, echo=TRUE, eval=TRUE} grd <- db.create(flag.grid = TRUE, x0=c(0.0,0.0), dx=c(0.01,0.01), nx=c(100,100)) # simulation of the model parameters grd <- simtub(dbin = NULL, dbout = grd, mean = 10, model = model.create(vartype = "Gaussian", range = 0.5, sill = 1.0), nbsimu = 2) grd <- db.rename(grd, names = "Simu.V1.S1", newnames = "A") grd <- db.rename(grd, names = "Simu.V1.S2", newnames = "B") grd <- simtub(dbin = NULL, dbout = grd, model = model.create(vartype = "Exponential", range = 0.2, sill = 1.0), nbsimu = 1) grd <- db.rename(grd, names = "Simu.V1.S1", newnames = "epsilon") # A Strictly positive variable grd <- db.add(grd, NbP = 1 + rpois(n = grd$nech, lambda = 2.5)) grd <- db.add(grd, Z = A + B * NbP + epsilon) ``` The varying coefficients are simulated using the a Gaussian variogram with a practical range of 0.5 (1/2 the domain) and a sill equals to 1.0. The mean of the two parameters is set to 10 to get positive values. The coefficient maps are smooth. The spatial correlation of the residual is modeled by an exponential variogram with range 0.2 (1/5 of the domain) and a sill equals to 1.0. ```{r LM_model, echo=FALSE, eval=TRUE, fig.width=8, fig.heigth=4} plot(db.locate(grd, names = "A", loctype = "z", flag.locnew = TRUE), pos.legend = 1) plot(db.locate(grd, names = "B", loctype = "z", flag.locnew = TRUE), pos.legend = 1) ``` The predictor $N$ is a discrete positive variable drawn from a Poisson distribution whose mean is 2.5. ```{r NbP_model, echo=FALSE, eval=TRUE, fig.width=8, fig.heigth=4} plot(db.locate(grd, names = "NbP", loctype = "z", flag.locnew = TRUE), pos.legend = 1) hist(grd$NbP, xlab = "Predictor value", main = "NbP", nclass = 25, col="orange") ``` The final observation field shows few spatial correlation due to its dependence to $N$. However the relation between $Z$ and $N$ is clearly linear. ```{r Z_model, echo=FALSE, eval=TRUE, fig.width=8, fig.heigth=4} plot(db.locate(grd, names = "Z", loctype = "z", flag.locnew = TRUE), pos.legend = 1) corr_NbP <- correlation(grd, name1 = "NbP", name2 = "Z", xlab = "Predictor", ylab = "Observation", flag.regr = TRUE, reg.col = "red") ``` The observation model is not spatially structured but with a strong linear correlation with the predictor: the correlation value is `r round(corr_NbP,2)`. # Data set The observations of $Z$ and $N$ are extracted at random locations (1000 points) in the domain. ```{r data_set, echo=TRUE, eval=TRUE} np <- 1000 sel <- rep(FALSE, grd$nech) sel[sort(sample(x = 1:grd$nech, size = np, replace = FALSE))] <- TRUE data <- db.create(db.extract(db.sel(grd,sel), names = c("x1", "x2", "NbP", "Z"), flag.compress = TRUE)) ``` # Kriging Parameters In order to extract the local coefficient of the linear model, the kriging with external drift is performed with a moving neighborhood and the variogram model of the residual $\epsilon$. Results are stored in the dedicated grid. For the kriging external drift, the drift variable has be specified both in the input and output data bases. ```{r kriging_parameter, echo=TRUE, eval=TRUE} m_Res <- model.create(vartype = "Exponential", range = 0.2, sill = 1.0) neigh <- neigh.create(ndim = 2, type = 2, nmini = 10, nmaxi = 25, radius = 0.25) # selection of the input variables data <- db.locate(db = data, names = "Z", loctype = "z", flag.locnew = TRUE) data <- db.locate(data, names = "NbP", loctype = "f", flag.locnew = TRUE) # selection of the drift variable grd <- db.locate(grd, names = "NbP", loctype = "f", flag.locnew = TRUE) res <- grd # Results in a different db. ``` # Kriging of A The intersect coefficient is the *first* component of the drift to be extracted. The KED is performed with the *drift.extract* parameter set to 1. ```{r kriging_of_A} res <- kriging(dbin = data, dbout = res, model = m_Res, neigh = neigh, uc = c("1", "f1"), flag.est = TRUE, flag.std = TRUE, cov.extract = 0, drift.extract = c(1), radix = "A" ) ``` ```{r A_corr, echo=FALSE, eval=TRUE, fig.width=4, fig.heigth=4} corr_A <- correlation(res, name2 = "A", name1 = "A.Z.estim", xlab = "Estimated value", ylab = "Actual value", flag.iso = TRUE, flag.diag = TRUE, diag.col = "red") ``` The correlation between the actual value and the predicted value is `r round(corr_A,2)`. The comparison of the initial map of $a$ and its estimated value is presented below. ```{r A_maps, echo=FALSE, eval=TRUE, fig.width=8, fig.heigth=4} plot(db.locate(res, names = "A", loctype = "z", flag.locnew = TRUE), title = "Actual value") plot(db.locate(res, names = "A.Z.estim", loctype = "z", flag.locnew = TRUE), title = "Predicted value") ``` # Kriging of B ```{r kriging_of_B} res <- kriging(dbin = data, dbout = res, model = m_Res, neigh = neigh, uc = c("1", "f1"), flag.est = TRUE, flag.std = TRUE, cov.extract = 0, drift.extract = c(2), radix = "B.NbP" ) res <- db.add(res, B.estim = B.NbP.Z.estim / NbP) ``` ```{r B_corr, echo=FALSE, eval=TRUE, fig.width=4, fig.heigth=4} corr_B <- correlation(res, name2 = "B", name1 = "B.estim", xlab = "Estimated value", ylab = "Actual value", flag.iso = TRUE, flag.diag = TRUE, diag.col = "red") ``` The correlation between the actual value and the predicted value is `r round(corr_B,2)`. The comparison of the initial map of $b$ and its estimated value is presented below. Note that the factorial kriging extract the drift, i.e. $b(s) \times N(s)$. To recover the coefficient the estimated drift is divided by the predictor. The coefficient is undefined when the predictor $N$ is null. ```{r B_maps, echo=FALSE, eval=TRUE, fig.width=8, fig.heigth=4} plot(db.locate(res, names = "B", loctype = "z", flag.locnew = TRUE)) plot(db.locate(res, names = "B.estim", loctype = "z", flag.locnew = TRUE)) ```