---
title: "Variance of Measurement Error"
author: "D. Renard"
date: "28 juin 2021"
output:
  pdf_document: default
  html_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(RGeostats)
rm(list=ls())
set.seed(432431)
constant.define("asp",1)
```

## Introduction

This note is meant to illustrate the use of the **Variance of Measurement Error** and answer to the question asked by Pauline Kiss.

## Data Base

We first create the Data Base by simulating a Model. This Model is created as a mixture of an isotropic covariance (i.e. cubic) and a noise component (i.e. nugget effect). The newly created variable in the input data base (*db*) is named **Data**, defined on *nech* samples (this number has been voluntarily set to a small number in order to be able to display and check the contents of the kriging matrices).
We also create a grid defining the field of interest as a square of dimension *size* (with a mesh of 1). This conditions the range of the structured part of the Model (i.e. 30).

```{r}
size = 100
grid = db.create(nx=c(size,size),dx=c(1,1))

nech = 12
db = db.create(x1=size*runif(nech),x2=size*runif(nech))

model = model.create(vartype="Nugget Effect",sill=0.2)
model = model.create(vartype="Cubic",range=30,sill=0.8,model=model)
plot(model,xlim=c(0,50))

db = simtub(,db,model)
db = db.rename(db,db$natt,"Data")
print(db)
```

## Non-conditional simulation over the grid

We first perform a non-conditional simulation of the Model on the grid to get an idea of the texture corresponding to the model. The variable containing the non-conditional simulation outcome is called *NC Simu*. We can check the presence of the nugget effect component (salt-and-pepper noise).

```{r}
grid = simtub(,grid,model)
grid = db.rename(grid,grid$natt,"NC Simu")
plot(grid,title="Non-conditional Simulation")
```

## Performing the estimation

We perform the estimation using Kriging. The neighborhood is set to the *Unique*  neighborhood due to the small number of data samples.

```{r}
neigh = neigh.create(type=0,ndim=2)
grid = kriging(db,grid,model,neigh,radix="KRIG")
plot(grid,name="KRIG.Data.estim",title="Estimation",pos.legend=1)
plot(db,add=T,pch=21,bg="white",col="black")
```

The estimation has been performed using the Model with 20% of nugget effect (named **model2**). Due to the presence of this important non structured part, the estimation is smoothed out. This result will be referred to as **Kriging_1** further in this paper.

For better understanding, we perform a second estimation with a model which does not contain the nugget effect part

```{r}
model2 = model.create(vartype="Cubic",range=30,sill=0.8)
grid = kriging(db,grid,model2,neigh,radix="KNOSMOOTH")
plot(grid,name="KNOSMOOTH.Data.estim",title="Estimation without Nugget Effect",
     pos.legend=1)
plot(db,add=T,pch=21,bg="white",col="black")
```

A slight deviation can be observed which will be illustrated by the next scatter plot comparing both estimations. The difference is small in terms of estimation. It is slightly more visible for the standard deviation maps.

```{r}
dum = correlation(grid,"KRIG*estim","KNOSMOOTH*estim",
            title="Estimations (Models with and without Nugget Effect)",
            flag.diag=TRUE,flag.iso=TRUE)
dum = correlation(grid,"KRIG*stdev","KNOSMOOTH*stdev",
            title="St. deviations (Models with and without Nugget Effect)",
            flag.diag=TRUE,flag.iso=TRUE)
```

Let us have a closer look to the kriging matrix for the estimation of a single node (say the node ranked 123).

```{r}
debug.reference(123)
dum = kriging(db,grid,model,neigh)
```

We can check that:

- the L.H.S matrix is dimensions to 13: 12 samples + 1 universality drift condition
- their is a unique R.H.S vector as a single estimation is performed (it would have been different in the multivariate case)

Focusing on the L.H.S, we can see that the kriging matrix is written in **covariance** (even if the model has been defined in **variogram**): this trick tends to stabilize the inversion of the Kriging matrix. Let us refer to A[i,j] the value of the kriging matrix for row 'i' and column 'j'.
We can also see that the A[i,i] is equal to 1 which is the total sill of the matrix whereas the value of A[3,2] is the value of the structured part of the model (i.e. cubic). This makes sens as the distance between sample 3 and 2 being non zero, the influence of the nugget effect disappears.

As a proof, we can check the same output when using *model2*.

```{r}
debug.reference(123)
dum = kriging(db,grid,model2,neigh)
```

There is no difference for A[3,2] (same value for the cubic part of the Model). The only difference stands in the diagonal where A[i,i] is now 0.8 (total sill of *model2*).

## Using variance of measurement error

In this part, we consider that the nugget effect reflects the presence of a non-structured noise on the variable (this is one of the traditional definition of the nugget effect). This noise component is defined by its variance. Moreover, we consider that the variance of this noise component is the same at all samples and precisely equal to the amount of nugget effect in *model*.

```{r}
db = db.add(db,varErr=0.2)
db = db.locate(db,"Data","z")
db = db.locate(db,"varErr","v")
```

So we can now perform the same kriging as the "Kriging-1" by using the data set adding the noise contribution and therefore removing this noise from the model (i.e. using *model2*  instead).

We can first check the kriging matrix for our favorite target node

```{r}
dum = kriging(db,grid,model2,neigh)
```

We can see that the matrix contents is **the same** as the one that we obtained previously using **model**. This means that each diagonal term of the matrix A[i,i] has been complemented by the variance of the specific noise carried by the sample/

Just as a global check, we can run the previous estimation step on the whole grid and compare the results withe the results of "Kriging_1" using the scatter plot facility.

```{r}
debug.reference(-1)
grid = kriging(db,grid,model2,neigh,radix="KWITHV")
correlation(grid,"KRIG*estim","KWITHV*estim",
            title="Estimations (Data with constant VE; Model without Nugget Effect)",
            flag.diag=TRUE,flag.iso=TRUE)
correlation(grid,"KRIG*stdev","KWITHV*stdev",
            title="St. deviations (Data with constant VE; Model without Nugget Effect)",
            flag.diag=TRUE,flag.iso=TRUE)
```

Note that the variance of estimation is not the same (although it is linearly dependent): this makes sense as the Model has changed.

## Main interest of the Variance of Error Measurement

The key aspect of this parameter is clearly to make this value sample-dependent. We can then mix samples of different quality where each sample is attributed an individual noise (or measurement error). Let us recall that this quantity must be specified in terms of its variance.

In our example, we will focus on one sample in particular: the sample #5, located in the middle of the figure (X=48.238 and Y=51.854]) as shown in the following printout

```{r}
print(db,flag.print=TRUE)
```

We now simply enlarge the variance of Measurement Error attached to this sample: we raise it from 0.2 (initial value) to 1; and check the result on the following printout again. Note that this sample corresponds to the data value 1.448 (which happens to be the largest one for the whole data set)

```{r}
db[5,"varErr"] = 1
print(db,flag.print=TRUE)
```

Before running the estimation step on the whole grid, we first check the new L.H.S. of the kriging matrix.

```{r}
debug.reference(123)
dum = kriging(db,grid,model2,neigh)
```

We clearly see that the term A[5,5]=1.8 when all the other diagonal terms are still equal to 1.

It is now time to run the ultimate kriging step with this variance of Measurement error modified for this sample in particular. 

```{r}
debug.reference(-1)
grid = kriging(db,grid,model2,neigh,radix="KINDV")
plot(grid,name="KINDV.Data.estim",
     title="Estimation (Data with specific VE at #5; Model without Nugget Effect)",
     pos.legend=1)
plot(db,add=T,pch=21,bg="white",col="black")
```

We can check that the importance of the central sample has been strongly reduced. As a consequence, the range of estimated values is now lying between -1 and 0.51 (it was between -0.98 and 1.12 in the initial kriging step).

Finally, it is interesting to return to the various printouts and check the kriging weight attached to the sample #5 for all krigings.

- Kriging with Model including Nugget Effect (*model*) : 0.072
- Kriging with Model without Nugget Effect (*model2*) : 0.068
- Kriging with a constant VE (equal to 0.2) and Model without Nugget Effect (*model2*) : 0.072
- Kriging with VE equal to 1 at point #5 and Model with no Nugget Effect (*model2*): 0.042

It is worth noticing that this last configuration is not accurately comparable with the previous ones. As a matter of fact, when spreading the noise of the data over all the samples, we should use a nugget effect whose value is set to the average noise per sample. In the latter case, when raising the variance of measurement error of sample #5 to 1, we should compare with an estimation where the value of the Nugget Effect of the model has been set to:

(11 * 0.2 + 1) / 12 = 0.2667

This would probably not change the results drastically.