This document describes a case study covering most of the geostatistical features available in RGeostats. It is based on a Pollution data set of 102 isolated points in 2-D space, collected in the Geneva area, where two variables have been sampled: the grades of Zn and Pb. The initial data is stored in the ASCII file called Pollution.dat.

This case study is designed to illustrate the use of geostatistics on a 2-D data set: variography, variogram fitting, cross-validation, kriging estimation, gaussian anamorphosis transformation and conditional simulation.

Loading the Information

Reading the external file

The first step consists in reading the information from the ASCII file and building the corresponding data base. The file Pollution.dat is organized as a spreadsheet with four columns; the first line contains the names of the columns. Note that the fourth column contains the string "-1" (sample #3), which denotes the absence of information.

Exdemo_2D_pollution.table =
  read.table("Pollution.dat",header=T,na.strings="-1")

The result is a data.frame with four columns called X, Y, Pb and Zn. The third sample of variable Zn is set to NA.

For convenience, this file has already been stored as an R object. To retrieve it, it suffices to type the following command:

data(Exdemo_2D_pollution.table)

If the ASCII file did not have a header line, the data frame would still have been created, but with column names defaulted to V1, V2, ...
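For illustration (a minimal sketch assuming the same Pollution.dat file), the header line can be skipped explicitly and the names restored by hand:

# Skip the header line; the columns are then named V1..V4 by default
tab = read.table("Pollution.dat",header=FALSE,skip=1,na.strings="-1")
names(tab) = c("X","Y","Pb","Zn")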

Creating the Db

data.db <- db.create(Exdemo_2D_pollution.table,
                     flag.grid=FALSE,ndim=2,autoname=F)

This command creates a new Db structure called data.db, containing the information as a set of isolated points. Its summarized contents can be viewed by simply typing the name of the Db.

print(data.db)
## 
## Data Base Characteristics
## =========================
## 
## Data Base Summary
## -----------------
## File is organized as a set of isolated points
## Space dimension              = 2
## Number of fields             = 5
## Maximum Number of attributes = 5
## Total number of samples      = 102
## 
## Variables
## ---------
## Field =   1 - Name = rank - Locator = NA
## Field =   2 - Name = X - Locator = x1
## Field =   3 - Name = Y - Locator = x2
## Field =   4 - Name = Pb - Locator = z1
## Field =   5 - Name = Zn - Locator = NA

From this printout, one can easily check that the Db is a 2-D one, holds a total of 5 attributes and contains 102 samples. By default:

  • the field 1 is created automatically (with the name set to rank) and contains the rank of each sample (starting from 1)
  • the ndim next fields are the coordinates
  • the next fields are the variables of interest (i.e. the variables Pb and Zn)

The previous printout corresponds to the result of the command print. The user can get more information by modifying the arguments: for example, setting the argument flag.stats to TRUE produces statistics for the fields (only those of the variable Pb are shown next). These statistics can be restricted to a single variable by adding names="Pb" (equivalently names=4, as the variable of interest Pb is stored in field #4).

print(data.db,flag.stats=TRUE,names=4)
## 
## Data Base Characteristics
## =========================
## 
## Data Base Summary
## -----------------
## File is organized as a set of isolated points
## Space dimension              = 2
## Number of fields             = 5
## Maximum Number of attributes = 5
## Total number of samples      = 102
## 
## Data Base Statistics
## --------------------
## 4 - Name Pb - Locator z1
##  Nb of data          =        102
##  Nb of active values =        102
##  Minimum value       =      1.090
##  Maximum value       =     33.200
##  Mean value          =      3.638
##  Standard Deviation  =      4.705
##  Variance            =     22.138
## 
## Variables
## ---------
## Field =   1 - Name = rank - Locator = NA
## Field =   2 - Name = X - Locator = x1
## Field =   3 - Name = Y - Locator = x2
## Field =   4 - Name = Pb - Locator = z1
## Field =   5 - Name = Zn - Locator = NA

Data management

The data base data.db can be managed using the set of dedicated tools.

Modifying the variable role

It is now time to introduce the role that each field must play: this is defined through its locator. For example, we can check that fields 2 and 3 are respectively the first and second coordinates: their locators are x1 and x2. Similarly, as we now consider Zn as the variable of interest, we set its locator to z1; the variable Pb then becomes unclassified (its locator is reset to NA). Setting this locator is performed using the following command:

data.db = db.locate(data.db,"Zn","z",1)

Viewing the Db

We can now visualize the contents of the Db graphically. The plot represents the samples with a symbol size proportional to the concentration of the Zn variable (which carries the z1 locator). The generic symbol is a circle filled with red and outlined in black. This is obtained by typing the following command:

plot(data.db,pch=21,bg="red",col="black",
     title="Zn Samples location")

Performing a selection

We want to create a selection which masks the largest values of the variable Zn, say those above 20.

data.db = db.sel(data.db,Zn<20)

The new data base data.db now contains a new field called sel which will serve as a selection through which any access to the data base must be performed.

For example, if we perform the statistics on the data base (restricting them to the variable Zn selected by its name this time):

print(data.db,flag.stats=T,names="Zn")
## 
## Data Base Characteristics
## =========================
## 
## Data Base Summary
## -----------------
## File is organized as a set of isolated points
## Space dimension              = 2
## Number of fields             = 6
## Maximum Number of attributes = 6
## Total number of samples      = 102
## Number of active samples     = 99
## 
## Data Base Statistics
## --------------------
## 5 - Name Zn - Locator z1
##  Nb of data          =        102
##  Nb of active values =         99
##  Minimum value       =      3.000
##  Maximum value       =     12.700
##  Mean value          =      5.650
##  Standard Deviation  =      1.707
##  Variance            =      2.913
## 
## Variables
## ---------
## Field =   1 - Name = rank - Locator = NA
## Field =   2 - Name = X - Locator = x1
## Field =   3 - Name = Y - Locator = x2
## Field =   4 - Name = Pb - Locator = NA
## Field =   5 - Name = Zn - Locator = z1
## Field =   6 - Name = sel - Locator = sel

We can check that the variable Zn is now reduced to 99 active measurements and varies between 3.00 and 12.7. At the same time, the variable Pb would vary from 1.09 to 12.1. This selection remains active as long as its locator contains the string sel.

Performing a transformation

We will create a new variable, called Zn.transformed, which will contain the result of a transformation performed on all the active (non-masked) samples of the data base. Note that this command (as well as db.sel) allows naming the variables without having to place the names between quotes:

data.db = db.add(data.db,Zn.transformed=Zn+Pb+2)

The transformation can be double-checked by examining the contents of the data base, using the following command:

print(data.db,flag.print=TRUE)
## 
## Data Base Characteristics
## =========================
## 
## Data Base Summary
## -----------------
## File is organized as a set of isolated points
## Space dimension              = 2
## Number of fields             = 7
## Maximum Number of attributes = 7
## Total number of samples      = 102
## Number of active samples     = 99
## 
## Data Base Contents
## ------------------
##                 rank         X         Y        Pb        Zn       sel Zn.transfo
##                 [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
##       [1,]     1.000   119.500   509.330     2.150     4.600     1.000     8.750
##       [2,]     2.000   120.450   510.000     2.480     4.500     1.000     8.980
##       [4,]     4.000   121.600   510.940     2.210     4.700     1.000     8.910
##       [5,]     5.000   121.850   511.930     2.080     5.400     1.000     9.480
##       [6,]     6.000   122.130   510.680     2.250     4.200     1.000     8.450
##       [7,]     7.000   121.980   509.990     2.130     5.100     1.000     9.230
##       [8,]     8.000   121.770   508.790     2.480     5.000     1.000     9.480
## (Ncol=7,Nrow=7[from 102])
## 
## Variables
## ---------
## Field =   1 - Name = rank - Locator = NA
## Field =   2 - Name = X - Locator = x1
## Field =   3 - Name = Y - Locator = x2
## Field =   4 - Name = Pb - Locator = NA
## Field =   5 - Name = Zn - Locator = NA
## Field =   6 - Name = sel - Locator = sel
## Field =   7 - Name = Zn.transformed - Locator = z1

We obtain the complete printout of the different fields of the data base. Note that this printout is limited (both in number of lines and of columns) by the environment bounding parameters.

We can first check that the number of active samples is 99, out of the initial 102. Secondly, sample #3 is absent from the listing: its Zn value is undefined (NA), so it is excluded by the selection and its Zn.transformed value has not been calculated. Finally, for the first sample, the transformed value (8.75) corresponds to the sum of the Zn value (4.60), the Pb value (2.15) and the constant (2).
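This last point can be verified with a one-line check:

4.60 + 2.15 + 2   # = 8.75, the Zn.transformed value of sample #1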

Statistical inference

We now wish to calculate the experimental variogram of the target variable Zn (honoring the previously defined selection), with different options, before defining the characteristics of the structural model.

Calculating the experimental variogram

The first task is to calculate 10 lags of the omni-directional variogram established for a lag distance of 1km:

data.vario = vario.calc(data.db,lag=1,nlag=10)

This command produces a variogram file (stored in the object data.vario, which belongs to the vario class). Its contents can be printed by typing the name of the variogram file.

Visualizing the experimental variogram

We can visualize the contents of the experimental variogram (using the generic plot function) where each lag is represented by a symbol proportional to the number of pairs. The number of pairs is also displayed. This is obtained by typing:

plot(data.vario,npairdw=TRUE,npairpt=1,title="Experimental Variogram")

Directional Variograms

We must also look for a possible anisotropy. For that sake, we calculate the experimental variogram in four regular directions, characterized by the angles 0, 45, 90 and 135 degrees, with the same number of lags and the same lag dimension.

data.4dir.vario =
  vario.calc(data.db,lag=1,nlag=10,dir=c(0,45,90,135))

The new variogram file data.4dir.vario (which belongs to the vario class) contains the 4 directions. Its contents will not be displayed in this documentation. Instead, all the directions are overlaid on the same figure with different colors. It is visualized using the command:

plot(data.4dir.vario,title="Directional variograms")

If we wanted to display only one direction (say direction #2), we would call:

plot(data.4dir.vario,idir0=2,
     title="Directional variogram #2")

An obvious conclusion is that the experimental variogram calculated in 4 directions does not show a significant anisotropy. Therefore, it is sufficient to fit the omni-directional variogram.

Fitting the model

As no anisotropy has been retained, the omni-directional variogram is fitted with a model, using an automatic procedure; the result is stored in a specific model file called data.model (belonging to the model class):

data.model = 
  model.auto(data.vario,
             struct=c("Spherical","Exponential"),
             title="Modelling omni-directional variogram")

The fitting procedure produces a graphic output where the user can visually check the quality of the fit.

We can also check the contents of the model by typing its name.

data.model
## 
## Model characteristics
## =====================
## Space dimension              = 2
## Number of variable(s)        = 1
## Number of basic structure(s) = 1
## Number of drift function(s)  = 1
## Number of drift equation(s)  = 1
## 
## Covariance Part
## ---------------
## - Exponential
##   Range       =      4.306
##   Theo. Range =      1.438
##   Sill        =      7.133
## Total Sill    =      7.133
## 
## Drift Part
## ----------
## Universality Condition

Note that the spherical component has been discarded from the initial set of basic structures passed as arguments to the automatic fitting procedure.
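Incidentally, the two printed ranges are consistent with the usual convention for the exponential model, whose practical range (where the variogram reaches about 95% of the sill) is three times the theoretical scale parameter: 3 × 1.438 ≈ 4.31, matching the printed Range of 4.306.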

Cross-validating the structure

In this paragraph, we double-check the model against the data by performing a blind test: each datum is suppressed in turn and re-estimated from its neighborhood by kriging. The estimated value is compared to the actual value, and the resulting error is compared to the standard deviation predicted by the model.
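In symbols (standard notation; these quantities correspond to the esterr and stderr variables created below), if Z(x_i) is the datum, Z*(x_i) its cross-validation estimate and S*(x_i) the associated kriging standard deviation:

  esterr(x_i) = Z*(x_i) - Z(x_i)
  stderr(x_i) = [Z*(x_i) - Z(x_i)] / S*(x_i)

A well-specified model should produce standardized errors with mean close to 0 and variance close to 1.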

This procedure requires the prior definition of a neighborhood which defines the way samples are selected to estimate each target data in turn. The neighborhood can be:

  • the whole set of samples: Unique Neighborhood
  • a selection of the closest samples to the target: Moving Neighborhood

Creating a unique neighborhood

We must create the object belonging to the neigh class, called Exdemo_2D_unique.neigh, which contains the characteristics of the unique neighborhood.

Exdemo_2D_unique.neigh = neigh.create(ndim=2, type=0)

For this first trial, we will simply select the Unique Neighborhood solution where each grid node will be estimated using the whole set of data points available.

The contents of this neighborhood file can be checked by typing its name.

Exdemo_2D_unique.neigh
## 
## Neighborhood characteristics
## ============================
## Unique neighborhood option
## Space dimension = 2

Creating a moving neighborhood

The moving neighborhood is stored in an object of the neigh class, called Exdemo_2D_moving.neigh:

Exdemo_2D_moving.neigh = 
  neigh.create(ndim=2,nmaxi=10,radius=20,
               flag.sector=TRUE,nsect=8,nsmax=2)

With this definition, we choose to select at most 10 samples located at less than 20 (in coordinate units) from the target point. For this selection, samples are sorted into 8 angular sectors (octants), and up to 2 samples are selected in each octant (if available). Note that the printout below reports a maximum of 16 samples (8 sectors × 2 samples per sector): the sector constraints take precedence over nmaxi here.

The neighborhood contents can be checked by typing its name.

Exdemo_2D_moving.neigh
## 
## Neighborhood characteristics
## ============================
## Moving neighborhood option
## Space dimension                     = 2
## Minimum number of samples           = 1
## Maximum number of samples           = 16
## Number of angular sectors           = 8
## Maximum number of points per sector = 2
## Maximum horizontal distance         = 20.000000

Cross-validation

We now perform the cross-validation of the structure against the data using the moving neighborhood (this feature currently does not function with a unique neighborhood).

data.db = xvalid(data.db,data.model,
                 Exdemo_2D_moving.neigh)

This function creates the two following variables (note that at this stage the variable carrying the z1 locator is Zn.transformed, as set by the db.add command, so the names are derived from it):

  • Xvalid.Zn.transformed.esterr: cross-validation error
  • Xvalid.Zn.transformed.stderr: standardized cross-validation error

These two variables become the target variables.

We can check the histogram of the cross-validation error. It suffices to extract the corresponding vector, to calculate its histogram (30 classes) and finally to represent it graphically:

hist(db.extract(data.db,"Xvalid.Zn.*esterr"), nclass=30, 
     main="Histogram of Zn Error", 
     xlab="Cross-validation error", col="blue")

The cross-validation procedure shows that most of the data have a standardized re-estimation error lying in the interval [-2.5;+2.5], which indicates a good match between the data, the fitted model and the chosen neighborhood.
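As a complementary check (a minimal sketch reusing the same extraction mechanism as above), we can inspect the standardized errors directly; their mean should be close to 0 and their variance close to 1:

# Extract the standardized errors and summarize them
stderr = db.extract(data.db,"Xvalid.Zn.*stderr")
c(mean(stderr,na.rm=TRUE),var(stderr,na.rm=TRUE))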

Estimation

We must first designate the set of locations where the estimation must be performed. In this case study, we choose to perform the estimation at the nodes of a regular grid.

Creating a regular 2-D grid

We want to estimate an area which covers the initial data set. This area is considered as a rectangle with 100 cells along the horizontal axis and 90 along the vertical axis.

grid.db = db.grid.init(data.db,nodes=c(100,90))

The characteristics of the newly created grid file grid.db are:

grid.db
## 
## Data Base Characteristics
## =========================
## 
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension              = 2
## Number of fields             = 3
## Maximum Number of attributes = 3
## Total number of samples      = 9000
## Grid characteristics:
## Origin :    109.850   483.660
## Mesh   :      0.335     0.330
## Number :        100        90
## Angles :      0.000     0.000
## 
## Variables
## ---------
## Field =   1 - Name = rank - Locator = NA
## Field =   2 - Name = x1 - Locator = x1
## Field =   3 - Name = x2 - Locator = x2

Note that the variables with locator rank (for sample rank), x1 and x2 (for coordinates) have been created automatically. The mesh of a grid cell is almost square.

If needed, we could also have created the grid with the function db.create, defining the origin, the cell size and the number of cells along each axis.
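For instance (a hedged sketch reproducing the grid characteristics printed above):

# Same grid defined explicitly: cell count, origin and cell size per axis
grid2.db = db.create(flag.grid=TRUE,nx=c(100,90),
                     x0=c(109.85,483.66),dx=c(0.335,0.330))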

Estimation using Unique neighborhood

Performing the estimation

The following step calculates the punctual estimation (using the Ordinary kriging and the Unique neighborhood) at the nodes of the grid.

We must first define the target variable in the input file, cancelling all z locators (fields 8 and 9 correspond to the variables generated by the cross-validation step) and then setting the z1 locator on the variable Zn:

data.db <- db.locerase(data.db,"z")
data.db <- db.locate(data.db,"Zn","z")
data.db
## 
## Data Base Characteristics
## =========================
## 
## Data Base Summary
## -----------------
## File is organized as a set of isolated points
## Space dimension              = 2
## Number of fields             = 9
## Maximum Number of attributes = 9
## Total number of samples      = 102
## Number of active samples     = 99
## 
## Variables
## ---------
## Field =   1 - Name = rank - Locator = NA
## Field =   2 - Name = X - Locator = x1
## Field =   3 - Name = Y - Locator = x2
## Field =   4 - Name = Pb - Locator = NA
## Field =   5 - Name = Zn - Locator = z1
## Field =   6 - Name = sel - Locator = sel
## Field =   7 - Name = Zn.transformed - Locator = NA
## Field =   8 - Name = Xvalid.Zn.transformed.esterr - Locator = NA
## Field =   9 - Name = Xvalid.Zn.transformed.stderr - Locator = NA

The kriging procedure calculates simultaneously the estimation and the standard deviation (square root of the variance) of the estimation error.

grid.db = kriging(data.db,grid.db,data.model,
                  Exdemo_2D_unique.neigh,radix="KU.Part")

The variables are created with generic names:

  • KU.Part.Zn.estim for the estimation of the input variable Zn
  • KU.Part.Zn.stdev for the standard deviation of the estimation error for the input variable Zn

The two variables are set as the current variables.

Displaying the estimation result

It makes sense to visualize the result of the estimation step on the grid. We choose to overlay the following items:

  • the estimation as a color image
  • the standard deviation of the estimation error using contour lines
  • the data information

This is obtained by typing the following commands:

plot(grid.db,name.image="KU.Part.Zn.estim",  
     title = "Estimation - Data subset (Unique Neighborhood)")
plot(grid.db,name.contour="KU.Part.Zn.stdev",nlevels=10,add=TRUE)
plot(data.db,pch=21,bg="yellow",col="black",add=TRUE)

Another display is a perspective view obtained using the following command:

plot(grid.db,name.persp="KU.Part.Zn.estim",theta=45,phi=30,
     zlim=c(3,25),
     title = "Estimation - Data subset (Unique Neighborhood)")

Suppressing the selection on data

The previous estimation has been performed using only the 99 non-masked Zn samples. We can now perform the estimation step again, this time using the whole data set.

We choose to use the same model, even though we must remember that this model, which has been fitted on the 99 data only, is not valid and should be updated.

To suppress the selection, we can delete the corresponding field or invalidate it by setting its locator to NA.

data.db = db.sel(data.db)
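Equivalently, instead of the command above, we could have invalidated the locator or deleted the field (db.delete is assumed to be available for the latter):

data.db = db.locerase(data.db,"sel")   # invalidate the selection locator
data.db = db.delete(data.db,"sel")     # or remove the sel field altogether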

We can now perform the estimation again:

grid.db = kriging(data.db,grid.db,data.model,
                  Exdemo_2D_unique.neigh,radix="KU.All")

The use of the radix makes it possible to create a second set of variables, avoiding any confusion with the already existing ones. Hence the resulting variables:

  • KU.All.Zn.estim for the estimation of the input variable Zn using all the data
  • KU.All.Zn.stdev for the standard deviation of the estimation error for the input variable Zn using all the data

These two variables become the new current variables (their locators are set to z1 and z2). The previous current variables are declassified.

We can now visualize the results using the same command:

plot(grid.db,name.image="KU.All.Zn.estim",  
     title = "Estimation - All Data (Unique Neighborhood)")
plot(grid.db,name.contour="KU.All.Zn.stdev",nlevels=10,add=TRUE)
plot(data.db,pch=21,cex1=0.5,bg="yellow",col="black",add=TRUE)

Another display is a perspective view obtained using the following command:

plot(grid.db,name.persp="KU.All.Zn.estim",theta=45,phi=30,
     zlim=c(3,25),
     title = "Estimation - All Data (Unique Neighborhood)")

Estimation using Moving neighborhood

Performing the estimation

When the number of data points is too large, the kriging system (whose dimension is equal to the number of samples) may simply not be established, or not be inverted in a reasonable computing time with a reasonable accuracy. Therefore, when estimating a target point, a subset of the information, centered on the target point, is selected: this corresponds to a moving neighborhood.
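To fix ideas: with N conditioning samples, the ordinary kriging system has dimension (N+1) × (N+1) (the extra row and column carrying the universality condition), and solving it by direct inversion requires on the order of N^3 operations; bounding N through the neighborhood keeps this cost under control.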

We now want to perform an estimation by Ordinary Kriging using the moving neighborhood (stored in the neigh object called Exdemo_2D_moving.neigh):

grid.db = kriging(data.db,grid.db,data.model,
                  Exdemo_2D_moving.neigh,radix="KM.All")

The previous command calculates the punctual estimation at the nodes of the grid; it simultaneously calculates the standard deviation of the estimation error.

The variables are created with generic names:

  • KM.All.Zn.estim for the estimation of the input variable Zn
  • KM.All.Zn.stdev for the standard deviation of the estimation error for the input variable Zn

The two new variables are set as the current variables. Note that, thanks to the radix KM.All, their names do not clash with those obtained using the Unique Neighborhood.

Displaying the estimation result

It is now time to display the grid results. We choose the same representation as for the estimation performed using the Unique Neighborhood.

This is obtained by typing the following commands:

plot(grid.db,name.image="KM.All.Zn.estim", 
     title = "Estimation - Data subset (Moving Neighborhood)")
plot(grid.db,name.contour="KM.All.Zn.stdev",nlevels=10,add=TRUE)
plot(data.db,pch=21,cex1=0.5,bg="yellow",col="black",add=TRUE)

We can see various zebra-like patterns, which correspond to neighborhood artifacts. In order to reduce them, it suffices to increase the neighborhood radius together with the number of samples selected in each neighborhood.
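For instance (a hedged sketch with indicative values), an enlarged neighborhood could be defined as follows and passed to kriging in place of Exdemo_2D_moving.neigh:

# Larger radius and more samples per sector to smooth out the artifacts
Exdemo_2D_large.neigh =
  neigh.create(ndim=2,nmaxi=25,radius=40,
               flag.sector=TRUE,nsect=8,nsmax=4)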

Testing the Moving Neighborhood

The previous figure shows patterns which denote the poor quality of the moving neighborhood. We can check this by using the specific function:

grid.db = neigh.test(data.db,grid.db,data.model,
                     Exdemo_2D_moving.neigh,radix="Moving")

This procedure creates the following variables:

  1. The number of selected active samples in the neighborhood
  2. The distance to the furthest sample in the neighborhood
  3. The distance to the closest sample in the neighborhood
  4. The number of non-empty angular sectors
  5. The number of consecutive empty sectors

Given the radix Moving and the target variable Zn, the variables are automatically named Moving.Zn.*.

The following figures show the largest distance, the smallest distance, the number of selected samples and the number of non-empty angular sectors.

plot(grid.db,name="Moving.Zn.max.radius",
     title="Largest Distance")

plot(grid.db,name="Moving.Zn.min.dist",
     title="Smallest Distance")

plot(grid.db,name="Moving.Zn.nb.samples",
     title="Number of selected samples")

plot(grid.db,name="Moving.Zn.nb.nonempty.sect",
     title="Number of non-empty consecutive sectors")

Simulations

In this section, we wish to perform several simulations of the variable Zn conditioned by the available data (the selection has previously been reset, so all samples are available again). Simulation is a technique which makes it possible to produce several equiprobable outcomes of the target variable. These simulations are calculated using the Turning Bands algorithm.

This method requires the definition of:

  • a gaussian transform of the data (the anamorphosis function)
  • a variogram model fitted on the gaussian variable
  • a neighborhood used for the conditioning step
  • the number of simulations and the number of turning bands

These items are established in the next paragraphs.

Essential remark: Gaussian transform

An essential remark is that this method is designed for the multi-gaussian framework. This implies that the conditioning data must be converted into gaussian values before they are used for conditioning; consequently, the results must be back-transformed into the raw scale afterwards.
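In symbols, if phi denotes the anamorphosis function fitted below, each datum z_i is first transformed into a gaussian value y_i = phi^-1(z_i); the simulation is then carried out in the gaussian scale, and each simulated node is finally back-transformed as z(x) = phi(y(x)).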

Fitting the Gaussian Anamorphosis function

The first task consists in fitting the anamorphosis function which enables the transformation of a variable, from the raw scale to the gaussian scale and vice-versa. We fit this function for the variable Zn by typing the following command:

data.anam = anam.fit(data.db,"Zn",title="Anamorphosis Function")

The quality of the fit can be checked graphically, by comparing the experimental distribution (step function) to the model (solid line). The result consists in the set of coefficients of the expansion of the anamorphosis function in Hermite polynomials; the number of polynomials is set to 30. The set of coefficients can be printed upon request:

print(data.anam)
## 
## Anamorphosis characteristics
## ============================
## Hermitian anamorphosis
## Number of Hermite polynomials = 30 
## Mean                          = 6.096534
## Variance                      = 12.609583
## Change of Support Coefficient = 1.000000
## Minimum absolute value for Y  = -1.900000
## Maximum absolute value for Y  = 2.500000
## Minimum absolute value for Z  = 3.056264
## Maximum absolute value for Z  = 31.615885
## Minimum practical value for Y = -1.900000
## Maximum practical value for Y = 2.500000
## Minimum practical value for Z = 3.056264
## Maximum practical value for Z = 31.615885
## 
## Normalized coefficients for Hermite polynomials
##   0+     6.096534  -2.589305   1.721129  -1.176485   0.403345
##   5+     0.421460  -0.633803   0.183777   0.339873  -0.392919
##  10+    -0.010639   0.345044  -0.199840  -0.189154   0.275258
##  15+     0.024201  -0.249784   0.099291   0.168161  -0.165104
##  20+    -0.068779   0.177368  -0.021795  -0.149932   0.088824
##  25+     0.099303  -0.127098  -0.040519   0.138009  -0.014836
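A useful consistency check (a general property of the Hermitian anamorphosis with normalized coefficients): the coefficient of order 0 equals the mean, while the variance equals the sum of the squared coefficients of order 1 and above. Here psi_0 = 6.0965 reproduces the printed Mean, and 2.589305² + 1.721129² + ... + 0.014836² ≈ 12.61 reproduces the printed Variance.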

Transforming raw data into gaussian

The next operation converts the raw data Zn into its gaussian transform Gaussian.Zn, using the anamorphosis function data.anam, by typing:

data.db = anam.z2y(data.db,"Zn",anam=data.anam)

It is interesting to check the characteristics of the anamorphosis transform: the extreme practical values for Z are 3.06 and 31.62. This means that the simulated values (after the back-transform from the gaussian to the raw scale) will never exceed these bounds. It is also interesting to check the statistics of the newly created gaussian variable: it is defined on 101 samples and varies between -1.9 and 2.5:

print(data.db,flag.stats=TRUE,names="Gaussian.Zn")
## 
## Data Base Characteristics
## =========================
## 
## Data Base Summary
## -----------------
## File is organized as a set of isolated points
## Space dimension              = 2
## Number of fields             = 10
## Maximum Number of attributes = 10
## Total number of samples      = 102
## 
## Data Base Statistics
## --------------------
## 10 - Name Gaussian.Zn - Locator z1
##  Nb of data          =        102
##  Nb of active values =        101
##  Minimum value       =     -1.900
##  Maximum value       =      2.499
##  Mean value          =      0.016
##  Standard Deviation  =      0.977
##  Variance            =      0.954
## 
## Variables
## ---------
## Field =   1 - Name = rank - Locator = NA
## Field =   2 - Name = X - Locator = x1
## Field =   3 - Name = Y - Locator = x2
## Field =   4 - Name = Pb - Locator = NA
## Field =   5 - Name = Zn - Locator = NA
## Field =   6 - Name = sel - Locator = NA
## Field =   7 - Name = Zn.transformed - Locator = NA
## Field =   8 - Name = Xvalid.Zn.transformed.esterr - Locator = NA
## Field =   9 - Name = Xvalid.Zn.transformed.stderr - Locator = NA
## Field =  10 - Name = Gaussian.Zn - Locator = z1

Structural analysis

The new gaussian variable is defined as the target variable. We can now calculate the omni-directional variogram for 10 lags of 1km each, by typing:

data.g.vario = vario.calc(data.db,nlag=10,lag=1)

The experimental variogram can first be displayed:

plot(data.g.vario,npairdw=TRUE,npairpt=1,
     title="Variogram of Zn Gaussian")

It is then fitted with an isotropic exponential model, using the automatic fitting procedure:

data.g.model = model.auto(data.g.vario,struct=c("Exponential"),
                          title="Model for Zn Gaussian")

Conditional simulations

A set of 10 simulations is performed using the Turning Bands algorithm (with 100 bands), with the following command:

grid.db = simtub(data.db,grid.db,data.g.model,
                 Exdemo_2D_unique.neigh,nbsimu=10,nbtuba=100)

These simulations are stored in the grid output file, in variables called Simu.Gaussian.Zn.Si, where i corresponds to the outcome rank; these variables are defined as target variables. The outcomes are expressed in the gaussian scale and must be converted back to the raw scale, using the anamorphosis function data.anam again. The easiest way to designate the input variables is to use their generic name (i.e. "Simu.Gaussian.Zn*"):

grid.db = anam.y2z(grid.db,names="Simu.Gaussian.Zn*",anam=data.anam)

Several outcomes are visualized using the following procedure. For comparison purposes, all the outcomes are displayed with the same color scale: values are represented between 3 and 13. Finally, the data points are overlaid. The following pictures correspond to simulations #1 and #2.

plot(grid.db,name.image="Raw.Simu.Gaussian.Zn.S1",
     zlim=c(3,13),title="Simulation #1")
plot(data.db,pch=21,cex1=0.5,bg.in=1,add=TRUE)

plot(grid.db,name.image="Raw.Simu.Gaussian.Zn.S2",
     zlim=c(3,13),title="Simulation #2")
plot(data.db,pch=21,cex1=0.5,bg.in=1,add=TRUE)

We can now calculate some statistics by comparing the different simulations, per grid node. The function db.compare enables the calculation of one of the following options:

  • Number of defined values
  • Average over the defined values (default operation)
  • Variance over the defined values
  • Standard deviations over the defined values
  • Minimum over the defined values
  • Maximum over the defined values

In this section, we calculate the average of the simulations (the default option). The locator of the resulting variable is set to z1, so that we do not have to specify it in the next plot.

grid.db = db.compare(grid.db,names="Raw.Simu.Gaussian.Zn*",fun="mean")
## The method 'db.compare' is deprecated.
## It should be replaced by a call to db.stat.simu()

The following figure shows the average of the 10 simulations, with the data points overlaid:

plot(grid.db,zlim=c(3,13),title="Mean of Simulations")
plot(data.db,pch=21,cex1=0.5,bg.in=1,add=TRUE)

We can also compare the mean of these simulations with the kriging estimate:

dummy = correlation(grid.db,"KU.All.Zn.estim","mean",
                    flag.iso=TRUE,flag.diag=TRUE,
                    xlab="Kriging Estimate",
                    ylab="Mean of Simulations",
                    title="Comparing Simulations and Estimation")

Similarly we can calculate the standard deviation of the simulations using the following command:

grid.db = db.compare(grid.db,names="Raw.Simu.Gaussian.Zn*",fun="stdv")
## The method 'db.compare' is deprecated.
## It should be replaced by a call to db.stat.simu()

The following figure shows the dispersion of the 10 simulations, with the data points overlaid:

plot(grid.db,name="stdv",title="Dispersion of Simulations")
plot(data.db,pch=21,cex1=0.5,bg.in=1,add=TRUE)