Filtering 1-D regular data

D. Renard

June 20-th, 2014

Abstract

RGeostats is a library of geostatistical procedures available in the R environment. It is based on the geostatistical library Geoslib written in C language. The specificity of RGeostats is that it can deal with information defined in a space of any dimension, and can process any number of variables simultaneously.

This case study is designed in order to illustrate the use of geostatistics for filtering an information provided along a regular 1-D data set. This technique is known as factorial kriging analysis and is usually applied in the multivariate framework. However here it will be used on a single variable.

Data Set

Input File

The data set consists in the conductivity information collected regularly along a well log. A set of 850 samples is provided in the file called Data1D.dat. The file is composed of two columns of numbers:

• the first column contains the depth coordinate

• the second column contains the conductivity information

The first line of the file contains the variable names. When reading the depth column, one can easily check that the samples are regularly organized along a 1-D grid with a mesh of 0.02002m and an origin at 519.00074m.

Loading the information

The first task is to read this information and store it as a R format table called Exdemo_1D_well.table, using the following command:

library(RGeostats)
## Loading required package: Rcpp
## Loading RGeostats - Version:11.0.5
rg.load("Exdemo_1D_well.table","well.table",flag.overwrite=TRUE)

The next step is to load the information in a new DB structure called well.db.

well.db <- db.create(well.table,flag.grid=TRUE,x0=519.00074,dx=0.02002,nx=850,autoname=FALSE)

The new DB structure created can be checked either by typing the following command (or by simply typing the name of the DB). We obtain the following printout:

print(well.db)
## 
## Data Base Characteristics
## =========================
## 
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension              = 1
## Number of fields             = 4
## Maximum Number of attributes = 4
## Total number of samples      = 850
## Grid characteristics:
## Origin :    519.001
## Mesh   :      0.020
## Number :        850
## Angles :      0.000
## 
## Variables
## ---------
## Field =   1 - Name    =  rank - Locator =  NA 
## Field =   2 - Name    =  x1 - Locator =  x1 
## Field =   3 - Name    =  Depth - Locator =  z1 
## Field =   4 - Name    =  Conductivity - Locator =  NA

The two first fields have been generated automatically, the two columns of the table are loaded in fields 3 and 4.

The field #4 is now considered as the target variable:

well.db <- db.locate(well.db,4,"z",1)

Finally the field #3 is similar the field #2 (generated automatically) and can thus be deleted:

well.db <- db.delete(well.db,3)

Typing the name of the DB again, we obtain the following printout (only the variable paragraph is produced here):

well.db
## 
## Data Base Characteristics
## =========================
## 
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension              = 1
## Number of fields             = 3
## Maximum Number of attributes = 3
## Total number of samples      = 850
## Grid characteristics:
## Origin :    519.001
## Mesh   :      0.020
## Number :        850
## Angles :      0.000
## 
## Variables
## ---------
## Field =   1 - Name    =  rank - Locator =  NA 
## Field =   2 - Name    =  x1 - Locator =  x1 
## Field =   3 - Name    =  Conductivity - Locator =  z1

We must finally check the statistics on the different variables contained in well.db structure, by typing the command and obtaining the following printout:

print(well.db,flag.stats=T)
## 
## Data Base Characteristics
## =========================
## 
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension              = 1
## Number of fields             = 3
## Maximum Number of attributes = 3
## Total number of samples      = 850
## Grid characteristics:
## Origin :    519.001
## Mesh   :      0.020
## Number :        850
## Angles :      0.000
## 
## Data Base Statistics
## --------------------
## 1 - Locator Variable NA (Name=rank) 
##  Nb of data          =        850
##  Nb of active values =        850
##  Minimum value       =      1.000
##  Maximum value       =    850.000
##  Mean value          =    425.500
##  Standard Deviation  =    245.374
##  Variance            =  60208.250
## 2 - Locator Variable x1 (Name=x1) 
##  Nb of data          =        850
##  Nb of active values =        850
##  Minimum value       =    519.001
##  Maximum value       =    535.998
##  Mean value          =    527.499
##  Standard Deviation  =      4.912
##  Variance            =     24.131
## 3 - Locator Variable z1 (Name=Conductivity) 
##  Nb of data          =        850
##  Nb of active values =        850
##  Minimum value       =     22.113
##  Maximum value       =    137.738
##  Mean value          =     97.632
##  Standard Deviation  =     28.816
##  Variance            =    830.341
## 
## Variables
## ---------
## Field =   1 - Name    =  rank - Locator =  NA 
## Field =   2 - Name    =  x1 - Locator =  x1 
## Field =   3 - Name    =  Conductivity - Locator =  z1

Displaying the Information

We can now produce a graphic where the information along the well is plotted, by typing the command:

plot(well.db)

We can see that the variable presents high frequency irregular variations.

Variography

Experimental variogram

We now wish to calculate the variogram of the Conductivity variable. As the information is defined on a regular grid, we calculate the variogram with a lag equal to the grid mesh. We calculate the variogram for 500 lags and store the result in a new variogram file called well.vario.

well.vario <- vario.grid(well.db,nlag=500)

The contents of the variogram structure can be defined by typing the following command, or by simply typing the file name (the corresponding printout is not produced here).

print(well.vario)

We can also display this variogram using the command and obtain the following graphic:

plot(well.vario,title="Variogram of Conductivity")

Model definition

In the next step, we must fit a model which fits the experimental variogram. This operation is performed using an interactive procedure where we must define:

• the number of basic structures

• for each structure, its type, range and additional parameter (if necessary)

An automatic procedure is used, where a set of fours basic structures is introduced (three periodic structures and a linear scheme). The figure shows the experimental variogram and the fitted model.

well.model = model.auto(well.vario,struct=c(20,20,20,14),title="Model of Conductivity")

We can display the contents of the model by using the following command or simply by typing the name of the file well.model.

print(well.model)
## 
## Model characteristics
## =====================
## Space dimension              = 1
## Number of variable(s)        = 1
## Number of basic structure(s) = 4
## Number of drift function(s)  = 1
## Number of drift equation(s)  = 1
## 
## Covariance Part
## ---------------
## - Cosexp
##   Range       =      5.090
##   Theo. Range =      1.699
##   Parameter   =      0.847
##   Sill        =    133.305
## - Cosexp
##   Range       =      2.960
##   Theo. Range =      0.988
##   Parameter   =      0.430
##   Sill        =     62.465
## - Cosexp
##   Range       =      3.493
##   Theo. Range =      1.166
##   Parameter   =      1.497
##   Sill        =    388.536
## - Order-1 GC
##   Range       =      9.990
##   Sill        =    365.755
## Total Sill    =    950.062
## 
## Drift Part
## ----------
## Universality Condition

Filtering

We now wish to perform the filtering operation using the geostatistical technique. For that sake, we still have to define the neighborhood parameters, i.e. the number of cell values which will be used in order to compute the filtered estimate for the central cell.

Neighborhood Definition

The neighborhood is defined interactively and stored in a neighborhood file called Exdemo_1D_well.neigh, using the following command:

well.neigh <- neigh.create(ndim=1, type=3, radimg=100, skip=1)

The contents of the Neighborhood object is obtained by typing:

print(well.neigh)
## 
## Neighborhood characteristics
## ============================
## Image neighborhood option
## Skipping factor = 1
## Image radius :    100.000

Filtering step

We consider the Conductivity variable as the sum of five independent components, each component being responsible of each basic component of the model. The factorial kriging technique enables to estimate each component, together with a sixth component which corresponds to the local mean.

When an image neighborhood is used, characterized by its radius, the cells located close to the edge of the data set (closer than the neighborhood radius) cannot be estimated.

Each component is estimated individually. For example, the first component, corresponding to the first basic structure of the model, will be stored in the well.db file using the radix s1, when typing the following command:

well.db <- krimage(well.db,model=well.model,neigh=well.neigh,cov.extract=1,drift.extract=0,radix="s1",modify.target=FALSE)

Similarly, we can extract the component corresponding to the mean and store it using the radix sm:

well.db <- krimage(well.db,model=well.model,neigh=well.neigh,cov.extract=0,drift.extract=1,radix="sm",modify.target=FALSE)

If we now check the contents of the file well.db, we obtain the following printout:

print(well.db)
## 
## Data Base Characteristics
## =========================
## 
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension              = 1
## Number of fields             = 8
## Maximum Number of attributes = 8
## Total number of samples      = 850
## Grid characteristics:
## Origin :    519.001
## Mesh   :      0.020
## Number :        850
## Angles :      0.000
## 
## Variables
## ---------
## Field =   1 - Name    =  rank - Locator =  NA 
## Field =   2 - Name    =  x1 - Locator =  x1 
## Field =   3 - Name    =  Conductivity - Locator =  z1 
## Field =   4 - Name    =  s1.Conductivity.estim - Locator =  NA 
## Field =   5 - Name    =  s2.Conductivity.estim - Locator =  NA 
## Field =   6 - Name    =  s3.Conductivity.estim - Locator =  NA 
## Field =   7 - Name    =  s4.Conductivity.estim - Locator =  NA 
## Field =   8 - Name    =  sm.Conductivity.estim - Locator =  NA

Visualizing the results

We now want to plot each component individually. For that sake, each variable must be defined as a target variable, before it can be visualized. This can be done in the following command, where each field is set, in turn, as the target:

plot(well.db,name="s1.Conductivity.estim")

plot(well.db,name="s2.Conductivity.estim")

plot(well.db,name="s3.Conductivity.estim")

plot(well.db,name="s4.Conductivity.estim")

plot(well.db,name="sm.Conductivity.estim")

Ultimate Check

The last check is to ensure that the sum of the six extracted components results into the initial Conductivity variable.

We construct the new variable sum which corresponds to the sum of the six extracted components by typing:

well.db <- db.add(well.db,Sum=s1.Conductivity.estim+s2.Conductivity.estim+s3.Conductivity.estim+s4.Conductivity.estim+sm.Conductivity.estim)

We can now draw the scatter diagram between the initial variable Conductivity and the newly created variable sum, by typing the command:

correlation(well.db,"Conductivity","Sum")

## [1] 1

We obtain the following figure which proves that the six extracted components add up to the initial Conductivity variable.