stats.sampling {RGeostats} | R Documentation |
Multivariate Statistical Sampling
stats.sampling(df, nvalues = NA, niter=100, constraints=NA, percent = 10, seed = 242351)
df |
The input data frame which provides the set of attributes (columns) for a set of samples (rows). |
nvalues |
The number of samples to be generated |
niter |
Number of trials to be performed in order to sample the experimental multivariate distribution. If the constraints have not been satisfied after these trials, no sample is generated. |
constraints |
Array giving the constraints to be satisfied, if any. This array is optional. See details for more explanations. |
percent |
The dimension of the kernel used for randomization. See details for explanations. |
seed |
Seed for the random number generator |
This function generates an output data frame with the same number of attributes (columns) as the input data frame and a number of rows equal to 'nvalues'. The simulated values of the attributes should statistically match the distributions of the attributes in the input data frame. This multivariate statistical sampling is a kernel method. Each input sample serves as the location where a kernel is placed. Together, the kernels provide an overall density function from which the output data are sampled. The argument 'percent' gives the dimension of the kernel: it is provided as a percentage of the standard deviation of each attribute. A value of zero leads to an output data frame that contains only replicates of the samples from the input data frame.
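The kernel idea described above can be sketched in base R. This is an illustration only, not the RGeostats implementation: the helper name 'kernel_sample' is hypothetical, and the Gaussian kernel shape is an assumption, since this page does not state which kernel is used. Constraints are ignored here.

```r
# Hypothetical helper, NOT the RGeostats code. Assumes a Gaussian kernel.
kernel_sample <- function(df, nvalues, percent = 10, seed = 242351) {
  set.seed(seed)  # note: this resets the global random number generator
  X <- as.matrix(df)
  # Kernel width per attribute: a percentage of its standard deviation
  width <- apply(X, 2, sd) * percent / 100
  # Choose the kernel centres: input rows drawn at random, with replacement
  centre <- sample(nrow(X), nvalues, replace = TRUE)
  # Perturb each centre with noise scaled column by column
  noise <- matrix(rnorm(nvalues * ncol(X)), nrow = nvalues)
  out <- X[centre, , drop = FALSE] + sweep(noise, 2, width, `*`)
  rownames(out) <- NULL
  as.data.frame(out)
}

df <- data.frame(X = c(1, 2, 3, 4), Y = c(0.1, 0.4, 0.2, 0.3))
sim   <- kernel_sample(df, nvalues = 50, percent = 10)  # smoothed resampling
exact <- kernel_sample(df, nvalues = 50, percent = 0)   # pure replication
```

With 'percent = 0' the kernel width is zero, so every output row is a copy of an input row, which matches the behaviour described for a value of zero.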
When the argument 'constraints' (denoted 'C') is defined as a matrix, its dimension should be nconst * nvar1, where 'nconst' designates the number of constraints and 'nvar1' should be equal to the number of columns ('nvar') + 1. For a given sample and constraint 'j', the linear combination must satisfy:
sum_(i=1)^(nvar1) C(j,i) * X(i) >= 0
where X is the vector of the variables for the current sample extended by the value 1, so that the last column of C holds the constant term of each constraint.
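As a rough sketch of how such a test could be evaluated, the base R code below checks every row of a data frame against a constraints matrix. The helper name 'satisfies_constraints' is hypothetical and is not part of RGeostats. It assumes one row of C per constraint, with the constant term in the last column.

```r
# Hypothetical helper: TRUE for each row of df that satisfies every constraint.
# C has one row per constraint and ncol(df) + 1 columns.
satisfies_constraints <- function(df, C) {
  stopifnot(ncol(C) == ncol(df) + 1)
  X1 <- cbind(as.matrix(df), 1)  # extend each sample by the value 1
  lin <- X1 %*% t(C)             # one column per constraint
  apply(lin >= 0, 1, all)        # every linear combination must be >= 0
}

df <- data.frame(X = c(0.5, 2.0), Y = c(0.2, 0.3))
C  <- matrix(c(1, 1, -2), nrow = 1)  # single constraint: X + Y - 2 >= 0
keep <- satisfies_constraints(df, C)
```

Here the first sample gives 0.5 + 0.2 - 2 < 0 and is rejected, while the second gives 2.0 + 0.3 - 2 >= 0 and is kept.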
As an illustration, consider the case with two variables called X and Y. Consider the following constraints:
X > 0.1
Y < 0.5
X + Y > 2
X - 2Y < 3
This corresponds to the constraints matrix (entered by row):
Line #1: +1.0 +0.0 -0.1
Line #2: +0.0 -1.0 +0.5
Line #3: +1.0 +1.0 -2.0
Line #4: -1.0 +2.0 +3.0
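In R, the matrix above could be built as follows. This is a sketch: the two test points are made up for illustration, and the final call is left as a comment because it requires RGeostats and an input data frame 'df' with two columns.

```r
# Constraints matrix for two variables (X, Y), entered by row.
# Columns: coefficient of X, coefficient of Y, constant term.
C <- matrix(c( 1,  0, -0.1,   # X > 0.1
               0, -1,  0.5,   # Y < 0.5
               1,  1, -2.0,   # X + Y > 2
              -1,  2,  3.0),  # X - 2Y < 3
            ncol = 3, byrow = TRUE)

# Illustrative points (assumed values, not from the documentation)
inside  <- c(X = 2, Y = 0.2)
outside <- c(X = 2, Y = 0.8)   # violates Y < 0.5

ok_inside  <- all(C %*% c(inside, 1) >= 0)
ok_outside <- all(C %*% c(outside, 1) >= 0)

# Possible call, not executed here:
# sim <- stats.sampling(df, nvalues = 100, constraints = C)
```

Using 'byrow = TRUE' keeps the R code visually aligned with the row-by-row listing of the constraints.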
A data frame with as many columns as the input data frame ('df'). The number of rows is given by 'nvalues'.