by Didier Renard » Tue Mar 01, 2022 10:27 am
Hello
This is a tentative answer to a difficult question that can hardly be addressed in a simple post.
The problem is the determination of the total biomass and the assessment of the corresponding uncertainty. Let me first point out that these are simply 2 values!
The first remark is that resorting to conditional simulations for the sole purpose of getting these two outcomes is probably over-demanding. As a matter of fact, they require:
- the definition of data
- the definition of the field over which the total abundance is to be calculated
- a model (hopefully compatible with second-order stationarity... i.e. bounded)
1) When all these ingredients are provided, one can immediately think of using a method dedicated to estimation, such as kriging.
The only additional feature is that we do not target a local estimate (as for mapping) but a global estimate. This is achieved by using the global estimation procedure (the method is called global() in RGeostats).
This global kriging procedure does not require any neighborhood definition: for such a global quantity, all samples are taken into account (using a local definition of neighborhood would be completely meaningless here!). As a consequence, this leads to a kriging system which has to be established only once (good point), but which can become very large and possibly almost intractable.
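To make the idea concrete, here is a toy sketch of such a global (ordinary) kriging of the mean over a field, on synthetic data with a spherical covariance. This is an illustration of the principle only, not the RGeostats global() implementation; the names (spherical_cov, the 100 x 100 field, the sample values) are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def spherical_cov(h, sill=1.0, a=30.0):
    """Spherical covariance model (bounded, hence second-order stationary)."""
    h = np.asarray(h, dtype=float)
    return np.where(h < a, sill * (1 - 1.5 * h / a + 0.5 * (h / a) ** 3), 0.0)

def dist(p, q):
    """Pairwise distances between two sets of 2-D points."""
    return np.linalg.norm(p[:, None, :] - q[None, :, :], axis=2)

# Synthetic samples inside a 100 x 100 field (toy densities)
n = 50
pts = rng.uniform(0, 100, size=(n, 2))
z = rng.normal(10, 2, size=n)

# Discretize the field to approximate domain-averaged covariances
gx, gy = np.meshgrid(np.linspace(0, 100, 40), np.linspace(0, 100, 40))
grid = np.column_stack([gx.ravel(), gy.ravel()])

C = spherical_cov(dist(pts, pts))                     # sample-to-sample
c_bar = spherical_cov(dist(pts, grid)).mean(axis=1)   # sample-to-domain mean
C_VV = spherical_cov(dist(grid, grid)).mean()         # domain-to-domain mean

# One single ordinary-kriging system for the whole field (all samples,
# no neighborhood), with one Lagrange multiplier for unbiasedness
A = np.zeros((n + 1, n + 1))
A[:n, :n] = C
A[:n, n] = A[n, :n] = 1.0
b = np.append(c_bar, 1.0)
sol = np.linalg.solve(A, b)
lam, mu = sol[:n], sol[n]

mean_estimate = lam @ z                    # global estimate of the mean
est_variance = C_VV - lam @ c_bar - mu     # corresponding kriging variance
print(mean_estimate, est_variance)
```

Note that the (n+1) x (n+1) matrix A is exactly the system that grows with the number of samples and may become intractable for very large surveys.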
The second issue is the delineation of the field. When the domain on which the calculation must be performed is clearly defined (hard boundaries), there is no problem. Unfortunately, this delineation is sometimes not well defined: the field extends to a fuzzy boundary given by a series of zero catches along a sampling profile. We then empirically declare that the field ends there (i.e. no fish can be found further along the profile), but this limit is not clearly defined.
This is not a problem for the fish abundance itself, but it becomes an issue if you want to establish an average fish density over the domain (dividing the abundance by a domain surface which is ill-defined).
The advantage of this method (when the field is clearly delineated) is that the kriging estimate comes with the corresponding variance.
2) To bypass the problem of the huge matrix, one can think of using the simple arithmetic mean to get the estimate, and the extension variance to assess the uncertainty. This variance only requires the knowledge of the model and of the field extension. This is also offered by the method global(), setting the argument 'calcul' to "arith".
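The extension variance only involves domain-averaged covariances, so it can be sketched with nothing but the model and the geometry. The following toy Python/NumPy version uses an invented spherical model and a discretized 100 x 100 field (this is the textbook formula, not the RGeostats code):

```python
import numpy as np

rng = np.random.default_rng(1)

def spherical_cov(h, sill=1.0, a=30.0):
    """Spherical covariance model (bounded)."""
    h = np.asarray(h, dtype=float)
    return np.where(h < a, sill * (1 - 1.5 * h / a + 0.5 * (h / a) ** 3), 0.0)

def dist(p, q):
    return np.linalg.norm(p[:, None, :] - q[None, :, :], axis=2)

# Sample locations and a discretized 100 x 100 field V
pts = rng.uniform(0, 100, size=(50, 2))
gx, gy = np.meshgrid(np.linspace(0, 100, 40), np.linspace(0, 100, 40))
grid = np.column_stack([gx.ravel(), gy.ravel()])

# Extension variance of the arithmetic mean of the samples v to the field V:
#   Var(Zbar_v - Zbar_V) = Cbar(v,v) - 2 Cbar(v,V) + Cbar(V,V)
# Only the model and the geometry enter; no kriging system to solve.
C_vv = spherical_cov(dist(pts, pts)).mean()
C_vV = spherical_cov(dist(pts, grid)).mean()
C_VV = spherical_cov(dist(grid, grid)).mean()
ext_var = C_vv - 2 * C_vV + C_VV
print(ext_var)
```

No matrix inversion appears anywhere, which is precisely why this route scales where global kriging does not.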
3) When the field delineation is an issue, we can switch to the transitive theory, which requires a (covariance) model but is not hampered by the exactness of the edge definition. However, this approach is efficient and accurate only under good conditions regarding the sampling pattern, for example when the data lie on a regular grid or on a set of parallel transects. Otherwise, you must massage your data beforehand (migrate them onto such a regular pattern, which may cause other troubles).
Finally, if you decide to run simulations, the good solution for the total abundance is clearly to average the simulations (thereby getting close to a kriging map). Then, to get the abundance over the domain, you simply integrate over the field (in other words, add up the contributions of the grid nodes located within the polygon which delineates the domain) so as to obtain the total abundance as a single number.
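In code, this averaging-then-integrating step is only a couple of array operations. Here is a toy sketch assuming the simulations are already available as a NumPy stack and the polygon has been rasterized into a boolean mask (both invented for the example):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stack of "conditional simulations": nsim maps on an nx x ny grid
nsim, nx, ny = 200, 50, 50
sims = rng.lognormal(mean=1.0, sigma=0.5, size=(nsim, nx, ny))

# Boolean mask standing for the rasterized polygon (here: a disc)
ix, iy = np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij")
inside = (ix - nx / 2) ** 2 + (iy - ny / 2) ** 2 < (nx / 2.5) ** 2

cell_area = 1.0                       # surface of one grid cell

# Average the simulations node by node (close to a kriging map) ...
mean_map = sims.mean(axis=0)
# ... then integrate over the domain: one single abundance number
total_abundance = mean_map[inside].sum() * cell_area
print(total_abundance)
```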
But there I am clearly stuck for deriving the variance corresponding to this total abundance. Possibly, to get to this value, you must consider some type of independence argument. For example, one could imagine subdividing the field into a set of contiguous subareas (a partition) large enough that what happens in one subarea can be considered independent from what happens in another one (even the one sitting next to it). Variances then become additive (neglecting the covariances). The subareas must also be small enough to allow the calculation of an estimation variance for each of them. You could then use global kriging again to get these values (where "global" here refers to the size of the subarea).
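The additivity argument can be checked numerically on a toy case: simulate a field with short-range correlation, cut it into subareas much larger than that range, and compare the true variance of the total with the sum of per-subarea variances. Everything below (the moving-average field, the block size) is invented for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(3)

# 1-D toy field with short-range correlation: moving average of white noise
nsim, n = 500, 400
noise = rng.normal(size=(nsim, n + 9))
kernel = np.ones(10) / 10.0            # correlation range ~ 10 nodes
sims = np.apply_along_axis(
    lambda x: np.convolve(x, kernel, mode="valid"), 1, noise)  # (nsim, n)

# Partition into contiguous subareas much larger than the range
block = 100                            # 4 subareas of 100 nodes each
totals = sims.reshape(nsim, n // block, block).sum(axis=2)

var_true = sims.sum(axis=1).var()          # variance of the full total
var_additive = totals.var(axis=0).sum()    # sum of per-subarea variances
print(var_true, var_additive)
```

Because the only neglected covariances are those between nodes sitting on either side of a subarea boundary (a thin fringe compared to the block size), the two numbers come out close; they would diverge if the blocks were shrunk toward the correlation range.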
Your last suggestion can also be considered. I would simply consider the distribution obtained by stacking all the values of all simulations over all grid nodes belonging to the field as the best proxy for the REAL distribution. Modeling this distribution to extract its variance is then no longer difficult.
I would avoid averaging simulations per grid node (as this amounts to an estimate which smooths out reality).
Your other suggestion is to derive the abundance per simulation (summing over the grid cells of each simulation). Each simulation is then summarized by a single abundance value. All these values are stacked into a histogram, which can then be modeled until its variance is derived. This is also a possibility. I can hardly measure the quality of this indirect assessment of the uncertainty; however, it is for sure much more demanding than the estimation methods mentioned before.
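This per-simulation route can be sketched in a few lines, again assuming a toy stack of simulations (the lognormal values are invented; a real study would use the actual conditional simulations):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy stack of conditional simulations on a grid
nsim, nx, ny = 300, 40, 40
sims = rng.lognormal(mean=1.0, sigma=0.5, size=(nsim, nx, ny))

# One total abundance per simulation: sum over all grid cells
abundances = sims.reshape(nsim, -1).sum(axis=1)

# The spread of this stack of totals is the uncertainty on the abundance
est = abundances.mean()
var = abundances.var(ddof=1)
lo, hi = np.percentile(abundances, [2.5, 97.5])  # empirical 95% interval
print(est, var, lo, hi)
```

A side benefit of this route is that it yields the whole distribution of the total (quantiles, intervals), not only a variance, at the price of running and storing all the simulations.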
Needless to recall that the conditional simulations themselves have an internal issue: how to perform them with a large number of data, which makes the conditioning kriging system intractable too.
Hope all these thoughts will be helpful.