by Didier Renard » Tue Mar 01, 2022 10:27 am
Hello
This is a tentative answer to a difficult question that can hardly be addressed in a simple post.
The problem is the determination of the total biomass and the assessment of the corresponding uncertainty. Let me first point out that these are simply 2 values!
The first remark is that resorting to conditional simulations for the sole purpose of getting these two outcomes is probably over-demanding. As a matter of fact, they require:
- the definition of data
- the definition of the field over which the total abundance is to be calculated
- a model (hopefully compatible with second-order stationarity... i.e. bounded)
1) When all these ingredients are provided, one can immediately think of using a method dedicated to estimation, such as kriging.
The only additional feature is that we do not target a local estimate (as for mapping) but a global estimate. This is achieved by using the global estimation procedure (the method is called global() in RGeostats).
This global kriging procedure does not require any neighborhood definition: for such a global quantity, all samples are taken into account (using a local definition of neighborhood would be completely meaningless here!). As a consequence, this leads to a kriging system which has to be established only once (good point), but which can become very large and possibly almost intractable.
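To make the idea concrete, here is a toy sketch of such a global (ordinary) kriging of the mean over a field, on synthetic data with a spherical covariance. This is an illustration of the principle only, not the RGeostats global() implementation; the names (spherical_cov, the 100 x 100 field, the sample values) are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def spherical_cov(h, sill=1.0, a=30.0):
    """Spherical covariance model (bounded, hence second-order stationary)."""
    h = np.asarray(h, dtype=float)
    return np.where(h < a, sill * (1 - 1.5 * h / a + 0.5 * (h / a) ** 3), 0.0)

def dist(p, q):
    """Pairwise distances between two sets of 2-D points."""
    return np.linalg.norm(p[:, None, :] - q[None, :, :], axis=2)

# Synthetic samples inside a 100 x 100 field (toy densities)
n = 50
pts = rng.uniform(0, 100, size=(n, 2))
z = rng.normal(10, 2, size=n)

# Discretize the field to approximate domain-averaged covariances
gx, gy = np.meshgrid(np.linspace(0, 100, 40), np.linspace(0, 100, 40))
grid = np.column_stack([gx.ravel(), gy.ravel()])

C = spherical_cov(dist(pts, pts))                     # sample-to-sample
c_bar = spherical_cov(dist(pts, grid)).mean(axis=1)   # sample-to-domain mean
C_VV = spherical_cov(dist(grid, grid)).mean()         # domain-to-domain mean

# One single ordinary-kriging system for the whole field (all samples,
# no neighborhood), with one Lagrange multiplier for unbiasedness
A = np.zeros((n + 1, n + 1))
A[:n, :n] = C
A[:n, n] = A[n, :n] = 1.0
b = np.append(c_bar, 1.0)
sol = np.linalg.solve(A, b)
lam, mu = sol[:n], sol[n]

mean_estimate = lam @ z                    # global estimate of the mean
est_variance = C_VV - lam @ c_bar - mu     # corresponding kriging variance
print(mean_estimate, est_variance)
```

Note that the (n+1) x (n+1) matrix A is exactly the system that grows with the number of samples and may become intractable for very large surveys.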
The second issue is the delineation of the field. When the domain on which the calculation must be performed is clearly defined (hard boundaries), there is no problem. Unfortunately, this delineation is sometimes not well defined: the field extends to a fuzzy boundary given by a series of zero catches along a sampling profile. We then empirically declare that the field ends there (i.e. no fish can be found further along the profile), but this limit is not clearly defined.
This is not a problem for the fish abundance itself, but it becomes an issue if you want to establish an average fish density over the domain (dividing the abundance by a domain surface which is ill-defined).
The advantage of this method (when the field is clearly delineated) is that the kriging estimate comes with the corresponding variance.
2) To bypass the problem of the huge matrix, one can think of using the simple arithmetic mean to get the estimate, and the extension variance to assess the uncertainty. This variance only requires the knowledge of the model and of the field extension. This is also offered by the method global(), setting the argument 'calcul' to "arith".
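The extension variance only involves domain-averaged covariances, so it can be sketched with nothing but the model and the geometry. The following toy Python/NumPy version uses an invented spherical model and a discretized 100 x 100 field (this is the textbook formula, not the RGeostats code):

```python
import numpy as np

rng = np.random.default_rng(1)

def spherical_cov(h, sill=1.0, a=30.0):
    """Spherical covariance model (bounded)."""
    h = np.asarray(h, dtype=float)
    return np.where(h < a, sill * (1 - 1.5 * h / a + 0.5 * (h / a) ** 3), 0.0)

def dist(p, q):
    return np.linalg.norm(p[:, None, :] - q[None, :, :], axis=2)

# Sample locations and a discretized 100 x 100 field V
pts = rng.uniform(0, 100, size=(50, 2))
gx, gy = np.meshgrid(np.linspace(0, 100, 40), np.linspace(0, 100, 40))
grid = np.column_stack([gx.ravel(), gy.ravel()])

# Extension variance of the arithmetic mean of the samples v to the field V:
#   Var(Zbar_v - Zbar_V) = Cbar(v,v) - 2 Cbar(v,V) + Cbar(V,V)
# Only the model and the geometry enter; no kriging system to solve.
C_vv = spherical_cov(dist(pts, pts)).mean()
C_vV = spherical_cov(dist(pts, grid)).mean()
C_VV = spherical_cov(dist(grid, grid)).mean()
ext_var = C_vv - 2 * C_vV + C_VV
print(ext_var)
```

No matrix inversion appears anywhere, which is precisely why this route scales where global kriging does not.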
3) When the field delineation is an issue, we can switch to the transitive theory, which requires a (covariance) model but is not hampered by the exactness of the edge definition. However, this approach is efficient and accurate only under good conditions regarding the sampling pattern, for example when the data lie on a regular grid or on a set of parallel transects. Otherwise, you must massage your data beforehand (migrate them onto such a regular pattern, which may cause other troubles).
Finally, if you decide to run simulations, the good solution for the total abundance is clearly to average the simulations (thereby getting close to a kriging map). Then, to get the abundance over the domain, you simply integrate over the field (in other words, add up the contributions of the grid nodes located within the polygon which delineates the domain) so as to obtain the total abundance as a single number.
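In code, this averaging-then-integrating step is only a couple of array operations. Here is a toy sketch assuming the simulations are already available as a NumPy stack and the polygon has been rasterized into a boolean mask (both invented for the example):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stack of "conditional simulations": nsim maps on an nx x ny grid
nsim, nx, ny = 200, 50, 50
sims = rng.lognormal(mean=1.0, sigma=0.5, size=(nsim, nx, ny))

# Boolean mask standing for the rasterized polygon (here: a disc)
ix, iy = np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij")
inside = (ix - nx / 2) ** 2 + (iy - ny / 2) ** 2 < (nx / 2.5) ** 2

cell_area = 1.0                       # surface of one grid cell

# Average the simulations node by node (close to a kriging map) ...
mean_map = sims.mean(axis=0)
# ... then integrate over the domain: one single abundance number
total_abundance = mean_map[inside].sum() * cell_area
print(total_abundance)
```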
But there I am clearly stuck for deriving the variance corresponding to this total abundance. Possibly, to get to this value, you must consider some type of independence argument. For example, one could imagine subdividing the field into a set of contiguous subareas (a partition) large enough that what happens in one subarea can be considered independent from what happens in another one (even the one sitting next to it). Variances then become additive (neglecting the covariances). The subareas must also be small enough to allow the calculation of an estimation variance for each of them. You could then use global kriging again to get these values (where "global" here refers to the size of the subarea).
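The additivity argument can be checked numerically on a toy case: simulate a field with short-range correlation, cut it into subareas much larger than that range, and compare the true variance of the total with the sum of per-subarea variances. Everything below (the moving-average field, the block size) is invented for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(3)

# 1-D toy field with short-range correlation: moving average of white noise
nsim, n = 500, 400
noise = rng.normal(size=(nsim, n + 9))
kernel = np.ones(10) / 10.0            # correlation range ~ 10 nodes
sims = np.apply_along_axis(
    lambda x: np.convolve(x, kernel, mode="valid"), 1, noise)  # (nsim, n)

# Partition into contiguous subareas much larger than the range
block = 100                            # 4 subareas of 100 nodes each
totals = sims.reshape(nsim, n // block, block).sum(axis=2)

var_true = sims.sum(axis=1).var()          # variance of the full total
var_additive = totals.var(axis=0).sum()    # sum of per-subarea variances
print(var_true, var_additive)
```

Because the only neglected covariances are those between nodes sitting on either side of a subarea boundary (a thin fringe compared to the block size), the two numbers come out close; they would diverge if the blocks were shrunk toward the correlation range.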
Your last suggestion can also be considered. I would simply consider the distribution obtained by stacking all the values of all simulations over all grid nodes belonging to the field as the best proxy for the REAL distribution. Modeling this distribution to extract its variance is then no longer difficult.
I would avoid averaging simulations per grid node (as this amounts to an estimate which smooths out reality).
Your other suggestion is to derive the abundance per simulation (summing over the grid cells of each simulation). Each simulation is then summarized by a single abundance value. All these values are stacked into a histogram, which can then be modeled until its variance is derived. This is also a possibility. I can hardly measure the quality of this indirect assessment of the uncertainty; however, it is for sure much more demanding than the estimation methods mentioned before.
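This per-simulation route can be sketched in a few lines, again assuming a toy stack of simulations (the lognormal values are invented; a real study would use the actual conditional simulations):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy stack of conditional simulations on a grid
nsim, nx, ny = 300, 40, 40
sims = rng.lognormal(mean=1.0, sigma=0.5, size=(nsim, nx, ny))

# One total abundance per simulation: sum over all grid cells
abundances = sims.reshape(nsim, -1).sum(axis=1)

# The spread of this stack of totals is the uncertainty on the abundance
est = abundances.mean()
var = abundances.var(ddof=1)
lo, hi = np.percentile(abundances, [2.5, 97.5])  # empirical 95% interval
print(est, var, lo, hi)
```

A side benefit of this route is that it yields the whole distribution of the total (quantiles, intervals), not only a variance, at the price of running and storing all the simulations.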
Needless to recall that the conditional simulations themselves have an internal issue: how to perform them with a large number of data, which makes the conditioning kriging system intractable too.
Hope all these thoughts will be helpful.