R crashes when I run the anam.fit function in Windows 10

If you have problems using the RGeostats package or if you simply need some help on an unlisted subject.

Re: R crashes when I run the anam.fit function in Windows 10

Postby Didier Renard » Thu May 19, 2022 4:13 pm

Hi
Let us come back to the principle demonstrated by this workflow: http://rgeostats.free.fr/demo/RGeostats ... .sa.scot.R

We have a data set containing some hard data (say 70%) and some data assigned to 0 (a spike of 30% of fuzzy data).
We want to perform conditional simulations of this variable using traditional simulation techniques (such as the Turning Bands algorithm), which assume normality. Normality is not a correct assumption when the data contain a 30% spike.
We first perform an anamorphosis (raw-to-Gaussian transform) on the whole data set. The histogram of the result is as expected: 70% of the data are transformed into reasonable Gaussian values, while the 30% fuzzy data are all assigned a constant value equal to the Gaussian quantile of 30% (called 'ycut').
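As a small worked illustration of this 'ycut' value (plain R; the proportion 30% is just the figure quoted above, and the full script later in this thread builds the actual anamorphosis with anam.fit() and anam.z2y()):

p0   <- 0.30       # proportion of fuzzy data (the spike at zero)
ycut <- qnorm(p0)  # Gaussian quantile of 30%, about -0.52
# the 30% fuzzy data all receive the value ycut; the 70% hard data keep their Gaussian transforms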
The next step is to replace the values equal to 'ycut' by new values that turn the global histogram into a Gaussian shape. This is obtained by running a Gibbs sampler, which unfortunately requires the spatial model of the expanded variable, a model we do not have yet. Hence a preliminary task: finding the model of the underlying variable (the one for which all samples are Gaussian).
This is ensured by vario.trans.cut. This trick starts from the experimental variogram of the whole data set (including the fuzzy data) and recovers the variogram of the underlying variable (the one we would observe if all data were truly Gaussian).
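In RGeostats this step reads roughly as follows (a sketch; the object names are those of the full script quoted later in this thread, and n.H is the number of Hermite polynomials used for the transformation):

# Experimental variogram of the truncated (spiky) Gaussian variable
vario.Yp <- vario.calc(db.data.trans, lag=llag, nlag=nlag)
# Derive the variogram of the underlying Gaussian variable and fit a model to it
vario.Y <- vario.trans.cut(vario.Yp, ycut, n.H)
model.vario.Y <- model.auto(vario.Y, struc=melem.name(c(1,2)))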

Once this model is calculated and fitted, we can enter the Gibbs operator. This operator replaces each datum by a new value such that:
y in [-10; ycut] for fuzzy data
y = ydata for hard data.
This operator ensures the spatial continuity amongst the data, as dictated by the model.
As a result, the expanded data fulfill the constraints (below ycut for fuzzy data; above ycut and equal to the initial Gaussian transform for hard data) while following the spatial continuity of the model (a condensed extract of the corresponding code is given just below).
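For reference, here is how these bounds are set up before calling gibbs() (a condensed extract of the script quoted later in this thread; hard data get the degenerate interval [Yp, Yp], fuzzy data get [-10, ycut]):

Ymax <- db.extract(db.data.trans, name="Yp", flag.compress=F)  # upper bound = Gaussian datum
Ymin <- db.extract(db.data.trans, name="Yp", flag.compress=F)
Ymin[Ymin <= ycut] <- -10                                      # fuzzy data: lower bound pushed down to -10
db.data.trans <- db.add(db.data.trans, Ymax)
db.data.trans <- db.locate(db.data.trans, db.data.trans$natt, "upper")
db.data.trans <- db.add(db.data.trans, Ymin)
db.data.trans <- db.locate(db.data.trans, db.data.trans$natt, "lower")
db.data.trans <- gibbs(db=db.data.trans, model=model.vario.Y, nboot=1000, niter=1000)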

Once the data have been expanded into Gaussian values, we can perform the Turning Bands simulation (under a valid Gaussian hypothesis).
The result is finally back-transformed from the Gaussian to the raw scale. The results usually lie within the range of the data. However, if the number of grid nodes is much larger than the number of data, some results may fall outside this range. It is therefore necessary to truncate negative results to zero (in the raw scale).
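Condensed in R, these last steps look roughly like this (object names are those of the full script quoted later in this thread; it is a sketch, not a standalone program):

# Conditional Turning Bands simulations of the expanded Gaussian data
grid.simu <- simtub(dbin=db.data.trans, dbout=grid.simu, model=model.vario.Y,
                    neigh=neigh.simu, nbsimu=ns, nbtuba=1000, seed=1337)
# Back-transform (here, the mean of the simulations) from Gaussian to raw scale
grid.simu.mean <- anam.y2z(grid.simu.mean, name="mean", anam=model.anam)
# Truncate negative raw values to zero
raw.mean <- grid.simu.mean@items$Raw.mean
raw.mean[raw.mean < 0] <- 0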

The workflow is now corrected and will be available in the next RGeostats version (> 13.0.1).

Hope this will help.
Didier Renard
 
Posts: 337
Joined: Thu Sep 20, 2012 4:22 pm

Re: R crashes when I run the anam.fit function in Windows 10

Postby Fabien Ors » Mon May 23, 2022 10:53 am

We have just released a new version of RGeostats, 14.0.3 (Windows and Linux).
(The Mac version is currently under construction.)
I hope this new version will fix all the issues you experienced with the herring.sa.scot workflow.
Fabien Ors
Administrateur du site
 
Posts: 226
Joined: Thu Sep 20, 2012 1:07 pm

Re: R crashes when I run the anam.fit function in Windows 10

Postby Mangeni-Sande » Tue May 24, 2022 1:50 pm

Fabien Ors wrote:We have just released a new version of RGeostats, 14.0.3 (Windows and Linux).
(The Mac version is currently under construction.)
I hope this new version will fix all the issues you experienced with the herring.sa.scot workflow.


Dear Fabien and group,

Thank you so much.
The anamorphosis algorithm and the Gibbs sampler work well in the current RGeostats version.

Regards,
Mangeni-Sande
 
Posts: 10
Joined: Mon Apr 25, 2022 4:03 pm

Re: R crashes when I run the anam.fit function in Windows 10

Postby Mangeni-Sande » Tue May 24, 2022 1:54 pm

Dear Didier,

The explanation has been well received.
Thank you very much.
Mangeni-Sande
 
Posts: 10
Joined: Mon Apr 25, 2022 4:03 pm

Re: R crashes when I run the anam.fit function in Windows 10

Postby Mangeni-Sande » Sat May 28, 2022 2:03 pm

Fabien Ors wrote:We have just released a new version of RGeostats, 14.0.3 (Windows and Linux).
(The Mac version is currently under construction.)
I hope this new version will fix all the issues you experienced with the herring.sa.scot workflow.


Dear Fabien,

Thank you for the latest version of RGeostats, 14.0.3, which corrected the problems I was experiencing with the anam.fit() and gibbs() functions in the herring code. I expected both versions to give comparable estimates, but that is not the case.

When I compared the biomass estimates generated by the two versions using the dataset and script I uploaded to the RGeostats forum (viewtopic.php?f=6&t=534), version 12.0.1 produced an estimate of approximately 664,000 tonnes, whereas the new version 14.0.3 produced an estimate of approximately 41,000,000 tonnes. This is far greater than the total lake-wide biomass of approximately 3 million tonnes that has ever been estimated for Lake Victoria using the conventional Jolly and Hampton (1990) method.

In addition, the Turning Bands simulation (simtub()) takes around 5-6 hours to execute, compared with about 30 minutes in version 12.0.1.

I have a feeling something is wrong, but I can't figure it out easily.

Please provide guidance.
Mangeni-Sande
 
Posts: 10
Joined: Mon Apr 25, 2022 4:03 pm

Re: R crashes when I run the anam.fit function in Windows 10

Postby Fabien Ors » Mon May 30, 2022 12:28 pm

Could you please provide us with the full script you are running?
Thank you for your feedback.
Fabien Ors
Administrateur du site
 
Posts: 226
Joined: Thu Sep 20, 2012 1:07 pm

Re: R crashes when I run the anam.fit function in Windows 10

Postby Mangeni-Sande » Wed Jun 01, 2022 2:36 pm

Fabien Ors wrote:Could you please provide us with the full script you are running?
Thank you for your feedback.


Dear Fabien et al.,

This is the script:
#######################################################################################
# Conditional Geostatistical Simulations of acoustic data (120 kHz)

# Rastrineobola argentea (dagaa), in Lake Victoria, East Africa

# Script adapted from:
# Petitgas, P., Woillez, M., Rivoirard, J., Renard, D., and Bez, N. 2017. Handbook of
# Geostatistics in R for Fisheries and Marine Ecology. ICES Cooperative Research Report
# No. 338. 177 pp. (pp. 108-112).

## Session Information
# R version 4.1.1
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows >= 8 x64 (build 9200)

## clear workspace
rm(list=ls())

setwd("C:/Users/mageni/Dropbox/ST-ANDREWS/PhD/Geostatistics/dagaa geostats/Final script")

## Load required packages
library(Rcpp)       # Rcpp_1.0.8.3
library(RGeostats)  # RGeostats_14.0.3
library(spatstat)   # spatstat_2.2-0
library(colorspace) # colorspace_0.1.4
library(dplyr)      # dplyr_1.0.7
library(fields)     # fields_12.5
library(ggplot2)    # ggplot2_3.3.5
library(mapdata)    # mapdata_2.3.0
library(devtools)   # devtools_2.4.1
library(colormap)   # colormap_0.1.4
library(maps)       # maps_3.3.0
library(plotrix)    # added for plotCI() used in the abundance/biomass plots below (assumption: any package exporting plotCI(), e.g. gplots, would also work)
##############################################################################################
source("funcs_v5.r")## Source extra plotting functions
## Read datasets and import to R as objects
areaPoly <- read.csv("lakeVictoriaRGeostatPolygon.csv", header=T)
dat <- read.csv("dagaa_NASC_by_EDSU3.csv", header=T)# Fish density data

###############################################################################################
### Database management for RGeoStats ###
##############################################################################################
# Create RGeoStats database
densDb <- db.create(dat, flag.grid=F)#

# Set coordinate locators #
densDb = db.locate(densDb, "long", "x",1)
densDb = db.locate(densDb, "lat", "x",2)
densDb = db.locate(densDb, "NASC","z")

## Create area polygon object defining a polygon restrain the area of interest
poly.data <- polygon.create(areaPoly)

##############################################################################################
### Variography#
##############################################################################################
projec.query()# the projection is turned off

# Plot dB (Projection OFF)
plot(densDb, zmin=0.001, pch.low=3, cex.low=0.25,
las=1, # vertical axis labelling
pch=21, col=1, inches=4, title="dagaa_2015",
pos.legend=1,# position of the legend
xlab="Longitude (°)",ylab="Latitude (°)",
asp=1)#

# Calculate influence surface for sample points
densDb <- infl(densDb,
nodes=c(400,400),
origin=c(min(areaPoly$X),min(areaPoly$Y)),
extend=c(abs(max(areaPoly$X)-min(areaPoly$X)),
abs(max(areaPoly$Y))+abs(min(areaPoly$Y))),
polygon=poly.data,
plot=T,
asp=1)

projec.define(projection="mean", db=densDb ,flag.update=T)# Project database

# Plot dB (Projection ON)
plot(densDb, zmin=0.001, pch.low=3, cex.low=0.25, las=1,
pch=21, col=1, inches=5,
title="dagaa_2015 (Mean Projection)", asp=1)

## Compute lag & maximum distance for variography

# Project sampling location coordinates
pvals <- projec.operate(densDb@items$lon,
densDb@items$lat)

llag <- mean(nndist(cbind(pvals$x,pvals$y)))#
nlag <- round((max(pairdist(cbind(pvals$x,pvals$y)))/2)/llag)
max(pairdist(cbind(pvals$x,pvals$y)))

## Plot raw data as histogram
hist(densDb@items$NASC,
main="",
xlab="",
breaks=50,
col="#7f7f7f",
border="#7f7f7f")
mtext(expression(atop(s[a]~m^2~nmi^-2)),
side=1,
line=4.5)

mean(densDb@items$NASC)
var(densDb@items$NASC)

#Transform the data to a Gaussian field
model.anam <- anam.fit(densDb,
type="emp", #
draw=T,
title="Model of anamorphosis",
ndisc=densDb$nech,
sigma2e=800)#

# Create transformed database using model anamorphosis
#acoustic backscatter data into normal scores
db.data.trans <- anam.z2y(densDb, anam=model.anam)

db.data.trans <- db.rename(db.data.trans,
name="Gaussian.NASC",
newname="Yp")

ycut <- qnorm(sum(db.extract(db.data.trans,"NASC") == 0) /
length(db.extract(db.data.trans, name="NASC",
flag.compress=T)))

Y <- db.extract(db.data.trans, "Yp", flag.compress=T)
# Plot anamorphosis transformed data
hist(Y, breaks = 50, main = "",
col="#b3b3b3", border="#b3b3b3",
xlab="Anamorphosis Transformed NASC")

nasc <- db.extract(db.data.trans, "NASC")
Y[nasc == 0] <- ycut
db.data.trans <- db.replace(db.data.trans, "Yp", Y)


## Fitting variogram models #(Building the model)
vario.Yp <- vario.calc(db.data.trans, lag=llag, nlag=nlag)

n.H <- 10# Parameter for variogram transformation (Number of Hermite polynomials)
vario.Y <- vario.trans.cut(vario.Yp, ycut, n.H)#
plot(vario.Y, xlab="Distance (n.mi)",col="red",lwd=2)
model.vario.Y <- model.auto(vario.Y, struc=melem.name(c(1,2)),col="red", draw=T)

# Visualise the variogram
plot(vario.Y ,npairdw=T, npairpt=F, inches=0.06, col="red",
xlab="Distance (n.mi.)")#

plot(vario.Yp, npairdw=T, npairpt=F, inches=0.06,
col="#000000", add=T) #varariogram of truncated/transformed variable

plot(model.vario.Y, add=T, col="blue", lwd=3)#model of the transformed data fitted

# Add a legend to the plot
legend("bottomright", cex=0.75, y.intersp=1.2,
legend=c("Gaussian Transformed Variogram",
"Truncated Variable Variogram",
"Model Variogram"),
col=c("red","black","blue"), lty=1, pch=c(19,19,NA), lwd=c(1,1,2))#

#define interval limits for the Gibbs sampler
Ymax <- db.extract(db.data.trans, name="Yp", flag.compress=F)
Ymin <- db.extract(db.data.trans, name="Yp", flag.compress=F)
Ymin[Ymin <= ycut] <- -10
db.data.trans <- db.add(db.data.trans,Ymax)
db.data.trans <- db.locate(db.data.trans, db.data.trans$natt,"upper")
db.data.trans <- db.add(db.data.trans, Ymin)
db.data.trans <- db.locate(db.data.trans,db.data.trans$natt,"lower")
db.data.trans <- db.locate(db.data.trans, "Yp", "z")

# Gibbs sampler
db.data.trans <- gibbs(db = db.data.trans,
model = model.vario.Y,
seed = 1337, #
nboot = 1000, #
niter = 1000, #
flag.norm = FALSE, #
percent = 0, #
#toleps = 1, #
radix = "Gibbs",
modify.target = TRUE) #

# Rename the Gibbs sampler variable
db.data.trans <- db.rename(db.data.trans, "Gibbs.G1", "Y2")

range(db.extract(db.data.trans,"Y2",flag.compress=T))

## Histograms to visualise data transformation
# Truncated variable
histYp <- hist(db.data.trans[,"Yp"],plot=F,breaks=seq(-4,4,.1))
hist(db.data.trans[,"Yp"], proba=T, breaks=seq(-4,4,.1),
col="#b3b3b3", border="#b3b3b3",
main="", xlab="Lower-cut Gaussian Variable",
xlim=c(-4,4), ylim=c(0,ceiling(max(histYp$density))))

# Truncated variable after Gibbs sampling
hist(db.extract(db.data.trans,"Y2",flag.compress=T), proba=T, breaks=seq(-4,4,.1),
xlab="Gaussian Transformed Variable", main="",
col="#b3b3b3", border="#b3b3b3",
xlim=c(-4,4), ylim=c(0,ceiling(max(histYp$density))))

# Empirical density distribution (EDD)
EDD <- db.extract(db.data.trans,"Y2",flag.compress=T)
lines(density(EDD), lwd=2)
lines(seq(-4,4,0.1), dnorm(seq(-4,4,0.1),0,1), col="#ff0000", lwd=2)
##############################################################################
### Simulations ###
############################################################################
## Approximate grid cell sizes at reference coordinates
# Project area polygon
mapCoords <- projec.operate(areaPoly$X, areaPoly$Y)# set the coordinates

# Get projected values in nautical miles
nmix <- sum(abs(range(mapCoords$x)))
nmiy <- sum(abs(range(mapCoords$y)))

# Calculate number of km in x and y of area polygon that will be set as grid number
gnx <- round(nmix*1.852)
gny <- round(nmiy*1.852)

# Create simulation Grid for Conditional Geostatistical Simulations
grid.simu <- db.grid.init(poly.data, nodes=c(gnx,gny))

# Create neighbourhood object (type=2 with nmini/nmaxi/dmax defines a moving neighbourhood)
neigh.simu <- neigh.create(ndim=2, # 2-D neighbourhood
type=2, # moving neighbourhood
nmini=1, # minimum number of samples per neighbourhood
nmaxi=100) # maximum number of samples per neighbourhood
neigh.simu@dmax <- 100

# Set number of simulations to run
ns <- 999

# Run Turning Bands simulations
grid.simu <- simtub(dbin = db.data.trans,
dbout = grid.simu,
model = model.vario.Y,
neigh = neigh.simu,
uc = "",
mean = 0,
seed = 1337, # random number seed
nbsimu = ns, # number of simulations (999)
nbtuba = 1000, # number of turning bands
radix = "Simu", # prefix of the output variables
modify.target = TRUE)

grid.simu <- db.rename(grid.simu, "Simu.Y.S1","Simu.Y")
print(grid.simu,flag.stats=TRUE,names="Simu.Y")
hist(db.extract(grid.simu,"Simu.Y"),col=8,xlab="Y",nclass=100,main="Histogram of Simulated Gaussian expanded values")

# Create db for mean realisation
grid.simu.mean <- db.stat.simu(grid.simu,
names ="Simu*",
fun="mean",
zmin=NA,
zmax=NA,#
proba=NA,
radix=NA,
modify.target=db.locmod())

#grid.simu.mean <- db.compare(grid.simu, names="Simu*", fun="mean")
grid.simu.mean <- anam.y2z(grid.simu.mean, name="mean", anam=model.anam)

## Create database with mean of simulations
Raw.Simu.mean <- grid.simu.mean@items$Raw.mean
Raw.Simu.mean[round(Raw.Simu.mean,2)<=0.00] <- 0
grid.simu.mean <- db.add(grid.simu.mean, Raw.Simu.mean)
rm(Raw.Simu.mean)
grid.simu.poly <- db.polygon(grid.simu.mean, poly.data)
grid.simu.poly <- db.delete(grid.simu.poly, "Simu.Y2*")
rm(grid.simu.mean)

jet.colors <- colorRampPalette(c("blue", "Orange","yellow","red"))
# Plot mean density surface
plot(grid.simu.poly, name="Raw.Simu.mean",
pos.legend=1,#position of legend
flag.proj=F,asp=1, col=jet.colors(10),
xlab="Longitude (°)",ylab="Latitude (°)",
zlim=c(0,2256))#zlim=c(0,2256)

plot(grid.simu.poly, name="Raw.Simu.mean",
pos.legend=1, flag.proj=F,asp=1,
xlab="Longitude (°)",ylab="Latitude (°)",
col=jet.colors(10), zlim=c(0,2256))

###overlaying cruise track over the mean density surface distribution######
lines(dat$lon,dat$lat, col="red", lwd=1)

## Backtransform all realisations
system.time(ysim <- anam.y2z(grid.simu, names="Simu*", anam=model.anam))

ysim <- db.delete(ysim, "Simu.Y2*")
rm(grid.simu)
# Create median realisation db
# ysim <- db.polygon(ysim, poly.data)
vals <- ysim@items[ysim@items$Polygon==T,]
vals[vals <= 0] <- 0
totalNASC <- colSums(vals,na.rm = T)
totalNASC <- totalNASC[grepl("Raw.Simu",names(totalNASC))]
medianName <- names(totalNASC[totalNASC==median(totalNASC)])
rm(vals)

# Create db
ysim <- db.delete(ysim, "Polygon")
grid.simu.med <- ysim
meds <- db.extract(grid.simu.med, medianName, flag.compress = F)
meds[round(meds,2)<=0.00] <- 0
grid.simu.med <- db.delete(grid.simu.med, "Raw*")
grid.simu.med <- db.add(grid.simu.med, meds)

# Plot median realisation
plot(grid.simu.med, name="meds", asp=1,col=jet.colors(10), pos.legend=1,
flag.proj=F,
xlab="Longitude (°)",ylab="Latitude (°)",
zlim=c(0,2256))
#########################################
### Plot with colour legend in margin ###
#########################################
# Get limits for the legend
lims <- c(0, max(grid.simu.med@items$meds,na.rm=T))

# Create labels for the margin legend
zRange <- c(min(lims), rep(NA,48), rUp(max(lims)))
grid.simu.med <- db.polygon(grid.simu.med, poly.data, flag.replace = T)

# Extract simulated NASC values from database
zs <- ifelse(grid.simu.med@items$Polygon==T,
grid.simu.med@items$meds, NA)

# Create JPEG file for plot
jpeg("median_realisation.jpeg", width = 7, height = 6, units = "in", res = 250)
par(mar=c(3,3,1,4), oma=c(0,0,0,2))
quilt.plot(grid.simu.med@items$x1,
grid.simu.med@items$x2,
zs,
col=jet.colors(10),
nx=gnx, ny=gny,
add.legend = F,
axes=F,
xlim=c(31.55,34.9),
ylim=c(-3.1,0.6),
zlim=c(0,2256))# zlim=lims)
# Add axes
axis(1, at=seq(32,34,1),
labels=c(parse(text=sprintf('%s^o*E', seq(32,34,1)))))
axis(2, at=seq(-3,0,1),
labels=c(parse(text=sprintf('%s^o*%s', abs(seq(3,0,-1)), c(rep("S",3),NA)))))
# Add box around plot
box()
# Add colour legend
btlegend(zlim=c(0,max(zRange,na.rm=T)), col=rev(heat_hcl(50)))
# Close connection to JPEG file - Create JPEG
dev.off()
##################################################################################
### Calculation of Abundance ###
##################################################################################

# Copy simulations database
abundsim <- ysim
NASC2015 <- abundsim@items[,grep("Raw.Simu.Y2", names(abundsim@items))]
NASC2015[NASC2015 < 0] <- 0# Set negative realisations to zero

# SigmaBS for Dagaa
TS <- -57.6   # TS (dB) per individual dagaa
TSkg <- -29.4 # TS (dB) per kg of dagaa
sigmabs <- 10^(TS/10)   # backscattering cross-section per individual
sigmakg <- 10^(TSkg/10) # backscattering cross-section per kg

# Calculate abundance of dagaa per km2 (or per cell)
abund2015 <- NASC2015/((4 * pi *(1.852^2))*sigmabs)
# Calculate biomass (kg) of dagaa per km2 (or by cell)
biomass2015 <- NASC2015/((4 * pi *(1.852^2))*sigmakg)
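# Note: the two lines above apply the standard fisheries-acoustics conversion
#   density per nmi^2 = NASC / (4 * pi * sigma_bs),
# with the additional factor 1.852^2 converting nmi^2 to km^2, so the results
# are densities per km^2 (i.e. roughly per grid cell).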

# Calculate abundance
abundance <- colSums(abund2015, na.rm = T)
# Plot distribution of abundance values
boxplot(abundance)
# Plot mean abundance with 95% confidence interval
# (plotCI() is assumed here, e.g. from plotrix; base plot() ignores the ui/li/gap/add arguments)
plotCI(1.4, # x position: just to place the estimate at the appropriate spot on the boxplot
mean(abundance), # Mean abundance estimate
pch=19,
gap=0.5,
ui=quantile(abundance, probs = 0.975), # Upper CI (97.5% quantile)
li=quantile(abundance, probs = 0.025), # Lower CI (2.5% quantile)
add=T)

#### Calculate biomass#############################
biomass <- colSums(biomass2015, na.rm = T)/1000 ## convert to tonnes
# Plot distribution of biomass values
boxplot(biomass)

# Plot mean biomass with 95% confidence interval
# (plotCI() assumed, as above)
plotCI(1.4, # x position on the boxplot
mean(biomass), # Mean biomass estimate
pch=19,
gap=0.5,
ui=quantile(biomass, probs = 0.975), # Upper CI (97.5% quantile)
li=quantile(biomass, probs = 0.025), # Lower CI (2.5% quantile)
add=T)

quantile(biomass, probs = c(0.025))
mean(biomass)
quantile(biomass, probs = c(0.975))
######################################################
Mangeni-Sande
 
Posts: 10
Joined: Mon Apr 25, 2022 4:03 pm
