[SOLVED] Model.auto and overfitting

Post by Jeffrey Yarus » Mon Mar 28, 2022 8:54 pm

Attachment: Model_auto.pdf (411.89 KiB)
I've been experimenting with model.auto in my class. It seems that if you select multiple models, like c(1,2,3,4,5,11) (so I am including the stable model), some results can be quite strange. I have a number of questions:

1. First, is it really wise to string many model types together and allow the fitting algorithm to select what is best? Can this cause overfitting?
2. If I do list multiple model types together, should I include the Gaussian and the Exponential if I am already listing the Stable model?
3. It seems that when the nugget is included in a long list that also includes the Stable model, the Stable model often resolves to the third parameter being 2 (pure Gaussian, which we know can be unstable).
4. If I list a model type twice, will the autofit algorithm tend to create nested models with the same model (for example, will c(2,2) create two nested exponential models)?
5. Do you have a more detailed guide on how to use model.fit and what not to do when using it?

I'm attaching a montage I put together on some of my class data for you to look at...
Jeffrey Yarus
 
Posts: 48
Joined: Wed Jun 26, 2019 9:39 pm

Re: Model.auto and overfitting

Post by Jeffrey Yarus » Thu Mar 31, 2022 6:49 pm

Hoping to get a response to this soon... Merci beaucoup!
Jeffrey Yarus

Re: Model.auto and overfitting

Post by Didier Renard » Fri Apr 01, 2022 7:26 pm

I have duplicated your post in order to answer point by point.

I've been experimenting with model.auto in my class. It seems that if you select multiple models, like c(1,2,3,4,5,11) (so I am including the stable model), some results can be quite strange. I have a number of questions:
- The numeric codes are a shortcut that avoids naming the basic structures by their official names (such as Nugget Effect, Spherical, ...), where typos are easy to make. Use melem.name() to get the correspondence between each name and its number.
- Listing several basic structures in model.auto() is the only way we imagined to offer the possibility of fitting a nested variogram: for example, a nested model with Nugget Effect and Spherical is referred to as c(1,3).
- Note that, within such a nested model, the same basic structure can be mentioned more than once. For example, providing c(2,2) asks model.auto to fit a model in which two exponential basic structures are nested (they will almost certainly show up in the resulting model with different ranges).
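To make the idea of a nested model concrete, here is a minimal sketch, written in Python rather than R only so that it stays self-contained and independent of RGeostats. It is not RGeostats code: the function names and the parameterization (a scale parameter rather than a practical range) are assumptions made for illustration. A nested model is simply the sum of its basic structures, so listing the exponential twice gives two exponentials with different ranges.

```python
import math

# Hypothetical helpers for illustration only; not the RGeostats implementation.

def nugget(h, sill):
    """Nugget effect: zero at the origin, constant sill elsewhere."""
    return 0.0 if h == 0 else sill

def exponential(h, sill, scale):
    """Exponential variogram with a scale parameter (not a practical range)."""
    return sill * (1.0 - math.exp(-h / scale))

def spherical(h, sill, rng):
    """Spherical variogram, reaching its sill exactly at distance rng."""
    if h >= rng:
        return sill
    r = h / rng
    return sill * (1.5 * r - 0.5 * r ** 3)

def nested(h, structures):
    """A nested model is the sum of its basic structures."""
    return sum(func(h, *params) for func, params in structures)

# Counterpart of listing the exponential twice: short and long scale.
two_exponentials = [(exponential, (0.4, 5.0)), (exponential, (0.6, 50.0))]

# Counterpart of nugget effect plus spherical.
nugget_plus_spherical = [(nugget, (0.2,)), (spherical, (0.8, 30.0))]

for h in (0.0, 5.0, 50.0, 500.0):
    print(h, nested(h, two_exponentials), nested(h, nugget_plus_spherical))
```

The total sill of each nested model is the sum of the component sills (1.0 in both examples).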

1. First, is it really wise to string many model types together and allow the fitting algorithm to select what is best? Can this cause overfitting?

The fitting procedure is a double iterative procedure. Given the list of basic structures, the fitting of their parameters (say range and sill) is iterative. When this optimal fit has been found, an additional procedure discards the basic structures whose importance is considered too small. The criterion is the ratio of the sill of the basic structure to the total sill: if it is smaller than a given threshold, the basic structure is discarded and the fitting procedure is resumed. The threshold is an argument of model.auto. Moreover, note that the user can forbid the discarding of components. So, after this double iterative procedure, there should be no room for overfitting.
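The outer loop described above can be sketched as follows. This is only an illustration in Python under stated assumptions, not the RGeostats code: `prune_and_refit`, the `fit` callable, the default threshold of 0.05, and the choice to discard one structure at a time are all hypothetical. The inner parameter fit is abstracted away as a function that returns one sill per structure.

```python
def prune_and_refit(structures, fit, threshold=0.05):
    """Fit, drop the weakest structure if its sill share is too small, refit.

    structures: list of structure names.
    fit: hypothetical callable mapping a list of structures to fitted sills.
    threshold: assumed minimum ratio sill / total sill for a structure to stay.
    """
    structures = list(structures)
    while True:
        sills = fit(structures)
        total = sum(sills)
        # Nothing left to discard, or a degenerate fit: stop here.
        if len(structures) == 1 or total <= 0:
            return structures, sills
        ratios = [s / total for s in sills]
        weakest = min(range(len(structures)), key=lambda i: ratios[i])
        if ratios[weakest] >= threshold:
            return structures, sills
        # Discard the negligible structure and resume the fit.
        del structures[weakest]

# Toy stand-in for the inner fit: each structure always gets the same sill.
# A real fit would redistribute the sills after a structure is removed.
TOY_SILLS = {"nugget": 0.02, "exponential": 0.58, "spherical": 0.40}

def toy_fit(structures):
    return [TOY_SILLS[name] for name in structures]

kept, kept_sills = prune_and_refit(["nugget", "exponential", "spherical"], toy_fit)
print(kept, kept_sills)
```

In this toy case the nugget carries 2% of the total sill, which is below the assumed 5% threshold, so it is discarded and the two remaining structures are refitted.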

2. If I do list multiple model types together, should I include the Gaussian and the Exponential if I am already listing the Stable model?

The problem with the stable model is ... its lack of stability. As a matter of fact, the third parameter offers a nice possibility of varying gradually from exponential to Gaussian. Personally, I prefer letting model.auto() use more "traditional" basic structures. Moreover, using the stable model leads to more computing time in kriging, and even more in simulations.
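The role of the third parameter can be checked numerically. The sketch below is in Python and uses the textbook form of the stable variogram, sill * (1 - exp(-(h/scale)^alpha)); the scaling convention of the range in RGeostats may differ, so the parameter names and scaling here are assumptions.

```python
import math

def stable(h, sill, scale, alpha):
    """Stable (powered exponential) variogram, 0 < alpha <= 2."""
    return sill * (1.0 - math.exp(-((h / scale) ** alpha)))

def exponential(h, sill, scale):
    return sill * (1.0 - math.exp(-h / scale))

def gaussian(h, sill, scale):
    return sill * (1.0 - math.exp(-((h / scale) ** 2)))

# alpha = 1 reproduces the exponential, alpha = 2 the Gaussian;
# intermediate values move gradually from one shape to the other.
for h in (1.0, 5.0, 10.0, 20.0):
    print(h,
          exponential(h, 1.0, 10.0),
          stable(h, 1.0, 10.0, 1.5),
          gaussian(h, 1.0, 10.0))
```

This also shows why listing the exponential and the Gaussian next to the stable model is largely redundant: both are special cases of it.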

3. It seems that when the nugget is included in a long list that also includes the Stable model, the Stable model often resolves to the third parameter being 2 (pure Gaussian, which we know can be unstable).

This should not happen: there is no reason for it. I confirm that the Stable model with this value of the third parameter corresponds to the Gaussian basic structure. However, note that an internal procedure checks that the Gaussian structure is not provided alone (it adds a small internal nugget effect for the sake of stability). This test is not performed for the Stable model with that specific value of the parameter. So, even if your fitting goes well, you may get stuck in the next step, i.e. kriging.
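Why a pure Gaussian structure is fragile in kriging, and why a small nugget helps, can be seen on a tiny example. The Python sketch below builds the covariance matrix of three closely spaced points and compares its determinant with and without a nugget. The spacing, the scale of 10 and the nugget of 0.01 are arbitrary illustrative values, not the internal values used by RGeostats.

```python
import math

def gaussian_cov(h, sill=1.0, scale=10.0):
    """Gaussian covariance with an assumed scale parameter."""
    return sill * math.exp(-((h / scale) ** 2))

def cov_matrix(points, nugget=0.0):
    """Covariance matrix of 1-D points; the nugget is added on the diagonal."""
    return [[gaussian_cov(abs(p - q)) + (nugget if i == j else 0.0)
             for j, q in enumerate(points)]
            for i, p in enumerate(points)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

points = [0.0, 0.1, 0.2]  # three samples much closer than the scale

det_gaussian = det3(cov_matrix(points))
det_with_nugget = det3(cov_matrix(points, nugget=0.01))

print("pure Gaussian:", det_gaussian)
print("with nugget  :", det_with_nugget)
```

The determinant of the pure Gaussian matrix is almost zero, so the kriging system is close to singular; the nugget on the diagonal moves it away from singularity by several orders of magnitude.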

4. If I list a model type twice, will the autofit algorithm tend to create nested models with the same model (for example, will c(2,2) create two nested exponential models)?

I have already described this above. This feature is considered positive, as it makes sense to have a model with two basic structures of the same type. As I said, they are probably going to be quite different (different ranges). And even if this is not the case (say the fit returns a model containing the same basic structure twice, with the same range), it will not hurt: it will simply take twice as much time as necessary.

5. Do you have a more detailed guide as to how to use and what not to do when using model.fit?

A paper on the methodology is available, but it does not provide guidelines on what to do or what to avoid.


I'm attaching a montage I put together on some of my class data for you to look at...

If I read your attached document correctly, you have model.auto() perform an automatic fit of a model based on an experimental variogram calculated in (at least) 2 directions. The two directions show a reasonably strong anisotropy (the directions have different shapes). Both directions show high variability at the first lag. This leaves room for any interpretation of the behavior of the variogram at the origin. It makes sense to allow a large nugget effect. But it also makes sense to avoid this nugget effect by replacing it with a short-scale basic structure (its range being smaller than the first lag distance). Any intermediate behavior is possible. Just think of what you would do if you had to perform this fitting manually.
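The ambiguity between a nugget effect and a short-scale structure can be illustrated with a small Python sketch. All values (first lag of 10, sills, ranges) are made up for illustration and do not come from the attached data. The two models are practically identical at every experimental lag and differ only below the first lag, where the experimental variogram gives no information.

```python
import math

def nugget(h, sill):
    return 0.0 if h == 0 else sill

def exponential(h, sill, scale):
    return sill * (1.0 - math.exp(-h / scale))

def spherical(h, sill, rng):
    if h >= rng:
        return sill
    r = h / rng
    return sill * (1.5 * r - 0.5 * r ** 3)

FIRST_LAG = 10.0  # assumed distance of the first experimental lag

def model_with_nugget(h):
    return nugget(h, 0.3) + spherical(h, 0.7, 60.0)

def model_with_short_structure(h):
    # Exponential whose scale (1.0) is far smaller than the first lag.
    return exponential(h, 0.3, 1.0) + spherical(h, 0.7, 60.0)

lags = [FIRST_LAG * k for k in range(1, 7)]
max_gap_at_lags = max(abs(model_with_nugget(h) - model_with_short_structure(h))
                      for h in lags)
gap_below_first_lag = abs(model_with_nugget(0.5) - model_with_short_structure(0.5))

print("largest difference at the experimental lags:", max_gap_at_lags)
print("difference at h = 0.5:", gap_below_first_lag)
```

A fit based only on the experimental lags cannot distinguish the two models, which is why the choice has to come from the user.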

The only trick is that we tried to add enough additional parameters to let the user dictate his choices and so "help" the automatic procedure. In case of doubt, the user should resort to the cross-validation facility to select the best choice... although, if it is a matter of behavior at very small distances, I am not sure that cross-validation will make any difference.
A last point to help understand the choices of the automatic fitting procedure: this procedure takes into account:
- the distance of each lag
- the number of pairs per lag
This may explain why the fit gets closer to the shape of one direction in particular. This happens when fitting in the 3-D case, where the vertical direction has many more pairs than the horizontal directions. It may also be the case in your data set, because the field extends much farther in one direction. An additional parameter (on the weight definition) is available to help compensate for these artifacts.
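The effect of these two ingredients can be sketched with a weighted least-squares criterion. The Python example below assumes weights of the form number of pairs divided by squared distance, which is one common choice in the literature; the actual weighting used by model.auto, and the options of its weight parameter, are not reproduced here. The pair counts are invented.

```python
def lag_weights(pairs, distances):
    """Assumed weighting: more pairs and shorter distances count more."""
    return [n / (h * h) for n, h in zip(pairs, distances)]

def weighted_sse(experimental, model_values, weights):
    """Weighted sum of squared differences between variogram and model."""
    return sum(w * (g - m) ** 2
               for w, g, m in zip(weights, experimental, model_values))

distances = [10.0, 20.0, 30.0]
pairs_horizontal = [50, 80, 90]        # few pairs per lag
pairs_vertical = [5000, 8000, 9000]    # many more pairs per lag

w_horizontal = lag_weights(pairs_horizontal, distances)
w_vertical = lag_weights(pairs_vertical, distances)

share_vertical = sum(w_vertical) / (sum(w_vertical) + sum(w_horizontal))
print("share of the total weight carried by the vertical direction:",
      share_vertical)

# Same misfit of 0.1 at every lag in both directions:
misfit_h = weighted_sse([1.0, 1.0, 1.0], [0.9, 0.9, 0.9], w_horizontal)
misfit_v = weighted_sse([1.0, 1.0, 1.0], [0.9, 0.9, 0.9], w_vertical)
print(misfit_h, misfit_v)
```

With these numbers the vertical direction carries almost all of the weight, so the fit follows its shape and nearly ignores the horizontal direction unless the weights are rebalanced.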

Hope this long answer will help.
Didier Renard
 
Posts: 337
Joined: Thu Sep 20, 2012 4:22 pm

Re: Model.auto and overfitting

Post by Jeffrey Yarus » Mon Apr 04, 2022 6:10 pm

Thank you Didier! This helps a lot! I'll read through this a few more times to be sure I understand. What perplexes me, and what I still need to understand, is why, when stacking the basic structures together (e.g. 1,2,3,4,11), the resulting model (kriged solution) is very odd, that is, very different from the solution that I know. My gut feeling is to use fewer models that I believe to be reasonable, and not to stack many structures together blindly... we should chat more about this...
Jeffrey Yarus
