Fitting a trophic Species distribution model

trophicSDM is used to fit a trophic species distribution model. Requires the species distribution data Y (the sites x species matrix), explanatory variables X and a directed acyclic graph G containing species interactions (i.e., the metaweb, with links going from predators to prey). The function fits the distribution of each species as a function of their preys (with mode = "prey", by default) or predators (if set mode = "predator").

trophicSDM(
  Y,
  X,
  G,
  env.formula = NULL,
  sp.formula = NULL,
  sp.partition = NULL,
  penal = NULL,
  mode = "prey",
  method = "stan_glm",
  family,
  iter = 500,
  chains = 2,
  run.parallel = FALSE,
  verbose = FALSE
)

Arguments

Y

The sites x species matrix containing observed species distribution (e.g. presence-absence).

X

The design matrix, i.e. sites x predictor matrix containing the value of each explanatory variable (e.g. the environmental conditions) at each site.

G

The species interaction network (aka metaweb). Needs to be an igraph object. Links must go from predator to preys. It needs to be a directed acyclic graph.

env.formula

The definition of the abiotic part of the model. It can be :

a string specifying the formula (e.g. "~ X_1 + X_2"). In this case, the same environmental variables are used for every species.
A list that contains for each species the formula that describes the abiotic part of the model. In this case, different species can be modeled as a function of different environmental covariates. The names of the list must coincide with the names of the species.

sp.formula

(optional) It allows to specify a particular definition of the biotic part of the model, e.g., using composite variables (e.g., richness), or an interaction of the biotic and abitic component. More details in 'Details'.

sp.partition

(optional) a list to specify groups of species that are used to compute composite variables, e.g., a species can be modeled as a function of the richness of each group of preys. It has to be a list, each element is a vector containing the names of species in the group. More details in 'Details'.

penal

Penalisation method to shrink regression coefficients. If NULL (default), the model does not penalise the regression coefficient. For now, available penalization method are "horshoe" for method stan_glm, "elasticnet" for method glm. It is also possible to constrain the sign of biotic coefficients (prey coefficients are set to positive and predator coefficients to negative) by setting "coeff.signs" for methods glm and stan_glm.

mode

"prey" if bottom-up control (default), "predators" otherwise. Notice that G needs to be such that links point from predators to prey.

method

which SDM method to use. For now the available choices are: "glm" (frequentist) or "stan_glm" (full bayesian MCMC, default). Notice that using "glm" does not allow error propagation when predicting.

family

the family parameter of the glm function (see glm). gaussian(link = "identity") for gaussian data. binomial(link = "logit") or binomial(link = "probit") for presence-absence data.

iter

(for "stan_glm" only) Number of iterations for each MCMC chain if stan_glm is used

chains

(for "stan_glm" only) Number of MCMC chains (default to 2)

run.parallel

Whether species models are fitted in parallel (can speed computational up time). Default to FALSE.

verbose

Whether to print algorithm progresses

Value

A "trophicSDMfit" object, containing:

model: A list containing the local models (i.e. a SDM for each species). Each local model is an object of class "SDMfit". See ?SDMfit for more informations.
Y: A numeric vector of standard errors on parameters
form.all: A list describing each species formula (both biotic and abiotic terms)
data: A list containing all the data used to fit the model
model.call: A list containing the modeling choices of the fitted model (e.g. method, penalisation...)
coef: A list containing, for each species, the inferred coefficients (with credible intervals or p-values when available)
MCMC.diag: MCMC convergence metrics, only available for MCMC methods
AIC: Model's AIC
log.lik: Model's log.likelihood

Details

"sp.formula" and "sp.partition" can be combined to define any kind of composite variables for the biotic part of the formula. "sp.formula" can be :

A string defining a formula as function of "richness", e.g., "richness+I(richness)^2" (species are modeled as a function of a quadratic polynomial of their prey richness), "I(richness>0)" (species are modeled as a function of a dummy variable that is equal to 1 when at least one species is present). Importantly, when group of preys (or predators) are specified by "sp.partition", species are modeled as a function of the composite variable specified by "sp.formula" for each of their prey (or predator) groups.
A more flexible option is to specify sp.formula as a list (whose names are species' names) that contains for each species the definition of the biotic part of the model. Notice that, in this case, the function does not check that the model is a DAG. This allow to define any kind of composite variable, or to model interactions between environmental covariates and preys (or predators).

Author

Giovanni Poggiato and Jérémy Andréoletti

Examples

data(Y, X, G)
# define abiotic part of the model
env.formula = "~ X_1 + X_2"
# Run the model with bottom-up control using stan_glm as fitting method and no penalisation
# Increase the number of iterations to obtain reliable results.
m = trophicSDM(Y,X,G, env.formula, iter = 50,
               family = binomial(link = "logit"), penal = NULL, 
               mode = "prey", method = "stan_glm")
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#tail-ess
#> Warning: There were 1 chains where the estimated Bayesian Fraction of Missing Information was low. See
#> https://mc-stan.org/misc/warnings.html#bfmi-low
#> Warning: Examine the pairs() plot to diagnose sampling problems
#> Warning: The largest R-hat is 1.18, indicating chains have not mixed.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#r-hat
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#tail-ess
#> Warning: Markov chains did not converge! Do not analyze results!
#> Warning: The largest R-hat is 1.09, indicating chains have not mixed.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#r-hat
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#tail-ess
#> Warning: The largest R-hat is 1.09, indicating chains have not mixed.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#r-hat
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#tail-ess
#> Warning: The largest R-hat is 1.59, indicating chains have not mixed.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#r-hat
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#tail-ess
#> Warning: Markov chains did not converge! Do not analyze results!
#> Warning: The largest R-hat is 1.08, indicating chains have not mixed.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#r-hat
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#tail-ess
print(m)
#> ==================================================================
#> A trophicSDM fit with no penalty, fitted using stan_glm 
#>  with a bottom-up approach 
#>  
#>  Number of species : 6 
#>  Number of links : 6 
#> ==================================================================
#> * Useful fields
#>     $coef 
#> * Useful S3 methods
#>     print(), coef(), plot(), predict(), evaluateModelFit() 
#>     predictPotential(), plotG(), plotG_inferred(), computeVariableImportance() 
#> * Local models (i.e. single species SDM) can be accessed through 
#>     $model

# Access local models (e.g. species "Y5")
m$model$Y5
#> ================================================================== 
#> Local SDMfit for species Y5 with no penalty , fitted using stan_glm 
#> ================================================================== 
#> * Useful S3 methods
#>     print(), coef(), plot(), predict()
#>     $model gives the stanreg class object 
#> ================================================================== 
coef(m$model$Y5)
#>                   mean       2.5%      97.5%
#> (Intercept) -0.5964227 -1.0859447 -0.1829878
#> X_1          1.5270260  1.2337946  2.0596821
#> X_2         -2.3882902 -2.9795288 -1.9103480
#> Y1           1.9737874  1.6128601  2.3532350
#> Y2          -0.1652161 -0.4380471  0.1670369
#> Y3          -1.0254082 -1.2603072 -0.6830364
# The fitted model can be plotted with `plot(m)`

# Fit a sparse model in the Bayesian framework with the horshoe prior
# \donttest{
m = trophicSDM(Y,X,G, env.formula, 
               family = binomial(link = "logit"), penal = "horshoe", 
               mode = "prey", method = "stan_glm")
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#tail-ess
#> Warning: The largest R-hat is 1.06, indicating chains have not mixed.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#r-hat
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#tail-ess
# }
# Fit frequentist glm
m = trophicSDM(Y,X,G, env.formula, 
               family = binomial(link = "logit"), penal = NULL, 
               mode = "prey", method = "glm")
               
# With elasticnet penalty   
m = trophicSDM(Y,X,G, env.formula, 
               family = binomial(link = "logit"), penal = "elasticnet", 
               mode = "prey", method = "glm")

#### Composite variables
# See vignette 'Composite variables' for a complete introduction to the use of composite variables
# Model species as a function of a quadratic polynomial of prey richness
m = trophicSDM(Y,X,G, env.formula, 
               family = binomial(link = "logit"), penal = NULL, 
               sp.formula = "richness + I(richness^2)",
               mode = "prey", method = "glm")
#> Formula was modified since it led to identical columns of the design matrix (e.g. Y_1 or Y_1^2 for binary data)
m$form.all
#> $Y1
#> [1] "y ~ X_1 + X_2"
#> 
#> $Y2
#> [1] "y ~ X_1 + X_2"
#> 
#> $Y3
#> [1] "y ~ X_1 + X_2"
#> 
#> $Y4
#> [1] "y ~ X_1+X_2+I(Y3)"
#> 
#> $Y5
#> [1] "y ~ X_1+X_2+I(Y1 + Y2 + Y3)+I(I(Y1 + Y2 + Y3)^2)"
#> 
#> $Y6
#> [1] "y ~ X_1+X_2+I(Y3 + Y5)+I(I(Y3 + Y5)^2)"
#> 
# Notice that for predators that feed on a single prey (with presence-absence data),
# their richness and the square of their richness is exactly the same variable
# In this case, `trophicSDM()` removes the redundant variable but prints a warning message

# Model species as a function of a dummy variable saying whether they have at leaste one prey
m = trophicSDM(Y,X,G, env.formula, 
               family = binomial(link = "logit"), penal = NULL, 
               sp.formula = "I(richness>0)",
               mode = "prey", method = "glm")
m$form.all
#> $Y1
#> [1] "y ~ X_1 + X_2"
#> 
#> $Y2
#> [1] "y ~ X_1 + X_2"
#> 
#> $Y3
#> [1] "y ~ X_1 + X_2"
#> 
#> $Y4
#> [1] "y ~ X_1+X_2+I(I(Y3) > 0)"
#> 
#> $Y5
#> [1] "y ~ X_1+X_2+I(I(Y1 + Y2 + Y3) > 0)"
#> 
#> $Y6
#> [1] "y ~ X_1+X_2+I(I(Y3 + Y5) > 0)"
#> 

# Define group of preys and model species as a function of the richness (with a quadratic term)
# of these groups of preys separately

# Species Y1 and Y2 belong to the same group, species Y3 and Y4 are both alone in their group and 
# species Y5 and Y6 form another group
sp.partition = list(c("Y1","Y2"),c("Y3"),c("Y4"), c("Y5","Y6"))

m = trophicSDM(Y,X,G, env.formula, 
               family = binomial(link = "logit"), penal = NULL, 
               sp.partition = sp.partition,
               sp.formula = "richness + I(richness^2)",
               mode = "prey", method = "glm")
#> Formula was modified since it led to identical columns of the design matrix (e.g. Y_1 or Y_1^2 for binary data)
#> Formula was modified since it led to identical columns of the design matrix (e.g. Y_1 or Y_1^2 for binary data)
#> Formula was modified since it led to identical columns of the design matrix (e.g. Y_1 or Y_1^2 for binary data)
m$form.all
#> $Y1
#> [1] "y ~ X_1 + X_2"
#> 
#> $Y2
#> [1] "y ~ X_1 + X_2"
#> 
#> $Y3
#> [1] "y ~ X_1 + X_2"
#> 
#> $Y4
#> [1] "y ~ X_1+X_2+I((Y3))"
#> 
#> $Y5
#> [1] "y ~ X_1+X_2+I((Y1 + Y2))+I(I((Y1 + Y2)^2))+I((Y3))"
#> 
#> $Y6
#> [1] "y ~ X_1+X_2+I((Y3))+I((Y5))"
#>