Hello PyMC community!
I’m trying to build a site-occupancy model, a common class of model in ecology, in PyMC. The goal of this model is to estimate the true occurrence probability of a species (i.e., prevalence), admitting the fact that we don’t always detect the species during surveys. The typical way of representing this model is through the true latent occurrence state z_j for site j=1,2,\dots,J, with z_j \sim \text{Bernoulli}(\psi_j). Then, the detection of the species y_{j,k} during survey k=1,2,\dots,K depends on the latent state z_j, with y_{j,k} \sim \text{Bernoulli}(p_{j,k} z_j), where p_{j,k} is the probability of detecting the species given that it’s present. Formulating the model in this way permits the use covariates that affect the detection and occurrence probabilities.
I would like to write this model in PyMC without the use of the latent occurrence state (see bottom of this post for a working example that uses z_j). My first thought was to leverage ZeroInflatedBinomial
, but setting \psi=1 when we’ve previously detected the species at the site, e.g.,
psi_eff = psi[site_idx] ** (1 - detected[site_idx])
obs = pm.ZeroInflatedBinomial('y', p=p, psi=psi_eff, n=1, observed=y_flat)
This, however, did not sample well. And sadly that’s as far as I’ve gotten. Do you, gracious PyMC user, have any ideas? If it helps, here is one link and another link to folks doing this in JAGS and Stan.
import numpy as np
import matplotlib.pyplot as plt
import pymc as pm
import arviz as az
import pytensor.tensor as pt
# ecologists like to scale covariates this way
def scale(x):
return (x - x.mean()) / x.std()
def invlogit(x):
return 1 / (1 + np.exp(-x))
## simulation
SEED = None
rng = np.random.default_rng(seed=SEED)
# sampling characteristics
site_count = 100
survey_count = 5
# index for the site
site = np.arange(site_count)
site_idx = np.repeat(site, survey_count)
## ecological model
# true parameter values
beta0_true = 0.5
beta1_true = 1.25
# covariates
x = scale(rng.uniform(size=site_count))
# linear model
mu_true = beta0_true + beta1_true * x
psi_true = invlogit(mu_true)
# simulate occurrence state
z_true = rng.binomial(1, psi_true)
## detection model
# true parameter values
alpha0_true = 1.5
# covariates
pass
# linear model
nu_true = alpha0_true
p_true = invlogit(nu_true)
# simulate detection
y_flat = rng.binomial(1, p_true, survey_count * site_count) * z_true[site_idx]
# reformat data
y = y_flat.reshape(site_count, survey_count)
y_sum = y.sum(axis=1)
detected = y_sum > 0
with pm.Model() as occupancy:
# occurrence process
# priors
beta0 = pm.Normal("beta0", mu=0, sigma=2)
beta1 = pm.Normal("beta1", mu=0, sigma=2)
# linear model
mu = beta0 + beta1 * x[site_idx]
psi = pm.Deterministic("psi", pm.math.invlogit(mu))
# latent occurrence state
z = pm.Bernoulli('z', p=psi)
# detection process
# priors
alpha0 = pm.Normal('alpha0', mu=0, sigma=2)
# linear model
nu = alpha0
p = pm.Deterministic('p', pm.math.invlogit(nu))
# # likelihood
p_eff = z[site_idx] * p
obs = pm.Bernoulli('y', p=p_eff, observed=y_flat)
pm.model_to_graphviz(occupancy)