Problem with hierarchical occupancy model

[Images: hierarchical occupancy model specification (occupancy and detection equations, with multivariate normal priors on alpha and beta and a probit link)]

Hi all,
I am new to PyMC3. I am having trouble implementing the occupancy model shown in the images above. The alpha and beta parameters each have a multivariate normal prior.

import pymc3 as pm
import theano.tensor as tt

with pm.Model() as _sampler:
    # HYPERS is assumed to hold the prior hyperparameter vectors b_mu and a_mu
    beta = pm.Normal('beta', mu=0, sd=1000, shape=len(HYPERS['b_mu']))
    alpha = pm.Normal('alpha', mu=0, sd=1000, shape=len(HYPERS['a_mu']))
    z = []
    y_ = []
    for i in range(X.shape[0]):
        #occupancy probability for site i
        x_b = pm.math.dot(X[i], beta)
        psi = pm.math.invlogit(x_b)
        z.append(pm.Bernoulli('z'+str(i), psi))
        ydata = y[i]
        yy = []
        for j in range(W[i].shape[0]):
            #detection probability for visit j at site i
            w_a = pm.math.dot(W[i][j], alpha)
            d = pm.math.invlogit(w_a)
            prob = z[i] * d
            yy.append(pm.Bernoulli('yy'+str(i)+','+str(j), p=prob, observed=ydata[j]))
        y_.append(yy)
    z = tt.stack(z)
    y_ = tt.stack(y_)
with _sampler:
    # obtain starting values via MAP
    start = pm.find_MAP(model=_sampler)
    # instantiate sampler
    step = pm.Metropolis()
    
    _trace = pm.sample(5000, step=step, start=start)
    
pm.traceplot(_trace[2500:])

X is an N by p matrix holding the covariate information for the beta parameter, and W is an N by K by q three-dimensional array holding the covariate information for the alpha parameter. y is stored as a Python dictionary whose keys are the site indices (1 to N). W is likewise stored as a dictionary, where each value is a 2-D numpy array. Also, in the code I used a different link function than in the pictures (logit instead of probit).

The problem is when I run this code this is the output I get from the console:
logp = -inf, ||grad|| = 12.207: 100%|##############################################| 3/3 [00:00<00:00, 28.29it/s]
{'beta': array([ -8.19231921e-01, -8.18577127e-17]), 'alpha': array([ 0.57346234, 0. ]), 'z0': array(0), 'z1': array(0), 'z2': array(0), 'z3': array(0), 'z4': array(0), 'z5': array(0), 'z6': array(0), 'z7': array(0), 'z8': array(0), 'z9': array(0), 'z10': array(0), 'z11': array(0), 'z12': array(0), 'z13': array(0), 'z14': array(0), 'z15': array(0), 'z16': array(0), 'z17': array(0), 'z18': array(0), 'z19': array(0)}

After that nothing happens and it hangs forever. I could implement this algorithm by working out the full conditional posteriors directly and building a class that samples iteratively with Gibbs, but I want to make it work using PyMC3, and it seems like I am doing something wrong. Please help!

You should use the default sampler (i.e., trace = pm.sample(...)), as the Metropolis sampler does not provide valid proposals in this case: it has a symmetric kernel, which means the proposed values for z easily fall outside the supported values {0, 1} and are thus all rejected.
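For example, a minimal change to the sampling block above (pm.sample will then assign NUTS to the continuous alpha/beta and BinaryGibbsMetropolis to the Bernoulli latents automatically):

with _sampler:
    #let PyMC3 pick the step methods automatically
    _trace = pm.sample(5000)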

In addition, you should rewrite the nested for loops - they are expensive and quite inefficient under theano.

Is there a suggested way of eliminating the for loops, given that I have variables like y and d which carry two subscripts (i and j)? I tried before, but it just threw unclear errors over and over. Also, do you think the rest of the code is correct? This is my first time using PyMC3, so I am very unsure of what I'm doing even after reading the documentation.

In general, you want to think in terms of matrix multiplication, which does away with the loops over some of the dimensions. For example, use tensordot from theano to perform operations along specific dimensions.
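For instance, if the visits were padded out to a common K so that W could be a single (n, K, q) array (an assumption for illustration; your W is a ragged dict), tensordot collapses the covariate dimension in one call:

import numpy as np
import theano.tensor as tt

n, K, q = 100, 20, 2           #hypothetical padded dimensions
W3 = np.random.randn(n, K, q)  #stand-in for a padded W array
alpha = tt.vector('alpha')
#contract the last axis of W3 against alpha; the result has shape (n, K)
w_a = tt.tensordot(W3, alpha, axes=[[2], [0]])
d = tt.nnet.sigmoid(w_a)       #elementwise inverse-logit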

Two more comments on your model:

  1. sd=1000 is likely too wide, even for a weakly informative prior.
  2. Latent labels (i.e., pm.Bernoulli('z'+str(i), psi)) are usually difficult to sample efficiently (unless the dimension is low); rewriting the model as a marginalized mixture is much more efficient (see the sketch after this list).
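Concretely, the binary z_i can be summed out of a single observation's likelihood analytically, which is the marginalization used later in this thread:

P(y_ij = 1) = P(z_i = 0) * 0 + P(z_i = 1) * d_ij = psi_i * d_ij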

It's a bit difficult to say more, as I don't know the dimensions of your inputs. If you can put your simulated data and code in a notebook, it would be much easier for me to play around and give you some advice.

import numpy as np
from scipy.linalg import inv
from scipy.special import expit # inverse logit

#######################################################################
np.random.seed(265) #fixed seed for reproducibility
#set the true parameter values
true_alpha = np.array([1.35, 1.75])
true_beta = np.array([-1.85, 2.5])
len_alpha = len(true_alpha)
len_beta = len(true_beta)
#n = number of sites visited, K = number of visits per site
n = 100
K_ = 20
K = np.random.randint(2,K_, n)
#K = np.repeat(K_, n)
#=========================generate x and W matrices=========================
#generate an array with n*(len_beta-1) elements as uniform variates on (-2, 2)
x1 = np.random.uniform(-2.0, 2.0, size = n*(len_beta-1))
x = (x1 - np.mean(x1)) / np.std(x1) #standardize the generated results
#X is 2D array with dimensions (n, r)
X = np.ones((n, len_beta), dtype=np.float64) #populate the X array with 1's
X[:, 1:len_beta] = x.reshape(n, len_beta-1)
psi = expit(X @ true_beta) #returns an array of length n
#generate the true occupancy state data
true_z = np.random.binomial(1, p=psi,size=n) #returns array of length n

W, d, y = {}, {}, {}
#y = np.zeros((n, K_), dtype=np.float64)
for i in range(n):
    w1 = np.random.uniform(-5.0, 5.0, size = K[i]*(len_alpha-1))
    w = (w1 - np.mean(w1)) / np.std(w1) #standardize the generated results
    #W[i] is a 2D array with dimensions (K[i], len_alpha)
    _W = np.ones((K[i], len_alpha), dtype=np.float64) #populate the W array with 1's
    #replace all the w_ik covariates with elements of w, except the 1st (intercept) column
    _W[:, 1:len_alpha] = w.reshape(K[i], len_alpha-1)
    d[i] = expit(_W @ true_alpha)
    W[i] = _W
    y[i] = np.random.binomial(1, true_z[i]*d[i])

Above is the simulated data I used for the model. y is the detection/non-detection data, where y_ij = 1 if the species was observed at site i during the j'th visit. The index i runs from 1 to n (the number of sites); j indexes the visits to site i and ranges from 1 to K_i (the number of visits to that site). What I end up extracting from the code above is the W covariate dictionary, the X matrix, and the y data dictionary.
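To make the layout concrete, here is a quick check of the simulated containers (shapes follow from the generator above):

print(X.shape)         # (100, 2): one row per site (intercept + covariate)
print(len(W), len(y))  # 100 100: one dictionary entry per site
print(W[0].shape)      # (K[0], 2): one row per visit to site 0
print(y[0].shape)      # (K[0],): one detection record per visit to site 0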

Eventually I want to run this model on data with roughly 2000 sites (n = 2000), where the number of visits per site can be anywhere between 1 and 50. Also, the covariates I use for psi_i and d_ij are 3-dimensional vectors.

Also, I get this error when I provide my own starting points to the sampler:

AttributeError: ("'int' object has no attribute 'dtype'", 'Container name "z0_shared__"')

I hope that makes more sense.

I would do it this way:
First, reshape everything so that within the pm.Model we are not dealing with dicts:

index = []
W_r = []
y_r = []
for i in range(n):
    #record which site each visit row belongs to
    index.append(np.ones((W[i].shape[0], 1))*i)
    W_r.append(W[i])
    y_r.append(y[i][:, np.newaxis])
index = np.concatenate(index, axis=0).astype(np.int64)
W_r = np.concatenate(W_r, axis=0)
y_r = np.concatenate(y_r, axis=0)
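As a quick sanity check of the reshaped arrays (shapes under the simulation above, where K.sum() is the total number of visits across all sites):

total = K.sum()
print(index.shape)  # (total, 1): site index for every visit
print(W_r.shape)    # (total, 2): detection covariates, one row per visit
print(y_r.shape)    # (total, 1): detection record for every visit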

Then I will use matrix multiplication instead of nested for loops:

with pm.Model() as m:
    beta = pm.Normal('beta', mu=0, sd=10, shape=(2, 1))
    alpha = pm.Normal('alpha', mu=0, sd=10, shape=(2, 1))
    #occupancy probability for each site
    x_b = pm.math.dot(X, beta)
    psi = pm.math.invlogit(x_b).flatten()

    #detection probability for each visit
    w_a = pm.math.dot(W_r, alpha)
    d = pm.math.invlogit(w_a)

    # z = pm.Bernoulli('z', psi, shape=n)
    # prob = z[index] * d
    #marginalized over z: P(y_ij = 1) = psi_i * d_ij
    prob = psi[index] * d

    obs = pm.Bernoulli('y', p=prob, observed=y_r)
    trace = pm.sample(1000, tune=1000)

Note that I commented out the latent-label part z = pm.Bernoulli('z', psi, shape=n), as the default test value sets z to all zeros, which gives an error in the observed likelihood (you get an error when p=0 but observed=1).
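A minimal illustration of that failure mode, evaluating a PyMC3 Bernoulli log-probability directly:

import pymc3 as pm

#p=0 assigns zero probability to observing a 1, so the log-probability is -inf
print(pm.Bernoulli.dist(p=0.0).logp(1).eval())  # -inf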

Then, after inference, you can generate the latent labels by sampling from a Bernoulli conditioned on the posterior:

with m:
    z = pm.Bernoulli('z', psi, shape=n)
    ppc = pm.sample_ppc(trace, vars=[z])

You can have a look at the notebook here: https://github.com/junpenglao/Planet_Sakaar_Data_Science/blob/master/PyMC3QnA/discourse_1702.ipynb (the second model there uses a mixture parameterization).


Thank you so much for your time; it works as expected.
