Problem with hierachical occupancy model


#1

model1 ![model3|611x71]

Hi all,
I am new to PyMC3. I am having trouble implementing the occupancy model shown in the images above. The alpha and beta each have a prior distribution that follows a multivariate normal distribution.

with pm.Model() as _sampler:
    beta = pm.Normal('beta', mu=0, sd=1000, shape=len(HYPERS['b_mu'])) 
    alpha = pm.Normal('alpha', mu=0, sd=1000, shape=len(HYPERS['a_mu']))
    z = []
    y_ = []
    for i in range(X.shape[0]):
        x_b = pm.math.dot(X[i], beta)
        psi = pm.math.invlogit(x_b)
        z.append(pm.Bernoulli('z'+str(i), psi))
        ydata = y[i]
        yy = []
        for j in range(W[i].shape[0]):
            w_a = pm.math.dot(W[i][j], alpha)
            d = pm.math.invlogit(w_a)
            prob = z[i] * d     
            yy.append(pm.Bernoulli('yy'+str(i)+','+str(j), p=prob, observed=ydata))
        y_.append(yy)
    z = tt.stack(z)
    y_ = tt.stack(y_)
with _sampler:
    # obtain starting values via MAP
    start = pm.find_MAP(model=_sampler)
    # instantiate sampler
    step = pm.Metropolis()
    
    _trace = pm.sample(5000, step=step, start=start)
    
pm.traceplot(_trace[2500:])

The X is a N by p matrix that has the covariate information regarding the beta parameter and W is a N by K by q 3 dimensional array that contains the covariate information regarding the alpha parameter. y is stored in python as a dictionary in Python with its keys being the index (1 to N). The W array is also stored as a dictionary where each element of the keys is a 2d numpy array. Also in the code I used a different link function than in the pictures (logit instead of probit).

The problem is when I run this code this is the output I get from the console:
logp = -inf, ||grad|| = 12.207: 100%|##############################################| 3/3 [00:00<00:00, 28.29it/s]
{‘beta’: array([ -8.19231921e-01, -8.18577127e-17]), ‘alpha’: array([ 0.57346234, 0. ]), ‘z0’: array(0), ‘z1’: array(0), ‘z2’: array(0), ‘z3’: array(0), ‘z4’: array(0), ‘z5’: array(0), ‘z6’: array(0), ‘z7’: array(0), ‘z8’: array(0), ‘z9’: array(0), ‘z10’: array(0), ‘z11’: array(0), ‘z12’: array(0), ‘z13’: array(0), ‘z14’: array(0), ‘z15’: array(0), ‘z16’: array(0), ‘z17’: array(0), ‘z18’: array(0), ‘z19’: array(0)}

After that nothing happens and it hangs forever. I can implement this algorithm by directly working out the posterior full conditionals and build a class where I can iteratively sample using Gibbs, but I want to make it work using PyMC3 but it seems like I am doing something wrong. Please help!


Including a multivariate constrained improper prior in a hierachical model
#2

You should use the default sampler (i.e., doing trace = pm.sample(...)), as Metropolis sampler does not provides valid proposal in this case (it has symmetric kernel which means the sampling of zz easily goes outside of the supported 0, 1 value and thus all rejected).

In addition, you should rewrite the nested for loop - they are expensive and quite inefficient under theano.


#3

Is there a suggested way of eliminating the for loop given that I have variables like y and d which have 2 subscripts (i and j)? I tried before but it would just throw unclear errors over and over. Also do you think the rest of the code correct? It is my first time using PyMC3 so I am very unsure of what im doing even after reading the documentation.


#4

In general, you want to think in terms of matrix multiplication, which do away the loop over some dimensions. For example, use tensordot from theano to do operation along some dimensions.

Another 2 comments on your models are:

  1. sd=1000 is likely too wide even for a weakly informative prior
  2. latent labels (ie., pm.Bernoulli('z'+str(i), psi)) is usually difficult to sample from efficiently (unless the dimension is low), rewriting it to a marginalized mixture model is much more efficient.

It’s a bit difficult to say as I dont know the dimension of your inputs - If you can put your simulated data and code in a notebook, it would be much easier for me to play around and give you some advice.


#5
import numpy as np
from scipy.linalg import inv
from scipy.special import expit # inverse logit

#######################################################################
np.random.seed(265)#np.random.randint(1, 2**16))
#set the true parameter values of the parameters
true_alpha = np.array([1.35, 1.75])
true_beta = np.array([-1.85, 2.5])
len_alpha = len(true_alpha)
len_beta = len(true_beta)
#n = number of sites visited, K = number of visits per site
n = 100
K_ = 20
K = np.random.randint(2,K_, n)
#K = np.repeat(K_, n)
#=========================generate x and W matrices=========================
#generate an array with n*(r-1) elements as uniform variates on (-2, 2)
x1 = np.random.uniform(-2.0, 2.0, size = n*(len_beta-1))
x = (x1 - np.mean(x1)) / np.std(x1) #standardize the generated results
#X is 2D array with dimensions (n, r)
X = np.ones((n, len_beta), dtype=np.float64) #populate from scipy.stats import bernoullithe X array with 1's
X[:, 1:len_beta] = x.reshape(n, len_beta-1)
psi = expit(X @ true_beta) #returns an array of length n
#generate the true occupancy state data
true_z = np.random.binomial(1, p=psi,size=n) #returns array of length n

W, d, y = {}, {}, {}
#y = np.zeros((n, K_), dtype=np.float64)
for i in range(n):
    w1 = np.random.uniform(-5.0, 5.0, size = K[i]*(len_alpha-1))
    w = (w1 - np.mean(w1)) / np.std(w1) #standardize the generated results
    #W is a 3D array with dimensions (n, K, s)
    _W = np.ones((K[i], len_alpha), dtype=np.float64) #populate the W array with 1's
    #replace all the w_ik covariates with elements of w, except the 1st columns
    _W[:, 1:len_alpha] = w.reshape(K[i], len_alpha-1)
    d[i] = expit(_W @ true_alpha)
    W[i] = _W
    y[i] = np.random.binomial(1, true_z[i]*d[i])

above is the simulated data I used for the model. y is the detection-nondetection where y_ij = 1 if a species is observed in site i during the j’th visit. the indexing for i starts from 1 to n (where n is the numbers of sites). j is the number of visits per site (j on site i ranges from 1 to K, where K is the maximum number of visits). What I end up extracting from the code above is the W covariate dictionary, X matrix, and the y data dictionary.

eventually I want to use this model to run it on a code with roughly 2000 sites (n = 2000) where the visits per site can be anywhere between 1 and 50 visits. Also the covariates I use for psi_i and d_ij are 3-dimentional vectors.

Also, I get this error when I provide my own starting points on the sampler:
AttributeError: ("'int' object has no attribute 'dtype'", 'Container name "z0_shared__"')
I hope that makes more sense


#6

I would do it this way:
First reshape everything so within the pm.Model we are not dealing with dict:

index = []
W_r = []
y_r = []
for i in range(n):
    index.append(np.ones((W[i].shape[0], 1))*i)
    W_r.append(W[i])
    y_r.append(y[i][:, np.newaxis])
index = np.concatenate(index, axis=0).astype(int64)
W_r = np.concatenate(W_r, axis=0)
y_r = np.concatenate(y_r, axis=0)

Then I will use matrix multiplication instead of nested for loops:

with pm.Model() as m:
    beta = pm.Normal('beta', mu=0, sd=10, shape=(2, 1)) 
    alpha = pm.Normal('alpha', mu=0, sd=10, shape=(2, 1))
    x_b = pm.math.dot(X, beta)
    psi = pm.math.invlogit(x_b).flatten()
    
    w_a = pm.math.dot(W_r, alpha)
    d = pm.math.invlogit(w_a)
    
    # z = pm.Bernoulli('z', psi, shape=n)
    # prob = z[index] * d
    prob = psi[index] * d
    
    obs = pm.Bernoulli('y', p=prob, observed=y_r)
    trace = pm.sample(1000, tune=1000)

Note that I commented out the latent label part z = pm.Bernoulli('z', psi, shape=n), as the default value returns z being all zeros which gives error in the observed (it gives an error when you have p=0 but observed=1).

Then after inference, you can generate the latent label by sampling from a Bernoulli conditioned on the posterior:

with m:
    z = pm.Bernoulli('z', psi, shape=n)
    ppc = pm.sample_ppc(trace, vars=[z])

You can have a look at notebook here: https://github.com/junpenglao/Planet_Sakaar_Data_Science/blob/master/PyMC3QnA/discourse_1702.ipynb (which the second one use a mixture parameterization)


Frequently Asked Questions
Looping through random variables
#7

Thank you so much for your time, it works as expected.