Hi!
I am trying to model a dataset where each object has a variable number of features.
In particular, I am trying to build on a finite approximation to a beta process-Bernoulli process model, in which each object has a variable-length list of values drawn from a normal distribution and the rest of the list is padded with dummy values.
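To make the data layout concrete, here is a minimal plain-Python sketch of how a single padded row looks. The helper name `pad_row` and the constant `DUMMY` are only for illustration and do not come from any library; the values K = 15 and -999 match the ones I use in the model.

```python
import random

K = 15          # fixed row length (maximum number of features per object)
DUMMY = -999.0  # padding value marking an unused slot


def pad_row(values, length=K, fill=DUMMY):
    """Return `values` followed by `fill` so the row has exactly `length` entries."""
    if len(values) > length:
        raise ValueError("more values than available slots")
    return list(values) + [fill] * (length - len(values))


rng = random.Random(0)
# An object with 4 real features: 4 normal draws followed by 11 dummy entries.
row = pad_row([rng.gauss(0, 1) for _ in range(4)])
print(len(row), row[4:])
```

Every row therefore has the same length K, and only the leading entries carry real values.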
My example model is as follows:
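import pymc3 as pm
import numpy as np
import scipy.stats.distributions as dist
import theano.tensor as tt

D = 100  # number of objects
A = 4    # expected number of features per object
K = 15   # truncation level (maximum number of features)

# Simulate data: each row holds p normal draws, padded with -999 up to length K.
poissons = dist.poisson.rvs(A, size=D)
normals = []
for p in poissons:
    normals.append(
        np.concatenate([dist.norm.rvs(0, 1, size=p), -999 * np.ones(K - p)]))
normals = np.asarray(normals)

with pm.Model() as model:
    pis = pm.Beta('pis', A/K, 1, shape=K)
    bs = pm.Bernoulli('bs', pis, shape=K)
    # Mean for real values.
    ns = pm.Normal('ns', 0, 1, shape=K)
    # Mean for dummy values, tightly concentrated at -999.
    errs = pm.Normal('errs', -999, 1e-10, shape=K)
    # Pick one of the two means per slot, depending on bs.
    vs = pm.Normal('vs', tt.switch(tt.eq(bs, 0), ns, errs), 1, observed=normals)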
Note that the model tries to describe the stored values with two different distributions: one for the real values, ‘ns’, and one for the dummy values, ‘errs’. It seems to work fine if I simply put the switch inside the mean of a single Normal, but in the future I would like to use different types of distributions for ‘ns’ and ‘errs’.
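To spell out what the expression tt.switch(tt.eq(bs, 0), ns, errs) selects, here is a plain-Python sketch of the same elementwise rule. The function `switch_mean` is a hypothetical stand-in written for this post, not part of Theano or PyMC3, and the numbers are made up.

```python
def switch_mean(bs, ns, errs):
    """Elementwise select mirroring tt.switch(tt.eq(bs, 0), ns, errs):
    take the entry from `ns` where bs == 0, otherwise the entry from `errs`."""
    return [n if b == 0 else e for b, n, e in zip(bs, ns, errs)]


# Made-up values for three slots.
bs = [0, 1, 0]
ns = [0.3, -1.2, 0.8]
errs = [-999.0, -999.0, -999.0]

mean = switch_mean(bs, ns, errs)
print(mean)  # [0.3, -999.0, 0.8]
```

So, as written, a slot with bs equal to 0 uses ‘ns’ and a slot with bs equal to 1 uses ‘errs’. In the model the resulting length-K mean is broadcast against the D x K observed array, so all rows share the same per-slot choice.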
Currently, when sampling:
with model:
    trace = pm.sample(10)
I get the following error:
ValueError: Mass matrix contains zeros on the diagonal. Some derivatives might always be zero.
Are there any suggestions on how to avoid this error, or on how to change the model so that it works for this use case?
Thank you in advance!