I am trying to model a dataset where each object has a variable number of features.
In particular, I am trying to build on a finite approximation of a beta process / Bernoulli process type of model, where each object has a variable-length list of values drawn from a normal distribution, and the rest of the list is filled with dummy values.
My example model is as follows:
```python
import pymc3 as pm
import numpy as np
import scipy.stats.distributions as dist
import theano.tensor as tt

D = 100
A = 4
K = 15

poissons = dist.poisson.rvs(A, size=D)
normals = []
for p in poissons:
    normals.append(
        np.concatenate([dist.norm.rvs(0, 1, size=p), -999 * np.ones(K - p)]))
normals = np.asarray(normals)

with pm.Model() as model:
    pis = pm.Beta('pis', A / K, 1, shape=K)
    bs = pm.Bernoulli('bs', pis, shape=K)
    ns = pm.Normal('ns', 0, 1, shape=K)
    errs = pm.Normal('errs', -999, 1e-10, shape=K)
    vs = pm.Normal('vs', tt.switch(tt.eq(bs, 0), ns, errs), 1,
                   observed=normals)
```
Note that it models the observed values with two different distributions: one for the normal values (`ns`) and one for the dummy values (`errs`). The model seems to work fine if I simply put the switch inside the mean of a single Normal, but in the future I would like to use different types of distributions for `ns` and `errs`.
Currently, when sampling:
```python
with model:
    trace = pm.sample(10)
```
I get the following error:
```
ValueError: Mass matrix contains zeros on the diagonal. Some derivatives might always be zero
```
Does anyone have suggestions on how to avoid this error, or on how to change the model so it works for this particular use case?
Thank you in advance!