I have been trying to model a mixture of censored normals and I have ran into some warnings and errors which I could not resolve. The warning itself does not cause any problem and the error appears when I try to sample logp of the data with respect to components (rather than the mixture). The model is below:
import pymc as pm
import numpy as np
import arviz as az
import pytensor.tensor as ptt
import pytensor as pt
means = np.array([0.0, 3.0])
sd = 1
lower = -0.5
ndata = [100, 100]
np.random.seed(0)
data = np.zeros((np.sum(ndata),))
data[:ndata[0]] = np.random.normal(means[0], sd, size=ndata[0])
data[ndata[0]:np.sum(ndata)] = np.random.normal(means[1], sd, size=ndata[1])
data[data<lower] = lower
with pm.Model() as model:
# if I use the prior below the code does not work
centers = pm.Normal("centers", means, np.array([2,2]), size=2,
transform=pm.distributions.transforms.univariate_ordered,
initval=[0.0, 3.0])
# If I use it like below it works but cant do ordering constraint.
# Using the normal above but removing transform does not work.
# centers = [pm.Normal(f"centers{i}", means[i], 2) for i in range(2)]
weights = pm.Dirichlet("weights", a=np.ones(2))
sd = 1
components =\
[pm.Censored.dist(pm.Normal.dist(centers[i], sd), lower=lower, upper=np.inf,
shape=int(np.sum(ndata))) for i in range(2)]
# if I sample the logp below then it produces and error, if I remove it,
# it works fine.
pm.Deterministic("comp0_logp",
pm.logp(components[0], data))
pm.Deterministic("comp1_logp",
pm.logp(components[1], data))
mix = pm.Mixture("like", w=weights, comp_dists=components, observed=data)
trace = pm.sample(5000, chains=6, tune=1000)
1- When I run this code I first of all get the following warning:
UserWarning: Found a random variable that was neither among the observations nor the conditioned variables: [normal_rv{0, (0, 0), floatX, False}.0, normal_rv{0, (0, 0), floatX, False}.out]
When I dig into this and identify what variable this is, it turns out to be centers. I do not understand however what is going on in here.
2- If I let the code run including the pm.logp parts I get the following error:
ValueError: Multiple update expressions found for the variable RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F06B2263120>)
Again digging into this reveals that this variable is the centers variable.
3- Motivated by this, if I change the centers expression to what is below it (commented out), warnings remain but errors disappear.
When I do pytensor dprint on mix this is what I get:
MarginalMixtureRV{inline=True}.1 [id A] 'like'
|RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F06B1E61580>) [id B]
|dirichlet_rv{1, (1,), floatX, False}.1 [id C] 'weights'
| |RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F06B203CAC0>) [id D]
| |TensorConstant{[]} [id E]
| |TensorConstant{11} [id F]
| |TensorConstant{(2,) of 1.0} [id G]
|UnmeasurableCensoredRV{inline=True} [id H]
| |normal_rv{0, (0, 0), floatX, False}.1 [id I]
| | |RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F06B1E613C0>) [id J]
| | |TensorConstant{(1,) of 200} [id K]
| | |TensorConstant{11} [id L]
| | |Subtensor{int64} [id M]
| | | |normal_rv{0, (0, 0), floatX, False}.1 [id N] 'centers'
| | | | |RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F06B2263120>) [id O]
| | | | |TensorConstant{(1,) of 2} [id P]
| | | | |TensorConstant{11} [id Q]
| | | | |TensorConstant{[0. 3.]} [id R]
| | | | |TensorConstant{(2,) of 2.0} [id S]
| | | |ScalarConstant{0} [id T]
| | |TensorConstant{1.0} [id U]
| |TensorConstant{-0.5} [id V]
| |TensorConstant{inf} [id W]
|UnmeasurableCensoredRV{inline=True} [id X]
|normal_rv{0, (0, 0), floatX, False}.1 [id Y]
| |RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F06B1EE0660>) [id Z]
| |TensorConstant{(1,) of 200} [id BA]
| |TensorConstant{11} [id BB]
| |Subtensor{int64} [id BC]
| | |normal_rv{0, (0, 0), floatX, False}.1 [id N] 'centers'
| | |ScalarConstant{1} [id BD]
| |TensorConstant{1.0} [id BE]
|TensorConstant{-0.5} [id BF]
|TensorConstant{inf} [id BG]
I don’t understand every part of this tree but I find it strange that at the two places where the centers appear, they don’t appear symmetrically. This does not change if I remove the ordering transformation. They do appear symmetrically (as center0 and center1) if I define the centers as how I do it in the commented out section.
I want to sample the logp values because I want to use them to actually sample the probability of each data belonging to a specific cluster.
Thanks