Strange warning and error when using Censored with Mixture

I have been trying to model a mixture of censored normals and I have run into some warnings and errors that I could not resolve. The warning itself does not cause any problems, and the error appears when I try to sample the logp of the data under the individual components (rather than under the mixture). The model is below:

import pymc as pm
import numpy as np
import arviz as az
import pytensor.tensor as ptt
import pytensor as pt

means = np.array([0.0, 3.0])
sd = 1
lower = -0.5
ndata = [100, 100]

np.random.seed(0)

data = np.zeros((np.sum(ndata),))
data[:ndata[0]] = np.random.normal(means[0], sd, size=ndata[0])
data[ndata[0]:np.sum(ndata)] = np.random.normal(means[1], sd, size=ndata[1])
data[data<lower] = lower


with pm.Model() as model:

  # If I use the prior below, the code does not work.
  centers = pm.Normal("centers", means, np.array([2, 2]), size=2,
                      transform=pm.distributions.transforms.univariate_ordered,
                      initval=[0.0, 3.0])

  # If I define the centers as below instead, it works, but then I can't apply
  # the ordering constraint. Using the Normal above without the transform
  # still does not work.
  # centers = [pm.Normal(f"centers{i}", means[i], 2) for i in range(2)]

  weights = pm.Dirichlet("weights", a=np.ones(2))
  sd = 1

  components = [
      pm.Censored.dist(pm.Normal.dist(centers[i], sd), lower=lower, upper=np.inf,
                       shape=int(np.sum(ndata)))
      for i in range(2)
  ]

  # If I sample the logps below, it produces an error; if I remove them,
  # it works fine.
  pm.Deterministic("comp0_logp",
                   pm.logp(components[0], data))

  pm.Deterministic("comp1_logp",
                   pm.logp(components[1], data))

  mix = pm.Mixture("like", w=weights, comp_dists=components, observed=data)

  trace = pm.sample(5000, chains=6, tune=1000)

1- When I run this code, the first thing I get is the following warning:

UserWarning: Found a random variable that was neither among the observations nor the conditioned variables: [normal_rv{0, (0, 0), floatX, False}.0, normal_rv{0, (0, 0), floatX, False}.out]

When I dig into this and identify which variable it is, it turns out to be centers. However, I do not understand what is going on here.

2- If I let the code run with the pm.logp parts included, I get the following error:

ValueError: Multiple update expressions found for the variable RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F06B2263120>)

Again, digging into this reveals that the variable in question is centers.

3- Motivated by this, if I change the centers expression to the commented-out version below it, the warning remains but the error disappears.

When I run pytensor.dprint(mix), this is what I get:

MarginalMixtureRV{inline=True}.1 [id A] 'like'
 |RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F06B1E61580>) [id B]
 |dirichlet_rv{1, (1,), floatX, False}.1 [id C] 'weights'
 | |RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F06B203CAC0>) [id D]
 | |TensorConstant{[]} [id E]
 | |TensorConstant{11} [id F]
 | |TensorConstant{(2,) of 1.0} [id G]
 |UnmeasurableCensoredRV{inline=True} [id H]
 | |normal_rv{0, (0, 0), floatX, False}.1 [id I]
 | | |RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F06B1E613C0>) [id J]
 | | |TensorConstant{(1,) of 200} [id K]
 | | |TensorConstant{11} [id L]
 | | |Subtensor{int64} [id M]
 | | | |normal_rv{0, (0, 0), floatX, False}.1 [id N] 'centers'
 | | | | |RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F06B2263120>) [id O]
 | | | | |TensorConstant{(1,) of 2} [id P]
 | | | | |TensorConstant{11} [id Q]
 | | | | |TensorConstant{[0. 3.]} [id R]
 | | | | |TensorConstant{(2,) of 2.0} [id S]
 | | | |ScalarConstant{0} [id T]
 | | |TensorConstant{1.0} [id U]
 | |TensorConstant{-0.5} [id V]
 | |TensorConstant{inf} [id W]
 |UnmeasurableCensoredRV{inline=True} [id X]
   |normal_rv{0, (0, 0), floatX, False}.1 [id Y]
   | |RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F06B1EE0660>) [id Z]
   | |TensorConstant{(1,) of 200} [id BA]
   | |TensorConstant{11} [id BB]
   | |Subtensor{int64} [id BC]
   | | |normal_rv{0, (0, 0), floatX, False}.1 [id N] 'centers'
   | | |ScalarConstant{1} [id BD]
   | |TensorConstant{1.0} [id BE]
   |TensorConstant{-0.5} [id BF]
   |TensorConstant{inf} [id BG]

I don’t understand every part of this tree, but I find it strange that the two places where centers appears are not symmetric. This does not change if I remove the ordering transformation. They do appear symmetrically (as centers0 and centers1) if I define the centers as in the commented-out section.

I want to sample the logp values because I want to use them to compute the probability of each data point belonging to a specific cluster.
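
For concreteness, this is roughly how I plan to post-process these logps into per-point membership probabilities once sampling works (a numpy sketch; it assumes trace is the InferenceData returned by pm.sample above):

logw = np.log(trace.posterior["weights"].values)   # (chain, draw, 2)
logp0 = trace.posterior["comp0_logp"].values       # (chain, draw, 200)
logp1 = trace.posterior["comp1_logp"].values

# log of the unnormalized membership probability: log w_k + logp under component k
log_unnorm = np.stack([logw[..., 0:1] + logp0,
                       logw[..., 1:2] + logp1], axis=-1)
# normalize per data point with log-sum-exp for numerical stability
log_total = np.logaddexp(log_unnorm[..., 0], log_unnorm[..., 1])
membership = np.exp(log_unnorm - log_total[..., None])   # (chain, draw, 200, 2)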

Thanks


This is an area where we have made improvements recently. Unfortunately, you are using a (relatively) old version of PyMC.

If you try running your code on the latest version of PyMC, you will get a more informative warning:

UserWarning: RandomVariables {centers} were found in the derived graph. These variables are a clone and do not match the original ones on identity.
If you are deriving a quantity that depends on model RVs, use `model.replace_rvs_by_values` first. For example: `logp(model.replace_rvs_by_values([rv])[0], value)`
  pm.logp(components[0], data))

In your case, you can write the Deterministics like this:

safe_components = model.replace_rvs_by_values(components)
pm.Deterministic("comp0_logp", pm.logp(safe_components[0], data))
pm.Deterministic("comp1_logp", pm.logp(safe_components[1], data))

The reason you need this extra step is a bit too technical to go into here, but it tends to come up when mixing logp and random graphs (when you write a PyMC model, you usually write only random graphs).
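
As a side note, since the end goal is per-point cluster probabilities, you could also compute them directly inside the model from the same safe components. A minimal sketch (using your ptt alias for pytensor.tensor; softmax comes from pytensor.tensor.special, and the name responsibilities is just illustrative):

from pytensor.tensor.special import softmax

comp_logps = ptt.stack([pm.logp(safe_components[0], data),
                        pm.logp(safe_components[1], data)], axis=-1)  # (200, 2)
# softmax of log(w_k) + comp_logp_k over k, normalized per data point
pm.Deterministic("responsibilities",
                 softmax(ptt.log(weights) + comp_logps, axis=-1))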


Ah yes, indeed the warnings disappear with the newer version, and the error disappears when I sample the logp the way you suggested. I should have checked the version first! Thanks.