Strange warning and error when using Censored with Mixture

I have been trying to model a mixture of censored normals and I have run into some warnings and errors that I could not resolve. The warning itself does not cause any problems, and the error appears when I try to sample the logp of the data under the individual components (rather than under the mixture). The model is below:

import pymc as pm
import numpy as np
import arviz as az
import pytensor.tensor as ptt
import pytensor as pt

means = np.array([0.0, 3.0])
sd = 1
lower = -0.5
ndata = [100, 100]

np.random.seed(0)

data = np.zeros((np.sum(ndata),))
data[:ndata[0]] = np.random.normal(means[0], sd, size=ndata[0])
data[ndata[0]:np.sum(ndata)] = np.random.normal(means[1], sd, size=ndata[1])
data[data<lower] = lower


with pm.Model() as model:

  # If I use the prior below, the code does not work.
  centers = pm.Normal("centers", means, np.array([2, 2]), size=2,
                      transform=pm.distributions.transforms.univariate_ordered,
                      initval=[0.0, 3.0])

  # If I define the centers as below instead, it works, but then I can't apply
  # the ordering constraint. Using the Normal above without the transform
  # still does not work.
  # centers = [pm.Normal(f"centers{i}", means[i], 2) for i in range(2)]

  weights = pm.Dirichlet("weights", a=np.ones(2))
  sd = 1

  components = [
      pm.Censored.dist(pm.Normal.dist(centers[i], sd), lower=lower, upper=np.inf,
                       shape=int(np.sum(ndata)))
      for i in range(2)
  ]

  # If I sample the logps below, it produces an error; if I remove them,
  # it works fine.
  pm.Deterministic("comp0_logp",
                   pm.logp(components[0], data))

  pm.Deterministic("comp1_logp",
                   pm.logp(components[1], data))

  mix = pm.Mixture("like", w=weights, comp_dists=components, observed=data)

  trace = pm.sample(5000, chains=6, tune=1000)

1- When I run this code, the first thing I get is the following warning:

UserWarning: Found a random variable that was neither among the observations nor the conditioned variables: [normal_rv{0, (0, 0), floatX, False}.0, normal_rv{0, (0, 0), floatX, False}.out]

When I dig into this and identify which variable it is, it turns out to be centers. However, I do not understand what is going on here.

2- If I let the code run with the pm.logp parts included, I get the following error:

ValueError: Multiple update expressions found for the variable RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F06B2263120>)

Again, digging into this reveals that the variable in question is centers.

3- Motivated by this, if I change the centers expression to the commented-out version below it, the warning remains but the error disappears.

When I run pytensor.dprint(mix), this is what I get:

MarginalMixtureRV{inline=True}.1 [id A] 'like'
 |RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F06B1E61580>) [id B]
 |dirichlet_rv{1, (1,), floatX, False}.1 [id C] 'weights'
 | |RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F06B203CAC0>) [id D]
 | |TensorConstant{[]} [id E]
 | |TensorConstant{11} [id F]
 | |TensorConstant{(2,) of 1.0} [id G]
 |UnmeasurableCensoredRV{inline=True} [id H]
 | |normal_rv{0, (0, 0), floatX, False}.1 [id I]
 | | |RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F06B1E613C0>) [id J]
 | | |TensorConstant{(1,) of 200} [id K]
 | | |TensorConstant{11} [id L]
 | | |Subtensor{int64} [id M]
 | | | |normal_rv{0, (0, 0), floatX, False}.1 [id N] 'centers'
 | | | | |RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F06B2263120>) [id O]
 | | | | |TensorConstant{(1,) of 2} [id P]
 | | | | |TensorConstant{11} [id Q]
 | | | | |TensorConstant{[0. 3.]} [id R]
 | | | | |TensorConstant{(2,) of 2.0} [id S]
 | | | |ScalarConstant{0} [id T]
 | | |TensorConstant{1.0} [id U]
 | |TensorConstant{-0.5} [id V]
 | |TensorConstant{inf} [id W]
 |UnmeasurableCensoredRV{inline=True} [id X]
   |normal_rv{0, (0, 0), floatX, False}.1 [id Y]
   | |RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F06B1EE0660>) [id Z]
   | |TensorConstant{(1,) of 200} [id BA]
   | |TensorConstant{11} [id BB]
   | |Subtensor{int64} [id BC]
   | | |normal_rv{0, (0, 0), floatX, False}.1 [id N] 'centers'
   | | |ScalarConstant{1} [id BD]
   | |TensorConstant{1.0} [id BE]
   |TensorConstant{-0.5} [id BF]
   |TensorConstant{inf} [id BG]

I don’t understand every part of this tree, but I find it strange that the two places where centers appears are not symmetric. This does not change if I remove the ordering transformation. They do appear symmetrically (as centers0 and centers1) if I define the centers as in the commented-out section.

I want to sample the logp values because I want to use them to compute the probability of each data point belonging to a specific cluster.
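
For concreteness, this is roughly how I plan to post-process these logps into per-point membership probabilities once sampling works (a numpy sketch; it assumes trace is the InferenceData returned by pm.sample above):

logw = np.log(trace.posterior["weights"].values)   # (chain, draw, 2)
logp0 = trace.posterior["comp0_logp"].values       # (chain, draw, 200)
logp1 = trace.posterior["comp1_logp"].values

# log of the unnormalized membership probability: log w_k + logp under component k
log_unnorm = np.stack([logw[..., 0:1] + logp0,
                       logw[..., 1:2] + logp1], axis=-1)
# normalize per data point with log-sum-exp for numerical stability
log_total = np.logaddexp(log_unnorm[..., 0], log_unnorm[..., 1])
membership = np.exp(log_unnorm - log_total[..., None])   # (chain, draw, 200, 2)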

Thanks


This is an area where we have made improvements recently. Unfortunately, you are using a (relatively) old version of PyMC.

If you try running your code on the latest version of PyMC, you will get a more informative warning:

UserWarning: RandomVariables {centers} were found in the derived graph. These variables are a clone and do not match the original ones on identity.
If you are deriving a quantity that depends on model RVs, use `model.replace_rvs_by_values` first. For example: `logp(model.replace_rvs_by_values([rv])[0], value)`
  pm.logp(components[0], data))

In your case, you can write the Deterministics like this:

safe_components = model.replace_rvs_by_values(components)
pm.Deterministic("comp0_logp", pm.logp(safe_components[0], data))
pm.Deterministic("comp1_logp", pm.logp(safe_components[1], data))

The reason you need this extra step is a bit too technical to go into here, but it tends to come up when mixing logp and random graphs (when you write a PyMC model, you usually write only random graphs).
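
As a side note, since the end goal is per-point cluster probabilities, you could also compute them directly inside the model from the same safe components. A minimal sketch (using your ptt alias for pytensor.tensor; softmax comes from pytensor.tensor.special, and the name responsibilities is just illustrative):

from pytensor.tensor.special import softmax

comp_logps = ptt.stack([pm.logp(safe_components[0], data),
                        pm.logp(safe_components[1], data)], axis=-1)  # (200, 2)
# softmax of log(w_k) + comp_logp_k over k, normalized per data point
pm.Deterministic("responsibilities",
                 softmax(ptt.log(weights) + comp_logps, axis=-1))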


Ah yes, indeed the warnings disappear with the newer version, and the error disappears when I sample the logp the way you suggested. I should have checked the version first! Thanks.