Shared RandomType issue using nuts-numpyro w.r.t data containing NaNs

Seuri · December 20, 2023, 1:03pm

Hi “long” time reader, first time writer, experiencing problems when trying to use the nuts-numpyro sampling method on data containing NaNs.

I originally tried to use, for different models, the nuts-numpyro sampling method on data sets that I’m currently working with, all of which contain some number of NaNs. Now the setup is as followed, for any of the models, each data set is assigned to the observation of the GEV distribution in PyMC Experimental. When preforming sample from prior predictive, each of the setups generates the following warning:

RNG Variable RandomGeneratorSharedVariable(<Generator(PCG64) at 0x2B84F390B760>) has multiple clients. This is likely an inconsistent random graph. warnings.warn(

From there, after running the simulation using the nuts-numpyro sampling to completion, the following error message is raised:

ValueError: Graph contains shared RandomType variables which cannot be safely replaced

It’s worth mentioning that the models work when using the PyMC nuts sampling.

From the above, I considered the GEV example notebook, using the above sampling method, which worked. However, when adding a np.nan to the example data set, the same error is raised.

Code below

import numpy as np
import pandas as pd
import pymc as pm
import arviz as az
import numpyro
import pymc_experimental
import pymc_experimental.distributions as pmx

print("numpy:", np.__version__)
print("pandas:", pd.__version__)
print("pymc:", pm.__version__)
print("ariz:", az.__version__)
print("numpyro:", numpyro.__version__)
print("pymc_expr:",  pymc_experimental.__version__)

>>numpy: 1.23.5
>>pandas: 2.1.1
>>pymc: 5.9.0
>>ariz: 0.16.0
>>numpyro: 0.13.0
>>pymc_expr: 0.0.12

RANDOM_SEED = 19881129
rng = np.random.default_rng(RANDOM_SEED)

#Two datasets, the first is the same as in the GEV example, with two additional np.nan entries.
NaNs = np.array([4.03, np.nan, np.nan, 3.83, 3.65, 3.88, 4.01, 4.08, 4.18, 3.80, 
                 4.36, 3.96, 3.98, 4.69, 3.85, 3.96, 3.85, 3.93, 
                 3.75, 3.63, 3.57, 4.25, 3.97, 4.05, 4.24, 4.22, 
                 3.73, 4.37, 4.06, 3.71, 3.96, 4.06, 4.55, 3.79, 
                 3.89, 4.11, 3.85, 3.86, 3.86, 4.21, 4.01, 4.11, 
                 4.24, 3.96, 4.21, 3.74, 3.85, 3.88, 3.66, 4.11, 
                 3.71, 4.18, 3.90, 3.78, 3.91, 3.72, 4.00, 3.66, 
                 3.62, 4.33, 4.55, 3.75, 4.08, 3.90, 3.88, 3.94, 
                 4.33])
#The second one as in the example.
noNaNs = np.array([4.03, 3.83, 3.65, 3.88, 4.01, 4.08, 4.18, 3.80, 
                 4.36, 3.96, 3.98, 4.69, 3.85, 3.96, 3.85, 3.93, 
                 3.75, 3.63, 3.57, 4.25, 3.97, 4.05, 4.24, 4.22, 
                 3.73, 4.37, 4.06, 3.71, 3.96, 4.06, 4.55, 3.79, 
                 3.89, 4.11, 3.85, 3.86, 3.86, 4.21, 4.01, 4.11, 
                 4.24, 3.96, 4.21, 3.74, 3.85, 3.88, 3.66, 4.11, 
                 3.71, 4.18, 3.90, 3.78, 3.91, 3.72, 4.00, 3.66, 
                 3.62, 4.33, 4.55, 3.75, 4.08, 3.90, 3.88, 3.94, 
                 4.33])

#Two models, one acting on the set NaNs, the other on the set noNaNs.

with pm.Model() as model_NaNs:
    # Priors
    μ = pm.Normal("μ", mu=3.8, sigma=0.2)
    σ = pm.HalfNormal("σ", sigma=0.3)
    ξ = pm.TruncatedNormal("ξ", mu=0, sigma=0.2, lower=-0.6, upper=0.6)

    # Estimation
    gev = pmx.GenExtreme("gev", mu=μ, sigma=σ, xi=ξ, scipy=True, observed=NaNs)
    # Return level
    z_p = pm.Deterministic("z_p", μ - σ / ξ * (1 - (-np.log(1 - 1/10)) ** (-ξ)))
  
>>ImputationWarning: Data in gev contains missing values and will be automatically imputed from the sampling distribution.
  warnings.warn(impute_message, ImputationWarning)


with pm.Model() as model_noNaNs:
    # Priors
    μ = pm.Normal("μ", mu=3.8, sigma=0.2)
    σ = pm.HalfNormal("σ", sigma=0.3)
    ξ = pm.TruncatedNormal("ξ", mu=0, sigma=0.2, lower=-0.6, upper=0.6)

    # Estimation
    gev = pmx.GenExtreme("gev", mu=μ, sigma=σ, xi=ξ, scipy=True, observed=noNaNs)
    # Return level
    z_p = pm.Deterministic("z_p", μ - σ / ξ * (1 - (-np.log(1 - 1/10)) ** (-ξ)))
    
#Next we sample from the priors.     
with model_NaNs:
    NaNsidata = pm.sample_prior_predictive(samples=5000, random_seed=rng)

>>  UserWarning: RNG Variable RandomGeneratorSharedVariable(<Generator(PCG64) at 0x2AB96EA50740>) has multiple clients. This is likely an inconsistent random graph.
  warnings.warn(
Sampling: [gev_observed, gev_unobserved, μ, ξ, σ]

with model_noNaNs:
    noNaNsidata = pm.sample_prior_predictive(samples=5000, random_seed=rng)

>>>Sampling: [gev, μ, ξ, σ]

#We run the simulation
with model_NaNs:
    NaNstrace = pm.sample(1000, tune=1000, chains=2, cores=2, nuts_sampler="numpyro")

>>>Compiling...
>>>Compilation time =  0:00:03.183715
>>>Sampling...
>>>
>>>Running chain 0: 100%
>>>2000/2000 [00:04<00:00, 5260.12it/s]
>>>Running chain 1: 100%
>>>2000/2000 [00:04<00:00, 5455.31it/s]
>>>
>>>Sampling time =  0:00:04.936128
>>>Transforming variables...

>>>ValueError                                Traceback (most recent call last)
Cell In[16], line 2
      1 with model_NaNs:
----> 2     NaNstrace = pm.sample(1000, tune=1000, chains=2, cores=2, nuts_sampler="numpyro")
:::
:::
:::
>>> shared_variables = [var for var in graph_inputs(graph) if isinstance(var, SharedVariable)]
    101 if any(isinstance(var.type, RandomType) for var in shared_variables):
--> 102     raise ValueError(
    103         "Graph contains shared RandomType variables which cannot be safely replaced"
    104     )
    106 if any(var.default_update is not None for var in shared_variables):
    107     raise ValueError(
    108         "Graph contains shared variables with default_update which cannot "
    109         "be safely replaced."
    110     )

>>>ValueError: Graph contains shared RandomType variables which cannot be safely replaced

with model_noNaNs:
    noNaNstrace = pm.sample(1000, tune=1000, chains=2, cores=2, nuts_sampler="numpyro")

>>>Compiling...
>>>Compilation time =  0:00:01.546892
>>>Sampling...
>>>
>>>Running chain 0: 100%
>>>2000/2000 [00:03<00:00, 7812.54it/s]
>>>Running chain 1: 100%
>>>2000/2000 [00:03<00:00, 6127.70it/s]
>>>
>>>Sampling time =  0:00:03.619498
>>>Transforming variables...
>>>Transformation time =  0:00:00.226946

print(noNaNstrace)
>>>Inference data with groups:
>>>  	> posterior
>>>	        > sample_stats
>>>  	> observed_data

Is there any way to work around this problem, which allows for missing values in the data set?

ricardoV94 · December 20, 2023, 3:35pm

Hi @Seuri, thanks for reporting. This is definitely a bug in PyMC. It should be fixed by this Pull Request: Fix issue with sampling of PartialObservedRVs by ricardoV94 · Pull Request #7071 · pymc-devs/pymc · GitHub

Seuri · December 21, 2023, 7:06am

Hi @ricardoV94, okay great, and of course! I use the PyMC packages everyday , so glad to be of some assistans!

ricardoV94 · December 21, 2023, 11:17am

We just released a new version with the bugfix, should hit the conda channels in a few hours/days. Let me know if that fixes the issues on your end.

Seuri · December 22, 2023, 10:17am

Hi it works on my end now, both for the example above and w.r.t my applications. Thanks for the help

Topic		Replies	Views
Use of numpyro/Jax with pymc-dev v5 jax	5	2270	July 28, 2022
Bayesian VAR with missing data: PyTensor error when sampling with blackjax or numpyro v5 bug , modeling	3	420	June 2, 2023
What is an inconsistent graph? version agnostic modeling	4	398	September 26, 2023
Numpyro sampler version agnostic	1	59	December 4, 2024
Runtime Exception with NUTS v3	16	1093	March 16, 2022

Shared RandomType issue using nuts-numpyro w.r.t data containing NaNs

Related topics