ValueError: Random variables detected in the logp graph - what does it mean?

Dear all,
I have the following model:


import numpy as np
import pymc as pm
import pytensor.tensor as pt

exp_rewards = np.random.rand(300) < 0.75  # simulated binary rewards

T = 10
with pm.Model() as model:
    k = pm.Normal("k", -2.3, 0.1)
    K = pm.Deterministic("K", pt.exp(k))
    vi = pm.Normal.dist(mu=-2.7, sigma=K)        # init dist for the v walk
    ri = pm.Normal.dist(mu=0, sigma=pt.exp(vi))  # init dist for the rw walk
    v = pm.GaussianRandomWalk("v", init_dist=vi, mu=0, sigma=K, steps=T - 1)
    V = pm.Deterministic("V", pt.exp(-v))
    rw = pm.GaussianRandomWalk("rw", init_dist=ri, mu=0, sigma=V[:-1], steps=T - 1, shape=T)
    r = pm.Deterministic("r", pt.math.sigmoid(rw))
    y = pm.Bernoulli("y", r, observed=exp_rewards[:T])

    trace = pm.sample(chains=4)

(The goal is actually to reproduce this model, but it would require a “BetaRandomWalk” for rw)

But it is giving me the following error:

ValueError: Random variables detected in the logp graph: {normal_rv{0, (0, 0), floatX, False}.out}.
This can happen when DensityDist logp or Interval transform functions reference nonlocal variables,
or when not all rvs have a corresponding value variable.

By commenting/uncommenting lines, it seems the rw declaration is introducing the problem, but I don’t understand what the error actually means.
What is the problem?

By reading similar threads, such as:

I guess the problem is something like this: the GaussianRandomWalk for rw requires init_dist (ri) to be an unregistered (unnamed) distribution, but this in turn depends on some named variable (like k, which must be named, as it is a variable we want to keep track of in the model). So in the end there is an unnamed distribution between two named variables, and PyMC doesn’t like it.
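If my guess is right, a hypothetical minimal example should trigger the same error (just to illustrate the guess, not my actual model):

import pymc as pm

with pm.Model() as m:
    a = pm.Normal("a")             # named variable, gets a value variable
    b = pm.Normal.dist(mu=a)       # unnamed dist depending on a named variable
    c = pm.Normal.dist(mu=b)       # unnamed dist depending on another unnamed dist
    # b ends up in the logp graph of x with no value variable -> same ValueError
    x = pm.GaussianRandomWalk("x", init_dist=c, steps=4)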
On the other hand, if I comment out the rw line (the one right after V), the model runs just fine, so the GaussianRandomWalk for v doesn’t trigger the same problem…
I am confused… :slight_smile:

I think the problem is here:

ri = pm.Normal.dist(mu=0, sigma=pt.exp(vi))

ri can’t depend on another unnamed dist. Note that neither vi nor ri is ever used in the model other than to tell PyMC what kind of distribution the initial point has.

You probably want to define ri like this:

ri = pm.Normal.dist(mu=0, sigma=pt.exp(v[0]))
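That is, move the ri definition after v, so it references the drawn v[0] (a registered variable) instead of the unnamed vi. A sketch, with the rest of your model unchanged:

    v = pm.GaussianRandomWalk("v", init_dist=vi, mu=0, sigma=K, steps=T - 1)
    V = pm.Deterministic("V", pt.exp(-v))
    # ri now depends on v[0], which has a value variable, not on the unnamed vi
    ri = pm.Normal.dist(mu=0, sigma=pt.exp(v[0]))
    rw = pm.GaussianRandomWalk("rw", init_dist=ri, mu=0, sigma=V[:-1], steps=T - 1, shape=T)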

There’s a pm.RandomWalk that accepts arbitrary innovation distributions, but I doubt you want a straightforward Beta random walk where the innovations are just Betas, as that will drift outside of [0, 1].
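To illustrate (a sketch using pm.RandomWalk’s init_dist/innovation_dist arguments; the names and priors here are placeholders): every innovation is a positive draw in (0, 1), so the walk can only increase and leaves the unit interval after a few steps.

import pymc as pm

with pm.Model() as naive_beta_rw:
    # Each step adds a Beta draw in (0, 1) to the previous value,
    # so the walk grows monotonically - not useful for a quantity
    # that must stay inside [0, 1].
    init = pm.Beta.dist(alpha=1, beta=1)
    innovation = pm.Beta.dist(alpha=2, beta=2)
    walk = pm.RandomWalk("walk", init_dist=init, innovation_dist=innovation, steps=9)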

There was a model some time ago where each step was a posterior update; if your model is similar, it still isn’t easy to implement in PyMC: Behrens' Bayesian Learner Model : How to share parameters between steps? - #45 by ricardoV94

If on the other hand your model is just a simple sequence of interrelated points generated from Betas (as opposed to interrelated full-blown distributions), you could implement it with a CustomDist with a Scan these days (just make sure to pass a logodds transform): PyMC_timeseries_AR.ipynb · GitHub
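Roughly along these lines (a rough sketch following the pattern in that notebook; the mean/concentration Beta parametrization, the priors, and all names here are placeholders, not something from your model):

import pymc as pm
import pytensor
import pytensor.tensor as pt
from pymc.pytensorf import collect_default_updates
from pymc.distributions.transforms import logodds

T = 10  # number of steps

def beta_rw_dist(r0, nu, size):
    # Generative graph of the walk: each point is a Beta draw whose
    # mean is the previous point (alpha = mu * nu, beta = (1 - mu) * nu).
    def step(r_tm1, nu):
        r_t = pm.Beta.dist(alpha=r_tm1 * nu, beta=(1 - r_tm1) * nu)
        return r_t, collect_default_updates([r_t])

    rw, _ = pytensor.scan(
        fn=step,
        outputs_info=[r0],
        non_sequences=[nu],
        n_steps=T - 1,
        strict=True,
    )
    return rw

with pm.Model() as beta_rw_model:
    nu = pm.Gamma("nu", alpha=2, beta=0.1)  # concentration of each Beta step
    r0 = pm.Beta("r0", alpha=1, beta=1)     # registered starting point
    # the logodds transform keeps the sampler on an unconstrained scale
    r = pm.CustomDist("r", r0, nu, dist=beta_rw_dist, transform=logodds)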


Thanks @ricardoV94 !

After fixing ri using your suggestion, the model now runs correctly.
No divergences, and a few minutes of sampling for T=300 and 1000+1000 samples.

There was a model sometime ago, where each step was a posterior update, and if your model is similar it still isn’t easy to implement in PyMC: Behrens’ Bayesian Learner Model : How to share parameters between steps? - #45 by ricardoV94

Yes, it is exactly the same model!!
I didn’t expect to find it here :slight_smile:
I’ll have a look at the thread and see if I can also implement the “Beta random walk” as reported in the paper.

I had actually implemented the original model in PyMC using a naive approach, with a for loop for both the v and r variables, but the result was too slow. With T=300, creating 300 different v’s and 300 different r’s made the “build time” (the time between the pm.sample invocation and when the sampling process actually starts) grow beyond 48 hours… a bit too long :slight_smile:
It was also complaining a lot about divergences (but I hadn’t done any parameter tuning, so maybe that was the problem).
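For reference, the naive version looked roughly like this (a sketch from memory):

with pm.Model() as slow_model:
    k = pm.Normal("k", -2.3, 0.1)
    K = pm.Deterministic("K", pt.exp(k))
    v = [pm.Normal("v_0", mu=-2.7, sigma=K)]
    rw = [pm.Normal("rw_0", mu=0, sigma=pt.exp(v[0]))]
    for t in range(1, T):
        # one named variable per time step -> a huge graph for T=300
        v.append(pm.Normal(f"v_{t}", mu=v[t - 1], sigma=K))
        rw.append(pm.Normal(f"rw_{t}", mu=rw[t - 1], sigma=pt.exp(-v[t - 1])))
    r = pm.Deterministic("r", pt.math.sigmoid(pt.stack(rw)))
    y = pm.Bernoulli("y", r, observed=exp_rewards[:T])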