Theano shared variable error

I have a problem that has been eating me up all day, and I suspect there is a simple solution…

If I specify a Theano shared variable for use in a PyMC3 model so that I can loop without re-specifying the model, the initial value of the shared variable seems to affect the result. If the initial value is too far from the new value, I get a “The derivative of RV test1.ravel()[0] is zero.” error. Why is this??

See my code below. Changing the nf value to 10, or changing the new value to 100000, would fix the error…

import pymc3 as pm
from theano import shared

def test():
    nf = 10000000000000000000
    sharednf = shared(nf)

    with pm.Model() as model:
        test = pm.Normal("test" + str(1), mu=sharednf, sd=1**100)  # note: 1**100 evaluates to 1
        sum = test + 1
        pm.Deterministic("sum" + str(1), sum)
    for i in range(10):
        with model:
            trace = pm.sample(5000)
            trace1 = trace

This looks like this GitHub issue. I don’t know the reason behind it, but modifying the priors or using init="adapt_diag" seems to have solved it in similar situations.

Another option would be to stop using GitHub master and install the 3.8 release instead, if you do not need any recently added features.

Thank you for getting back to me! Sadly, these modifications did not make a difference… I am already using PyMC3 3.8. :frowning: Any other suggestions?

PyMC3 assigns a default test value to each random variable; for the Normal distribution it is the mean. So in this case the underlying default value is sharednf, and when it is super huge and the new set_value is too far away from it, the gradient is essentially 0.
There are a few ways to fix it:
1. Provide a starting value to pm.sample:

pm.sample(..., start={"test"+str(1): float(1)})

2. Use a non-centered parameterization:

with pm.Model() as model:
    test = pm.Normal("test"+str(1), mu=0., sd=1**100)
    sum = test + 1 + sharednf
    pm.Deterministic("sum"+str(1), sum)
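To make the “gradient is essentially 0” point concrete, here is a small NumPy sketch (an addition of mine, not necessarily the exact code path PyMC3 takes) of how a log-density derivative can underflow to exactly zero in floating point. For a Normal, d(logp)/dx = -(x - mu) / sd**2, and with an extreme scale the true derivative becomes too small for float32, which Theano often uses. The numbers are made up for illustration:

```python
import numpy as np

# For Normal(mu, sd): d(logp)/dx = -(x - mu) / sd**2.
# With a huge sd, the true derivative is tiny; cast to float32 it
# underflows to exactly 0.0, which is what the sampler then sees.
mu, sd = 0.0, 1e100
x = 1e19                      # a start value far from mu on an absolute scale
grad64 = -(x - mu) / sd**2    # ~ -1e-181, still representable in float64
grad32 = np.float32(grad64)   # float32 cannot represent it -> underflows to -0.0

print(grad64 != 0.0)   # True: mathematically the derivative is nonzero
print(grad32 == 0.0)   # True: in float32 it has become exactly zero
```

The same effect explains why shrinking the distance between the old and new values (or the scale of the problem) makes the error go away.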

Hi! Thank you so much for getting back to me! I see the root cause of the problem now. However, setting the start value to the new shared variable value is not doing the trick. Am I misunderstanding something?

Maybe the error is caused by caching? Try clearing out your Theano cache…

I tried that without success. This is what I did:
trace = pm.sample(5000, start={"start"+str(1): float(sharednf.get_value())})

If the new and old values of the variable are far apart, the gradient is zero. OK. But if the values are close enough for the sampler to converge, the resulting mean still depends on how far apart they were. Why is there a correlation here? I am really not able to use the Theano shared variables the way I am hoping to :frowning:

Using the second suggested solution, the non-centered parameterization of the model, do I have no way of introducing new shared variables for use in the sd? I was hoping to loop through a dataset of means and sds with the help of theano.shared.

start needs to be a dictionary-like object in which each str key corresponds to a random variable name, i.e. pm.sample(..., start={"test1": float(1)})

Likely not enough burn-in/tuning; increasing the tuning should help.

Location-scale distributions can all be reparameterized as rv_you_want = unit_rv*sd + mu. A bit more information can be found in: Statistics (scipy.stats) — SciPy v1.7.1 Manual
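To sanity-check the quoted identity, here is a small NumPy snippet (an addition of mine, not from the thread) verifying that Normal(mu, sd) and the shifted/scaled unit Normal describe the same distribution, via the log-density relation logpdf(x; mu, sd) = logpdf((x - mu)/sd; 0, 1) - log(sd):

```python
import numpy as np

def norm_logpdf(x, mu=0.0, sd=1.0):
    # Log density of Normal(mu, sd), written out by hand to stay self-contained.
    return -0.5 * np.log(2 * np.pi) - np.log(sd) - 0.5 * ((x - mu) / sd) ** 2

mu, sd = 3.0, 0.5
x = np.linspace(-2.0, 8.0, 11)

lhs = norm_logpdf(x, mu, sd)                    # centered parameterization
rhs = norm_logpdf((x - mu) / sd) - np.log(sd)   # unit_rv * sd + mu, change of variables

print(np.allclose(lhs, rhs))  # True: the two parameterizations agree
```

This is exactly why test*sd + mu in the model gives the same posterior as parameterizing the Normal directly, while being much friendlier to the sampler.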


awesome! thank you so much for getting back to me!!

I have a quick followup question. Can one use theano shared variables to change a distribution?

From what I have understood, theano.shared can only be used with numpy arrays.
Is there a way to do something like:

distList = [pm.Normal, pm.Lognormal,…]
sharedVar = theano.shared(0)
with pm.Model() as model:
    distList[sharedVar]("name", mu=0, sd=1)

for i in range(10):
    sharedVar.set_value(i)
    with model:
        ...

This would allow changing which distribution is used inside the “loop”.

I don't think so; we cannot index a Python list with Theano variables.
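Since the distribution class is picked while ordinary Python code builds the graph, one workaround (my suggestion, not from the thread) is to keep the selection in plain Python and rebuild a small model each iteration, indexing the list with an ordinary int. The sketch below uses hypothetical stand-in classes so it runs without PyMC3; in real code each iteration would open a fresh `with pm.Model():` block:

```python
# Stand-ins for pm.Normal / pm.Lognormal, just to show the dispatch pattern.
class NormalStub:
    def __init__(self, name, mu, sd):
        self.kind = "normal"

class LognormalStub:
    def __init__(self, name, mu, sd):
        self.kind = "lognormal"

dist_list = [NormalStub, LognormalStub]

kinds = []
for i in range(4):
    dist_cls = dist_list[i % len(dist_list)]  # plain Python int index: fine
    rv = dist_cls("name", mu=0, sd=1)         # choice made at graph-construction time
    kinds.append(rv.kind)

print(kinds)  # ['normal', 'lognormal', 'normal', 'lognormal']
```

The trade-off is that the model is rebuilt per iteration, so this gives up the "specify once, loop with shared variables" pattern for the distribution choice itself.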

OK, thank you. I have now implemented the non-centered parameterization method as suggested. I am just curious: why is it that

pm.Normal("test", mu=1e-10, sd=1e-20)

gives a “derivative is zero” error during sampling, while shifting the distribution as below works? As far as I can tell, the resulting distributions are equal.

test = pm.Normal("test", mu=0, sd=1)
shift = test*1e-20 + 1e-10
shifted = pm.Deterministic("shifted", shift)

Yeah, the reparameterization trick feels like :star2: magic :star2:
Basically, it’s because it presents HMC with a much better geometry to sample from. Here is a blog post where it’s explained in detail.
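A rough numerical intuition for the "better geometry" point (my sketch, an addition to the thread): with the centered tiny-sd Normal, the log-density gradient explodes for any step of ordinary size, while in the shifted version the sampler only ever sees a unit Normal with gradients of order one:

```python
# d(logp)/dx for Normal(mu, sd) is -(x - mu) / sd**2.
def norm_grad(x, mu, sd):
    return -(x - mu) / sd**2

# Centered: pm.Normal("test", mu=1e-10, sd=1e-20). Even one sd-sized step
# away from the mean already gives an astronomically steep gradient.
g_centered = norm_grad(1e-10 + 1e-20, mu=1e-10, sd=1e-20)  # ~ -1e20

# Non-centered: the sampler sees only a unit Normal; the tiny scale and
# shift are applied afterwards as a deterministic transform.
g_unit = norm_grad(1.0, mu=0.0, sd=1.0)                    # -1.0

print(abs(g_centered) / abs(g_unit))  # ~1e20 times steeper when centered
```

HMC step sizes that work for one part of such a steep, narrow density fail elsewhere, which is how the sampler ends up reporting degenerate derivatives; the unit-scale version avoids the problem entirely.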
