theano.grad(beta_0, alpha) raises DisconnectedInputError (beta_0 samples come from a Beta distribution that takes alpha as input)

theano

#1


I don't understand why I get a DisconnectedInputError.

Clearly the pdf of beta_0 is a function of alpha, so a gradient should be possible.
I may just be missing some knowledge here, and this behaviour could be expected.

Looking for help to understand this.


#2

RVs in a pm.Model are not exactly tensors. If you want the gradient of the logp, you should do: theano.grad(beta_x.distribution.logp(.5), alpha_x)
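The point is that the logp expression, not the RV node, is a function of alpha. A minimal stdlib-only sketch (not PyMC3/Theano code; `beta_logpdf` and `d_logpdf_d_alpha` are hypothetical helpers, and beta = 1 is an assumed value) writes out the Beta log-density and estimates its derivative with respect to alpha by central finite differences, which is the kind of quantity theano.grad computes symbolically on the logp graph.

```python
import math


def beta_logpdf(x, alpha, beta):
    """Log-density of Beta(alpha, beta) at x, for 0 < x < 1."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return log_norm + (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log1p(-x)


def d_logpdf_d_alpha(x, alpha, beta, h=1e-6):
    """Central finite-difference estimate of d/d(alpha) of the Beta log-density."""
    upper = beta_logpdf(x, alpha + h, beta)
    lower = beta_logpdf(x, alpha - h, beta)
    return (upper - lower) / (2.0 * h)


# Hypothetical values: evaluate at x = 0.5 with alpha = beta = 1.
print(d_logpdf_d_alpha(0.5, 1.0, 1.0))
```

Analytically the derivative is log(x) - digamma(alpha) + digamma(alpha + beta); with alpha = beta = 1 that is log(0.5) + 1, about 0.307, which the finite-difference estimate should match closely.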


#3

A follow-up question on the same model, but about something different.

When I take the gradient of the observed variable's logpt with respect to alpha, it throws a DisconnectedInputError.
As I understand it, the logp of the observed variable depends on the value of alpha.
Can you help me figure out what I am missing?

Thanks


#4

Did you try setting transform=None? Also try setting disconnected_inputs='warn' in theano.grad.


#5

Hmm, it seems to work on simpler models, such as the one in http://docs.pymc.io/notebooks/gaussian-mixture-model-advi.html:

import numpy as np
import theano

# xs, pi and model are defined in the notebook linked above
dy = theano.grad(xs.logpt, pi)  # gradient of xs.logpt with respect to pi
f = theano.function(model.free_RVs, dy)
f(np.array([0., 0.]), np.array([5., 0.]), np.array([.5]))

# array([153.57588194,  11.66830386])

#6

I set disconnected_inputs='warn', and the log says that the computation (obs.logpt) is non-differentiable with respect to alpha.

Where did you want me to set the transform=True flag?


#7

I mean something like: alpha = pm.Gamma('alpha', 1., 1., transform=None)

A bit more explanation: in PyMC3, bounded variables are automatically transformed to the real line (-inf, inf), and the node in the actual computational graph is the unbounded one (the one the sampler "sees"). The bounded variable (the original RV you specify) is in the trace just for bookkeeping.
For alpha ~ Gamma, the node in the graph is actually log(alpha), named alpha_log__, which means that taking the gradient with respect to alpha might not work (but the gradient with respect to alpha_log__ should be fine).
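To make the log transform concrete, here is a plain-Python sketch (hypothetical helper names, not PyMC3 internals) for the Gamma(1, 1) prior from the transform=None example. It assumes the standard change-of-variables form: the density of z = log(alpha) is logp(exp(z)) plus the log-Jacobian z, and the gradient with respect to z, the analogue of alpha_log__, follows from the chain rule.

```python
import math


def gamma_1_1_logp(a):
    """Log-density of Gamma(shape=1, rate=1) at a > 0: log(exp(-a)) = -a."""
    return -a


def logp_unconstrained(z):
    """Log-density of z = log(a): logp(exp(z)) plus the log-Jacobian log|da/dz| = z."""
    return gamma_1_1_logp(math.exp(z)) + z


def grad_unconstrained(z):
    """Chain rule with a = exp(z): a * d/da logp(a) + d/dz(z) = a * (-1) + 1."""
    a = math.exp(z)
    return -a + 1.0


def central_diff(f, z, h=1e-6):
    """Numerical check of a derivative by central finite differences."""
    return (f(z + h) - f(z - h)) / (2.0 * h)


# Compare the analytic and numerical gradients on the unconstrained scale.
for z in (-1.0, 0.0, 0.5, 2.0):
    print(z, grad_unconstrained(z), central_diff(logp_unconstrained, z))
```

At z = 0 (alpha = 1) the gradient is exactly zero, the mode of the transformed density; the sampler only ever needs this unconstrained gradient.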


#8

But in your case, I don't think that is the reason. I just tried a similar model with stick-breaking, and I also cannot take the gradient with respect to alpha.


#9

Hmm, it seems the logp of obs does not depend on alpha, but the model logp does:

theano.grad(obs.logpt, alpha) --> error
theano.grad(model.logpt, alpha) --> is fine
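A plain-Python sketch can reproduce this pattern on a hypothetical three-level model (not the stick-breaking model from this thread): alpha ~ Gamma(1, 1), mu ~ Normal(0, sd=alpha), and observations ~ Normal(mu, 1). Each term reads only its direct parents, so with the free variable mu held fixed, the observed term's partial derivative with respect to alpha is zero, while the joint logp's is not. Theano reports the structurally missing dependency as a DisconnectedInputError instead of returning zero.

```python
import math


def normal_logpdf(x, mu, sd):
    """Log-density of Normal(mu, sd) at x."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((x - mu) / sd) ** 2


def obs_logp(point, y):
    # The observed term reads only its direct parent mu; alpha never appears.
    return sum(normal_logpdf(yi, point["mu"], 1.0) for yi in y)


def model_logp(point, y):
    alpha, mu = point["alpha"], point["mu"]
    prior_alpha = -alpha                      # Gamma(1, 1) log-density, alpha > 0
    prior_mu = normal_logpdf(mu, 0.0, alpha)  # mu ~ Normal(0, sd=alpha)
    return prior_alpha + prior_mu + obs_logp(point, y)


def partial(f, point, name, y, h=1e-6):
    """Central finite difference in one coordinate, all other values held fixed."""
    up = dict(point)
    up[name] += h
    down = dict(point)
    down[name] -= h
    return (f(up, y) - f(down, y)) / (2.0 * h)


# Hypothetical point and data.
point = {"alpha": 2.0, "mu": 0.5}
y = [0.1, 0.9, 0.4]
print(partial(obs_logp, point, "alpha", y))    # 0.0: alpha is not an input
print(partial(model_logp, point, "alpha", y))  # nonzero, via mu's prior
```

Analytically the second value is -1 - 1/alpha + mu**2 / alpha**3, which is -1.46875 at this point.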

#10

That shouldn't be happening, right? The gradient of the model logp should be the same as that of the observed variable.


#11

Not necessarily, for example if you add a Potential to the model logp.

But in this case it should be. The only explanation I can give is, again, the transform (i.e., only the transformed RVs are in the graph and differentiable, not the deterministic nodes).


#12

You can do theano.printing.debugprint(model.logpt) to check whether alpha is part of the graph.

[Edit]: actually, checking the inputs is easier:

from theano.gof.graph import inputs
inputs([xs.logpt])     # leaf variables that xs.logpt depends on
inputs([model.logpt])  # leaf variables that the full model logp depends on
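The idea behind that check can be sketched with a toy expression graph in plain Python. Everything here is hypothetical (the `Node` class, `leaf_inputs`, and the node names are illustrative, not Theano's API): each logp node lists only its direct parents, and walking back to the leaves shows which free variables an output actually depends on, roughly what inputs() reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A hypothetical expression-graph node: a name plus the nodes it is computed from."""
    name: str
    parents: tuple = ()


def leaf_inputs(outputs):
    """Collect the names of all leaf nodes reachable from `outputs` (a rough analogue of inputs())."""
    seen, leaves, stack = set(), set(), list(outputs)
    while stack:
        node = stack.pop()
        if node.name in seen:
            continue
        seen.add(node.name)
        if node.parents:
            stack.extend(node.parents)
        else:
            leaves.add(node.name)
    return leaves


# Leaves: the sampler-visible free variables and the observed data.
alpha_log__ = Node("alpha_log__")
mu = Node("mu")
y = Node("y")

alpha = Node("alpha", (alpha_log__,))            # deterministic: exp(alpha_log__)
logp_alpha = Node("logp_alpha", (alpha_log__,))  # prior term for alpha
logp_mu = Node("logp_mu", (mu, alpha))           # mu's prior reads its parent alpha
logp_obs = Node("logp_obs", (y, mu))             # likelihood reads only its parent mu
model_logp = Node("model_logp", (logp_alpha, logp_mu, logp_obs))

print(sorted(leaf_inputs([logp_obs])))    # ['mu', 'y']
print(sorted(leaf_inputs([model_logp])))  # ['alpha_log__', 'mu', 'y']
```

The likelihood node stops at mu, so alpha never appears among its inputs; only the summed model logp reaches alpha_log__, through mu's prior.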

#13

So printing out the inputs shows that alpha is not an input to the observed variable's logp, but it is an input to the model's logp. Why is there this disconnect? The observed variables in this case should be connected to the alpha__ node in the Theano graph, right?


#14

They are disconnected because the logp of an RV only takes input from its parents, not from the parents of its parents.
You can see that the outputs are different for:

obs.logp(model.testpoint)
model.logp(model.testpoint)

This is because the priors upstream of obs are not counted in obs.logp.