Unobserved variables in blackbox likelihood

I’ve been using this example to implement a “blackbox” likelihood function, but I am having trouble trying to extend it in the following way.

In the linked example, two parameters (m and c) are defined with priors and then converted to a theano tensor. This is then given as the “observed” argument to DensityDist():

pm.DensityDist('likelihood', lambda v: logl(v), observed={'v': theta})

Suppose I have a variable called l whose prior I do not want to update during sampling (I think this is called an unobserved variable, but I could be wrong here). The variable l is a normally distributed RV and I know the mean and the standard deviation, and I want to sample from this distribution in the likelihood blackbox function, but I don’t want its posterior updated as the sampler runs. How can I expand the example in the link to include this?

Please let me know if my question is unclear and thanks for your help!

The best way is to use a `RandomStreams` object from Theano and use it inside your DensityDist function logl:

rand_stm = tt.shared_randomstreams.RandomStreams()
def logl(v):
    ...
    I = rand_stm.normal(
            avg=mean, std=scale) #  <== this is a stochastic node:
                                 #      Theano samples its value
                                 #      from N(mean, scale) at each call

Thanks! One issue though: I can’t seem to use I in that form to do math (I find that I is a “RandomFunction{normal}.1”, not a number). I’ve tried adding “.eval()” to the definition of I you provided, but that slows things down dramatically.
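For what it’s worth, the per-call behaviour I’m after is easy to state in plain numpy (the names and numbers below are placeholders, not my actual model). Calling `.eval()` is slow because it compiles a fresh Theano graph every time; a persistent random state drawn from numerically at each likelihood call is the cheap analogue:

```python
import numpy as np

# Placeholder mean/scale for the noise variable; a persistent RandomState
# gives a fresh, cheap numeric draw on every likelihood call.
rng = np.random.RandomState(42)
mean, scale = 1.0, 0.1

def logl(v):
    l = rng.normal(loc=mean, scale=scale)  # fresh numeric draw per call
    # ... the real computation would combine v and l; placeholder below ...
    return -0.5 * float(np.sum((v - l) ** 2))
```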

What kind of math do you want to do? Maybe something like:

rand_stm = tt.shared_randomstreams.RandomStreams()
def logl(v, l):
    ...
with pm.Model() as m:
    ...
    l = pm.Deterministic('l', rand_stm.normal(avg=mean, std=scale))
    pm.DensityDist('likelihood', logl, observed={'v': theta, 'l': l})

Essentially, I’m trying to include an uncertainty in an independent variable \lambda that is used in the calculation of my likelihood: P(d \mid \theta) \propto \exp\left(-\frac{\|d - f(\theta, \lambda)\|^2}{\sigma^2}\right). The function f is a bit gnarly, so I’ve followed the “black box” likelihood example, and my code looks similar to that of the example:

# create our Op
logl = LogLikeWithGrad(my_loglike, data, x, sigma)

# use PyMC3 to sample from the log-likelihood
with pm.Model() as opmodel:
    # uniform priors on m and c
    m = pm.Uniform('m', lower=-10., upper=10.)
    c = pm.Uniform('c', lower=-10., upper=10.)

    # convert m and c to a tensor vector
    theta = tt.as_tensor_variable([m, c])

    # use a DensityDist
    pm.DensityDist('likelihood', lambda v: logl(v), observed={'v': theta})

    trace = pm.sample(ndraws, tune=nburn, discard_tuned_samples=True)

My intuition is to just add another variable after the definitions of m and c, but I’m not sure how to pass this to the likelihood function while specifying that its prior should not be updated.

Hmmm, at this point I am a bit worried that I have misunderstood what you mean. In a sense, the priors are not being updated; what you are getting is the posterior, which is the parameter space set up by your prior and likelihood. The solution I provide is essentially to introduce a stochastic node into your model. It is not a random variable, meaning that information from your data does not flow towards it, and it mainly introduces additional stochasticity into the evaluation of your function.
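To illustrate the distinction with a toy numpy sketch (all names and numbers here are invented for illustration): the parameter theta is part of the chain state and is informed by the data, while the stochastic node l is simply redrawn at every likelihood evaluation and is never stored or updated:

```python
import numpy as np

rng = np.random.RandomState(0)
mean, scale = 0.5, 0.05               # assumed known prior for the noise variable l
data = rng.normal(2.0, 0.1, size=20)  # toy "observed" data

def loglike(theta):
    l = rng.normal(mean, scale)  # stochastic node: redrawn every call,
                                 # never part of the chain state
    model = theta + l            # toy stand-in for f(theta, lambda)
    return -0.5 * np.sum((data - model) ** 2)

# Tiny Metropolis loop: only theta is proposed and accepted; no information
# from the data ever flows back into l.
theta, trace = 0.0, []
for _ in range(2000):
    prop = theta + rng.normal(0.0, 0.2)
    if np.log(rng.rand()) < loglike(prop) - loglike(theta):
        theta = prop
    trace.append(theta)
```

The chain for theta concentrates near the data mean minus the mean of l, even though l itself never appears in the trace.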

Ah, yes, I guess I should have said posterior there. Sorry for the imprecision in my language.

In any case, what you describe sounds exactly like what I am trying to achieve:

…to introduce a stochastic node into your model. It is not a random variable, meaning that information from your data does not flow towards it, and it mainly introduces additional stochasticity into the evaluation of your function.

So again using the “black box” likelihood code as an example, what I had done to try to implement your original suggestion was something like the following, though this throws an error because l is not a number:

rand_stm = tt.shared_randomstreams.RandomStreams()
# create our Op
logl = LogLikeWithGrad(my_loglike, data, x, sigma, rand_stm)

# use PyMC3 to sample from the log-likelihood
with pm.Model() as opmodel:
    # uniform priors on m and c
    m = pm.Uniform('m', lower=-10., upper=10.)
    c = pm.Uniform('c', lower=-10., upper=10.)
    
    # convert m and c to a tensor vector
    theta = tt.as_tensor_variable([m, c])

    # use a DensityDist
    pm.DensityDist('likelihood', lambda v: logl(v), observed={'v': theta})

    trace = pm.sample(ndraws, tune=nburn, discard_tuned_samples=True)


...

def my_loglike(theta, data, x, sigma, rand_stm):
    model = my_model(theta, x, rand_stm)
    ...
    return loglikelihood_value
...
def my_model(theta, x, rand_stm):
    m, c = theta
    l = rand_stm.normal(avg=mean, std=scale)
    out = ...(math with m, c, and l)...
    return out

Further, I’m not really sure how to try the suggestion in your second post, as its syntax differs from that of the “black box” example, which looks like
logl = LogLikeWithGrad(my_loglike, data, x, sigma)
instead of
def logl(v): ...

Again, thanks for your help.

Okay so I think I may have figured out how to include this value as a pm.Deterministic('l', rand_stm.normal(avg=mean, std=scale)), but now I’m getting an error of:

<__main__.LogLikeWithGrad object at 0x7f3e9c2b0950> returned the wrong number of gradient terms.

I assume the number of gradient terms should correspond to the dimension of the parameter space I’m working with. I’ve checked that this is the case so I’m a little confused about what is wrong here. A similar problem is mentioned in this post, but I’m not sure if it was resolved. I’ll keep trying to figure out what I’m missing but if anyone has any perspective on how adding this deterministic would change the implementation of the gradient that would be really helpful.
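For reference, my understanding of the contract is that grad() must return exactly one term per Op input, so a deterministic l that is not an input gets no gradient term. A Theano-free finite-difference sketch of that idea (toy likelihood and numbers invented for illustration):

```python
import numpy as np

# Toy log-likelihood over theta = (m, c); a deterministic l drawn elsewhere
# is NOT an input, so no gradient term is computed for it.
def my_loglike(theta, l=0.0):
    m, c = theta
    return -0.5 * ((m - 1.0) ** 2 + (c + 2.0) ** 2) + 0.0 * l

def fd_grad(f, theta, eps=1e-6):
    # central finite differences; returns one entry per element of theta,
    # i.e. one gradient term for the single vector input
    theta = np.asarray(theta, dtype=float)
    g = np.zeros_like(theta)
    for i in range(theta.size):
        up, dn = theta.copy(), theta.copy()
        up[i] += eps
        dn[i] -= eps
        g[i] = (f(up) - f(dn)) / (2 * eps)
    return g

g = fd_grad(my_loglike, [0.0, 0.0])
```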

It should not be taking the gradient with respect to the deterministic; I think you should not put it in observed={'v': theta, 'l': l}.
Could you rewrite it in the format of:

pm.DensityDist('likelihood', lambda v: logl(v, l), observed={'v': theta})

In that case, you generate a closure, and l would hopefully no longer be considered an input that needs a gradient.
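In plain Python terms (toy function, just to show the mechanics): by closing over l, the function handed to DensityDist has the signature f(v) only, so the gradient machinery never sees l as an input:

```python
import numpy as np

def logl(v, l):
    return -0.5 * np.sum((v - l) ** 2)

l = 0.5                             # a fixed/deterministic value
likelihood = lambda v: logl(v, l)   # l is captured by the closure, not an argument

val = likelihood(np.array([0.5, 1.5]))
```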

I’m still having trouble correctly passing this variable into the “LogLikeWithGrad” and “LogLikeGrad” objects. It seems that if I feed in two variables to the logl() function, I need to add another tt.dvector to the “itypes” variable (the example code is below for reference). But if I do that, I’m not sure how I’m supposed to treat that variable as it relates to my gradient. The gradient class still uses the “blackbox” loglikelihood and model functions, which requires the l variable to evaluate. I can pass l as an argument to self.logpgrad() (like theta), but then the gradient class complains about not having the right number of terms to return.

class LogLikeWithGrad(tt.Op):

    itypes = [tt.dvector]  # expects a vector of parameter values when called
    otypes = [tt.dscalar]  # outputs a single scalar value (the log likelihood)

    def __init__(self, loglike, data, x, sigma):
        """
        Initialise with various things that the function requires. Below
        are the things that are needed in this particular example.

        Parameters
        ----------
        loglike:
            The log-likelihood (or whatever) function we've defined
        data:
            The "observed" data that our log-likelihood function takes in
        x:
            The dependent variable (aka 'x') that our model requires
        sigma:
            The noise standard deviation that our function requires.
        """

        # add inputs as class attributes
        self.likelihood = loglike
        self.data = data
        self.x = x
        self.sigma = sigma

        # initialise the gradient Op (below)
        self.logpgrad = LogLikeGrad(self.likelihood, self.data, self.x, self.sigma)

    def perform(self, node, inputs, outputs):
        # the method that is used when calling the Op
        theta, = inputs  # this will contain my variables

        # call the log-likelihood function
        logl = self.likelihood(theta, self.x, self.data, self.sigma)

        outputs[0][0] = np.array(logl)  # output the log-likelihood

    def grad(self, inputs, g):
        # the method that calculates the gradients - it actually returns the
        # vector-Jacobian product - g[0] is a vector of parameter values
        theta, = inputs  # our parameters
        return [g[0] * self.logpgrad(theta)]

I’ve also tried making l a class variable in LogLikeWithGrad, but then I run into the same problem I mentioned in post #3 of this thread.

In case the above is not clear, I have a few shorter questions that might help me if answered:

  • Should my pm.Deterministic variable l be recast with tt.as_tensor_variable (like theta)?
  • Could I give the my_loglike and my_model functions access to all pymc3 variables by defining these functions within the with pm.Model() as opmodel: block? Is this advisable?

As a reminder, I am trying to do what you said in post #6 of this thread, in the context of the “black box” example.

Thanks!

No need; pm.Deterministic already gives you a Theano tensor.

I don’t think so; pass in only what you need to evaluate the log-likelihood.

I was going to suggest this, and I am quite surprised that it doesn’t work. But maybe I am overthinking this; how about just putting a numpy random generator in your LogLikeWithGrad? Make sure to validate that it works.

I was going to suggest this, and I am quite surprised that it doesn’t work.

I think this is the solution (I believe I was overthinking the issue). I’ve found that with l as a class variable, I can sample successfully using Metropolis, but NUTS is still really slow. I’ll be looking into this more but for the moment I consider the original question answered.
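For anyone who finds this later, here is a stripped-down numpy sketch of the approach that worked for me. The class and function names mirror the “black box” example, but the Theano Op machinery is omitted and all numbers are invented, so treat it as illustrative only:

```python
import numpy as np

# Keep a numpy RandomState as a class attribute and draw l inside the
# likelihood evaluation, so every call gets a plain float, not a symbolic node.
class LogLikeWithRandomL:
    def __init__(self, loglike, data, x, sigma, l_mean, l_scale):
        self.likelihood = loglike
        self.data = data
        self.x = x
        self.sigma = sigma
        self.l_mean = l_mean    # assumed known prior mean for l
        self.l_scale = l_scale  # assumed known prior sd for l
        self.rng = np.random.RandomState(123)

    def __call__(self, theta):
        l = self.rng.normal(self.l_mean, self.l_scale)  # fresh numeric draw
        return self.likelihood(theta, self.x, self.data, self.sigma, l)

def my_loglike(theta, x, data, sigma, l):
    m, c = theta
    model = m * x + c + l  # toy stand-in for the gnarly f(theta, lambda)
    return -0.5 * np.sum(((data - model) / sigma) ** 2)

x = np.linspace(0.0, 1.0, 10)
data = 2.0 * x + 1.0
op = LogLikeWithRandomL(my_loglike, data, x, sigma=0.5, l_mean=0.0, l_scale=0.1)
val = op(np.array([2.0, 1.0]))  # a plain float, usable by e.g. Metropolis
```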

P.S. @junpenglao your commitment to helping people out on this forum is astounding - thanks for putting up with my confusion!!
