Observed Deterministic

Hi there :)
As the title says, I’m looking for a way to build a model where the observed variable is a combination of several latent ones, and the uncertainties correspond to those latent variables. To give a very simple example, suppose I measure values a_i with some uncertainties but observe only differences a_i - a_0. So I would like to represent this as a = pm.Normal(..., shape=n), a_diff = pm.Deterministic(a[1:] - a[0], observed=data) - but currently there is no such possibility. The only similar thing I found is something like a_diff = pm.Normal(mu=a[1:] - a[0], sd=0.001, observed=data) and it works, but sampler has difficulties for small sd (as expected), and sd value itself has to be specified.
So, what do you think about this?

1 Like

I don’t think there is an easy way to model it in PyMC3, as this class of problem falls into Likelihood-free inference. To model it in PyMC3 you need to find a way to express the likelihood.
For example, in the case above if a_i and a_0 are both have a Normal distribution, say a_i ~ N(mu_i, var_i) and a_0 ~ N(mu0, var0), then a_i - a0 ~ N(mu_i-mu0, var_i+var0). You can try using Sympy to work it out.
If a close form likelihood is difficult to find, you need some likelihood-free inference approach (approximate bayesian computation, etc) like https://github.com/elfi-dev/elfi

[Edit] You can also use SMC-ABC introduced in https://github.com/pymc-devs/pymc3/pull/3404

Well, currently I do basically this (like a_i - a0 ~ N(mu_i-mu0, var_i+var0)), just hoped that maybe there is a way to specify the model as it is. And what’s the deal-breaking difference between a_diff = pm.Normal(mu=a[1:] - a[0], sd=0.001, observed=data) (which is possible) and a_diff = pm.Deterministic(a[1:] - a[0], observed=data) (which isn’t)? I thought that they would require basically the same processing.

Although they seem similar, in practice associating observation to a deterministic node is really difficult. In PyMC3 a deterministic node is meant to record a deterministic mapping $f(x_1, x_2, …) \to y$, and if y is observed we need to find the likelihood of $\pi(y | x_1, x_2, …)$. This is something difficult to do besides a few limited cases, as we need to do a jacobian correction to keep the volume constant before and after the deterministic mapping.

Not to drive away the focus of the discussion but what are some of the draw backs of likelihood-free inference approaches?

Hi,

I was just wondering if this has changed since this was raised.

I have a similar situation as the original poster.

Thanks in advance!

I wouldn’t expect much progress in this area, as the limitation is not that pymc3 has no implementation of observed deterministics. Rather, no good algorithm has been found to do this.

To expand on @junpenglao’s comment, the issue lies in the fact that saying

pm.Deterministic('F', f(x1, x2, x3, x4), observed=x_obs)

corresponds to the likelihood

\prod_{i=1}^N \int \int \int \int \mathbb{I}[f(x_1, x_2, x_3, x_4) = x_{obs}^{(i)}]P(x_1,x_2,x_3,x_4)dx_1dx_2dx_3dx_4

which is a nasty high-dimensional integral. This comes with the added bonus that the event set is typically of measure-0, even in the simplest case of f(x_1, x_2) = x_1 + x_2. This corresponds to a line x_2 = x_{obs}^{(i)} - x_1 which will be of 0 volume in \mathbb{R}^2, and thus impossible to sample from. So we’d need some way of performing high-dimensional integration on an arbitrary implicit surface. Naturally this problem is a great deal harder than sampling from a likelihood.

@aplavin’s comment, in this light, is even more striking. Why is it that, in the above likelihood, the parameters of P(x_1, \dots) are in general intractable, while using

x_{obs} \sim N(f(x_1, x_2, x_3, x_4), \sigma_e^2)

Or, equivalently, replacing

y = f(x)

with

y = f(x) + \epsilon

(as an aside: this is the entry-point into everything Gaussian Process). After all the situation is barel
y any different as now:

P(x|y) = \int \int \int \int Z_\sigma e^{(y-f(x))^2/(2\sigma^2)}P(x_1,x_2,x_3,x_4)dx_1dx_2dx_3dx_4

The short answer is that \epsilon provides a volume, so that there is no longer a measure-0 constraint set, making the integral amenable to standard numerical integration (such as MCMC).

4 Likes

A post was split to a new topic: Decomposition of rates?

Thank you @chartl, this is exactly the explanation I was hopping for. So, in summary, we can assume model - obs = N(0, sigma), and

with pm.Model() as model:
    x1 = pm.Uniform('x1', 0, 1)
    x2 = pm.Uniform('x2', 0, 1)

def fun(x = x1, x2 = x2):
    return x1 + x2

with model:
    modeled_mu = pm.Deterministic('modeled_mu', fun())

with model:
    obs = pm.Normal( "obs",  mu=modeled_mu, tau=1, observed=observed_data)

because we are assuming that the likelihood is gaussian and we are only modeling mu (we could also model tau). Is this correct?

Thank you!

I still don’t see why this makes calculating the likelihood impossible.

Say I have observations y_i, “latent random variables” x_i (deterministic w.r.t y_i) and hyperparameter alpha that I’m inferring. y = f(x) and x \sim N(\alpha, 1).

The likelihood (of observations y_i w.r.t alpha) is easy to define: L(\alpha) = f_{N}(f^{-1}(y_i) - \alpha) where f_N is standard normal. As long as you can invert the deterministic relation, you can calculate the likelihood here. If you can’t invert it then you’re likelihood-free (e.g. if you need to run a physical simulation to obtain y_i from x_i).

I guess the user would need to provide a deterministic relation with its inversion (or it could be automatically inverted if it’s based on simple operations).

But in terms of doing MCMC/NUTS, what blockers would there be for this? You would be able to calculate gradietns w.r.t alpha (you have gradients for p(x_i | \alpha) w.r.t alpha and for y w.r.t x_i, you can calculate likelihood etc…

I think at the moment there’s a whole host of problems in inferring certain types of stochastic processes that are difficult or impossible with pymc, which would become very easy if we could have deterministic relations to observed variables.

Maybe I’m missing something, but if the inversion is trivial, wouldn’t it just be easier to invert the observed y, and skip the deterministic altogether?

Yes, I tried to come up with simple example but now I realize that it makes it too trivial and meaningless. I’ll come back in a bit with a better example sorry.

The real, non-simple example where it would be useful is where you have multiple random variables that deterministically produce one observed variable through complicated function, but now I think about it a bit more, this is precisely where the likelihood becomes hard to write down.

So ignore my post.

1 Like

Here is perhaps a good example: Exponential likelihood at different location

And this is something we can already do (for some simple invertible transformations) in PyMC but it’s not yet documented or packaged for users.

import pymc as pm

with pm.Model() as m:

  x = pm.Normal("x")

  # This is kind of like
  # y = pm.Deterministic("y", pm.Exponential.dist() + x, observed=5)

  y = pm.Exponential.dist() + x
  m.register_rv(y, name="y", data=5)
1 Like