Observed Deterministic

I wouldn’t expect much progress in this area, as the limitation is not that pymc3 has no implementation of observed deterministics. Rather, no good algorithm has been found to do this.

To expand on @junpenglao’s comment, the issue lies in the fact that saying

pm.Deterministic('F', f(x1, x2, x3, x4), observed=x_obs)

corresponds to the likelihood

\prod_{i=1}^N \int \int \int \int \mathbb{I}[f(x_1, x_2, x_3, x_4) = x_{obs}^{(i)}]P(x_1,x_2,x_3,x_4)dx_1dx_2dx_3dx_4

which is a nasty high-dimensional integral. This comes with the added bonus that the event set is typically of measure-0, even in the simplest case of f(x_1, x_2) = x_1 + x_2. This corresponds to a line x_2 = x_{obs}^{(i)} - x_1 which will be of 0 volume in \mathbb{R}^2, and thus impossible to sample from. So we’d need some way of performing high-dimensional integration on an arbitrary implicit surface. Naturally this problem is a great deal harder than sampling from a likelihood.

@aplavin’s comment, in this light, is even more striking. Why is it that, in the above likelihood, the parameters of P(x_1, \dots) are in general intractable, while using

x_{obs} \sim N(f(x_1, x_2, x_3, x_4), \sigma_e^2)

Or, equivalently, replacing

y = f(x)

with

y = f(x) + \epsilon

(as an aside: this is the entry-point into everything Gaussian Process). After all the situation is barel
y any different as now:

P(x|y) = \int \int \int \int Z_\sigma e^{(y-f(x))^2/(2\sigma^2)}P(x_1,x_2,x_3,x_4)dx_1dx_2dx_3dx_4

The short answer is that \epsilon provides a volume, so that there is no longer a measure-0 constraint set, making the integral amenable to standard numerical integration (such as MCMC).

6 Likes