Still struggling with what is possible with PyMC and what is not…
I have a linear model with multiple likelihood nodes that I want to test on a new observation.
It happens that I know the true value of one of the observed nodes, so I would like to use that value as hard evidence when computing the other nodes.
Is it possible not to sample from the posterior for particular nodes if their evidence is provided? In a Bayes net with categorical variables and CPTs I can “freeze” any node by providing hard evidence, making one of its states have P=1 and all the others 0.
Is it possible to update the model with new knowledge? For instance, I test the model on a new observation and want the model to be updated at the same time. Or is retraining the model from scratch the only option?
Is it possible not to sample from the posterior for particular nodes if their evidence is provided?
Correct: once we observe a random variable, it is no longer a free variable and it is not sampled; we condition on it instead. Specifying observed variables in PyMC is the same thing as providing hard evidence to a Bayes net.
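To make that concrete, here is a minimal sketch (the variable names, data, and model form are made up for illustration, not taken from the original model) of how passing observed= freezes a node at its data in PyMC3:

```python
import numpy as np
import pymc3 as pm

# Toy data for a linear model with two response ("likelihood") nodes.
pred1 = np.random.normal(size=100)
resp1_obs = 2.0 * pred1 + np.random.normal(scale=0.5, size=100)
resp2_obs = -1.0 * pred1 + np.random.normal(scale=0.5, size=100)

with pm.Model() as model:
    beta1 = pm.Normal("beta1", mu=0.0, sigma=10.0)
    beta2 = pm.Normal("beta2", mu=0.0, sigma=10.0)
    sigma = pm.HalfNormal("sigma", sigma=1.0)

    # observed= conditions these nodes on the data: they are hard evidence
    # and are not sampled as free variables.
    resp1 = pm.Normal("resp1", mu=beta1 * pred1, sigma=sigma, observed=resp1_obs)
    resp2 = pm.Normal("resp2", mu=beta2 * pred1, sigma=sigma, observed=resp2_obs)

    # Only beta1, beta2 and sigma appear in the trace.
    trace = pm.sample(1000, tune=1000)
```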
Is it possible to update the model with new knowledge?
Generally speaking, this is not one of PyMC3’s strongest points. It does not have a built-in mechanism for updating a Bayesian posterior one data point at a time, in the way that a hierarchical linear model with conjugate priors can.
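A common workaround (not a built-in feature; the sketch below follows the general “updating priors” idea, and the from_posterior helper, the KDE, and the grid size are assumptions of mine) is to approximate the previous posterior and reuse it as the prior when refitting on the new data:

```python
import numpy as np
import pymc3 as pm
from scipy import stats

def from_posterior(name, samples):
    # Approximate a 1-D posterior with a KDE and wrap it as an Interpolated prior.
    smin, smax = samples.min(), samples.max()
    width = smax - smin
    x = np.linspace(smin - 0.5 * width, smax + 0.5 * width, 200)
    y = stats.gaussian_kde(samples)(x)
    return pm.Interpolated(name, x, y)

# Fit on the first batch of data.
x1 = np.random.normal(size=50)
y1 = 2.0 * x1 + np.random.normal(scale=0.5, size=50)
with pm.Model():
    beta = pm.Normal("beta", mu=0.0, sigma=10.0)
    sigma = pm.HalfNormal("sigma", sigma=1.0)
    pm.Normal("y", mu=beta * x1, sigma=sigma, observed=y1)
    trace1 = pm.sample(1000, tune=1000)  # MultiTrace

# A new observation arrives: rebuild the model with the old posterior as the new prior.
x2, y2 = np.array([0.3]), np.array([0.7])
with pm.Model():
    beta = from_posterior("beta", trace1["beta"])
    sigma = pm.HalfNormal("sigma", sigma=1.0)  # kept as a fixed prior for simplicity
    pm.Normal("y", mu=beta * x2, sigma=sigma, observed=y2)
    trace2 = pm.sample(1000, tune=1000)
```

This is only an approximation (the interpolated prior ignores correlations between parameters), but it avoids refitting on the full data set every time.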
Then during testing I have an observation = {'pred1': p1, ... 'resp1': R1}, which means I know the true value of resp1 and want to use it to model resp2. It may sound strange to treat it as Deterministic now, but the rationale is that I want to create one model with many likelihoods (similar to a Bayes net, in my understanding), covering several use cases at once (different test cases may contain different known variables).
I substitute all the pm.Data() containers in the model and run post_pred = pm.sample_posterior_predictive(). post_pred['resp1'] still gives me a distribution, which, I think, is generated internally by the model rather than taken from my value R1.
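A stripped-down sketch of what I mean (the names follow my description, but the data, coefficients, and exact model form are made up for illustration; the dummy update of resp2_obs is only there because pm.set_data shape handling can differ across PyMC3 versions):

```python
import numpy as np
import pymc3 as pm

# Made-up training data.
pred1_train = np.random.normal(size=100)
resp1_train = 2.0 * pred1_train + np.random.normal(scale=0.5, size=100)
resp2_train = -1.0 * pred1_train + 0.5 * resp1_train + np.random.normal(scale=0.5, size=100)

with pm.Model() as model:
    pred1 = pm.Data("pred1", pred1_train)
    resp1_in = pm.Data("resp1_in", resp1_train)   # resp1 also feeds resp2's mean as data
    resp2_obs = pm.Data("resp2_obs", resp2_train)

    b1 = pm.Normal("b1", mu=0.0, sigma=10.0)
    b2 = pm.Normal("b2", mu=0.0, sigma=10.0)
    b3 = pm.Normal("b3", mu=0.0, sigma=10.0)
    sigma = pm.HalfNormal("sigma", sigma=1.0)

    resp1 = pm.Normal("resp1", mu=b1 * pred1, sigma=sigma, observed=resp1_in)
    resp2 = pm.Normal("resp2", mu=b2 * pred1 + b3 * resp1_in, sigma=sigma,
                      observed=resp2_obs)
    trace = pm.sample(1000, tune=1000)

# Test case: pred1 = p1 and resp1 = R1 are known, resp2 is wanted.
p1, R1 = 0.3, 1.2
with model:
    pm.set_data({"pred1": np.array([p1]),
                 "resp1_in": np.array([R1]),
                 "resp2_obs": np.array([0.0])})   # dummy, only to keep shapes consistent
    post_pred = pm.sample_posterior_predictive(trace)

# post_pred['resp1'] still comes out as a distribution (re-simulated from its
# likelihood), not R1; resp2's predictive does pick up R1, since R1 enters its
# mean through the resp1_in data container.
```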
Thanks, now I know that for sure. I am still confused about the functionality behind the different tools. What are the main differences between PyMC3 and similar tools (Stan?) and what is implemented by Netica software (pomegranate?)?
The latter, I guess, is called a Bayes net; it supports only categorical variables and requires CPTs, but there are ways to deduce the network structure and CPTs from a large enough dataset.
PyMC, on the other hand, operates on continuous distributions, its model structure needs to be defined manually through many iterations, and it doesn’t learn on the fly…