Meaning of various posterior predictions with multiple observations (Bayesian Network)

dhajnes · September 15, 2021, 1:52pm

Hello,

I have a question about inference in a model with multiple observations. I am modelling a Bayesian network in which all observations should inform the weights of a Dirichlet distribution.

What I expected:

I expected to get a posterior prediction of the material category (‘matCat’) and of the density. I have mass and volume measurements: mass has no logical real-life reference, while volume has at least some, hence “VOLUME_MUS” and “VOLUME_SIGS”. Mass and volume are measurable, but density is not directly observable, in the sense that there is no easy way to measure it.

What I got:

I got N posterior predictions for mass, where N is the number of observation points for volume (here 5: I made 5 consecutive measurements, which are normally distributed around the true value).

I am trying this out on what is essentially a minimal working example, so I can share the code:

from scipy.stats import norm
import numpy as np
import pymc3 as pm
import matplotlib.pyplot as plt

VOLUME_MUS = np.array([10000, 15000])  # cm^3
VOLUME_SIGS = np.array([100, 100])

DENSITY_MUS = np.array([0.1, 0.3])  # g/cm^3
DENSITY_SIGS = np.array([0.3, 0.25])

ob = norm.rvs(loc=1000, scale=100, size=5)  # observation for mass
ob_vol = norm.rvs(loc=10000, scale=1000, size=5)  # observation for volume

with pm.Model() as mini:
    mat_cat = pm.Dirichlet('matCat', np.ones(2))
    volume = pm.NormalMixture('volu', w=mat_cat, mu=VOLUME_MUS, sigma=VOLUME_SIGS, observed=ob_vol)
    density = pm.NormalMixture('dens', w=mat_cat, mu=DENSITY_MUS, sigma=DENSITY_SIGS)
    
    mass = pm.Deterministic('mass', volume * density)
    obsv_mass = pm.Normal('obs_mass', mu=mass, sigma=100, observed=ob)

    start = pm.find_MAP()
    step = pm.Metropolis()
    trace = pm.sample(1000, start=start, step=step)
    burn_in = 500  # skips the n iters, where the MCMC is just wandering
    chain = trace[burn_in:]
    ppc = pm.sample_posterior_predictive(chain, var_names=['matCat'], model=mini)
    names = ["A", "B"]
    for i, samp in enumerate(ppc['matCat'].transpose()):
        prob = np.mean(samp)
        print("Prob for {} is {}.".format(names[i], prob))

    pm.traceplot(trace)
    pm.plot_posterior(chain)
    pm.autocorrplot(chain)
    plt.show()

The mentioned 5 mass estimations are shown here:

I do not want to estimate the masses again. I have measured both mass and volume; I just want those two measurements to be appropriately represented in the density.
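A possible reading of where the five mass values come from, as a minimal NumPy sketch rather than the PyMC3 model itself: in the code above, 'mass' is defined as volume * density, where 'volu' carries the five observed volumes and 'dens' is a single scalar. The sketch assumes that product is formed elementwise; `density_draw` is a hypothetical stand-in for one sampled density value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Five volume readings, mirroring ob_vol in the model (cm^3).
ob_vol = rng.normal(loc=10000, scale=1000, size=5)

# Hypothetical single density value (g/cm^3), standing in for one draw of 'dens'.
density_draw = 0.1

# Elementwise product: one mass value per volume reading.
mass = ob_vol * density_draw
print(mass.shape)
```

Under that assumption the deterministic node inherits the length of the observed volume vector, so the trace would hold one mass column per volume measurement rather than a single mass.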

Would you be so kind as to advise me on how to understand what PyMC3 is reporting and why it is happening? Am I missing something obvious, or am I using the inference in the wrong way?

If I change the number of data points for either of the observations, like this:

ob = norm.rvs(loc=1000, scale=100, size=5)
ob_vol = norm.rvs(loc=10000, scale=1000, size=15)

PyMC3 crashes with:

File "pymc3-playground.py", line 38, in <module>
    obsv_mass = pm.Normal('obs_mass', mu=mass, sigma=100, observed=ob)
.
.
File "/home/robot3/.local/lib/python3.6/site-packages/theano/gof/cc.py", line 1845, in __call__
    raise exc_value.with_traceback(exc_trace)
ValueError: Input dimension mis-match. (input[0].shape[0] = 5, input[1].shape[0] = 15)

So it clearly does some funky stuff with moving the mass predictions around, because the number of observation datapoints must match…
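The mismatch can be imitated outside PyMC3. The NumPy sketch below assumes, as an interpretation of the traceback and not something taken from the PyMC3 internals, that the likelihood compares the 5 observed masses with a mean vector of length 15 element by element. The zero arrays and the scalar 0.1 are placeholders, not data from the model.

```python
import numpy as np

ob = np.zeros(5)        # placeholder for 5 mass observations
ob_vol = np.zeros(15)   # placeholder for 15 volume observations

# The mean of the mass likelihood takes the length of the volume vector.
mass = ob_vol * 0.1

try:
    residual = ob - mass   # lengths 5 and 15 cannot be paired elementwise
    mismatch = False
except ValueError as err:
    mismatch = True
    print("Shape mismatch:", err)
```

If this reading is right, the requirement that both observation arrays have the same length follows from pairing each mass reading with one volume reading, not from any reshuffling of predictions.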

Thank you for any insights,

Andrej

PS: this is how the traces look. It is obvious that the mass is just moved around; the shape is not changed much, if at all:

dhajnes · March 4, 2022, 3:57pm

The problem was with using pm.Deterministic(). I chose a different solution, where the mass is not an observation but a free variable. Samples of mass are then divided by samples of volume to yield a density distribution. Those density samples are then used directly as observations of density.

Without pm.Deterministic() this strange problem went away.
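The division step can be sketched with NumPy alone. This is an illustration under assumed distributions, not the actual code used: the normal distributions and their parameters are borrowed from the `ob` and `ob_vol` lines of the original post, and the names `mass_samples`, `volume_samples` and `density_samples` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Assumed sampling distributions, reusing the parameters from the question.
mass_samples = rng.normal(loc=1000, scale=100, size=n)       # g
volume_samples = rng.normal(loc=10000, scale=1000, size=n)   # cm^3

# Pairwise ratio gives an empirical density distribution (g/cm^3).
density_samples = mass_samples / volume_samples

print("mean density:", density_samples.mean())
print("std of density:", density_samples.std())
```

An array like `density_samples` could then be supplied as the observed data for the 'dens' mixture, so that the Dirichlet weights are informed by density directly and no deterministic mass node is needed.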
