Stepwise Sampling of Multiple Observed Variables

nmwitzig · November 8, 2023, 1:35pm

Hi everyone!

I’m trying to do something fairly “simple”. Imagine I observe a process generating y2 and a sub-process which generates y, where y2 = f(y) I have an assumption about how y comes into place (i.e., I have a model of y) and how y2 and y are related (i.e., about f()).

I want to sample the parameters of the function that constitute y and those that make up the connection of y and y2. So far I just call multiple sample statements with obs= (which works for simpler models) but Im wondering how to approach this properly.

One idea would be the sample first from the posterior of the parameters of y (maybe in a separate call) or try to formulate a prior based on the empirical posterior samples of the parameters of y but I’m a bit lost right now.

This can be set up quite quickly (and as I said works), but for more complex models the chains dont converge.

Here is my MWE:

x = np.random.normal(0, 1, 1000)
m = 5.
b = 2.
y = m*x + b + np.random.normal(0, 1, 1000)

scale = np.random.normal(0, 1, 1000) + 5
scale


y2 = y*scale

with pm.Model() as model:
    m = pm.Normal('m', mu=0, sigma=1)
    b = pm.Normal('b', mu=0, sigma=1)

    error = pm.HalfNormal('error', sigma=1)

    scale = pm.HalfNormal('scale', sigma=1)

    y = pm.Normal('y', mu=m*x + b, sigma=error, observed=y)

    y2 = pm.Normal('y2', mu=y*scale, sigma=1, observed=y2)

    trace = pm.sample(1000, tune=1000, cores=4, chains=4, init="adapt_diag", return_inferencedata=True)

pm.summary(trace)

I appreciate your time and help! Thanks!

cluhmann · November 8, 2023, 1:47pm

If I understand correctly, you are already on the right track. Having multiple observed variables is conventional and if one contributes to the generative process of another, that’s fine. If you are having convergence problems, my guess is that the issue lies elsewhere (but that’s a guess). Do you have an example of a model with multiple observed variables that doesn’t converge?

nmwitzig · November 9, 2023, 7:12am

Hi @cluhmann thanks for reaching out!

It took me a while to come up with something more complicated, and in the end, I just decided it’s best to work with a setting as close as possible to my data. I simulated some data and applied my routine, which you can find in a gist here: https://nbviewer.org/gist/nmwitzig/3fa7617df49220f4e22cb02f2dea4268

We see that even with simulated data, the chains don’t converge and do not fully recover the parameters of the first stage (which I called the “performance measurement phase”); What makes things worse is probably that I observe only a single observation per person in the second stage (which I try to predict based on the performance measures).

Let me know if you need more! I know it’s a bit complicated setup in the Gist, but the modeling part in itself is (hopefully) quite straightforward.

Thank you for your time and patience!

Best
N

Topic		Replies	Views
Predicting from a model with multiple observed variables? Questions	4	4025	October 9, 2018
Sampling semantics of multiple observed variables Questions	5	2567	April 25, 2019
Multiple Obseravations Correlated Data v3 modeling	0	341	April 5, 2022
How Does the Sampler Use Covariate Observed Data? version agnostic	9	292	March 15, 2024
How to inference with interrelated observations version agnostic modeling	13	63	October 15, 2024

Stepwise Sampling of Multiple Observed Variables

Related topics