How Does the Sampler Use Covariate Observed Data?

hayfreed · March 5, 2024, 12:22pm

Hi everyone,
I am trying to understand the way that PyMC sampler draws from an observed random variable acting as a covariate in a regression. I am still a beginner in statistical modeling so I apologize if there are any terminological issues in my question.
Let’s say I have the following model, where I want to put priors (x_mu, x_sigma) on the observed data of the covariate (x) in order to account for potential measurement errors:

x_mu = pm.Normal("x_mu",0,1)
x_sigma = pm.HalfNormal("x_sigma",1)
x = pm.Normal("x",x_mu,x_sigma,observed=x_observed)

a = pm.Normal("alpha",0,1)
b = pm.Normal("beta",0,1)

y_mu = pm.Deterministic("y_mu",a+b*x)
y_sigma = pm.HalfNormal("y_sigma",1)
y = pm.Normal("y",y_mu,y_sigma,observed=y_observed)

My understanding is that, depending on the implementation, the sampler could:

use the observed data from X directly to compute the posterior of Y, essentially ignoring the mu and sigma terms for the downstream inference, while separately computing the posterior of X from the given prior/observed data combo. In this scenario, the posterior of X is the same regardless of the values of Y.
jointly compute the posterior distributions of both X and Y based on their observed data/priors. In this scenario, the posterior of X depends on the values of Y.
do something else that I have not understood.

While I have generally assumed that #2 is what is happening under the hood, I have empirically observed that when I run a model like the below code (computing X without actually doing the regression), the posterior of x is the same as in the model above which includes the regression, which to me indicates that something closer to #1 is happening under the hood.

x_mu = pm.Normal("x_mu",0,1)
x_sigma = pm.HalfNormal("x_sigma",1)
x = pm.Normal("x",x_mu,x_sigma,observed=x_observed)

If someone could help me to understand, or let me know whether my question is completely missing the point, it would be great! Also happy to read whatever documentation already exists that might give me a clue.

cluhmann · March 5, 2024, 2:24pm

In general, observed variables and unobserved variables are treated differently. In a standard regression set up, you would have a and b and y_sigma as your parameters (not directly observed) and you would take x and y as observed:

# observed data
x = [17, 7, 2, 21, 42]
y = [21, 23, 45, 31, 101]

# model parameters
a = pm.Normal("a")
b = pm.Normal("b")
y_sigma = pm.HalfNormal("y_sigma", 1)

# likelihood
likelihood = pm.Normal("likelihood", a+(b*x), y_sigma, observed=y_observed)

If you model x_mu and x_sigma, then you are doing something closer to an error-in-variables model, which is a thing, but is probably not what you’re trying to do.

iavicenna · March 5, 2024, 3:59pm

I am not sure how the sampler uses the observed value in conjunction with regression downstream but one suggested way to add noise to such a variable is a bit different than what you are doing. It is to create a latent variable, see for instance the answer here:

So in your case it would be:

import pymc as pm
import arviz as az

x_observed = [17, 7, 2, 21, 42]
y_observed = [21, 23, 45, 31, 101]

with pm.Model() as model:
  x_mu = pm.Normal("x_mu",0,1)
  x_sigma = pm.HalfNormal("x_sigma",1)
  x_err = pm.HalfNormal("x_err", 1)
  x_lat = pm.Normal("x_lat", x_mu, x_sigma)
  x = pm.Normal("x",x_lat, x_err, observed=x_observed)

  a = pm.Normal("alpha",0,1)
  b = pm.Normal("beta",0,1)

  y_mu = pm.Deterministic("y_mu",a+b*x_lat)
  y_sigma = pm.HalfNormal("y_sigma",1)
  y = pm.Normal("y",y_mu,y_sigma,observed=y_observed)

  trace = pm.sample()

The posterior I get from here is
test1

where as if I run this

with pm.Model() as model:
  x_mu = pm.Normal("x_mu",0,1)
  x_sigma = pm.HalfNormal("x_sigma",1)
  x_err = pm.HalfNormal("x_err", 1)
  x_lat = pm.Normal("x_lat", x_mu, x_sigma)
  x = pm.Normal("x",x_lat, x_err, observed=x_observed)

  trace = pm.sample()

the posterior I get is
test2

ricardoV94 · March 5, 2024, 4:55pm

You are correct that posterior sampling corresponds to #1. In general every model RV is given a value during sampling, for observed RVs this is fixed to the “observed values”.

Prior and posterior predictive sampling will still have uncertainty flowing, but not posterior where those are fixed.

The model @iavicenna described is a valid alternative, but both are fine. It’s a model choice

ricardoV94 · March 5, 2024, 4:56pm

@junpenglao talk on multivariate imputation touches on this question as well: Partial Missing Multivariate Observation and What to Do With Them by Junpeng Lao

iavicenna · March 5, 2024, 5:03pm

Interesting to know that the model where observed RVs are used for x is a valid choice and just replaces the input x, never thought of it before from that perspective. Updated my answer from “the suggested way” to “one suggested way”

hayfreed · March 6, 2024, 1:59pm

This was super helpful - thanks everyone who offered some help!

hayfreed · March 15, 2024, 10:46am

Hi @ricardoV94, thanks again for your helpful response here. I have a quick follow up question if you have a moment…what if the co-variate X has missing values? My assumption is that the non-missing values of X will be passed into the regression as deterministic scalars, and the missing values will be passed as distributions based on the prior put on X, but I wasn’t sure and haven’t figured out how to test this empirically yet.

ricardoV94 · March 15, 2024, 12:00pm

Yes that’s what should happen. I suggest writing a very simple model and looking at model.to_graphviz to confirm

hayfreed · March 15, 2024, 12:01pm

Thanks!

Topic		Replies	Views
Linear model where predictor also being sampled? Questions	1	551	September 7, 2018
Using a random variable as observed Questions	19	4027	October 20, 2023
Is there any material gently explain how PYMC3 class/function works Questions	2	637	June 29, 2019
Posterior values in factored pdf v5 modeling	7	483	September 8, 2022
Sample from posterior predictive after fixing latent varaibles Questions	1	582	October 23, 2018

How Does the Sampler Use Covariate Observed Data?

Related topics