Probabilistic model with deterministic switch

Hi,
I am trying to build a simple linear regression model with a switch for the observed variable, i.e., my model looks like this:

a ~ N(0, 1)
b ~ N(0, 1)
mu = a*x + b

y ~ N(mu, sigma) if x is not NaN, else y = 0

That is, when x is NaN, y is exactly 0. Usually I would just filter the data and drop all samples where x is NaN, but in my actual dataset I am building a multiple linear regression model (i.e., x and y are N-dimensional vectors), and the cases where some x_i is NaN need to be used for the fitting.

Here is my simple code example:

import pymc as pm
import numpy as np
import aesara.tensor as at
import arviz as az

# Simulate some data: mark ~10% of the x values as missing
N_samples = 100
p_NaN = 0.1
rng = np.random.RandomState(123)
x = rng.normal(10., 2., size=N_samples)
x[rng.rand(N_samples) < p_NaN] = np.nan

# True parameters
a, b = 2., 1.
sigma = 2.
y = a*x + b + sigma * rng.normal(0.0, 1.0, size=N_samples)
y[np.isnan(y)] = 0.0  # y is exactly 0 wherever x is NaN

Naively, I would have used something like this:

with pm.Model() as model_1:
    a = pm.Normal('a', 0.0, 1.0)
    b = pm.Normal('b', 0.0, 1.0)
    sigma = pm.Exponential('sigma', 1.0)
    mu = a*x + b
    y_normal = pm.Normal('y_normal', mu, sigma)

    pm.Deterministic('y', at.switch(np.isnan(x), 0.0, y_normal), observed=y)

However, this is not possible: pm.Deterministic does not accept an observed argument, because we cannot sample from deterministic variables in PyMC.

So, here is my attempt using aesara.tensor.switch():

with pm.Model() as model_2:
    a = pm.Normal('a', 0.0, 1.0)
    b = pm.Normal('b', 0.0, 1.0)
    sigma1 = pm.Exponential('sigma1', 1.0)
    sigma2 = pm.Exponential('sigma2', 1.0)
    # Mean 0 and a separate sigma for the rows where x is NaN
    mu = at.switch(np.isnan(x), 0.0, a*x + b)
    sigma = at.switch(np.isnan(x), sigma2, sigma1)
    pm.Normal('y', mu, sigma, observed=y)

    idata = pm.sample(target_accept=0.99)

As you can see, here I switch the mean to 0.0 when x is NaN, and also learn a separate sigma for those cases.
However, sampling from this model leads to many divergences:

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 80 seconds.
There were 767 divergences after tuning. Increase `target_accept` or reparameterize.
The chain reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
There were 789 divergences after tuning. Increase `target_accept` or reparameterize.
There were 872 divergences after tuning. Increase `target_accept` or reparameterize.
The chain reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
There were 789 divergences after tuning. Increase `target_accept` or reparameterize.
The chain reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.

Is there a better way to build such a model or parametrize it?

A deterministic switch is tricky to sample with HMC/NUTS. My suggestion is to use a multiplication “trick”:

x_copy = x.copy()
x_b = np.ones_like(x)
x_copy[np.isnan(x)] = 0.  # a * x_copy contributes 0 where x is NaN
x_b[np.isnan(x)] = 0.     # b * x_b contributes 0 where x is NaN

The result is the same as doing mu = at.switch(np.isnan(x), 0.0, a*x + b) but without the switch.
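A quick numpy sanity check of that equivalence (just a sketch, with plain float stand-ins for a and b):

# The masked-multiplication mu matches the switch-based mu exactly
a_val, b_val = 2.0, 1.0
mu_trick = a_val * x_copy + b_val * x_b
mu_switch = np.where(np.isnan(x), 0.0, a_val * x + b_val)
assert np.allclose(mu_trick, mu_switch)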
Then for sigma, don’t create a new free random variable for the NaN rows; instead, fix it to a constant (the value doesn’t matter, as it only adds a constant to the log_prob):

import aesara

sigma_fixed = aesara.shared(1.)

with pm.Model() as model_3:
    a = pm.Normal('a', 0.0, 1.0)
    b = pm.Normal('b', 0.0, 1.0)
    sigma = pm.Exponential('sigma', 1.0)
    sigma_ = at.stack([sigma, sigma_fixed])
    mu = a * x_copy + b * x_b
    # Index with 0 (valid rows) or 1 (NaN rows) to pick the matching sigma
    y_normal = pm.Normal('y_normal', mu, sigma_[np.isnan(x).astype(int)], observed=y)
    idata = pm.sample()
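After sampling, the fit can be checked with ArviZ (using the az import from the first snippet):

az.summary(idata, var_names=['a', 'b', 'sigma'])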

Thank you for the reply, this does work indeed. One thing I noticed is that I need to increase the number of tuning steps quite a lot; otherwise the acceptance probability is too low. Also, in your example the samples where x is NaN have a variance of 1. If I decrease this to better approximate the true value of exactly 0, the sampling leads to very low acceptance probabilities, no matter how many tuning steps I use (I tried up to 10000).

You are probably running into numerical issues. I think you already observed that changing that value does not change the posterior, as those rows only add a constant to the log_prob (for your model, it is equivalent to removing the NaNs).
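To make that concrete (a quick check with scipy; sigma_fixed = 1 as above): every NaN row has y == 0 and mu == 0, so its contribution to the log-probability involves none of a, b, or sigma:

from scipy.stats import norm

# Total contribution of the NaN rows: a constant offset, independent of a, b, sigma
n_nan = np.isnan(x).sum()
const = n_nan * norm.logpdf(0.0, loc=0.0, scale=1.0)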

Yes, true. It is just a bit inconvenient for predicting values, because the posterior prediction for samples with x == NaN will have that variance added. Of course, I can manually correct for it, but ideally I would not have to.
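For reference, a minimal sketch of that manual correction (assuming model_3 and idata from above, and PyMC v4 behaviour where sample_posterior_predictive returns an InferenceData):

with model_3:
    ppc = pm.sample_posterior_predictive(idata)

# The model predicts the NaN rows with sigma_fixed noise, but y is known
# to be exactly 0 there, so overwrite those draws
y_pred = ppc.posterior_predictive['y_normal'].values.copy()
y_pred[..., np.isnan(x)] = 0.0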