 # The pymc3 way: Linear regression and inferring given posterior/trace

Hello!

I’m a newbie and I have a question about how to set up a problem the "PyMC3 way".

Let’s take a simple linear model as an example:

```python
# From https://docs.pymc.io/notebooks/GLM-linear.html

from pymc3 import *
import numpy as np
import matplotlib.pyplot as plt
import theano

np.random.seed(123)

size = 200
true_intercept = 1
true_slope = 2

x = np.linspace(0, 1, size)
# y = a + b*x
true_regression_line = true_intercept + true_slope * x
y = true_regression_line + np.random.normal(scale=.5, size=size)

x_shared = theano.shared(x)

with Model() as model:  # model specifications in PyMC3 are wrapped in a with-statement
    # Define priors
    sigma = HalfCauchy('sigma', beta=10, testval=1.)
    intercept = Normal('Intercept', 0, sd=20)
    x_coeff = Normal('x', 0, sd=20)

    # Define likelihood
    likelihood = Normal('y', mu=intercept + x_coeff * x_shared,
                        sd=sigma, observed=y)

    # Inference!
    trace = sample(3000, cores=2)  # draw 3000 posterior samples using NUTS sampling
```

Now, if I get new observations for `x`, I know I can infer `y` by drawing samples with my model and trace:

```python
x_shared.set_value(np.array([0.55]))
ppc = sample_ppc(trace, model=model, samples=500)
plt.boxplot(ppc['y'])  # Ok, cool
```

Now let's say I want to use the trace to make an inference about `x`, given some observation for `y`.
I’m not even sure how I would set this up. Do I set up a new model and use the existing trace, or is there some clever Theano symbolic magic I can use? Or do I have to calculate an inverse covariance matrix, use a Cholesky decomposition, and so forth, from scratch?

What’s the pymc3 way to approach this so that I can take advantage of the same trace (and maybe the same model) to use observations of `x` to infer about `y` and observations of `y` to infer about `x`?

Thanks much for the help! I’ve enjoyed playing with the package.

You will have to change the way you write down the model: either model X and Y together as generated from a multivariate Gaussian, or put a prior on X.
You might find this blog post useful: https://junpenglao.xyz/Blogs/posts/2017-10-23-OOS_missing.html
A handy way to achieve this is to use `sampled` from @colcarroll: https://github.com/ColCarroll/sampled. The README should give you quite a bit of information to start.
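For intuition, the first suggestion (modelling x and y jointly as a multivariate Gaussian) boils down to closed-form Gaussian conditioning once you have the joint mean and covariance. A minimal NumPy sketch, using empirical moments of the thread's simulated data in place of posterior estimates (the function name is illustrative, not from the linked post):

```python
import numpy as np

np.random.seed(123)
size = 200
x = np.linspace(0, 1, size)
y = 1 + 2 * x + np.random.normal(scale=0.5, size=size)

# Joint mean and covariance of (x, y). Here they are empirical moments;
# in a full Bayesian treatment you would estimate them (e.g. with MvNormal).
mu = np.array([x.mean(), y.mean()])
cov = np.cov(np.vstack([x, y]))  # 2x2: [[Sxx, Sxy], [Sxy, Syy]]

def condition_x_on_y(y_obs, mu, cov):
    """Closed-form Gaussian conditional x | y = y_obs."""
    mean = mu[0] + cov[0, 1] / cov[1, 1] * (y_obs - mu[1])
    var = cov[0, 0] - cov[0, 1] ** 2 / cov[1, 1]
    return mean, var

cond_mean, cond_var = condition_x_on_y(2.1, mu, cov)
print(cond_mean, cond_var)
```

Note the conditional mean is pulled slightly toward the prior mean of x (0.5) rather than landing exactly on (2.1 - 1) / 2 = 0.55; that shrinkage is the correct Bayesian behaviour given the observation noise on y.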


Thank you for the links @junpenglao. Very clever. I think that solves this for me and gives me plenty to think about.

In case anyone comes searching for this in the future, one solution is roughly:

```python
from pymc3 import *
import numpy as np
import matplotlib.pyplot as plt
import theano

np.random.seed(123)

size = 200
true_intercept = 1
true_slope = 2

x = np.linspace(0, 1, size)
# y = a + b*x
true_regression_line = true_intercept + true_slope * x
y = true_regression_line + np.random.normal(scale=.5, size=size)

x_shared = theano.shared(x)
y_shared = theano.shared(y)

with Model() as model:  # model specifications in PyMC3 are wrapped in a with-statement
    # Define priors
    sigma = HalfCauchy('sigma', beta=10, testval=1.)
    intercept = Normal('Intercept', 0, sd=20)
    x_coeff = Normal('x', 0, sd=20)

    # Model x itself as observed draws from a Normal, so it can be imputed later
    x_mu = Normal('x_mu', mu=0.5, sd=0.1)
    x_modeled = Normal('x_modeled', mu=x_mu, sd=0.25, observed=x_shared)

    # Define likelihood
    likelihood = Normal('y', mu=intercept + x_coeff * x_modeled,
                        sd=sigma, observed=y_shared)

    # Inference!
    trace = sample(3000, cores=2)  # draw 3000 posterior samples using NUTS sampling

x_shared.set_value(np.array([0.55]))
ppc = sample_ppc(trace, model=model, samples=500)  # `y` should be around 2.1
plt.boxplot(ppc['y'])

y_shared.set_value(np.array([2.1]))
ppc = sample_ppc(trace, model=model, samples=500)  # `x_modeled` should be around 0.55
plt.boxplot(ppc['x_modeled'])
```

I think `set_value()` works both ways here: first `x_shared` holds the observed value while `y` is predicted, and then `y_shared` holds the observed value while `x_modeled` is imputed.
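For completeness, a quick-and-dirty alternative that reuses the original trace directly is to invert the regression line per posterior draw, x = (y - intercept) / slope. This ignores the observation noise and any prior on x, so it is only a rough sketch; the arrays below are simulated stand-ins for `trace['Intercept']` and `trace['x']`, not real sampler output:

```python
import numpy as np

np.random.seed(0)
# Simulated stand-ins for trace['Intercept'] and trace['x'],
# centred on the true values (illustrative only).
intercept_draws = np.random.normal(1.0, 0.05, size=3000)
slope_draws = np.random.normal(2.0, 0.1, size=3000)

y_obs = 2.1
# Invert y = a + b*x for each posterior draw: x = (y - a) / b
x_draws = (y_obs - intercept_draws) / slope_draws
print(x_draws.mean())  # should land near (2.1 - 1) / 2 = 0.55
```

Because each draw of (intercept, slope) produces one draw of x, the spread of `x_draws` reflects parameter uncertainty only, not the noise in a single y observation, which is why the full model above is the better approach.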
