Predicting from a model with multiple observed variables?


#1

I have a model (simplified model below) that has multiple observed variables that depend on each other. After model training, I’d like to predict by sampling from the posterior.
It seems like this version does not allow me to successfully predict both target variables.

Generate some example data:

seed = 42
np.random.seed(seed)

def get_data(n = 100):
    mu = 1000
    X_obs = stats.poisson.rvs(mu=mu, size=n).astype(float)
    r_obs = stats.beta.rvs(a=100, b=50, size=n)
    br_obs = stats.beta.rvs(a=90, b=10, size=n)
    w_obs = [0.25, 0.75]
    
    y_obs = X_obs * r_obs * br_obs
    
    return y_obs, r_obs, br_obs, w_obs, X_obs

n = 1000
y_obs, r_obs, br_obs, w_obs, X_obs = get_data(n)
R_obs = r_obs * X_obs
X_shared = tt.shared(X_obs)
with pm.Model() as model_not_predictable:
    
    sd_alpha = pm.HalfNormal("sd_a", sd=1, shape=2)
    sd_beta = pm.HalfNormal("sd_b", sd=1, shape=2)

    alpha = pm.HalfNormal("alpha", sd=sd_alpha, shape=2)
    beta = pm.HalfNormal("beta", sd=sd_beta, shape=2)
    
    r = pm.Beta("r", alpha=alpha[0], beta=beta[0], shape=n, testval=r_obs)
    
    mu_R = r * X_shared
    R = pm.Poisson('R', mu=mu_R, observed=R_obs)
    
    br = pm.Beta("br", alpha=alpha[1], beta=beta[1], shape=n, testval=br_obs)

    # y is a factor of R and br
    y_mu = R * br
    
    y = pm.Poisson('y', mu=y_mu, observed=y_obs)

If I sample from this model’s posterior after replacing X_shared with new values, only variable R, which is directly dependent on X_shared will reflect the updated X values.

It seems like I can get this work if I replace R in the definition of y_mu with either R_mu, which seems to make y_mu directly dependent on X_shared, with r * X_shared explicitly.

However, in my actual model, not this toy model, this leads to much slower sampling rates. Presumably that’s b/c the sampling of y is now dependent on add’l variables and can’t take advantage of R.

Question: am I approaching this correctly? Any obvious errors in the above model? Or, is the model below the only way to to do? Any ideas much appreciated!!

with pm.Model() as model_predictable:
    
    sd_alpha = pm.HalfNormal("sd_a", sd=1, shape=2)
    sd_beta = pm.HalfNormal("sd_b", sd=1, shape=2)

    alpha = pm.HalfNormal("alpha", sd=sd_alpha, shape=2)
    beta = pm.HalfNormal("beta", sd=sd_beta, shape=2)
    
    r = pm.Beta("r", alpha=alpha[0], beta=beta[0], shape=n, testval=r_obs)
    
    R_mu = r * X_shared
    R = pm.Poisson('R', mu=R_mu, observed=R_obs)
    
    br = pm.Beta("br", alpha=alpha[1], beta=beta[1], shape=n, testval=br_obs)
    
    y_mu = r * X_shared * br
    
    y = pm.Poisson('y', mu=y_mu, observed=y_obs)

#2

Dear @clausherther,
Just some brief notes:

if sd_alpha and sd_beta are hyper-priors, their shape should probably be set to 1. Unless: you intend to create [2 x 2] structures here, but maybe the following is better:

Indexing should work, but why don’t you separate the priors? That should make the model much clearer. I fear with the current construct, although shapes might match, the consecutive model operations leave some of these distributions without observation, thus they sample data-less.

I tend to use vector notation, and pm.math.dot, because what you are doing here is vector/matrix algebra.
For example, if X_shared is [100 x 2], then r should be the following with shape=(1, n):

r = pm.Beta("r", alpha=alpha[0], beta=beta[0], shape=(1, n), testval=r_obs)

Then the dot product should be of correct shape. Also, sampling is much faster. This has to do with the fact that patsy/numpy matrices are very efficient and increas sampling speed drastically.

You can check this notebook, from a previous thread in this forum, and convert the patsy design matrices to theano.shared. If I do have more time tomorrow, I’ll try to apply it to your example.

Cheers,

Falk


#3

HI @falk, thanks for those suggestions! I think I’m a little confused though. My model has multiple y’s, not multiple Xs, so I think the use case for me is a little different.
I’ve tried the different way to declare hyper-priors and it doesn’t seem to really make a difference. I just wrote it this way for the example to reduce code for the post.
Claus


#4

Sorry, I mixed that in my first response (it’s all just distributions after all).
I speculated that something would be wrong with the hyperpriors because you wrote that you can only get one of the observables to run, and because you reported slow sampling.

As I wrote, I will try to have a deeper look tomorrow, unless someone else is quicker.


#5

So here we go. Sorry again for my premature answer.

I now realized that I should have checked your get_data function more carefully. What you indeed do is that you multiply X_obs * r_obs * br_obs element-wise. This gives you one vector of n elements.

Then further down you specify the shape of r and br to be n. This means that there are n individual beta distributions (which gives quite a mess in the traceplots).
All of them get multiplied, again element-wise, to form y_mu. This means you infer n=1000 Poisson distributions, along with each n=1000 values for r and br, and all of those are inferred from only one observation each. I guess that is not what you intend.

Instead, I suppose you want to have the data wrapped under a single Poisson.
As, for example, here:

with pm.Model() as model_suggestion:
    alpha_r = pm.HalfNormal("alpha_r", sd=1)
    beta_r = pm.HalfNormal("beta_r", sd=1)
    r = pm.Beta("r", alpha=alpha_r, beta=beta_r)

    alpha_br = pm.HalfNormal("alpha_br", sd=1)
    beta_br = pm.HalfNormal("beta_br", sd=1)
    br = pm.Beta("br", alpha=alpha_br, beta=beta_br)


    y_mu = r * X_shared * br
    y = pm.Poisson('y', mu=y_mu, observed=y_obs)

But attention this does not sample well. This model still leaves ambiguity in how to separate factors r and br, because you only have one equation, but two unknowns, thus they cannot be fixed to distinct values. So the suggestion can be fixed by setting

r = pm.Beta("r", alpha=alpha_r, beta=beta_r, observed = r_obs)

or

br = pm.Beta("br", alpha=alpha_br, beta=beta_br, observed = br_obs)

or both.

Thus, could you please specify again what you observe, and what you intend to infer? I have the feeling that I still did not understand what you are trying to achieve. Thanks!

Hope to help,

Falk