Implementing ARD Regression

import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model:
    # Define hyper-prior
    alpha = pm.Gamma("alpha", alpha=1e-2, beta=1e-4, shape=X.shape[1])

    # Define priors
    w = pm.Normal("w", mu=0, sd=alpha, shape=X.shape[1])
    sigma = pm.HalfCauchy("sigma", beta=10)
    mu = tt.dot(w, X.T)

    # Define likelihood
    likelihood = pm.Normal("y", mu=mu, sd=sigma, observed=y)

I want to implement ARD linear regression with PyMC3, and I have written the code above. However, I'm not sure what happens when the alpha hyper-prior is given to "w". Does it select the corresponding alpha_i for each w_i correctly?

The hyperprior is just another distribution from which the std of the normal distribution is drawn. If the combination of this hyperprior and the likelihood cannot produce samples that are close to your data, then either the likelihood isn't adapted to your problem (wrong model) or your prior isn't adapted.

I suggest taking some samples from your model using this hyperprior and checking whether the resulting data points "make sense". If they don't, it might be that your hyperprior is wrong.
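
For example, you can simulate the generative process directly with NumPy. This is a rough sketch that assumes X is your (n, p) design matrix; note that NumPy's gamma is parametrized by shape and scale, so scale = 1/beta:

import numpy as np

rng = np.random.RandomState(0)
n_draws, n_features = 500, X.shape[1]

# Hyper-prior: alpha ~ Gamma(shape=1e-2, rate=1e-4), i.e. scale = 1e4
alpha = rng.gamma(1e-2, 1e4, size=(n_draws, n_features))
# Prior: w_i ~ Normal(0, sd=alpha_i), one sd per coefficient
w = rng.normal(0.0, alpha)
# Observation noise: sigma ~ HalfCauchy(beta=10)
sigma = np.abs(10.0 * rng.standard_cauchy(n_draws))
# Simulated data sets: y ~ Normal(w @ X.T, sigma)
y_sim = rng.normal(w @ X.T, sigma[:, None])

If the spread of y_sim is wildly different from your observed y, the priors are worth rethinking.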

Did I understand your question correctly?

@junpenglao can we generate prior predictive distributions with PyMC3? The prior choice comes up a lot in the questions here (and from colleagues), so simplifying the procedure could help. If it's not implemented, I might give it a go.

This question was recently brought up in another post.

There is a WIP pull request to implement this: https://github.com/pymc-devs/pymc3/pull/2983, but we ran into some complications, as there are bugs with repeated shapes/sizes.

For now, you can use this old (slow) implementation:
https://github.com/junpenglao/Planet_Sakaar_Data_Science/blob/master/WIP/Test_sample_prior.ipynb

I am not sure if any progress has been made recently (cc @colcarroll). Maybe it is worth implementing the slower version as a reference for now?

Yes, I've thought about pushing to roll out the current flawed version, since it works in most cases, but at the very least I would like to try to put some exceptions around the edge cases that cause trouble. @rlouf I would love help, either with the current attempt, or if you would like to start from scratch!

https://github.com/pymc-devs/pymc3/pull/2984 would be helpful, since it fixes some obvious bugs, but everything touching variable shapes is trickier than expected. It feels like merging that would make draw_values([model.vars], size=1000) just about do it.
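
For reference, the idea would look roughly like this. Note that draw_values is a PyMC3 internal rather than a public API, so treat this as a sketch of the intent, not something guaranteed to work until those shape fixes land:

from pymc3.distributions.distribution import draw_values

with model:
    # Draw 1000 samples from the prior of every free variable at once
    prior_draws = draw_values(model.vars, size=1000)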

I think the hyper-prior is not working as expected. I wanted to assign a single distribution from the "alpha" RV vector to each "w_i" in the "w" RV vector, so that each "w_i" has its own hyper-prior describing its std.

Instead of a vector of RVs, I'm now following the approach below.

with pm.Model() as model:
    w = []
    for i in range(X.shape[1]):
        # Define hyper-prior
        alpha_i = pm.Gamma("alpha%i" % i, alpha=1e-2, beta=1e-4)

        # Define prior
        w_i = pm.Normal("w%i" % i, mu=0, sd=alpha_i)
        w.append(w_i)

    mu = pm.math.dot(w, X.T)
    sigma = pm.HalfCauchy("sigma", beta=10)

    # Define likelihood
    likelihood = pm.Normal("y", mu=mu, sd=sigma, observed=y)

You don't need the for loop:

alpha = pm.Gamma('alpha', 1e-2, 1e-4, shape=X.shape[1])
w = pm.Normal('w', 0, alpha, shape=X.shape[1])

Thanks for the suggestion. That is what I tried first. However, the predictions were poor when I used the above script, so I assumed that it does not assign each alpha_i to the corresponding w_i as expected in ARD.

No, the two are the same; the poor fit is not related to this. Try following @rlouf's suggestion and check the data-generating process.
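
You can also convince yourself of the elementwise pairing outside of PyMC3; NumPy/SciPy broadcast a vector of sds the same way. A quick check:

import numpy as np
from scipy import stats

alpha = np.array([0.5, 2.0, 10.0])
w = np.array([0.1, -1.0, 3.0])

# Vectorized: sd alpha[i] is paired with w[i]...
vec_logp = stats.norm(0, alpha).logpdf(w)
# ...exactly as in the one-distribution-per-coefficient loop
loop_logp = [stats.norm(0, a).logpdf(x) for a, x in zip(alpha, w)]

assert np.allclose(vec_logp, loop_logp)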

Try putting a tighter prior on the sigma of the observed; a HalfCauchy with beta=10 has far too heavy a tail:

sigma = pm.HalfNormal("sigma", 1.)

In general, you can think of the error/variance on the observed as being contributed by the smoothing (covariation with the nearby points) and the measurement error (the variance at each observation). So a better way to restrict it is to model the total variance. You can have a look at my notebook for a bit more on this perspective: https://github.com/junpenglao/Bayesian_Smoothing_EyeMovement/blob/master/Bayesian_Smoothing.ipynb
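
One way to express that idea in PyMC3, as a rough sketch of the total-variance trick (my reading of it, not necessarily what the notebook does): put a prior on the total sd and split the variance between the components.

with pm.Model() as model:
    # Prior on the total sd of the observed
    total_sd = pm.HalfNormal("total_sd", 1.0)
    # Fraction of the total variance attributed to measurement error
    frac = pm.Beta("frac", 2.0, 2.0)
    # Measurement-error sd implied by that split
    sigma = pm.Deterministic("sigma", total_sd * pm.math.sqrt(frac))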
