Implementing ARD Regression

import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model:
    # Define hyper-prior
    alpha = pm.Gamma("alpha", alpha=1e-2, beta=1e-4, shape=X.shape[1])

    # Define priors
    w = pm.Normal("w", mu=0, sd=alpha, shape=X.shape[1])
    sigma = pm.HalfCauchy("sigma", beta=10)
    mu = tt.dot(w, X.T)

    # Define likelihood
    likelihood = pm.Normal("y", mu=mu, sd=sigma, observed=y)

I want to implement ARD linear regression with PyMC3, and I have written the code above. However, I'm not sure what happens when the alpha hyper-prior is given to "w". Does it select the corresponding alpha_i for each w_i correctly?

The hyperprior is just another distribution from which the std of the normal distribution is drawn. If the combination of this hyperprior and the likelihood cannot produce samples that are close to your data, then either the likelihood isn't adapted to your problem (wrong model) or your prior isn't adapted.

I suggest taking some samples from your model using this hyperprior and checking whether the resulting data points "make sense". If they don't, it might be that your hyperprior is wrong.
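
For example, you can simulate the generative process directly with NumPy. This is a rough sketch that assumes X is your (n, p) design matrix; note that NumPy's gamma is parametrized by shape and scale, so scale = 1/beta:

import numpy as np

rng = np.random.RandomState(0)
n_draws, n_features = 500, X.shape[1]

# Hyper-prior: alpha ~ Gamma(shape=1e-2, rate=1e-4), i.e. scale = 1e4
alpha = rng.gamma(1e-2, 1e4, size=(n_draws, n_features))
# Prior: w_i ~ Normal(0, sd=alpha_i), one sd per coefficient
w = rng.normal(0.0, alpha)
# Observation noise: sigma ~ HalfCauchy(beta=10)
sigma = np.abs(10.0 * rng.standard_cauchy(n_draws))
# Simulated data sets: y ~ Normal(w @ X.T, sigma)
y_sim = rng.normal(w @ X.T, sigma[:, None])

If the spread of y_sim is wildly different from your observed y, the priors are worth rethinking.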

Did I understand your question correctly?

@junpenglao can we generate prior predictive distributions with PyMC3? The prior choice comes up a lot in the questions here (and from colleagues), so simplifying the procedure could help. If it's not implemented, I might give it a go.

This question was recently brought up in another post.

There is a WIP pull request to implement this: https://github.com/pymc-devs/pymc3/pull/2983, but we ran into some complications, as there are bugs with repeated shapes/sizes.

For now, you can use this old (slow) implementation:
https://github.com/junpenglao/Planet_Sakaar_Data_Science/blob/master/WIP/Test_sample_prior.ipynb

I am not sure if any progress has been made recently (cc @colcarroll). Maybe it is worth implementing the slower version as a reference for now?

Yes, I've thought about pushing to roll out the current flawed version, since it works in most cases, but at the very least I would like to try to put some exceptions around the edge cases that cause trouble. @rlouf I would love help, either with the current attempt, or if you would like to start from scratch!

https://github.com/pymc-devs/pymc3/pull/2984 would be helpful, since it fixes some obvious bugs, but everything touching variable shapes is trickier than expected. It feels like merging that would make draw_values([model.vars], size=1000) just about do it.
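
For reference, the idea would look roughly like this. Note that draw_values is a PyMC3 internal rather than a public API, so treat this as a sketch of the intent, not something guaranteed to work until those shape fixes land:

from pymc3.distributions.distribution import draw_values

with model:
    # Draw 1000 samples from the prior of every free variable at once
    prior_draws = draw_values(model.vars, size=1000)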

I think the hyper-prior is not working as expected. I wanted to assign a single distribution from the "alpha" RV vector to each "w_i" in the "w" RV vector, so that each "w_i" has its own hyper-prior describing its std.

Instead of a vector of RVs, I'm now following the approach below.

with pm.Model() as model:
    w = []
    for i in range(X.shape[1]):
        # Define hyper-prior
        alpha_i = pm.Gamma("alpha%i" % i, alpha=1e-2, beta=1e-4)

        # Define prior
        w_i = pm.Normal("w%i" % i, mu=0, sd=alpha_i)
        w.append(w_i)

    mu = pm.math.dot(w, X.T)
    sigma = pm.HalfCauchy("sigma", beta=10)

    # Define likelihood
    likelihood = pm.Normal("y", mu=mu, sd=sigma, observed=y)

You don't need the for loop:

alpha = pm.Gamma('alpha', 1e-2, 1e-4, shape=X.shape[1])
w = pm.Normal('w', 0, alpha, shape=X.shape[1])

Thanks for the suggestion. That is what I tried first. However, the predictions were poor when I used the above script, so I assumed that it does not assign each alpha_i to the corresponding w_i as expected in ARD.

No, the two are the same; the poor fit is not related to this. Try following @rlouf's suggestion and check the data-generating process.
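
You can also convince yourself of the elementwise pairing outside of PyMC3; NumPy/SciPy broadcast a vector of sds the same way. A quick check:

import numpy as np
from scipy import stats

alpha = np.array([0.5, 2.0, 10.0])
w = np.array([0.1, -1.0, 3.0])

# Vectorized: sd alpha[i] is paired with w[i]...
vec_logp = stats.norm(0, alpha).logpdf(w)
# ...exactly as in the one-distribution-per-coefficient loop
loop_logp = [stats.norm(0, a).logpdf(x) for a, x in zip(alpha, w)]

assert np.allclose(vec_logp, loop_logp)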

Try putting a tighter prior on the sigma of the observed; a HalfCauchy with beta=10 has far too heavy a tail:

sigma = pm.HalfNormal("sigma", 1.)

In general, you can think of the error/variance on the observed as being contributed by the smoothing (covariation with the nearby points) and the measurement error (the variance at each observation). So a better way to restrict it is to model the total variance. You can have a look at my notebook for a bit more on this perspective: https://github.com/junpenglao/Bayesian_Smoothing_EyeMovement/blob/master/Bayesian_Smoothing.ipynb
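
One way to express that idea in PyMC3, as a rough sketch of the total-variance trick (my reading of it, not necessarily what the notebook does): put a prior on the total sd and split the variance between the components.

with pm.Model() as model:
    # Prior on the total sd of the observed
    total_sd = pm.HalfNormal("total_sd", 1.0)
    # Fraction of the total variance attributed to measurement error
    frac = pm.Beta("frac", 2.0, 2.0)
    # Measurement-error sd implied by that split
    sigma = pm.Deterministic("sigma", total_sd * pm.math.sqrt(frac))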
