Hello, all. After spending some time on this forum, I wanted to get my feet wet, so I tried a more advanced example that I found online (my data is attached below). But no matter what I've done, my NUTS sampler either never finishes or runs into some error, even on a fast machine.
Here is my code attempting a basic linear regression:

```python
import numpy as np
import pymc3 as pm

with pm.Model() as model:
    # Priors
    beta = pm.Uniform('beta', shape=(135, 1))
    intercept = pm.Normal('intercept', mu=0, sd=3)
    std = pm.HalfNormal('std', sd=3)

    # Likelihood
    price = intercept + pm.math.dot(train_norm_retained, beta)
    y_lik = pm.Normal('y_lik', mu=price, sd=std, observed=np.log(SalePrice.values))

    trace = pm.sample(cores=4)
```
My data originally consisted of 270 variables, and at first the model simply refused to run at all; the kernel just died. To make NUTS's job easier, I standardized all of my features, log-transformed my outcome variable, and used feature selection to keep the most important 50% of the variables, so now there are only 135. Still, the model runs into plenty of errors, either `Mass matrix contains zeros on the diagonal` or some other seemingly random error.
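To be concrete about the preprocessing, here is a minimal self-contained sketch on synthetic data (all names here are made up, and the correlation-based ranking just stands in for the feature-selection step; my actual pipeline differs in the details):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10)) * rng.uniform(0.5, 5.0, size=10)  # raw features on different scales
y = np.exp(0.3 * X[:, 0] + rng.normal(size=100))                 # positive, skewed outcome

# Standardize each feature column to mean 0, std 1
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Log-transform the outcome so the likelihood sees something closer to normal
log_y = np.log(y)

# Keep the 50% of features most (absolutely) correlated with the log outcome;
# corrcoef with rowvar=False appends log_y as the last variable, so the last
# row holds its correlation with every feature
corr = np.abs(np.corrcoef(X_std, log_y, rowvar=False)[-1, :-1])
keep = np.argsort(corr)[-X.shape[1] // 2:]
X_retained = X_std[:, keep]  # analogous to my train_norm_retained
```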
I also tried exoplanet's PyMC3 modifications, and that version gets a bit further with my model, but then fails with a LAPACK error saying the `__'th leading minor of the array is not positive definite`.
Is there anything else I could do to get this model to work, or is my data just that unwieldy? The latter would be fine by me, but I would at least like confirmation that that is indeed the case.
train_norm_retained.csv (1.3 MB)
SalePrice.csv (9.9 KB)