"Input dimension mis-match" in basic model?

Gon_F · January 21, 2019, 8:33am

Hello. I just started diving into PyMC3 after working with machine learning and wanting more flexible Bayesian models. To start, I wanted to run a basic regression on a housing dataset from Kaggle that I cleaned, but my model keeps giving me a dimension mismatch error no matter what I do.

This is my code:

with pm.Model() as model:
    
    # Priors 
    beta = pm.Normal('beta', mu=0, sd=10000, shape=list(train_norm.columns))
    intercept = pm.Normal('intercept', mu=0, sd=10000)

    std = pm.HalfNormal('std', sd=100)

    # Likelihood
    price = intercept + beta*train_norm
    
    y_lik = pm.Normal('y_lik', mu=price, sd=std, observed=SalePrice)

    trace = sample()

The resulting error is: “Input dimension mis-match. (input[0].shape[1] = 1460, input[1].shape[1] = 270)”.

I uploaded my input and output data, which can be loaded back in easily with:

SalePrice = pd.read_csv('SalePrice.csv')
train_norm = pd.read_csv('train_norm.csv')

# Minor editing
train_norm.drop(axis=1, labels='Unnamed: 0', inplace=True)
train_norm = train_norm.iloc[1:,:]

Could anyone please tell me what I am doing wrong?

train_norm.csv (1.9 MB)
SalePrice.csv (15.9 KB)

junpenglao · January 21, 2019, 9:33am

You need a dot product here, in place of the element-wise product beta*train_norm.

Gon_F · January 21, 2019, 8:41pm

So, I tried out your suggestion.

First, I tried
price = intercept + np.dot(beta, train_norm)
which ran endlessly without any messages (I have a fairly decent CPU, so something else must have been going wrong).

Next, inspired by the suggestion and after reading the API quickstart more closely, I tried
price = intercept + beta.dot(train_norm)
which actually ran (displaying a message about Theano first), but ended up giving me a new error:
shapes (270,) and (1460,270) not aligned: 270 (dim 0) != 1460 (dim 0).

Attempting to fix that by manually setting shape=(1460,270) instead (I have 1460 rows and 270 variable columns in the input dataframe) gave me this error:
shapes (1460,270) and (1460,270) not aligned: 270 (dim 1) != 1460 (dim 0)

I am rather confused. I would greatly appreciate any more help.

junpenglao · January 21, 2019, 9:33pm

Try:

beta = pm.Normal('beta', mu=0, sd=10000, shape=(270, 1))
price = intercept + pm.math.dot(train_norm, beta)
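
For anyone reading later, the shape logic behind this suggestion can be checked in plain NumPy, without building a model. This is only a sketch under assumptions: X, rng and y_flat are hypothetical stand-ins filled with random numbers, not the data from the thread, and NumPy's dot is used here only to illustrate the shapes that pm.math.dot has to line up.

```python
import numpy as np

rng = np.random.default_rng(0)

n_rows, n_features = 1460, 270               # sizes quoted in the thread
X = rng.normal(size=(n_rows, n_features))    # hypothetical stand-in for train_norm
beta = rng.normal(size=(n_features, 1))      # stand-in for one draw of beta
intercept = 0.5                              # arbitrary illustrative value

# (1460, 270) dot (270, 1) gives (1460, 1): one predicted value per row
price = intercept + X.dot(beta)
print(price.shape)

# A flat beta on the left of the data matrix reproduces the "not aligned" error
try:
    rng.normal(size=n_features).dot(X)
    misaligned = False
except ValueError as err:
    misaligned = True
    print("misaligned:", err)

# A column-shaped mean compared against a flat target broadcasts to a square
# array, so the observed values may need to be a column as well
y_flat = rng.normal(size=n_rows)             # hypothetical flat target
residual_shape = (price - y_flat).shape
print(residual_shape)
```

Because the mean ends up as a column of shape (1460, 1), it is probably worth checking that the observed values have the same column shape. Whether that applies to the data in this thread is an assumption, not something the posts confirm.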

Gon_F · January 21, 2019, 10:36pm

This finally worked, thank you very much for your help! What had been eluding me was the two-number shape, as well as the need to use PyMC3’s own math operations.

Now Python keeps crashing when I try to run the model. I figured it was because of the large number of variables in my data, and I have managed to run the model with 40 of the 270 variables (that being the limit). Do you have any advice, or resources you could point me toward, on running PyMC3 with large data? I have heard that NUTS can supposedly handle hundreds of variables.

Gon_F · January 21, 2019, 11:02pm

Coincidentally, after further sleuthing it was you helping another user with the same problem that gave me the solution.

So, unless you know how to fix the memory issues that now prevent multi-core use, I’ll be looking into how to fix them on OS X El Capitan.

junpenglao · January 22, 2019, 5:56am

I would not expect a memory error with an input matrix of this size. Could you try casting all pandas tables into NumPy arrays? Something like train_norm = train_norm.values
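
A minimal sketch of that casting step, assuming small hypothetical frames in place of the uploaded CSV files (the column names a and b are made up for illustration):

```python
import numpy as np
import pandas as pd

# Small hypothetical frames standing in for train_norm.csv and SalePrice.csv
train_norm = pd.DataFrame({"a": [0.1, 0.2, 0.3], "b": [1.0, 0.0, 1.0]})
SalePrice = pd.DataFrame({"SalePrice": [100.0, 150.0, 200.0]})

# .values drops the index and column labels and returns a plain ndarray
X = train_norm.values   # shape (3, 2)
y = SalePrice.values    # shape (3, 1), a column

print(type(X).__name__, X.shape, y.shape, X.dtype)
```

The arrays X and y would then be passed to the model in place of the dataframes.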
