Hello. I just started diving into PyMC3 after working with machine learning and wanting more flexible Bayesian models. To start, I wanted to run a basic regression on a housing dataset from Kaggle that I cleaned, but my model keeps giving me a dimension mismatch error no matter what I do.
First, price = intercept + np.dot(beta, train_norm) ran endlessly without any messages (and I have a fairly decent CPU, so something else must not have been working).
Next, following the suggestion and reading the API quickstart more closely, I tried price = intercept + beta.dot(train_norm), which actually ran (displaying a message about Theano first) but ended with a new error: shapes (270,) and (1460,270) not aligned: 270 (dim 0) != 1460 (dim 0).
Attempting to fix that by manually setting shape=(1460,270) (I have 1460 rows and 270 variable columns in the input dataframe) gave me a different error: shapes (1460,270) and (1460,270) not aligned: 270 (dim 1) != 1460 (dim 0).
I am rather confused. I would greatly appreciate any further help.
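Both error messages quoted above follow from the matrix-product rule that NumPy uses: the last axis of the left operand must have the same length as the first axis of the right operand. A minimal sketch with zero-filled placeholder arrays (the names X and beta and the values are illustrative, not the actual housing data):

```python
import numpy as np

n_rows, n_features = 1460, 270        # sizes quoted in the post
X = np.zeros((n_rows, n_features))    # stand-in for train_norm
beta = np.zeros(n_features)           # one coefficient per column

# beta.dot(X): (270,) against (1460, 270), so 270 is compared with 1460 and fails
try:
    beta.dot(X)
    beta_first_ok = True
except ValueError as err:
    beta_first_ok = False
    print(err)

# X.dot(beta): (1460, 270) against (270,), so 270 matches 270
mu = X.dot(beta)
print(mu.shape)  # (1460,), one predicted value per row
```

Putting the data matrix on the left and the coefficient vector on the right is what yields one prediction per house.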
This finally worked, thank you very much for your help! It was the two numbers in the shape option, as well as the need to use PyMC3's own math operations, that had been eluding me.
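For anyone landing here with the same error, a minimal sketch of a regression model whose shapes line up might look like the following. It assumes PyMC3 3.x is installed; the priors, the variable names, and the helper build_model are illustrative choices, not the exact code from this thread. It uses a coefficient vector of length 270; a two-number shape such as (270, 1) should also work, provided the observed prices are reshaped to (1460, 1) to match.

```python
import numpy as np

def build_model(X, y):
    """Linear regression with one coefficient per column of X (hypothetical helper)."""
    import pymc3 as pm  # imported here so the data setup below runs without PyMC3

    n_features = X.shape[1]
    with pm.Model() as model:
        # Illustrative priors; older PyMC3 releases spell the keyword sd instead of sigma
        intercept = pm.Normal("intercept", mu=0.0, sigma=10.0)
        beta = pm.Normal("beta", mu=0.0, sigma=1.0, shape=n_features)
        noise = pm.HalfNormal("noise", sigma=1.0)
        # pm.math.dot builds a Theano expression; with X first the shapes are
        # (n_rows, n_features) . (n_features,) -> (n_rows,)
        mu = intercept + pm.math.dot(X, beta)
        pm.Normal("price", mu=mu, sigma=noise, observed=y)
    return model

# Synthetic stand-in data with the sizes from the post (assumed, not the real dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(1460, 270))
y = X @ rng.normal(size=270) + rng.normal(size=1460)
```

With pymc3 imported as pm, sampling would then be a matter of calling pm.sample() inside `with build_model(X, y):`.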
Now Python keeps crashing when I try to run the model. I figured it was because of the large number of variables in my data, and I have managed to run the model with 40 of the 270 variables (that seems to be the limit). Do you have any advice, or resources you could point me toward, on running PyMC3 with large data? I have heard that NUTS can handle hundreds of variables.
I would not expect a memory error with an input matrix of this size. Could you try casting all pandas tables to NumPy arrays? Something like train_norm = train_norm.values
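A short sketch of that conversion, using a tiny made-up DataFrame in place of the real training table (the column names and values are placeholders). The float64 cast is an added assumption, meant to make sure the result is a plain numeric array:

```python
import numpy as np
import pandas as pd

# Tiny placeholder table; the real train_norm has 1460 rows and 270 columns
train_norm = pd.DataFrame({"area": [0.1, -0.3, 1.2], "rooms": [1, 0, -1]})

# .values returns the underlying data as a NumPy array; to_numpy() is the newer equivalent
X = train_norm.values.astype("float64")

print(type(X).__name__, X.shape, X.dtype)  # ndarray (3, 2) float64
```

The same cast can be applied to the target column before it is passed to the model as observed data.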