I am trying to fit a linear regression model (using PyMC3) where one of the predictors has a posterior distribution. I normally use pm.Data to set the values of predictors, but I don’t think I can do that when the predictor is not a single column but a two-dimensional array where rows are posterior samples and columns are time points.
Is there any way to do that in PyMC3?
To give a concrete example, I am trying to predict revenue ~ B_trend * trend + B_seas * seas + B_spend * spend. Spend is a time series. Trend and seasonality, by contrast, are computed using an external library that returns posteriors in the form of 2D arrays.
How can I feed that to the regression?
You can definitely use multi-column matrices. But instead of using the multiplication operator, think of matrix multiplication (the “dot” operator). You can find all kinds of operations in the theano.tensor library (tt); for example, you can also stack column vectors. So try something like:
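A minimal sketch of the stack-and-dot idea, using NumPy only so the shapes can be checked outside a model. The data and the coefficient values are made up for illustration. Inside a PyMC3 model, tt.stack and tt.dot should play the analogous roles, with the coefficients being random variables rather than fixed numbers.

```python
import numpy as np

rng = np.random.default_rng(0)
n_time = 5  # hypothetical number of time points

# Hypothetical predictors, each a series of shape (n_time,)
trend = rng.normal(size=n_time)
seas = rng.normal(size=n_time)
spend = rng.normal(size=n_time)

# Stack the series as columns of an (n_time, 3) design matrix
X = np.stack([trend, seas, spend], axis=1)

# One coefficient per column, shape (3,); illustrative values only
beta = np.array([0.5, -1.0, 2.0])

# The matrix product replaces the sum of element-wise products
mu = np.dot(X, beta)

# Same quantity written term by term, for comparison
mu_elementwise = beta[0] * trend + beta[1] * seas + beta[2] * spend
```

Both mu and mu_elementwise have shape (n_time,) and agree up to floating-point error, which is a quick way to confirm the stacking axis is right.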
The tricky part is getting the dimensionality right (spend is the pm.Data tensor). I’m sure there are examples in the docs (search for theano.tensor). I also recently put one here.
Can I do that even if the X used for training was unidimensional? It only becomes bi-dimensional at prediction (since I am feeding the whole posterior predictive instead of just the mean/median).
I think changing dimensionality will most likely break the maths… unless my brain has just become too inflexible from all the shape errors it has had to overcome.
So you either have to initialize bi-dimensionally from the start (I always do that; most data vectors are n x 1 column vectors and slopes are 1 x o shaped tensors, with o being the number of observables).
Or you can solve the extra dimension by using a “loop”.
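One possible reading of the “loop” option, sketched in NumPy under assumptions: trend_draws and seas_draws stand in for the (draws x time) arrays from the external library, and beta is a fixed, made-up coefficient vector rather than a fitted posterior. Each posterior draw is treated as its own one-dimensional predictor, so the per-draw maths stays the same as in training.

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, n_time = 4, 6  # hypothetical sizes

# Hypothetical posterior draws: rows are samples, columns are time points
trend_draws = rng.normal(size=(n_draws, n_time))
seas_draws = rng.normal(size=(n_draws, n_time))
spend = rng.normal(size=n_time)  # observed series, one value per time point

beta = np.array([0.5, -1.0, 2.0])  # illustrative coefficients only

# Loop over draws: build a (n_time, 3) design matrix per draw and predict
mu_draws = np.empty((n_draws, n_time))
for i in range(n_draws):
    X_i = np.stack([trend_draws[i], seas_draws[i], spend], axis=1)
    mu_draws[i] = X_i @ beta

# Equivalent broadcast form, useful as a cross-check on the shapes
mu_broadcast = beta[0] * trend_draws + beta[1] * seas_draws + beta[2] * spend
```

The result has one row of predictions per posterior draw, so the uncertainty in trend and seasonality carries through to the predicted mean.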