Shared Theano variable in multiple regression


#1

I have a multiple regression model: y is the dependent variable with shape (12000,) and x is the independent variable with shape (12000, 23).
My model is pm.math.dot(x_shared, a) + c, where x_shared is theano.shared(x).

After sampling (30000 samples), I want to obtain the posterior predictive for x_new, which has shape (400, 23). I call x_shared.set_value(x_new) and then posterior_pred = pm.sample_posterior_predictive(trace, model=model, samples=100). I expect posterior_pred['obs'].shape to be (100, 400), but surprisingly I get (100, 12000)! Note that 12000 is the size of my input data for the 23 independent variables.
I am really confused about why I do not get the correct shape for my predicted values. The only difference I can see from the solved examples is that my input is a matrix, not a vector, because this is a multiple regression, not a one-dimensional regression.
Any ideas?
Thanks!
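A quick way to see the expected shapes is plain NumPy (a toy sketch only; the data and coefficient values here are made up to mirror the shapes described above):

```python
import numpy as np

# Stand-ins for the shapes described in the question
n_train, n_new, k = 12000, 400, 23
x = np.random.rand(n_train, k)      # training inputs, shape (12000, 23)
a = np.random.randn(k)              # coefficient vector, shape (23,)
c = 0.3                             # intercept

mu_train = np.dot(x, a) + c        # shape (12000,)

x_new = np.random.rand(n_new, k)    # new inputs, shape (400, 23)
mu_new = np.dot(x_new, a) + c      # shape (400,) -- per posterior draw,
                                    # the predictive should have this length
```

So with 100 posterior draws over the 400 new rows, the predictive samples should stack up to (100, 400), not (100, 12000).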


#2

Are you on the latest release or the master branch? This should be fixed now.


#3

pymc 3.6
theano 1.0.4


#4

I uninstalled conda and am trying to install everything again. I am on Windows. After installing Anaconda, is there any recommended order in which the packages should be installed? I think I should install Theano first, then PyMC3. For Theano, there are also some packages that its page recommends installing first, so I will install those, then Theano, then PyMC3.
If I am missing something, please let me know.
After this I will run the code again to see whether the problem is resolved, and I will keep you updated.
Thanks!


#5

FYI, this is the code I am using:

import numpy as np
import pymc3 as pm
import theano

n, k = 1000, 5
X = np.random.rand(n, k) * 5.
beta = np.random.randn(k, 1)
Y = np.dot(X, beta) + 0.3 + np.random.randn(n, 1) * .5  # noise shaped (n, 1) so Y matches np.dot(X, beta)

X_shared = theano.shared(X[:900])
Y_shared = theano.shared(Y[:900])

with pm.Model() as m:
    beta = pm.Normal('beta', 0., 100., shape=(k, 1))
    b = pm.Normal('b', 0., 100.)
    yhat = pm.math.dot(X_shared, beta) + b
    sigma = pm.HalfCauchy('sd', 2.5)
    obs = pm.Normal('y', yhat, sigma, observed=Y_shared)
    trace = pm.sample()

X_shared.set_value(X[900:])
posterior_pred = pm.sample_posterior_predictive(trace, model=m, samples=500)

#6

Thank you for sharing. My code is very similar. Maybe the problem is in my installation, so I am now re-installing everything. I will keep you posted. Thank you!


#7

Well, after installing everything I still have that bug. Maybe something is wrong with my data shapes. I will go through them on Monday.
Thanks!


#8

I’m curious how you installed pymc3. Did you use conda install pymc3 or pip install pymc3?

If you used one of those commands, it installed the latest release of pymc3. However, as @junpenglao said, we have fixed some things on the latest development branch. In particular, one fix was related to sampling from the posterior predictive when using shared variables. I recommend that you uninstall pymc3 and then install the latest development branch by calling:
pip install git+https://github.com/pymc-devs/pymc3.git

Let us know if your problem persists after this, and if it does, could you share a minimal example of code and data that produces the faulty behavior?


#9

Thank you Luciano. I initially installed everything using conda in the base environment, but after considering the possibility of an installation problem, I uninstalled everything. Then, after installing Anaconda, I made a new environment for Python 3.6. In it I installed Theano, TensorFlow, Keras, and PyMC3, and all of them can be imported without any problem. I still have the problem, so I decided to run @junpenglao’s code to see whether it gives me the same problem. It is running now and I will keep you posted.
P.S. I thought the problem with njobs > 1 had been solved in the newest version of PyMC3, but in this new installation I still have it and have to set njobs equal to 1.
Thanks!


#11

Dear @junpenglao and @lucianopaz, I think I found where the problem was. I am not sure whether you want to open an issue for the developers about this.
The problem happens when we use pm.Deterministic in the model. Starting from the code shared above, if you change yhat = pm.math.dot(X_shared, beta) + b to yhat = pm.Deterministic("yhat", pm.math.dot(X_shared, beta) + b), you do not get any error, but the dimension of the posterior predictive is incorrect: it matches your input training data instead of the new data. I am not sure whether this is a version-specific error or the same in all versions. I am currently using the most recent releases:

pymc 3.6
theano 1.0.4


#12

@madarshahian, there was an issue when sampling the posterior predictive with deterministics in the model. It is now fixed in the latest development branch (but not in the latest stable release on conda or pip, version 3.6). To install the latest development branch instead of the latest stable release:

  1. Remove pymc3 from the conda environment.
  2. Run pip install git+https://github.com/pymc-devs/pymc3.git

Only if your problem is still present in the latest development branch should you open an issue with a small code example that reproduces the faulty behavior.


#13

OK, great!
Thanks!