Shared Theano variable in multiple regression


#1

I have a multiple regression model: y is the dependent variable with shape (12000,) and x is the independent variable with shape (12000, 23).
My model is pm.math.dot(x_shared, a) + c, where x_shared is theano.shared(x).

After sampling (30000 samples), I want to obtain the posterior predictive for x_new, which has shape (400, 23). I call x_shared.set_value(x_new) and then posterior_pred = pm.sample_posterior_predictive(trace, model=model, samples=100). I expect posterior_pred['obs'].shape to be (100, 400), but surprisingly I get (100, 12000)! Note that 12000 is the size of my input data for the 23 independent variables.
I am really confused about why I do not get the correct shape for my predicted values. The only difference I can see from the solved examples is that my input is a matrix, not a vector, because this is a multiple regression, not a one-dimensional regression.
Any ideas?
Thanks!
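A quick way to see the expected shapes is plain NumPy (a toy sketch only; the data and coefficient values here are made up to mirror the shapes described above):

```python
import numpy as np

# Stand-ins for the shapes described in the question
n_train, n_new, k = 12000, 400, 23
x = np.random.rand(n_train, k)      # training inputs, shape (12000, 23)
a = np.random.randn(k)              # coefficient vector, shape (23,)
c = 0.3                             # intercept

mu_train = np.dot(x, a) + c        # shape (12000,)

x_new = np.random.rand(n_new, k)    # new inputs, shape (400, 23)
mu_new = np.dot(x_new, a) + c      # shape (400,) -- per posterior draw,
                                    # the predictive should have this length
```

So with 100 posterior draws over the 400 new rows, the predictive samples should stack up to (100, 400), not (100, 12000).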


#2

Are you on the latest release or the master branch? This should be fixed now.


#3

pymc 3.6
theano 1.0.4


#4

I uninstalled conda and am trying to install everything again. I am on Windows. After installing Anaconda, is there any recommended order in which the packages should be installed? I think I should install Theano first, then PyMC3. For Theano, there are also some packages that its page recommends installing first, so I will install those, then Theano, then PyMC3.
If I am missing something, please let me know.
After this I will run the code again to see whether the problem is resolved, and I will keep you updated.
Thanks!


#5

FYI, this is the code I am using:

import numpy as np
import pymc3 as pm
import theano

n, k = 1000, 5
X = np.random.rand(n, k) * 5.
beta = np.random.randn(k, 1)
Y = np.dot(X, beta) + 0.3 + np.random.randn(n, 1) * .5  # noise shaped (n, 1) so Y matches np.dot(X, beta)

X_shared = theano.shared(X[:900])
Y_shared = theano.shared(Y[:900])

with pm.Model() as m:
    beta = pm.Normal('beta', 0., 100., shape=(k, 1))
    b = pm.Normal('b', 0., 100.)
    yhat = pm.math.dot(X_shared, beta) + b
    sigma = pm.HalfCauchy('sd', 2.5)
    obs = pm.Normal('y', yhat, sigma, observed=Y_shared)
    trace = pm.sample()

X_shared.set_value(X[900:])
posterior_pred = pm.sample_posterior_predictive(trace, model=m, samples=500)

#6

Thank you for sharing. My code is very similar. Maybe the problem is in my installation, so I am now re-installing everything. I will keep you posted. Thank you!


#7

Well, after installing everything I still have that bug. Maybe something is wrong with my data shapes. I will go through them on Monday.
Thanks!


#8

I’m curious how you installed pymc3. Did you use conda install pymc3 or pip install pymc3?

If you used one of those commands, it installed the latest release of pymc3. However, as @junpenglao said, we have fixed some things on the latest development branch. In particular, one fix was related to sampling from the posterior predictive when using shared variables. I recommend that you uninstall pymc3 and then install the latest development branch by calling:
pip install git+https://github.com/pymc-devs/pymc3.git

Let us know if your problem persists after this, and if it does, could you share a minimal example of code and data that produces the faulty behavior?


#9

Thank you Luciano. I initially installed everything using conda in the base environment, but after considering the possibility of an installation problem, I uninstalled everything. Then, after installing Anaconda, I made a new environment for Python 3.6. In it I installed Theano, TensorFlow, Keras, and PyMC3, and all of them can be imported without any problem. I still have the problem, so I decided to run @junpenglao’s code to see whether it gives me the same problem. It is running now and I will keep you posted.
P.S. I thought the problem with njobs > 1 had been solved in the newest version of PyMC3, but in this new installation I still have it and have to set njobs equal to 1.
Thanks!


#11

Dear @junpenglao and @lucianopaz, I think I found where the problem was. I am not sure whether you want to open an issue for the developers about this.
The problem happens when we use pm.Deterministic in the model. Starting from the code shared above, if you change yhat = pm.math.dot(X_shared, beta) + b to yhat = pm.Deterministic("yhat", pm.math.dot(X_shared, beta) + b), you do not get any error, but the dimension of the posterior predictive is incorrect: it matches your input training data instead of the new data. I am not sure whether this is a version-specific error or the same in all versions. I am currently using the most recent releases:

pymc 3.6
theano 1.0.4


#12

@madarshahian, there was an issue when sampling the posterior predictive with deterministics in the model. It is now fixed in the latest development branch (but not in the latest stable release on conda or pip, version 3.6). To install the latest development branch instead of the latest stable release:

  1. Remove pymc3 from the conda environment.
  2. Run pip install git+https://github.com/pymc-devs/pymc3.git

Only if your problem is still present in the latest development branch should you open an issue with a small code example that reproduces the faulty behavior.


#13

OK, great!
Thanks!