Test_value shape errors with theano.shared

Hi,

I am running into problems when I use theano.shared variables. I have simplified the problem down to a minimal example below. When I make the ‘observed’ input a shared object and then operate on the resulting PyMC3 distribution, the final line raises the error shown below.

Any help would be much appreciated.

=======================

import numpy as np
import theano
floatX = theano.config.floatX
import pymc3 as pm

Xshape = (10,2)
X_train = np.random.randn(*Xshape).astype(floatX)
ann_input = theano.shared(X_train)

with pm.Model() as model:
    xTrain = pm.Normal('xTrain', mu=0, sd=1.,
                       observed=X_train,
                       total_size=Xshape)

    # Stand-in for the neural network; this works with the numpy-observed variable
    act_1 = xTrain + 1

    xTrain2 = pm.Normal('xTrain2', mu=0, sd=1.,
                        observed=ann_input,
                        total_size=Xshape)

    # The same operation on the shared-observed variable raises the error below
    act_1 = xTrain2 + 1

=======================

TypeError: For compute_test_value, one input test value does not have the requested type.

The error when converting the test value to that variable type:
Wrong number of dimensions: expected 1, got 2 with shape (10, 2).

This is a weird one; I am not sure how to debug it properly.
For now, a workaround is to turn off test-value computation (theano.config.compute_test_value) before the last statement:

with pm.Model() as model:
    ...
    theano.config.compute_test_value = 'off'
    act_1 = xTrain2 + 1

Otherwise, you can set compute_test_value in the model’s theano_config argument, like:

with pm.Model(theano_config={'compute_test_value': 'off'}) as model:
  ...

But it won’t work with

xTrain = pm.Normal('xTrain', mu=0, sd=1.,
                   observed=X_train,
                   total_size=Xshape)

which means you need to convert every numpy array input into a theano.shared variable.
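Concretely, the conversion I mean is wrapping every numpy input in theano.shared before it reaches the model, roughly like the sketch below (just the wrapping idea, using your names; not a complete working model):

import numpy as np
import theano
import pymc3 as pm

floatX = theano.config.floatX
Xshape = (10, 2)
X_train = np.random.randn(*Xshape).astype(floatX)
ann_input = theano.shared(X_train)  # shared wrapper instead of the raw ndarray

with pm.Model(theano_config={'compute_test_value': 'off'}) as model:
    xTrain = pm.Normal('xTrain', mu=0, sd=1.,
                       observed=ann_input,   # shared variable, not X_train
                       total_size=Xshape)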

OK, thanks for the quick response. The first method definitely fixes the issue I presented. The problem is that I then want to use ‘act_1’ in a likelihood, for instance:

yTrue = pm.Normal('yTrue', mu=act_1, sd=1, shape=Xshape)

which returns

AttributeError: 'scratchpad' object has no attribute 'test_value'

Passing testval=X_train to yTrue gives

ValueError: Cannot compute test value: input 0 (Elemwise{add,no_inplace}.0) of Op InplaceDimShuffle{x,0}(Elemwise{add,no_inplace}.0) missing default value.  

where it is complaining about the ‘xTrain2 + 1’ line. Also, I don’t understand how to get the second method to work. If I run

with pm.Model(theano_config={'compute_test_value': 'off'}) as model:    
    xTrain2 = pm.Normal('xTrain2', mu=0, sd=1., 
                   observed=ann_input, 
                   total_size=(Xshape))
    act_2 = xTrain2+1

I just get

AttributeError: 'scratchpad' object has no attribute 'test_value'

with or without explicitly passing a testval.

Could you please share more complete code (ideally with data)?

I can provide much fuller code, but I have tried to remove extraneous detail to focus on the problem I am having. I am trying to construct a neural network for a set of inputs and outputs that both have uncertainties. In the following, I have replaced the NN code with a much simpler model.

import numpy as np
import theano
floatX = theano.config.floatX
import pymc3 as pm

Xshape = (10,2)
X_train = np.random.randn(*Xshape).astype(floatX)
Y_train = np.random.randn(*Xshape).astype(floatX)
Xshared = theano.shared(X_train)
Yshared = theano.shared(Y_train)

with pm.Model() as model:

    x = pm.Normal('x', 
                  mu=0, 
                  sd=1., 
                  observed=Xshared, 
                  total_size=Xshape)

    theano.config.compute_test_value = 'off'

    # Neural network would go here; for clarity, replaced with a simple model
    nn_output = x + 1
    
    y = pm.Normal('y',
                   mu=nn_output,
                   sd=1,
                   shape=Xshape,
                   observed=Yshared,
                   total_size=Xshape
                   )

I have looked at your blog post (http://junpenglao.xyz/Blogs/posts/2017-10-23-OOS_missing.html). If I copy the model presented there and replace X_train and Y_train with theano.shared variables, I get the same issue:

Xshape = (10,2)
Yshape = (10,1)
X_train = theano.shared(np.random.randn(*Xshape).astype(floatX))
Y_train = theano.shared(np.random.randn(*Yshape).astype(floatX))

# build model, fit, and check trace
with pm.Model() as model:
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=10, shape=(2,))
     
    Xmu = pm.Normal('Xmu', mu=0, sd=10, shape=(2,))
    X_modeled = pm.Normal('X', mu=Xmu, sd=1., observed=X_train)
    
    mu = alpha + theano.tensor.dot(X_modeled, beta)
    sd = pm.HalfCauchy('sd', beta=10)
    y = pm.Normal('y', mu=mu, sd=sd, observed=Y_train)

OK, I see what you are trying to do now. I think you should go for an approach similar to my blog post, since the intention is to apply the function to the latent “true” variable of the input (otherwise you would just be applying the function directly to the input itself, which in this case amounts to nn_output = Xshared + 1).

with pm.Model() as model:
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=10, shape=(2,1))
     
    Xmu = pm.Normal('Xmu', mu=0, sd=10, shape=Xshape)
    X_modeled = pm.Normal('X', mu=Xmu, sd=1., observed=X_train)
    
    mu = alpha + theano.tensor.dot(Xmu, beta)
    sd = pm.HalfCauchy('sd', beta=10)
    y = pm.Normal('y', mu=mu, sd=sd, observed=Y_train)

OK, thanks. I think I am understanding a little better how to specify my model. Taking the simple linear regression case again, I have generated some fake data with uncertainties in both x and y. If I then want to predict y values for new x values, do I do the following?

## Mock linear regression data -- uncertainties in both x and y = 0.1
Xshape = (10,2)
Yshape = (10,1)
X_train_err = 0.1*np.abs(np.random.randn(*Xshape).astype(floatX))
Y_train_err = 0.1*np.abs(np.random.randn(*Yshape).astype(floatX))

X_train = (np.random.normal(scale=X_train_err)+np.random.randn(*Xshape).astype(floatX))
Y_train = (np.random.normal(scale=Y_train_err)+np.dot(X_train,np.array([0.8,0.5]))+0.3).astype(floatX)

with pm.Model() as model:
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=10, shape=(2,))
    
    Xmu = pm.Normal('Xmu', mu=0, sd=10, shape=Xshape)
    X_modeled = pm.Normal('X', mu=Xmu, sd=X_train_err, observed=X_train)
    
    mu = alpha + theano.tensor.dot(X_modeled, beta)
    y = pm.Normal('y', mu=mu, sd=Y_train_err, observed=Y_train)

with model:
    ftt = pm.fit(100000)
trace = ftt.sample(1000)

## New data
xTestErr = 0.1*np.abs(np.random.randn(*Xshape).astype(floatX))
xTestNErr = np.random.randn(*Xshape).astype(floatX)
xTest = np.random.normal(scale=xTestErr)+xTestNErr
with model:
    xTest2 = pm.Normal('xTest', mu=xTest, sd=xTestErr, shape=Xshape)
    mu = alpha + theano.tensor.dot(xTest2, beta)
    yTest = pm.Normal('yTest', mu=mu, sd=.01, observed=Y_train)
    TestModel = pm.sample_ppc(trace, vars=[yTest, xTest], samples=5000)

I do that in the blog post because I assume there are missing data. If there are no missing data and you are just using new inputs (e.g. from a test set) to predict new outputs, then it is easier to use theano.shared and .set_value.
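A rough sketch of that pattern, outside the errors-in-x setting (the 1-D y and the X_test array here are just assumptions to keep the shapes simple):

import numpy as np
import theano
import theano.tensor as tt
import pymc3 as pm

floatX = theano.config.floatX

X_train = np.random.randn(10, 2).astype(floatX)
Y_train = (np.dot(X_train, np.array([0.8, 0.5])) + 0.3).astype(floatX)
X_test = np.random.randn(10, 2).astype(floatX)  # new inputs, same shape as X_train

X_shared = theano.shared(X_train)
Y_shared = theano.shared(Y_train)

with pm.Model() as model:
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=10, shape=(2,))
    mu = alpha + tt.dot(X_shared, beta)
    sd = pm.HalfCauchy('sd', beta=10)
    y = pm.Normal('y', mu=mu, sd=sd, observed=Y_shared)
    trace = pm.sample(1000)

# point the shared input at the new data and draw posterior predictive samples
X_shared.set_value(X_test)
with model:
    ppc = pm.sample_ppc(trace, samples=1000)  # ppc['y'] now corresponds to X_test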

But isn’t it slightly more complicated than that here, since each of the new data points has its own associated latent variable?

Oh yes, you are right. In that case, what you are doing makes sense, with a small edit:

with model:
    xTest2 = pm.Normal('xTest', mu=xTest, sd=xTestErr, shape=Xshape)
    mu = alpha + theano.tensor.dot(xTest2, beta)
    yTest = pm.Normal('yTest', mu=mu, sd=.01, shape=Y_train.shape)
    TestModel = pm.sample_ppc(trace, vars=[yTest, xTest2], samples=5000)

OK, thanks for clarifying this. There was a mistake earlier in my ‘mu=…’ line, which should contain Xmu rather than X_modeled. How would I then use ADVI mini-batches with this model? I am unsure how to keep track of the correct ‘true’ x variables for each mini-batch. I have tried slicing Xmu, but this isn’t producing the correct results.

Xshape = (100,2)
Yshape = (100,1)
X_train_err = 0.1*np.abs(np.random.randn(*Xshape).astype(floatX))
Y_train_err = 0.1*np.abs(np.random.randn(*Yshape).astype(floatX))
X_train = (np.random.normal(scale=X_train_err)+np.random.randn(*Xshape).astype(floatX))
Y_train = (np.random.normal(scale=Y_train_err)+np.dot(X_train,np.array([0.8,0.5]))+0.3).astype(floatX)
indx = np.arange(Xshape[0])

batch_size=10
X_train, Y_train = pm.Minibatch(X_train, batch_size=batch_size), pm.Minibatch(Y_train, batch_size=batch_size)
X_train_err, Y_train_err = pm.Minibatch(X_train_err, batch_size=batch_size), pm.Minibatch(Y_train_err, batch_size=batch_size)
indx = pm.Minibatch(indx, batch_size=batch_size)

# build model
with pm.Model() as model:
    alpha = pm.Normal('alpha', mu=0, sd=10, testval=.5)
    beta = pm.Normal('beta', mu=0, sd=10, shape=(2,), testval=np.array([0.8, 0.5]))
    
    Xmu = pm.Normal('Xmu', mu=0, sd=10, shape=Xshape, total_size=Xshape)
    X_modeled = pm.Normal('X', mu=Xmu[indx], sd=X_train_err, 
                           observed=X_train, shape=Xshape, total_size=Xshape)
    mu = alpha + theano.tensor.dot(Xmu[indx], beta)
    y = pm.Normal('y', mu=mu, sd=Y_train_err, 
                  total_size=Yshape, observed=Y_train, shape=Yshape)

The way you have written it down makes sense to me, but to make the slicing more explicit, I will follow the approach from the pm.Minibatch docstring, which gives something like:

with pm.Model() as model:
    alpha = pm.Normal('alpha', mu=0, sd=10, testval=.5)
    beta = pm.Normal('beta', mu=0, sd=10, shape=(2,), testval=np.array([0.8, 0.5]))
    # the true latent variable, but with the total_size removed
    Xmu = pm.Normal('Xmu', mu=0, sd=10, shape=Xshape)
    X_modeled = pm.Normal('X', mu=Xmu[ridx], sd=X_train_err[ridx], 
                           observed=X_train[ridx], total_size=Xshape)
    mu = alpha + theano.tensor.dot(Xmu[ridx], beta)
    y = pm.Normal('y', mu=mu, sd=Y_train_err[ridx], 
                  total_size=Yshape, observed=Y_train[ridx])
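
Here ridx is the random integer index tensor from the pm.Minibatch docstring; a sketch of how it can be built (batch_size = 10 is just an example, and the arrays being sliced with it need to be theano.shared variables or tensors rather than raw numpy arrays):

batch_size = 10
# random indices into the first axis, redrawn every time the graph is evaluated
ridx = pm.tt_rng().uniform(size=(batch_size,), low=0,
                           high=Xshape[0] - 1e-10).astype('int64')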

OK, thanks for the quick response. You are very patient :slight_smile: So something like this?

Xshape = (100,2)
Yshape = (100,1)
X_train_err = 0.1*np.abs(np.random.randn(*Xshape).astype(floatX))
Y_train_err = 0.1*np.abs(np.random.randn(*Yshape).astype(floatX))
sd = theano.shared(np.copy(Y_train_err))
X_train = (np.random.normal(scale=X_train_err)+np.random.randn(*Xshape).astype(floatX))
Y_train = (np.random.normal(scale=Y_train_err)+np.dot(X_train,np.array([0.8,0.5]))+0.3).astype(floatX)
indx = np.arange(Xshape[0])

batch_size=10
ridx = pm.tt_rng().uniform(size=(batch_size,), low=0, high=Xshape[0]-1e-10).astype('int64')
X_train = theano.shared(X_train)
X_train_err = theano.shared(X_train_err)
Y_train = theano.shared(Y_train)
Y_train_err = theano.shared(Y_train_err)

with pm.Model() as model:
    alpha = pm.Normal('alpha', mu=0, sd=10, testval=.5)
    beta = pm.Normal('beta', mu=0, sd=10, shape=(2,), testval=np.array([0.8, 0.5]))
    # the true latent variable, but with the total_size removed
    Xmu = pm.Normal('Xmu', mu=0, sd=10, shape=Xshape)
    X_modeled = pm.Normal('X', mu=Xmu[ridx], 
                          sd=X_train_err[ridx], 
                          observed=X_train[ridx], 
                          total_size=Xshape)
    mu = alpha + theano.tensor.dot(Xmu[ridx], beta)
    y = pm.Normal('y', mu=mu, 
                  sd=Y_train_err[ridx], 
                  observed=Y_train[ridx], 
                  total_size=Yshape)

The final line gives an input mismatch error, as mu and Y_train[ridx] are not the same size (although they appear to be from the code…).

Hmmm, Y_train.shape is (100, 100): np.random.normal(scale=Y_train_err) has shape (100, 1), and adding the (100,)-shaped np.dot(X_train, np.array([0.8, 0.5])) to it broadcasts the result to (100, 100). You should do:

Y_train = (np.random.normal(scale=Y_train_err) +
           np.dot(X_train, np.array([[0.8, 0.5]]).T)+0.3).astype(floatX)
# and later on, in the pm.Model block (with theano.tensor imported as tt):
    y = pm.Normal('y', mu=mu, 
                  sd=tt.flatten(Y_train_err[ridx]), 
                  observed=tt.flatten(Y_train[ridx]), 
                  total_size=Yshape[0])

In general, you can check the shapes of the inputs to make sure they are as intended by doing something like (tt.flatten(Y_train_err[ridx])).tag.test_value
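
For example, something along these lines inside the model block (the expected shapes in the comments assume the corrected model above, with theano.tensor imported as tt and test values being computed, which is the default inside the model context):

print(Xmu[ridx].tag.test_value.shape)                   # expect (batch_size, 2)
print(mu.tag.test_value.shape)                          # expect (batch_size,)
print(tt.flatten(Y_train[ridx]).tag.test_value.shape)   # expect (batch_size,)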