Test_value shape errors with theano.shared

Hi,

I am running into problems when I use theano.shared variables. I have simplified the problem down to a minimal example below. When I make the ‘observed’ input a shared object and then operate on the resulting PyMC3 distribution, the final line raises the error shown below.

Any help would be much appreciated.

=======================

import numpy as np
import theano
floatX = theano.config.floatX
import pymc3 as pm

Xshape = (10,2)
X_train = np.random.randn(*Xshape).astype(floatX)
ann_input = theano.shared(X_train)

with pm.Model() as model:
    xTrain = pm.Normal('xTrain', mu=0, sd=1.,
                       observed=X_train,
                       total_size=Xshape)

    # Stand-in for the neural network; this works with the numpy-observed variable
    act_1 = xTrain + 1

    xTrain2 = pm.Normal('xTrain2', mu=0, sd=1.,
                        observed=ann_input,
                        total_size=Xshape)

    # The same operation on the shared-observed variable raises the error below
    act_1 = xTrain2 + 1

=======================

TypeError: For compute_test_value, one input test value does not have the requested type.

The error when converting the test value to that variable type:
Wrong number of dimensions: expected 1, got 2 with shape (10, 2).

This is a weird one; I am not sure how to debug it properly.
For now, a workaround is to turn off test-value computation (theano.config.compute_test_value) before the last statement:

with pm.Model() as model:
    ...
    theano.config.compute_test_value = 'off'
    act_1 = xTrain2 + 1

Otherwise, you can set compute_test_value in the model’s theano_config argument, like:

with pm.Model(theano_config={'compute_test_value': 'off'}) as model:
  ...

But it won’t work with

xTrain = pm.Normal('xTrain', mu=0, sd=1.,
                   observed=X_train,
                   total_size=Xshape)

which means you need to convert every numpy array input into a theano.shared variable.
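Concretely, the conversion I mean is wrapping every numpy input in theano.shared before it reaches the model, roughly like the sketch below (just the wrapping idea, using your names; not a complete working model):

import numpy as np
import theano
import pymc3 as pm

floatX = theano.config.floatX
Xshape = (10, 2)
X_train = np.random.randn(*Xshape).astype(floatX)
ann_input = theano.shared(X_train)  # shared wrapper instead of the raw ndarray

with pm.Model(theano_config={'compute_test_value': 'off'}) as model:
    xTrain = pm.Normal('xTrain', mu=0, sd=1.,
                       observed=ann_input,   # shared variable, not X_train
                       total_size=Xshape)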

OK, thanks for the quick response. The first method definitely fixes the issue I presented. The problem is that I then want to use ‘act_1’ in a likelihood, for instance:

yTrue = pm.Normal('yTrue', mu=act_1, sd=1, shape=Xshape)

which returns

AttributeError: 'scratchpad' object has no attribute 'test_value'

Passing testval=X_train to yTrue gives

ValueError: Cannot compute test value: input 0 (Elemwise{add,no_inplace}.0) of Op InplaceDimShuffle{x,0}(Elemwise{add,no_inplace}.0) missing default value.  

where it is complaining about the ‘xTrain2 + 1’ line. Also, I don’t understand how to get the second method to work. If I run

with pm.Model(theano_config={'compute_test_value': 'off'}) as model:    
    xTrain2 = pm.Normal('xTrain2', mu=0, sd=1., 
                   observed=ann_input, 
                   total_size=(Xshape))
    act_2 = xTrain2+1

I just get

AttributeError: 'scratchpad' object has no attribute 'test_value'

with or without explicitly passing a testval.

Could you please share more complete code (ideally with data)?

I can provide much fuller code, but I have tried to remove extraneous detail to focus on the problem I am having. I am trying to construct a neural network for a set of inputs and outputs that both have uncertainties. In the following, I have replaced the NN code with a much simpler model.

import numpy as np
import theano
floatX = theano.config.floatX
import pymc3 as pm

Xshape = (10,2)
X_train = np.random.randn(*Xshape).astype(floatX)
Y_train = np.random.randn(*Xshape).astype(floatX)
Xshared = theano.shared(X_train)
Yshared = theano.shared(Y_train)

with pm.Model() as model:

    x = pm.Normal('x', 
                  mu=0, 
                  sd=1., 
                  observed=Xshared, 
                  total_size=Xshape)

    theano.config.compute_test_value = 'off'

    # Neural network would go here; for clarity, replaced with a simple model
    nn_output = x + 1
    
    y = pm.Normal('y',
                   mu=nn_output,
                   sd=1,
                   shape=Xshape,
                   observed=Yshared,
                   total_size=Xshape
                   )

I have looked at your blog post (http://junpenglao.xyz/Blogs/posts/2017-10-23-OOS_missing.html). If I copy the model presented there and replace X_train and Y_train with theano.shared variables, I get the same issue:

Xshape = (10,2)
Yshape = (10,1)
X_train = theano.shared(np.random.randn(*Xshape).astype(floatX))
Y_train = theano.shared(np.random.randn(*Yshape).astype(floatX))

# build model, fit, and check trace
with pm.Model() as model:
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=10, shape=(2,))
     
    Xmu = pm.Normal('Xmu', mu=0, sd=10, shape=(2,))
    X_modeled = pm.Normal('X', mu=Xmu, sd=1., observed=X_train)
    
    mu = alpha + theano.tensor.dot(X_modeled, beta)
    sd = pm.HalfCauchy('sd', beta=10)
    y = pm.Normal('y', mu=mu, sd=sd, observed=Y_train)

OK, I see what you are trying to do now. I think you should go for an approach similar to my blog post, since the intention is to apply the function to the latent “true” variable of the input (otherwise you would just be applying the function directly to the input itself, which in this case amounts to nn_output = Xshared + 1).

with pm.Model() as model:
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=10, shape=(2,1))
     
    Xmu = pm.Normal('Xmu', mu=0, sd=10, shape=Xshape)
    X_modeled = pm.Normal('X', mu=Xmu, sd=1., observed=X_train)
    
    mu = alpha + theano.tensor.dot(Xmu, beta)
    sd = pm.HalfCauchy('sd', beta=10)
    y = pm.Normal('y', mu=mu, sd=sd, observed=Y_train)

OK, thanks. I think I am understanding a little better how to specify my model. Taking the simple linear regression case again, I have generated some fake data with uncertainties in both x and y. If I then want to predict y values for new x values, do I do the following?

## Mock linear regression data -- uncertainties in both x and y = 0.1
Xshape = (10,2)
Yshape = (10,1)
X_train_err = 0.1*np.abs(np.random.randn(*Xshape).astype(floatX))
Y_train_err = 0.1*np.abs(np.random.randn(*Yshape).astype(floatX))

X_train = (np.random.normal(scale=X_train_err)+np.random.randn(*Xshape).astype(floatX))
Y_train = (np.random.normal(scale=Y_train_err)+np.dot(X_train,np.array([0.8,0.5]))+0.3).astype(floatX)

with pm.Model() as model:
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=10, shape=(2,))
    
    Xmu = pm.Normal('Xmu', mu=0, sd=10, shape=Xshape)
    X_modeled = pm.Normal('X', mu=Xmu, sd=X_train_err, observed=X_train)
    
    mu = alpha + theano.tensor.dot(X_modeled, beta)
    y = pm.Normal('y', mu=mu, sd=Y_train_err, observed=Y_train)

with model:
    ftt = pm.fit(100000)
trace = ftt.sample(1000)

## New data
xTestErr = 0.1*np.abs(np.random.randn(*Xshape).astype(floatX))
xTestNErr = np.random.randn(*Xshape).astype(floatX)
xTest = np.random.normal(scale=xTestErr)+xTestNErr
with model:
    xTest2 = pm.Normal('xTest', mu=xTest, sd=xTestErr, shape=Xshape)
    mu = alpha + theano.tensor.dot(xTest2, beta)
    yTest = pm.Normal('yTest', mu=mu, sd=.01, observed=Y_train)
    TestModel = pm.sample_ppc(trace, vars=[yTest, xTest], samples=5000)

I do that in the blog post because I assume there are missing data. If there are no missing data and you are just using new inputs (e.g. from a test set) to predict new outputs, then it is easier to use theano.shared and .set_value.
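A rough sketch of that pattern, outside the errors-in-x setting (the 1-D y and the X_test array here are just assumptions to keep the shapes simple):

import numpy as np
import theano
import theano.tensor as tt
import pymc3 as pm

floatX = theano.config.floatX

X_train = np.random.randn(10, 2).astype(floatX)
Y_train = (np.dot(X_train, np.array([0.8, 0.5])) + 0.3).astype(floatX)
X_test = np.random.randn(10, 2).astype(floatX)  # new inputs, same shape as X_train

X_shared = theano.shared(X_train)
Y_shared = theano.shared(Y_train)

with pm.Model() as model:
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=10, shape=(2,))
    mu = alpha + tt.dot(X_shared, beta)
    sd = pm.HalfCauchy('sd', beta=10)
    y = pm.Normal('y', mu=mu, sd=sd, observed=Y_shared)
    trace = pm.sample(1000)

# point the shared input at the new data and draw posterior predictive samples
X_shared.set_value(X_test)
with model:
    ppc = pm.sample_ppc(trace, samples=1000)  # ppc['y'] now corresponds to X_test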

But isn’t it slightly more complicated than that here, since each of the new data points has its own associated latent variable?

Oh yes, you are right. In that case, what you are doing makes sense, with a small edit:

with model:
    xTest2 = pm.Normal('xTest', mu=xTest, sd=xTestErr, shape=Xshape)
    mu = alpha + theano.tensor.dot(xTest2, beta)
    yTest = pm.Normal('yTest', mu=mu, sd=.01, shape=Y_train.shape)
    TestModel = pm.sample_ppc(trace, vars=[yTest, xTest2], samples=5000)

OK, thanks for clarifying this. There was a mistake earlier in my ‘mu=…’ line, which should contain Xmu rather than X_modeled. How would I then use ADVI mini-batches with this model? I am unsure how to keep track of the correct ‘true’ x variables for each mini-batch. I have tried slicing Xmu, but this isn’t producing the correct results.

Xshape = (100,2)
Yshape = (100,1)
X_train_err = 0.1*np.abs(np.random.randn(*Xshape).astype(floatX))
Y_train_err = 0.1*np.abs(np.random.randn(*Yshape).astype(floatX))
X_train = (np.random.normal(scale=X_train_err)+np.random.randn(*Xshape).astype(floatX))
Y_train = (np.random.normal(scale=Y_train_err)+np.dot(X_train,np.array([0.8,0.5]))+0.3).astype(floatX)
indx = np.arange(Xshape[0])

batch_size=10
X_train, Y_train = pm.Minibatch(X_train, batch_size=batch_size), pm.Minibatch(Y_train, batch_size=batch_size)
X_train_err, Y_train_err = pm.Minibatch(X_train_err, batch_size=batch_size), pm.Minibatch(Y_train_err, batch_size=batch_size)
indx = pm.Minibatch(indx, batch_size=batch_size)

# build model
with pm.Model() as model:
    alpha = pm.Normal('alpha', mu=0, sd=10, testval=.5)
    beta = pm.Normal('beta', mu=0, sd=10, shape=(2,), testval=np.array([0.8, 0.5]))
    
    Xmu = pm.Normal('Xmu', mu=0, sd=10, shape=Xshape, total_size=Xshape)
    X_modeled = pm.Normal('X', mu=Xmu[indx], sd=X_train_err, 
                           observed=X_train, shape=Xshape, total_size=Xshape)
    mu = alpha + theano.tensor.dot(Xmu[indx], beta)
    y = pm.Normal('y', mu=mu, sd=Y_train_err, 
                  total_size=Yshape, observed=Y_train, shape=Yshape)

The way you have written it down makes sense to me, but to make the slicing more explicit, I will follow the approach from the pm.Minibatch docstring, which gives something like:

with pm.Model() as model:
    alpha = pm.Normal('alpha', mu=0, sd=10, testval=.5)
    beta = pm.Normal('beta', mu=0, sd=10, shape=(2,), testval=np.array([0.8, 0.5]))
    # the true latent variable, but with the total_size removed
    Xmu = pm.Normal('Xmu', mu=0, sd=10, shape=Xshape)
    X_modeled = pm.Normal('X', mu=Xmu[ridx], sd=X_train_err[ridx], 
                           observed=X_train[ridx], total_size=Xshape)
    mu = alpha + theano.tensor.dot(Xmu[ridx], beta)
    y = pm.Normal('y', mu=mu, sd=Y_train_err[ridx], 
                  total_size=Yshape, observed=Y_train[ridx])
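
Here ridx is the random integer index tensor from the pm.Minibatch docstring; a sketch of how it can be built (batch_size = 10 is just an example, and the arrays being sliced with it need to be theano.shared variables or tensors rather than raw numpy arrays):

batch_size = 10
# random indices into the first axis, redrawn every time the graph is evaluated
ridx = pm.tt_rng().uniform(size=(batch_size,), low=0,
                           high=Xshape[0] - 1e-10).astype('int64')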

OK, thanks for the quick response. You are very patient :slight_smile: So something like this?

Xshape = (100,2)
Yshape = (100,1)
X_train_err = 0.1*np.abs(np.random.randn(*Xshape).astype(floatX))
Y_train_err = 0.1*np.abs(np.random.randn(*Yshape).astype(floatX))
sd = theano.shared(np.copy(Y_train_err))
X_train = (np.random.normal(scale=X_train_err)+np.random.randn(*Xshape).astype(floatX))
Y_train = (np.random.normal(scale=Y_train_err)+np.dot(X_train,np.array([0.8,0.5]))+0.3).astype(floatX)
indx = np.arange(Xshape[0])

batch_size=10
ridx = pm.tt_rng().uniform(size=(batch_size,), low=0, high=Xshape[0]-1e-10).astype('int64')
X_train = theano.shared(X_train)
X_train_err = theano.shared(X_train_err)
Y_train = theano.shared(Y_train)
Y_train_err = theano.shared(Y_train_err)

with pm.Model() as model:
    alpha = pm.Normal('alpha', mu=0, sd=10, testval=.5)
    beta = pm.Normal('beta', mu=0, sd=10, shape=(2,), testval=np.array([0.8, 0.5]))
    # the true latent variable, but with the total_size removed
    Xmu = pm.Normal('Xmu', mu=0, sd=10, shape=Xshape)
    X_modeled = pm.Normal('X', mu=Xmu[ridx], 
                          sd=X_train_err[ridx], 
                          observed=X_train[ridx], 
                          total_size=Xshape)
    mu = alpha + theano.tensor.dot(Xmu[ridx], beta)
    y = pm.Normal('y', mu=mu, 
                  sd=Y_train_err[ridx], 
                  observed=Y_train[ridx], 
                  total_size=Yshape)

The final line gives an input mismatch error, as mu and Y_train[ridx] are not the same size (although they appear to be from the code…).

Hmmm, Y_train.shape is (100, 100): np.random.normal(scale=Y_train_err) has shape (100, 1), and adding the (100,)-shaped np.dot(X_train, np.array([0.8, 0.5])) to it broadcasts the result to (100, 100). You should do:

Y_train = (np.random.normal(scale=Y_train_err) +
           np.dot(X_train, np.array([[0.8, 0.5]]).T)+0.3).astype(floatX)
# and later on, in the pm.Model block (with theano.tensor imported as tt):
    y = pm.Normal('y', mu=mu, 
                  sd=tt.flatten(Y_train_err[ridx]), 
                  observed=tt.flatten(Y_train[ridx]), 
                  total_size=Yshape[0])

In general, you can check the shapes of the inputs to make sure they are as intended by doing something like (tt.flatten(Y_train_err[ridx])).tag.test_value
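
For example, something along these lines inside the model block (the expected shapes in the comments assume the corrected model above, with theano.tensor imported as tt and test values being computed, which is the default inside the model context):

print(Xmu[ridx].tag.test_value.shape)                   # expect (batch_size, 2)
print(mu.tag.test_value.shape)                          # expect (batch_size,)
print(tt.flatten(Y_train[ridx]).tag.test_value.shape)   # expect (batch_size,)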