Shape error when making out-of-sample predictions

Hi all,

I am getting a dimension mismatch error when trying to perform out-of-sample prediction.

My code is the following:

import pymc as pm
import numpy as np

def feature_test(x,N,M):
    idx = np.arange(M)
    matrix = np.tile(idx, (N, 1))
    return pm.math.sin(x @ matrix)

class CPD:
    def __init__(self,X,y,sqrt_prior_covariance,R,sigma_noise):
        self.y = y
        self.sqrt_prior_covariance = sqrt_prior_covariance
        self.R = R
        self.sigma_noise = sigma_noise
        self.N = np.shape(X)[0]
        self.D = np.shape(X)[1]
        self.M = np.shape(sqrt_prior_covariance)[0]
        self.P = self.M*self.R*self.D
        self.CPD_model = pm.Model()

        with self.CPD_model:
            X = pm.MutableData('X', X)
            # Define prior
            factors = np.empty(self.D, dtype=object)
            for d in range(self.D):
                factors[d] = (sqrt_prior_covariance / np.power(self.R, 1/(2*self.D))) @ pm.Normal(f'factor_{d}', mu=0, sigma=1, shape=(self.M, self.R))

            # Model output
            output = np.ones((self.N,self.R))
            for d in range(self.D):
                output = output*pm.math.matmul(feature_test(X[:,d],self.N,self.M),factors[d])
            output = pm.Deterministic('CPD_output',pm.math.sum(output,axis=1))

            # Likelihood (observed data)
            likelihood = pm.Normal('y', mu=output, sigma=self.sigma_noise, observed=self.y)

    def sample_prior(self,n_samples):
        with self.CPD_model:
            samples = pm.sample_prior_predictive(samples=n_samples,var_names=['CPD_output']).prior.CPD_output
        return samples
    
    def fit(self,n_samples):
        with self.CPD_model:
            self.trace = pm.sample(n_samples, tune=1000)

    def predict(self,X_test):
        with self.CPD_model:
            self.N = np.shape(X_test)[0]
            pm.set_data({"X": X_test})
            pm.sample_posterior_predictive(self.trace,predictions=True)

I get the error when I try to run:

N = 2
D = 3
X = np.random.normal(0,1, size=(N,D))
y = 10+np.random.normal(0,1, size=(N,))
M = 4
prior_cov = np.identity(M)

X_test = np.random.normal(0,1, size=(3,D))

cpd_model = CPD(X,y,prior_cov,4,0.1)

if __name__ == "__main__":
    cpd_model.fit(10) 
    print("FITTED")  
    cpd_model.predict(X_test)

How can I solve this? Do I have to get rid of the self.N variable? Thank you in advance.

Try to add shape=output.shape to the observed variable. You have to tell PyMC how the shape depends on the parameters if you want to change only the predictors.
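For example, a minimal sketch of that change using your variable names:

likelihood = pm.Normal('y', mu=output, sigma=self.sigma_noise, observed=self.y, shape=output.shape)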

Thanks, that is definitely needed, but not the source of the problem. The error I get is the following:

  File "C:\Users\LocalAdmin\anaconda3\envs\pymc_env\Lib\site-packages\pytensor\tensor\math.py", line 1945, in perform
    z[0] = np.asarray(np.dot(x, y))
                      ^^^^^^^^^^^^
ValueError: shapes (1,3) and (2,4) not aligned: 3 (dim 1) != 2 (dim 0)
Apply node that caused the error: dot(ExpandDims{axis=0}.0, [[0 1 2 3] ... [0 1 2 3]])
Toposort index: 15
Inputs types: [TensorType(float64, shape=(1, None)), TensorType(int32, shape=(2, 4))]
Inputs shapes: [(1, 3), (2, 4)]
Inputs strides: [(72, 24), (16, 4)]
Inputs values: [array([[-0.20188156, -0.36012242, -0.21725961]]), 'not shown']
Outputs clients: [[DropDims{axis=0}(dot.0)]]

Backtrace when the node is created (use PyTensor flag traceback__limit=N to make it longer):
  File "c:\Users\LocalAdmin\surfdrive\Code\TensorGP\debug_forum", line 63, in <module>
    cpd_model = CPD(X,y,prior_cov,4,0.1)
  File "c:\Users\LocalAdmin\surfdrive\Code\TensorGP\debug_forum", line 31, in __init__
    output = output*pm.math.matmul(feature_test(X[:,d],self.N,self.M),factors[d])
  File "c:\Users\LocalAdmin\surfdrive\Code\TensorGP\debug_forum", line 7, in feature_test
    return pm.math.sin(x @ matrix)
  File "c:\Users\LocalAdmin\surfdrive\Code\TensorGP\debug_forum", line 63, in <module>
    cpd_model = CPD(X,y,prior_cov,4,0.1)
  File "c:\Users\LocalAdmin\surfdrive\Code\TensorGP\debug_forum", line 31, in __init__
    output = output*pm.math.matmul(feature_test(X[:,d],self.N,self.M),factors[d])

HINT: Use the PyTensor flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.

Basically when predicting, I need to change the shape of the output variable, but I am unable to update self.N. It should be trivial, but I am a beginner.

Shape issues can be anything but trivial. My suggestion is to trim down the model to the simplest form and slowly add complexity. That will help you pinpoint where the problems begin.

Another tool that is very useful is to sprinkle some print(x.eval().shape), where x is any intermediate model variable, to see if the variables are coming out with the shapes you expected.
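For example, right after the line that defines output in your model (a sketch using your variable names):

print(output.eval().shape)  # should print (2,) for your data, i.e. (N,)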

The problem occurs only at prediction. I have no issues sampling from the prior, the posterior, or intermediate random variables. This is because at prediction the shape of X changes, but since I allocate output based on self.N and R, I am unable to update in place

output*pm.math.matmul(feature_test(X[:,d],self.N,self.M),factors[d])

as output and pm.math.matmul(feature_test(X[:,d],self.N,self.M),factors[d]) have different shapes: output is self.N x R, while pm.math.matmul(feature_test(X[:,d],self.N,self.M),factors[d]) is new_shape_of_X x R. All I need is to correctly update self.N, which is the first dimension of ‘X’, when predicting, but I don’t know what the best practice is for doing so.

Your model is rather unusual in its definition. Without giving any feedback on whether it makes sense to write it down like that, you would need to make self.N and self.M both pm.Data variables as well, which you update when doing predictions.

I tried in a simplified scenario:

import pymc as pm
import numpy as np

N = 2
N_test = 3
D = 4

X = np.random.normal(0,1, size=(N,D))
y = np.random.normal(0,1, size=(N,))
X_test = np.random.normal(0,1, size=(N_test,D))

model = pm.Model()
with model:
    X = pm.MutableData('X', X)
    N = pm.MutableData('N', N)
    N_int = np.int64(N.get_value())
    factors = pm.Normal('factor', mu=0, sigma=1, shape=(N_int,))
    output = np.ones((N_int,)) * factors
    output = pm.Deterministic('output', output)
    likelihood = pm.Normal('y', mu=output, sigma=1, observed=y, shape=output.shape)
    
    if __name__ == "__main__":
        trace = pm.sample()
        pm.set_data({'X': X_test, 'N':np.shape(X_test)[0]})
        samples = pm.sample_posterior_predictive(trace,predictions=True)
        print(samples.predictions)

Printing samples.predictions gives:

Dimensions:  (chain: 4, draw: 1000, y_dim_2: 2)
Coordinates:
  * chain    (chain) int32 0 1 2 3
  * draw     (draw) int32 0 1 2 3 4 5 6 7 8 ... 992 993 994 995 996 997 998 999
  * y_dim_2  (y_dim_2) int32 0 1
Data variables:
    y        (chain, draw, y_dim_2) float64 1.394 -2.583 ... -1.247 -0.0293

which is the wrong shape (it should be 3, not 2). The problem seems to be that I can’t make predictions with a model that is an explicit function of the shape of the data.

Replace N_int = np.int64(N.get_value()) with N_int = N.astype("int"), and change output = np.ones((N_int,))*factors to output = pm.math.ones((N_int,)) * factors.
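Putting both changes together, the middle of your simplified model would read (a sketch; with a symbolic N_int, the shapes now follow the data):

N = pm.MutableData('N', N)
N_int = N.astype("int")
factors = pm.Normal('factor', mu=0, sigma=1, shape=(N_int,))
output = pm.math.ones((N_int,)) * factors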

Can’t you pass dtype="int" directly to pm.MutableData to avoid having to convert it?


Do note that in your simplified model, when you change N for predictions you will lose all the information you gained from sampling the posterior of factors, as PyMC will also resample that variable (in this case resampling will be the same as sampling it from the prior).

When a PyMC variable depends on a MutableData whose value has changed between posterior sampling and posterior predictive sampling, PyMC assumes that none of what was learned about it is still valid.

I assume you don’t want to actually resize factors, but only output? In that case you should leave factors with a fixed shape.


This works! However, the main problem is a bit trickier. In the feature_test function I am using N in np.tile, so I would need N.eval(). I thus still get problems when performing output = output*pm.math.matmul(feature_test(X[:,d],self.N,self.M),factors[d]).

Could I avoid this whole mess if I could write the whole feature_test in PyTensor, without ever calling .eval()?

There was a bug (unrelated to the problems I mentioned) in feature_test.

This is the “new” version of it.

def feature_test_2(x, N, M):
    idx = np.arange(M)
    matrix = np.tile(idx, (N.eval(), 1))
    a = x[:, np.newaxis] * matrix
    return pm.math.sin(a)

I reckon that if I could write the first two lines of it in PyTensor, all problems would be solved: all the variables would be symbolic and there would be no need to carry ‘N’ around. I don’t know how to do that, though, without using numpy, which forces me to use N.eval().

@ricardoV94 I managed! Did not know about the basic module.

from pytensor.tensor import basic

def feature_test_2(x, M):
    idx = np.arange(M)
    matrix = basic.tile(idx, (x.shape[0], 1))
    a = x[:, np.newaxis] * matrix
    return pm.math.sin(a)
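As an aside, I realize broadcasting would avoid the tile entirely, since every row of matrix is just np.arange(M). An equivalent sketch, with pt being pytensor.tensor:

import pytensor.tensor as pt

def feature_test_2(x, M):
    # x[:, None] has shape (N, 1), so multiplying by pt.arange(M) broadcasts to (N, M)
    return pm.math.sin(x[:, None] * pt.arange(M))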

Thanks a lot for the help!! Can I offer you a coffee donation or something? I have one last problem:


class TT:
    def __init__(self,X,y,sqrt_prior_covariance,R,sigma_noise):
        self.X = X
        self.y = y
        self.sqrt_prior_covariance = sqrt_prior_covariance
        self.R = R
        self.sigma_noise = sigma_noise
        self.N = np.shape(X)[0]
        self.D = np.shape(X)[1]
        self.M = np.shape(sqrt_prior_covariance)[0]
        self.P = self.M * np.sum(self.R[:-1] * self.R[1:])
        self.TT_model = pm.Model()

        with self.TT_model:
            A = pm.MutableData('N', X.shape[0])
            B = A.astype("int")
            X = pm.MutableData('X', X)
            # Define prior
            cores = np.empty(self.D, dtype=object)
            for d in range(self.D):
                cores[d] = (sqrt_prior_covariance / np.sqrt(np.sqrt(R[d]*R[d+1]))) @ pm.Normal(f'factor_{d}', mu=0, sigma=1, shape=(self.M, self.R[d]*self.R[d+1]))

            # Model output
            output = []
            for n in range(B.eval()):
                temp = pm.math.matmul(feature_TT(X[n,0],self.M),cores[0]).reshape((self.R[0],self.R[1]))
                for d in range(1,self.D,1):
                    temp = temp @ pm.math.matmul(feature_TT(X[n,d],self.M),cores[d]).reshape((self.R[d],self.R[d+1]))
                output = pm.math.concatenate([output,temp.flatten()])
            output = pm.Deterministic('TT_output',output)

            # Likelihood (observed data)
            likelihood = pm.Normal('likelihood', mu=output, sigma=self.sigma_noise, observed=self.y,shape=output.shape)
    
    def fit(self,n_samples):
        with self.TT_model:
            self.trace = pm.sample(n_samples, tune=1000)

    def predict(self,X_test):
        with self.TT_model:
            self.N = np.shape(X_test)[0]
            pm.set_data({"X": X_test,"N":self.N})
            return pm.sample_posterior_predictive(self.trace,predictions=True)

I essentially have the same issue as before. However, here it appears just because I have the for loop over the whole dataset X:

for n in range(B.eval()):
    temp = pm.math.matmul(feature_TT(X[n,0],self.M),cores[0]).reshape((self.R[0],self.R[1]))
    for d in range(1,self.D,1):
        temp = temp @ pm.math.matmul(feature_TT(X[n,d],self.M),cores[d]).reshape((self.R[d],self.R[d+1]))
    output = pm.math.concatenate([output,temp.flatten()])
output = pm.Deterministic('TT_output',output)

This causes me the same problem as before.
The for loop over the dataset is necessary as there is no (easy) way to vectorize the operations.

In general you never want .eval; it is meant for debugging and should never be in a final model.

PyTensor has utilities to do loops with symbolic expressions which you can read about here: scan – Looping in PyTensor — PyTensor dev documentation

In general, PyTensor has same-named alternatives for most numpy functions, and you should use those when working with PyMC models.

The conventional import is something like import pytensor.tensor as pt and then wherever you would usually write np.foo(...) you instead do pt.foo(...)
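For instance, here is a minimal sketch of the scan pattern (the step function is illustrative, not your model):

import numpy as np
import pytensor
import pytensor.tensor as pt

X = pt.matrix("X")

def step(x_row):
    # called once per row of X; replace with the actual per-sample computation
    return pt.sin(x_row).sum()

result, _ = pytensor.scan(fn=step, sequences=[X])
f = pytensor.function([X], result)
print(f(np.ones((3, 4))).shape)  # (3,): the loop length follows the data symbolically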


Thanks. I am trying to use the scan function as follows:

            cores = np.empty(self.D, dtype=object)
            for d in range(self.D):
                cores[d] = (sqrt_prior_covariance / np.sqrt(np.sqrt(R[d]*R[d+1]))) @ pm.Normal(f'factor_{d}', mu=0, sigma=1, shape=(self.M, self.R[d]*self.R[d+1]))

            # Model output
            def tt_loop(x, D, M, cores, R):
                # scan passes the current row of the sequence first, then the non_sequences in order
                temp = pm.math.matmul(feature_TT(x[0], M), cores[0]).reshape((R[0], R[1]))
                for d in range(1, D, 1):
                    temp = temp @ pm.math.matmul(feature_TT(x[d], M), cores[d]).reshape((R[d], R[d + 1]))
                return temp.flatten()

            result, _ = pytensor.scan(fn=tt_loop,
                        sequences=[X],
                        outputs_info=None,
                        non_sequences=[self.D, self.M, cores, self.R])

but I am running into some problems due to how cores is defined, namely as a numpy object array (in fact I get TypeError: Unsupported dtype for TensorType: object). What is the best practice in this case?
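For example, would unpacking the object array into separate non-sequence arguments be the right approach? An untested sketch (D and R are plain Python values, so I assume they can just be closed over instead of passed in):

result, _ = pytensor.scan(fn=tt_loop,
                          sequences=[X],
                          non_sequences=[self.M, *cores])

with tt_loop taking the cores as individual arguments, e.g. def tt_loop(x, M, *cores).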