Dimension error for Hierarchical model

I am trying to understand how to properly use the ‘dims’ and ‘shape’ in the hierarchical modeling.
I am attempting to build a Hierarchical marketing mix model for 10 promotional channels, 4 control variables across 5 regions throughout a time period of 24 months.
Here is sample code to mimic how I wrote it:

import pymc as pm
import aesara as at
import pandas as pd
import numpy as np

#generate some random data
num_channels= 10
num_cont_vars= 4
num_months= 24
num_regions= 5

dict_1= {
    'Region': np.repeat(np.arange(num_regions), num_weeks),
    'Week': np.repeat(np.arange(num_weeks), num_channels),
    'Sales': np.random.randint(low= 10, high= 100, size= num_weeks*num_regions),
}
dict_2= {f'Channel_{col}': np.random.randint(low=1, high=20, size=(num_weeks*num_regions)) for col in range(num_channels)}
dict_3= {f'cont_var{cont}': np.random.random(size= (num_weeks * num_regions)) for cont in range(num_cont_vars)}
dict_1.update(dict_2)
dict_1.update(dict_3)
df= pd.DataFrame(dict_1)

#defining adstock and saturation functions
def geometric_adstock(x, alpha: float, l: int):
    """Geometric adstock transformation."""
    cycles = [
        at.concatenate(
            [at.zeros(i), x[: x.shape[0] - i]]
        )
        for i in range(l)
    ]
    x_cycle = at.stack(cycles)
    w = at.as_tensor_variable([at.power(alpha, i) for i in range(l)])
    return at.dot(w, x_cycle)


def logistic_saturation(x, lam: float):
    """Logistic saturation transformation."""
    return (1 - at.exp(-lam * x)) / (1 + at.exp(-lam * x)

# Building the model:

coords= {'region' : list(df['Region'].unique())}
with pm.Model(coords= coords) as model:

#defining the hyperpriors
adstock_a= pm.Exponential('adstock_a', lam=10.0)
adstock_b= pm.Exponential('adstock_b', lam=10.0)
sat_lam_lam= pm.Exponential('sat_lam', lam= 10.0)
channel_beta_lam= pm.Exponential('coef_lam', lam=10.0)
cont_beta_mu= pm.Normal('cont_lam', mu=0, sigma=10.0)
cont_beta_sigma= pm.HalfCauchy('cont_sig', beta= 10.0)

#then I loop over the marketing channels and calculate their priors and add them to the response list

response_lst= []
for channel in [channel for channel in df.columns if channel.startswith('Channel')]:
        x= df[channel].values
       adstock_param= pm.Beta(f'{channel_name}_adstock', alpha= adstock_a, beta= 
             adstock_b, dims= 'Regions' )
        sat_lam= pm.Exponential(f'{channel_name}_sat',
                                       lam= sat_lam_lam,
                                       dims= 'Region') 
        channel_b = pm.Exponential(f"{channel_name}_coef",
                                   lam=chan_beta_lam,
                                   dims= 'Region')

        x_new = geometric_adstock(x, adstock_param, 6)            
        saturation_tensor = logistic_saturation(x_new, saturation_lam )  

        channel_contribution= pm.Deterministic(
            f'contribution_{channel_name}',
            channel_b * saturation_tensor)  

        response_mean.append(channel_contribution)   

    for control_var in [var for var in df.columns if var.startswith('cont_var')]:
        x = df[control_var].values
        control_beta = pm.Normal(f"{control_var}_control_coef",
                                 mu=cont_beta_mu,
                                 sigma=cont_beta_sigma,
                                 dims= 'Region')
        control_x = control_beta * x
        response_mean.append(control_x)\


    intercept= pm.Exponential('intercept', lam= 0.1)   


    sigma = pm.Exponential('sigma', lam=0.01)

    likelihood = pm.Normal("outcome", mu = sum(response_mean) + intercept , sigma = sigma,
                           observed = data_transformed[target].values,
                           dims= 'Region'
                          )               

trace= pm.sample(1000, tune= 500)

#running it give me the below error
ValueError: Shape mismatch:

Here my questions are:
1. Am I using the dims args wrong?
2. should I be adding the dims argument to the likelihood equation?
3. what are the use cases of shape?

I am using
pymc v4.4.0
linux system (not sure its specs, its the company VDI)

would appreciate any help explaining this to me…and sorry for the long code, wanted to be as clear as possible

Welcome!

Can you provide the full error message?

@cluhmann This the error message i am getting below

ValueError: Shapes on dimension 1 do not match: (5, 120)

and this the verbose form… its actually longer but more of a repetition of the same error below.

"name": "ValueError",
	"message": "Shapes on dimension 1 do not match: (5, 120)\nApply node that caused the error: Elemwise{mul,no_inplace}(InplaceDimShuffle{x,0}.0, TensorConstant{[[0.002840..67640458]]})\nToposort index: 45\nInputs types: [TensorType(float64, (1, None)), TensorType(float64, (1, 120))]\nInputs shapes: [(1, 5), (1, 120)]\nInputs strides: [(40, 8), (960, 8)]\nInputs values: [array([[-0.07151783, -0.70122294,  0.62348847,  0.32553264,  0.52987376]]), 'not shown']\nInputs type_num: [12, 12]\nOutputs clients: [[Elemwise{Composite{Switch(i0, ((i1 + (i2 * sqr(((i3 - ((i4 * sigmoid((-(i5 * i6))) * (i7 - exp((i5 * i6)))) + (i8 * sigmoid((-i9)) * (i7 - exp(i9))) + (i10 * sigmoid((-i11)) * (i7 - exp(i11))) + (i12 * sigmoid((-i13)) * (i7 - exp(i13))) + (i14 * sigmoid((-i15)) * (i7 - exp(i15))) + (i16 * sigmoid((-i17)) * (i7 - exp(i17))) + (i18 * sigmoid((-i19)) * (i7 - exp(i19))) + i20 + i21 + i22 + i23 + i24 + i25 + i26 + i27)) / i28)))) - i29), i30)}}(Elemwise{gt,no_inplace}.0, TensorConstant{(1, 1) of ..5332046727}, TensorConstant{(1, 1) of -0.5}, TensorConstant{[[27. 42. .. 99. 76.]]}, InplaceDimShuffle{x,0}.0, Dot22Scalar.0, InplaceDimShuffle{x,0}.0, TensorConstant{(1, 1) of 1.0}, InplaceDimShuffle{x,0}.0, Elemwise{Mul}[(0, 0)].0, InplaceDimShuffle{x,0}.0, Elemwise{Mul}[(0, 0)].0, InplaceDimShuffle{x,0}.0, Elemwise{Mul}[(0, 0)].0, InplaceDimShuffle{x,0}.0, Elemwise{Mul}[(0, 0)].0, InplaceDimShuffle{x,0}.0, Elemwise{Mul}[(0, 0)].0, InplaceDimShuffle{x,0}.0, Elemwise{Mul}[(0, 0)].0, Elemwise{Composite{(i0 * sigmoid((-(i1 * i2))) * (i3 - exp((i1 * i2))))}}[(0, 1)].0, Elemwise{Composite{(i0 * sigmoid((-(i1 * i2))) * (i3 - exp((i1 * i2))))}}[(0, 1)].0, Elemwise{Composite{(i0 * sigmoid((-(i1 * i2))) * (i3 - exp((i1 * i2))))}}[(0, 1)].0, Elemwise{mul,no_inplace}.0, Elemwise{mul,no_inplace}.0, Elemwise{mul,no_inplace}.0, Elemwise{mul,no_inplace}.0, InplaceDimShuffle{x,x}.0, InplaceDimShuffle{x,x}.0, Elemwise{log,no_inplace}.0, TensorConstant{(1, 1) of -inf})]]\n\nBacktrace when the node is created (use Aesara flag traceback__limit=N to make it longer)

updating code script as there are some bugs when I was anonymizing it.

import pymc as pm
import aesara
import aesara.tensor as at
import pandas as pd
import numpy as np

#generate some random data
num_channels= 10
num_cont_vars= 4
num_weeks= 24
num_regions= 5
target= 'Sales'

dict_1= {
    'Region': np.repeat(np.arange(num_regions), num_weeks ),
    'Week': np.repeat(np.arange(num_weeks), num_regions),
    'Sales': np.random.randint(low= 10, high= 100, size= num_weeks*num_regions),
}
dict_2= {f'Channel_{col}': np.random.randint(low=1, high=20, size=(num_weeks*num_regions)) for col in range(num_channels)}
dict_3= {f'cont_var{cont}': np.random.random(size= (num_weeks * num_regions)) for cont in range(num_cont_vars)}
dict_1.update(dict_2)
dict_1.update(dict_3)
df= pd.DataFrame(dict_1)

#defining adstock and saturation functions
def geometric_adstock(x, alpha: float, l: int):
    """Geometric adstock transformation."""
    cycles = [
        at.concatenate(
            [at.zeros(i), x[: x.shape[0] - i]]
        )
        for i in range(l)
    ]
    x_cycle = at.stack(cycles)
    w = at.as_tensor_variable([at.power(alpha, i) for i in range(l)])
    return at.dot(w, x_cycle)


def logistic_saturation(x, lam: float):
    """Logistic saturation transformation."""
    return (1 - at.exp(-lam * x)) / (1 + at.exp(-lam * x))


# Building the model:

coords= {'Region' : list(df['Region'].unique())}
with pm.Model(coords= coords) as model:

    #defining the hyperpriors
    adstock_a= pm.Exponential('adstock_a', lam=10.0)
    adstock_b= pm.Exponential('adstock_b', lam=10.0)
    sat_lam_lam= pm.Exponential('sat_lam', lam= 10.0)
    channel_beta_lam= pm.Exponential('coef_lam', lam=10.0)
    cont_beta_mu= pm.Normal('cont_lam', mu=0, sigma=10.0)
    cont_beta_sigma= pm.HalfCauchy('cont_sig', beta= 10.0)

    #then I loop over the marketing channels and calculate their priors and add them to the response list

    response_lst= []
    for channel in [channel for channel in df.columns if channel.startswith('Channel')]:
        x= df[channel].values
        adstock_param= pm.Beta(f'{channel}_adstock', alpha= adstock_a, beta= 
                adstock_b, dims= 'Region' )
        sat_lam= pm.Exponential(f'{channel}_sat',
                                        lam= sat_lam_lam,
                                        dims= 'Region') 
        channel_b = pm.Exponential(f"{channel}_coef",
                                    lam=channel_beta_lam,
                                    dims= 'Region')

        x_new = geometric_adstock(x, adstock_param, 6)            
        saturation_tensor = logistic_saturation(x_new, sat_lam )  

        channel_contribution= pm.Deterministic(
                f'contribution_{channel}',
                channel_b * saturation_tensor)  

        response_lst.append(channel_contribution)   

    for control_var in [var for var in df.columns if var.startswith('cont_var')]:
        x = df[control_var].values
        control_beta = pm.Normal(f"{control_var}_control_coef",
                                    mu=cont_beta_mu,
                                    sigma=cont_beta_sigma,
                                    dims= 'Region')
        control_x = control_beta * x
        response_lst.append(control_x)\


    intercept= pm.Exponential('intercept', lam= 0.1)   


    sigma = pm.Exponential('sigma', lam=0.01)

    likelihood = pm.Normal("outcome", mu = sum(response_lst) + intercept , sigma = sigma,
                            observed = df[target].values,
                            dims= 'Region'
                            )               

    trace= pm.sample(1000, tune= 500)

The issue seems to be this line:

control_x = control_beta * x

because control_beta.shape==5 (one per region) and x.shape==(120,).

@cluhmann Thank you for clarifying it. so how can I fix it to give me the shape 120 but also the dimensions of 5 per region?
do I use

control_beta = pm.Normal(f"{control_var}_control_coef",
                                    mu=cont_beta_mu,
                                    sigma=cont_beta_sigma,
                                    dims= 'Region', 
                                    shape= (5,120)
)

instead of

control_beta = pm.Normal(f"{control_var}_control_coef",
                                    mu=cont_beta_mu,
                                    sigma=cont_beta_sigma,
                                    dims= 'Region')

And I am wondering why that error didn’t occur in the channel_b:

channel_b = pm.Exponential(f"{channel}_coef",
                                    lam=channel_beta_lam,
                                    dims= 'Region')

which is also multiplied by the saturation tensor

channel_contribution= pm.Deterministic(
                f'contribution_{channel}',
                channel_b * saturation_tensor)  

appreciate your support to help me understand this.

thanks :slight_smile:

You can rely on numpy broadcasting semantics:

import numpy as np

x = np.zeros(120)
y = np.zeros(5)
z1 = x[:, None] + y
z2 = x + y[:, None]

assert z1.shape == (120, 5)
assert z2.shape == (5, 120)

Works the same with PyMC variables

Thank you @ricardoV94 for your proposed solution. I tried it but sadly still getting errors.

would the built in ‘shape’ args resolve this? like can I use it as below:


 for control_var in [var for var in df.columns if var.startswith('cont_var')]:
        x = df[control_var].values
        control_beta = pm.Normal(f"{control_var}_control_coef",
                                    mu=cont_beta_mu,
                                    sigma=cont_beta_sigma,
                                    dims= 'Region',
                            -->   shape= (120,)
                                    )
        
        control_x = control_beta * x
        print(control_x.shape)

The distribution shape as to agree with the input shapes. See Distribution Dimensionality — PyMC 5.2.0 documentation for more details

1 Like

@ricardoV94 This is my error I get when I tried your proposed solution

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File c:\Users\pymc_hier\.venv\lib\site-packages\aesara\link\vm.py:414, in Loop.__call__(self)
    411 for thunk, node, old_storage in zip_longest(
    412     self.thunks, self.nodes, self.post_thunk_clear, fillvalue=()
    413 ):
--> 414     thunk()
    415     for old_s in old_storage:

File c:\Users\pymc_hier\.venv\lib\site-packages\aesara\graph\op.py:543, in Op.make_py_thunk..rval(p, i, o, n, params)
    539 @is_thunk_type
    540 def rval(
    541     p=p, i=node_input_storage, o=node_output_storage, n=node, params=None
    542 ):
--> 543     r = p(n, [x[0] for x in i], o)
    544     for o in node.outputs:

File c:\Users\pymc_hier\.venv\lib\site-packages\aesara\tensor\blas.py:1997, in Dot22Scalar.perform(self, node, inp, out)
   1996 try:
-> 1997     z[0] = np.asarray(scalar * np.dot(x, y))
   1998 except ValueError as e:
   1999     # The error raised by numpy has no shape information, we
   2000     # mean to add that

File <__array_function__ internals>:200, in dot(*args, **kwargs)
...
 |TensorConstant{[[15.  5. ..  4.  3.]]} [id K] 
 |TensorConstant{-1.0} [id L] 

I didn’t offer a specific solution, so I don’t know exactly what you are trying.

1 Like

@ricardoV94 I tried using the following:

control_x = control_beta * x[:, None]

I will check the documentation and hopefully can find a clue there…

thank you

1 Like