Unable to parameterize lag parameter

Hi everyone. I am new to PyMC and I am using the 5.10.0 version of this library. In my project below, I want to apply a different lag parameter to each independent variable. Upon running this, I get the error TypeError: 'TensorVariable' object cannot be interpreted as an integer. The reproducible code is as follows.

## Create a simple MMM data 
import pandas as pd
from random import randint
import numpy as np
import pytensor.tensor as tt
import pytensor as pt
import pymc as pm
import pymc.sampling.jax as pmjax
import arviz as az


# Generate date range
dates = pd.date_range(start="2021-01-01", end="2022-01-01")

data = {
    "date": dates,
    "gcm_direct_Impressions": [randint(10000, 20000) for _ in dates],
    "display_direct_Impressions" :[randint(100000,150000) for _ in dates],
    "tv_grps": [randint(30, 50) for _ in dates],
    "tiktok_direct_Impressions": [randint(10000, 15000) for _ in dates],
    "sell_out_quantity": [randint(150, 250) for _ in dates]
}
df = pd.DataFrame(data)
m = max(df['sell_out_quantity'].values)

print(f"Max sales Volume {m}")

channel_columns = [col for col in df.columns if 'Impressions' in col or 'grps' in col]

transform_variables = channel_columns


delay_channels = channel_columns

media_channels = channel_columns

target = 'sell_out_quantity'

### Transform each channel variable

data_transformed = df.copy()

numerical_encoder_dict = {}


for feature in transform_variables:
    # Extracting the original values of the feature.
    original = df[feature].values

    # Calculating the maximum value of the feature.
    max_value = original.max()

    # Dividing each value in the feature by the maximum value.
    transformed = original / max_value

    # Storing the transformed data back into the 'data_transformed' DataFrame.
    data_transformed[feature] = transformed

    # Storing the maximum value used for scaling in the dictionary.
    # This will be used for reversing the transformation if needed.
    numerical_encoder_dict[feature] = max_value



def adstock_transform(x, rate,max_lag):
    """ Apply adstock transformation with PyTensor.
    :param x: PyTensor tensor, original data for the channel
    :param rate: PyTensor tensor, decay rate of the adstock transformation
    :param max_lag: int, maximum lag to consider for the adstock effect
    :return: PyTensor tensor, transformed data
    """
    # Creating a tensor to store transformed values
    adstocked = tt.zeros_like(x)
    
    for i in range(max_lag, x.shape[0]):
        weights = tt.power(rate, tt.arange(max_lag + 1))
        adstocked = tt.set_subtensor(adstocked[i], tt.dot(x[i-max_lag:i+1][::-1], weights))
    
    return adstocked

### Create a model
response_mean = []

with pm.Model() as model_2:
    # Looping through each channel in the list of delay channels.
    for channel_name in delay_channels:
        print(f"Delay Channels: Adding {channel_name}")

        # Extracting the transformed data for the current channel.
        x = data_transformed[channel_name].values

        # Defining Bayesian priors for the adstock, gamma, and alpha parameters for the current channel.
        adstock_param = pm.Beta(f"{channel_name}_adstock", 2, 2)
        saturation_gamma = pm.Beta(f"{channel_name}_gamma", 2, 2)
        saturation_alpha = pm.Gamma(f"{channel_name}_alpha", 3, 1)
        rate = pm.Beta(f'{channel_name}_rate', alpha=1, beta=1)
        lag = pm.DiscreteUniform(f"{channel_name}_lag",lower=0,upper=17)
        
        transformed_X1 = adstock_transform(x,rate,max_lag=lag)
        transformed_X2 = tt.zeros_like(x)
        for i in range(1,len(x)):
            transformed_X2 = tt.set_subtensor(transformed_X2[i],(transformed_X1[i]**saturation_alpha)/(transformed_X1[i]**saturation_alpha+saturation_gamma**saturation_alpha))
        channel_b = pm.HalfNormal(f"{channel_name}_media_coef", sigma = m)
        response_mean.append(transformed_X2 * channel_b)

    intercept = pm.Normal("intercept",mu = np.mean(data_transformed[target].values), sigma = 3)
    sigma = pm.HalfNormal("sigma", 4)
    likelihood = pm.Normal("outcome", mu = intercept + sum(response_mean), sigma = sigma,
                           observed = data_transformed[target].values)

with model_2:
    trace = pmjax.sample_numpyro_nuts(1000, tune=1000, target_accept=0.95)
    
    trace_summary = az.summary(trace)

One solution that I thought of was to replace max_lag in adstock_transform to int(max_lag.eval()). It still gives me the same error. I also tried replacing x = data_transformed[channel_name].values with x = pm.ConstantData("data_{channel_name",data_transformed[channel_name].values) and that did not work either. Should I choose a different distribution for the lag parameter for the sampling to work? Thanks in advance!

I found a trivial workaround.

When using pmjax.sample_numpyro_nuts() for probabilistic modeling I realized that this function is designed for continuous distributions and doesn’t support discrete ones directly. One workaround for this limitation could be to substitute the discreteUniform distribution with a Uniform distribution. However, this substitution results in a continuous output, which may not be suitable for functions like adstock_transform that expect a discrete input. To address this, we can convert the continuous output back to a discrete value by using max_lag = int(max_lag.eval()). This snippet effectively casts the floating-point number into an integer, making it compatible with the adstock_transform function. This answer is open to feedback.

You don’t want to do that. That will fix lag to a random value instead of sampling it.

There’s no workaround if you have a parameter that gets discretized, it will no longer be differentiable wrt to the distribution that uses that parameter.

You can cast the uniform value to an integer with max_lag = max_lag.astype("int") but it may still not be able to sample the discrete variable at all.

A better option if you still want to only use NUTS may be to marginalize your discrete variable with something like MarginalModel — pymc_experimental 0.0.15 documentation

In general though it may be quite hard to infer the lag parameters at all from your data. Commonly it’s just fixed to a constant

Is my response here relevant? The common approach in time series literature is to just include everything up to the highest justifiable order, then strongly bias the coefficients towards zero.

I don’t know the frequency of your data, but at 17 lags you’re probably looking at seasonal effects.

Hi @ricardoV94 Thanks for this. I did run a simple experiment to compare the distributions of random variables with and without eval() and I would love your thoughts on this. The distributions look very similar .The code I used is as follows:

import pymc as pm
import pymc.sampling.jax as pmjax
import numpy as np

x = [i for i in range(200)]
x_np = np.array(x)
with pm.Model() as m:
    
    z = pm.Normal('z',180)
    u = np.roll(x_np,int(z.eval()))
with m:

    idata = pmjax.sample_numpyro_nuts(1000,chains = 4)

#### Without eval()

with pm.Model() as m2:
    
    z = pm.Normal('z',180)

with m2:

    idatax = pmjax.sample_numpyro_nuts(1000,chains = 4)

import matplotlib.pyplot as plt


plt.hist(np.mean(idata.posterior['z'],axis=0),label='With eval()',color='red')
plt.hist(np.mean(idatax.posterior['z'],axis=0),label = 'Without eval()',color='blue')
plt.legend()
plt.show()

The distributions of random variable ‘z’ with and without eval().

In your example it doesn’t matter that you defined u at all, you are sampling a single variable z and there is no observed data linked to it, so the posterior you are getting is the same as the prior.

A simpler comparison would have some variable downstream of u.

with pm.Model() as correct_m:
  x = pm.Normal("x")
  u = x  # You don't need any operation to prove the point, but you can do the int if you want
  z = pm.Normal("z", u, 1e-3, observed=100)

with pm.Model() as broken_m:
  x = pm.Normal("x")
  u = x.eval()  # The eval breaks the relationship between `x` and `z`
  z = pm.Normal("z", u, 1e-3, observed=100)

In the first model, x is going to have a posterior shifted towards 100, and in the second not, because you broke the relationship between z and x with that eval. In fact the posterior for x is going again to be the prior, because it’s not linked to any observed data.

1 Like

Understood. eval() basically takes a “snapshot” rather than sample from the entire space of probable values. This will impact downstream tasks.

Exactly. It’s the same as if you set u = np.random.randint(...), it’s just a constant as far as PyMC is aware

Hi @ricardoV94 @jessegrabowski I found a temporary workout to get the lag parameter for each variable in the model. In the following code, I set a max_lag parameter that is equal to 17. For each variable, lags from 0 to 17 is applied till we get the highest correlation between the lagged variable and the dependent variable. We then proceed to use the lagged variable in the rest of the modeling process. But the only issue here is that I need to apply this function on the PyMC/PyTensor variable, get an index out of bounds error. My reproducible code is as follows.

## Create a simple MMM data 
import pandas as pd
from random import randint
import numpy as np
import pytensor.tensor as tt
import pytensor as pt
import pymc as pm
import pymc.sampling.jax as pmjax
import arviz as az

# # Disable most optimizations
# pt.config.optimizer = 'fast_compile'
# # or completely turn off optimizations
# # pt.config.optimizer = 'None'

# # Increase exception verbosity
# pt.config.exception_verbosity = 'high'

## Functions to get the best lag

def correlation_coefficient(X, Y):
    """
    Calculate the correlation coefficient between two theano tensors.
    """
    X_mean = tt.mean(X)
    Y_mean = tt.mean(Y)

    X_std = tt.std(X)
    Y_std = tt.std(Y)

    covariance = tt.mean((X - X_mean) * (Y - Y_mean))
    return covariance / (X_std * Y_std)

def create_lagged_vector(X, lag):
    # Function to create a lagged version of a vector
    if lag == 0:
        return X
    else:
        return tt.concatenate([tt.zeros(lag), X[:-lag]])

def find_optimal_lag(X1, y, max_lag):
    best_lag = 0
    best_correlation = -np.inf

    for lag in range(max_lag + 1):
        lagged_X1 = create_lagged_vector(X1, lag)
        correlation = correlation_coefficient(lagged_X1, y)

        # Update best lag if this is the highest correlation so far
        best_correlation = pm.math.switch(tt.gt(correlation,best_correlation),correlation,best_correlation)
        best_lag = pm.math.switch(tt.gt(correlation,best_correlation),lag,best_lag)
        # if correlation > best_correlation:
        #     best_lag = lag
        #     best_correlation = correlation

    return best_lag, best_correlation





# Generate date range
dates = pd.date_range(start="2021-01-01", end="2022-01-01")

data = {
    "date": dates,
    "gcm_direct_Impressions": [randint(10000, 20000) for _ in dates],
    "display_direct_Impressions" :[randint(100000,150000) for _ in dates],
    "tv_grps": [randint(30, 50) for _ in dates],
    "tiktok_direct_Impressions": [randint(10000, 15000) for _ in dates],
    "sell_out_quantity": [randint(150, 250) for _ in dates]
}
df = pd.DataFrame(data)
m = max(df['sell_out_quantity'].values)

print(f"Max sales Volume {m}")

channel_columns = [col for col in df.columns if 'Impressions' in col or 'grps' in col]

transform_variables = channel_columns


delay_channels = channel_columns

media_channels = channel_columns

target = 'sell_out_quantity'

### Transform each channel variable

data_transformed = df.copy()

numerical_encoder_dict = {}


for feature in transform_variables:
    # Extracting the original values of the feature.
    original = df[feature].values

    # Calculating the maximum value of the feature.
    max_value = original.max()

    # Dividing each value in the feature by the maximum value.
    transformed = original / max_value

    # Storing the transformed data back into the 'data_transformed' DataFrame.
    data_transformed[feature] = transformed

    # Storing the maximum value used for scaling in the dictionary.
    # This will be used for reversing the transformation if needed.
    numerical_encoder_dict[feature] = max_value



def adstock_transform(x, rate,max_lag):
    """ Apply adstock transformation with PyTensor.
    :param x: PyTensor tensor, original data for the channel
    :param rate: PyTensor tensor, decay rate of the adstock transformation
    :param max_lag: int, maximum lag to consider for the adstock effect
    :return: PyTensor tensor, transformed data
    """
    # Creating a tensor to store transformed values
    adstocked = tt.zeros_like(x)
    
    for i in range(max_lag, x.shape[0]):
        weights = tt.power(rate, tt.arange(max_lag + 1))
        adstocked = tt.set_subtensor(adstocked[i], tt.dot(x[i-max_lag:i+1][::-1], weights))
    
    return adstocked

### Create a model
response_mean = []

with pm.Model() as model_2:
    # Looping through each channel in the list of delay channels.
    for channel_name in delay_channels:
        print(f"Delay Channels: Adding {channel_name}")

        # Extracting the transformed data for the current channel.
        x = data_transformed[channel_name].values

        # Defining Bayesian priors for the adstock, gamma, and alpha parameters for the current channel.
        adstock_param = pm.Beta(f"{channel_name}_adstock", 2, 2)
        saturation_gamma = pm.Beta(f"{channel_name}_gamma", 2, 2)
        saturation_alpha = pm.Gamma(f"{channel_name}_alpha", 3, 1)
        rate = pm.Beta(f'{channel_name}_rate', alpha=1, beta=1)
        ### Getting a adstocked transformed vector
        transformed_X1 = tt.zeros_like(x)
        for xi in range(0, len(x)):
            if xi == 0:
                transformed_X1 = tt.set_subtensor(transformed_X1[xi],x[xi])
            else:

                transformed_X1 = tt.set_subtensor(transformed_X1[xi],(transformed_X1[xi-1]*rate)+x[xi])

        ## Uncover the best lag for each channel        

        max_lag = 17

        y = tt.as_tensor(df['sell_out_quantity'].values)

        best_lag,best_correlation=find_optimal_lag(transformed_X1, y, max_lag)

        lagged_X1 = tt.concatenate([tt.zeros(best_lag),transformed_X1[:-best_lag]])


        
        ### Apply hill transform

        transformed_X2 = tt.zeros_like(x)
        for i in range(1,len(x)):
            transformed_X2 = tt.set_subtensor(transformed_X2[i],(lagged_X1[i]**saturation_alpha)/(lagged_X1[i]**saturation_alpha+saturation_gamma**saturation_alpha))
        channel_b = pm.HalfNormal(f"{channel_name}_media_coef", sigma = m)
        response_mean.append(transformed_X2 * channel_b)

    intercept = pm.Normal("intercept",mu = np.mean(data_transformed[target].values), sigma = 3)
    sigma = pm.HalfNormal("sigma", 4)
    likelihood = pm.Normal("outcome", mu = intercept + sum(response_mean), sigma = sigma,
                           observed = data_transformed[target].values)

with model_2:
    trace = pmjax.sample_numpyro_nuts(1000, tune=1000, target_accept=0.95)
    
    trace_summary = az.summary(trace)

I have added few functions to calculate the best lag that each independent variable should be subjected to.

def correlation_coefficient(X, Y):
    """
    Calculate the correlation coefficient between two theano tensors.
    """
    X_mean = tt.mean(X)
    Y_mean = tt.mean(Y)

    X_std = tt.std(X)
    Y_std = tt.std(Y)

    covariance = tt.mean((X - X_mean) * (Y - Y_mean))
    return covariance / (X_std * Y_std)

def create_lagged_vector(X, lag):
    # Function to create a lagged version of a vector
    if lag == 0:
        return X
    else:
        return tt.concatenate([tt.zeros(lag), X[:-lag]])

def find_optimal_lag(X1, y, max_lag):
    best_lag = 0
    best_correlation = -np.inf

    for lag in range(max_lag + 1):
        lagged_X1 = create_lagged_vector(X1, lag)
        correlation = correlation_coefficient(lagged_X1, y)

        # Update best lag if this is the highest correlation so far
        best_correlation = pm.math.switch(tt.gt(correlation,best_correlation),correlation,best_correlation)
        best_lag = pm.math.switch(tt.gt(correlation,best_correlation),lag,best_lag)
        # if correlation > best_correlation:
        #     best_lag = lag
        #     best_correlation = correlation

    return best_lag, best_correlation

After generating the best_lag, this lag is then applied to the transformed_X1 variable the following way

lagged_X1 = tt.concatenate([tt.zeros(best_lag),transformed_X1[:-best_lag]])

When I use lagged_X1 for further processing I get the following error, which I am unable to solve.

IndexError                                Traceback (most recent call last)
File ~/miniconda3/envs/pymc_env/lib/python3.11/site-packages/pytensor/compile/function/types.py:970, in Function.__call__(self, *args, **kwargs)
    968 try:
    969     outputs = (
--> 970         self.vm()
    971         if output_subset is None
    972         else self.vm(output_subset=output_subset)
    973     )
    974 except Exception:

IndexError: index out of bounds

During handling of the above exception, another exception occurred:

IndexError                                Traceback (most recent call last)
Cell In[3], line 182
    178     likelihood = pm.Normal("outcome", mu = intercept + sum(response_mean), sigma = sigma,
    179                            observed = data_transformed[target].values)
    181 with model_2:
--> 182     trace = pmjax.sample_numpyro_nuts(1000, tune=1000, target_accept=0.95)
    184     trace_summary = az.summary(trace)

File ~/miniconda3/envs/pymc_env/lib/python3.11/site-packages/pymc/sampling/jax.py:662, in sample_numpyro_nuts(draws, tune, chains, target_accept, random_seed, initvals, model, var_names, progressbar, keep_untransformed, chain_method, postprocessing_backend, postprocessing_vectorize, idata_kwargs, nuts_kwargs, postprocessing_chunks)
    659 tic1 = datetime.now()
..
 - Join.0, Shape: (0,), ElemSize: 8 Byte(s), TotalSize: 0 Byte(s)
 TotalSize: 34246.0 Byte(s) 0.000 GB
 TotalSize inputs: 33941.0 Byte(s) 0.000 GB

Is there a way to solve this? Thanks in advance !