Mixing pytensor.dot and dimensions

Hello! Thank you in advance for looking into my question. I created a simple model like this:

import numpy as np
import pandas as pd
import pymc as pm
import pytensor as pyt

example_data = pd.DataFrame(data = {'BMI': np.array([13.43666, 16.061496, 22.998563, 14.499094, 18.248859, 13.811637, 15.061559, 19.873758, 15.436535, 13.186676]),
                                    'WTGAIN': np.array([20, 39, 99, 31, 47, 99, 99, 1, 34, 29]),
                                    'M_Ht_In': np.array([12, 12.8, 12.2, 12.2, 12.8, 13.8, 12.4, 13, 12.8, 13.4]),
                                    'CLUSTERS': np.array([3, 3, 2, 3, 3, 2, 2, 3, 3, 3]),
                                    'DBWT': np.array([2610, 3190, 3232, 2410, 2780, 3033, 2495, 3518, 3381, 2693])})

with pm.Model() as NonHier: 
    # add coordinates
    NonHier.add_coord("num_cols", ['BMI','WTGAIN','M_Ht_In'])
    NonHier.add_coord("obs", example_data.index)
    
    # add data containers
    X = pm.MutableData("X", example_data[['BMI','WTGAIN','M_Ht_In']].values)
    y = pm.MutableData("y", example_data['DBWT'].values)
    
    # make a simple model
    α = pm.Normal("α", mu=0., sigma=3.)
    β = pm.Normal("β", mu=0., sigma=3., dims="num_cols")
    ϵ = pm.HalfNormal("ϵ", sigma=3.)  # scale must be positive, so HalfNormal rather than Normal
    μ = pyt.tensor.dot(X, β) + α
    yhat = pm.Normal("yhat", mu=μ, sigma=ϵ, observed=y) 
    
    # sample
    trace = pm.sample(
        100,
        tune=25,
        chains=3,
        return_inferencedata=True,
        idata_kwargs={'log_likelihood':True}
    )

which, as you can see, uses pytensor.dot to multiply the numerical data by the betas (which, in turn, were given a named dimension). This is all well and good, but I would also like to include the ‘CLUSTERS’ column. Essentially, I want a ‘hyperprior’ such that the data in cluster 2 has a separate hyperprior from the data in cluster 3. I believe I do this by creating a new dimension.
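
For reference, the shapes in this first model line up as an ordinary matrix-vector product. A minimal NumPy sketch of the same arithmetic (the arrays below are stand-ins, not the model’s actual values):

import numpy as np

X_np = np.ones((10, 3))  # stand-in for X: 10 observations, 3 numeric columns
beta_np = np.ones(3)     # stand-in for β: one coefficient per column

# (10, 3) @ (3,) -> (10,): one linear predictor per observation
print(np.dot(X_np, beta_np).shape)  # (10,)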

But I am having trouble extending the multi-level modeling example with .dot(). Does anyone have suggestions as to how I should proceed?

I tried something like:

example_data = pd.DataFrame(data = {'BMI': np.array([13.43666, 16.061496, 22.998563, 14.499094, 18.248859, 13.811637, 15.061559, 19.873758, 15.436535, 13.186676]),
                                    'WTGAIN': np.array([20, 39, 99, 31, 47, 99, 99, 1, 34, 29]),
                                    'M_Ht_In': np.array([12, 12.8, 12.2, 12.2, 12.8, 13.8, 12.4, 13, 12.8, 13.4]),
                                    'CLUSTERS': np.array([3, 3, 2, 3, 3, 2, 2, 3, 3, 3]),
                                    'DBWT': np.array([2610, 3190, 3232, 2410, 2780, 3033, 2495, 3518, 3381, 2693])})

cluster_index, cluster_name = pd.factorize(example_data['CLUSTERS'])

with pm.Model() as Hier: 
    # add coordinates
    Hier.add_coord("num_cols", ['BMI','WTGAIN','M_Ht_In'])
    Hier.add_coord("obs", example_data.index)
    Hier.add_coord("clusters", ["cluster_3", "cluster_2"])  # same order as pd.factorize: 3 first, then 2
    
    # add data containers
    X = pm.MutableData("X", example_data[['BMI','WTGAIN','M_Ht_In']].values)
    y = pm.MutableData("y", example_data['DBWT'].values)
    
    # hyper-prior
    δ = pm.Normal("δ", mu=0., sigma=3., dims="clusters")
    
    # priors
    ϵ = pm.HalfNormal("ϵ", sigma=3.)  # scale must be positive, so HalfNormal rather than Normal
    
    # using the hyperprior 'δ'
    α = pm.Normal("α", mu=δ, sigma=3., dims=("clusters", "num_cols"))
    β = pm.Normal("β", mu=δ, sigma=3., dims=("clusters", "num_cols"))

    # likelihood 
    μ = pyt.tensor.dot(X, β[cluster_index,:]) + α[cluster_index,:]
    
    # response
    yhat = pm.Normal("yhat", mu=μ, sigma=ϵ, observed=y) 
    
    # sample
    trace = pm.sample(
        100,
        tune=25,
        chains=3,
        return_inferencedata=True,
        idata_kwargs={'log_likelihood':True}
    )

which, of course, threw a shape mismatch error. I appreciate the help!

pytensor.dot works exactly like numpy.dot. You can call X.shape.eval() and β[cluster_index, :].shape.eval() to see what shapes you are getting, and it should be easy to see why it’s not working.
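
For instance (a quick sketch; the printed shapes assume the 10-row example data above):

print(X.shape.eval())                    # (10, 3)
print(β[cluster_index, :].shape.eval())  # (10, 3), not the (3,) that dot expects

With those shapes the product is (10, 3) @ (10, 3), and the inner dimensions (3 and 10) do not match.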

Usually after you fancy-index like this, beta ends up with the same shape as X, so you have (n, k) @ (n, k) instead of the expected (n, k) @ (k,). The solution is just to do the dot “by hand”, since everything already lines up, like this: (X * beta[cluster_index, :]).sum(axis=-1)
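
In plain NumPy terms, the elementwise product followed by a sum over the last axis computes one dot product per row. A small self-contained check with made-up numbers:

import numpy as np

X_np = np.arange(6.).reshape(3, 2)       # (n=3, k=2)
B_np = np.array([[1., 10.], [2., 20.]])  # (clusters=2, k=2)
idx = np.array([0, 1, 0])                # cluster index per observation

row_wise = (X_np * B_np[idx, :]).sum(axis=-1)                  # shape (3,)
looped = np.array([X_np[i] @ B_np[idx[i]] for i in range(3)])  # same thing, row by row
print(np.allclose(row_wise, looped))                           # True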

This was very helpful and solved my issue. I ultimately realized that I had to include my ‘cluster_index’ within X and remove the dimension from \delta.

β = pm.Normal("β", mu=δ, sigma=3., dims=("clusters", "num_cols"))
μ = (X[cluster_index, :] * β[cluster_index, :]).sum(axis=1)

I need to deepen my understanding of dimensions, but this is providing me with momentum. Thanks!

I see it now. I need to include the “clusters” dimension in my \alpha (intercept), but to make it work with \mu, I need to calculate its mean. Possibly still some tweaking, but this is essentially what I was hoping to achieve:

example_data = pd.DataFrame(data = {'BMI': np.array([13.43666, 16.061496, 22.998563, 14.499094, 18.248859, 13.811637, 15.061559, 19.873758, 15.436535, 13.186676]),
                                    'WTGAIN': np.array([20, 39, 99, 31, 47, 99, 99, 1, 34, 29]),
                                    'M_Ht_In': np.array([12, 12.8, 12.2, 12.2, 12.8, 13.8, 12.4, 13, 12.8, 13.4]),
                                    'CLUSTERS': np.array([3, 3, 2, 3, 3, 2, 2, 3, 3, 3]),
                                    'DBWT': np.array([2610, 3190, 3232, 2410, 2780, 3033, 2495, 3518, 3381, 2693])})

cluster_index, cluster_name = pd.factorize(example_data['CLUSTERS'])

with pm.Model() as Hier: 
    # add coordinates
    Hier.add_coord("num_cols", ['BMI','WTGAIN','M_Ht_In'])
    Hier.add_coord("obs", example_data.index)
    Hier.add_coord("clusters", ["cluster_3", "cluster_2"] ) 
    
    # add data containers
    X = pm.MutableData("X", example_data[['BMI','WTGAIN','M_Ht_In']].values)
    y = pm.MutableData("y", example_data['DBWT'].values)
    
    # hyper-prior
    αμ = pm.Normal("αμ", mu=0., sigma=3.)
    ασ = pm.HalfNormal("ασ", sigma=3.)
    
    βμ = pm.Normal("βμ", mu=0., sigma=3.)
    βσ = pm.HalfNormal("βσ", sigma=3.)
    
    # error
    ϵ = pm.HalfNormal("ϵ", sigma=3.)  # scale must be positive, so HalfNormal rather than Normal
        
    # using the hyperpriors
    α = pm.Normal("α", mu=αμ, sigma=ασ, dims="clusters")
    β = pm.Normal("β", mu=βμ, sigma=βσ, dims=("clusters", "num_cols"))
    
    # likelihood 
    μ = (X[cluster_index, :] * β[cluster_index, :]).sum(axis=1) + α[cluster_index].mean()
    # μ.shape.eval(), X.shape.eval(), β.shape.eval(), α.shape.eval()
    
    # response
    yhat = pm.Normal("yhat", mu=μ, sigma=ϵ, observed=y)
    
    # sample
    trace = pm.sample(
        250,
        tune=50,
        chains=4,
        return_inferencedata=True,
        idata_kwargs={'log_likelihood':True}
    )

You shouldn’t need to take the mean of alpha. That will give the same intercept value to all the units in your sample – the average of the two cluster intercepts. Indexing X by cluster will also not do what you think: it will duplicate rows 0 and 1 over and over.
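
A quick NumPy illustration of the duplication, using example_data and cluster_index from above:

X_np = example_data[['BMI', 'WTGAIN', 'M_Ht_In']].values  # shape (10, 3)
dup = X_np[cluster_index, :]
# dup holds only copies of X_np[0] and X_np[1]; rows 2..9 of the data never appear
print((dup == X_np[0]).all(axis=1))  # True wherever cluster_index == 0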

Note that mu is the conditional expectation, given the X values and the clusters, so you don’t want to take the average of everything.

Here’s a corrected model:

import pymc as pm
coords = {
    'num_cols': ['BMI','WTGAIN','M_Ht_In'],
    'obs': example_data.index,
    'clusters': ["cluster_3", "cluster_2"]
}
with pm.Model(coords=coords) as Hier: 
    # add data containers
    X = pm.MutableData("X", example_data[['BMI','WTGAIN','M_Ht_In']].values)
    y = pm.MutableData("y", example_data['DBWT'].values)
    
    # hyper-prior
    αμ = pm.Normal("αμ", mu=0., sigma=3.)
    ασ = pm.HalfNormal("ασ", sigma=3.)
    
    βμ = pm.Normal("βμ", mu=0., sigma=3.)
    βσ = pm.HalfNormal("βσ", sigma=3.)
    
    # error
    ϵ = pm.HalfNormal("ϵ", sigma=3.)  # scale must be positive, so HalfNormal rather than Normal
        
    # using the hyperpriors
    α = pm.Normal("α", mu=αμ, sigma=ασ, dims="clusters")
    β = pm.Normal("β", mu=βμ, sigma=βσ, dims=("clusters", "num_cols"))
    
    μ = (X * β[cluster_index, :]).sum(axis=-1) + α[cluster_index]
    # response
    yhat = pm.Normal("yhat", mu=μ, sigma=ϵ, observed=y)
    
    # sample
    trace = pm.sample(
        250,
        tune=50,
        chains=4,
        return_inferencedata=True,
        idata_kwargs={'log_likelihood':True}
    )

Thank you for following up. I hadn’t quite realized that indexing X by the cluster index would behave that way. I had also mistakenly believed my alpha should be dimension-less. Thank you again for correcting me!