Data / model structure with hierarchical model and covariates at individual and group level

Dear all,

I am working on a model where the outcome is a daily occurrence for several units of analysis.

So far I have been able to introduce covariates (β here) at the day-to-day level, plus an effect (varying intercepts, δ here) for each of the units of analysis.

with model:
    β = ...  # priors OK
    δ = pm.Normal('δ', mu = 0, sigma = 1, dims = ['Period', 'Unit'])
    mu = tt.exp(
        δ[id_period, id_unit] +
        β[id_period, 1...] * covariates_individual_level
    )
    y_pred = pm.NegativeBinomial('y_pred', mu = mu, alpha = α, observed = y)

This works well and I get a δ of shape (nPeriods, nUnits).
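
For reference, named dims like ['Period', 'Unit'] only work if the coordinates are registered on the model. A minimal sketch of that setup, with placeholder labels and sizes (this assumes PyMC3 >= 3.9):

import pymc3 as pm

coords = {
    'Period': [1, 2],                      # nPeriods = 2
    'Unit': [f'u{i}' for i in range(90)],  # nUnits = 90, placeholder labels
}

with pm.Model(coords = coords) as model:
    δ = pm.Normal('δ', mu = 0, sigma = 1, dims = ['Period', 'Unit'])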

The next step is to model these varying intercepts with another set of variables that describe the units of analysis in two periods of time. For this I have tried both a long and a wide dataset, but with no success:

μ_δ = (
    θ[id_period_group, 1] * covariate_group_level_1[:, id_period_group] +
    θ[id_period_group, 2] * covariate_group_level_2[:, id_period_group] +
    ...
)
sigma_delta = ...  # prior
δ = pm.Normal('δ', mu = μ_δ, sigma = sigma_delta, dims = ['Period', 'Unit'])

In this case, covariate_group_level_1 is an [nUnits, nPeriods] array. But I get an exception saying that operands could not be broadcast together with shapes (2,90) (90,180), or similar ones when I try different setups.

It would seem that working with pm.Data or xarray could help me (according to [Hierarchical model and pandas dataframe - #7 by OriolAbril]), but I can't find documentation on how to use them properly in this context.

What would be the appropriate way to deal with this?

Thank you very much for your support.

Dear @xfim ,

the “operand shape” error looks rather familiar to me. The asterisk operator performs an element-wise multiplication, whereas you are probably intending a matrix multiplication here.
Since you use pm.Data already, you could probably change the code to use theano.tensor.dot instead of the multiplication. The 2x90 and 90x180 tensors will then combine into a 2x180 tensor, as you would expect from matrix multiplication.
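
To illustrate with a minimal sketch (the array contents are placeholders; the shapes are taken from your error message):

import numpy as np
import theano.tensor as tt

a = np.ones((2, 90))    # e.g. one row of coefficients per period
b = np.ones((90, 180))  # e.g. a group-level covariate matrix

# element-wise: shapes (2, 90) and (90, 180) cannot broadcast
# a * b  # -> "operands could not be broadcast together"

# matrix multiplication: (2, 90) . (90, 180) -> (2, 180)
c = tt.dot(a, b)
print(c.eval().shape)  # (2, 180)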

Hope this helps!

Thank you, @falk. I am not yet using pm.Data; I just mentioned that the solution probably lies there, as you also suggest. I will have to work on that, but the material I have checked so far only covers uses of pm.Data where a single level exists. Matrix multiplication is also something I need to do; it was on my todo list, and now it seems more necessary.

Quick related question: from a pure speed point of view, is it better to use matrix multiplication?

The answer to your sampling-speed question depends on the model. In my experience the speed increase is limited (or even negative) on small models (the ones I use as examples to get the code right), but there is a clear speed advantage when the model gets more complex (e.g. with multivariate blocks).

Enjoy working on the matrix structures! I found it rewarding once I got the hang of it, especially pm.Data, which gives you the invaluable bonus of in-sample and out-of-sample “prediction”. Below is the function I currently use for multilevel model components; it is rather specific to my own use case and terminology, but maybe you can generalize from it. Hope it helps!

import numpy as NP          # numerical analysis
import pandas as PD         # data management
import pymc3 as PM          # basic modeling
import patsy as PA          # formula notation
if PM.__version__.startswith('4'):
    import aesara as TH         # efficient tensors
    import aesara.tensor as TT  # tensor control
else:
    import theano as TH         # efficient tensors
    import theano.tensor as TT  # tensor control

# print the shape of an aesara/theano tensor
# https://aholzner.wordpress.com/2016/01/03/printing-a-tensors-shape-in-theano/
PrintShape = lambda label, tensor: TH.printing.Print(label, attrs = [ 'shape' ])(tensor)


def AddMultiLevelComponent( \
                            model: PM.Model \
                          , data: PD.DataFrame \
                          , label: str \
                          , params: list \
                          , level: str \
                          , observables: list \
                          , prior_kwargs: dict = {'mu': [0.], 'sd': [1.]} \
                          , verbose: bool = False \
                          ):
    # generate a single, multilevel component of a model
    # label: a name for this part of the model
    # params: list of parameters ("predictors") which are multiplied with slopes; e.g. ['x1', 'x2']
    # level must be a PD.Categorical in data PD.DataFrame
    # observables: the dependent model variables (as a list), e.g. ['y1', 'y2']
    # both params and observables are columns in the data
    # returns: (n_observations x n_observables) model component
 
    # I label counts as "n"
    n_observables = len(observables)
    n_params = len(params)

    if verbose:
        print (f"### assembling {label} on {level} ###")
        print ("\tpriors:", prior_kwargs)

    # data matrix
    # to be honest, I haven't tried using multiple "params", but it /should/ work.
    # You can also have ["1"] in the params list to get a multilevel intercept
    # using patsy dmatrix is a bit overkill, you can just grab values from the data frame instead.
    data_matrix = NP.asarray(PA.dmatrix( " + ".join(['0'] + params) \
                                               , data = data \
                                               , return_type = 'dataframe' \
                                              ))

    # design matrix for group level
    # "level" is a categorical column in the "data"
    # here, the Patsy dmatrix is actually useful because it generates a boolean matrix from the categorical
    group_matrix = NP.asarray( \
                        PA.dmatrix( f'0 + {level}' \
                                    , data = data \
                                    , return_type = 'dataframe' \
                                   ) \
                           )
    n_groups = group_matrix.shape[1]

    if verbose:
        print ("\tdmat:\t", str(group_matrix[:5, :]).replace('\n', '\n\t\t') \
               , "\n\t\t", group_matrix.shape \
               )


    # convert data to a "theano.shared" via the fabulous PM.Data function
    with model:
        shared_dat = PM.Data(f'{label}_data', data_matrix)
        shared_grp = PM.Data(f'{label}_group', group_matrix)

    # loop observables (stacked afterwards) because their parameter blocks are independent
    obslwise = [] # will aggregate the slopes per observable (obsl); stacked below
    for obsl, observable in enumerate(observables):
        with model:
            if label == 'intercept':
                hp_mean = PM.Normal( f'{label}_ml_population_{observable}' \
                                        , shape = n_params \
                                        , **{k: v[obsl] for k, v in prior_kwargs.items()} \
                                       )

                population = PM.Normal( f'{label}_ml_{level}_{observable}' \
                                        , shape = (n_groups, n_params ) \
                                        , mu = hp_mean \
                                        , sd = prior_kwargs.get('sd', [1.]*n_observables)[obsl] \
                                       )

                # # you could leave out the hyperprior
                # population = PM.Normal( f'{label}_ml_{level}_{observable}' \
                #                         , shape = (n_groups, n_params ) \
                #                         , **{k: v[obsl] for k, v in prior_kwargs.items()} \
                #                        )

            else:
                # prepare n_groups x n_params slopes
                population = PM.Normal( f'{label}_ml_{level}_{observable}' \
                                        , shape = (n_groups, n_params ) \
                                        , **{k: v[obsl] for k, v in prior_kwargs.items()} \
                                       )

            # only use the group-specific slope per observation
            groupwise = TT.dot(shared_grp, population)
            # debugging:
            # print(f'{label} {level} {observable}')
            # PrintShape('groupwise', groupwise)

            # multiply data and slopes; then sum over parameters (components are added)
            grpdat_matrix = TT.mul(shared_dat, groupwise)
            # PrintShape('grpdat_M', grpdat_matrix)
            grpdat = TT.sum(grpdat_matrix, axis = 1)
            # PrintShape('grpdat', grpdat)

        # append the observable-wise list
        obslwise.append(grpdat)

    # stack observable-wise column vectors
    with model:
        component = TT.stack(obslwise, axis = 1)
        # PrintShape('component', component)

    if verbose:
        PrintShape(f'component: {component}', component)
        print ("\tdone!")

    return component
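
To show what the function expects, here is a hypothetical invocation (column names, sizes, and priors are invented for illustration; note that the prior_kwargs values are indexed per observable, hence the lists):

data = PD.DataFrame({
    'x1': NP.random.normal(size = 20),                         # a predictor column
    'y1': NP.random.poisson(lam = 3., size = 20),              # an observable column
    'unit': PD.Categorical(NP.tile(['a', 'b', 'c', 'd'], 5)),  # the grouping level
})

model = PM.Model()
component = AddMultiLevelComponent( model \
                                  , data \
                                  , label = 'slope' \
                                  , params = ['x1'] \
                                  , level = 'unit' \
                                  , observables = ['y1'] \
                                  , prior_kwargs = {'mu': [0.], 'sd': [1.]} \
                                  , verbose = True \
                                  )
# component is a (n_observations x n_observables) = (20, 1) model component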

Wow, @falk, many thanks. I have a lot to study here. I may come to you again in case I need some clarifications. Definitely I need to learn how to properly do the pm.Data().

Just to clarify here if someone comes later. This is what has worked for me:

md1 = tt.dot(data_group, θ[0,:])  # linear predictor for period 1
md2 = tt.dot(data_group, θ[1,:])  # linear predictor for period 2
μ_δ = pm.Deterministic('μ_δ', tt.stack([md1, md2], axis = 1))
δ = pm.Normal('δ', mu = μ_δ, sigma = sigma_delta, dims = ['Unit', 'Period'])

Not very elegant, but it combines the dot product, passing data into the model, and stacking the two periods together as columns.
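
For completeness, a rough sketch of how data_group can be passed into the model via pm.Data (the names mirror the snippet above; covariates_group_level and n_group_covariates are placeholders for the group-level covariate array and its number of columns):

with model:
    # register the [nUnits, nCovariates] group-level covariates as shared data,
    # so they can later be swapped with pm.set_data for out-of-sample prediction
    data_group = pm.Data('data_group', covariates_group_level)
    θ = pm.Normal('θ', mu = 0, sigma = 1, shape = (2, n_group_covariates))
    md1 = tt.dot(data_group, θ[0, :])  # linear predictor for period 1
    md2 = tt.dot(data_group, θ[1, :])  # linear predictor for period 2
    μ_δ = pm.Deterministic('μ_δ', tt.stack([md1, md2], axis = 1))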

Thank you again, @falk.
