Using dot-product in hierarchical or unpooled model

I’ve been following the guide here:

Adapted to a slightly different problem, where city is the random effect and alpha, beta are the fixed effects. I wanted to test an unpooled model, because it’s simple to check the results (e.g. against the individual models).

The following code works:
with pm.Model() as unpooled_model:

    alpha = pm.Normal('alpha', mu=0, sd=10, shape=n_cities)
    beta = pm.Normal('beta', mu=0, sd=10, shape=n_cities) 
    sigma = pm.HalfCauchy('sigma', 1)
    sd_a = pm.HalfCauchy('sd_a', 5)
    sd_b = pm.HalfCauchy('sd_b', 5)

    eq = alpha[city] + beta[city]*X.flatten()
    crime_p = pm.Normal('crime_p', mu=eq, sd=sigma, observed=y)

    unpooled_trace = pm.sample(draws=2000, n_init=2000, tune=5000)

However, if I change the beta[city]*X.flatten() part to a dot-product, it fails.

eq = alpha[city] +[city], X.flatten())

The model runs, but the results are nonsense. I feel like these expressions should give the same result. Does anyone know why this doesn’t work please?

You can check the output of the dot product by doing[city], X.flatten()).tag.test_value
and compare it with the elementwise multiplication: (beta[city]*X.flatten()).tag.test_value

What likely happened is that the dot product summed over the observation dimension, producing a single scalar that was then broadcast to every element of alpha[city] in the addition.

In both cases I get the error:

AttributeError: 'numpy.ndarray' object has no attribute 'type'

Not sure what that means - is it complaining because X is a numpy array?

I already tried testing the .dot approach using numpy - I assume that and behave in the same way.

Could you please try casting city to a theano shared variable: theano.shared(city)

Ok, thanks for your help - yes, that fixed it. I also found a good solution on the forums here: Theano error with element wise multiplication

crime_p = (a[city] + b[city] * X).sum(axis=1)
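For a 2-D X (one row per observation, one column per predictor), the multiply-then-sum form is equivalent to a row-wise dot product. A quick NumPy check (shapes are illustrative assumptions, not taken from the thread's data):

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_pred, n_cities = 6, 3, 2
X = rng.normal(size=(n_obs, n_pred))
b = rng.normal(size=(n_cities, n_pred))   # one coefficient row per city
city = np.array([0, 1, 0, 1, 0, 1])       # city index for each observation

# (b[city] * X).sum(axis=1): for each observation, the dot product of
# that observation's city coefficients with its own predictor row.
summed = (b[city] * X).sum(axis=1)
rowwise = np.einsum('ij,ij->i', b[city], X)

print(np.allclose(summed, rowwise))  # True
```

This is why `.sum(axis=1)` gives the per-observation linear predictor, while a plain `` over everything would again collapse dimensions you want to keep.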

In terms of getting out predictions, I take it most people would use alpha and beta (rather than a and b)? E.g. if I had some random new city that I had no data about.

Yes - for example, in a mixed-effects formulation you would use the non-group-specific effects to predict for a new city. In practice, how to compute this depends on the formulation of your model.

To get predictions out, is it sensible to introduce an auxiliary variable in my model?

logit = a[group] + (b[group]*mbX).sum(axis=1)
p = logistic(logit)
y_obs = pm.Bernoulli('y_obs', p=p, observed=mb_y, total_size=ytrain.shape[0])

logit_fe = alpha + (beta*mbX).sum(axis=1)
p_fe = logistic(logit_fe)
y_fe = pm.Bernoulli('y_fe', p=p_fe)

Where group is my grouping parameter (e.g. city), a and b are the group-level random effects and alpha, beta are the top-level effects, i.e.

alpha = pm.Normal('alpha', mu=0, sd=1)
a = pm.Normal('a', mu=alpha, sd=a_sd, shape=len(unique_cities))

Or is there a better way to obtain non-group specific predictions?

You can do that - but it is not recommended. Auxiliary variables sometimes introduce unintended side effects, e.g. interfering with the sampling of the rest of the model.

I think a better way is to compute the prediction using the MCMC trace after sampling.

Ok I noticed the auxiliary variable was messing around with training. Thanks.

I’m not sure how to use the MCMC trace to compute the non-group specific predictions with pymc3. I’m able to do it in a brute force sort of way, like this,

for this_intercept, this_coef in zip(trace['alpha'], trace['beta']):
    lr = sklearn.linear_model.LogisticRegression()
    # sklearn's fitted attributes use a trailing underscore
    lr.intercept_ = this_intercept
    lr.coef_ = this_coef

But I feel like there must be a cleaner way of doing this within pymc3.
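One option is to skip scikit-learn entirely and apply the posterior draws directly with NumPy. A sketch, assuming trace['alpha'] has shape (n_draws,) and trace['beta'] has shape (n_draws, n_pred) - the small arrays below are stand-ins for those trace values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Stand-ins for trace['alpha'] (n_draws,) and trace['beta'] (n_draws, n_pred)
alpha_draws = np.array([0.0, 0.5])
beta_draws = np.array([[1.0, -1.0],
                       [0.5, 0.5]])
X_new = np.array([[2.0, 1.0]])  # predictor row for a new, unseen city

# Logit for every posterior draw, then average the probabilities
logits = X_new @ beta_draws.T + alpha_draws  # shape (n_new, n_draws)
p_mean = sigmoid(logits).mean(axis=1)        # posterior mean probability

print(p_mean)
```

Averaging sigmoid(logit) over draws (rather than taking sigmoid of the mean logit) keeps the posterior uncertainty in the prediction; you can also keep the full sigmoid(logits) array to get credible intervals.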

To do it within PyMC3, you can try creating a new model solely for prediction:

with pm.Model() as prediction:
    alpha = ...
    beta = ...
    logit_fe = alpha + (beta*mbX).sum(axis=1)
    p_fe = logistic(logit_fe)
    y_fe = pm.Bernoulli('y_fe', p=p_fe)
    # plug in the trace from before; the free variables here must keep the
    # same names as in the original model so their trace values can be reused.
    ppc = pm.sample_ppc(trace, 1000, vars=[y_fe])