Using dot-product in hierarchical or unpooled model

erlendd · June 2, 2018, 2:56pm

I’ve been following the guide here: https://docs.pymc.io/notebooks/GLM-hierarchical.html

Adapted to a slightly different problem, where city is the random effect and alpha, beta are the fixed effects. I wanted to test an unpooled model, because it’s simple to check the results (e.g. against the individual models).

The following code works,
with pm.Model() as unpooled_model:

    alpha = pm.Normal('alpha', mu=0, sd=10, shape=n_cities)
    beta = pm.Normal('beta', mu=0, sd=10, shape=n_cities) 
    sigma = pm.HalfCauchy('sigma', 1)
    sd_a = pm.HalfCauchy('sd_a', 5)
    sd_b = pm.HalfCauchy('sd_b', 5)

    eq = alpha[city] + beta[city]*X.flatten()
    crime_p = pm.Normal('crime_p', mu=eq, sd=sigma, observed=y)

    unpooled_trace = pm.sample(draws=2000, n_init=2000, tune=5000)
    pm.traceplot(unpooled_trace)

However, if I change the beta[city]*X.flatten() part to a dot-product, it fails.

eq = alpha[city] + T.dot(beta[city], X.flatten())

The model runs, but the results are nonsense. I feel like these expressions should give the same result. Does anyone know why this doesn’t work please?

junpenglao · June 2, 2018, 3:05pm

You can check the output of the dot product by doing:
T.dot(beta[city], X.flatten()).tag.test_value
and compare it with the elementwise multiplication:
(beta[city]*X.flatten()).tag.test_value

What likely happened is that the dot product computed the sum along some dimension, which then broadcast to each element of alpha[city] in the addition.

erlendd · June 2, 2018, 3:23pm

In both cases I get the error:

AttributeError: 'numpy.ndarray' object has no attribute 'type'

Not sure what that means, is it complaining because X a numpy array?

I already tried testing the .dot approach using numpy - I assume that np.dot and T.dot behave in the same way.

junpenglao · June 2, 2018, 3:25pm

Could you please try casting city to theano: theano.shared(city)

erlendd · June 3, 2018, 2:38pm

Ok thanks for your help, yes this helped. I also found a good solution on the forums here: Theano error with element wise multiplication

crime_p = (a[city] + b[city] * X).sum(axis=1)

In terms of getting out predictions, I take it most people would use alpha and beta (rather than a and b)? E.g. if I had some random new city that I had no data about.

junpenglao · June 3, 2018, 2:53pm

Yes - for example in a mixed-effect formulation you will be using non-group-specific effect for prediction of a new city. In practice how to compute this depends on the formulation of your model.

erlendd · June 5, 2018, 11:17am

To get predictions out is in sensible to introduce an auxiliary variable in my model?

logit = a[group] + (b[group]*mbX).sum(axis=1)
p = logistic(logit)
y_obs = pm.Bernoulli('y_obs', p=p, observed=mb_y, total_size=ytrain.shape[0])

logit_fe = alpha + (beta*mbX).sum(axis=1)
p_fe = logistic(logit_fe)
y_fe = pm.Bernoulli('y_fe', p=p_fe)

Where group is my grouping parameter (e.g. city), a and b are the group-level random effects and alpha, beta are the top-level effects, i.e.

alpha = pm.Normal('alpha', mu=0, sd=1)
a = pm.Normal('a', mu=alpha, sd=a_sd, shape=len(unique_cities))

Or is there a better way to obtain non-group specific predictions?
Thanks.

junpenglao · June 5, 2018, 11:58am

You can do that - but it is not recommended. Auxiliary variables sometimes introduce unintended side effect eg https://github.com/pymc-devs/pymc3/issues/1990

I think a better way is to compute the prediction using the MCMC trace after sampling.

erlendd · June 5, 2018, 12:06pm

Ok I noticed the auxiliary variable was messing around with training. Thanks.

I’m not sure how to use the MCMC trace to compute the non-group specific predictions with pymc3. I’m able to do it in a brute force sort of way, like this,

for this_intercept, this_coef in zip(trace['alpha'], trace['beta'):
    lr = sklearn.linear_model.LogisticRegression()
    lr._intercept = this_intercept
    lr._coef = this_coef
    lr.predict_proba(...)

But I feel like there must be a cleaner way of doing this within pymc3.

junpenglao · June 5, 2018, 12:22pm

To do it within PyMC3, you can try creating a new model solely for prediction:

with pm.Model() as prediction:
    alpha = ...
    beta = ...
    logit_fe = alpha + (beta*mbX).sum(axis=1)
    p_fe = logistic(logit_fe)
    y_fe = pm.Bernoulli('y_fe', p=p_fe)
    # plug in the trace from before.
    ppc = pm.sample_ppc(trace, 1000, vars=[y_fe])

Topic		Replies	Views
Create model matrix Questions	7	1489	June 23, 2021
Understanding dimensions and groups with dot product in hierarchical model Questions	4	535	February 9, 2023
Matrix multiplication in pymc3 Questions development	8	6329	April 24, 2018
Use of coordinates/dimensions in hierarchical models v5 hierarchical	2	932	November 23, 2022
Why isn't NUTS sampling with tt.dot or pm.math.dot? Questions	5	1264	October 7, 2018

Using dot-product in hierarchical or unpooled model

Related topics