Models with different pooling give very different results

Hi! I’m trying to fit a function with the parametric form y = a + b1 x1^p1 + b2 x2^p2 + c x3 to a dataset of y, x1, x2, x3 values. I expect p1 and p2 to be quite close to each other and somewhere in the range 0.1 to 1, and b1 to be close to -b2; this is accounted for in the model parametrization. There are 41 different objects in the dataset, each with several tens of data points. Ideally, I’d like to use a partial-pooling model for all of the parameters a, b1, b2, c, but I started with a complete-pooling model and want to compare its results to a no-pooling one.

The code for the complete-pooling model:

import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model:
    a0 = pm.Cauchy('a0', 0, 0.5)    
    b0 = pm.Cauchy('b0', 0, 0.1, shape=2)
    bb = pm.Deterministic('bb', tt.stack([b0[0] - b0[1],
                                          b0[0] + b0[1]]))
    c0 = pm.Cauchy('c0', 0, 0.1)

    p1 = pm.Normal('p1', -1, 2)
    p2 = pm.Normal('p2', 0, 0.1)
    p0 = pm.Deterministic('p0', tt.exp(tt.stack([p1, p1 + p2])))

    y_true = (a0 / p0[0]
              + tt.dot(X12**p0, bb / p0)
              + c0 * X3
              )

    sd = pm.Lognormal('sd', -3, 1)
    nu = pm.Exponential('nu', lam=1 / 30)
    cs = pm.StudentT('y', mu=y_true, sd=sd, nu=nu, 
                     observed=y)

    trace = pm.sample(njobs=5)

And the code for the model with no pooling of a, b1, b2 (c, p1, p2 are still pooled):

with pm.Model() as model:
    a = pm.Cauchy('a', 0, 0.1, shape=len(set(obj_id)))
    b = pm.Cauchy('b', 0, 0.2, shape=(2, len(set(obj_id))))
    bb = pm.Deterministic('bb', tt.stack([b[0] - b[1],
                                          b[0] + b[1]]))
    c0 = pm.Cauchy('c0', 0, 0.1)

    p1 = pm.Normal('p1', -1, 2)
    p2 = pm.Normal('p2', 0, 0.2)
    p0 = pm.Deterministic('p0', tt.exp(tt.stack([p1, p1 + p2])))

    y_true = (a[obj_id] / p0[0]
              + tt.batched_dot(X12**p0, bb[:, obj_id].T / p0)
              + c0 * X3
              )

    sd = pm.Lognormal('sd', -3, 1)
    nu = pm.Exponential('nu', lam=1 / 30)
    cs = pm.StudentT('y', mu=y_true, sd=sd, nu=nu, 
                     observed=y)

    trace = pm.sample(njobs=5)

Here X12 is the n x 2 array of x1, x2 values for each point, X3 and y are n-sized arrays of the x3 and y values, and obj_id is an n-sized array of object IDs (values from 0 to 40). Note that both the a and b values are effectively divided by p; this is done to reduce the correlation between these variables in the posterior (hopefully improving the sampling process) and doesn’t change the model otherwise.
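
For reference, the inputs are assumed to be plain NumPy arrays shaped like this (x1_values, obj_labels, etc. are hypothetical names for the raw data; only the shapes and the 0..40 integer coding of obj_id matter):

import numpy as np

X12 = np.column_stack([x1_values, x2_values])   # (n, 2) array of x1, x2
X3 = np.asarray(x3_values)                      # (n,) array of x3
y = np.asarray(y_values)                        # (n,) array of observations
# map the 41 object labels to integer IDs 0..40
obj_id = np.unique(obj_labels, return_inverse=True)[1]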

I expected the second model to give posteriors for the pooled parameters (c, p1, p2) about the same as the first model, and posteriors for the a, b parameters that are more dispersed but still roughly centered at the first model’s results. However, the resulting values are very different; see the traces for the first model:


and for the second model:

Most notably, the estimates for p became significantly larger. However, the traces themselves look good to me, indicating no major problems with sampling.

Is there anything wrong with the models?

The model and the trace look fine to me. Are there any warnings during sampling?

If not, I would first plot y_true against the observed y and check whether the fit is good (you can also use sample_ppc for that). Then I would compute the mean of a in model 2 and compare it with the estimate of a0 in model 1; similarly compare the mean of b with b0. The two should be quite similar for the models to make sense.
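
A rough sketch of the posterior predictive check I mean, reusing your first model and trace (the number of samples is arbitrary):

import matplotlib.pyplot as plt

with model:                                  # the complete-pooling model
    ppc = pm.sample_ppc(trace, samples=500)  # posterior predictive draws of 'y'

# ppc['y'] has shape (samples, n); plot its mean against the observations
plt.scatter(ppc['y'].mean(axis=0), y, s=5)
plt.xlabel('posterior predictive mean of y')
plt.ylabel('observed y')
plt.show()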

There are sometimes warnings like "The acceptance probability in chain 4 does not match the target. It is 0.903201352786, but should be close to 0.8." for some chains in both models, with actual acceptance probabilities of about 0.85-0.95. From what I understand reading other questions here, this isn’t considered a big deal.

I plotted y_true vs the observed y (using a Deterministic for y_true and taking its values directly from the trace instead of using sample_ppc), and the plots look like a straight line plus noise. Plots of y_true vs the residuals y - y_true also show little visible structure.
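
(Concretely, the extraction was roughly this: wrap the prediction in a Deterministic inside the model, then read it back from the trace.)

# inside the model block:
#     pm.Deterministic('y_true', y_true)
import matplotlib.pyplot as plt

y_fit = trace['y_true'].mean(axis=0)   # posterior-mean prediction per data point
plt.scatter(y_fit, y, s=5)
plt.xlabel('y_true (posterior mean)')
plt.ylabel('observed y')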

y_true vs y for the 1st model: [image]
y_true vs y for the 2nd model: [image]
y_true for the 1st vs the 2nd model: [image]

The means (and medians) for a and b are reasonably close for both models - certainly much closer than those for p. Here are the means for the 1st model (complete pooling) vs the 2nd:
For a -0.03 vs 0.03
For b [0.025, 0.4] vs [0.005, 0.41]
For p [0.28, 0.23] vs [0.52, 0.35]
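
(Roughly, these were computed along these lines, averaging the second model’s per-object parameters over objects; trace1 and trace2 here stand for the two traces:)

print(trace1['a0'].mean(), trace2['a'].mean())                   # a
print(trace1['b0'].mean(axis=0), trace2['b'].mean(axis=(0, 2)))  # b
print(trace1['p0'].mean(axis=0), trace2['p0'].mean(axis=0))      # p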

Still no idea what the issue may be or how to debug it further :frowning: