Hi! I'm trying to fit a function with the parametric form `y = a + b1*x1^p1 + b2*x2^p2 + c*x3` to a dataset containing `y`, `x1`, `x2`, `x3` values. I expect `p1` and `p2` to be quite close and around 0.1…1, and `b1` to be close to `-b2`; this is accounted for in the model parametrization. There are 41 different objects in the dataset, each having several tens of data points. Ideally, I'd like to use a partial-pooling model for all the parameters `a`, `b1`, `b2`, `c`, but I started with a complete-pooling model and want to compare its results to a no-pooling one.
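To make the parametrization concrete, here is a small NumPy sketch (the numeric values are hypothetical, just for illustration): `b1`/`b2` are encoded as the difference/sum of a base pair, so `b1 ≈ -b2` corresponds to the first component being small, and `p1`/`p2` are the exponential of a shared log-exponent plus a small offset, so they stay close to each other.

```python
import numpy as np

# Hypothetical values illustrating the sum/difference encoding of b1, b2:
b0 = np.array([0.02, 0.15])            # b0[0] ~ 0 encodes b1 close to -b2
b1, b2 = b0[0] - b0[1], b0[0] + b0[1]
print(b1, b2)                          # close to -0.13 and 0.17

# p1, p2 as exp of a shared log-exponent plus a small offset:
log_p, dlog_p = -1.0, 0.05             # small offset keeps p2/p1 near 1
p1, p2 = np.exp([log_p, log_p + dlog_p])
print(p1, p2)                          # both in the expected 0.1...1 range
```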
The code for the complete-pooling model:
```python
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model:
    a0 = pm.Cauchy('a0', 0, 0.5)
    b0 = pm.Cauchy('b0', 0, 0.1, shape=2)
    bb = pm.Deterministic('bb', tt.stack([b0[0] - b0[1],
                                          b0[0] + b0[1]]))
    c0 = pm.Cauchy('c0', 0, 0.1)
    p1 = pm.Normal('p1', -1, 2)
    p2 = pm.Normal('p2', 0, 0.1)
    p0 = pm.Deterministic('p0', tt.exp(tt.stack([p1, p1 + p2])))

    y_true = (a0 / p0[0]
              + tt.dot(X12**p0, bb / p0)
              + c0 * X3)

    sd = pm.Lognormal('sd', -3, 1)
    nu = pm.Exponential('nu', lam=1 / 30)
    cs = pm.StudentT('y', mu=y_true, sd=sd, nu=nu,
                     observed=y)

    trace = pm.sample(njobs=5)
```
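As a quick sanity check of the shapes in the mean expression (using hypothetical toy arrays with the same layout as the real data): `X12**p0` broadcasts a `(n, 2)` array against the length-2 `p0`, and the dot product with the length-2 vector `bb / p0` collapses it to an `n`-vector, matching `y`.

```python
import numpy as np

n = 5
X12 = np.abs(np.random.randn(n, 2))   # toy (n, 2) predictor array
X3 = np.random.randn(n)               # toy n-vector
a0, c0 = 0.3, 0.1                     # hypothetical parameter values
bb = np.array([-0.13, 0.17])
p0 = np.array([0.37, 0.39])

# Mirrors the model's mean: (n, 2) ** (2,) broadcasts to (n, 2),
# and the dot with the length-2 vector bb / p0 yields an n-vector.
y_true = a0 / p0[0] + (X12 ** p0) @ (bb / p0) + c0 * X3
print(y_true.shape)                   # (5,)
```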
And the code for the model with no pooling of `a`, `b1`, `b2` (`c`, `p1`, and `p2` are still pooled):
```python
with pm.Model() as model:
    a = pm.Cauchy('a', 0, 0.1, shape=len(set(obj_id)))
    b = pm.Cauchy('b', 0, 0.2, shape=(2, len(set(obj_id))))
    bb = pm.Deterministic('bb', tt.stack([b[0] - b[1],
                                          b[0] + b[1]]))
    c0 = pm.Cauchy('c0', 0, 0.1)
    p1 = pm.Normal('p1', -1, 2)
    p2 = pm.Normal('p2', 0, 0.2)
    p0 = pm.Deterministic('p0', tt.exp(tt.stack([p1, p1 + p2])))

    y_true = (a[obj_id] / p0[0]
              + tt.batched_dot(X12**p0, bb[:, obj_id].T / p0)
              + c0 * X3)

    sd = pm.Lognormal('sd', -3, 1)
    nu = pm.Exponential('nu', lam=1 / 30)
    cs = pm.StudentT('y', mu=y_true, sd=sd, nu=nu,
                     observed=y)

    trace = pm.sample(njobs=5)
```
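The `tt.batched_dot` call here takes two `(n, 2)` arrays and computes a row-wise dot product, so each point is weighted by its own object's `b` pair. A NumPy equivalent (with hypothetical toy values) is an `einsum` over the shared axes:

```python
import numpy as np

n, n_obj = 6, 3
obj_id = np.array([0, 0, 1, 1, 2, 2])  # toy object IDs
X12 = np.abs(np.random.randn(n, 2))
b = np.random.randn(2, n_obj)
bb = np.stack([b[0] - b[1], b[0] + b[1]])  # (2, n_obj), as in the model
p0 = np.array([0.37, 0.39])

# bb[:, obj_id].T picks each point's own (b1, b2) pair -> shape (n, 2);
# the batched dot of two (n, 2) arrays is a row-wise dot product,
# which in NumPy is an einsum contracting the last axis per row.
rows = bb[:, obj_id].T / p0
term = np.einsum('nk,nk->n', X12 ** p0, rows)
print(term.shape)                      # (6,)
```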
Here `X12` is the `n × 2` array of `x1`, `x2` values for each point, `X3` and `y` are `n`-sized arrays of `x3` and `y` values, and `obj_id` is an `n`-sized array of object IDs (values from 0 to 40). Note that both the `a` and `b` values are effectively divided by `p`; this is done to reduce the correlation between these variables in the posterior (hopefully improving the sampling process) and doesn't change much otherwise.
I expected the second model to give posteriors for the pooled parameters (`c`, `p1`, `p2`) about the same as the first model, and more dispersed posteriors for the `a`, `b` parameters, still roughly centered at the first model's results. However, the resulting values are very different; see the traces for the first model:

[trace plot for the complete-pooling model]

and for the second model:

[trace plot for the no-pooling model]

Most notably, the estimates for `p` became significantly larger. However, the traces themselves look good to me, indicating no major problems with sampling.
Is there anything wrong with the models?