I think it might be better to start with a Mixture model. I spent some time thinking about a smart way to pass a transformation to Normal to model this, but in the end it may be more straightforward to write a Box-Cox Normal distribution:
import numpy as np
import pymc3 as pm
import theano.tensor as tt
from pymc3.distributions.continuous import Continuous
from pymc3.theanof import gradient

class BoxCoxNormal(Continuous):
    def __init__(self, mu=0., sd=1., lmbda=1., **kwargs):
        self.sd = tt.as_tensor_variable(sd)
        self.mu = tt.as_tensor_variable(mu)
        self.lmbda = tt.as_tensor_variable(lmbda)
        super(BoxCoxNormal, self).__init__(**kwargs)

    def inv_boxcox_func(self, x):
        # inverse Box-Cox transform: (1 + lmbda * x) ** (1 / lmbda)
        return tt.exp(tt.log1p(self.lmbda * x) / self.lmbda)

    def boxcox_func(self, y):
        # Box-Cox transform: (y ** lmbda - 1) / lmbda
        return tt.expm1(self.lmbda * tt.log(y)) / self.lmbda

    def jacobian_det(self, x):
        # log|d/dx boxcox_func(x)|, the log-Jacobian of the transform
        x = tt.as_tensor_variable(x)
        grad = tt.reshape(
            gradient(tt.sum(self.boxcox_func(x)), [x]), x.shape)
        return tt.log(tt.abs_(grad))

    def logp(self, value):
        sd = self.sd
        mu = self.mu
        value_ = self.boxcox_func(value)
        return pm.Normal.dist(mu, sd).logp(value_) + self.jacobian_det(value)
I cherry-picked code from the scipy Box-Cox (and its inverse) transformation functions, and the jacobian_det computation from pymc3's transformed distribution code.
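For reference, boxcox_func computes (y**lmbda - 1) / lmbda and inv_boxcox_func computes its inverse (1 + lmbda*x)**(1/lmbda). A quick numerical sanity check against scipy (my own sketch, not from the notebook):

from scipy import special

lmbda = -.2
x = tt.dvector('x')
bc = tt.expm1(lmbda * tt.log(x)) / lmbda    # same expression as boxcox_func
inv = tt.exp(tt.log1p(lmbda * x) / lmbda)   # same expression as inv_boxcox_func

vals = np.array([.5, 1., 2., 5.])
assert np.allclose(bc.eval({x: vals}), special.boxcox(vals, lmbda))
assert np.allclose(inv.eval({x: special.boxcox(vals, lmbda)}), vals)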
So it works like this:
from scipy import special

mu0, sd0 = 3., .05
y = np.random.normal(mu0, sd0, size=200)
lam = -.2
y_tr = special.inv_boxcox(y, lam)  # transform the data, so the model has to undo it

with pm.Model() as m:
    mu = pm.Normal('mu', 0., 10.)
    sd = pm.HalfNormal('sd', 5.)
    y_latent = BoxCoxNormal('y', mu, sd, lmbda=lam, observed=y_tr)
    trace = pm.sample(5000, tune=1000)

pm.traceplot(trace, lines=dict(mu=mu0, sd=sd0));
This recovers mu and sd.
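To quantify the recovery beyond eyeballing the traceplot, a quick check (my addition):

# Posterior means and intervals should bracket the true mu0 = 3. and sd0 = .05
pm.summary(trace, varnames=['mu', 'sd'])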
To model the mixture, I went with something like the following:
with pm.Model() as m:
    w = pm.Dirichlet('w', a=np.ones(2))
    mus = pm.Normal('mus', 2.8, 5., shape=2)
    sds = pm.HalfNormal('sds', .5, shape=2)
    mix_logp = [BoxCoxNormal.dist(mus[0], sds[0], lmbda=ld_1),
                BoxCoxNormal.dist(mus[1], sds[1], lmbda=ld_2)]
    obs = pm.Mixture('y', w, mix_logp, observed=combi_data)
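Note that ld_1, ld_2, and combi_data come from the notebook; to run this standalone you would first need to generate similar two-component data, e.g. like below (the specific values are my own placeholders, not the notebook's):

ld_1, ld_2 = -.2, -.1                     # hypothetical per-component lambdas
y1 = np.random.normal(3., .05, size=100)  # component 1 on the Gaussian scale
y2 = np.random.normal(2.5, .1, size=100)  # component 2 on the Gaussian scale
combi_data = np.concatenate([special.inv_boxcox(y1, ld_1),
                             special.inv_boxcox(y2, ld_2)])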
Inference of mixture models is pretty difficult with NUTS (you can search the discourse for related discussions), but for this model, starting from find_MAP seems to perform quite well:
with m:
    map1 = pm.find_MAP()
    trace = pm.sample(1000, tune=1000, start=map1)
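BoxCoxNormal above has no random() method, so pm.sample_ppc won't work out of the box; a rough plug-in predictive check using the posterior means (my own sketch, not in the notebook) could look like:

import matplotlib.pyplot as plt

w_m = trace['w'].mean(axis=0)
mus_m = trace['mus'].mean(axis=0)
sds_m = trace['sds'].mean(axis=0)
comp = np.random.choice(2, size=1000, p=w_m / w_m.sum())  # component labels
draws = special.inv_boxcox(np.random.normal(mus_m[comp], sds_m[comp]),
                           np.where(comp == 0, ld_1, ld_2))
plt.hist(combi_data, bins=50, density=True, alpha=.5, label='observed')
plt.hist(draws, bins=50, density=True, alpha=.5, label='plug-in predictive')
plt.legend();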
Inferring lambda as well, however, is a bit more difficult, as I think the Box-Cox transformation is very sensitive to the lambda parameter. I tried the model below, but the inference is very slow. I think a reparameterization, a more informative prior, and/or a specialized inference method for this parameter is needed.
with pm.Model() as m:
    w = pm.Dirichlet('w', a=np.ones(2))
    mus = pm.Normal('mus', 2.8, 5., shape=2)
    sds = pm.HalfNormal('sds', .5, shape=2)
    # InverseGamma keeps lmbdas positive; negating them below restricts
    # the Box-Cox lambdas to be negative.
    lmbdas = pm.InverseGamma('lambdas', 8., 2., shape=2,
                             testval=np.asarray([-ld_1, -ld_2]))
    mix_logp = [BoxCoxNormal.dist(mus[0], sds[0], lmbda=-lmbdas[0]),
                BoxCoxNormal.dist(mus[1], sds[1], lmbda=-lmbdas[1])]
    obs = pm.Mixture('y', w, mix_logp, observed=combi_data)
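One way to "specify inference for this parameter", as mentioned above, might be to give lmbdas its own step method (an untested sketch):

with m:
    # Metropolis for the lambdas; pm.sample automatically assigns NUTS
    # to the remaining free variables.
    step_lmbda = pm.Metropolis(vars=[lmbdas])
    trace = pm.sample(2000, tune=2000, step=step_lmbda)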
You can find everything in the notebook here: https://github.com/junpenglao/Planet_Sakaar_Data_Science/blob/master/PyMC3QnA/Box-Cox%20transformation.ipynb