I’m getting very strange numbers in the trace for my error term s.
I’m using a modified version of HierarchicalLogisticRegression from pymc3_models. I changed the code to do a regular linear regression instead:
# module-level imports used below:
import numpy as np
import pymc3 as pm
import theano
import theano.tensor as T

def create_model(self):
    """
    Creates and returns the PyMC3 model.

    Note: The size of the shared variables must match the size of the
    training data. Otherwise, setting the shared variables later will
    raise an error. See http://docs.pymc.io/advanced_theano.html

    Returns
    -------
    the PyMC3 model
    """
    model_input = theano.shared(np.zeros([self.num_training_samples, self.num_pred]))
    model_output = theano.shared(np.zeros(self.num_training_samples))
    model_cats = theano.shared(np.zeros(self.num_training_samples, dtype='int'))

    self.shared_vars = {
        'model_input': model_input,
        'model_output': model_output,
        'model_cats': model_cats
    }

    model = pm.Model()
    with model:
        # hyperpriors and per-category intercepts
        mu_alpha = pm.Normal('mu_alpha', mu=0, sd=100)
        sigma_alpha = pm.HalfNormal('sigma_alpha', sd=100)
        alpha = pm.Normal('alpha', mu=mu_alpha, sd=sigma_alpha, shape=(self.num_cats,))

        # hyperpriors and per-category coefficients
        mu_beta = pm.Normal('mu_beta', mu=0, sd=100)
        sigma_beta = pm.HalfNormal('sigma_beta', sd=100)
        betas = pm.Normal('beta', mu=mu_beta, sd=sigma_beta, shape=(self.num_cats, self.num_pred))

        # linear predictor and Gaussian likelihood (replaces the logistic link)
        mean = alpha[model_cats] + T.sum(betas[model_cats] * model_input, 1)
        s = pm.HalfNormal('s', tau=1)
        y = pm.Normal('y', mu=mean, sd=s, observed=model_output)

    return model
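For context, elsewhere in the class the shared variables get swapped out before training/prediction, roughly like this (a sketch; X_batch / y_batch / cats_batch are illustrative names, not the actual pymc3_models internals):

# Illustrative only -- updating the shared variables with a new batch:
self.shared_vars['model_input'].set_value(X_batch)
self.shared_vars['model_output'].set_value(y_batch)
self.shared_vars['model_cats'].set_value(cats_batch.astype('int'))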
The training dataset has 50+ features, 10+ million rows, and 100+ categories.
I’m using batch training with a batch size of 100, as described here: Big Data with HierarchicalRegression.
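My understanding of that pattern is roughly the following (a sketch with illustrative names such as X_train and cats_train; I’m not certain it matches the pipeline internals exactly):

# Sketch of the minibatch ADVI wiring from the linked post; not my
# exact pipeline code. pm.Minibatch draws random batches of size 100.
X_mb = pm.Minibatch(X_train, batch_size=100)
y_mb = pm.Minibatch(y_train, batch_size=100)
cats_mb = pm.Minibatch(cats_train.astype('int'), batch_size=100)

# Swap the full-data shared variables for minibatch tensors during ADVI:
with model:
    approx = pm.fit(
        n=200000,
        more_replacements={
            model_input: X_mb,
            model_output: y_mb,
            model_cats: cats_mb,
        },
    )
trace = approx.sample(1000)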
The mean and std of the target are approximately 0.001 and 0.02, so an s posterior centered around 100,000 is clearly not right:
pymc3.summary(pipeline.named_steps['estimator'].trace).loc['s']

mean        102529.488376
sd             201.365938
mc_error         2.249605
hpd_2.5     102125.888260
hpd_97.5    102915.690672
Name: s, dtype: float64
I tried normalizing y with (y - y.mean()) / y.std(), but the results are similar. I also tried increasing the batch size to 1000, but the results are worse.
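For reference, the normalization was plain standardization (a sketch; y_train is an illustrative name for the raw target array):

# Standardize the target to mean 0, std 1 before fitting:
y_mean, y_std = y_train.mean(), y_train.std()
y_scaled = (y_train - y_mean) / y_std
# predictions would be un-scaled afterwards: preds * y_std + y_mean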
Has anyone else experienced this? Is there something that I’m clearly doing wrong?
Thanks.