Hi,
I am working on a problem where I need to predict a continuous output variable, label (possible values: [0, infinity)), given a continuous input variable, feature (possible values: [0, infinity)). Based on previous MCMC work I’ve done with emcee (http://dfm.io/emcee/current/), I have found that the following relationship works reasonably well:
index = feature > threshold
predicted_label[index] = scale * (feature[index] - offset) ** exponent
predicted_label[~index] = 0
With scale ~ 0.43, offset ~ 31, exponent ~ 0.66, and threshold = 40.
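For reference, here is a minimal NumPy sketch of that relationship, using the approximate values above as defaults. The function name predict_label is only something I made up for illustration; it is not part of emcee or PyMC3.

```python
import numpy as np


def predict_label(feature, scale=0.43, offset=31.0, exponent=0.66, threshold=40.0):
    """Piecewise power law: zero at or below the threshold, power law above it.

    Hypothetical helper for illustration; defaults are the approximate
    values quoted in this post.
    """
    feature = np.asarray(feature, dtype=float)
    # Start with zeros so everything at or below the threshold stays at 0.
    predicted_label = np.zeros_like(feature)
    index = feature > threshold
    # Only evaluate the power law where feature > threshold. With the
    # default threshold > offset, the base (feature - offset) is positive here.
    predicted_label[index] = scale * (feature[index] - offset) ** exponent
    return predicted_label


example = predict_label([10.0, 40.0, 47.0])
```

With these defaults, the first two entries of example are zero (10 and 40 are not above the threshold) and only the last one follows the power law.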
I am now trying to apply the same model using PyMC3, but I am finding that the sampling process gets stuck for long periods of time. Here is sample data and my model:
sample_data.csv (769.1 KB)
import numpy as np
import pandas as pd
import pymc3 as pm
sample_data = pd.read_csv('sample_data.csv')
feature = sample_data['feature'].values
label = sample_data['label'].values
with pm.Model() as model:
    threshold = pm.Uniform('threshold', lower=5, upper=50)
    scaling = pm.HalfNormal('scaling', sd=0.3)
    exponent = pm.Normal('exponent', mu=0.7, sd=0.15)
    offset = pm.Uniform('offset', lower=5, upper=50)
    model_1 = scaling * (feature - offset) ** exponent
    model_2 = np.zeros(len(sample_data))
    condition = (feature < threshold) | (feature < offset)
    model_ = pm.math.switch(condition, model_2, model_1)
    obs = pm.Normal('obs', mu=model_, sd=9, observed=label)
with model:
    trace = pm.sample(500, n_init=50000)
I’m currently sitting on sample 37/500. Going from 36/500 to 37/500 took several minutes. Any ideas on what I’m doing wrong would be greatly appreciated!
Thanks,
Shane Bussmann