I am working on a problem to predict a continuous output variable,
label (possible values: [0, infinity)), given a continuous input variable,
feature (possible values: [0, infinity)). Based on previous MCMC work I’ve done with emcee (http://dfm.io/emcee/current/), I have found that the following relationship works reasonably well:
    index = feature > threshold
    predicted_label[index] = scale * (feature[index] - offset) ** exponent
    predicted_label[~index] = 0
scale ~ 0.43,
offset ~ 31,
exponent ~ 0.66, and
threshold = 40.
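For concreteness, here is a minimal NumPy sketch of that relationship as a standalone function. The function name `predict_label` and the example inputs are made up for illustration. The default parameter values are the approximate ones listed above, not exact fit results.

```python
import numpy as np


def predict_label(feature, scale=0.43, offset=31.0, exponent=0.66, threshold=40.0):
    """Piecewise power law: 0 at or below threshold, power law above it.

    Defaults are the approximate values quoted in this post (assumed, rounded).
    """
    feature = np.asarray(feature, dtype=float)
    predicted_label = np.zeros_like(feature)
    # Only evaluate the power law where feature exceeds the threshold, so the
    # base (feature - offset) stays positive as long as threshold >= offset.
    index = feature > threshold
    predicted_label[index] = scale * (feature[index] - offset) ** exponent
    return predicted_label


# Made-up inputs, just to exercise both branches.
example_features = np.array([10.0, 40.0, 41.0, 100.0])
example_predictions = predict_label(example_features)
```

Because the power law is only evaluated on the masked entries, a negative base is never raised to a fractional exponent here, provided the threshold is not below the offset.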
I am now trying to apply the same model using PyMC3, but I am finding that the sampling process gets stuck for long periods of time. Here is sample data and my model:
sample_data.csv (769.1 KB)
    import numpy as np
    import pandas as pd
    import pymc3 as pm

    sample_data = pd.read_csv('sample_data.csv')
    feature = sample_data['feature'].values
    label = sample_data['label'].values

    with pm.Model() as model:
        # Priors
        threshold = pm.Uniform('threshold', lower=5, upper=50)
        scaling = pm.HalfNormal('scaling', sd=0.3)
        exponent = pm.Normal('exponent', mu=0.7, sd=0.15)
        offset = pm.Uniform('offset', lower=5, upper=50)

        # Power-law branch (model_1) and zero branch (model_2)
        model_1 = scaling * (feature - offset) ** exponent
        model_2 = np.zeros(len(sample_data))
        condition = (feature < threshold) | (feature < offset)
        model_ = pm.math.switch(condition, model_2, model_1)

        # Likelihood
        obs = pm.Normal('obs', mu=model_, sd=9, observed=label)

    with model:
        trace = pm.sample(500, n_init=50000)
I’m currently sitting on sample 37/500. Going from 36/500 to 37/500 took several minutes. Any ideas on what I’m doing wrong would be greatly appreciated!