I am trying to model the dwell times of buses. For that I have 5 parameters and 2 covariates.
The parameters are beta_0, beta_1_ON, beta_1_OFF, beta_2_ON and beta_2_OFF.
The covariates are ON_shared and OFF_shared, which are shared Theano variables.
The log-likelihood is defined by the logistic distribution:
self.alpha = self.beta_1_ON*self.ON_shared + self.beta_1_OFF*self.OFF_shared + self.beta_0
self.beta = self.beta_2_ON*self.ON_shared + self.beta_2_OFF*self.OFF_shared
self.mu = pm.math.log(self.alpha)  # location of the logistic on the log scale
self.s = tt.power(self.beta, -1)   # scale is the reciprocal of beta
log_DWELLS_obs = pm.Logistic('log_DWELLS_obs_{0}'.format(index), mu=self.mu, s=self.s, observed=pm.math.log(dwell_time))
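Putting the pieces above together, a simplified, self-contained version of the model looks roughly like this (the data arrays and the unsuffixed variable names are placeholders for illustration, not my actual pipeline):

import numpy as np
import pymc3 as pm
import theano
import theano.tensor as tt

# Placeholder data -- in the real pipeline these come from the bus database.
dwell_time = np.array([30.0, 45.0, 20.0])
ON_shared = theano.shared(np.array([2.0, 5.0, 1.0]))
OFF_shared = theano.shared(np.array([1.0, 3.0, 2.0]))

with pm.Model() as model:
    beta_0 = pm.Normal('beta_0', mu=1, sd=0.5)
    beta_1_ON = pm.Normal('beta_1_ON', mu=3, sd=1)
    beta_1_OFF = pm.Normal('beta_1_OFF', mu=3, sd=1)
    beta_2_ON = pm.Normal('beta_2_ON', mu=1, sd=0.5)
    beta_2_OFF = pm.Normal('beta_2_OFF', mu=1, sd=0.5)

    alpha = beta_1_ON * ON_shared + beta_1_OFF * OFF_shared + beta_0
    beta = beta_2_ON * ON_shared + beta_2_OFF * OFF_shared

    mu = pm.math.log(alpha)   # location on the log scale
    s = tt.power(beta, -1)    # scale is the reciprocal of beta

    log_DWELLS_obs = pm.Logistic('log_DWELLS_obs', mu=mu, s=s,
                                 observed=pm.math.log(dwell_time))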
The initial priors are as follows:
self.beta_1_ON = pm.Normal('beta_1_ON_{0}'.format(index), mu=3, sd=1)
self.beta_1_OFF = pm.Normal('beta_1_OFF_{0}'.format(index),mu=3, sd=1)
self.beta_2_ON = pm.Normal('beta_2_ON_{0}'.format(index), mu=1, sd=0.5)
self.beta_2_OFF = pm.Normal('beta_2_OFF_{0}'.format(index), mu=1, sd=0.5)
self.beta_0 = pm.Normal('beta_0_{0}'.format(index), mu=1, sd=0.5)
self.use_prior = False
For subsequent iterations I discard the burn-in samples and then use a from_posterior function to generate a distribution from these samples, like so:
self.beta_1_ON = from_posterior('beta_1_ON_{0}'.format(index), self.beta_1_ON_samples[index][self.beta_1_ON_samples[index] > 0].values)
self.beta_1_OFF = from_posterior('beta_1_OFF_{0}'.format(index), self.beta_1_OFF_samples[index][self.beta_1_OFF_samples[index] > 0].values)
self.beta_2_ON = from_posterior('beta_2_ON_{0}'.format(index), self.beta_2_ON_samples[index][self.beta_2_ON_samples[index] > 0].values)
self.beta_2_OFF = from_posterior('beta_2_OFF_{0}'.format(index), self.beta_2_OFF_samples[index][self.beta_2_OFF_samples[index] > 0].values)
self.beta_0 = from_posterior('beta_0_{0}'.format(index), self.beta_0_samples[index][self.beta_0_samples[index] > 0].values)
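For reference, the post-burn-in samples used above might be collected along these lines; the burn-in length, draw count and the direct reuse of the model sketched earlier are illustrative, not my exact pipeline:

# Illustrative only: keep the draws after the burn-in period as the samples
# used to build the next iteration's priors (names without the index suffix).
burn_in = 500
with model:
    trace = pm.sample(2000, step=pm.Metropolis())

beta_1_ON_samples = trace['beta_1_ON'][burn_in:]
beta_1_OFF_samples = trace['beta_1_OFF'][burn_in:]
# ... and similarly for beta_2_ON, beta_2_OFF and beta_0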
The index here is just the current data point in our database of bus dwell times. The from_posterior function is defined as follows:
import numpy as np
import pymc3 as pm
import theano.tensor as tt
from scipy import stats
from theano.compile.ops import as_op

def from_posterior(param, samples):
    # Create a posterior distribution for variable param using the given samples.
    # Returns the class From_posterior. Calling its method logp returns the log
    # probability of observing a particular value.
    class From_posterior(pm.Continuous):
        def __init__(self, *args, **kwargs):
            self.from_posterior_logp = _from_posterior_logp()
            super(From_posterior, self).__init__(*args, **kwargs)

        def logp(self, value):
            return self.from_posterior_logp(value)

    class From_posterior_logp:
        def __init__(self, samples):
            smin, smax = np.min(samples), np.max(samples)
            # Use 100 points from the smallest to the largest given sample.
            self.x = np.linspace(smin, smax, 100)
            # Evaluate the Gaussian kernel density estimate (KDE) of the samples
            # over the points in self.x.
            self.y = stats.gaussian_kde(samples)(self.x)

        def from_posterior_logp(self, value):
            # Return the log probability of observing value.
            x, y = self.x, self.y
            # y0 = np.min(y) / 100  # what was never sampled should have a small probability but not 0
            # Return zero probability for a value smaller than the min sample or
            # larger than the max sample observed.
            y0 = 0  # This works, not sure why
            return np.array(np.log(np.interp(value, x, y, left=y0, right=y0)))

    from_posterior_logp = From_posterior_logp(samples)

    def _from_posterior_logp():
        @as_op(itypes=[tt.dscalar], otypes=[tt.dscalar])
        def logp(value):
            return from_posterior_logp.from_posterior_logp(value)
        return logp

    return From_posterior(param, testval=np.median(samples))
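As a quick standalone sanity check (purely hypothetical, with made-up samples and a made-up variable name), the function can be exercised like this; note that, as with any PyMC3 distribution, the returned From_posterior has to be created inside a model context:

# Hypothetical check, not part of the actual pipeline.
fake_samples = np.random.normal(loc=3.0, scale=0.5, size=2000)
with pm.Model():
    beta_check = from_posterior('beta_1_ON_check', fake_samples)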
I then make use of the Metropolis-Hastings sampler, which takes about 10 seconds per iteration. To compare other samplers with this one, I decided to try NUTS and HamiltonianMC as well. Since these are gradient-based methods, I defined a new from_posterior function which makes use of the Interpolated class.
def from_posterior(param, samples):
    smin, smax = np.min(samples), np.max(samples)
    width = smax - smin
    x = np.linspace(smin, smax, 100)
    y = stats.gaussian_kde(samples)(x)
    # x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    x = np.concatenate([[x[0]], x + 1e-10, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    return pm.Interpolated(param, x, y)
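For the timing comparison, the samplers are invoked roughly like this; the draw count and the reuse of the sketched model are placeholders:

# Illustrative comparison of the three samplers on the same model.
with model:
    trace_mh = pm.sample(2000, step=pm.Metropolis())
    trace_nuts = pm.sample(2000, step=pm.NUTS())
    trace_hmc = pm.sample(2000, step=pm.HamiltonianMC())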
The problem I am facing is that both NUTS and HMC are significantly slower than Metropolis-Hastings: they take about 120 seconds and 40 seconds, respectively. Since these methods are gradient-based, I was under the impression that they would be more efficient and faster than Metropolis-Hastings. I suspect my approach might be wrong, and since I am new to this library, I was hoping someone could tell me whether I am using it incorrectly.