I think we can celebrate now.
So I initially thought that the likelihood you wrote down in an earlier post was wrong and that your code looked too simple, but I did the math and some terms actually cancel out. That being said, you had one small error. When you calculate the likelihood contribution for the data where the maximum didn’t change, you need to use x_dist.logcdf(o_n_minus_1) rather than x_dist.logp(o_n). Can you see why? (When the max doesn’t move, all you actually observe is that the new draw fell at or below the previous max.) Here’s what I get when I make this change.
Ok, so a little less biased perhaps. But it still seems like something weird is going on.
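Just to spell out the factorization that the corrected code implements (the $f_X$, $F_X$ notation for the Normal density and CDF is mine), the contribution of observation $o_n$ is

$$
\log L_n =
\begin{cases}
\log f_X(o_n) & \text{if } o_n > o_{n-1} \;\;\text{(the draw set a new max, so } x_n = o_n\text{)}\\[2pt]
\log F_X(o_{n-1}) & \text{if } o_n = o_{n-1} \;\;\text{(all we learn is } x_n \le o_{n-1}\text{)}
\end{cases}
\qquad X \sim \mathcal{N}(\mu, \sigma^2).
$$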
One thing you may have noticed is that if you increase the number of observations in your model, it takes a very long time to compile the model (yet sampling is very fast). This is because you are building the likelihood with a Python for loop. I mentioned earlier that you should use theano.scan, but actually we can just vectorize the whole likelihood calculation like this:
import math

import numpy as np
import pymc3 as pm

# Sample parameters
N = 1000
mu = 10.
sigma = 1.0
# Sample generation
ts = np.arange(N)
obss = []
obs = -math.inf
for t in ts:
    aux = mu + sigma*np.random.randn()
    obs = np.max([obs, aux])
    obss.append(obs)
obss = np.array(obss)
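# (Aside, not in the original: since we're vectorizing anyway, the running max can also be
#  generated without the Python loop via np.maximum.accumulate. Equivalent alternative,
#  left commented out so the loop above stays the source of the data:)
# obss = np.maximum.accumulate(mu + sigma*np.random.randn(N))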
### First, filter for the data where there were jumps in the running max
jump_idxs = np.where(np.diff(obss) > 0)[0] + 1 # This gives you the indices of the array where the max changed (except for the first element)
jump_idxs = np.insert(jump_idxs, 0, 0) # Include the first element as a jump
jump_idxs_right_before = jump_idxs[1:] - 1 # This gives you the indices right before the change
jump_data = obss[jump_idxs] # Slice out the data for the actual jumps. Serves as our o_n for the jumps
jump_data_right_before = obss[jump_idxs_right_before] # Same but for the right before the jumps (we'll use these as o_n_minus_1)
jump_data_right_before = np.insert(jump_data_right_before, 0, -np.inf) # Correct for first element
### Second, filter for the data where the running max didn't change
flat_idxs = np.setdiff1d(np.arange(len(obss)), jump_idxs)
flat_idxs_right_before = flat_idxs - 1
flat_data = obss[flat_idxs] # Serves as our o_n for the data with no_jumps
flat_data_right_before = obss[flat_idxs_right_before]
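# Tiny worked example of the index bookkeeping above (made-up values, for illustration only):
# obss = [1.0, 1.0, 2.5, 2.5, 3.0]
#   -> jump_idxs = [0, 2, 4], jump_data = [1.0, 2.5, 3.0], jump_data_right_before = [-inf, 1.0, 2.5]
#   -> flat_idxs = [1, 3],    flat_data = [1.0, 2.5],      flat_data_right_before = [1.0, 2.5]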
# PyMC3 model
def logp(jump_data, flat_data_right_before, mu, sigma):
    x_dist = pm.Normal.dist(mu=mu, sigma=sigma)
    log_likelihood = pm.math.constant(0.)
    # Add likelihood contribution from the jump data
    log_likelihood += pm.math.sum(x_dist.logp(jump_data))
    # Add likelihood contribution from the flat data
    log_likelihood += pm.math.sum(x_dist.logcdf(flat_data_right_before))
    return log_likelihood

with pm.Model() as model:
    mu = pm.Uniform('mu', 0., 20.)
    sigma = pm.Uniform('sigma', 0.1, 10.0)
    Ys = pm.DensityDist(
        'Ys', logp,
        observed={'jump_data': jump_data,
                  'flat_data_right_before': flat_data_right_before,
                  'mu': mu, 'sigma': sigma},
    )
    trace = pm.sample(20000, chains=4, return_inferencedata=True,
                      idata_kwargs={"density_dist_obs": False})
Basically, I just used the factorization of the likelihood to vectorize the contributions from the data where the max changed and the data where it didn’t. You’ll find that this code runs significantly faster, especially when you increase the number of observations N. I’m not sure if there was something wrong with the code from earlier, but with this new code the posterior on sigma actually looks a lot different (and better). This is with N = 100:
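If you want to convince yourself that the vectorized split really is the same likelihood, here’s a small sanity check I’d run outside the model (just a sketch using scipy.stats; it reuses the obss, jump_data, and flat_data_right_before arrays from the code above):

from scipy import stats

def loop_loglik(obss, mu, sigma):
    # Naive per-observation loop over the same two-case factorization
    ll = 0.0
    prev = -np.inf
    for o in obss:
        if o > prev:
            # The max jumped, so the underlying draw equals o
            ll += stats.norm.logpdf(o, loc=mu, scale=sigma)
        else:
            # The max stayed, so all we know is the draw was <= the previous max
            ll += stats.norm.logcdf(prev, loc=mu, scale=sigma)
        prev = o
    return ll

def vectorized_loglik(jump_data, flat_data_right_before, mu, sigma):
    # Same factorization, but as two vectorized sums (what the PyMC3 logp does)
    return (stats.norm.logpdf(jump_data, loc=mu, scale=sigma).sum()
            + stats.norm.logcdf(flat_data_right_before, loc=mu, scale=sigma).sum())

print(np.isclose(loop_loglik(obss, 10., 1.0),
                 vectorized_loglik(jump_data, flat_data_right_before, 10., 1.0)))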
I’ve found that some runs give chains with a huge number of divergences, while others give chains with none at all. I’m not sure what’s causing this, but I bet it has something to do with the discrete/continuous nature of this problem. Perhaps we violate an assumption behind NUTS, and something like Metropolis would be a better choice here (a sketch of how to switch samplers is below). I’ll have to think more about this.
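In case you want to experiment with the sampler, this is roughly what I’d try (same model context as above; the number of draws and the target_accept value are just guesses, not tuned):

with model:
    # Option 1: fall back to Metropolis instead of NUTS
    trace_mh = pm.sample(20000, chains=4, step=pm.Metropolis(),
                         return_inferencedata=True,
                         idata_kwargs={"density_dist_obs": False})

    # Option 2: keep NUTS but make it take smaller, more careful steps
    trace_nuts = pm.sample(20000, chains=4, target_accept=0.95,
                           return_inferencedata=True,
                           idata_kwargs={"density_dist_obs": False})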

