I simulated a segmented linear regression (three segments separated by two breakpoints) and built a PyMC model to find the breakpoints. However, it failed. Can anyone help me correct my model?
import numpy as np
import pymc as pm

# Simulate data with two breakpoints
t = np.arange(0, 300)
y = np.random.randn(300)
y[0:100] = y[0:100] + 10  # breakpoint one is 100
y[200:] = y[200:] + 20    # breakpoint two is 200

with pm.Model() as mod:
    bp1 = pm.DiscreteUniform('bp1', 1, 150)    # breakpoint 1
    bp2 = pm.DiscreteUniform('bp2', 150, 300)  # breakpoint 2
    mu1 = pm.Normal('mu1', 10, 1)
    mu2 = pm.Normal('mu2', 0, 1)
    mu3 = pm.Normal('mu3', 20, 1)
    # mus = pm.Normal('mu', mu=[15, 0, 25], sigma=5)
    sigma = pm.HalfNormal('sigma', 1)
    mu = pm.math.switch(y < bp1, mu1, pm.math.switch(y < bp2, mu2, mu3))
    ypred = pm.Normal('ypred', mu=mu, sigma=sigma, observed=y)
You might want to try using a MarginalModel, which will allow you to sample models like this using NUTS. This tutorial is a nice introduction to this feature.
There also appears to be an error in your model. The breaks in the mean should be triggered by the value of t, not y.
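For concreteness, here is a minimal sketch of that fix applied to your model (switching on t rather than y); the draw and chain counts are placeholders in line with the advice below:

import numpy as np
import pymc as pm

t = np.arange(0, 300)
y = np.random.randn(300)
y[0:100] += 10
y[200:] += 20

with pm.Model() as mod:
    bp1 = pm.DiscreteUniform('bp1', 1, 150)
    bp2 = pm.DiscreteUniform('bp2', 150, 300)
    mu1 = pm.Normal('mu1', 10, 1)
    mu2 = pm.Normal('mu2', 0, 1)
    mu3 = pm.Normal('mu3', 20, 1)
    sigma = pm.HalfNormal('sigma', 1)
    # Switch on the time index t, not on the observations y
    mu = pm.math.switch(t < bp1, mu1, pm.math.switch(t < bp2, mu2, mu3))
    ypred = pm.Normal('ypred', mu=mu, sigma=sigma, observed=y)
    # PyMC will assign a compound step here (NUTS for the continuous parameters,
    # a discrete sampler for bp1/bp2); marginalizing bp1/bp2 with MarginalModel,
    # as suggested above, would let NUTS handle everything.
    idata = pm.sample(1000, chains=6)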
Finally, your sampler settings are absolutely nutter butters. Asking for 7.5 million draws is wildly overkill. I would recommend no more than 500-1000 draws and 6-8 chains.
I've got a notebook on this (in the Julia language) here, though coming back to this years later the images aren't rendering properly. The code might be enough to get the idea though.
EDIT: Looking at your simulated data, the piecewise constant approach might be better than the piecewise linear one.
Here's the code for the piecewise linear case, but your data needs piecewise constant. See if you can give that a go.
import pymc as pm
import numpy as np
import arviz as az
import matplotlib.pyplot as plt
# Simulate data
t = np.arange(0, 300)
y = np.random.randn(300)
y[0:100] += 10 # breakpoint one at 100
y[200:] += 20 # breakpoint two at 200
# Compute the piecewise linear function based on knots
def piecewise_linear(x, beta_0, beta_1, beta_knots, knots):
    # Start with the intercept and slope for the initial region
    y = beta_0 + beta_1 * x
    # Add terms for each knot
    for i, knot in enumerate(knots):
        y += beta_knots[i] * T(x, knot)
    return y

# Helper function T(x, knot): the hinge term max(0, x - knot), zero before the knot
def T(x, knot):
    return np.maximum(0, x - knot)
# Define the PyMC model
with pm.Model() as model:
    # Define breakpoints (knots)
    knots = [100, 200]
    # Define parameters
    beta_0 = pm.Normal("beta_0", mu=10, sigma=5)  # intercept
    beta_1 = pm.Normal("beta_1", mu=0, sigma=1)   # slope for the first segment
    # Coefficients for regions beyond the initial one
    beta_knots = pm.Normal("beta_knots", mu=0, sigma=1, shape=len(knots))
    # Expected value of y based on the piecewise function
    mu = piecewise_linear(t, beta_0, beta_1, beta_knots, knots)
    # Likelihood (sampling distribution) of observations
    sigma = pm.HalfNormal("sigma", sigma=1)
    y_obs = pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)
    # Inference
    idata = pm.sample()
    idata.extend(pm.sample_posterior_predictive(idata))
# Plotting
plt.figure(figsize=(10, 6))
plt.plot(t, y, "o", markersize=3, label="Observed Data")
# Plot posterior predictive mean
posterior_predictive_mean = idata.posterior_predictive["y_obs"].mean(dim=["chain", "draw"])
plt.plot(t, posterior_predictive_mean, label="Posterior Predictive Mean", color="orange")
# Plot HDI using arviz
az.plot_hdi(t, idata.posterior_predictive["y_obs"], hdi_prob=0.94, color="lightblue", smooth=False)
plt.xlabel("Time")
plt.ylabel("y")
plt.legend()
plt.title("Piecewise Linear Regression with Posterior Predictive and 94% HDI")
plt.show()
# Indicator helper: 1 where lower <= x < upper, 0 elsewhere (None means unbounded)
def region(x, lower, upper):
    lower = -np.inf if lower is None else lower
    upper = np.inf if upper is None else upper
    return ((x >= lower) & (x < upper)).astype(float)

def piecewise_constant(x, beta_region, knots):
    # Left-most region: constant level before the first knot
    y = beta_region[0] * region(x, None, knots[0])
    # Constant levels for each middle region between consecutive knots
    for i in range(len(knots) - 1):
        y += beta_region[i + 1] * region(x, knots[i], knots[i + 1])
    # Right-most region: constant level after the last knot
    y += beta_region[-1] * region(x, knots[-1], None)
    return y
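One way this could slot into a model like the one above (the priors and knot locations here are just illustrative placeholders, not tuned):

with pm.Model() as const_model:
    knots = [100, 200]
    # One constant level per region (len(knots) + 1 regions)
    beta_region = pm.Normal("beta_region", mu=0, sigma=10, shape=len(knots) + 1)
    sigma = pm.HalfNormal("sigma", sigma=1)
    mu = piecewise_constant(t, beta_region, knots)
    y_obs = pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)
    idata_const = pm.sample()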
No attempt has been made to check the priors or make them robust to other datasets, but hopefully this helps?
Sorry, it's been a long day. You'll want to put priors over the breakpoints, and you'll probably need to enforce ordering. I can take a look in the morning if it's not clear how to do that.
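For what it's worth, one possible way to do that (a rough sketch, assuming PyMC's ordered transform and treating the breakpoints as continuous rather than discrete):

import pymc as pm
from pymc.distributions.transforms import ordered

with pm.Model() as ordered_bp_model:
    # Two breakpoints constrained to be increasing via the ordered transform;
    # initval breaks the symmetry so sampling starts from a valid ordering
    bps = pm.Normal("bps", mu=[100, 200], sigma=30, shape=2,
                    transform=ordered, initval=[100.0, 200.0])
    mu1 = pm.Normal("mu1", 10, 5)
    mu2 = pm.Normal("mu2", 0, 5)
    mu3 = pm.Normal("mu3", 20, 5)
    sigma = pm.HalfNormal("sigma", 1)
    # Piecewise-constant mean: switch regions at the two breakpoints
    mu = pm.math.switch(t < bps[0], mu1, pm.math.switch(t < bps[1], mu2, mu3))
    pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)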
Thanks! There was a typo in 'mu=pm.math.switch(y<bp1,mu1,pm.math.switch(y<bp2,mu2,mu3))': 'y' should be 't'. I didn't notice that. After changing it, the model works perfectly! I'm now very confident in PyMC.
That's really cool. How does it work? Are you able to do this tightly or do you marginalize over the entire density? I was considering doing the latter in Stan since we can't pull out a Markov blanket, but I thought it'd be too slow. I know Matt Hoffman and Maria Gorinova worked on this problem, too, where they looked for cases in the graphical model where they could unfold analytically.
I have a tutorial in the Stan User's Guide on how to manually marginalize a change-point model like this, although I only go over the single change-point case and I don't think the Stan code can be easily ported to PyMC. The issue with marginalizing is that if there are 300 time points, marginalizing out two cut points is going to cost roughly 300^2 / 2 (about 45K) entire density evaluations, one for each pair of cut-point locations. At some point, you'll probably have to resort to discrete sampling with constraints so the change points are ordered properly.
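To make the cost concrete, here is a rough NumPy sketch of manually marginalizing a single change point s, with fixed segment means and noise scale and a uniform prior over s (purely illustrative):

import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

def log_marginal(y, mu1, mu2, sigma):
    n = len(y)
    # One full likelihood evaluation per candidate change point s = 1..n-1
    lp = np.array([
        norm.logpdf(y[:s], mu1, sigma).sum() + norm.logpdf(y[s:], mu2, sigma).sum()
        for s in range(1, n)
    ])
    # Uniform prior over s, then sum s out on the log scale
    return logsumexp(lp - np.log(n - 1))

With two change points the loop becomes a double loop over ordered pairs, which is where the roughly n^2 / 2 evaluations come from.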
We enumerate over the whole domain to compute the logp of the conditionally dependent variables, not all model variables. If there's an overlap in the set of conditionally dependent variables of two marginalized variables then you get a quadratic number of evaluations, yes.
On the plus side it's all vectorized when possible and you can throw it at a GPU. So that can help somewhat.
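As a toy illustration of that vectorization (computing the same quantity as the sketch above, but evaluating all candidate change points in one broadcasted call rather than a Python loop; illustrative only, not PyMC's actual implementation):

import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

def log_marginal_vectorized(y, mu1, mu2, sigma):
    n = len(y)
    s = np.arange(1, n)[:, None]                 # candidate change points, shape (n-1, 1)
    idx = np.arange(n)[None, :]                  # time index, shape (1, n)
    mu = np.where(idx < s, mu1, mu2)             # per-candidate means, shape (n-1, n)
    lp = norm.logpdf(y, mu, sigma).sum(axis=1)   # log p(y | s) for every s at once
    return logsumexp(lp - np.log(n - 1))         # uniform prior over s, summed out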
And we're planning to add Laplace integration as well (not sure about the exact term, @theo will know better).
Then we have special cases like the HMM, where we dispatch to the well-known algorithm.
By the way, this started as a special model subclass, but we're refactoring it to be a model transformation you can apply to vanilla models, like the other transformations: do, observe. So users won't have to use a different object in their workflow.
My bad, that was really vague. By "tightly", I meant in computation. If you have a joint density $p(\theta, \alpha)$ where $\alpha$ is a discrete parameter, then you can always sample $\alpha$ using the full density $p(\alpha \mid \theta) \propto p(\theta, \alpha)$. This would be easy for us to add to Stan and would give us full generalized Gibbs. But there's almost always a much more computationally efficient form of the conditional $p(\alpha \mid \theta)$. For example, in a graphical model you can compute the (minimal) Markov blanket; that's what BUGS/JAGS do for generalized Gibbs.
Thanks, I didn't know the term. I think that's exactly what we are doing. Our transformation creates a subgraph of the RVs that are conditionally dependent on the marginalized variable, with inputs being the set of RVs that are inputs to the marginalized variable plus the dependent variables.
The logp is then computed for this subgraph alone via enumeration/QMC.
It's trivial for us to find this partition (Markov blankets?) from the graph representation of PyMC models.
@bob-carpenter We also have nice quality-of-life features that let you recover the posterior log-probabilities of the marginalised variables, along with sampling from them as a post-processing step. This lets users replicate the analysis in the Stan User's Guide without needing to do all the equation crunching.