Model segmental linear regression

I simulated data for a segmented linear regression (three segments separated by two breakpoints) and built a PyMC model to find the breakpoints. However, it failed. Can anyone help me correct my model?

import numpy as np
import pymc as pm
import arviz as az

t=np.arange(0,300)
y=np.random.randn(300)
y[0:100]=y[0:100]+10  # breakpoint one is 100
y[200:]=y[200:]+20    # breakpoint two is 200

mod=pm.Model()
with mod:
    bp1=pm.DiscreteUniform('bp1',1,150)  # breakpoint 1
    bp2=pm.DiscreteUniform('bp2',150,300) # breakpoint 2
    mu1=pm.Normal('mu1',10,1)
    mu2=pm.Normal('mu2',0,1)
    mu3=pm.Normal('mu3',20,1)
    # mus=pm.Normal('mu',mu=[15,0,25],sigma=5)
    sigma=pm.HalfNormal('sigma',1)
    mu=pm.math.switch(y<bp1,mu1,pm.math.switch(y<bp2,mu2,mu3))
    ypred=pm.Normal('ypred',mu=mu,sigma=sigma,observed=y)
with mod:
    ans=pm.sample(300000,cores=25)
az.summary(ans)

Output:

          mean      sd   hdi_3%  hdi_97%  mcse_mean  mcse_sd   ess_bulk   ess_tail  r_hat
bp1     17.942  22.284    1.000   48.000      1.832    1.298      184.0      175.0   1.08
bp2    224.997  43.587  150.000  291.000      0.018    0.013  5714501.0  6331070.0   1.00
mu1      9.624   1.156    8.985   10.644      0.219    0.157       30.0       28.0   2.53
mu2     10.862   3.762   -3.885   12.594      0.499    0.466       37.0       30.0   1.89
mu3     20.721   2.988   16.398   25.967      0.533    0.384       33.0       28.0   2.20
sigma    7.323   0.827    7.002    8.350      0.157    0.112       30.0       29.0   2.60

You might want to try using a MarginalModel (from pymc-experimental), which will allow you to sample models like this using NUTS. This tutorial is a nice introduction to the feature.

There also appears to be an error in your model. The breaks in the mean should be triggered by the value of t, not y.

Finally, your sampler settings are absolutely nutter butters. Asking for 7.5 million draws is insanely overkill. I would recommend no more than 500-1000 draws per chain and 6-8 chains.
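For concreteness, here's a rough sketch of how that could look, reusing t and y from above. The MarginalModel import path and the marginalize/recover_marginals names are taken from pymc-experimental as it existed around this time and may have changed, so treat them as assumptions and check the tutorial:

import pymc as pm
import arviz as az
from pymc_experimental import MarginalModel  # assumed import path

with MarginalModel() as marg_mod:
    bp1 = pm.DiscreteUniform("bp1", lower=1, upper=150)
    bp2 = pm.DiscreteUniform("bp2", lower=150, upper=300)
    mu1 = pm.Normal("mu1", 10, 1)
    mu2 = pm.Normal("mu2", 0, 1)
    mu3 = pm.Normal("mu3", 20, 1)
    sigma = pm.HalfNormal("sigma", 1)
    # switch on the time index t, not on the observations y
    mu = pm.math.switch(t < bp1, mu1, pm.math.switch(t < bp2, mu2, mu3))
    pm.Normal("ypred", mu=mu, sigma=sigma, observed=y)

# sum the discrete breakpoints out of the likelihood so NUTS can sample the rest
marg_mod.marginalize([bp1, bp2])

with marg_mod:
    ans = pm.sample(1000, chains=6)

az.summary(ans)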

I'd recommend looking into segmented regression, aka piecewise regression.

I've got a notebook on this (in the Julia language) here, though coming back to this years later the images aren't rendering properly. The code might be enough to get the idea though.

EDIT: Looking at your simulated data, the piecewise constant approach might be better than the piecewise linear one.

Here's the code for piecewise linear - but your data needs piecewise constant. See if you can give that a go.

import pymc as pm
import numpy as np
import arviz as az
import matplotlib.pyplot as plt

# Simulate data
t = np.arange(0, 300)
y = np.random.randn(300)
y[0:100] += 10  # breakpoint one at 100
y[200:] += 20  # breakpoint two at 200

# Compute the piecewise linear function based on knots
def piecewise_linear(x, beta_0, beta_1, beta_knots, knots):
    # Start with the intercept and slope for the initial region
    y = beta_0 + beta_1 * x
    # Add terms for each knot
    for i, knot in enumerate(knots):
        y += beta_knots[i] * T(x, knot)
    return y

# Hinge basis T(x, knot): distance of x beyond a given knot (0 before the knot)
def T(x, knot):
    return np.maximum(0, x - knot)
        
# Define the PyMC model
with pm.Model() as model:
    # Define breakpoints (knots)
    knots = [100, 200]
    
    # Define parameters
    beta_0 = pm.Normal("beta_0", mu=10, sigma=5)  # intercept
    beta_1 = pm.Normal("beta_1", mu=0, sigma=1)   # slope for the first segment
    
    # Coefficients for regions beyond the initial one
    beta_knots = pm.Normal("beta_knots", mu=0, sigma=1, shape=len(knots))

    # Expected value of y based on the piecewise function
    mu = piecewise_linear(t, beta_0, beta_1, beta_knots, knots)

    # Likelihood (sampling distribution) of observations
    sigma = pm.HalfNormal("sigma", sigma=1)
    y_obs = pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)

    # Inference
    idata = pm.sample()
    idata.extend(pm.sample_posterior_predictive(idata))

# Plotting
plt.figure(figsize=(10, 6))
plt.plot(t, y, "o", markersize=3, label="Observed Data")

# Plot posterior predictive mean
posterior_predictive_mean = idata.posterior_predictive["y_obs"].mean(dim=["chain", "draw"])
plt.plot(t, posterior_predictive_mean, label="Posterior Predictive Mean", color="orange")

# Plot HDI using arviz
az.plot_hdi(t, idata.posterior_predictive["y_obs"], hdi_prob=0.94, color="lightblue", smooth=False)
plt.xlabel("Time")
plt.ylabel("y")
plt.legend()
plt.title("Piecewise Linear Regression with Posterior Predictive and 94% HDI")
plt.show()

actually, here's an attempt…

# Indicator for membership in a region between two knots (distinct from the
# hinge-style T(x, knot) used for the piecewise linear model above)
def region_indicator(x, left, right):
    left = -np.inf if left is None else left
    right = np.inf if right is None else right
    return ((x >= left) & (x < right)).astype(float)

def piecewise_constant(x, beta_region, knots):
    # Left-most region: constant level before the first knot
    y = beta_region[0] * region_indicator(x, None, knots[0])

    # Constant levels for each middle region between consecutive knots
    for i in range(len(knots) - 1):
        y += beta_region[i + 1] * region_indicator(x, knots[i], knots[i + 1])

    # Right-most region: constant level after the last knot
    y += beta_region[-1] * region_indicator(x, knots[-1], None)
    return y
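To connect it back to the model above, the usage would look something like this (a rough sketch with fixed knots at 100 and 200; the priors are placeholders, not tuned):

with pm.Model() as const_model:
    knots = [100, 200]
    # one constant level per region (three regions for two knots)
    beta_region = pm.Normal("beta_region", mu=0, sigma=10, shape=len(knots) + 1)
    sigma = pm.HalfNormal("sigma", sigma=1)
    mu = piecewise_constant(t, beta_region, knots)
    y_obs = pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)
    idata_const = pm.sample()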

No attempt has been made to check the priors or make them robust to other datasets. But this should help hopefully?

Sorry, it's been a long day. You'll want to put priors over the break points, and you'll probably need to enforce ordering. I can take a look in the morning if it's not clear how to do that.
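If it helps in the meantime, one low-tech way to do both (my own sketch, with illustrative priors): treat the knots as continuous, enforce ordering by construction by defining the second knot as the first plus a positive increment, and build the mean with pm.math.maximum instead of np.maximum so the hinge terms stay symbolic:

with pm.Model() as model_free_knots:
    # Ordered breakpoints: knot2 = knot1 + positive delta
    knot1 = pm.Normal("knot1", mu=100.0, sigma=25.0)
    delta = pm.HalfNormal("delta", sigma=50.0)
    knot2 = pm.Deterministic("knot2", knot1 + delta)

    beta_0 = pm.Normal("beta_0", mu=10, sigma=5)      # intercept
    beta_1 = pm.Normal("beta_1", mu=0, sigma=1)       # initial slope
    beta_knots = pm.Normal("beta_knots", mu=0, sigma=1, shape=2)

    # hinge terms with symbolic knots
    mu = beta_0 + beta_1 * t
    mu = mu + beta_knots[0] * pm.math.maximum(0.0, t - knot1)
    mu = mu + beta_knots[1] * pm.math.maximum(0.0, t - knot2)

    sigma = pm.HalfNormal("sigma", sigma=1)
    pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)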

Thanks! There is a typo in "mu=pm.math.switch(y<bp1,mu1,pm.math.switch(y<bp2,mu2,mu3))": "y" should be "t". I didn't notice that. After changing it, it works perfectly! I'm now much more confident in PyMC.

This one is more useful, as most piecewise regressions need the segments to join continuously at the breakpoints! I tested it, and it works!

Here is a more intuitive model:

t = np.arange(0, 300)
y = np.random.randn(300)
y[0:100] += 10 # breakpoint one at 100
y[200:] += 20 # breakpoint two at 200

mod=pm.Model()

with mod:
    bk1=pm.DiscreteUniform('bk1',lower=1,upper=200)
    bk2=pm.DiscreteUniform('bk2',lower=bk1,upper=300)
    slopes=pm.Normal('slopes',mu=0,sigma=3,shape=3)
    betas=pm.Normal('betas',mu=0,sigma=3,shape=3)
    res=pm.HalfNormal('res',3)

    t1=t[0:bk1]
    t2=t[bk1+1:bk2]
    t3=t[bk2+2:299]

    y1=slopes[0]+betas[0]*t1
    y2=slopes[1]+betas[1]*t2
    y3=slopes[2]+betas[2]*t3
    mu=np.hstack([y1,y2,y3])

    y_obs=pm.Normal('yobs',mu=mu,sigma=res,observed=y)

It fails with: TypeError: slice indices must be integers or None or have an __index__ method (at t1=t[0:bk1])

I couldn't figure it out.
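For what it's worth, here is a sketch of the usual workaround (not from the thread): the error happens because bk1 and bk2 are random variables, so they can't be used as Python slice indices. The standard trick is to keep t whole and build mu symbolically with nested pm.math.switch calls:

with pm.Model() as mod2:
    bk1 = pm.DiscreteUniform("bk1", lower=1, upper=200)
    bk2 = pm.DiscreteUniform("bk2", lower=bk1, upper=300)
    slopes = pm.Normal("slopes", mu=0, sigma=3, shape=3)
    betas = pm.Normal("betas", mu=0, sigma=3, shape=3)
    res = pm.HalfNormal("res", 3)

    # segment 1 where t < bk1, segment 2 where bk1 <= t < bk2, segment 3 otherwise
    mu = pm.math.switch(
        t < bk1,
        slopes[0] + betas[0] * t,
        pm.math.switch(t < bk2,
                       slopes[1] + betas[1] * t,
                       slopes[2] + betas[2] * t),
    )
    y_obs = pm.Normal("yobs", mu=mu, sigma=res, observed=y)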

That's really cool. How does it work? Are you able to do this tightly or do you marginalize over the entire density? I was considering doing the latter in Stan since we can't pull out a Markov blanket, but I thought it'd be too slow. I know Matt Hoffman and Maria Gorinova worked on this problem, too, where they looked for cases in the graphical model where they could unfold analytically.

I have a tutorial in the Stan User's Guide on how to manually marginalize a change-point model like this, although I only go over the single change point case and I don't think the Stan code can be easily ported to PyMC. The issue with marginalizing is that if there are 300 time points, marginalizing out two cut points is going to cost roughly 300^2 / 2 (about 45K) entire density evaluations, one for each pair of cut point locations. At some point, you'll probably have to resort to discrete sampling with constraints so the change points are ordered properly.

We enumerate over the whole domain to compute the logp of the conditionally dependent variables, not of all model variables. If there's an overlap in the sets of conditionally dependent variables of two marginalized variables, then yes, you get a quadratic number of evaluations.

On the plus side it's all vectorized when possible and you can throw it at a GPU. So that can help somewhat.
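To make the enumeration idea concrete, here's a toy, pure-NumPy illustration for a single change point (my own sketch, not PyMC's actual implementation): the marginal log-likelihood is a logsumexp over the variable's support.

import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def marginal_loglik_one_changepoint(y, t, mu1, mu2, sigma, support):
    # log p(y) = log sum_k p(bp = k) p(y | bp = k), with a uniform prior over `support`
    per_k = []
    for k in support:
        mu = np.where(t < k, mu1, mu2)
        per_k.append(norm.logpdf(y, loc=mu, scale=sigma).sum())
    return logsumexp(np.array(per_k) - np.log(len(support)))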

We're also adding QMC marginalization, which lets the user control the number of evaluation points, although that was designed with marginalization of continuous variables in mind and tends to use an ungodly number of evals as well. I'm not sure whether it can be used for discrete variables, or whether it allows one to grow more linearly with the number of marginalized variables: Marginalize continuous variable via QMC integration by ricardoV94 · Pull Request #353 · pymc-devs/pymc-experimental · GitHub

And we're planning to add Laplace integration as well (not sure about the exact term, @theo will know better).

Then we have special cases like the HMM, where we dispatch to the well-known algorithm.

By the way, this started as a special model subclass, but we're refactoring it into a model transformation you can apply to vanilla models, like other transformations (do, observe). So users won't have to use a different object in their workflow.


What do you mean by this?

My bad, that was really vague. By "tightly", I meant in computation. If you have a joint density p(\theta, \alpha) where \alpha is a discrete parameter, then you can always sample \alpha using the full density p(\alpha \mid \theta) \propto p(\theta, \alpha). This would be easy for us to add to Stan and would give us full generalized Gibbs. But there's almost always a much, much more computationally efficient form of the conditional p(\alpha \mid \theta). For example, in a graphical model, you can compute the (minimal) Markov blanket; that's what BUGS/JAGS do for generalized Gibbs.
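As a toy illustration of that Gibbs step for a discrete parameter with small support (my own sketch, not Stan or BUGS code): enumerate the joint over the support, normalize, and draw.

import numpy as np

def gibbs_update_discrete(alpha_support, log_joint, theta, rng):
    # Draw alpha from p(alpha | theta), proportional to p(theta, alpha),
    # by enumerating alpha's support
    logp = np.array([log_joint(theta, a) for a in alpha_support])
    p = np.exp(logp - logp.max())   # stabilize before normalizing
    return rng.choice(alpha_support, p=p / p.sum())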

Thanks, I didn't know the term. I think that's exactly what we are doing. Our transformation creates a subgraph of the RVs that are conditionally dependent on the marginalized variable, with the inputs being the set of RVs that feed into the marginalized variable plus the dependent variables.

The logp is then computed for this subgraph alone via enumeration/QMC.

It's trivial for us to find this partition (Markov blankets?) from the graph representation of PyMC models.

Yep!

@bob-carpenter We also have nice quality-of-life features that let you recover the posterior log-probabilities of the marginalised variables, along with sampling from them as a post-processing step. This lets users replicate the analysis in the Stan User's Guide without needing to do all the equation crunching.
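In code, that post-processing step looks roughly like the snippet below, assuming the helper is still called recover_marginals as in pymc-experimental around this time (names may have changed since), and reusing marg_mod and ans from the MarginalModel sketch earlier in the thread:

# recover draws and log-probabilities for the marginalized breakpoints after sampling
marg_mod.recover_marginals(ans)   # assumed helper name; adds bp1/bp2 back into `ans`
az.summary(ans, var_names=["bp1", "bp2"])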