Model segmental linear regression

I simulated data for a segmented linear regression (three segments separated by two breakpoints) and built a PyMC model to find the breakpoints. However, it failed. Can anyone help me correct my model?

import numpy as np
import pymc as pm
import arviz as az

t=np.arange(0,300)
y=np.random.randn(300)
y[0:100]=y[0:100]+10  # breakpoint one is 100
y[200:]=y[200:]+20    # breakpoint two is 200

mod=pm.Model()
with mod:
    bp1=pm.DiscreteUniform('bp1',1,150)  # breakpoint 1
    bp2=pm.DiscreteUniform('bp2',150,300) # breakpoint 2
    mu1=pm.Normal('mu1',10,1)
    mu2=pm.Normal('mu2',0,1)
    mu3=pm.Normal('mu3',20,1)
    # mus=pm.Normal('mu',mu=[15,0,25],sigma=5)
    sigma=pm.HalfNormal('sigma',1)
    mu=pm.math.switch(y<bp1,mu1,pm.math.switch(y<bp2,mu2,mu3))
    ypred=pm.Normal('ypred',mu=mu,sigma=sigma,observed=y)
with mod:
    ans=pm.sample(300000,cores=25)
az.summary(ans)

Output:

          mean      sd   hdi_3%  hdi_97%  mcse_mean  mcse_sd   ess_bulk   ess_tail  r_hat
bp1     17.942  22.284    1.000   48.000      1.832    1.298      184.0      175.0   1.08
bp2    224.997  43.587  150.000  291.000      0.018    0.013  5714501.0  6331070.0   1.00
mu1      9.624   1.156    8.985   10.644      0.219    0.157       30.0       28.0   2.53
mu2     10.862   3.762   -3.885   12.594      0.499    0.466       37.0       30.0   1.89
mu3     20.721   2.988   16.398   25.967      0.533    0.384       33.0       28.0   2.20
sigma    7.323   0.827    7.002    8.350      0.157    0.112       30.0       29.0   2.60

You might want to try using a MarginalModel (from pymc-experimental), which will allow you to sample models like this using NUTS. This tutorial is a nice introduction to the feature.

There also appears to be an error in your model. The breaks in the mean should be triggered by the value of t, not y.

Finally, your sampler settings are absolutely nutter butters. Asking for 7.5 million draws is insanely overkill. I would recommend no more than 500-1000 draws per chain and 6-8 chains.
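For concreteness, here's a rough sketch of how that could look, reusing t and y from above. The MarginalModel import path and the marginalize/recover_marginals names are taken from pymc-experimental as it existed around this time and may have changed, so treat them as assumptions and check the tutorial:

import pymc as pm
import arviz as az
from pymc_experimental import MarginalModel  # assumed import path

with MarginalModel() as marg_mod:
    bp1 = pm.DiscreteUniform("bp1", lower=1, upper=150)
    bp2 = pm.DiscreteUniform("bp2", lower=150, upper=300)
    mu1 = pm.Normal("mu1", 10, 1)
    mu2 = pm.Normal("mu2", 0, 1)
    mu3 = pm.Normal("mu3", 20, 1)
    sigma = pm.HalfNormal("sigma", 1)
    # switch on the time index t, not on the observations y
    mu = pm.math.switch(t < bp1, mu1, pm.math.switch(t < bp2, mu2, mu3))
    pm.Normal("ypred", mu=mu, sigma=sigma, observed=y)

# sum the discrete breakpoints out of the likelihood so NUTS can sample the rest
marg_mod.marginalize([bp1, bp2])

with marg_mod:
    ans = pm.sample(1000, chains=6)

az.summary(ans)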

I'd recommend looking into segmented regression, aka piecewise regression.

I've got a notebook on this (in the Julia language) here, though coming back to this years later the images aren't rendering properly. The code might be enough to get the idea though.

EDIT: Looking at your simulated data, the piecewise constant approach might be better than the piecewise linear one.

Here's the code for piecewise linear - but your data needs piecewise constant. See if you can give that a go.

import pymc as pm
import numpy as np
import arviz as az
import matplotlib.pyplot as plt

# Simulate data
t = np.arange(0, 300)
y = np.random.randn(300)
y[0:100] += 10  # breakpoint one at 100
y[200:] += 20  # breakpoint two at 200

# Compute the piecewise linear function based on knots
def piecewise_linear(x, beta_0, beta_1, beta_knots, knots):
    # Start with the intercept and slope for the initial region
    y = beta_0 + beta_1 * x
    # Add terms for each knot
    for i, knot in enumerate(knots):
        y += beta_knots[i] * T(x, knot)
    return y

# Hinge basis T(x, knot): distance of x beyond a given knot (0 before the knot)
def T(x, knot):
    return np.maximum(0, x - knot)
        
# Define the PyMC model
with pm.Model() as model:
    # Define breakpoints (knots)
    knots = [100, 200]
    
    # Define parameters
    beta_0 = pm.Normal("beta_0", mu=10, sigma=5)  # intercept
    beta_1 = pm.Normal("beta_1", mu=0, sigma=1)   # slope for the first segment
    
    # Coefficients for regions beyond the initial one
    beta_knots = pm.Normal("beta_knots", mu=0, sigma=1, shape=len(knots))

    # Expected value of y based on the piecewise function
    mu = piecewise_linear(t, beta_0, beta_1, beta_knots, knots)

    # Likelihood (sampling distribution) of observations
    sigma = pm.HalfNormal("sigma", sigma=1)
    y_obs = pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)

    # Inference
    idata = pm.sample()
    idata.extend(pm.sample_posterior_predictive(idata))

# Plotting
plt.figure(figsize=(10, 6))
plt.plot(t, y, "o", markersize=3, label="Observed Data")

# Plot posterior predictive mean
posterior_predictive_mean = idata.posterior_predictive["y_obs"].mean(dim=["chain", "draw"])
plt.plot(t, posterior_predictive_mean, label="Posterior Predictive Mean", color="orange")

# Plot HDI using arviz
az.plot_hdi(t, idata.posterior_predictive["y_obs"], hdi_prob=0.94, color="lightblue", smooth=False)
plt.xlabel("Time")
plt.ylabel("y")
plt.legend()
plt.title("Piecewise Linear Regression with Posterior Predictive and 94% HDI")
plt.show()

actually, here's an attempt…

# Indicator for membership in a region between two knots (distinct from the
# hinge-style T(x, knot) used for the piecewise linear model above)
def region_indicator(x, left, right):
    left = -np.inf if left is None else left
    right = np.inf if right is None else right
    return ((x >= left) & (x < right)).astype(float)

def piecewise_constant(x, beta_region, knots):
    # Left-most region: constant level before the first knot
    y = beta_region[0] * region_indicator(x, None, knots[0])

    # Constant levels for each middle region between consecutive knots
    for i in range(len(knots) - 1):
        y += beta_region[i + 1] * region_indicator(x, knots[i], knots[i + 1])

    # Right-most region: constant level after the last knot
    y += beta_region[-1] * region_indicator(x, knots[-1], None)
    return y
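To connect it back to the model above, the usage would look something like this (a rough sketch with fixed knots at 100 and 200; the priors are placeholders, not tuned):

with pm.Model() as const_model:
    knots = [100, 200]
    # one constant level per region (three regions for two knots)
    beta_region = pm.Normal("beta_region", mu=0, sigma=10, shape=len(knots) + 1)
    sigma = pm.HalfNormal("sigma", sigma=1)
    mu = piecewise_constant(t, beta_region, knots)
    y_obs = pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)
    idata_const = pm.sample()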

No attempt has been made to check the priors or make them robust to other datasets. But this should help hopefully?

Sorry, it's been a long day. You'll want to put priors over the break points, and you'll probably need to enforce ordering. I can take a look in the morning if it's not clear how to do that.
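If it helps in the meantime, one low-tech way to do both (my own sketch, with illustrative priors): treat the knots as continuous, enforce ordering by construction by defining the second knot as the first plus a positive increment, and build the mean with pm.math.maximum instead of np.maximum so the hinge terms stay symbolic:

with pm.Model() as model_free_knots:
    # Ordered breakpoints: knot2 = knot1 + positive delta
    knot1 = pm.Normal("knot1", mu=100.0, sigma=25.0)
    delta = pm.HalfNormal("delta", sigma=50.0)
    knot2 = pm.Deterministic("knot2", knot1 + delta)

    beta_0 = pm.Normal("beta_0", mu=10, sigma=5)      # intercept
    beta_1 = pm.Normal("beta_1", mu=0, sigma=1)       # initial slope
    beta_knots = pm.Normal("beta_knots", mu=0, sigma=1, shape=2)

    # hinge terms with symbolic knots
    mu = beta_0 + beta_1 * t
    mu = mu + beta_knots[0] * pm.math.maximum(0.0, t - knot1)
    mu = mu + beta_knots[1] * pm.math.maximum(0.0, t - knot2)

    sigma = pm.HalfNormal("sigma", sigma=1)
    pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)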

Thanks! There is a typo in "mu=pm.math.switch(y<bp1,mu1,pm.math.switch(y<bp2,mu2,mu3))": "y" should be "t". I didn't notice that. After changing it, it works perfectly! I'm now much more confident in PyMC.

This one is more useful, as most piecewise regressions need the segments to join continuously at the breakpoints! I tested it, and it works!

Here is a more intuitive model:

t = np.arange(0, 300)
y = np.random.randn(300)
y[0:100] += 10 # breakpoint one at 100
y[200:] += 20 # breakpoint two at 200

mod=pm.Model()

with mod:
    bk1=pm.DiscreteUniform('bk1',lower=1,upper=200)
    bk2=pm.DiscreteUniform('bk2',lower=bk1,upper=300)
    slopes=pm.Normal('slopes',mu=0,sigma=3,shape=3)
    betas=pm.Normal('betas',mu=0,sigma=3,shape=3)
    res=pm.HalfNormal('res',3)

    t1=t[0:bk1]
    t2=t[bk1+1:bk2]
    t3=t[bk2+2:299]

    y1=slopes[0]+betas[0]*t1
    y2=slopes[1]+betas[1]*t2
    y3=slopes[2]+betas[2]*t3
    mu=np.hstack([y1,y2,y3])

    y_obs=pm.Normal('yobs',mu=mu,sigma=res,observed=y)

It fails with: TypeError: slice indices must be integers or None or have an __index__ method (at t1=t[0:bk1])

I couldn't figure it out.
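For what it's worth, here is a sketch of the usual workaround (not from the thread): the error happens because bk1 and bk2 are random variables, so they can't be used as Python slice indices. The standard trick is to keep t whole and build mu symbolically with nested pm.math.switch calls:

with pm.Model() as mod2:
    bk1 = pm.DiscreteUniform("bk1", lower=1, upper=200)
    bk2 = pm.DiscreteUniform("bk2", lower=bk1, upper=300)
    slopes = pm.Normal("slopes", mu=0, sigma=3, shape=3)
    betas = pm.Normal("betas", mu=0, sigma=3, shape=3)
    res = pm.HalfNormal("res", 3)

    # segment 1 where t < bk1, segment 2 where bk1 <= t < bk2, segment 3 otherwise
    mu = pm.math.switch(
        t < bk1,
        slopes[0] + betas[0] * t,
        pm.math.switch(t < bk2,
                       slopes[1] + betas[1] * t,
                       slopes[2] + betas[2] * t),
    )
    y_obs = pm.Normal("yobs", mu=mu, sigma=res, observed=y)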

That's really cool. How does it work? Are you able to do this tightly or do you marginalize over the entire density? I was considering doing the latter in Stan since we can't pull out a Markov blanket, but I thought it'd be too slow. I know Matt Hoffman and Maria Gorinova worked on this problem, too, where they looked for cases in the graphical model where they could unfold analytically.

I have a tutorial in the Stan User's Guide on how to manually marginalize a change-point model like this, although I only go over the single change point case and I don't think the Stan code can be easily ported to PyMC. The issue with marginalizing is that if there are 300 time points, marginalizing out two cut points is going to cost roughly 300^2 / 2 (about 45K) entire density evaluations, one for each pair of cut point locations. At some point, you'll probably have to resort to discrete sampling with constraints so the change points are ordered properly.

We enumerate over the whole domain to compute the logp of the conditionally dependent variables, not of all model variables. If there's an overlap in the sets of conditionally dependent variables of two marginalized variables, then yes, you get a quadratic number of evaluations.

On the plus side it's all vectorized when possible and you can throw it at a GPU. So that can help somewhat.
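To make the enumeration idea concrete, here's a toy, pure-NumPy illustration for a single change point (my own sketch, not PyMC's actual implementation): the marginal log-likelihood is a logsumexp over the variable's support.

import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def marginal_loglik_one_changepoint(y, t, mu1, mu2, sigma, support):
    # log p(y) = log sum_k p(bp = k) p(y | bp = k), with a uniform prior over `support`
    per_k = []
    for k in support:
        mu = np.where(t < k, mu1, mu2)
        per_k.append(norm.logpdf(y, loc=mu, scale=sigma).sum())
    return logsumexp(np.array(per_k) - np.log(len(support)))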

We're also adding QMC marginalization, which lets the user control the number of evaluation points, although that was designed with marginalization of continuous variables in mind and tends to use an ungodly number of evals as well. I'm not sure whether it can be used for discrete variables, or whether it allows one to grow more linearly with the number of marginalized variables: Marginalize continuous variable via QMC integration by ricardoV94 · Pull Request #353 · pymc-devs/pymc-experimental · GitHub

And we're planning to add Laplace integration as well (not sure about the exact term, @theo will know better).

Then we have special cases like the HMM, where we dispatch to the well-known algorithm.

By the way, this started as a special model subclass, but we're refactoring it into a model transformation you can apply to vanilla models, like other transformations (do, observe). So users won't have to use a different object in their workflow.


What do you mean by this?

My bad, that was really vague. By "tightly", I meant in computation. If you have a joint density p(\theta, \alpha) where \alpha is a discrete parameter, then you can always sample \alpha using the full density p(\alpha \mid \theta) \propto p(\theta, \alpha). This would be easy for us to add to Stan and would give us full generalized Gibbs. But there's almost always a much, much more computationally efficient form of the conditional p(\alpha \mid \theta). For example, in a graphical model, you can compute the (minimal) Markov blanket; that's what BUGS/JAGS do for generalized Gibbs.
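As a toy illustration of that Gibbs step for a discrete parameter with small support (my own sketch, not Stan or BUGS code): enumerate the joint over the support, normalize, and draw.

import numpy as np

def gibbs_update_discrete(alpha_support, log_joint, theta, rng):
    # Draw alpha from p(alpha | theta), proportional to p(theta, alpha),
    # by enumerating alpha's support
    logp = np.array([log_joint(theta, a) for a in alpha_support])
    p = np.exp(logp - logp.max())   # stabilize before normalizing
    return rng.choice(alpha_support, p=p / p.sum())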

Thanks, I didn't know the term. I think that's exactly what we are doing. Our transformation creates a subgraph of the RVs that are conditionally dependent on the marginalized variable, with the inputs being the set of RVs that feed into the marginalized variable plus the dependent variables.

The logp is then computed for this subgraph alone via enumeration/QMC.

It's trivial for us to find this partition (Markov blankets?) from the graph representation of PyMC models.

Yep!

@bob-carpenter We also have nice quality-of-life features that let you recover the posterior log-probabilities of the marginalised variables, along with sampling from them as a post-processing step. This lets users replicate the analysis in the Stan User's Guide without needing to do all the equation crunching.
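In code, that post-processing step looks roughly like the snippet below, assuming the helper is still called recover_marginals as in pymc-experimental around this time (names may have changed since), and reusing marg_mod and ans from the MarginalModel sketch earlier in the thread:

# recover draws and log-probabilities for the marginalized breakpoints after sampling
marg_mod.recover_marginals(ans)   # assumed helper name; adds bp1/bp2 back into `ans`
az.summary(ans, var_names=["bp1", "bp2"])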