Extreme running time for my graduation project

Hi everyone,
I have a question about my code. I want to estimate some transition matrices, but the code runs forever, around 3–4 hours. Can anybody help me? It's for my graduation project and I desperately need your help. Thanks in advance!

import warnings
import numpy as np
import pymc as pm

# === SUPPRESS WARNING NOISE ===
warnings.filterwarnings("ignore", category=UserWarning, message="Initial distribution not specified*")
warnings.filterwarnings("ignore", category=UserWarning, module="pytensor.tensor.rewriting.elemwise")
warnings.filterwarnings("ignore", category=RuntimeWarning)

# === DATA PREP ===
# thresholds: ndarray of shape (q, q-1), may contain ±inf
# covariates_hist: DataFrame or ndarray of shape (T, p)
# counts_hist: ndarray of shape (T, q, q)  (transition counts)

# clip infinite thresholds to large finite bounds
thresholds = np.nan_to_num(thresholds, neginf=-10.0, posinf=10.0)

# align variable names
transitions = counts_hist         # shape (T, q, q)
covariates  = getattr(covariates_hist, "values", covariates_hist)  # DataFrame -> ndarray, shape (T, p)
covariates_mean = covariates.mean(axis=0)  # mean of covariates
covariates = covariates - covariates_mean  # demean covariates

T, q, _ = transitions.shape
p        = covariates.shape[1]

# === MODEL ===
with pm.Model() as belkin_logit:

    # 1) Priors
    beta = pm.Normal('beta', mu=0, sigma=1, shape=p)     # covariate effects
    rho  = pm.Beta('rho', alpha=2, beta=2, shape=q)      # cohort loadings

    # 2) Latent systemic factor Z_t
    Z = pm.Normal(
        'Z',
        mu=pm.math.dot(covariates, beta),
        sigma=1,
        shape=T,
    )

    # 3) Build the time-and-cohort transition probabilities
    P_list = []
    for m in range(q):
        sqrt_r = pm.math.sqrt(rho[m])
        sd_id   = pm.math.sqrt(1 - rho[m])

        # per-category probabilities at each t, for origin m
        probs_m = []
        for j in range(q):
            if j == 0:
                up = (thresholds[m, 0] - sqrt_r * Z) / sd_id
                probs_m.append(pm.math.sigmoid(up))
            elif j < q - 1:
                low = (thresholds[m, j - 1] - sqrt_r * Z) / sd_id
                up  = (thresholds[m, j    ] - sqrt_r * Z) / sd_id
                probs_m.append(pm.math.sigmoid(up) - pm.math.sigmoid(low))
            else:
                low = (thresholds[m, j - 1] - sqrt_r * Z) / sd_id
                probs_m.append(1 - pm.math.sigmoid(low))

        # stack j into shape (T, q), clip & renormalize
        stacked_m = pm.math.stack(probs_m, axis=1)  # → (T, q)
        clipped_m = pm.math.clip(stacked_m, 1e-6, 1 - 1e-6)
        normed_m  = clipped_m / pm.math.sum(clipped_m, axis=1, keepdims=True)

        P_list.append(normed_m)  # list of length q, each (T, q)

    # 4) form full (T, q, q) tensor: axis-1 indexes origin, axis-2 indexes dest
    P_stack = pm.math.stack(P_list, axis=1)  # → (T, q, q)

    # 5) reshape for vectorized Multinomial: (T*q, q)
    all_p   = P_stack.reshape((T * q, q))
    all_n   = transitions.sum(axis=2).reshape((T * q,))
    all_obs = transitions.reshape((T * q, q))

    # 6) observe
    pm.Multinomial(
        'transitions',
        n=all_n,
        p=all_p,
        observed=all_obs
    )
    step_br = pm.NUTS(vars=[beta, rho], target_accept=0.8)
    step_z  = pm.Slice(vars=[Z])  # or pm.Metropolis(vars=[Z])

    trace = pm.sample(
        draws=200,
        tune=200,
        chains=2,
        cores=2,
        step=[step_br, step_z],
        return_inferencedata=False,
    )

What was the motivation for slice sampling Z while using NUTS for the other parameters? I would think it would work better using NUTS for all of them.
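For comparison, a minimal sketch of the all-NUTS version would just drop the step argument, since pm.sample assigns NUTS to every continuous free variable by default:

with belkin_logit:
    trace = pm.sample(draws=200, tune=200, chains=2, cores=2)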

Given that you have what looks like an ordinal logit embedded in the middle here, the model can be hard to identify. You have to make sure there are no extra degrees of freedom in the model.
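As a quick numerical illustration of one such extra degree of freedom (the numbers here are arbitrary): shifting all cutpoints and the latent mean by the same constant leaves every category probability unchanged, so that shift is unidentified unless something is pinned down.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

cuts, mu, shift = np.array([-1.0, 0.5, 2.0]), 0.3, 5.0
p1 = np.diff(np.concatenate([[0.0], sigmoid(cuts - mu), [1.0]]))
p2 = np.diff(np.concatenate([[0.0], sigmoid((cuts + shift) - (mu + shift)), [1.0]]))
assert np.allclose(p1, p2)  # same probabilities: the shift is unidentified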

If you have to clip to keep it from failing, that can be indicative of other problems in the model. One thing you can do in an ordinal model is to put a zero-avoiding prior (like a lognormal) on the spacing between the cutpoints, as in the sketch below.
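Here is a minimal sketch of what that could look like (q and the prior parameters are placeholders, not taken from the model above): the first cutpoint is free, and the remaining ones are built from strictly positive lognormal gaps, so they can never collapse onto each other.

import pymc as pm
import pytensor.tensor as pt

q = 4  # number of categories (placeholder)

with pm.Model():
    first_cut = pm.Normal("first_cut", mu=0.0, sigma=2.0)
    gaps = pm.LogNormal("gaps", mu=0.0, sigma=0.5, shape=q - 2)  # zero-avoiding spacing
    cutpoints = pm.Deterministic(
        "cutpoints",
        pt.concatenate([first_cut[None], first_cut + pt.cumsum(gaps)]),
    )  # q - 1 strictly increasing cutpoints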

One thing that can help is to generate a simple data set according to the model and make sure the model fits that. Then, when it does, try it on the real data. This helps isolate two problems: (1) coding the model correctly (having two implementations, one for the simulated data and one in the program, can help double-check that), and (2) the model being well specified for the actual data.
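As a concrete illustration, a simulation along these lines (all sizes and the 100 obligors per cohort are placeholders) draws counts from the same generative story as the model, so the fitted posterior should roughly recover beta_true and rho_true:

import numpy as np

rng = np.random.default_rng(0)
T, q, p = 50, 4, 3                               # placeholder sizes
beta_true = rng.normal(size=p)
rho_true = rng.uniform(0.1, 0.5, size=q)
covariates = rng.normal(size=(T, p))
Z = covariates @ beta_true + rng.normal(size=T)  # latent systemic factor
thresholds = np.sort(rng.normal(size=(q, q - 1)), axis=1)  # increasing cutpoints

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

counts = np.zeros((T, q, q), dtype=int)
for m in range(q):
    z = (thresholds[m][None, :] - np.sqrt(rho_true[m]) * Z[:, None]) / np.sqrt(1.0 - rho_true[m])
    cdf = np.concatenate([np.zeros((T, 1)), sigmoid(z), np.ones((T, 1))], axis=1)
    probs = np.diff(cdf, axis=1)                 # (T, q) destination probabilities from origin m
    for t in range(T):
        counts[t, m] = rng.multinomial(100, probs[t])  # 100 obligors per cohort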

1 Like

Python loops can be pretty inefficient. One place to often get a large speedup is by vectorizing these operations. If rho is a vector, then pm.math.sqrt(rho) is also a vector; there is no need to build up the result one element at a time.

Similarly, the inner loop can also be vectorized away. There is pm.math.switch, which can handle conditional operations, and pm.math.eq and pm.math.lt to handle equality and less-than comparisons. The docs on the PyMC website are cryptic, but those functions are just aliases for pytensor.tensor.switch and so on.
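To make that concrete, here is a sketch of how steps 3) and 4) of the original model could collapse into one broadcasted expression (written with pytensor.tensor directly; the pm.math versions are aliases). It assumes the same thresholds (q, q-1), Z (T,), and rho (q) as above:

import pytensor.tensor as pt

sqrt_r = pt.sqrt(rho)[None, :, None]        # (1, q, 1)
sd_id = pt.sqrt(1.0 - rho)[None, :, None]   # (1, q, 1)

# standardized cutpoints for every (t, origin m, cutpoint j) at once
z = (thresholds[None, :, :] - sqrt_r * Z[:, None, None]) / sd_id   # (T, q, q-1)

# pad the CDF with 0 and 1, then difference along the last axis
cdf = pt.concatenate([pt.zeros((T, q, 1)), pt.sigmoid(z), pt.ones((T, q, 1))], axis=-1)
P_stack = cdf[:, :, 1:] - cdf[:, :, :-1]    # (T, q, q), same as the looped version

The clipping and renormalization from the original can then be applied to P_stack in one shot.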

3 Likes

@daniel-saunders-phil is spot on. A small technicality: the speed of Python loops is irrelevant for PyMC, because we don’t run Python code at the end of the day.

But the generated code in this case will likely be much worse because it won’t be properly vectorized even if we’re looping in C or Numba. A recent example was discussed in BUG: InconsistencyError Multiple destroyers of CGemv{no_inplace}.0 · Issue #1420 · pymc-devs/pytensor · GitHub

And I’m using this to motivate pushing for the dims module, with the hope of making it more intuitive to write vectorized code: https://github.com/pymc-devs/pymc/pull/7820

1 Like

Hey all,
Thanks for helping me with this; the running time is better now.
I’ll reach out again if I run into any problems!

1 Like