Extreme running time for my graduation project

Hi everyone,
I have a question about my code. I want to do some transition matrix estimation, but my code runs forever, around 3 to 4 hours. Can anybody help me? It's for my graduation project and I desperately need your help. Thanks in advance!

import warnings
import numpy as np
import pymc as pm

# === SUPPRESS WARNING NOISE ===
warnings.filterwarnings("ignore", category=UserWarning, message="Initial distribution not specified*")
warnings.filterwarnings("ignore", category=UserWarning, module="pytensor.tensor.rewriting.elemwise")
warnings.filterwarnings("ignore", category=RuntimeWarning)

# === DATA PREP ===
# thresholds: ndarray of shape (q, q-1), may contain ±inf
# covariates_hist: DataFrame or ndarray of shape (T, p)
# counts_hist: ndarray of shape (T, q, q)  (transition counts)

# clip infinite thresholds to large finite bounds
thresholds = np.nan_to_num(thresholds, neginf=-10.0, posinf=10.0)

# align variable names
transitions = counts_hist         # shape (T, q, q)
covariates  = getattr(covariates_hist, "values", covariates_hist)  # shape (T, p)
covariates_mean = covariates.mean(axis=0)  # mean of covariates
covariates = covariates - covariates_mean  # demean covariates

T, q, _ = transitions.shape
p        = covariates.shape[1]

# === MODEL ===
with pm.Model() as belkin_logit:

    # 1) Priors
    beta = pm.Normal('beta', mu=0, sigma=1, shape=p)     # covariate effects
    rho  = pm.Beta('rho', alpha=2, beta=2, shape=q)      # cohort loadings

    # 2) Latent systemic factor Z_t
    Z = pm.Normal(
        'Z',
        mu=pm.math.dot(covariates, beta),
        sigma=1,
        shape=T,
    )

    # 3) Build the time-and-cohort transition probabilities
    P_list = []
    for m in range(q):
        sqrt_r = pm.math.sqrt(rho[m])
        sd_id   = pm.math.sqrt(1 - rho[m])

        # per-category probabilities at each t, for origin m
        probs_m = []
        for j in range(q):
            if j == 0:
                up = (thresholds[m, 0] - sqrt_r * Z) / sd_id
                probs_m.append(pm.math.sigmoid(up))
            elif j < q - 1:
                low = (thresholds[m, j - 1] - sqrt_r * Z) / sd_id
                up  = (thresholds[m, j    ] - sqrt_r * Z) / sd_id
                probs_m.append(pm.math.sigmoid(up) - pm.math.sigmoid(low))
            else:
                low = (thresholds[m, j - 1] - sqrt_r * Z) / sd_id
                probs_m.append(1 - pm.math.sigmoid(low))

        # stack j into shape (T, q), clip & renormalize
        stacked_m = pm.math.stack(probs_m, axis=1)  # → (T, q)
        clipped_m = pm.math.clip(stacked_m, 1e-6, 1 - 1e-6)
        normed_m  = clipped_m / pm.math.sum(clipped_m, axis=1, keepdims=True)

        P_list.append(normed_m)  # list of length q, each (T, q)

    # 4) form full (T, q, q) tensor: axis-1 indexes origin, axis-2 indexes dest
    P_stack = pm.math.stack(P_list, axis=1)  # → (T, q, q)

    # 5) reshape for vectorized Multinomial: (T*q, q)
    all_p   = P_stack.reshape((T * q, q))
    all_n   = transitions.sum(axis=2).reshape((T * q,))
    all_obs = transitions.reshape((T * q, q))

    # 6) observe
    pm.Multinomial(
        'transitions',
        n=all_n,
        p=all_p,
        observed=all_obs
    )
    step_br = pm.NUTS(vars=[beta, rho], target_accept=0.8)
    step_z  = pm.Slice(vars=[Z])  # or pm.Metropolis(vars=[Z])

    trace = pm.sample(
        draws=200,
        tune=200,
        chains=2,
        cores=2,
        step=[step_br, step_z],
        return_inferencedata=False,
    )

What was the motivation for slice sampling Z while using NUTS for the other parameters? I would think it would work better using NUTS for all of them.
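
For reference, a minimal sketch of what that would look like with the model above (reusing the belkin_logit context): dropping the manual step list lets PyMC assign NUTS to every free variable, including Z.

with belkin_logit:
    trace_all_nuts = pm.sample(
        draws=200,
        tune=200,
        chains=2,
        cores=2,
        target_accept=0.9,  # forwarded to NUTS; no step= list, so NUTS handles beta, rho, and Z
    )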

Given that you have what looks like an ordinal logit embedded in the middle here, the model can be hard to identify. You have to make sure there are no extra degrees of freedom in the model.
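
To illustrate (this is a generic ordinal-logit sketch, not your model; X, y, K, and p are hypothetical placeholders): fixing the latent scale to 1 and omitting an intercept removes the redundant degrees of freedom, and an ordered transform keeps the cutpoints identified.

import numpy as np
import pymc as pm
import pytensor.tensor as pt

# hypothetical data: X is (n, p) covariates, y is (n,) ordinal outcomes in {0, ..., K-1}
with pm.Model() as ordinal_demo:
    beta = pm.Normal("beta", mu=0, sigma=1, shape=p)
    eta = pt.dot(X, beta)  # no intercept: the cutpoints absorb the location
    cutpoints = pm.Normal(
        "cutpoints", mu=0, sigma=5, shape=K - 1,
        transform=pm.distributions.transforms.ordered,
        initval=np.linspace(-1.0, 1.0, K - 1),
    )
    pm.OrderedLogistic("y_obs", eta=eta, cutpoints=cutpoints, observed=y)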

If you have to clip to have it not fail, then that can be indicative of other problems in the model. One thing you can do in an ordinal model is to have a zero-avoiding prior (like lognormal) on the space between the cutpoints.
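
A sketch of that parameterization (K is a hypothetical placeholder for the number of categories): put a LogNormal prior on the gaps between cutpoints, so the gaps are strictly positive and zero-width categories are penalized.

import pymc as pm
import pytensor.tensor as pt

# K ordinal categories -> K - 1 cutpoints
with pm.Model():
    c_first = pm.Normal("c_first", mu=-2.0, sigma=2.0, shape=1)  # leftmost cutpoint
    gaps = pm.LogNormal("gaps", mu=0.0, sigma=0.5, shape=K - 2)  # strictly positive spacings
    cutpoints = pm.Deterministic(
        "cutpoints", pt.concatenate([c_first, c_first + pt.cumsum(gaps)])
    )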

One thing that can help is to generate a simple data set according to the model and make sure that fits; a sketch follows below. Once that fits, try it on the real data. This isolates two problems: (1) coding the model correctly (having two implementations, one that simulates the data and one in the model, helps double-check that), and (2) the model being well specified for the actual data.
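
For example, a simulation along these lines (dimensions and true values are made up; the generating code mirrors the sigmoid-thresholded latent factor in the model above, so posteriors fit to `counts` should recover beta_true and rho_true):

import numpy as np

rng = np.random.default_rng(0)
T, q, p = 30, 3, 2                      # small, fast dimensions (illustrative)
beta_true = np.array([0.5, -0.3])
rho_true = np.array([0.2, 0.3, 0.4])
thresholds = np.sort(rng.normal(0.0, 1.0, size=(q, q - 1)), axis=1)
covariates = rng.normal(size=(T, p))
Z = rng.normal(covariates @ beta_true, 1.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

counts = np.zeros((T, q, q), dtype=int)
for m in range(q):
    # same standardization as the model: (c - sqrt(rho) * Z) / sqrt(1 - rho)
    shifted = (thresholds[m][None, :] - np.sqrt(rho_true[m]) * Z[:, None]) / np.sqrt(1 - rho_true[m])
    cdf = sigmoid(shifted)                                  # (T, q-1)
    probs = np.diff(cdf, axis=1, prepend=0.0, append=1.0)   # (T, q), rows sum to 1
    for t in range(T):
        counts[t, m] = rng.multinomial(1000, probs[t])
# now fit the PyMC model above to `counts` and check parameter recovery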
