Extreme running time for my graduation project

Hi everyone,
I have a question about my code. I want to estimate some transition matrices, but the code runs forever, around 3–4 hours. Can anybody help me? It's for my graduation project and I desperately need your help. Thanks in advance!

import warnings
import numpy as np
import pymc as pm

# === SUPPRESS WARNING NOISE ===
warnings.filterwarnings("ignore", category=UserWarning, message="Initial distribution not specified*")
warnings.filterwarnings("ignore", category=UserWarning, module="pytensor.tensor.rewriting.elemwise")
warnings.filterwarnings("ignore", category=RuntimeWarning)

# === DATA PREP ===
# thresholds: ndarray of shape (q, q-1), may contain ±inf
# covariates_hist: DataFrame or ndarray of shape (T, p)
# counts_hist: ndarray of shape (T, q, q)  (transition counts)

# clip infinite thresholds to large finite bounds
thresholds = np.nan_to_num(thresholds, neginf=-10.0, posinf=10.0)

# align variable names
transitions = counts_hist         # shape (T, q, q)
covariates  = getattr(covariates_hist, "values", covariates_hist)  # DataFrame -> ndarray, shape (T, p)
covariates_mean = covariates.mean(axis=0)  # mean of covariates
covariates = covariates - covariates_mean  # demean covariates

T, q, _ = transitions.shape
p        = covariates.shape[1]

# === MODEL ===
with pm.Model() as belkin_logit:

    # 1) Priors
    beta = pm.Normal('beta', mu=0, sigma=1, shape=p)     # covariate effects
    rho  = pm.Beta('rho', alpha=2, beta=2, shape=q)      # cohort loadings

    # 2) Latent systemic factor Z_t
    Z = pm.Normal(
        'Z',
        mu=pm.math.dot(covariates, beta),
        sigma=1,
        shape=T,
    )

    # 3) Build the time-and-cohort transition probabilities
    P_list = []
    for m in range(q):
        sqrt_r = pm.math.sqrt(rho[m])
        sd_id   = pm.math.sqrt(1 - rho[m])

        # per-category probabilities at each t, for origin m
        probs_m = []
        for j in range(q):
            if j == 0:
                up = (thresholds[m, 0] - sqrt_r * Z) / sd_id
                probs_m.append(pm.math.sigmoid(up))
            elif j < q - 1:
                low = (thresholds[m, j - 1] - sqrt_r * Z) / sd_id
                up  = (thresholds[m, j    ] - sqrt_r * Z) / sd_id
                probs_m.append(pm.math.sigmoid(up) - pm.math.sigmoid(low))
            else:
                low = (thresholds[m, j - 1] - sqrt_r * Z) / sd_id
                probs_m.append(1 - pm.math.sigmoid(low))

        # stack j into shape (T, q), clip & renormalize
        stacked_m = pm.math.stack(probs_m, axis=1)  # → (T, q)
        clipped_m = pm.math.clip(stacked_m, 1e-6, 1 - 1e-6)
        normed_m  = clipped_m / pm.math.sum(clipped_m, axis=1, keepdims=True)

        P_list.append(normed_m)  # list of length q, each (T, q)

    # 4) form full (T, q, q) tensor: axis-1 indexes origin, axis-2 indexes dest
    P_stack = pm.math.stack(P_list, axis=1)  # → (T, q, q)

    # 5) reshape for vectorized Multinomial: (T*q, q)
    all_p   = P_stack.reshape((T * q, q))
    all_n   = transitions.sum(axis=2).reshape((T * q,))
    all_obs = transitions.reshape((T * q, q))

    # 6) observe
    pm.Multinomial(
        'transitions',
        n=all_n,
        p=all_p,
        observed=all_obs
    )
    step_br = pm.NUTS(vars=[beta, rho], target_accept=0.8)
    step_z  = pm.Slice(vars=[Z])  # or pm.Metropolis(vars=[Z])

    trace = pm.sample(
        draws=200,
        tune=200,
        chains=2,
        cores=2,
        step=[step_br, step_z],
        return_inferencedata=False,
    )

What was the motivation for slice sampling Z while using NUTS for the other parameters? I would think it would work better using NUTS for all of them.
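For comparison, a minimal sketch of the all-NUTS version would just drop the step argument, since pm.sample assigns NUTS to every continuous free variable by default:

with belkin_logit:
    trace = pm.sample(draws=200, tune=200, chains=2, cores=2)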

Given that you have what looks like an ordinal logit embedded in the middle here, the model can be hard to identify. You have to make sure there are no extra degrees of freedom in the model.
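As a quick numerical illustration of one such extra degree of freedom (the numbers here are arbitrary): shifting all cutpoints and the latent mean by the same constant leaves every category probability unchanged, so that shift is unidentified unless something is pinned down.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

cuts, mu, shift = np.array([-1.0, 0.5, 2.0]), 0.3, 5.0
p1 = np.diff(np.concatenate([[0.0], sigmoid(cuts - mu), [1.0]]))
p2 = np.diff(np.concatenate([[0.0], sigmoid((cuts + shift) - (mu + shift)), [1.0]]))
assert np.allclose(p1, p2)  # same probabilities: the shift is unidentified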

If you have to clip to keep it from failing, that can be indicative of other problems in the model. One thing you can do in an ordinal model is to put a zero-avoiding prior (like a lognormal) on the spacing between the cutpoints, as in the sketch below.
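Here is a minimal sketch of what that could look like (q and the prior parameters are placeholders, not taken from the model above): the first cutpoint is free, and the remaining ones are built from strictly positive lognormal gaps, so they can never collapse onto each other.

import pymc as pm
import pytensor.tensor as pt

q = 4  # number of categories (placeholder)

with pm.Model():
    first_cut = pm.Normal("first_cut", mu=0.0, sigma=2.0)
    gaps = pm.LogNormal("gaps", mu=0.0, sigma=0.5, shape=q - 2)  # zero-avoiding spacing
    cutpoints = pm.Deterministic(
        "cutpoints",
        pt.concatenate([first_cut[None], first_cut + pt.cumsum(gaps)]),
    )  # q - 1 strictly increasing cutpoints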

One thing that can help is to generate a simple data set according to the model and make sure the model fits that. Then, when it does, try it on the real data. This helps isolate two problems: (1) coding the model correctly (having two implementations, one for the simulated data and one in the program, can help double-check that), and (2) the model being well specified for the actual data.
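As a concrete illustration, a simulation along these lines (all sizes and the 100 obligors per cohort are placeholders) draws counts from the same generative story as the model, so the fitted posterior should roughly recover beta_true and rho_true:

import numpy as np

rng = np.random.default_rng(0)
T, q, p = 50, 4, 3                               # placeholder sizes
beta_true = rng.normal(size=p)
rho_true = rng.uniform(0.1, 0.5, size=q)
covariates = rng.normal(size=(T, p))
Z = covariates @ beta_true + rng.normal(size=T)  # latent systemic factor
thresholds = np.sort(rng.normal(size=(q, q - 1)), axis=1)  # increasing cutpoints

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

counts = np.zeros((T, q, q), dtype=int)
for m in range(q):
    z = (thresholds[m][None, :] - np.sqrt(rho_true[m]) * Z[:, None]) / np.sqrt(1.0 - rho_true[m])
    cdf = np.concatenate([np.zeros((T, 1)), sigmoid(z), np.ones((T, 1))], axis=1)
    probs = np.diff(cdf, axis=1)                 # (T, q) destination probabilities from origin m
    for t in range(T):
        counts[t, m] = rng.multinomial(100, probs[t])  # 100 obligors per cohort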

1 Like

Python loops can be pretty inefficient. One place to often get a large speedup is by vectorizing these operations. If rho is a vector, then pm.math.sqrt(rho) is also a vector; there is no need to build up the result one element at a time.

Similarly, the inner loop can also be vectorized away. There is pm.math.switch, which can handle conditional operations, and pm.math.eq and pm.math.lt to handle equality and less-than comparisons. The docs on the PyMC website are cryptic, but those functions are just aliases for pytensor.tensor.switch and so on.
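To make that concrete, here is a sketch of how steps 3) and 4) of the original model could collapse into one broadcasted expression (written with pytensor.tensor directly; the pm.math versions are aliases). It assumes the same thresholds (q, q-1), Z (T,), and rho (q) as above:

import pytensor.tensor as pt

sqrt_r = pt.sqrt(rho)[None, :, None]        # (1, q, 1)
sd_id = pt.sqrt(1.0 - rho)[None, :, None]   # (1, q, 1)

# standardized cutpoints for every (t, origin m, cutpoint j) at once
z = (thresholds[None, :, :] - sqrt_r * Z[:, None, None]) / sd_id   # (T, q, q-1)

# pad the CDF with 0 and 1, then difference along the last axis
cdf = pt.concatenate([pt.zeros((T, q, 1)), pt.sigmoid(z), pt.ones((T, q, 1))], axis=-1)
P_stack = cdf[:, :, 1:] - cdf[:, :, :-1]    # (T, q, q), same as the looped version

The clipping and renormalization from the original can then be applied to P_stack in one shot.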

3 Likes

@daniel-saunders-phil is spot on. A small technicality: the speed of Python loops is irrelevant for PyMC, because we don’t run Python code at the end of the day.

But the generated code in this case will likely be much worse because it won’t be properly vectorized even if we’re looping in C or Numba. A recent example was discussed in BUG: InconsistencyError Multiple destroyers of CGemv{no_inplace}.0 · Issue #1420 · pymc-devs/pytensor · GitHub

And I’m using this to motivate pushing for the dims module, with the hope of making it more intuitive to write vectorized code: https://github.com/pymc-devs/pymc/pull/7820

1 Like

Hey all,
Thanks for helping me with this; the running time is better now.
I’ll reach out again if I run into any problems!

1 Like