Issues Converting Stan's `Phi_approx` to PyMC

Hi everyone,

I’m working on converting some Stan code into PyMC and came across a function called `Phi_approx`. As I understand it, it’s a fast approximation of the standard normal cumulative distribution function; you can find more information about it in the Stan Functions Reference.

However, I couldn’t find a direct equivalent function in PyMC, so I wrote one myself. But when I compare the model outputs between PyMC and Stan, there are significant differences. I suspect the issue lies in my converted `Phi_approx` function, but I’m not entirely sure what’s wrong.

Here’s the original Stan code:

ndt_i[i,1] = Phi_approx(ndt_mean[1] + ndt_sd[1] * ndt_i_pr[i,1]) * min(RT_min[i,,1]);
// (where i represents the participant number)

And here’s my PyMC translation:

with pm.Model(coords=coords) as model_3:
    ...
    ndt_mean = pm.Normal('ndt_mean', mu=0, sigma=1, dims="time")
    ndt_sigma = pm.Normal('ndt_sigma', mu=0, sigma=1, dims="time")
    ndt_i_pr = pm.Normal('ndt_i_pr', mu=0, sigma=1, dims=["subj","time"])
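    # standard normal CDF (Phi), computed as exp(logcdf)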
    rv = pm.Normal.dist(0,1)
    phi = pm.logcdf(rv, ndt_mean + ndt_sigma * ndt_i_pr)
    phi = pm.math.exp(phi)
    ndt_i = pm.Deterministic('ndt_i', phi * rt_min, dims=["subj","time"])
    ...
# (Note: rt_min is an array with a shape of (47, 2))

I’m wondering if there’s a misunderstanding on my part regarding the `Phi_approx` function or if my definition in PyMC is incorrect. Any advice or suggestions would be greatly appreciated!

Thanks in advance for your help!

This looks good. Does it give the right answer?

Our computational backend automatically optimizes models to use the best available algorithms, so we try to avoid exposing functions like `fast_Phi` to users. I’m not sure if we do something smart for probit. Run your model and let me know how the timings look.
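That said, if you need to match Stan’s output exactly: the Stan Functions Reference defines `Phi_approx(x)` as the logistic approximation inv_logit(0.07056 * x^3 + 1.5976 * x), so you can write it yourself. An untested sketch:

def phi_approx(x):
    # logistic approximation to the standard normal CDF,
    # as Stan's Phi_approx is defined
    return pm.math.invlogit(0.07056 * x**3 + 1.5976 * x)

Note that your exp(logcdf) version computes the exact CDF, which is mathematically fine but will differ slightly from Stan’s approximation.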

Thanks! I’d also like to provide some clarification about my model in case it helps.

I’m building a shifted lognormal model, where the ndt_i above is used as a shift (non-decision time) parameter for each participant in each time condition. This shift parameter is multiplied by the participant’s minimum RT.

The model runs well, but the parameter estimates are not what I expected. As you can see in the code snippet below, my parameters of interest are rho_R_mu and rho_R_sigma, the correlation parameters (with LKJ priors) that feed into the multivariate normal related to mu. In other models that don’t include this shift parameter, the estimates match the Stan output exactly. This leads me to believe that the problem arises from my definition or usage of the shift parameter, like:

observed = df_flanker.RT - ndt_i[subj_idx, time_idx].eval()

Since the model is somewhat complex, I’ve included only the key lines here:

coords = {"cond": df_flanker['Condition'].unique(),
          "time": df_flanker['Time'].unique(),
          "subj": df_flanker['subj_num'].unique(),
          "trial": df_flanker['trial_idx']}

with pm.Model(coords=coords) as model_3:
    # Define correlation matrices
    rho_R_mu = pm.LKJCorr('rho_R_mu', n=2, eta=1)
    rho_R_sigma = pm.LKJCorr('rho_R_sigma', n=2, eta=1)
    triu_idx = np.triu_indices(2, k=1)
    mu_corr_upper = pt.set_subtensor(pt.zeros((2, 2))[triu_idx], rho_R_mu)
    sigma_corr_upper = pt.set_subtensor(pt.zeros((2, 2))[triu_idx], rho_R_sigma)
    R_mu = pt.eye(2) + mu_corr_upper + mu_corr_upper.T
    R_sigma = pt.eye(2) + sigma_corr_upper + sigma_corr_upper.T
    L_R_mu = pt.linalg.cholesky(R_mu)
    L_R_sigma = pt.linalg.cholesky(R_sigma)

    ...
    ndt_mean = pm.Normal('ndt_mean', mu=0, sigma=1, dims="time")
    ndt_sigma = pm.Normal('ndt_sigma', mu=0, sigma=1, dims="time")
    ndt_i_pr = pm.Normal('ndt_i_pr', mu=0, sigma=1, dims=["subj","time"])
    rv = pm.Normal.dist(0,1)
    phi = pm.logcdf(rv, ndt_mean + ndt_sigma * ndt_i_pr)
    phi = pm.math.exp(phi)
    ndt_i = pm.Deterministic('ndt_i', phi * rt_min, dims=["subj","time"])

    ...

    time_idx = pm.MutableData("time_idx", df_flanker.Time, dims="trial")
    subj_idx = pm.MutableData("subj_idx", df_flanker.subj_num, dims="trial")
    cond_idx = pm.MutableData("cond_idx", df_flanker.Condition, dims="trial")
    
    likelihood = pm.LogNormal("likelihood",
                                mu=mu[cond_idx, subj_idx, time_idx], 
                                sigma=sigma[cond_idx, subj_idx, time_idx],
                                observed=df_flanker.RT - ndt_i[subj_idx, time_idx].eval(),
                                dims="trial")

I just ran the model, and here’s the timing message (though I’m not sure if this is exactly what you were asking for :sweat_smile:)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (3 chains in 3 jobs)
NUTS: [L_R_mu, L_R_sigma, mu_mean_base, mu_sd_base, sigma_mean_base, sigma_sd_base, mu_mean_delta, mu_sd_delta, sigma_mean_delta, sigma_sd_delta, mu_i_base_pr, sigma_i_base_pr, mu_i_delta_pr, sigma_i_delta_pr, ndt_mean, ndt_sigma, ndt_i_pr]
Sampling 3 chains for 1_000 tune and 3_000 draw iterations (3_000 + 9_000 draws total) took 2521 seconds.
We recommend running at least 4 chains for robust computation of convergence diagnostics

This won’t be random; it will fix a single value of ndt_i and fit the model conditional on whatever the random draw happens to be. Is this your intention?

No, ndt_i should be random, so I think I should remove the .eval()? However, when I do that, I get an error:

ValueError: setting an array element with a sequence.

I guess the subtraction isn’t being handled correctly, but I’m not sure how to fix it. Do you have any advice?

You can’t have observed data that changes with every draw (it’s not observed in that case!), so you need to adjust your model to get ndt_i into the expected value. Basically it is acting like a location parameter, as the documentation for scipy.stats.lognorm notes:

The probability density above is defined in the “standardized” form. To shift and/or scale the distribution use the loc and scale parameters. Specifically, lognorm.pdf(x, s, loc, scale) is identically equivalent to lognorm.pdf(y, s) / scale with y = (x - loc) / scale. Note that shifting the location of a distribution does not make it a “noncentral” distribution; noncentral generalizations of some distributions are available in separate classes.

We don’t offer that parameterization directly, but you can easily implement it using pm.CustomDist. Here’s some untested code to get you started:

def shifted_lognormal(mu, sigma, loc, size=None):
    # generative graph: a lognormal shifted by loc
    return loc + pm.LogNormal.dist(mu=mu, sigma=sigma, size=size)

with pm.Model(coords=coords) as model_3:
    # your model...

    obs = pm.CustomDist("obs",
                        mu=mu[cond_idx, subj_idx, time_idx],
                        sigma=sigma[cond_idx, subj_idx, time_idx],
                        loc=ndt_i[subj_idx, time_idx],
                        dist=shifted_lognormal,
                        observed=df_flanker.RT,
                        dims="trial")

Thanks a lot! pm.CustomDist solved the problem perfectly. I made a few modifications to your code and am sharing the final working version below as a reference; maybe this will help someone else as well :smile:

def shifted_lognormal(mu, sigma, loc, size=None):
    return loc + pm.LogNormal.dist(mu=mu, sigma=sigma, size=size)

with pm.Model(coords=coords) as model_3:
    # your model...
    obs = pm.CustomDist("obs",
                        mu[cond_idx, subj_idx, time_idx],
                        sigma[cond_idx, subj_idx, time_idx],
                        ndt_i[subj_idx, time_idx],
                        dist=shifted_lognormal,
                        observed=df_flanker.RT,
                        dims="trial")