# Issues Converting Stan's `Phi_approx` to PyMC

Hi everyone,

I’m working on converting some Stan code into PyMC and came across a function called `Phi_approx`. As I understand, it’s a fast approximation of the cumulative standard normal distribution function. You can find more information about it here: Stan Functions Reference.

However, I couldn’t find a direct equivalent function in PyMC, so I wrote one myself. But when I compare the model output between PyMC and Stan, there are significant differences. I suspect the issue might lie in my converted `Phi_approx` function, but I’m not entirely sure what’s wrong.

Here’s the original Stan code:

``````ndt_i[i,1] = Phi_approx(ndt_mean[1] + ndt_sd[1] * ndt_i_pr[i,1]) * min(RT_min[i,,1]);
# (where i represents participant number)
``````

And here’s my PyMC translation:

``````with pm.Model(coords=coords) as model_3:
...
ndt_mean = pm.Normal('ndt_mean', mu=0, sigma=1, dims="time")
ndt_sigma = pm.Normal('ndt_sigma', mu=0, sigma=1, dims="time")
ndt_i_pr = pm.Normal('ndt_i_pr', mu=0, sigma=1, dims=["subj","time"])
rv = pm.Normal.dist(0,1)
phi = pm.logcdf(rv, ndt_mean + ndt_sigma * ndt_i_pr)
phi = pm.math.exp(phi)
ndt_i = pm.Deterministic('ndt_i', phi * rt_min, dims=["subj","time"])
...
# (Note: rt_min is an array with a shape of (47, 2))
``````

I’m wondering if there’s a misunderstanding on my part regarding the `Phi_approx` function or if my definition in PyMC is incorrect. Any advice or suggestions would be greatly appreciated!

This looks good, does it give the right answer?

Our computational backend automatically optimizes models to use the best available algorithms, so we try to avoid exposing functions like `fast_Phi` to users. I’m not if we do something smart for Probit. Run your model and let me know how the timings seem.

Thanks! I’d also like to provide some clarification about my model in case it helps make the situation clearer.

I’m building a shift lognormal model, where the `ndt_i` above is used to create a shifted parameter for each participant at a certain time condition. This shifted parameter should multiply with their minimum RT.

The model runs well, but the parameter estimation results are not what I expected. As you can see in the code snippet below, my parameters of interest are `rho_R_mu` and `rho_R_sigma`. These parameters are drawn from a multivariate normal distribution and are related to `mu`. In other models that don’t include this shift parameter, the estimation results match the Stan output exactly. This leads me to believe that the problem arises from my definition or usage of the shift parameter, like:

``````observed = df_flanker.RT - ndt_i[subj_idx, time_idx].eval()
``````

Since the model is somewhat complex, I’ve included only the key lines here:

``````coords = {"cond": df_flanker['Condition'].unique(),
"time": df_flanker['Time'].unique(),
"subj": df_flanker['subj_num'].unique(),
"trial": df_flanker['trial_idx']}

with pm.Model(coords=coords) as model_3:
# Define correlation matrices
rho_R_mu = pm.LKJCorr('rho_R_mu', n=2, eta=1)
rho_R_sigma = pm.LKJCorr('rho_R_sigma', n=2, eta=1)
triu_idx = np.triu_indices(2, k=1)
mu_corr_upper = pt.set_subtensor(pt.zeros((2, 2))[triu_idx], packed_L_R_mu)
sigma_corr_upper = pt.set_subtensor(pt.zeros((2, 2))[triu_idx], packed_L_R_sigma)
R_mu = pt.eye(2) + mu_corr_upper + mu_corr_upper.T
R_sigma = pt.eye(2) + sigma_corr_upper + sigma_corr_upper.T
L_R_mu = pt.linalg.cholesky(R_mu)
L_R_sigma = pt.linalg.cholesky(R_sigma)

...
ndt_mean = pm.Normal('ndt_mean', mu=0, sigma=1, dims="time")
ndt_sigma = pm.Normal('ndt_sigma', mu=0, sigma=1, dims="time")
ndt_i_pr = pm.Normal('ndt_i_pr', mu=0, sigma=1, dims=["subj","time"])
rv = pm.Normal.dist(0,1)
phi = pm.logcdf(rv, ndt_mean + ndt_sigma * ndt_i_pr)
phi = pm.math.exp(phi)
ndt_i = pm.Deterministic('ndt_i', phi * rt_min, dims=["subj","time"])

...

time_idx = pm.MutableData("time_idx", df_flanker.Time, dims="trial")
subj_idx = pm.MutableData("subj_idx", df_flanker.subj_num, dims="trial")
cond_idx = pm.MutableData("cond_idx", df_flanker.Condition, dims="trial")

likelihood = pm.LogNormal("likelihood",
mu=mu[cond_idx, subj_idx, time_idx],
sigma=sigma[cond_idx, subj_idx, time_idx],
observed=df_flanker.RT - ndt_i[subj_idx, time_idx].eval(),
dims="trial")
``````

I just ran the model, and here’s the timing message (though I’m not sure if this is exactly what you were asking for )

``````Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (3 chains in 3 jobs)
NUTS: [L_R_mu, L_R_sigma, mu_mean_base, mu_sd_base, sigma_mean_base, sigma_sd_base, mu_mean_delta, mu_sd_delta, sigma_mean_delta, sigma_sd_delta, mu_i_base_pr, sigma_i_base_pr, mu_i_delta_pr, sigma_i_delta_pr, ndt_mean, ndt_sigma, ndt_i_pr]
Sampling 3 chains for 1_000 tune and 3_000 draw iterations (3_000 + 9_000 draws total) took 2521 seconds.
We recommend running at least 4 chains for robust computation of convergence diagnostics
``````

This won’t be random, it will fix a single calue of `ndt_i` and fit the model conditional on whatever the random draw happens to be. Is this your intention?

No, `ndt_i` should be random, so I think I should remove the `.eval()`? However, when I do that, I get an error:

ValueError: setting an array element with a sequence.

I guess the subtraction isn’t being handled correctly, but I’m not sure how to fix it. Do you have any advice?

You can’t have observed data that changes with every draw (it’s not observed that case!) so you need to adjust your model to get `ndt_i` into the expected value. Basically it is acting like a location parameter – as the documentation for `scipy.stats.lognorm` notes:

The probability density above is defined in the “standardized” form. To shift
and/or scale the distribution use the `loc` and `scale` parameters.
Specifically, `lognorm.pdf(x, s, loc, scale)` is identically
equivalent to `lognorm.pdf(y, s) / scale` with
`y = (x - loc) / scale`. Note that shifting the location of a distribution
does not make it a “noncentral” distribution; noncentral generalizations of
some distributions are available in separate classes.

We don’t offer that parameterize directly, but you can easily implement it either by using `pm.CustomDist` – here’s some untested code to get you started

``````def shifted_lognormal(mu, sigma, loc, size=None):
return loc + pm.LogNormal.dist(mu=mu, sigma=sigma, size=size)

with pm.Model(coords=coords) as model_3:

obs = pm.CustomDist("obs",
mu=mu[cond_idx, subj_idx, time_idx],
sigma=sigma[cond_idx, subj_idx, time_idx],
loc=ndt_i[subj_idx, time_idx],
observed=df_flanker.RT,
dims="trial")
``````

Thanks a lot! `pm.CustomDist` solved the problem perfectly. I made a few modifications to your code and am sharing the final working version below as a reference—maybe this will help someone else as well

``````def shifted_lognormal(mu, sigma, loc, size=None):
return loc + pm.LogNormal.dist(mu=mu, sigma=sigma, size=size)

with pm.Model(coords=coords) as model_3: