Boosting Efficiency of Skewed-t Models in PyMC

Hey there PyMC community! I’ve been reading some recent research showing that a skew-t model fits gene enrichment data rather well. I came across a thread on this forum (here) that discusses handling skew-t models in PyMC with a custom distribution; it uses a logp function named logp_skewt (the log-likelihood, I presume). To understand it better, I built and tested a basic toy example. I ran into a snag, though: as another thread (here) also mentions, the sampling is rather slow. Can anyone help me speed things up? Am I using PyMC correctly here, and if not, what am I missing? My real data involves a hierarchical model with roughly 12,000 groups and 35,000 observations, which is making me a bit nervous.

Here’s a quick look at my toy example, which includes the data generation process, prior check, and PyMC model:

# Data Generation Process
X ~ NB(mu = 10, alpha = 0.3)
b ~ Normal(mu = 2.5, sigma = 0.2)
y = b * X

For my prior predictive check, I ran the following code with some varying parameters. The skew-t (ST) priors are:

\begin{align} \nu &\sim \mathrm{Gamma}(2, 0.1)\\ \sigma &\sim \mathrm{Exponential}\left(\frac{1}{\mathrm{std}(y)}\right)\\ \alpha &\sim \mathrm{HalfCauchy}(3) \end{align}

Looking at the prior predictive plot, it’s evident that the sigma prior exerts a strong influence on the simulated data. Can anyone suggest a better choice of sigma prior for this situation? Any input would be greatly appreciated!

Fig_prior_skewt.pdf (19.8 KB)

Here’s a snippet for you:

# Code for the prior predictive check
def min_mean_max(arr):
    # Summary statistics used to span each prior's range in the grid plot
    return {0: np.min(arr), 1: np.mean(arr), 2: np.max(arr)}

def skewT_test(sigma_value, nu_value, alpha_value, X_rand, Y_rand):
    # mu and value are vectors, so the whole dataset is evaluated in one
    # compiled call rather than one slow .eval() per data point
    mu = pt.vector("mu")
    nu = pt.scalar("nu")
    sigma = pt.scalar("sigma")
    alpha = pt.scalar("alpha")
    value = pt.vector("value")

    log_t = pm.logp(pm.StudentT.dist(nu, mu=mu, sigma=sigma), value)
    log_T = pm.logcdf(pm.StudentT.dist(nu + 1, mu=mu, sigma=sigma),
                      alpha * value * pm.math.sqrt((nu + 1) / (mu**2 + nu)))
    # Skew-t log-density: log(2) + Student-t logpdf + Student-t logcdf
    # (the original +log(sigma) and -log(sigma) terms cancelled out)
    logpdf = pm.math.log(2) + log_t + log_T
    pdf = np.exp(logpdf.eval({nu: nu_value, mu: X_rand.astype("float64"),
                              sigma: sigma_value, alpha: alpha_value,
                              value: Y_rand.astype("float64")}))
    df = pd.DataFrame({'x': Y_rand, 'y': pdf})
    return df.sort_values(by=['y'])

X_rand = pm.draw(pm.NegativeBinomial.dist(mu=10, alpha=0.3), draws=50, random_seed=RANDOM_SEED)
b = pm.draw(pm.Normal.dist(2.5, 0.2), draws=50, random_seed=RANDOM_SEED)  # sigma = 0.2, per the DGP above
Y_rand = X_rand * b

nu_dict = min_mean_max(pm.draw(pm.Gamma.dist(2, 0.1), draws=50, random_seed=RANDOM_SEED))
sigma_dict = min_mean_max(pm.draw(pm.Exponential.dist(1 / np.std(Y_rand)), draws=50, random_seed=RANDOM_SEED))  # rate 1/std(y), matching the prior above
alpha_dict = min_mean_max(pm.draw(pm.HalfCauchy.dist(3), draws=50, random_seed=RANDOM_SEED))

fig, axes = plt.subplots(3, 3, figsize=(11.5, 5))
for i in range(3):      # rows: min / mean / max of nu and alpha (varied jointly)
    for j in range(3):  # columns: min / mean / max of sigma
        df = skewT_test(sigma_value=sigma_dict[j], nu_value=nu_dict[i],
                        alpha_value=alpha_dict[i], X_rand=X_rand, Y_rand=Y_rand)
        axes[i, j].bar(df['x'].values, df['y'].values)
        axes[i, j].set_title(f'alpha: {np.round(alpha_dict[i], 1)}\n'
                             f'nu: {np.round(nu_dict[i], 1)}\n'
                             f'sigma: {np.round(sigma_dict[j], 1)}')
plt.subplots_adjust(wspace=0.5, hspace=1)

And here is the PyMC model:

# The PyMC model
def logp_skewt(value: TensorVariable, nu: TensorVariable, mu: TensorVariable,
               sigma: TensorVariable, alpha: TensorVariable) -> TensorVariable:
    # Skew-t log-likelihood, as in the custom-distribution thread linked above
    return (pm.math.log(2)
            + pm.logp(pm.StudentT.dist(nu, mu=mu, sigma=sigma), value)
            + pm.logcdf(pm.StudentT.dist(nu, mu=mu, sigma=sigma), alpha * value)
            - pm.math.log(sigma))

with pm.Model() as model1:
    beta = pm.Normal('beta', 2.5, 1)
    mu = beta * X_rand
    sigma = pm.Exponential("sigma", 1 / np.std(Y_rand))
    nu = pm.Gamma("nu", 2, 0.1)
    alpha = pm.HalfCauchy("alpha", 3)
    count = pm.CustomDist("count", nu, mu, sigma, alpha,
                          logp=logp_skewt, observed=Y_rand)
    trace = pm.sample(1000, tune=1000, target_accept=0.95, random_seed=RANDOM_SEED)
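
In case it helps, I also sketched a random function so the CustomDist can do prior and posterior predictive sampling. It assumes the Azzalini skew-t construction (a skew-normal draw divided by the square root of a scaled chi-square); I’m not certain this matches the logp_skewt density above exactly, so treat it as a sketch:

def random_skewt(nu, mu, sigma, alpha, rng=None, size=None):
    # Assumed Azzalini construction: if W ~ SkewNormal(alpha) and
    # V ~ ChiSquare(nu) / nu, then W / sqrt(V) is skew-t distributed
    if rng is None:
        rng = np.random.default_rng()
    delta = alpha / np.sqrt(1 + alpha**2)
    z0 = rng.standard_normal(size)
    z1 = rng.standard_normal(size)
    w = delta * np.abs(z0) + np.sqrt(1 - delta**2) * z1  # skew-normal draw
    v = rng.chisquare(nu, size=size) / nu
    return mu + sigma * w / np.sqrt(v)

Passing random=random_skewt alongside logp in pm.CustomDist should then let pm.sample_prior_predictive and pm.sample_posterior_predictive run on this model.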

So, eager to hear your thoughts and advice on this. Thanks, everyone!

Just an update - I’ve attempted to optimize the priors using find_constrained_prior. Below is an example of what I did:

# Find Gamma(alpha, beta) parameters that put ~90% of the prior mass on 1 < nu < 5
opt_Nu = pm.find_constrained_prior(
    pm.Gamma,
    lower=1,
    upper=5,
    mass=0.9,
    init_guess={"alpha": 2, "beta": 0.1},
    mass_below_lower=0,
)
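
Since find_constrained_prior returns a plain dict of the optimized parameters, it can be splatted straight into the prior (inside a model context):

# opt_Nu is a dict like {"alpha": ..., "beta": ...}
nu = pm.Gamma("nu", **opt_Nu)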

After the optimization, I ran into a dimension incompatibility when evaluating the likelihood with the parameter values supplied by my model:

mu_ = pt.scalar("mu_")
nu_ = pt.scalar("nu_")
sigma_ = pt.scalar("sigma_")
alpha_ = pt.scalar("alpha_")
value_ = pt.scalar("value_")
rv_t = pm.logp(pm.StudentT.dist(nu_, mu=mu_, sigma=sigma_), value_)
rv_T = pm.logcdf(pm.StudentT.dist(nu_ + 1, mu=mu_, sigma=sigma_), alpha_ * value_)
# This raises a shape error: value_ and mu_ are declared as scalars, but
# Y_rand and mu.eval() are vectors
(rv_t + rv_T).eval({nu_: nu.eval(), mu_: mu.eval(), sigma_: sigma.eval(),
                    alpha_: alpha.eval(), value_: Y_rand})

To address this, I broadcast the scalar parameters into arrays and, voilà, problem solved.

med_y = np.median(Y_rand)
auxMat = np.repeat(1, len(Y_rand))
with pm.Model() as model1:
    beta = pm.Normal('beta', 2.1, 0.1)
    mu = beta * X_rand_float
    sigma = med_y + pm.Exponential("sigma", 0.5)
    nu = pm.Gamma("nu", 9.6, 2.7)
    alpha = pm.HalfNormal('alpha', 3)
    count = pm.CustomDist("count", nu.eval() * auxMat, mu, sigma.eval() * auxMat,
                          alpha.eval() * auxMat, logp=logp_skewt, observed=Y_rand)
    trace = pm.sample(1000, tune=1000, target_accept=0.95, random_seed=RANDOM_SEED)

Regarding runtime, a model with approximately 12,000 parameters took around 6 hours with the JAX backend, which I think is quite reasonable.
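
For reference, here’s how I’m invoking the JAX path; this assumes a recent PyMC where pm.sample accepts a nuts_sampler argument:

with model1:
    # Runs NUTS through numpyro/JAX; "nutpie" is another backend worth trying
    trace = pm.sample(1000, tune=1000, target_accept=0.95,
                      nuts_sampler="numpyro", random_seed=RANDOM_SEED)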

However, I’m a bit uncertain whether multiplying by the auxiliary matrix is the best method, and other approaches might cut the runtime further; a symbolic variant I’m comparing against is sketched below. Any suggestions or guidance on this would be really appreciated!
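
This variant keeps nu, sigma, and alpha symbolic and lets PyTensor broadcast the scalars against the vector-valued mu, instead of calling .eval() (which, as far as I understand, freezes each parameter at a single random draw and disconnects it from the likelihood):

with pm.Model() as model2:
    beta = pm.Normal('beta', 2.1, 0.1)
    mu = beta * X_rand_float
    sigma = med_y + pm.Exponential("sigma", 0.5)
    nu = pm.Gamma("nu", 9.6, 2.7)
    alpha = pm.HalfNormal('alpha', 3)
    # No auxMat and no .eval(): the scalar parameters broadcast against the
    # vector mu inside logp_skewt, and everything stays in the sampled graph
    count = pm.CustomDist("count", nu, mu, sigma, alpha,
                          logp=logp_skewt, observed=Y_rand)
    trace = pm.sample(1000, tune=1000, target_accept=0.95, random_seed=RANDOM_SEED)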