How to implement a GLM with a custom link function different from the identity function?

Hi everybody, I’m new to the PyMC3 library and Bayesian modeling, so please be a bit patient with me. I’ve read how to create my first GLM with PyMC3 using this procedure:

from pymc3 import Model, Normal, HalfCauchy, sample

# x and y are assumed to be the observed data arrays
with Model() as model: # model specifications in PyMC3 are wrapped in a with-statement
    # Define priors
    sigma = HalfCauchy('sigma', beta=10, testval=1.)
    intercept = Normal('Intercept', 0, sd=20)
    x_coeff = Normal('x', 0, sd=20)
    
    # Define likelihood
    likelihood = Normal('y', mu=intercept + x_coeff * x, 
                        sd=sigma, observed=y)
    
    # Inference!
    trace = sample(progressbar=False) # draw posterior samples using NUTS sampling

My question is: how can I use a custom link function g()? I would like something like:

    likelihood = Normal('y', mu=g^-1(intercept + x_coeff * x), 
                        sd=sigma, observed=y)

but I’m not sure how to correctly write the function. For example, if I load the expit() function from scipy.special and use it directly in the code like this:

    likelihood = Normal('y', mu=expit(intercept + x_coeff * x), 
                        sd=sigma, observed=y)

I get an error. Can anyone please help me? Thank you :pray:


I’ve tried to write an expit (logistic) function in Theano in this way:

import theano
import theano.tensor as tt

x = tt.vector('x')
out = tt.exp(x)/(1 + tt.exp(x))
expit = theano.function([x], [out])

and write something like:

mu = expit(intercept + x_coeff * x)
# Define likelihood
likelihood = Normal('y', mu=mu,
                    sd=sigma, observed=y)

but it returns a TypeError:

TypeError: Bad input argument to theano function with name "<ipython-input-85-4eaa907c93f2>:3" at index 0 (0-based).
Expected an array-like object, but found a Variable: maybe you are trying to call a function on a (possibly shared) variable instead of a numeric array?

Any advice on how to proceed would be extremely helpful. Thank you :pray:


Hi everybody, I just tried to follow the tutorial on “using a blackbox likelihood function” at https://app.reviewnb.com/pymc-devs/pymc-examples/commit/ec49d7931db9b0be004dd90df48b35edefcd60f3/ . The problem is that when I run the cell with the function:

# use a DensityDist (use a lambda function to "call" the Op)
pm.DensityDist("likelihood", lambda v: logl(v), givens={"v": theta})

I get the following error:


TypeError: __init__() got an unexpected keyword argument 'givens'

My version of PyMC3 is v3.11.2 and I ran the code on an M1 MacBook Pro.

Any suggestions on why this happened? Thanks a lot! @junpenglao my last hope :pensive: :pray:


IMO, your best bet is to just implement the inverse of your link function using Theano tensor operations. You were on the right track earlier, but you don’t need to actually define a tensor and a Theano function. For your case, just writing

    likelihood = Normal('y', mu=1 / (1 + tt.exp(-(intercept + x_coeff * x))),
                        sd=sigma, observed=y)

should work. In your earlier attempt, the compiled theano.function expects numeric arrays as inputs, so calling it on symbolic PyMC3 variables raises exactly the TypeError you saw (“Expected an array-like object, but found a Variable”); you were also overloading the name x with both your data and the tensor you defined outside the model.
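For completeness, here is a minimal end-to-end sketch of this approach with made-up toy data (the data-generating lines and the use of tt.nnet.sigmoid are my own illustration, not anything from your post):

import numpy as np
import pymc3 as pm
import theano.tensor as tt

# toy data, assumed purely for illustration
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = 1 / (1 + np.exp(-(1.0 + 2.0 * x))) + rng.normal(0, 0.05, size=x.size)

with pm.Model() as model:
    sigma = pm.HalfCauchy('sigma', beta=10)
    intercept = pm.Normal('Intercept', 0, sd=20)
    x_coeff = pm.Normal('x_coeff', 0, sd=20)
    # the inverse link is written with symbolic ops, so it stays in the
    # computation graph and NUTS can differentiate through it
    mu = tt.nnet.sigmoid(intercept + x_coeff * x)
    pm.Normal('y', mu=mu, sd=sigma, observed=y)
    trace = pm.sample(progressbar=False)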

The above should solve your problem, but I’ll also address your approach using DensityDist. You got that error since givens is not a keyword argument to DensityDist. I think you should be able to just swap givens with observed to get it to work. See this thread for some examples. However, think twice before you do this! If you’re calling scipy.special.expit() when you defined logl(), PyMC3 won’t be able to use the NUTS sampler since it can’t “see” the gradient of an outside black-box function and will fall back on using Metropolis. However, if you use Theano ops when defining logl() then I think it can get the gradients and you wouldn’t have this issue.
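For reference, a hedged sketch of that observed pattern, following the shape of the black-box tutorial (logl and the Op it wraps are assumed to be defined as in that notebook; I have not run this against your exact code):

import theano.tensor as tt
import pymc3 as pm

with pm.Model():
    m = pm.Uniform("m", lower=-10, upper=10)
    c = pm.Uniform("c", lower=-10, upper=10)
    # logl is your black-box log-likelihood Op, assumed defined as in the tutorial
    theta = tt.as_tensor_variable([m, c])
    # pass the parameter vector through observed rather than givens
    pm.DensityDist("likelihood", lambda v: logl(v), observed={"v": theta})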

In either case, your link function is simple enough that you really should just use the one-liner

likelihood = Normal('y', mu=1 / (1 + tt.exp(-(intercept + x_coeff * x))), sd=sigma, observed=y)

instead of a DensityDist.


Completely agree with @jlindbloom about not using DensityDist. However, for completeness about how to use it I want to add that the linked thread is outdated. Many of the examples won’t work without using density_dist_obs=False as shown in Using a random variable as observed - #5 by OriolAbril.
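In code terms, my reading of that post is something like the following (untested sketch):

# disable storing the DensityDist "observed" values when converting
# the trace to InferenceData (keyword forwarded to arviz.from_pymc3)
trace = pm.sample(return_inferencedata=True,
                  idata_kwargs={"density_dist_obs": False})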


Thank you so much guys, your help was really appreciated! :smiley: I applied the function
1 / (1 + tt.exp(-(intercept + x_coeff * x)))
and it works! As a further step I then tried to write a generalized logistic function as in the following equation:
$$ \operatorname{gexpit}(x) = \left( 1 - \frac{1}{(1 + e^{x})^{\lambda}} \right)^{\alpha} $$

with \alpha and \lambda \ge 0.
So, following @jlindbloom’s answer, I wrote:

from pymc3 import Model, Normal, HalfCauchy, Uniform, sample
import theano.tensor as tt

with Model() as model: # model specifications in PyMC3 are wrapped in a with-statement
    # Define priors
    sigma = HalfCauchy('sigma', beta=10, testval=1.)
    intercept = Normal('Intercept', 0, sd=20)
    x_coeff = Normal('x', 0, sd=20)
    alpha = Uniform('alpha', lower=0.1, upper = 10)
    lamda = Uniform('lamda', lower=0.1, upper = 10)
    
    mu = tt.pow(1 - 1/tt.pow(1 + tt.exp((intercept + x_coeff * x)),lamda), alpha)
        
    
    # Define likelihood
    likelihood = Normal('y', mu=mu, 
                        sd=sigma, observed=y)
    
    # Inference!
    trace = sample(progressbar=True) # draw posterior samples using NUTS sampling

I got the following warnings:

trace = sample(progressbar=False) # draw posterior samples using NUTS sampling
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [lamda, alpha, x, Intercept, sigma]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 287 seconds.
There were 51 divergences after tuning. Increase `target_accept` or reparameterize.
There were 207 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.5721068659553271, but should be close to 0.8. Try to increase the number of tuning steps.
There were 95 divergences after tuning. Increase `target_accept` or reparameterize.
There were 148 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.7184942410901761, but should be close to 0.8. Try to increase the number of tuning steps.
The rhat statistic is larger than 1.2 for some parameters.
The estimated number of effective samples is smaller than 200 for some parameters.
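If I understand the warning correctly, it suggests something like the following (just a sketch, I haven’t explored the tuning yet):

# raise target_accept and give the sampler more tuning steps
trace = sample(target_accept=0.95, tune=2000, progressbar=True)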

Furthermore, if I try to change the likelihood function from a Normal to a Beta(mu, sigma) and write the following code:

from pymc3 import Model, Normal, HalfCauchy, Beta, sample
import pymc3 as pm

with Model() as model: # model specifications in PyMC3 are wrapped in a with-statement
    # Define priors
    sigma = HalfCauchy('sigma', beta=10, testval=1.)
    intercept = Normal('Intercept', 0, sd=20)
    x_coeff = Normal('x', 0, sd=20)
    
    mu = pm.math.invlogit(intercept + x_coeff * x)
        
    
    # Define likelihood
    likelihood = Beta('y', mu=mu, 
                        sigma=sigma, observed=y)
    
    # Inference!
    trace = sample(progressbar=True) # draw posterior samples using NUTS sampling

I get the following error:


SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'sigma_log__': array(0.), 'Intercept': array(0.), 'x': array(0.)}

Initial evaluation results:
sigma_log__   -2.76
Intercept     -3.91
x             -3.91
y              -inf
Name: Log-probability of test_point, dtype: float64

I think the problem is that the Beta distribution is bounded between 0 and 1, but the sampler uses values outside these boundaries. Thanks a lot for your time. I think probabilistic programming and Bayesian statistics are way more elegant and complete than frequentist statistics, and the PyMC3 library is really important for implementing this idea.


Hi, so I made a pretty detailed Google Colab notebook addressing these two issues you ran into. You can find it here. TL;DR: for your first issue, your link function didn’t make sense given your data (or at least the data implied by your priors), and for your second issue, there is a problem with the relationship between the mu and sigma parameters of your Beta likelihood. The sigma parameter can’t be chosen independently of the mu parameter like it can with other likelihoods (e.g., Normal).

Perhaps the issue is that you are treating the sigma of the Beta likelihood as if it’s related to the variance of the observational noise? If this is the case, include the noise as an additional term in the invlogit, i.e.

noise = pm.Normal('noise', mu=0, sigma=sigma)
mu = pm.math.invlogit(intercept + x_coeff * x + noise)

where the sigma of noise is the HalfCauchy prior you had earlier, and isn’t the sigma for the Beta likelihood.

I don’t think this quite fixes the issue involving the inequality that mu and sigma must satisfy; you might need to work with a different parameterization of the Beta likelihood. See this link for some help: https://cran.r-project.org/web/packages/betareg/vignettes/betareg.pdf. Converting between the two would be pretty straightforward, since this parameterization just requires 0 < mu < 1 and phi > 0.
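As a rough sketch of what I mean (the phi prior and the conversion to PyMC3’s standard alpha/beta arguments are my own choices here, not something pulled verbatim from the betareg paper; x and y are assumed to be your scaled data):

with pm.Model() as model:
    intercept = pm.Normal('Intercept', 0, sd=20)
    x_coeff = pm.Normal('x_coeff', 0, sd=20)
    phi = pm.HalfNormal('phi', sigma=10)  # precision parameter, prior assumed

    mu = pm.math.invlogit(intercept + x_coeff * x)

    # the (mu, phi) parameterization maps onto alpha = mu * phi and
    # beta = (1 - mu) * phi, giving Var(y) = mu * (1 - mu) / (1 + phi)
    pm.Beta('y', alpha=mu * phi, beta=(1 - mu) * phi, observed=y)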

Hope this helps!


Hi @jlindbloom, thank you very much for your detailed answer. I have a few questions about what’s going on. For the first problem, the issue is that the data are incompatible with the chosen link function, because the range of the response y is much larger than the (0, 1) range of the inverse link. Is that correct? But what if I have a dataset where the x and y variables (or at least y) are scaled between 0 and 1? In particular, suppose I write:


import numpy as np
import pymc3 as pm
import theano.tensor as tt

x = np.linspace(-1, 1, 25)
x = (x - np.min(x))/(np.max(x) - np.min(x))

with pm.Model() as model:

    # Define priors
    sigma = pm.HalfCauchy('sigma', beta=10)
    intercept = pm.Normal('Intercept', 0, sd=20)
    x_coeff = pm.Normal('x', 0, sd=20)
    alpha = pm.Uniform('alpha', lower=0.1, upper = 10)
    lamda = pm.Uniform('lamda', lower=0.1, upper = 10)
    # fix the hyperparameters at 1 for this check (overrides the priors above)
    alpha = 1.0
    lamda = 1.0
    
    mu = tt.pow(1 - 1/tt.pow(1 + tt.exp((intercept + x_coeff * x)),lamda), alpha)
        
    # Define likelihood
    likelihood = pm.Normal('y', mu=mu, sd=sigma, shape=len(x))

    # Get some samples of y
    ppc = pm.sample_prior_predictive(samples=200)
    y = ppc['y'][0,:]

y = (y - np.min(y))/(np.max(y)-np.min(y))
y[y == 1] = 1 - 1e-6 
y[y == 0] = 0 + 1e-6

with pm.Model() as model:

    # Define priors
    sigma = pm.HalfCauchy('sigma', beta=10)
    intercept = pm.Normal('Intercept', 0, sd=20)
    x_coeff = pm.Normal('x', 0, sd=20)
    alpha = pm.Uniform('alpha', lower=0.1, upper = 10)
    lamda = pm.Uniform('lamda', lower=0.1, upper = 10)

    mu = pm.Deterministic('mu', tt.pow(1 - 1/tt.pow(1 + tt.exp((intercept + x_coeff * x)),lamda), alpha))
        
    # Define likelihood
    likelihood = pm.Normal('y', mu=mu, sd=sigma, observed=y)

    # Inference!
    trace = pm.sample(progressbar=True)

In this case there are unfortunately still divergences, but can I say that the data and the chosen link function are now “compatible”? Is the problem with the divergences then linked to the fact that the hyperparameters alpha and lamda can vary? I rewrite the formula of the generalized link function for convenience:

$$ \operatorname{gexpit}(x) = \left( 1 - \frac{1}{(1 + e^{x})^{\lambda}} \right)^{\alpha} $$

For \alpha = 1 and \lambda = 1 we have:

$$ \operatorname{gexpit}(x) = \operatorname{expit}(x) = \frac{e^{x}}{1 + e^{x}} $$

which is the commonly used logistic function.
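As a quick numerical sanity check of this reduction (a throwaway snippet, the names are mine):

import numpy as np
from scipy.special import expit

def gexpit(x, lamda=1.0, alpha=1.0):
    return (1 - 1 / (1 + np.exp(x))**lamda)**alpha

xs = np.linspace(-5, 5, 11)
assert np.allclose(gexpit(xs), expit(xs))  # alpha = lamda = 1 recovers expit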

For the second problem, my idea was in fact to implement two regularized “ridge” models with the generalized link function I rewrote above: one with a Normal likelihood and one with a Beta likelihood, as presented in the link you posted, which also has a Bayesian implementation in https://core.ac.uk/download/pdf/19485102.pdf. The data are scaled in (0, 1). To get a sort of ridge regularization, I used a Normal prior for the parameters and a “non-informative” Uniform prior for the hyperparameters \alpha and \lambda, which must be positive. One of my mistakes was assuming that the sigma parameter of the Beta distribution implemented in PyMC3 was the \phi precision parameter proposed in the article. So in this case, can I directly change the PyMC3 Beta distribution, or do I have to use a custom density distribution? The link between the variance and the mean of y is:

$$ \operatorname{Var}(y) = \frac{\mu(1-\mu)}{1+\phi} $$

So in the model specification I should write something like:


# Define priors:

intercept = pm.Normal('Intercept', 0, sd=20)
x_coeff = pm.Normal('x', 0, sd=20)
alpha = pm.Uniform('alpha', lower=0.1, upper = 10)
lamda = pm.Uniform('lamda', lower=0.1, upper = 10)
    
mu = tt.pow(1 - 1/tt.pow(1 + tt.exp((intercept + x_coeff * x)),lamda), alpha)

# Define the precision phi:

phi = pm.Uniform('phi', lower = 1e-6, upper = 20)

# Define prior on sigma
sigmas = pm.Uniform('sigmas', lower=0.0, upper=tt.sqrt(mu*(1-mu)/(1 + phi)), shape=len(x))

# Define likelihood
likelihood = pm.Beta('y', mu=mu,
                     sigma=sigmas, observed=y)

Is that correct? Sorry for the dumb questions, but I’m really new to all of this and I’m trying to understand all the advice and suggestions.
