Hi @jlindbloom, thank you very much for your detailed answer. I have a few questions about what's going on. For the first problem, the issue is that the data are incompatible with the chosen link function, because the range of the response y is much larger than the (0, 1) range the link function maps into. Is that correct? But what if I have a dataset in which the x and y variables (or at least the y variable) are scaled between 0 and 1? In particular, I write:
```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

x = np.linspace(-1, 1, 25)
x = (x - np.min(x)) / (np.max(x) - np.min(x))

with pm.Model() as model:
    # Define priors
    sigma = pm.HalfCauchy('sigma', beta=10)
    intercept = pm.Normal('Intercept', 0, sd=20)
    x_coeff = pm.Normal('x', 0, sd=20)
    # Fix the hyperparameters for data generation (Uniform priors here
    # would just be shadowed by these plain-Python assignments)
    alpha = 1.0
    lamda = 1.0
    mu = tt.pow(1 - 1 / tt.pow(1 + tt.exp(intercept + x_coeff * x), lamda), alpha)
    # Define likelihood
    likelihood = pm.Normal('y', mu=mu, sd=sigma, shape=len(x))
    # Get some samples of y
    ppc = pm.sample_prior_predictive(samples=200)

y = ppc['y'][0, :]
y = (y - np.min(y)) / (np.max(y) - np.min(y))
# Nudge the endpoints strictly inside (0, 1)
y[y == 1] = 1 - 1e-6
y[y == 0] = 0 + 1e-6
```
```python
with pm.Model() as model:
    # Define priors
    sigma = pm.HalfCauchy('sigma', beta=10)
    intercept = pm.Normal('Intercept', 0, sd=20)
    x_coeff = pm.Normal('x', 0, sd=20)
    alpha = pm.Uniform('alpha', lower=0.1, upper=10)
    lamda = pm.Uniform('lamda', lower=0.1, upper=10)
    mu = pm.Deterministic(
        'mu',
        tt.pow(1 - 1 / tt.pow(1 + tt.exp(intercept + x_coeff * x), lamda), alpha),
    )
    # Define likelihood
    likelihood = pm.Normal('y', mu=mu, sd=sigma, observed=y)
    # Inference!
    trace = pm.sample(progressbar=True)
```
In this case there are unfortunately still divergences, but can I now say that the data and the chosen link function are “compatible”? Is the problem with the divergences then linked to the fact that the hyperparameters alpha and lamda can vary? I rewrite the formula of the generalized link function for convenience:
$$\text{gexpit}(x) = \left( 1 - \frac{1}{( 1 + e^x )^{\lambda}} \right)^{\alpha}$$

For $\alpha = 1$ and $\lambda = 1$ we have that:

$$\text{gexpit}(x) = \text{expit}(x) = \frac{e^x}{1 + e^x}$$

and so we recover the logistic function that is often used.
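As a quick numerical sanity check (plain NumPy/SciPy, outside any PyMC3 model; `gexpit` here is just my helper name), the generalized link evaluated at $\alpha = \lambda = 1$ matches `scipy.special.expit`:

```python
import numpy as np
from scipy.special import expit

def gexpit(x, alpha, lamda):
    """Generalized expit: (1 - (1 + e^x)**(-lamda))**alpha."""
    return (1.0 - 1.0 / (1.0 + np.exp(x)) ** lamda) ** alpha

xs = np.linspace(-5.0, 5.0, 11)
# At alpha = lamda = 1 this is exactly e^x / (1 + e^x)
print(np.allclose(gexpit(xs, alpha=1.0, lamda=1.0), expit(xs)))  # True
```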
For the second problem, my idea was in fact to implement two regularized “Ridge” models with the generalized link function I rewrote above: one with a Normal likelihood and one with a Beta likelihood, as presented in the link you posted, which also has a Bayesian implementation in https://core.ac.uk/download/pdf/19485102.pdf. The data are scaled in (0, 1). To get a sort of Ridge regularization, I then used Normal priors for the parameters and “non-informative” Uniform priors for the hyperparameters $\alpha$ and $\lambda$, which must be positive. One of my mistakes was assuming that the sigma parameter of the Beta distribution implemented in PyMC3 was the precision parameter $\phi$ proposed in the article. So in this case, can I directly change the PyMC3 Beta distribution, or do I have to use a custom density? The link between the variance and the mean of y is:
$$\operatorname{Var}(y) = \frac{\mu(1-\mu)}{1+\phi}$$
So I must write in the specification of the model something like:
```python
with pm.Model() as model:
    # Define priors
    intercept = pm.Normal('Intercept', 0, sd=20)
    x_coeff = pm.Normal('x', 0, sd=20)
    alpha = pm.Uniform('alpha', lower=0.1, upper=10)
    lamda = pm.Uniform('lamda', lower=0.1, upper=10)
    mu = tt.pow(1 - 1 / tt.pow(1 + tt.exp(intercept + x_coeff * x), lamda), alpha)
    # Define the precision phi
    phi = pm.Uniform('phi', lower=1e-6, upper=20)
    # Define prior on sigma
    sigmas = pm.Uniform('sigmas', lower=0.0,
                        upper=tt.sqrt(mu * (1 - mu) / (1 + phi)),
                        shape=len(x))
    # Define likelihood
    likelihood = pm.Beta('y', mu=mu, sigma=sigmas, observed=y)
```
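If my reading of the article is right, an alternative that avoids constructing `sigmas` at all would be to map $(\mu, \phi)$ to the standard shape parameters of the Beta distribution: $\mathrm{Beta}(a, b)$ with $a = \mu\phi$ and $b = (1-\mu)\phi$ has exactly mean $\mu$ and variance $\mu(1-\mu)/(1+\phi)$. A quick SciPy check of that identity (the commented `pm.Beta` line at the end is how I imagine plugging it into the model, untested):

```python
import numpy as np
from scipy.stats import beta as beta_dist

# Mean/precision -> shape parameters: a = mu*phi, b = (1 - mu)*phi
mu_, phi_ = 0.3, 5.0
a, b = mu_ * phi_, (1.0 - mu_) * phi_

d = beta_dist(a, b)
print(np.isclose(d.mean(), mu_))                              # True
print(np.isclose(d.var(), mu_ * (1.0 - mu_) / (1.0 + phi_)))  # True

# In the model this would replace the sigmas construction (untested sketch):
# likelihood = pm.Beta('y', alpha=mu * phi, beta=(1 - mu) * phi, observed=y)
```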
Is that correct? Sorry for the dumb questions, but I'm really new to all of this and I'm trying to understand all the advice and suggestions well.