Problem with log link in fractional polynomial model

The model is to estimate the Weibull and Gompertz survival functions. There is a variable P1 that you set to 0 for Weibull and 1 for Gompertz.

I think the model that I coded is set up correctly. When P=0 it works, and matches a published result. however, when I set P1=1, it crashed.

The problem statement is probably:


(h=pm.math.exp(Alpha[s,a,0]+Alpha[s,a,1]*timen1/12) does not crash the model)

I think when the sample starts the coefficients Alpha[s,a,1] are too big, and it blows up the exp function. When I scale timen1 the model works.

I could divide timen1 by its max value (72), or by 12 to change times from months to years. Both work in my case.

However, rather than modify the data, I would rather have smaller coefficients that don’t crash the model. I’m looking for a way to get the model to work without re-scaling the data.

Model with data.

First order Fractional Polynomial.txt (9.2 KB)

Code without import data, plus error message:


MEAN=np.array([0.0, 0.0]) 
PREC=np.array([[1.00000E-04, 0],[0.0, 1.00000E-04]])





coords = {"treatments":TREATMENTS,"study":STUDY}


with pm.Model(coords=coords) as femodel:


    # Arm 1

     # Arm 2



Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
SamplingError                             Traceback (most recent call last)
Cell In[140], line 30
     27 p=1-pm.math.exp(-h*dt)
     29 y=pm.Binomial("y",n=z,p=p,observed=r)
---> 30 fedata=pm.sample(10000,tune=5000)

File ~\anaconda3\Lib\site-packages\pymc\sampling\, in sample(draws, tune, chains, cores, random_seed, progressbar, step, nuts_sampler, initvals, init, jitter_max_retries, n_init, trace, discard_tuned_samples, compute_convergence_checks, keep_warning_stat, return_inferencedata, idata_kwargs, nuts_sampler_kwargs, callback, mp_ctx, model, **kwargs)
    720 ip: dict[str, np.ndarray]
    721 for ip in initial_points:
--> 722     model.check_start_vals(ip)
    723     _check_start_shape(model, ip)
    725 # Create trace backends for each chain

File ~\anaconda3\Lib\site-packages\pymc\model\, in Model.check_start_vals(self, start)
   1678 initial_eval = self.point_logps(point=elem)
   1680 if not all(np.isfinite(v) for v in initial_eval.values()):
-> 1681     raise SamplingError(
   1682         "Initial evaluation of model at starting point failed!\n"
   1683         f"Starting values:\n{elem}\n\n"
   1684         f"Logp initial evaluation results:\n{initial_eval}\n"
   1685         "You can call `model.debug()` for more details."
   1686     )

SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'d_': array([[-6.00971201e-01,  2.92427765e-01],
       [-7.05074601e-01, -7.98760624e-01],
       [-7.13572854e-05,  7.06728906e-01]]), 'mu': array([[-1.19734297e-01, -5.04778866e-01],
       [-7.94497852e-01, -8.42447101e-01],
       [ 6.72822358e-01, -6.63009855e-01],
       [ 6.14768849e-02,  5.03936542e-01],
       [-5.47026089e-01, -5.17259998e-02],
       [-1.88918133e-02,  6.32461311e-01],
       [-3.68636218e-02,  8.74631714e-01],
       [ 7.26899759e-01,  4.72501935e-04]])}

Logp initial evaluation results:
{'d_': -33.14, 'mu': -88.39, 'y': -inf}
You can call `model.debug()` for more details.

What does model.debug() output?

I think the problem is in your p going negative or to zero/one and being incompatible with some of the observations

Can I give the model starting values? I’m getting this message: “SamplingError: Initial evaluation of model at starting point failed!”. I think that if the model started sampling at a better place it might not happen. Like with the other model that I posted here, I’m starting with Winbugs code and trying to make it work with PyMC. The windbugs code does not crash, so I assume that this particular condition gets caught.

Did you put any special code in the pymc.invlogit function to make it more stable.

The winbugs code that I am trying to replicate is:


I think that when you see a function on the left hand side it means that the items on the right hand side are going through the inverse function, in this case exp.

Again, the model works when I divide my time variable by a scalar value (in my case both 12, and 72 worked, and some smaller numbers including 1 did not work.

pm.sample() takes an initvals argument. But as the documentation states, the initial values provided can be overwritten/ignored by various NUTS initialization routines (e.g., the default jitter+adapt_diag routine). You can try other initialization routines (e.g., adapt_diag), but if your model is that brittle, you should probably consider reparameterizing to avoid such problems.

Thanks for the reply. I don’t know how to reparametrize the model without doing something different. Looking at the Bambi code there is a log link function and I see the functions force_within_unit_interval, force_greater_than_zero, and def expit. So it looks like they are worrying about stability and trying to put some bounds on possible return values.

When I use the inverse logit function the model does not cash. But It is the wrong function. So what I need is the right function with some safeguards to keep it in bounds.

The results look healthy when I rescale by dividing though by the maximum time point.

I still think the model should work without crashing given two constraints:

  1. Assume the model is correct, so no re-parametrization
  2. Expect coefficients to adjust rather than conditioning the data.

It looks to me like the model fails leaving the gate and never gets the chance to start sampling in the vicinity of the solution. Dividing by 72 works, but it feels like a hack.

Hmm I would agree that p is the most likely culprit. However funny enough when I run your model (the code you provided without any modifications), it runs without problems. I have checked the time covariates they don’t seem to be scaled in the code. The trace plot I get (with smaller tune=500 and draws=1000) looks fine though not exactly similar to yours:


If the coded you posted above is the problematic one, which version of pymc are you running I ran it on 5.10.4. Have you tried inputting different seeds? You can do it via

rng = np.random.default_rng(seed)
pm.sample(..., random_seed=rng)

I also ran this on 5.13 but it was really slow (estimate 2 hours when I killed it after about 15 mins) as opposed to like 3-5 minutes in 5.10.4, I will investigate that more later. @ricardoV94 Does that have to do with the warning:

FutureWarning: ConstantData is deprecated. All Data variables are now mutable. Use Data instead.

Thanks for looking into it. I’m starting with Winbugs code. for the code in the text file I have p1 set to zero. This worked for me. Now I want to set it to p1 to 1.0. These two setting correspond to estimating a Weibull survival curve (P1=0.0) vs. a Gompertz survival curve (p1=1.0).

When I set p1 to 1.0, it caused the error, and when I divided time by the maximum value (72) it worked and the estimates looked well behaved.

I’m using pymc version 5.11.0

The winbugs code below has both the main items that I’m using plus they estimated other items along with the main model.

Multiple Myeloma example, survival outcomes


This code is part of Dias, Ades, Welton, Jansen and Sutton (2018) Network Meta-Analysis for Decision Making. 
This work should be cited whenever the code is used whether in its standard form or adapted.

#Fixed effects 1st order fractional polynomial model (e.g. Weibull (P1=0) and Gompertz (P1=1))
model{                         	  # *** PROGRAM STARTS
for (j in 1:N){   							   # LOOP THROUGH EVENTS

# time in months transformed according to power P1 
  timen1[j]<-(equals(P1,0)*log(timen[j])+(1-equals(P1,0))*pow(timen[j],P1) ) 

  r[j]~dbin(p[j], z[j])      		  # likelihood according to eq. 
  p[j]<-1-exp(-h[j]*dt[j])   		  # hazard rate in each interval standardized by unit of time

#Fixed effects model
# hazard over time according to FP

for (i in 1:ns){                  # LOOP THROUGH STUDIES
  for (k in 1:na[i]){             # LOOP THROUGH ARMS
    Alpha[i,k,1]<-mu[i,1]+d[t[i,k],1]-d[t[i,1],1] # model for linear predictor of alpha_0 
    Alpha[i,k,2]<-mu[i,2]+d[t[i,k],2]-d[t[i,1],2] # model for linear predictor of alpha_1
for (i in 1:ns){        		      # LOOP THROUGH STUDIES
  mu[i,1:2] ~ dmnorm(mean[1:2],prec[,])   # vague priors for all trial baselines 
d[1,1]<-0        				         # alpha_0 treatment effect is zero for reference treatment
d[1,2]<-0       					        # alpha_1 treatment effect is zero for reference treatment

for (k in 2:nt){			            # LOOP THROUGH TREATMENTS
  d[k,1:2] ~ dmnorm(mean[1:2],prec[,])  # vague priors for treatment effects 

for (m in 1:maxt){     			     # create time points for output 
  time1[m]<-(equals(P1,0)*log(m) + (1-equals(P1,0))*pow(m,P1)   ) 

#Hazard ratios over time for all possible contrasts
for (c in 1:(nt-1)){
  for (k in (c+1):nt){
    for (m in 1:maxt){

# Provide estimates of survival probabilities over time by treatment 
for (k in 1:nt){
  alpha0[k]<-mu[8,1]+d[k,1]       # alpha_0 by treatment using baseline from study 8 
  alpha1[k]<-mu[8,2]+d[k,2]       # alpha_1 by treatment using baseline from study 8 
  for (m in 1:maxt){
    log(HAZARD[k,m])<-alpha0[k]+alpha1[k]*time1[m]  #hazard over time by treatment
	  CUM_H[k,m]<-sum(HAZARD[k,1:m]) # cumulative hazard over time by treatment
    T[k,m]<-1-exp(-CUM_H[k,m])	  # mortality over time by treatment
	  S[k,m]<-1-T[k,m]              # survival over time by treatment
}                                 # *** PROGRAM ENDS

Ok, I was able to replicate the blowup with P=1. I think the problem is indeed the p parameter and initialization. I think the exact initial values do not violate the constraint but the init method is jitter+adap_diag where jitter adds some jitter to values which seems to create the problem.

    fedata=pm.sample(1000,tune=500, init="adapt_diag")

the model samples with following trace and posterior plots:


I suspect if you were able to choose your initial values so that p values were closer to 0.5 (or possibly h is somehwhat positive) then the model would also sample with jitter+adapt_diag.

I am suprised this runs though. Going more into detail, h in the exponential is determined by:


where as p is determined by


So you are exping h twice. I am not surprised weird stuff happens since np.exp(np.exp(6)) is bigger than the number of atoms in the known universe and the inputs to the first exp above are not particularly small. With just an order of increase in magnitudes for h things might blowup spectacularly and that is what happens when you change P=0 to P=1, some timen1 covariates roughly increase hundred fold. I am not experienced with debugs but my understanding of the code you posted suggest that is indeed also what is being done there. Don’t really know the model enough to say if that makes sense or not. Though even without the double exponential, some arguments to the remaining exponential are still as large as couple hundreds (in absolute value), so removing double exponential will not solve the jitter problem for you.

The basic models were done 8-10 years ago. As someone new to this, I don’t want to say that the models that appear in textbooks and papers are not that great. They may not be, and some of the ideas maybe dated. For instance, the use of vague priors (var=10,000) might be a bit much.

As an out, I can divide the time variable though by a constant. I tried using 72, which was the largest time, and 12, which converts from months to years. Both worked.