Multi-Nonlinear Robust Regression

Dear all, I would like to kindly ask for your support. I am working on the regression. I add one of the many examples I tried to fit. I am afraid I have something fundamentally wrong in my model.
I am runnig on PyMC v5.3.0.

Perfect fit is performed with the equation:

y_ideal = 14*x1**2/x2**3 + 32

Fig: on the left side y_ideal, with y_observed is depicted, on right side the distribution of y_observed is displayed

practically, all chains out of 8 finds the correct parameters in equation. Please see, the code of the model bellow

Model which fail to fit:

y_ideal = -14*x1**2/x2**3 + 32

Fig: on the left side the y_ideal, with the y_observed is depicted, on right side the distribution of y_observed is displayed. The only difference is the modified sign.

If I add the minus, I am lucky if at least one or two chains are successful. The model works better (but still not satisfactory) if I turn the gumble to true. Using the gumble is not the best solution. I believe the optimizer should be independent on the sign, right ? Can you help me please, where is my logical error in the model ?

#define random see

#define random inputs
samples = 50
x1  = rng1.uniform(0.3, 3,  size=samples)
x2  = rng2.uniform(0.3, 3,  size=samples)
x3  = rng3.uniform(0.3, 3, size=samples)

#define noise
noise = rng3.uniform(0., 1.0, size=samples)
noise  = noise - np.mean(noise)

#define regression function
y_ideal = 14*x1**2/x2**3 + 32
y_observed = y_ideal + 0.2*noise

#plot observed data and observed distribution
plt.figure(figsize = (20,4))
plt.plot(y_observed ,"c--")

with pm.Model() as model_test:
    #prior for uknowns
    gumble = False
    alpha_   = 0
    alpha_amp= mean_alpha_
    beta_    = std_beta_
    amp0     = pm.Cauchy(f"amp0"  , alpha=alpha_amp, beta=beta_)
    beta0   = pm.Cauchy(f"beta0", alpha=alpha_   , beta=beta_)
    beta1   = pm.Cauchy(f"beta1", alpha=alpha_   , beta=beta_)

    if gumble:
        alpha   = pm.Gumbel("alpha" , mu=np.mean(y_observed), beta=np.std(y_observed))
        alpha    = pm.Cauchy("alpha",  alpha=alpha_, beta=beta_)

    epsilon = pm.HalfCauchy("epsilon", beta=0.5)

    mju_  = pm.Exponential("mju_", lam = 1/31)
    mju   = mju_+1
    mju   = pm.Deterministic("mju",  mju)
    #posterior for observed
    mean = amp0*x1**beta0/x2**beta1 + alpha
    #mean = amp0*x1**(beta0) - amp1*x2**beta1  + amp2*x3**(beta2) + alpha
    mean = pm.Deterministic("mean", mean)
    ypred = pm.StudentT("y_pred", mu=mean, sigma=epsilon, nu=mju, observed = y_observed) 

    #inference button
    step = pm.NUTS(target_accept=0.8, max_treedepth=18)
    trace = pm.sample(draws=500, chains=8, cores=12, tune=500, return_inferencedata=True, random_seed=[1,2,3,4,5,6,7,8], \
                                         step=step, discard_tuned_samples=True, jitter_max_retries=1)

#plot results
amp0  = trace["posterior"]["amp0"]. mean(["draw"]).values[chain]
#amp1  = trace["posterior"]["amp1"]. mean(["draw"]).values[chain]
#amp2  = trace["posterior"]["amp2"]. mean(["draw"]).values[chain]
beta0 = trace["posterior"]["beta0"].mean(["draw"]).values[chain]
beta1 = trace["posterior"]["beta1"].mean(["draw"]).values[chain]
#beta2 = trace["posterior"]["beta2"].mean(["draw"]).values[chain]
alpha = trace["posterior"]["alpha"].mean(["draw"]).values[chain]
pred = amp0*x1**beta0/x2**beta1 + alpha 
#pred = amp0*x1**(beta0) - amp1*x2**beta1  + amp2*x3**(beta2) + alpha

Successful fit of the equation with positive parameters:

Fig: dashed blue line represents prediction, red line represents observation

Un-successful fit of the equation with negative parameters:

Fig: dashed blue line represents prediction, red line represents observation

I was able to reproduce the problem locally thanks to your code. I don’t think there is anything logically wrong if your model per se. It’s just a hard inference problem - the functional form you are trying to estimate with multiple nested exponents is tricky and your priors are working against finding the correct parameters.

The cauchy prior on amp0 is pretty tight around 0. The real parameter is fairly far away, at -14 or 14. The tails of the cauchy are pretty wide too so once we get far away from 0, there is a huge range of parameters that have nearly equal prior probability. So if the chain doesn’t locate -14 quickly, it will start looking for weird combinations of parameters that can work. It turns out that if you crank down the nu parameter on the StudentT, there a bunch of weird combinations of parameters that work. I found that when my chains failed to find -14, they would also anchor on nu parameters at nearly 1. This causes the likelihood distribution to become nearly flat - getting the mean close to the data doesn’t improve the likelihood very much when the studentT is superflat.

Why does it show up in the negative case but not the positive case? Well, sometimes it does show up in the positive case. So i think just bad luck or bad random seeds were leading to this odd behaviour.

There are a bunch of ways you might fix this. The priors could give amp0 a better clue of where to look. Or you could switch away from a studentT outcome distribution. Your data isn’t very noisy so the extra flexibility of the studentT hurts you rather than helps you.


Dear Daniel, many thanks for your kind reply with the problem explanation. I will try to modified the model as you suggested. I belive I will come back next week with few additional questions. Once again, many thanks for your support.

1 Like