Chains converge to different results

I’m running a polynomial regression between a multimodal temperature variable “temperature” and a normally-distributed variable “gdp”. I am having trouble understanding why the sampling chains in my regression analysis converge to two distinct values for the regression coefficients. Model code below:

with pm.Model() as model:
    
    temp_prior = pm.Normal("temp_prior", 0, 30, shape=(1,3))
    temp_std = pm.HalfNormal("temp_std", 30, shape=3)
    temp_mixture_weights = pm.Dirichlet("temp_mixture_weights", np.array([1]*3))
    temperature = pm.NormalMixture(
        "temperature", 
        temp_mixture_weights, 
        temp_prior, 
        temp_std, 
        observed=temperature_observed
    )
   
    coef1 = pm.Normal('temp_gdp_change_coef',0,10)
    coef2 = pm.Normal('temp_gdp_change_coef2',0,10)
    intercept = pm.Normal('intercept',0,10)
   
    gdp_prior = pm.Deterministic(
        "gdp_prior",
        intercept + 
        (coef1 * temperature) + 
        (coef2 * pt.sqr(temperature))
    )
    gdp_std = pm.HalfNormal('gdp_std', sigma=10)
    gdp = pm.Normal('gdp', mu=gdp_prior, sigma=gdp_std, observed=gdp_observed)

The traceplot shows that, for both the coef1 and coef2 variables, the chains converge on two different distributions. An independent regression analysis shows that for the upper plot, the rightward, more positive distribution is the “correct” answer, whereas for the lower plot, the leftward, more negative distribution is “correct”.

I’m wondering if anybody can help me to

  1. build intuition as to how different chains could converge to different values given the same problem
  2. help me to improve my model so that the chains can hopefully all converge to a single value that matches the result of the independent regression

Much appreciated!

My strong suspicion is that each solution the posterior finds is associated with a mixture component of temperature. You can dig into this by looking at az.plot_pair, coloring the scatter plots by chain.

You will also notice that the regression solution corresponds to the “peakier” of the two components. So I wouldn’t call that solution correct; it’s just finding the posterior mode and discarding all the information about the other regions of high probability.

It’s hard to suggest improvements without knowing more about the data, the underlying scientific model, and the type of question you’re trying to answer. Nevertheless, here’s some spitballing:

  • Why is temperature a mixture at all? If you just want to know the effect of temperature on GDP, why not just run that regression directly?

  • Is a mixture the best way to model the multi-modality in temperature, or is there a single indicator variable you can use to capture it (before/after a date? region?)?

  • If it’s too complex to be captured by simple indicator variables, maybe a GP prior over a continuous variable that can absorb the multi-modality, like time or longitude/latitude?

  • Maybe temperature and GDP should be jointly modeled as a SEM using a multivariate normal?

  • If the best way to model temperature really is with a mixture of components, maybe GDP should be such a mixture as well, and the whole model should be a mixture of two latent models?

Hi @jessegrabowski,

Thank you for this very thorough response. I modeled temperature as a mixture because my observed data is distributed with three distinct peaks. A unimodal Normal doesn’t seem to fit the data as well as the mixture, as seen in the graphic below.

However, doing so certainly brings the result more in line with the regression result (see traceplots below), but does not fit the data as well.

As you’ve alluded to, this temperature data is aggregated across various years and regions, which may be the reason for the multimodality. However, I’m interested in the global effect of temperature on GDP across time, which is why I am using this observed data.

I’m wondering what negative downstream effects using a model fit that looks as “off” as the first figure will have. Even if the sampling result now agrees with the regression, isn’t this approach missing out on fully capturing the distribution of the data?

Why are you interested in explicitly modeling an exogenous input to your regression?

@jessegrabowski I’m interested in recreating the results of the paper Global non-linear effect of temperature on economic production by Burke et al, but by learning distributions rather than point estimates in order to have a better idea of the uncertainty of the estimates as well as be able to combine this analysis as part of a larger, Bayesian-network type model. My data comes from the same source as was used in this paper.
Do you think that this approach is for some reason ill-suited for the type of data I am working with? I am new to statistical modeling and so any feedback or comments on this idea/approach is really appreciated and helpful.