Normal Mixture of correlated RVs?

Hi all!

I am trying to build a Bayesian model that combines a few Uniform RVs with topographical data (an m \times n matrix named Z) to derive two Deterministic RVs, which then serve as the \mu and \sigma inputs of a Normal Mixture distribution. The ultimate goal of the model is to determine which parameter values best replicate mineral ages (named thermo\_ages) found in sand grains.

The priors for the model are defined as:

a_{br} \sim \text{Uniform}(0, 60)
h_c \sim \text{Uniform}(-1, 4)
e_0 \sim \text{Uniform}(0.001, 1)
e_1 \sim \text{Uniform}(0.001, 1)
s \sim \text{Uniform}(0.05, 0.5)

To generate an age b(z, \theta) at a point z in the topography, these uniformly distributed priors (grouped in the vector \theta) are combined as follows:

b(z,\theta)=\begin{cases} (z+h_c)/e_0, & b < a_{br} \\ (z+h_{c_1})/e_1, & b \geq a_{br} \end{cases}

where h_{c_1} = h_c + a_{br}*(e_0 - e_1)

Defining \sigma = s \times b, the likelihood of observing an age a at a point z in Z, given the parameters \theta, is:

p_b(a \mid z, \theta, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{1}{2} \left(\frac{a-b(z, \theta)}{\sigma}\right)^2 \right)

The likelihood of observing an age a in a sand grain over the watershed Z is then defined as:

p_d(a) = \int_Z p_b(a|z)p(z) dz

And finally, the likelihood of observing thermo\_ages would be:

p(D) = \prod_{i = 1}^{n} p_d(a_i)
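
Since Z is a discrete grid and erosion is assumed uniform over the catchment (p(z) = 1/basin\_size, as described below), the integral reduces to a finite Normal mixture with equal weights:

p_d(a) = \sum_{z \in Z} \frac{1}{|Z|} \, p_b(a \mid z, \theta, \sigma)

which is why the model uses a NormalMixture distribution.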

The code that generates such a model can be found here: Google Colab

However, when running the model, I am unable to reproduce the results of the published model. I think my problem might lie in one of the following:

  1. There are zeros in the sigma_age variable that translate to NaNs in my NormalMixture dist.
  2. Variables in age are not independently distributed.

In any case, I have been stuck here for a few days with no idea how to solve this…

Best,
Benjamín

BTW, you need to share your Colab (you can set it to “Anyone on the internet with this link can view”).

From my understanding, and from the paper you shared with me offline,
Avdeev et al. (2011).pdf (1.3 MB), the model for \mu is a piecewise (step) linear model (b(z, \theta) in the post). Thus, you need to be careful when computing mean or sum, as the shape doesn't seem to be correct in your model currently.
One way to check is to look at the shape of mean (you can do mean.tag.test_value.shape in v3; this will change in v4), which should be either [len(thermo_age)] or [num_mixture_component, len(thermo_age)] depending on which likelihood you are using. Currently that doesn't seem to be the case, as mean is a scalar in your model.
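
For example, a quick check inside the model context (a sketch; mean here stands for whatever tensor you pass as mu):

with detrital_thermo:
    # v3: theano test values are computed eagerly inside the model context,
    # so shapes can be inspected without compiling anything
    print(mean.tag.test_value.shape)
    # expect (len(thermo_age),) or (num_mixture_component, len(thermo_age))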

I suspect that after you compute b(z, \theta)

    ...
    b = tt.switch((b_0 < slope_break), b_0, b_1)

You should index into b to get the linear prediction of thermo_age (which you can then plug into a pm.Normal), because the thermo_age values are samples from some geolocation/site, which should correspond to some coordinates in the map Z.
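
Something like this (a minimal sketch; site_idx is a hypothetical integer array, not in your current code, mapping each thermo_age measurement to the flattened index of its source cell in Z):

with detrital_thermo:
    # site_idx is hypothetical: one flat index into Z per measured grain age
    mu = tt.flatten(b)[site_idx]
    obs = pm.Normal('obs', mu=mu, sigma=std_dev * mu, observed=thermo_age)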

Side note: your manual computation of the mixture likelihood is not correct, as a mixture combines the likelihoods, not the random variables (it is not the same as weighting the mean and standard deviation).
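
In symbols, the mixture density is a weighted sum of component densities, which is not the same as a Normal with weighted moments:

\sum_k w_k \, \mathcal{N}(a \mid \mu_k, \sigma_k^2) \ \neq\ \mathcal{N}\big(a \mid \sum_k w_k \mu_k, \ (\sum_k w_k \sigma_k)^2\big)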

Hi Junpeng,

Thanks for your quick response. I am a bit lost when you say “step linear model” and “linear prediction”; I tried to implement your suggestion, but it didn’t work (the error is at the end of the post).
Maybe my description of the model is confusing… The idea is:

  1. Parameters \theta = [a_{br}, h_c, e_0, e_1, s] generate certain bedrock ages in a catchment Z with a probability of p_b(a|z, \theta, \sigma) for every z in Z. According to the model, older ages will be at higher elevations and younger ages will be at lower elevations in the catchment Z.
  2. Erosion is uniform across the catchment’s bedrock; the probability that a sediment grain observed at the catchment’s mouth was sourced from a given (bedrock) point z in the catchment Z is p(z) = 1/basin\_size for every point in the catchment.

In other words, each of my observed thermo\_ages (sediment grain ages found at the catchment’s mouth) is modeled by combining the likelihood of generating that age at a given (bedrock) point in the catchment (p_b) with the probability of eroding that point (p(z)), i.e., a Normal mixture p_d(a) = \int_Z p_b(a|z)\,p(z)\,dz for every age a in thermo\_ages; these are then combined as p(D) = \prod_{i=1}^n p_d(a_i).

With that in mind, I (slightly) changed my code and indexed into b(z,\theta) (and, in another attempt, into age) as you suggested:

with detrital_thermo:
    p_D = pm.NormalMixture('p_D', w=tt.flatten(weights), mu=tt.flatten(b),
                           # alternatively: mu=tt.flatten(age)
                           sd=tt.flatten(sigma_age), shape=len(thermo_age),
                           observed=thermo_age)

This raises an error when sampling:

Traceback (most recent call last):

  File "G:\Shared drives\Ben_Greg\Thesis\ZHe_data\Benjamin\Shilong Basin\detrital_MCMC.py", line 93, in <module>
    trace = sample(draws = 500, tune = 100, cores = 1, step = Metropolis(),

  File "C:\Users\bguer\anaconda3\envs\pymc_test\lib\site-packages\pymc3\sampling.py", line 428, in sample
    check_start_vals(model.test_point, model)

  File "C:\Users\bguer\anaconda3\envs\pymc_test\lib\site-packages\pymc3\util.py", line 237, in check_start_vals
    raise SamplingError(

SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'slope_break_interval__': array(0.), 'erosion_0_interval__': array(0.), 'erosion_1_interval__': array(0.), 'closure_depth_interval__': array(0.), 'std_dev_interval__': array(2.22044605e-16)}

Initial evaluation results:
slope_break_interval__     -1.39
erosion_0_interval__       -1.39
erosion_1_interval__       -1.39
closure_depth_interval__   -1.39
std_dev_interval__         -1.39
p_D                         -inf
Name: Log-probability of test_point, dtype: float64

Am I indexing correctly? Should b.shape be (m, n, len(thermo\_ages)) instead? If so, how could I concatenate/reshape that?

Best,
Benjamín

I see, so each measurement in the text file Avdeev_detrital.txt (column Age) corresponds to all of the information in Z (from the file Shillong_DEM.mat)? In that case, yes, a Mixture should be used, and you don’t need to do any indexing, as the mixture components are all the points in Z (so your formulation is largely correct; it just needs some fixing of the weights).

Alright, so modeling it like this should work, but sampling is very slow (also, I strongly recommend NOT using Metropolis):

import numpy as np
import pymc3 as pm
import theano.tensor as tt

# drop NaN cells from the DEM; each remaining cell is one mixture component
Z_exclude_nan = Z[np.isfinite(Z)]

with pm.Model() as detrital_thermo:
    slope_break = pm.Uniform('slope_break', lower=0, upper=60)
    erosion_0 = pm.Uniform('erosion_0', lower=0.01, upper=1)
    erosion_1 = pm.Uniform('erosion_1', lower=0.01, upper=1)
    closure_depth = pm.Uniform('closure_depth', lower=-1, upper=4)
    std_dev = pm.Uniform('std_dev', lower=0.05, upper=0.5)
    # alpha = pm.Uniform('alpha', lower=0, upper=1)  # isotherm topo-deflection coeff
    closure_depth_1 = closure_depth + slope_break * (erosion_0 - erosion_1)

    # Bedrock cooling age
    # changed minus to plus sign in age calculation; error in Avdeev's?
    b_0 = (Z_exclude_nan + closure_depth) / erosion_0
    b_1 = (Z_exclude_nan + closure_depth_1) / erosion_1
    age = tt.switch(b_0 < slope_break, b_0, b_1)
    sigma_age = age * std_dev

    # uniform erosion: equal weight for every DEM cell
    weight = np.ones_like(Z_exclude_nan)
    weight = weight / weight.sum()
    # p_D = pm.NormalMixture('p_D', w=weight, mu=age, sigma=sigma_age,
    #                        observed=thermo_age)
    # per-observation, per-component log-likelihood: shape (n_obs, n_cells)
    comp_loglike = pm.Normal.dist(age, sigma_age).logp(thermo_age[..., None])
    # marginalize over cells (logsumexp across components), then sum over observations
    mixture_likelihood = pm.Potential(
        'mixture_likelihood',
        tt.sum(pm.math.logsumexp(comp_loglike + tt.log(weight[None, ...]), axis=-1)))
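
With the model defined, sampling with the default NUTS sampler (rather than Metropolis) would look like this; the draws/tune values are illustrative:

with detrital_thermo:
    trace = pm.sample(draws=1000, tune=1000)  # NUTS by default for continuous models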

Edit:
Managed to further clean up the model; see Planet_Sakaar_Data_Science/detrital_model.ipynb at master · junpenglao/Planet_Sakaar_Data_Science · GitHub
Some explanation:

  1. Inference was originally very slow, because the mixture likelihood is evaluated over a very large number of components (one per cell of Z). What I did is find the unique values in the tensor Z; instead of uniform weights over every cell, I can use a weighted mixture over the unique values by changing the weight vector in the mixture likelihood (see the sketch after this list).
  2. I used pymc v4, but since v4 does not have a mixture likelihood yet (the refactoring is in progress), I rewrote the mixture likelihood by hand in the hope that it will be a bit faster (it is very similar if you use pymc v3).
  3. Inference is still problematic, as you get a lot of divergences; you could likely improve that with more informative priors. The initial Metropolis sampling is probably giving wrong/biased estimates with the uniform priors, since you have few data points compared to the number of parameters.
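
A minimal sketch of the weighting trick in point 1, assuming the elevations in Z contain many repeated values (names are illustrative, not the exact notebook code):

# collapse repeated elevations into unique mixture components,
# each weighted by how often it occurs in the DEM
vals, counts = np.unique(Z[np.isfinite(Z)], return_counts=True)
weight = counts / counts.sum()
# vals then replaces Z_exclude_nan in the model above,
# leaving far fewer mixture components to evaluate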

Thanks for such a detailed reply!
What you did with the weights in your GitHub repository was an elegant workaround, and the model now seems to be working as it should.