Logp questions, synthetic dataset to evaluate modeling


I am running into some issues when running a model on pymc v5

We are attempting to generate a model and are first testing it on synthetic data in order to be aware of the bias and convergence. I have some questions:

  1. Is it possible to get, much like with model.initial_values() any other chain element? In the same manner (dictionary)?
  2. If the argument “initval” is passed within the model, then both the tuning steps AND the draws are initialized in those values, right?
  3. Is there any way to have something like discard_tuned_samples=False with ‘’‘pm.sampling_jax.sample_numpyro_nuts’‘’?
  4. If the sampling goes away from the given true values as initialization steps, would that always be an indication of poor modelling, or am I missing something? Since the priors were defined as informative around the true values, I am leaning towards something being amiss.
  5. In this case, the model has a Dirichlet prior. When this happens, the initial_values report a dirichlet_simplex__, and that is what is needed in order to run ```model.compile_logp’‘’. How can I go from the variable’s concentration values to the simplex that is needed? (for example, in any other of the chains’ values, as asked in 1)
  6. I am analyzing the logprobability in different stages.
    On one hand, I have the inference data sample_stats.lp, and I can get the mean value along all sample chains.
    Then, I have the initial value (which is in fact the true value with which the data was generated) and I get the logp using ‘’‘model.compile_logp’‘’
    Finally, I use ‘’‘pm.find_MAP(model)’‘’ in order to have another dictionary and use those values with ‘’‘model.compile_logp’‘’.
    I have two issues in this case: the initial values have lower logp values than the chains’ means. Then, the MAP values have the lowest of them all. Also, the reported logp while running find_MAP differs from the one reported from compile_logp()(MAP_point).
    From this, I then have two questions:

6a) To ensure I am reading them right: the logp is maximised, meaning, desired to be as large and positive as possible?

6b) Also, are all the logp calculations offered the same, except for some constant value or are they obtained in different manners?

Thanks in advance!

pymc==5.0.1 (I also use pymc.sampling_jax)

1 Like

That’s quite a few questions, so I guess I wills start at the beginning.

Can you say more about what exactly you want here? Other parameter values from the chain? That is, samples from the posterior? Those values are in idata.posterior, but I’m not entirely sure what you mean by " in the same manner (dictionary)".

My understanding is that it’s just used to initialize tuning. The sampling is initialized based on the tuning.

Tuning samples are discarded by default. Not sure if they can be retained.

Yes, exactly, posterior samples from the chain, and I can access them and draw them from idata.posterior, but I would rather get them in the dictionary format that model.initial_values() reports, so that I can then compare the sample_stats.lp in that point in particular and the compile_logp()(posterior_sample_dictionary) value.

Yes, I was afraid of that. pm.sample had the discard_tuned_samples=False that not always worked, but with pm.sampling_jax.sample_numpyro_nuts there’s no such thing (or rather, I couldn’t find it).

As I stated earlier, I am having issues getting a model to converge its parameters’ true values, even when giving informed priors and the actual values as init_val, so I am trying to verify how the log probability and the chains behave.

You can try doing this sort of thing to get a list of dictionaries if that’s easier to work with:


It very much depends on what you mean by “goes away from the true values”. It is certainly possible that there is something pathological with your model. On the other hand, sampling isn’t optimization, so you should only expect draws to resemble the true values to some extent.

Thank you for this idea!

I didn’t have it on my radar. Unfortunately, there’s two things that make it impossible in my case. Firstly, the memory usage, and secondly the fact that this does not replicate, that I saw, the RV_simplex__ variable that both initial_value and find_MAP generate that are then needed in to compile the logp.

This happens because find_MAP ignores the jacobian. You should get the same if you do compile_logp(jacobian=False)(map)

Yes, although the default sampler adds an amount of jittering around these values. You can choose another init method that does not.

Hmm we should probably add a utility to do that. For now you would need to do something manual like

import pymc as pm

with pm.Model() as m:
    x = pm.Uniform("x")
    y = pm.Dirichlet("y", [x, x])
transformed_rvs = []
for free_rv in m.free_RVs:
    transform = m.rvs_to_transforms.get(free_rv) 
    if transform is None:
        transformed_rv = transform.forward(free_rv, *free_rv.owner.inputs)
fn = m.compile_fn(inputs=m.free_RVs, outs=transformed_rvs)
res = fn({"x": .75, "y": [0.3, 0.7]})
res_dict = {
    "x_interval__": res[0],
    "y_simplex__": res[1],
# {'x_interval__': array(1.09861229), 'y_simplex__': array([-0.42364893])}

# Going the other way around is easier:
# unobserved_value_vars includes the original variables and deterministics, so you have
# to pick the outputs manually. Or you could look into the source code and adapt to return only what you need.
fn_inv = m.compile_fn(outs=m.unobserved_value_vars)
res = fn_inv(res_dict)
res_dict = {
    "x": res[2],
    "y": res[3]
# {'x': array(0.75), 'y': array([0.3, 0.7])}

Alternatively if you don’t care about transforms after you fit, you could recreate your model with all variables (transform=None), so that things like the simplex__ don’t show up. Then you will be able to use the traces to evaluate the logp from model.compile_logp() out of the box.

1 Like

To give furhter context, I am generating a biased dataset on purpose using the next code:

import scipy as sp
import numpy as np

n = 1000 #number of observations
concentration = 0.1
n_gen = 2

variables = ["var1","var2”, “var3"]

# Matrix that relates generators to observations. Then beta and gamma vectors that relate those to the target

w_true = np.array([[1,0,0], [1,1,1]], dtype= float)
gamma_true = np.array([ 1,0 ] , dtype = float)
beta_true = np.array([0,-0.5,-0.5], dtype = float)

# Generators
gen= sp.stats.dirichlet(alpha = concentration*np.ones(n_gen)).rvs(size=n, random_state = 4)

# Observations
rng = np.random.RandomState(12)
obs = np.dot(gen,w_true)

# Target
target = np.dot(obs, beta_true) + np.dot(gen, gamma_true)

I then pose a simplified model (after the complex one failed) as follows:

num_datapoints, data_dim = obs.shape
coords ={
    "observations": np.arange(num_datapoints),
    "variables": variables,
    "gens": ["z0", "z1"],    

# pose a model with informed priors AND initializsing their values
with pm.Model(coords = coords) as init_informed:
    z = pm.Dirichlet("z_dirichlet", np.array([0.1, 0.1]), 
                     dims = ("observations", "gens"), 
                     initval= gen)
    w = pm.Normal('w', mu = w_true,
                  sigma = 0.1,
                  dims = ("gens", "variables"), initval= w_true)
    mu_likelihood = pm.Deterministic("mu_likelihood", pm.math.dot(z,w))
    x = pm.Normal('x', mu = mu_likelihood,
                         sigma = (0.1, 0.1, 0.5),
                         dims = ("observations", "variables"), 
                         observed = obs)

After sampling with

pm.sampling_jax.sample_numpyro_nuts(draws = 4000,  tune = 4000,
                                    chains = 4, idata_kwargs={'log_likelihood': False,'dense_mass':  True,'max_tree_depth':15},
                                    target_accept = 0.9, random_seed = None)

The posteriors distributions differ from the true values used to generate the observations (true_values vertical orange lines)

Thank you @ricardoV94, I’ll try these ideas and come back to update. I missed this notification while writing the expansion, sorry!

I find it strange that it diverges from the true values, even when given all this information as a reference :thinking: I’ll keep gnawing at it. Thanks again!

Recovering weight of 0 or 1 with a Dirichlet prior is very hard, because they exist at the boundary of the parameter space (the sampler would need to approach -inf or +inf in the latent space).

Do you see a bias if you generate less extreme true values (like 0.05 vs 0.95)?
Otherwise if you are truly interested in extreme values you can increase the samples and you should see the sampler gets closer (but never there because it can’t!)

Edit: You already mucked with the max_tree_depth, which is a way to let the sampler take more and more small steps towards that boundary, but this is just postponing the inevitable failure.

If you had another model where you simulate data with 0 noise, and then try to infer noise with say a HalfNormal prior, you will see similar issues. The sampler can’t ever get to the true extreme value of 0 (which is the boundary for this example)

Oh, right! With those concentration values, I do have mainly 0 and 1 within the gen (z) variables.
I had thought of changing the z’s priors but then wondered how to include the restriction that they should add up to 1 within the model.

The biased posteriors I presented where of the W variable, but they ARE related, so I do have the same issue for gen , only it was easier/clearer to show this graph.

I am interested in a case where there could be either extreme or mid-range values, but the number of observations could be limited.
I will try to run the model with higher draws (is there really any point in increasing the tuning steps further than 4000?)

EDIT: I will actually be interested in a case where I do have noisy data, but that one had even bigger issues so I came to this simplified version in order to try and solve them.

The problem is that your simplified case might have ended up too simple. You need some noise in the model/data to have room to sample.

Thinking probabilistically, you would need inifinite data to ever conclude probabilities of one or zero right? Otherwise there could always be a future observation that invalidates that belief.

1 Like