Averaging over model output that sometimes returns null values?

Hi everyone,

I am trying to figure out how pymc3 deals with model outputs that are sometimes physical (real-valued) and sometimes not (NaN, Null, None etc.), depending parameters sampled from the distribution. I am interested in two cases:

  1. When the model output is deterministic. In this case I am trying to propagate parameters through a model for which I don’t have corresponding observed data.

  2. When the model output has a likelihood distribution and some observed data.

In the first case, I would like my averaging over the output parameters to not include the unphysical data. In the second case, I want to make sure the observed data is used properly in the regression.

I’m not quite sure how to more concisely ask this question, and consequently I’m not sure how best to use google to search for any documentation/discussions that touches on these topics. I would appreciate any help addressing the technical question, rephrasing the question in a better way, or identifying places I should go to look for more information related to this question. I’ve put more details of my particular case below. Thanks in advance for your help!



I have a model function that takes two parameters: (1) a coefficient and (2) a value at which to evaluate the function. The function is more complicated than, but similar to the following:

f(x, y) = g(x) if x < y, else NaN

where g(x) is some other function of x.

Eventually I will use f(x, y) with observed data as part of the regression, but for now I am trying to test propagation of variation in y through f(x, y) over a fixed grid of x-values.

When samples are drawn from the distribution of y’s, different amounts of the x-values in the fixed grid of x-values end up greater than the sampled ys. If I set f(x, y | x>y) = np.nan, the NaN seems to propagate through the averaging and turn any of the statistics at this x value to NaN. Using f(x, y| x>y) = None gives errors trying to add floats and None when constructing statistics.

I’m sorry, but I don’t understand what you are asking yet. What do you mean by model output? Maybe it would be easier if you said what you are trying to model, and also post some code (the stuff that gives errors about floats for example).


Thanks for taking the time to help me clarify my question.

I am modeling the thermodynamics of miscibility gaps in solid materials. A miscibility gap is a region of phase space where material does not mix and instead separates into two distinct phases. Oil and water not mixing is an example of this phenomenon in liquids. This behavior is temperature-dependent, and there is a critical temperature, T_c, above which the system will mix (and form 1 phase) and below which the system will separate into 2 phases. These phases are characterized by having different chemical compositions.

I have created a jupyter notebook to illustrate what I am trying to do with the simplest model of such a system. This actually runs fairly well. (I don’t think the Monte Carlo is converged, but that isn’t the point of the example.) The first part of the notebook describes the problem statement. The last part of the notebook sets up the model in pymc3 and runs it. The very last plot shows the main problem I’m trying to solve.

The last plot of the notebook compares observed data for the composition of a phase below the critical temperature with the posterior-predicted values (feel free to correct me if I’m using the wrong words here) of the composition. The critical temperature of the underlying system from which the data were generated is 1740 K. Above this temperature, my model (tt_tielines(Ts, alpha) , defined in cell 11) just returns values of 0.5 for the composition, which leads to the vertical line at x = 0.5 at T > 1740 K.

I tried using other return values than 0.5 when the input T is > the critical temperature. I can complete Monte Carlo sampling when I use np.nan, -np.inf, or np.inf, however each of them give strange results in my trace averaging. Using None gives issues with the Monte Carlo, I believe.

My question may be more of a statistics question than a pymc3-specific question. What is the appropriate thing for the function tt_tielines to return when the inputs to the function give physically unreasonable outputs to keep the Monte Carlo results reliable?

Please let me know if you would like further clarification, and thanks again for your help!



I think you should return nans in the tt_tielines when the input in invalid. None can’t work, since you promise theano to return an array of doubles, and None just isn’t one.
Then there is the question of what you want to do if you get any nans in the tielines. In the first example I guess you can just ignore those, you just need to be careful during plotting to handle them in a reasonable way.
In the second example I think you want to reject the proposed parameters if that happens. Since you already use the tielines as mu in pm.Normal this will happen automatically, but you could do it explicitly if you want with

pm.Potential('invalid_alpha', tt.switch(tt.any(tt.isnan(x_lines)), -np.inf, 0.))

If you do this, you will get a useless trace from Metropolis with the default settings, because it starts with alpha=0, which is already an invalid point, so metropolis can’t move away from that. You can fix that by changing the initial guess for alpha to maybe 350 or so. (add testval=350 to the definition of alpha)

However, there is an easier and cleaner way in this case. The fact that you have a measurement with a temperature of T_msg.max() already tells you that alpha > 2 * BOLTZCONST * T_msg.max(). You can avoid all this trouble by telling this to pymc:

alpha = pm.Normal('alpha', lower=2 * BOLTZCONST * T_msg.max(), upper=...)

This should always be the first thing to try: Find a parametrization where all possible parameter values lead to logp > 0. If you’re using NUTS this is the only way to deal with the problem properly btw.


Thanks so much for your reply. Your point about finding parameterizations where all parameter values give log§ > 0 is well taken. For the simple case in the notebook I shared this is trivial. However, in the actual model I am working on there are 4 parameters which together determine the critical temperature and the critical temperature needs to be calculated numerically. I’ll think about if it’s possible for me to re-parameterize the model to allow for better selection of priors.

Thanks for providing info on how to approach the problem of constrains generally. The Potential object looks like it will be useful for approaching this problem generally. I realized that it only makes sense to use the Potential object in my second model, where the x_mg_obs data part of the fitting.

In the second case, when I use both the H_obs data and the x_mg_obs data, the Potential works very well, and my posterior predictive distributions look pretty reasonable.

In the first case, when I only use H_obs as observable data, and treat the tt_tielines as a deterministic function, I’m not sure that using Potential makes sense. I used np.nanmean and np.nanpercentile to exclude the values of x_a that are returned as np.nan by tt_tielines. This more-or-less solves the problem, though at very high temperatures the averages are dominated by just a few samples. This approach to excluding nan’s from my post-sample averages probably makes the averages very sensitive to the MC sampling and I should make sure to exclude the burn-in time from my averaging, etc.

Here is an updated version of the notebook, showing the use of Potential. I’ll give this a try on my actual model and see how it goes.

Thanks again.



I think you understood what I meant anyway, but the logp > 0 should have been logp > -inf of course…

What you’re saying all makes sense to me. Just keep in mind that if you apply this to a problem with more parameters, Metropolis will get impractical very soon. (It scales terribly with the dimensionality and also hides this well sometimes). All of this should translate to nuts pretty easily though, you can compute the derivatives using the implicit function theorem. The rejections because of the Potential will lead to divergences, but I think those particular divergences won’t introduce bias, but only make the sampling somewhat less efficient. They could hide real divergences however. There’s no way to tell those apart with only the public API, but there are ways around that, I think. (Just ask if you run into this again :slight_smile: )

@aseyboldt ,

I would prefer to use NUTS to sample, rather than Metropolis. I was using Metropolis because I don’t think I can write down the gradients of tt_tielines. I’m not familiar with the implicit function theorem. Do you have any examples of how that is used in practice (or even in principle, I suppose) to compute derivatives in pymc3?



I don’t think we have an example of that anywhere yet. I’ve had plans to add something like that to the theano intro for some time now, with a bit of luck I’ll find some time for this tomorrow. :slight_smile:

That would be great! I look forward to reading it when it’s ready.


Took a bit longer…
I just opened a pull request on github: https://github.com/pymc-devs/pymc3/pull/2583
@jeffwdoak If something is hard to understand, please say so :slight_smile:

1 Like


Thanks for sharing the tutorial on implicit differentiation to obtain symbolic gradients. I haven’t been able to work on this for a while, but I finally got a chance to implement a gradient in my toy problem. Your description was very straightforward to follow. I ran into trouble when I forgot that the gradient method needed to define the symbolic gradient through theano variables rather than the numerical gradient through python/numpy variables. But once I realized my mistake, implementing the gradient went fairly smoothly.

Here’s an updated version of my notebook, implementing a TheanoOp with a gradient. For this toy problem, I don’t think it changes things very much. However, I am hopeful that it will improve my sampling in my full 6-dimensional problem.

In order to get the NUTS sampling working on this problem, I had to restrict the range of my prior over alpha to be >= 300, based on the data that I had generated. Allowing alpha to go below 300 causes some of the model predictions for my data to be NaN. If I allowed the range of prior alpha’s to go lower, the sampling raised a PositiveDefinteError. My brief search for info on this suggests that it is a Theano error caused by poor NUTS initialization. Looking forward to my full problem, where I can’t easily specify ranges of my model parameters that guarantee no NaNs will be returned, do you have any recommendations or literature I could look at on best practices for NUTS sampling and initialization?



I don’t think this really is a problem with the nuts initialisation but with the model and maybe some numerical trouble in tieline. One thing that helps me when I run into modeling trouble like this is to write a function without using pymc, that generates datasets in the required form, mostly to keep my thoughts straight and not because there is a great need for such a function. :slight_smile:
The input to the function should be all values that the experimenter controls. If I understand your experiment properly, that would be the grid of values for the atomic fraction and a grid of values for temperatures. I’ll also add a standard error for the temperatures (how precisely you can control the temperature, this helps get rid of the bounds problems with alpha. The standard error can be pretty small if you don’t actually care about errors in the temperature.):

def generate_data(atomic_fractions, temperatures, temp_error):
    # What is up with negative values for alpha?
    # They seem to lead to negative critical temperatures...
    alpha = np.abs(1000 * np.random.randn())

    H_true = H(atomic_fractions, alpha)
    # You might have some better prior information about this error...
    H_error = np.abs(2.5 * stats.t(3).rvs())
    H_observed = H_true + H_error * np.random.randn(len(atomic_fractions))

    T_critical = T_c(alpha)
    # We model the true temperatures explicitly, normally distributed
    # tightly around the expected values, but always larger
    # then the critical temperature. This way bad alpha values
    # get a very bad logp, because the true temperatures are very
    # far from the expected values, but the logp is still > -inf, so our
    # model is still well defined for all values of alpha.
    temperatures_true = np.exp(
        + temp_error * np.random.randn(len(temperatures)))
    temp_ok = temperatures_true < T_critical

    logit_x_true = np.where(
        special.logit([tieline(t, alpha) for t in temperatures_true]),
    x_error = np.abs(0.5 * stats.t(3).rvs())
    # What error model is appropriate?
    x_observed = special.expit(
        + x_error * np.random.randn(len(temperatures)))

    return {
        'alpha': alpha,
        'H_true': H_true,
        'H_error': H_error,
        'T_critical': T_critical,
        'H_observed': H_observed,
        'temperatures_true': temperatures_true,
        'x_observed': x_observed,
        'x_error': x_error

Note, that this returns nan for some of the x_observed values some of the time. This would be something like: “I ran the experiment at a certain temperature, but couldn’t measure x, because the temperature was below the critical temperature.” If this case doesn’t occur in the dataset, the model definition in pymc3 is pretty straight forward:

with pm.Model() as model:
    # What is going on with negative alphas? They lead to
    # negative critical temperatures...
    alpha = pm.HalfNormal('alpha', 1000)
    H_true = H(atomic_fraction, alpha)
    #H_error = pm.HalfStudentT('H_error', 3, 2.5)
    #pm.Normal('H_observed', H_true, H_error, observed=H_observed)
    T_critical = T_c(alpha)
    # the upper=-0.01 instead of upper=0 is for numerical stability
    BoundNormal = pm.Bound(pm.Normal, upper=-0.01)
    # How far are the true temperatures below the critical temperature?
    log_temperatures_diff = BoundNormal(
        mu=np.log(temperatures) - tt.log(T_critical),
    temperatures_true = tt.exp(log_temperatures_diff + tt.log(T_critical))
    pm.Deterministic('temperatures_true', temperatures_true)

    x_mu = TielineOp()(temperatures_true, alpha)
    logit_x_mu = pm.math.logit(x_mu)
    pm.Deterministic('logit_x_mu', logit_x_mu)
    x_error = pm.HalfStudentT('x_error', 3, 0.5)
    pm.Normal('logit_x_observed', logit_x_mu, sd=x_error,

I used a logit error model for x, I thought that would make more sense for values that seem to be naturally constrained to [0, 1]. But I really don’t know if this is right…

There is then some numerical trouble left to deal with. tieline doesn’t work well if x is very small. One way to fix that would be to work on logit(x) instead. A very quick and dirty solution could look a bit like that:

from scipy import optimize, special

def dGd_logit_x(logit_x, T, alpha):
    return alpha*(1.-2.*special.expit(logit_x)) + T*BOLTZCONST*logit_x

def tieline(T, alpha):
    Tc = T_c(alpha)
    if T <= Tc:
        result = optimize.bisect(dGd_logit_x, -1000, -0.1, args=(T, alpha), disp=False)
        return special.expit(result)
        return np.nan

But I think you can improve that a lot my also computing the gradient on a logit scale and using a root finder that also uses the derivative of dGd_logit_x.

1 Like

For the fun of it I gave the solver another shot. This should hopefully work properly:

def func(x, gamma):
    return np.tanh(x) - gamma * x

def logit_tieline(T, alpha):
    gamma = 2 * T * BOLTZCONST / alpha
    if gamma >= 1:
        raise ValueError('Invalid paramters T and alpha.')
    lower = np.log1p(np.sqrt(1 - gamma)) - np.log(gamma) / 2
    upper = 1 / gamma
    z = optimize.brentq(func, lower, upper, args=(gamma,))
    return - 2 * z[0]
1 Like