Using a random variable as observed

It appears to be impossible to use a random variable as the observed for another variable. For example, consider this simple model:

import pymc3 as pm

raw_observed = 0.3
with pm.Model() as model:
    alpha = pm.Exponential('alpha', 0.2)
    beta = pm.Exponential('beta', 0.2)
    raw_with_slight_adjustment = pm.Normal(
        'raw_with_slight_adjustment', raw_observed, 0.0001) 
    wpp = pm.Beta(
        'wpp', alpha=alpha, beta=beta, 
        observed=raw_with_slight_adjustment) 

This model raises a TypeError, because PyMC3 refuses to accept a symbolic random variable as observed data.

If the observed is a Deterministic instead of an RV, the model can be defined. For example, this variation of the above does not raise an error:

raw_observed = 0.3
with pm.Model() as model:
    alpha = pm.Exponential('alpha', 0.2)
    beta = pm.Exponential('beta', 0.2)
    slight_adjustment = pm.Normal('slight_adjustment', 0, 0.0001)
    raw_with_slight_adjustment = pm.Deterministic(
        'raw_with_slight_adjustment', 
        raw_observed + slight_adjustment)
    wpp = pm.Beta(
        'wpp', alpha=alpha, beta=beta, 
        observed=raw_with_slight_adjustment) 

But this new model does raise an error when the prior predictive is sampled.

Is using a variable as observed for another variable just a bad idea? Or is this a sensible feature that just has not yet been implemented?

PyMC v3.9.3


It is something I am still trying to understand intuitively, but it seems the consensus is that we should treat our data as being completely deterministic, and any other type of noise should be explicitly modeled outside the likelihood.

Within the PyMC3 API in particular, the Deterministic should definitely not be used for observed; it is just an edge case that is not checked for.

Does this achieve the same goal?

with pm.Model() as model:
    alpha = pm.Exponential('alpha', 0.2)
    beta = pm.Exponential('beta', 0.2)
    latent_raw = pm.Beta('latent_raw', alpha=alpha, beta=beta)
    wpp = pm.Normal('wpp', mu=latent_raw, sigma=0.0001, observed=raw_observed)

Just to echo Ricardo’s point, I can’t offhand think of a conventional model (GLM, time series, etc.) where you’d want to evidence against an RV: you’d typically estimate that variance explicitly. That said, I’ve a nagging doubt that this could be useful as part of a belief network, to evidence at different nodes for message passing…


To @ricardoV94’s point, this simple example could be reparameterized without the RV in the observed. But my actual model is both much more complex, and not a conventional model. I was attempting to reparameterize it around another issue, and hit upon the ill-advised idea of using an RV in the observed, only to discover that it is not possible in pymc3.

I have two ideas on how to trick pymc3 into using a random variable as observed; I am not sure it’s a good idea though. It may also be important to note that if PyMC3 were to allow a random variable as observed, that would break after sampling when trying to convert to ArviZ and compute convergence checks.

Option 1

Use a pm.Potential. I believe something like pm.Potential("wpp", pm.Beta.dist(alpha=alpha, beta=beta).logp(raw_with_slight_adjustment)) will add the right term to the joint probability.
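
A minimal sketch of what that could look like in full, reusing the adjustment setup from the question (PyMC3 API; the Deterministic wrapper is unnecessary here, and this is untested):

import pymc3 as pm

raw_observed = 0.3
with pm.Model() as model:
    alpha = pm.Exponential('alpha', 0.2)
    beta = pm.Exponential('beta', 0.2)
    slight_adjustment = pm.Normal('slight_adjustment', 0, 0.0001)
    raw_with_slight_adjustment = raw_observed + slight_adjustment
    # Potential adds this log-probability term to the model's joint density,
    # playing the role of the Beta "likelihood" without an observed kwarg
    pm.Potential(
        'wpp',
        pm.Beta.dist(alpha=alpha, beta=beta).logp(raw_with_slight_adjustment),
    )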

Option 2

Use the cursed DensityDist. There are at least 5 options to do that with a DensityDist, 3 of which should work in this situation; AFAIK all 5 should work when using proper observed values (and many won’t need the density_dist_obs thingy when doing so). Remember to use idata_kwargs={"density_dist_obs": False} (see the sampling sketch after 2.2).

2.1
DensityDist(
    "wpp", 
    pm.Beta.dist(alpha=alpha, beta=beta).logp, 
    observed=raw_with_slight_adjustment
)

This won’t work because if observed is not a dict, DensityDist has the same check all other distributions do, and doesn’t allow FreeRVs to be passed.

2.2
DensityDist(
    "wpp", 
    pm.Beta.dist(alpha=alpha, beta=beta).logp, 
    observed={"value": raw_with_slight_adjustment}
)

This should now trigger the dict checks and avoid the FreeRV one. I still don’t know if this is a feature or a bug; at the very least it feels wrong to me. One can see in the docs that the logp method of Beta takes a value argument, so this should be public knowledge; what I don’t think is documented anywhere official is the array vs dict input trick. I have lost count of how many Discourse posts and issues I have answered explaining that and saying density_dist_obs=False should be used.
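
For reference, a sketch of where that flag goes when sampling (PyMC3 API, assuming the model above):

with model:
    idata = pm.sample(
        return_inferencedata=True,
        # Don't try to store the DensityDist's dict observed in the InferenceData
        idata_kwargs={"density_dist_obs": False},
    )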

2.3
DensityDist(
    "wpp", 
    lambda a, b, raw: pm.Beta.dist(alpha=a, beta=b).logp(raw), 
    observed={"raw": raw_with_slight_adjustment, "a": alpha, "b": beta}
)
2.4
def logp_fun(raw):
    return pm.Beta.dist(alpha=alpha, beta=beta).logp(raw)
pm.DensityDist("wpp", logp_fun, observed=raw_with_slight_adjustment)

Again, this won’t work because the observed is not a dict.

2.5
def logp_fun(raw):
    return pm.Beta.dist(alpha=alpha, beta=beta).logp(raw)
pm.DensityDist(
    "wpp", logp_fun, observed={"raw": raw_with_slight_adjustment}
)

Disclaimer: I don’t have the slightest idea which of these approaches is the one DensityDist was designed for, whether there are any differences between them when it comes to performance, nor whether one of these (or a 6th alternative) should be the “recommended” approach. I personally prefer the Potential option, but this opinion is only based on style and conceptual coherence in the use of observed.


Good idea with the Potential. I always forget about it! The only issue is that you can’t do (automatic) predictive sampling that way.


@OriolAbril: Like @ricardoV94, I always forget about Potential. And I did not even know about DensityDist. Will try both. Meanwhile marked as solved, betting the odds.


You know, in light of @OriolAbril’s answer I must not have interpreted your question properly and/or was not thinking straight!

You can indeed use a Potential in this situation, and I actually have, quite recently too. Not quite the same situation, but I used a distribution to transform observed data, then evidenced that transformed data against another RV, as part of a copula transform.

So +1 for Potential :slight_smile:


For future travellers, please note this hack sadly no longer works

See: https://github.com/pymc-devs/pymc/blob/92278278d4a8b78f17ed0f101eb29d0d9982eb45/pymc/distributions/distribution.py#L957

@OriolAbril Unfortunately I’d really like to use data transformed by an RV as the observed… The Potential works okay, but then I lose the auto-generated pointwise log-likelihood. Perhaps I could manually create a log-likelihood? I need to think about this differently; all suggestions welcome!

Can you share more details of what transformation exactly? Does this model have a generative form?

This shouldn’t be very hard. Just something like m.compile_fn(m.logp([potential], sum=False), inputs=m.value_vars, on_unused_input="ignore")

Then evaluate it for each entry in the trace (I have been thinking a utility to do this trace mapping would be useful). You can check the source code of compute_log_likelihood for inspiration.
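
A rough sketch of that trace mapping, assuming a PyMC v5 model m whose free RVs are untransformed (transformed variables need the value-space handling that compute_log_likelihood does internally), with pot being the variable returned by pm.Potential and idata the sampled trace:

import numpy as np

# Compile a function that returns the unsummed log-probability of the potential
fn = m.compile_fn(
    m.logp([pot], sum=False),
    inputs=m.value_vars,
    on_unused_input="ignore",
)

# Evaluate it at every (chain, draw) point of the posterior
post = idata.posterior
pointwise = np.stack([
    np.stack([
        fn({rv.name: post[rv.name].values[c, d] for rv in m.free_RVs})[0]
        for d in range(post.sizes["draw"])
    ])
    for c in range(post.sizes["chain"])
])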


It’s a copula model that does something a little like:

  1. Create the covariance:
\begin{align} L &\sim \text{LKJCholesky}(2), \quad R \sim \text{LKJCorr}(2) \\ \sigma &\sim \text{InverseGamma}(\alpha, \beta) \\ \Sigma &= LL^{T} = \operatorname{diag}(\sigma) \, R \, \operatorname{diag}(\sigma) \end{align}
  2. Transform the actually observed marginals to uniforms via their CDFs:
\begin{align} \mathbf{m1u}_{y} &= F_{\mathbf{m1}}(\mathbf{m1}_{y}) \\ \mathbf{m2u}_{y} &= F_{\mathbf{m2}}(\mathbf{m2}_{y}) \end{align}
  3. Transform the now-uniform marginals via the standard Normal inverse CDF:
\begin{align} (\mathbf{m1n}, \mathbf{m2n})_{y} &= \Phi^{-1}([\mathbf{m1u}_{y}, \mathbf{m2u}_{y}]) \end{align}
  4. Evaluate the likelihood at the copula:
\begin{align} copula &\sim \text{MvNormal}(\mathbf{0}, \Sigma), \quad \text{observed} = (\mathbf{m1n}, \mathbf{m2n})_{y} \end{align}
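
Not the actual model, but a rough PyMC v5 sketch of those four steps with illustrative Normal marginals; m1, m2 and every parameter value here are hypothetical placeholders:

import numpy as np
import pymc as pm

m1 = np.random.normal(size=100)  # placeholder observed marginal 1
m2 = np.random.normal(size=100)  # placeholder observed marginal 2

with pm.Model() as model:
    # 1. Covariance via an LKJ Cholesky prior
    chol, corr, sigma = pm.LKJCholeskyCov(
        "chol", n=2, eta=2.0, sd_dist=pm.InverseGamma.dist(alpha=3.0, beta=1.0)
    )

    # Illustrative parameters for Normal marginals
    mu = pm.Normal("mu", 0.0, 1.0, shape=2)
    sd = pm.HalfNormal("sd", 1.0, shape=2)

    # 2. Marginal CDFs -> uniforms (these depend on RVs, hence no observed=)
    m1u = pm.math.exp(pm.logcdf(pm.Normal.dist(mu[0], sd[0]), m1))
    m2u = pm.math.exp(pm.logcdf(pm.Normal.dist(mu[1], sd[1]), m2))

    # 3. Standard Normal inverse CDF (probit) of the uniforms
    mn = pm.math.stack([pm.math.probit(m1u), pm.math.probit(m2u)], axis=-1)

    # 4. Evidence the transformed data against the Gaussian copula via a Potential
    pm.Potential(
        "pot_y_cop",
        pm.logp(pm.MvNormal.dist(mu=np.zeros(2), chol=chol), mn),
    )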

Thanks, I was digging through compute_log_likelihood yesterday but was traveling and zombied. Will take a look later today, and thanks for the example!

Well, that was easier than I expected :smiley: Coming back to it after a couple of days, an hour’s reading, and figuring out what’s going on in the code, I have something that appears to work, with only a very, very minor change:

I’ve taken a verbatim copy of compute_log_likelihood and changed one line

from

if not set(observed_vars).issubset(model.observed_RVs):

to

if not set(observed_vars).issubset(model.observed_RVs + model.potentials):

and then call it like this to operate on the relevant potential in my model ('pot_y_cop'):

mdl.idata.add_groups(
    dict(
        log_likelihood=compute_log_likelihood_for_potential(
            idata=mdl.idata,
            var_names=['pot_y_cop'],
            model=mdl.model,
            extend_inferencedata=False,
        )
    )
)

… Seems to be job done!

Yeah, we considered adding a flag to Potential so users can say whether it’s a likelihood term to be automatically included in log_likelihood computations.


I think that would be helpful, since everything else seems to work as planned

Any chance someone could create a gist with a simple implementation of what @jonsedar has done? I am struggling to replicate without knowing what mdl is, where to import compute_log_likelihood_for_potential, etc.

FWIW - my use case here is the Holston-Laubach-Williams model in economics. In Equation 4, the “observed” \tilde{y} is a function of another latent variable, y^*. My model worked in pymc3 but I’ve so far been unable to port it to pymc5.

Here’s an implementation of the model as a linear statespace, which side-steps the problem of transformed observables by using an observation equation to back observed log GDP out of y^\star and \tilde{y}. I think I might have implemented it wrong, because z_t and \tilde{y}_t end up not identified. But maybe you can spot something obviously wrong?


This thread might help "log_likelihood" not found in InferenceData - #4 by karthur

What I’ve done is (probably too) simple:

After calling

idata = pm.sample()

I manually append the log_likelihood calculated using a modified copy of the usual function as I described above: compute_log_likelihood_for_potential

rvs_names = ['potential1']  # etc

idata.add_groups(
    dict(
        log_likelihood=compute_log_likelihood_for_potential(
            idata=idata,
            model=model,
            var_names=rvs_names,
            extend_inferencedata=False,
        )
    )
)

That’s pretty much it.

For later use in az.compare, you’ll also want to make sure the log_likelihood variables have the same string name as the observed_data variables.
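
For instance, assuming the Potential was named 'potential1' and the observed data variable is called 'y' (both names hypothetical):

# Rename the pointwise log-likelihood to match the observed_data variable
idata.rename_vars({"potential1": "y"}, groups="log_likelihood", inplace=True)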


I was being dense - turns out I just needed to use pm.Potential.

@jonsedar Thank you - I will try this when I need the pointwise log-likelihood.

@jessegrabowski For identifying the model, try setting r^*=z_t. One of the (many, imo) problems with HLW is that because there are no constraints on z_t, the model with r^*=g_t + z_t is observationally equivalent to the model with r^*=z_t. Kiley 2015 does this as well.
