Using a random variable as observed

It appears to be impossible to use a random variable as the observed for another variable. For example, consider this simple model:

import pymc3 as pm

raw_observed = 0.3
with pm.Model() as model:
    alpha = pm.Exponential('alpha', 0.2)
    beta = pm.Exponential('beta', 0.2)
    raw_with_slight_adjustment = pm.Normal(
        'raw_with_slight_adjustment', raw_observed, 0.0001) 
    wpp = pm.Beta(
        'wpp', alpha=alpha, beta=beta, 
        observed=raw_with_slight_adjustment) 

This model raises a TypeError, because PyMC3 refuses to accept a symbolic random variable as observed data.

If the observed is a Deterministic instead of an RV, the model can be defined. For example, this variation of the above does not raise an error:

raw_observed = 0.3
with pm.Model() as model:
    alpha = pm.Exponential('alpha', 0.2)
    beta = pm.Exponential('beta', 0.2)
    slight_adjustment = pm.Normal('slight_adjustment', 0, 0.0001)
    raw_with_slight_adjustment = pm.Deterministic(
        'raw_with_slight_adjustment', 
        raw_observed + slight_adjustment)
    wpp = pm.Beta(
        'wpp', alpha=alpha, beta=beta, 
        observed=raw_with_slight_adjustment) 

But this new model does raise an error when the prior predictive is sampled.

Is using a variable as observed for another variable just a bad idea? Or is this a sensible feature that just has not yet been implemented?

PyMC v3.9.3


It is something I am still trying to understand intuitively, but it seems the consensus is that we should treat our data as being completely deterministic, and any other type of noise should be explicitly modeled outside the likelihood.

Within the PyMC3 API in particular, the Deterministic should definitely not be used for observed; it is just an edge case that is not checked for.

Does this achieve the same goal?

with pm.Model() as model:
    alpha = pm.Exponential('alpha', 0.2)
    beta = pm.Exponential('beta', 0.2)
    latent_raw = pm.Beta('latent_raw', alpha=alpha, beta=beta)
    wpp = pm.Normal('wpp', mu=latent_raw, sigma=0.0001, observed=raw_observed)

Just to echo Ricardo’s point, I can’t offhand think of a conventional model (GLM, time series, etc.) where you’d want to evidence against an RV: you’d typically estimate that variance explicitly. That said, I’ve a nagging doubt that this could be useful as part of a belief network, to evidence at different nodes for message passing…


To @ricardoV94’s point, this simple example could be reparameterized without the RV in the observed. But my actual model is both much more complex, and not a conventional model. I was attempting to reparameterize it around another issue, and hit upon the ill-advised idea of using an RV in the observed, only to discover that it is not possible in pymc3.

I have two ideas on how to trick pymc3 into using a random variable as observed; I am not sure it’s a good idea though. It may also be important to note that if PyMC3 were to allow a random variable as observed, that would break after sampling when trying to convert to ArviZ and compute convergence checks.

Option 1

Use a pm.Potential. I believe something like pm.Potential("wpp", pm.Beta.dist(alpha=alpha, beta=beta).logp(raw_with_slight_adjustment)) will add the right term to the joint probability.
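
A minimal sketch of what that could look like in full, reusing the adjustment setup from the question (PyMC3 API; the Deterministic wrapper is unnecessary here, and this is untested):

import pymc3 as pm

raw_observed = 0.3
with pm.Model() as model:
    alpha = pm.Exponential('alpha', 0.2)
    beta = pm.Exponential('beta', 0.2)
    slight_adjustment = pm.Normal('slight_adjustment', 0, 0.0001)
    raw_with_slight_adjustment = raw_observed + slight_adjustment
    # Potential adds this log-probability term to the model's joint density,
    # playing the role of the Beta "likelihood" without an observed kwarg
    pm.Potential(
        'wpp',
        pm.Beta.dist(alpha=alpha, beta=beta).logp(raw_with_slight_adjustment),
    )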

Option 2

Use the cursed DensityDist. There are at least 5 options to do that with a DensityDist, 3 of which should work in this situation; AFAIK all 5 should work when using proper observed values (and many won’t need the density_dist_obs thingy when doing so). Remember to use idata_kwargs={"density_dist_obs": False} (see the sampling sketch after 2.2).

2.1
DensityDist(
    "wpp", 
    pm.Beta.dist(alpha=alpha, beta=beta).logp, 
    observed=raw_with_slight_adjustment
)

This won’t work because if observed is not a dict, DensityDist has the same check all other distributions do, and doesn’t allow FreeRVs to be passed.

2.2
DensityDist(
    "wpp", 
    pm.Beta.dist(alpha=alpha, beta=beta).logp, 
    observed={"value": raw_with_slight_adjustment}
)

This should now trigger the dict checks and avoid the FreeRV one. I still don’t know if this is a feature or a bug; at the very least it feels wrong to me. One can see in the docs that the logp method of Beta takes a value argument, so this should be public knowledge; what I don’t think is documented anywhere official is the array vs dict input trick. I have lost count of how many Discourse posts and issues I have answered explaining that and saying density_dist_obs=False should be used.
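
For reference, a sketch of where that flag goes when sampling (PyMC3 API, assuming the model above):

with model:
    idata = pm.sample(
        return_inferencedata=True,
        # Don't try to store the DensityDist's dict observed in the InferenceData
        idata_kwargs={"density_dist_obs": False},
    )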

2.3
DensityDist(
    "wpp", 
    lambda a, b, raw: pm.Beta.dist(alpha=a, beta=b).logp(raw), 
    observed={"raw": raw_with_slight_adjustment, "a": alpha, "b": beta}
)
2.4
def logp_fun(raw):
    return pm.Beta.dist(alpha=alpha, beta=beta).logp(raw)
pm.DensityDist("wpp", logp_fun, observed=raw_with_slight_adjustment)

Again, this won’t work because the observed is not a dict.

2.5
def logp_fun(raw):
    return pm.Beta.dist(alpha=alpha, beta=beta).logp(raw)
pm.DensityDist(
    "wpp", logp_fun, observed={"raw": raw_with_slight_adjustment}
)

Disclaimer: I don’t have the slightest idea which of these approaches is the one DensityDist was designed for, whether there are any differences between them when it comes to performance, nor whether one of these (or a 6th alternative) should be the “recommended” approach. I personally prefer the Potential option, but this opinion is only based on style and conceptual coherence in the use of observed.


Good idea with the Potential. I always forget about it! The only issue is that you can’t do (automatic) predictive sampling that way.


@OriolAbril: Like @ricardoV94, I always forget about Potential. And I did not even know about DensityDist. Will try both. Meanwhile marked as solved, betting the odds.


You know, in light of @OriolAbril’s answer I must not have interpreted your question properly and/or was not thinking straight!

You can indeed use a Potential in this situation, and I actually have, quite recently too. Not quite the same situation, but I used a distribution to transform observed data, then evidenced that transformed data against another RV, as part of a copula transform.

So +1 for Potential :slight_smile:


For future travellers, please note this hack sadly no longer works

See: https://github.com/pymc-devs/pymc/blob/92278278d4a8b78f17ed0f101eb29d0d9982eb45/pymc/distributions/distribution.py#L957

@OriolAbril Unfortunately I’d really like to use data transformed by an RV as the observed… The Potential works okay, but then I lose the auto-generated pointwise log-likelihood. Perhaps I could manually create a log-likelihood? I need to think about this differently; all suggestions welcome!

Can you share more details of what transformation exactly? Does this model have a generative form?

This shouldn’t be very hard. Just something like m.compile_fn(m.logp([potential], sum=False), inputs=m.value_vars, on_unused_input="ignore")

Then evaluate it for each entry in the trace (I have been thinking a utility to do this trace mapping would be useful). You can check the source code of compute_log_likelihood for inspiration.
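
A rough sketch of that trace mapping, assuming a PyMC v5 model m whose free RVs are untransformed (transformed variables need the value-space handling that compute_log_likelihood does internally), with pot being the variable returned by pm.Potential and idata the sampled trace:

import numpy as np

# Compile a function that returns the unsummed log-probability of the potential
fn = m.compile_fn(
    m.logp([pot], sum=False),
    inputs=m.value_vars,
    on_unused_input="ignore",
)

# Evaluate it at every (chain, draw) point of the posterior
post = idata.posterior
pointwise = np.stack([
    np.stack([
        fn({rv.name: post[rv.name].values[c, d] for rv in m.free_RVs})[0]
        for d in range(post.sizes["draw"])
    ])
    for c in range(post.sizes["chain"])
])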


It’s a copula model that does something a little like:

  1. Create the covariance:
\begin{align} L &\sim \text{LKJCholesky}(2), \quad R \sim \text{LKJCorr}(2) \\ \sigma &\sim \text{InverseGamma}(\alpha, \beta) \\ \Sigma &= LL^{T} = \operatorname{diag}(\sigma) \, R \, \operatorname{diag}(\sigma) \end{align}
  2. Transform the actually observed marginals to uniforms via their CDFs:
\begin{align} \mathbf{m1u}_{y} &= F_{\mathbf{m1}}(\mathbf{m1}_{y}) \\ \mathbf{m2u}_{y} &= F_{\mathbf{m2}}(\mathbf{m2}_{y}) \end{align}
  3. Transform the now-uniform marginals via the standard Normal inverse CDF:
\begin{align} (\mathbf{m1n}, \mathbf{m2n})_{y} &= \Phi^{-1}([\mathbf{m1u}_{y}, \mathbf{m2u}_{y}]) \end{align}
  4. Evaluate the likelihood at the copula:
\begin{align} copula &\sim \text{MvNormal}(\mathbf{0}, \Sigma), \quad \text{observed} = (\mathbf{m1n}, \mathbf{m2n})_{y} \end{align}
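
Not the actual model, but a rough PyMC v5 sketch of those four steps with illustrative Normal marginals; m1, m2 and every parameter value here are hypothetical placeholders:

import numpy as np
import pymc as pm

m1 = np.random.normal(size=100)  # placeholder observed marginal 1
m2 = np.random.normal(size=100)  # placeholder observed marginal 2

with pm.Model() as model:
    # 1. Covariance via an LKJ Cholesky prior
    chol, corr, sigma = pm.LKJCholeskyCov(
        "chol", n=2, eta=2.0, sd_dist=pm.InverseGamma.dist(alpha=3.0, beta=1.0)
    )

    # Illustrative parameters for Normal marginals
    mu = pm.Normal("mu", 0.0, 1.0, shape=2)
    sd = pm.HalfNormal("sd", 1.0, shape=2)

    # 2. Marginal CDFs -> uniforms (these depend on RVs, hence no observed=)
    m1u = pm.math.exp(pm.logcdf(pm.Normal.dist(mu[0], sd[0]), m1))
    m2u = pm.math.exp(pm.logcdf(pm.Normal.dist(mu[1], sd[1]), m2))

    # 3. Standard Normal inverse CDF (probit) of the uniforms
    mn = pm.math.stack([pm.math.probit(m1u), pm.math.probit(m2u)], axis=-1)

    # 4. Evidence the transformed data against the Gaussian copula via a Potential
    pm.Potential(
        "pot_y_cop",
        pm.logp(pm.MvNormal.dist(mu=np.zeros(2), chol=chol), mn),
    )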

Thanks, I was digging through compute_log_likelihood yesterday but was traveling and zombied. Will take a look later today, and thanks for the example!

Well, that was easier than I expected :smiley: Coming back to it after a couple of days, an hour’s reading, and figuring out what’s going on in the code, I have something that appears to work, with only a very, very minor change:

I’ve taken a verbatim copy of compute_log_likelihood and changed one line

from

if not set(observed_vars).issubset(model.observed_RVs):

to

if not set(observed_vars).issubset(model.observed_RVs + model.potentials):

and then call it like this to operate on the relevant potential in my model ('pot_y_cop'):

mdl.idata.add_groups(
    dict(
        log_likelihood=compute_log_likelihood_for_potential(
            idata=mdl.idata,
            var_names=['pot_y_cop'],
            model=mdl.model,
            extend_inferencedata=False,
        )
    )
)

… Seems to be job done!

Yeah, we considered adding a flag to Potential so users can say whether it’s a likelihood term to be automatically included in log_likelihood computations.


I think that would be helpful, since everything else seems to work as planned

Any chance someone could create a gist with a simple implementation of what @jonsedar has done? I am struggling to replicate without knowing what mdl is, where to import compute_log_likelihood_for_potential, etc.

FWIW - my use case here is the Holston-Laubach-Williams model in economics. In Equation 4, the “observed” \tilde{y} is a function of another latent variable, y^*. My model worked in pymc3 but I’ve so far been unable to port it to pymc5.

Here’s an implementation of the model as a linear statespace, which side-steps the problem of transformed observables by using an observation equation to back observed log GDP out of y^\star and \tilde{y}. I think I might have implemented it wrong, because z_t and \tilde{y}_t end up not identified. But maybe you can spot something obviously wrong?


This thread might help "log_likelihood" not found in InferenceData - #4 by karthur

What I’ve done is (probably too) simple:

After calling

idata = pm.sample()

I manually append the log_likelihood calculated using a modified copy of the usual function as I described above: compute_log_likelihood_for_potential

rvs_names = ['potential1']  # etc

idata.add_groups(
    dict(
        log_likelihood=compute_log_likelihood_for_potential(
            idata=idata,
            model=model,
            var_names=rvs_names,
            extend_inferencedata=False,
        )
    )
)

That’s pretty much it.

For later use in az.compare, you’ll also want to make sure the log_likelihood variables have the same string name as the observed_data variables.
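
For instance, assuming the Potential was named 'potential1' and the observed data variable is called 'y' (both names hypothetical):

# Rename the pointwise log-likelihood to match the observed_data variable
idata.rename_vars({"potential1": "y"}, groups="log_likelihood", inplace=True)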


I was being dense - turns out I just needed to use pm.Potential.

@jonsedar Thank you - I will try this when I need the pointwise log-likelihood.

@jessegrabowski For identifying the model, try setting r^*=z_t. One of the (many, imo) problems with HLW is that because there are no constraints on z_t, the model with r^*=g_t + z_t is observationally equivalent to the model with r^*=z_t. Kiley 2015 does this as well.
