I am wondering what the best way would be to sample a set of parameter values from the posterior with guardrails on the distributions, given an ArviZ InferenceData object.

By guardrails I mean the bounds of the credible interval that we want to sample from.

The parameters may be drawn from both univariate and multivariate distributions, so the following code snippet would not work (the list comprehension in `getsampledval` would throw a `ValueError` for multivariate variables):

```
idata = pm.sample(500, cores=1, return_inferencedata=True)

lowerguardrail = 0.25
upperguardrail = 0.75
guardrails = idata.posterior.quantile((lowerguardrail, upperguardrail),
                                      dim=('chain', 'draw'))

def getsampledval(trace, coeffname, guardrails):
    allsampled = [x for x in getattr(trace.posterior, coeffname).data[0]
                  if getattr(guardrails, coeffname).values[0] <= x <= getattr(guardrails, coeffname).values[1]]
    return allsampled[-1]

sampledparameterval = getsampledval(trace=idata, coeffname="paramname", guardrails=guardrails)
```

You can use an interval transform to keep proposed values inside the guardrails.

```
from pymc.distributions.transforms import Interval

x = pm.Normal("x", 0, 1, transform=Interval(lower, upper))
```

This is not possible for discrete variables yet, but there’s a proposal to allow it for different reasons: Implement transforms for discrete variables by ricardoV94 · Pull Request #6102 · pymc-devs/pymc · GitHub

Do note that such bounds will only be respected in `pm.sample`, not in prior and posterior predictive sampling.

I am sorry for not making this clear in the question.

What I want to do is draw a single sample from the posterior for each variable, where each variable’s distribution has been truncated so that all values lie within a lower and upper bound of the credible interval for that variable.

I.e. 0 <= lowerbound <= 1, 0 <= upperbound <= 1, lowerbound <= upperbound.

I would prefer to do it on an InferenceData object.

E.g. the `quantile` function gives us the credible interval corresponding to these bounds, but not the sampled values within that interval.

If we could get the sampled values within that interval, we could then use the `extract_dataset` function to retrieve the single sample.

To sum it up:

We retrieve the posterior distribution and truncate it by our lower and upper bounds, so that the resulting distribution lies within this credible interval.

We then draw a single sample from this truncated distribution.

Note that the variables may have both univariate and multivariate distributions.

As I understand the proposed `Interval` transform, it could not do this?

If we could filter our InferenceData object by the quantiles provided here:

```
idata.posterior.quantile((lowerguardrail, upperguardrail), dim=('chain', 'draw'))
```

And then draw a single sample from that resulting dataset (maybe via the `extract_dataset` function), I think this would correspond to what I’m looking for. Please correct me if I’m wrong.

However, I can’t find a way to perform that filtering.

My English is not up to par, so my response may have led to even more confusion… please ask me to specify anything that is unclear.

I think you basically have it. Here is the lower tail from one parameter (named `param`):

```
import numpy as np

param = idata.posterior["param"].to_numpy().flatten()

# figure out where to cut the tail
lowerguardrail = 0.25
lower_quant = np.quantile(param, lowerguardrail)

# filter out everything but the lower tail
lowertail = param[param < lower_quant]

# select 1 sample
rng = np.random.default_rng()
lowertail_sample = rng.choice(lowertail, size=1)
```
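The same idea extends to keeping only the draws between both guardrails and picking one of them. A self-contained sketch (using a synthetic array in place of `idata.posterior["param"]`, which is not available here):

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in for idata.posterior["param"].to_numpy().flatten()
param = rng.normal(size=1000)

lowerguardrail, upperguardrail = 0.25, 0.75
# per-variable bounds of the credible interval
lower_q, upper_q = np.quantile(param, (lowerguardrail, upperguardrail))

# keep only the draws inside the credible interval
inside = param[(param >= lower_q) & (param <= upper_q)]

# draw a single sample from the truncated set
sample = rng.choice(inside, size=1)
```

This is the empirical analogue of "truncate the posterior, then sample once": no refitting, just filtering the existing draws.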

Thanks a lot for answering.

One more question, and then I’ll mark this as the solution.

How would we do this in the multivariate case?

Consider e.g. an RV declared as

```
pm.Laplace(name="b_fourier_weekday",
           mu=0,
           b=2,
           shape=6)
```

which would have the following shape in `idata.posterior` (sampled from 2 chains and 100 draws):

(2, 100, 6)

Notice that we now have an extra dimension.

How would one draw samples from a credible interval for that variable?

Notice that flattening with NumPy would mix the dimensions together here.
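One naive option is to treat each of the 6 components marginally and filter along the last axis separately — note this ignores correlations between components, which is what the later discussion about multivariate HDIs is getting at. A sketch with plain NumPy, using a synthetic `(2, 100, 6)` array in place of the real posterior:

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in for idata.posterior["b_fourier_weekday"].to_numpy(), shape (chain, draw, dim)
arr = rng.laplace(loc=0.0, scale=2.0, size=(2, 100, 6))

lowerguardrail, upperguardrail = 0.25, 0.75
flat = arr.reshape(-1, arr.shape[-1])           # (200, 6): pool chains and draws, keep components apart
lo = np.quantile(flat, lowerguardrail, axis=0)  # per-component lower bound, shape (6,)
hi = np.quantile(flat, upperguardrail, axis=0)  # per-component upper bound, shape (6,)

# draw one value per component from within its own marginal interval
sample = np.empty(arr.shape[-1])
for d in range(arr.shape[-1]):
    inside = flat[:, d][(flat[:, d] >= lo[d]) & (flat[:, d] <= hi[d])]
    sample[d] = rng.choice(inside)
```

The key difference from `flatten()` is the `reshape(-1, dim)`, which pools chains and draws but keeps each component’s samples separate.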

My understanding is that HDIs for multivariate posteriors require post-processing beyond the samples themselves (e.g., clustering of samples), but I don’t have much first-hand experience. Not sure what arviz’s `hdi()` method does in such cases. @OriolAbril ?

Just note that the posterior truncated samples are unlikely to correspond to the draws you would obtain from a proper truncated model. I have no idea what your final goal is so it could be perfectly fine.

I can’t see how else we could truncate by percentages of the credible interval, but I may be missing something? Since we don’t know in advance what the posterior will be, I can’t provide bounds on the actual sampled values, only percentages of the credible interval.

Makes a lot of sense… I can see myself coming up with a quick heuristic for the problem by just using a clusterer such as k-means… but that solution doesn’t have any theoretical ground to stand on.

I found a Stack Exchange thread tackling this problem: bayesian - How to get multivariate credible interval estimate(s) / highest density regions (HDR) after MCMC - Cross Validated (stackexchange.com)

There is some valuable information in there.

It seems quite strange, however, that this is not built into the package… maybe I will file a feature request.

To me, HDIs themselves have always seemed like a pretty hacky way of collapsing (marginal) distributions down to an interval. They are meaningful/interpretable (cf. frequentist confidence intervals), but you’re on shaky ground as soon as you start throwing away information, so I wouldn’t be surprised to learn that existing solutions are “approximate” and/or “heuristic”.