I would like to make a Bayesian update scheme similar to a Kalman filter. I have a dynamical system with certain update equations with noisy input, as well as noisy observables. At every timestep, I first propagate the state distribution using the system equations, then do posterior sampling to reconcile the new distribution with the current observable values.
At every iteration I obtain a bunch of samples from the distribution of interest, but at the next iteration I have to sample from that same distribution again. From what I understand, my options are either to approximate the sampled distribution by some standard distribution, or to use a KDE. I don’t mind either, as long as it works. For now, I need any solution that works, just to test whether the closed loop behaves as expected, so a quadratic (Gaussian) approximation would be fine.
My main challenge is constraints. My state is given by 4 parameters. All parameters are real numbers that must always lie within [0, 1]. Further, the constraints p1 <= p2 and p3 <= p4 must hold.
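For concreteness, here is a minimal sketch of how I imagine closing one iteration: a vectorized check of the constraints, plus a moment-matched (“quadratic”) Gaussian refit of the posterior samples. The function names are just my placeholders.

import numpy as np

def in_domain(x: np.ndarray) -> np.ndarray:
    """x: samples of shape (nSamples, 4). True where all parameters lie in [0, 1],
    p1 <= p2 and p3 <= p4."""
    box = np.all((x >= 0.0) & (x <= 1.0), axis=1)
    order = (x[:, 0] <= x[:, 1]) & (x[:, 2] <= x[:, 3])
    return box & order

def refit_gaussian(samples: np.ndarray):
    """Quadratic (Gaussian) approximation of the sampled distribution by moment matching."""
    return samples.mean(axis=0), np.cov(samples, rowvar=False)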
Questions:
- Is it possible to sample the posterior given a multivariate Gaussian prior for 4 variables, while respecting the above constraints?
- If you have other suggestions on how to improve this procedure, please tell me.
Edit: I have considered removing the constraints via a domain transform
$$ [0, 1] \rightarrow [-\infty, \infty] $$
for example via a logit function. However, my parameters are allowed to be at the extremes. For example, p1 = 0 is a perfectly normal state. That would make the transformed variable exactly infinite, which can’t be good…
I bet @jessegrabowski can give you some pointers, including looking at the pymc-experimental statespace module
It sounds like you’re doing online updating; you want to run your filter on incoming streaming data, one point at a time. Is that right? If so, the statespace module won’t help.
Do you also want to update the parameter priors whenever new data comes in, or do you only want to update the hidden states?
- Is it possible to sample the posterior given a multivariate Gaussian prior for 4 variables, while respecting the above constraints?
No, because by definition this is not a multivariate normal after the constraints are applied. You will need to either use a different distribution or do some kind of transformation (as you bring up in the edit)
It sounds like you’re doing online updating; you want to run your filter on incoming streaming data, one point at a time. Is that right? If so, the statespace module won’t help.
Yes, that is exactly right. I have an IoT device that reports data about once every minute. I need to make continuous estimates of its state, one point at a time.
Do you also want to update the parameter priors whenever new data comes in, or do you only want to update the hidden states?
I expect that, over the simulation time, nothing except the hidden state can change in any meaningful way, so I’m fine with a constant initial guess for the parameters. The only effect I would like to model is the change in the hidden variables, both due to their intrinsic evolution and due to changes in the observables.
You will need to either use a different distribution or do some kind of transformation (as you bring up in the edit)
Do you have any suggestions on how to proceed?
- It seems that domain reparameterization would misbehave if a state is at the boundary. Is there a way around that?
- Do you have a distribution in mind that would intrinsically model such boundary conditions? Naively, I would consider using 2 binomial distributions with a high enough sample count, for the parameter pairs (p1, p2) and (p3, p4). But then I also need to stitch them together somehow, because the most important part of this model is to have a dense 4x4 covariance matrix between all parameters. Do you maybe have some suggestions?
- Sorry for being annoying. Is it not possible to have a truncated multivariate normal distribution? There is already a univariate distribution like that. Or is it somehow conceptually impossible?
Thanks a lot for all the help
A truncated MvN absolutely exists in principle, but I don’t know if it’s implemented in PyMC. @ricardoV94 ? If it is, you could give it a try. It wouldn’t fix your ordering problem, though. See this discussion for some insight into how joint ordering changes the marginal distributions.
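As a quick toy illustration of that last point (my own two-variable check, not taken from the linked discussion):

import numpy as np

# Impose x <= y on a symmetric bivariate normal and look at the marginal means.
rng = np.random.default_rng(0)
xy = rng.multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2), size=100_000)
kept = xy[xy[:, 0] <= xy[:, 1]]
print(kept[:, 0].mean(), kept[:, 1].mean())  # roughly -0.56 and +0.56: the marginals are no longer zero-mean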
Can you share any details of your transition/observation equations? My first instinct would be to model the hidden states in an unconstrained latent space, then transform that into the observation space. If you’re working with a particle filter, this is essentially what we do when MCMC sampling.
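For example, something along these lines could serve as the latent-to-constrained map (just a sketch of one possible transform; expit is the logistic sigmoid from scipy.special). Note that the sigmoid only reaches 0 and 1 in the limit, so the boundary states you mention would still need special handling.

import numpy as np
from scipy.special import expit  # logistic sigmoid

def latent_to_constrained(z: np.ndarray) -> np.ndarray:
    """Map unconstrained z of shape (..., 4) to (p1, p2, p3, p4) with
    0 <= p1 <= p2 <= 1 and 0 <= p3 <= p4 <= 1."""
    u = expit(z)                      # squash each latent into (0, 1)
    p1 = u[..., 0]
    p2 = p1 + (1.0 - p1) * u[..., 1]  # guarantees p1 <= p2 <= 1
    p3 = u[..., 2]
    p4 = p3 + (1.0 - p3) * u[..., 3]  # guarantees p3 <= p4 <= 1
    return np.stack([p1, p2, p3, p4], axis=-1)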
Here is a simpler example with a 2-parameter model. There is a 1D pipe of length 1 meter, through which water can flow in either direction. The water that flows in from the left is always blue. The water that flows in from the right is always red. At the boundary between red and blue, the water mixes, resulting in purple water. Hidden state p1 denotes the end of the blue water and the start of the purple water. Hidden state p2 denotes the end of the purple water and the start of the red water.
- We know the flow direction and the approximate flow rate, but there is noise with a known distribution.
- We have approximate differential equations that describe how much mixing happens, depending on the flow rate and the past state. Thus, we have an estimate of the hidden states p1 and p2, given their values one timestep ago and the flow rate at this timestep.
- We measure the color of the flow at select locations in the pipe. The locations of the measurement devices are known. The color measurements are noisy, with a known (normal) distribution, and all measurement devices are independent.
As you can see, if I turn on the flow in the positive direction (left-to-right) for a long time, all water in the pipe will be blue. During this time, the red and purple water will have flowed out of the right end of the pipe, resulting in p1 = p2 = 1. Similarly, with a negative flow rate, eventually p1 = p2 = 0. Clearly, we must also have p1 <= p2, because the length of the purple region L = p2 - p1 is non-negative.
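To make the measurement model concrete, the noiseless sensor reading could look roughly like this (the numeric color encoding 0 = blue, 0.5 = purple, 1 = red is just for illustration):

import numpy as np

def expected_color(x_sensor: np.ndarray, p1: float, p2: float) -> np.ndarray:
    """Noiseless color at sensor positions x_sensor in [0, 1].
    Illustrative encoding: 0 = blue (left of p1), 0.5 = purple (between p1 and p2), 1 = red (right of p2)."""
    return np.where(x_sensor < p1, 0.0, np.where(x_sensor <= p2, 0.5, 1.0))

# The actual observations are expected_color(...) plus independent Gaussian noise of known variance.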
Now, it is not particularly hard to get rid of the second constraint by reparameterization: we could simply use p1 and q = L / (1 - p1) as parameters. Then both parameters are independently constrained to [0, 1], although they remain highly correlated. Yes, the second parameter becomes 0/0 when p1 = 1, but we can substitute some smooth function that tends to 0.5 as p1 approaches 1, without any significant loss of accuracy. The real problem is converting the interval [0, 1] to the real line while still allowing the states where the purple water has been completely pumped out of the system in either direction.
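Written out (using q for the second parameter), the reparameterization and its inverse are
$$ q = \frac{L}{1 - p_1} = \frac{p_2 - p_1}{1 - p_1} \in [0, 1], \qquad p_2 = p_1 + (1 - p_1)\, q. $$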
Could you elaborate on how one might move to an unconstrained latent space in my example?
It’s not; we would need to wrap a SciPy Op that computes the CDF for the truncation probability.
@ricardoV94
Is it possible to define the multivariate distribution purely numerically, by rejecting points that do not satisfy constraints? Or would that destroy performance beyond usability?
So here is a SciPy wrapper class that provides a truncated multivariate normal distribution based on rejection sampling. It provides a sampling method and a log-likelihood method. The only problem is that the log-likelihood is accurate only up to a constant, as I have not normalized the original PDF with respect to the constraints.
- Is this sufficient to function as a prior for MCMC posterior sampling in PyMC, or do I need to implement the exact log-likelihood (by explicitly integrating the PDF outside of the domain)?
- If yes, how can I convert it into a distribution that can be used directly in PyMC?
import numpy as np
from scipy.stats import multivariate_normal


class DistrMVGaussianTruncated:
    """Multivariate normal truncated to an arbitrary constraint region, sampled by rejection."""

    def __init__(self, mu: np.ndarray, cov: np.ndarray, f_constr, sampleFactor=10):
        """
        :param mu: Mean of the multivariate truncated distribution
        :param cov: Covariance of the multivariate truncated distribution
        :param f_constr: A function float[nSamples, nDim] -> bool[nSamples]. For each sample,
            checks whether it satisfies the truncation constraints
        :param sampleFactor: Since some points will be rejected, it makes sense to sample more than
            the requested number of points, to avoid sampling too many times. Excess points are
            stored for future requests
        """
        self.mu = mu
        self.cov = cov
        self.f_constr = f_constr
        self.x = np.empty((0, len(mu)))  # Buffer of accepted samples; must be 2D for concatenation
        self.sampleFactor = sampleFactor

    def ndim(self) -> int:
        return len(self.mu)

    def _sample_increment(self, nsample: int) -> None:
        x = np.random.multivariate_normal(self.mu, self.cov, nsample)
        x = x[self.f_constr(x)]  # Rejection step: keep only samples satisfying the constraints
        self.x = np.concatenate([self.x, x])

    def sample(self, nsample: int) -> np.ndarray:
        while len(self.x) < nsample:
            self._sample_increment(self.sampleFactor * nsample)
        x = self.x[:nsample]        # Report the first nsample stored samples
        self.x = self.x[nsample:]   # Keep the rest of the samples for future use
        return x

    def mle(self, data: np.ndarray) -> None:
        """
        Fit the distribution to the data.

        :param data: Data samples [nData, nDim].

        NOTE: This reuses the non-truncated SciPy MLE. Strictly speaking, the truncated
        log-likelihood differs from the non-truncated one by the log of the truncation
        probability, which depends on (mu, cov), so this is only an approximation; it is
        close to exact when little probability mass is truncated.
        """
        assert data.ndim == 2
        assert data.shape[0] >= self.ndim(), f"Must have at least {self.ndim()} samples to be able to do inference"
        assert data.shape[1] == self.ndim()
        assert np.all(self.f_constr(data)), "Data must respect the constraints of the distribution"
        self.mu, self.cov = multivariate_normal.fit(data)  # Requires a recent SciPy release

    # TODO: This quantity is only accurate up to a constant, as the original PDF is not normalized
    # with respect to the truncation. If the exact quantity is required, the normalization can be
    # corrected by integrating the PDF outside the domain.
    def log_prob(self, data: np.ndarray):
        rez = multivariate_normal.logpdf(data, self.mu, self.cov)
        rez[~self.f_constr(data)] = -np.inf  # Zero density outside the constraint region
        return rez
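For reference, here is roughly how I use it; the numbers and the constraint function are just for illustration.

import numpy as np

def f_constr(x: np.ndarray) -> np.ndarray:
    box = np.all((x >= 0.0) & (x <= 1.0), axis=1)
    return box & (x[:, 0] <= x[:, 1]) & (x[:, 2] <= x[:, 3])

mu0 = np.array([0.2, 0.6, 0.3, 0.7])   # made-up initial guess
cov0 = 0.05 * np.eye(4)

distr = DistrMVGaussianTruncated(mu0, cov0, f_constr)
samples = distr.sample(1000)           # draw from the truncated distribution
print(distr.log_prob(samples[:5]))     # unnormalized log-density of a few samples
distr.mle(samples)                     # refit mu/cov to the constrained samples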
OK, I switched to emcee with rejection sampling. Not the fastest solution, but it works.
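Roughly like this, in case it is useful to anyone; the likelihood is a placeholder, the numbers are illustrative, and distr is the truncated-Gaussian prior from the class above.

import numpy as np
import emcee

def log_likelihood(theta: np.ndarray) -> float:
    return 0.0  # placeholder for the actual observation log-likelihood

def log_posterior(theta: np.ndarray) -> float:
    lp = distr.log_prob(theta[None, :])[0]  # prior; -inf outside the constraint region
    if not np.isfinite(lp):
        return -np.inf  # emcee simply rejects proposals with -inf log-probability
    return lp + log_likelihood(theta)

ndim, nwalkers = 4, 32
p0 = distr.sample(nwalkers)  # start all walkers inside the constraint region
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_posterior)
sampler.run_mcmc(p0, 2000)
posterior_samples = sampler.get_chain(discard=500, flat=True)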