How to compute the likelihood function based on kernel density estimation results?

Hi, I have been working on a simple recursive Bayesian update example. The idea is to keep updating my model whenever new observations come in, and they can come from different distributions. Given some historical data z_1, z_2, \ldots, z_k and a new observation z_{k+1}, I am interested in the following:

p(x | z_1, z_2, \ldots, z_{k+1}) \propto p(z_{k+1} | x) \, p(x | z_1, \ldots, z_k)

where p(z_{k+1} | x) is the likelihood function with mean value at z_{k+1}, and
p(x | z_1, \ldots, z_k) is the prior.
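
For intuition, in the simplest conjugate case (Normal prior on x, Normal likelihood with known variance) this recursion has a closed form. A toy sketch of the kind of update I am after, with obs_var assumed known:

def normal_update(prior_mean, prior_var, z_new, obs_var):
    # One recursive Bayes step: prior x ~ N(prior_mean, prior_var),
    # likelihood z_new ~ N(x, obs_var)  ->  Normal posterior on x.
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + z_new / obs_var)
    return post_mean, post_var

# fold in one new observation (numbers are arbitrary)
mean_1, var_1 = normal_update(20.0, 4.0, 21.3, 4.0)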

So the first step I am trying to figure out is how to write a custom likelihood function based on the results of kernel density estimation (KDE).

Here is my code:

import numpy as np
import pymc3 as pm
from scipy import stats
from pymc3 import Model, Interpolated, sample

# 50 samples from N(20, 2). Let's assume this is our historical data.
s1 = np.random.normal(20, 2, 50)

# This is the custom prior from the tutorial that lets me learn the prior
# from the KDE results.
def from_posterior(param, samples):
    smin, smax = np.min(samples), np.max(samples)
    width = smax - smin
    x = np.linspace(smin, smax, 100)
    y = stats.gaussian_kde(samples)(x)
    # Pad the grid with zero density well outside the sample range so the
    # interpolated prior tails off smoothly.
    x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    return Interpolated(param, x, y)

# Here is my own likelihood function.
samples = s1
smin, smax = np.min(samples), np.max(samples)
width = smax - smin
kde = stats.gaussian_kde(samples)
x = np.linspace(smin, smax, 100)
y = kde(x)
x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
y = np.concatenate([[0], y, [0]])
# m is the (approximate) mean of the KDE density on this grid.
m = np.sum(x * y / np.sum(y))


def likelihood(samples, m, mu, x):
    # Evaluate the KDE shifted so that its mean sits at mu instead of m.
    kde = stats.gaussian_kde(samples)
    l = kde(x - m + mu)
    return l
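
As a quick sanity check (the values here are arbitrary), this evaluates fine as long as mu is a plain float:

print(likelihood(s1, m, 21.0, s1[0]))  # returns a 1-element array, no error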

# pymc3 sampling. Given a new observation, say s1[0] again for simplicity, I want to update my posterior. That is to say:
# p(x | z_1, …, z_{k+1}) ∝ p(z_{k+1} | x) p(x | z_1, …, z_k)
# where p(x | z_1, …, z_k) is the prior and p(z_{k+1} | x) is the likelihood function. The reason why I do x - m + mu in my likelihood function is to make sure my likelihood function's expected value is at the observed value.

basic_model = Model()
with basic_model:
    mu = from_posterior("mu", s1)
    # Y_obs = Normal("Z", mu=mu, observed=s1[0])
    Y_obs = pm.Potential("observed_likelihood", likelihood(s1, m, mu, s1[0]))
    trace = sample(1000)

I always get an error like:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "<stdin>", line 3, in likelihood
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/scipy/stats/kde.py", line 241, in evaluate
    output_dtype = np.common_type(self.covariance, points)
  File "<__array_function__ internals>", line 180, in common_type
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/numpy/lib/type_check.py", line 764, in common_type
    raise TypeError("can't get common type for non-numeric array")
TypeError: can't get common type for non-numeric array

But if I remove the kde(x - m + mu) line, the error goes away.
Can anyone give me some hints about this issue? Any other suggestions for working on similar problems are appreciated as well.


Welcome!

This error likely results from passing tensor arguments to stats.gaussian_kde(), which doesn't know what to do with that data type. That's why the example uses pm.Interpolated to reconstruct the parameters and then uses a full-fledged PyMC distribution (e.g., pm.Normal) for the observed variable. Is there something in that example that isn't what you need?
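
For reference, the pattern from that example looks roughly like this (old_trace and new_data are illustrative names, and the observation noise is assumed known):

with pm.Model() as model:
    # prior rebuilt from the previous posterior samples via KDE + Interpolated
    mu = from_posterior("mu", old_trace["mu"])
    # a proper pymc3 distribution handles the observed data symbolically
    Y_obs = pm.Normal("Z", mu=mu, sigma=2.0, observed=new_data)
    trace = pm.sample(1000)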


from_posterior indeed works for any prior. However, I am not sure how to use it for the likelihood computation. We need an observed part and an unobserved part (the latter can be handled with pm.Potential), but the return value is a plain number, which is not the same kind of object as the one returned by Interpolated.

For example, this thread:
https://discourse.pymc.io/t/how-to-set-up-a-custom-likelihood-function-for-two-variables/906
gives a good answer on how to use an arbitrary black-box likelihood function, but it does not cover the case of computing the likelihood from an empirical density estimated by KDE.

I think I either missed how to work further with the interpolated function, or something else is going on. I have been searching around for hours without luck… After trying for several more hours, I think your sense of where the error is is correct. The mu picked inside the model instance contains two parts, a value and a name, which KDE does not know how to handle (it can only deal with numbers, not something that works like a dict with two attributes). The return value from KDE would need to contain two parts as well; it cannot simply be a raw number, since the model instance requires it to have those two attributes.

Could you give me some suggestions for modifying my likelihood function so that it works with KDE and Interpolated()?
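
Edit: one idea I am experimenting with is to rewrite the Gaussian KDE symbolically as an equal-weight mixture of Normals, so the likelihood accepts a tensor mu and returns a log-probability (which, as far as I understand, is what pm.Potential expects). A rough sketch, assuming scipy's fitted bandwidth:

kde = stats.gaussian_kde(s1)
h = np.sqrt(kde.covariance[0, 0])  # kernel std chosen by scipy

def kde_loglike(z, samples, m, mu, h):
    # log kde(z - m + mu), written with symbolic math instead of scipy
    u = (z - m + mu - samples) / h
    comps = -0.5 * u ** 2 - np.log(h * np.sqrt(2 * np.pi))
    return pm.math.logsumexp(comps) - np.log(len(samples))

with pm.Model() as model2:
    mu = from_posterior("mu", s1)
    pm.Potential("obs_loglike", kde_loglike(s1[0], s1, m, mu, h))
    trace = pm.sample(1000)

Does this look like a reasonable direction?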