How to compute the likelihood function based on kernel density estimation results?

Hi, I have been working on a simple recursive Bayesian update example. The idea is to keep updating my model whenever new observations come in, and they can come from different distributions. Given some historical data z_1, z_2, \ldots, z_k and a new observation z_{k+1}, I am interested in the following:

p(x | z_1, z_2, \ldots, z_{k+1}) \propto p(z_{k+1} | x) \, p(x | z_1, \ldots, z_k)

where p(z_{k+1} | x) is the likelihood function with mean value at z_{k+1}, and
p(x | z_1, \ldots, z_k) is the prior.
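
For intuition, in the simplest conjugate case (Normal prior on x, Normal likelihood with known variance) this recursion has a closed form. A toy sketch of the kind of update I am after, with obs_var assumed known:

def normal_update(prior_mean, prior_var, z_new, obs_var):
    # One recursive Bayes step: prior x ~ N(prior_mean, prior_var),
    # likelihood z_new ~ N(x, obs_var)  ->  Normal posterior on x.
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + z_new / obs_var)
    return post_mean, post_var

# fold in one new observation (numbers are arbitrary)
mean_1, var_1 = normal_update(20.0, 4.0, 21.3, 4.0)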

So the first step I am trying to figure out is how to write a custom likelihood function based on the results of kernel density estimation (KDE).

Here is my code:

import numpy as np
import pymc3 as pm
from scipy import stats
from pymc3 import Model, Interpolated, sample

# 50 samples from N(20, 2). Let's assume this is our historical data.
s1 = np.random.normal(20, 2, 50)

# This is the custom prior from the tutorial that lets me learn the prior
# from the KDE results.
def from_posterior(param, samples):
    smin, smax = np.min(samples), np.max(samples)
    width = smax - smin
    x = np.linspace(smin, smax, 100)
    y = stats.gaussian_kde(samples)(x)
    # Pad the grid with zero density well outside the sample range so the
    # interpolated prior tails off smoothly.
    x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    return Interpolated(param, x, y)

# Here is my own likelihood function.
samples = s1
smin, smax = np.min(samples), np.max(samples)
width = smax - smin
kde = stats.gaussian_kde(samples)
x = np.linspace(smin, smax, 100)
y = kde(x)
x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
y = np.concatenate([[0], y, [0]])
# m is the (approximate) mean of the KDE density on this grid.
m = np.sum(x * y / np.sum(y))


def likelihood(samples, m, mu, x):
    # Evaluate the KDE shifted so that its mean sits at mu instead of m.
    kde = stats.gaussian_kde(samples)
    l = kde(x - m + mu)
    return l
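
As a quick sanity check (the values here are arbitrary), this evaluates fine as long as mu is a plain float:

print(likelihood(s1, m, 21.0, s1[0]))  # returns a 1-element array, no error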

# pymc3 sampling. Given a new observation, say s1[0] again for simplicity, I want to update my posterior. That is to say:
# p(x | z_1, …, z_{k+1}) ∝ p(z_{k+1} | x) p(x | z_1, …, z_k)
# where p(x | z_1, …, z_k) is the prior and p(z_{k+1} | x) is the likelihood function. The reason why I do x - m + mu in my likelihood function is to make sure my likelihood function's expected value is at the observed value.

basic_model = Model()
with basic_model:
    mu = from_posterior("mu", s1)
    # Y_obs = Normal("Z", mu=mu, observed=s1[0])
    Y_obs = pm.Potential("observed_likelihood", likelihood(s1, m, mu, s1[0]))
    trace = sample(1000)

I always get an error like:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "<stdin>", line 3, in likelihood
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/scipy/stats/kde.py", line 241, in evaluate
    output_dtype = np.common_type(self.covariance, points)
  File "<__array_function__ internals>", line 180, in common_type
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/numpy/lib/type_check.py", line 764, in common_type
    raise TypeError("can't get common type for non-numeric array")
TypeError: can't get common type for non-numeric array

But if I remove the kde(x - m + mu) line, the error goes away.
Can anyone give me some hints about this issue? Any other suggestions for working on similar problems are appreciated as well.


Welcome!

This error likely results from passing tensor arguments to stats.gaussian_kde(), which doesn't know what to do with that data type. That's why the example uses pm.Interpolated to reconstruct the parameters and then uses a full-fledged PyMC distribution (e.g., pm.Normal) for the observed variable. Is there something in that example that isn't what you need?
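
For reference, the pattern from that example looks roughly like this (old_trace and new_data are illustrative names, and the observation noise is assumed known):

with pm.Model() as model:
    # prior rebuilt from the previous posterior samples via KDE + Interpolated
    mu = from_posterior("mu", old_trace["mu"])
    # a proper pymc3 distribution handles the observed data symbolically
    Y_obs = pm.Normal("Z", mu=mu, sigma=2.0, observed=new_data)
    trace = pm.sample(1000)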


from_posterior indeed works for any prior. However, I am not sure how to use it for the likelihood computation. We need an observed part and an unobserved part (the latter can be handled with pm.Potential), but the return value is a plain number, which is not the same kind of object as the one returned by Interpolated.

For example, this thread:
https://discourse.pymc.io/t/how-to-set-up-a-custom-likelihood-function-for-two-variables/906
gives a good answer on how to use an arbitrary black-box likelihood function, but it does not cover the case of computing the likelihood from an empirical density estimated by KDE.

I think I either missed how to work further with the interpolated function, or something else is going on. I have been searching around for hours without luck… After trying for several more hours, I think your sense of where the error is is correct. The mu picked inside the model instance contains two parts, a value and a name, which KDE does not know how to handle (it can only deal with numbers, not something that works like a dict with two attributes). The return value from KDE would need to contain two parts as well; it cannot simply be a raw number, since the model instance requires it to have those two attributes.

Could you give me some suggestions for modifying my likelihood function so that it works with KDE and Interpolated()?
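
Edit: one idea I am experimenting with is to rewrite the Gaussian KDE symbolically as an equal-weight mixture of Normals, so the likelihood accepts a tensor mu and returns a log-probability (which, as far as I understand, is what pm.Potential expects). A rough sketch, assuming scipy's fitted bandwidth:

kde = stats.gaussian_kde(s1)
h = np.sqrt(kde.covariance[0, 0])  # kernel std chosen by scipy

def kde_loglike(z, samples, m, mu, h):
    # log kde(z - m + mu), written with symbolic math instead of scipy
    u = (z - m + mu - samples) / h
    comps = -0.5 * u ** 2 - np.log(h * np.sqrt(2 * np.pi))
    return pm.math.logsumexp(comps) - np.log(len(samples))

with pm.Model() as model2:
    mu = from_posterior("mu", s1)
    pm.Potential("obs_loglike", kde_loglike(s1[0], s1, m, mu, h))
    trace = pm.sample(1000)

Does this look like a reasonable direction?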