Prior from a 2D histogram

vittorio_ghirardini · February 2, 2021, 2:53pm

Hello,
I am using PyMC3 to solve a problem, and I need help.
In short I have three variables x, y, and z, and I am just fitting some lines in the form y=mu+alphax+betaz, which in PyMC3 I write as:

    alpha = pm.Normal('alpha', mu=0, sd=4,shape=1)
    beta = pm.Normal('beta', mu=0, sd=4,shape=1)
    mu = pm.Normal('mu', mu=0, sd=1,shape=1)
    sigma = pm.HalfNormal('sigma', sd=1,shape=1)

    **xtzt = pm.MvNormal('xtzt',mu=mu_Lz, cov=cov_Lz, shape=(N,2))**
    xt=xtzt[:,0]
    zt=xtzt[:,1]
    yt = pm.Normal('yt',mu=mu+alpha*xt+beta*zt,sd=sigma,shape=N)


    like_x = pm.Normal('like_x',mu=xt,sigma=sigma_x_std,shape=N,observed=x_std)
    like_y = pm.Normal('like_y',mu=yt,sigma=sigma_y_std,shape=N, observed=y_std)
    like_z = pm.Normal('like_z',mu=zt,sigma=sigma_z_std,shape=N, observed=z_std)

As you can see, x and z are drawn here from a Multivariate Normal distribution, and y are drawn from Normal distribution around the mentioned line, with some scatter given by sigma. This works and gives some reasonable result, however this is not exactly what I want.

In fact, the prior on x-z distribution should not be a Multivariate Normal, but a distribution of which I do not know the formula, but I know the 2D distribution in the x-z plane, i.e. at given (x,z) on a grid I have the 2D-PDF (the grid is quite fine with 1000x1000 bins).

Now I know the two requirement by PyMC3 to use your own distribution as a new class is to have two functions that give logp and random, e.g.:

def logp(x,z):
ix=np.round((np.log(x)-np.log(x_range[0]))/x_step).astype(np.int32)
iz=np.round((z-z_range[0])/z_step).astype(np.int32)
return np.log(prob[ix,iz])

def random(size=1):
cdf = np.cumsum(prob.ravel())
cdf = cdf / cdf[-1]
values = np.random.rand(size)
value_bins = np.searchsorted(cdf, values)
x_idx, z_idx = np.unravel_index(value_bins,(len(x),len(z)))
random_from_cdf = np.column_stack(( x[x_idx], z[z_idx]))
return random_from_cdf

As you can see my grid is x-z, with x_step and z_step the increment on x and z. I know that I am not returning something very accurate, but given the fact that the grid is fine, is a good enough approximation for logp.

For the random instead I flatten all the distribution and get the cumulative and then use an approach similar to what is usually done in 1D, i.e. invert the CDF.

However I am not able to write this in a format compatible with a PyMC3 class. Therefore I was wondering if someone ever encountered a similar case, and if you have already have some suggestion on this.

TLDR; Does someone know a way to give to PyMC3 not a known MvPDF, but a 2D PDF computed on a grid. (x and z are continuous even if the PDF is on a grid)

ricardoV94 · February 2, 2021, 3:02pm

Just in case it saves you some trouble, you only need the random method if you want to do prior or posterior predictive sampling. If all you need is the posterior trace (what you get with pm.sample()), it’s fine to only define the logp method.

vittorio_ghirardini · February 2, 2021, 3:58pm

Thanks, I also think I only need the logp method. In fact I was playing around with DensityDist and Potential, however the function as they are keep giving me errors when loading it as a model.

vittorio_ghirardini · February 4, 2021, 10:40am

The real issue is, as far as I’ve tried with DensityDist for instance, that the logp function as I’ve wrote assumes input numpy array, while this is not what PyMC3 wants, and in fact gives error…

xcorat · June 6, 2023, 10:56pm

I guess this is related to the fact that we cannot compute gradient for arbitrary PDFs?

In case someone find this helpful, for the 1D case (in v5), you can use pymc.Interpolated, but I haven’t been able to use the 2D PDFs.

Maybe there’s a way to implement the equivalent multivariate version from scipy.interpolate and implement the required logp, rand etc.

ricardoV94 · June 7, 2023, 6:20am

This is another option: prior_from_idata — pymc_experimental 0.0.5 documentation

Topic		Replies	Views
How to use the posterior distribution of one model as a prior distribution for another model	19	1465	April 9, 2024
Unknown joint distribution, known conditional marginals Questions	2	747	February 27, 2019
How to set up a custom likelihood function for two variables Questions	8	13261	March 9, 2018
Using pm.MvNormal for setting prior distribution Questions	6	974	October 8, 2020
How to create and evaluate logp of Deterministic MvNormal as part of PyMC model? (Or use Potential) v5 modeling	1	337	April 24, 2023

Prior from a 2D histogram

Related topics