Hello,
I am using PyMC3 to solve a problem, and I need help.
In short I have three variables x, y, and z, and I am just fitting some lines in the form y=mu+alphax+betaz, which in PyMC3 I write as:
alpha = pm.Normal('alpha', mu=0, sd=4,shape=1)
beta = pm.Normal('beta', mu=0, sd=4,shape=1)
mu = pm.Normal('mu', mu=0, sd=1,shape=1)
sigma = pm.HalfNormal('sigma', sd=1,shape=1)
**xtzt = pm.MvNormal('xtzt',mu=mu_Lz, cov=cov_Lz, shape=(N,2))**
xt=xtzt[:,0]
zt=xtzt[:,1]
yt = pm.Normal('yt',mu=mu+alpha*xt+beta*zt,sd=sigma,shape=N)
like_x = pm.Normal('like_x',mu=xt,sigma=sigma_x_std,shape=N,observed=x_std)
like_y = pm.Normal('like_y',mu=yt,sigma=sigma_y_std,shape=N, observed=y_std)
like_z = pm.Normal('like_z',mu=zt,sigma=sigma_z_std,shape=N, observed=z_std)
As you can see, x and z are drawn here from a Multivariate Normal distribution, and y are drawn from Normal distribution around the mentioned line, with some scatter given by sigma. This works and gives some reasonable result, however this is not exactly what I want.
In fact, the prior on x-z distribution should not be a Multivariate Normal, but a distribution of which I do not know the formula, but I know the 2D distribution in the x-z plane, i.e. at given (x,z) on a grid I have the 2D-PDF (the grid is quite fine with 1000x1000 bins).
Now I know the two requirement by PyMC3 to use your own distribution as a new class is to have two functions that give logp and random, e.g.:
def logp(x,z):
ix=np.round((np.log(x)-np.log(x_range[0]))/x_step).astype(np.int32)
iz=np.round((z-z_range[0])/z_step).astype(np.int32)
return np.log(prob[ix,iz])
def random(size=1):
cdf = np.cumsum(prob.ravel())
cdf = cdf / cdf[-1]
values = np.random.rand(size)
value_bins = np.searchsorted(cdf, values)
x_idx, z_idx = np.unravel_index(value_bins,(len(x),len(z)))
random_from_cdf = np.column_stack(( x[x_idx], z[z_idx]))
return random_from_cdf
As you can see my grid is x-z, with x_step and z_step the increment on x and z. I know that I am not returning something very accurate, but given the fact that the grid is fine, is a good enough approximation for logp.
For the random instead I flatten all the distribution and get the cumulative and then use an approach similar to what is usually done in 1D, i.e. invert the CDF.
However I am not able to write this in a format compatible with a PyMC3 class. Therefore I was wondering if someone ever encountered a similar case, and if you have already have some suggestion on this.
TLDR; Does someone know a way to give to PyMC3 not a known MvPDF, but a 2D PDF computed on a grid. (x and z are continuous even if the PDF is on a grid)