Hello again and thanks in advance for the help I am trying to set up my priors with a custom function in pymc v5. I figured out that I need to use pm.DensityDist, but I’m probably missing something because my example does not work.
I have 20 variables that each has 3 degrees of freedom. In pymc language, if I understand it correctly, this would be 20 coordinates with 3 model dimensions. For each coordinate, and for each of its dimensions, I want to define a custom prior distribution. For example, the prior for dimension 1 and 2 is a combination of Normal distributions, for dimension 3 it is a uniform distribution.
My priors are constructed from data. Each coordinate may have different input data, so its prior will be different. For example, coordinate 1 may have a prior for its 1’st dimension that consists of 5 Normal distributions added together, same for coordinate 2, and for coordinate 3 it would be the addition of uniform distributions.
Now if all of the above made sense to you, I would really appreciate some guidance on what is the best way to construct these priors and sample the posteriors. I am still finding my way with probabilistic programming so this is the point where I am a bit lost. I have a code below that constructs the prior but gives it as a list, so it is not really compatible with pyMC sampling algorithms. Just to clarify: I will later call these priors in a custom likelihood function and evaluate the log likelihood.
Here is an example code of what I have:
import numpy as np
import pymc as pm
coordinates = 20
dimensions = 3
def prior_means_and_std(dimensions: int = 3,
coordinates: int = 20):
'''Here i just create a test dataset to then create my prior distributions.
The test dataset just gives the mean and std for the first two dimensions, and the lower & upper bounds
for the uniform distribution. For each coordinate, I randomly set a number of datapoints to simulate what
happens with my real dataset. The prior for each coordinate will be a combination of Normal / Uniform
distributions that is equal to the number of datapoints'''
output_table = {}
for coordinate in range(coordinates):
number_of_datapoints = np.random.randint(10) # assign randomly the number of data for this coordinate
means = np.random.uniform(size=(number_of_datapoints, dimensions - 1)) * 10 - 5
stds = np.random.uniform(size=(number_of_datapoints, dimensions - 1)) * 3
lower_lim = np.random.uniform(size=(number_of_datapoints, 1))
upper_lim = lower_lim + np.random.uniform(size=(number_of_datapoints, 1))
output_table['Coordinate ' + str(coordinate)] = [means, stds, lower_lim, upper_lim]
return output_table
def custom_prior(table: dict):
priors = {}
for key, item in table.items():
means = item[0]
stds = item[1]
lower_lim = item[2]
upper_lim = item[3]
p1 = 0
p2 = 0
p3 = 0
for index in range(means.shape[0]):
p1 += pm.Normal.dist(mu=means[index, 0], sigma=stds[index, 0])
p2 += pm.Normal.dist(mu=means[index, 1], sigma=stds[index, 1])
p3 += pm.Uniform.dist(lower=lower_lim[index], upper=upper_lim[index])
priors[key] = [p1, p2, p3]
return priors
# Create sample values for the means, std and lower/upper bounds
prior_table = prior_means_and_std()
# Now create the priors by adding together normal and uniform distributions from pyMC
# this will now return a dictionary with 20 entries, where for each entry there is a list with the three priors.
# But of course, this is not really compatible with what I want to do next..
priors = custom_prior(prior_table)
# I can call these functions and sample them
pm.draw(priors['Coordinate 19'][2])
# But what I actually want to do is to use pm.sample and a custom likelihood function
def custom_likelihood(parameter_realization, observations):
'''My likelihood function involves a more complicated forward model that I skip here. But
it will require a realization of the coordinates to run this forward model and return the
log likelihood'''
def forward_model(parameters):
return parameters
synthetic_response = forward_model(parameter_realization)
# For simplicity, I assume that the likelihood is the Normal distribution and just git the log-likelihood without
# the scaling in front
return ((synthetic_response - observations) ** 2).sum()
some_random_data = np.random.random(size=(coordinates, dimensions))
# Define the model
with pm.Model() as model:
sampled_parameters = pm.DensityDist('params',
logp=lambda value: custom_prior(), # here is one thing that is definitely done wrong
shape=(coordinates,))
observations = pm.Data('observed_data', some_random_data)
likelihood = pm.DensityDist('likelihood',
logp=custom_likelihood(sampled_parameters, observations),
shape=(coordinates,))
# Sample the posterior
with model:
trace = pm.sample(draws=1000, chains=4)