# How to use costum logliklihood function effectively using pm.DensityDist

In the following code I want to use pm.DensityDist. The parameter B, T, var is the known data which uploaded from directory.

where
B is a 2D matrix
T is 1D array
var is 1D array.

mp or model is unknown parameter I want to estimate.

I have following questions:

1. What value/parameter should I pass in “observed” of pm.DensityDist
2. How can I make this code faster as the size of B and T is quite large.

pymc.version = ‘5.13.0’

``````with basic_model:
mp = pm.Normal("mp",mu=0, sigma=1, shape=N)

def logp(model, B, T, var):

NG=len(T)

data_cal = np.dot(B, model);

data_residuals= data_cal - T;
cov_m=1/(var)*np.identity(NG);
factor=  -0.5 * np.dot(np.dot(data_residuals.T,covd),data_residuals)

return factor

loglike = pm.DensityDist('loglike', B, T, var, logp=logp, observed=?)
trace=pm.sample(100,progressbar=True)
``````

Thank you!

First things off, the logp must be implemented with PyTensor variables and operations.

If you don’t know what `observed` should be, it probably means you don’t have a typical random variable that is being observed. Instead you can use `pm.Potential` with the logp (which still has to be defined with PyTensor variables and operations)

Before continuing with either approach, is there a generative model that you can define instead with pure PyMC building blocks? This looks like some sort of linear regression?

Yes, you are right. This is like a linear regression. I have the matrix B and need to find some coefficients (a: 1D array) which can fit the the “T” by minimizing error ||T - B*a||.

I am new to pymc just trying to learn how can I solve this problem effectively. Kindly suggest if there is another method for this particular problem within the Bayesian framework which I can use.

Thank you!

I suggest browsing :

This is the most beginner one: GLM: Linear regression — PyMC 5.13.1 documentation

You’ll notice none of those examples implement CustomDist, they always use PyMC building blocks. That’s definitely what we would recommend for 99.999% of users and 100% of beginner ones

Thank you for your suggestion. Following one of the example, I have wrote the following code:

``````with basic_model:
# Priors for the parameters
mp = pm.Normal("mp",mu=0, sigma=1, shape=N)

# Define linear model
y_est = pt.tensor.dot(B,mp)

# Define Normal likelihood
pm.Normal("y", mu=y_est, sigma=cov_data, observed=T)

with basic_model:
trc_ols = pm.sample(
tune=5000,
draws=500,
chains=4,
cores=4,
n_init=50000,
)
``````

The size of B is 18965X1625.
cov_data is the measurement errors of T.

I have a decent computer and code is showing me more than 6 hours to complete. is it normal for this size of data? Can you please suggest where can I improve it? Just to begin with.

This is my last question regarding this post.

Thank you

1.6k features? That’s quite a lot. Probably a lot of redundancy in there?

You may need to do some feature engineering, or find more structured priors for `mp`. Also are you sure about `sigma` being fixed? That’s usually also inferred.

Besides that:

1. Do you get a warning about blas when you import PyMC? If so you should reinstall using the official instructions: Installation — PyMC 5.13.1 documentation
2. Do you have a CUDA GPU? Pick Numpyro as the sampler, so your GPU can crunch through that huge matmul: Faster Sampling with JAX and Numba — PyMC example gallery
1 Like

1. Yes, I have variance of data which I want to use in likelihood function through Diagonal covariance matrix,

2. I have installed pymc following the documentation you suggested. After importing the following libraries I do get some warnings:

``````import pymc as pm
import arviz as az
import pandas as pd
basic_model = pm.Model()
import pytensor as pt
``````
``````WARNING (pytensor.configdefaults): g++ not detected!  PyTensor will be unable to compile C-implementations and will default to Python. Performance may be severely degraded. To remove this warning, set PyTensor flags cxx to an empty string.
WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
``````
1. I have only workstation with capacity in attached file.

Thank you
Regards

You are not installing correctly, or are not linking to the right environment (if on a jupyter notebook). Those warnings shouldn’t be there, and indicate you will have significant slow down on models that rely mostly on matmul like yours.