How to use a custom log-likelihood function effectively with pm.DensityDist

Mahak_Singh_Chauhan · April 18, 2024, 10:45am

In the following code I want to use pm.DensityDist. B, T, and var are known data loaded from disk,

where
B is a 2D matrix,
T is a 1D array,
var is a 1D array.

mp (called model inside logp) is the unknown parameter vector I want to estimate.

I have the following questions:

1. What value/parameter should I pass as "observed" to pm.DensityDist?
2. How can I make this code faster, given that B and T are quite large?

pymc.__version__ = '5.13.0'

with basic_model:
    mp = pm.Normal("mp",mu=0, sigma=1, shape=N)
    
    def logp(model, B, T, var):            
        
        NG=len(T)

        data_cal = np.dot(B, model)
        
        data_residuals = data_cal - T
        cov_inv = 1/(var)*np.identity(NG)  # inverse of the diagonal data covariance
        factor = -0.5 * np.dot(np.dot(data_residuals.T, cov_inv), data_residuals)
    
        return factor

    loglike = pm.DensityDist('loglike', B, T, var, logp=logp, observed=?)
    trace=pm.sample(100,progressbar=True)

Thank you!

ricardoV94 · April 18, 2024, 10:55am

First off, the logp must be implemented with PyTensor variables and operations.

If you don’t know what observed should be, it probably means you don’t have a typical random variable that is being observed. Instead, you can use pm.Potential with the logp (which still has to be defined with PyTensor variables and operations).

Before continuing with either approach, is there a generative model that you can define instead with pure PyMC building blocks? This looks like some sort of linear regression?

Mahak_Singh_Chauhan · April 18, 2024, 11:10am

Thank you for your kind reply.

Yes, you are right. This is like a linear regression. I have the matrix B and need to find coefficients (a, a 1D array) that fit T by minimizing the error ||T - B*a||.

I am new to PyMC and am just trying to learn how to solve this problem effectively. Kindly suggest whether there is another method within the Bayesian framework that I can use for this particular problem.

Thank you!

ricardoV94 · April 18, 2024, 11:35am

I suggest browsing :

This is the most beginner one: GLM: Linear regression — PyMC 5.13.1 documentation

ricardoV94 · April 18, 2024, 11:37am

You’ll notice none of those examples implement CustomDist; they always use PyMC building blocks. That’s definitely what we would recommend for 99.999% of users and 100% of beginners.

Mahak_Singh_Chauhan · April 18, 2024, 11:54am

Thank you for your suggestion. Following one of the examples, I have written the following code:

with basic_model:
    # Priors for the parameters
    mp = pm.Normal("mp",mu=0, sigma=1, shape=N)

    # Define linear model
    y_est = pt.tensor.dot(B,mp)

    # Define Normal likelihood
    pm.Normal("y", mu=y_est, sigma=cov_data, observed=T)


with basic_model:
    trc_ols = pm.sample(
        tune=5000,
        draws=500,
        chains=4,
        cores=4,
        init="advi+adapt_diag",
        n_init=50000,
    )

The size of B is 18965 x 1625.
cov_data holds the measurement errors of T.

I have a decent computer, and the sampler estimates more than 6 hours to complete. Is this normal for data of this size? Can you suggest where I could improve it, just to begin with?

This is my last question regarding this post.

Thank you

ricardoV94 · April 18, 2024, 12:06pm

1.6k features? That’s quite a lot. Probably a lot of redundancy in there?

You may need to do some feature engineering, or find more structured priors for mp. Also are you sure about sigma being fixed? That’s usually also inferred.

Besides that:

Do you get a warning about blas when you import PyMC? If so you should reinstall using the official instructions: Installation — PyMC 5.13.1 documentation
Do you have a CUDA GPU? Pick NumPyro as the sampler, so your GPU can crunch through that huge matmul: Faster Sampling with JAX and Numba — PyMC example gallery

Mahak_Singh_Chauhan · April 18, 2024, 12:15pm

Thanks again for your reply.

Yes, I have the variance of the data, which I want to use in the likelihood function through a diagonal covariance matrix.
I installed PyMC following the documentation you suggested. After importing the following libraries I do get some warnings:

import pymc as pm
import arviz as az
import pandas as pd
basic_model = pm.Model()
import pytensor as pt

WARNING (pytensor.configdefaults): g++ not detected!  PyTensor will be unable to compile C-implementations and will default to Python. Performance may be severely degraded. To remove this warning, set PyTensor flags cxx to an empty string.
WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.

I only have a workstation; its specifications are in the attached file.

Thank you
Regards

ricardoV94 · April 18, 2024, 12:17pm

You are not installing correctly, or you are not linking to the right environment (if you are in a Jupyter notebook). Those warnings shouldn’t be there, and they indicate you will see a significant slowdown on models that rely mostly on matmul, like yours.
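One quick way to check the environment from inside the notebook is to print which Python interpreter the kernel is running and whether g++ is visible on PATH. This is only a sketch using the standard library; the helper name describe_environment is made up here, and a missing g++ is a hint rather than proof, since some installs expose the compiler under a different name.

```python
import shutil
import sys


def describe_environment():
    """Collect the interpreter path and the location of g++ (None if not on PATH)."""
    return {
        "python_executable": sys.executable,
        "gxx_path": shutil.which("g++"),
    }


info = describe_environment()
print(info)
```

If python_executable does not point into the environment where PyMC was installed, the notebook kernel is the likely culprit. PyTensor's own settings can be inspected through pytensor.config.cxx and pytensor.config.blas__ldflags.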
