In the following code I want to use pm.DensityDist. The parameter B, T, var is the known data which uploaded from directory.
where
B is a 2D matrix
T is 1D array
var is 1D array.
mp or model is unknown parameter I want to estimate.
I have following questions:
- What value/parameter should I pass in “observed” of pm.DensityDist
- How can I make this code faster as the size of B and T is quite large.
pymc.version = ‘5.13.0’
with basic_model:
mp = pm.Normal("mp",mu=0, sigma=1, shape=N)
def logp(model, B, T, var):
NG=len(T)
data_cal = np.dot(B, model);
data_residuals= data_cal - T;
cov_m=1/(var)*np.identity(NG);
factor= -0.5 * np.dot(np.dot(data_residuals.T,covd),data_residuals)
return factor
loglike = pm.DensityDist('loglike', B, T, var, logp=logp, observed=?)
trace=pm.sample(100,progressbar=True)
Thank you!
First things off, the logp must be implemented with PyTensor variables and operations.
If you don’t know what observed
should be, it probably means you don’t have a typical random variable that is being observed. Instead you can use pm.Potential
with the logp (which still has to be defined with PyTensor variables and operations)
Before continuing with either approach, is there a generative model that you can define instead with pure PyMC building blocks? This looks like some sort of linear regression?
Thank you for your kind reply.
Yes, you are right. This is like a linear regression. I have the matrix B and need to find some coefficients (a: 1D array) which can fit the the “T” by minimizing error ||T - B*a||.
I am new to pymc just trying to learn how can I solve this problem effectively. Kindly suggest if there is another method for this particular problem within the Bayesian framework which I can use.
Thank you!
I suggest browsing :
This is the most beginner one: GLM: Linear regression — PyMC 5.13.1 documentation
You’ll notice none of those examples implement CustomDist, they always use PyMC building blocks. That’s definitely what we would recommend for 99.999% of users and 100% of beginner ones
Thank you for your suggestion. Following one of the example, I have wrote the following code:
with basic_model:
# Priors for the parameters
mp = pm.Normal("mp",mu=0, sigma=1, shape=N)
# Define linear model
y_est = pt.tensor.dot(B,mp)
# Define Normal likelihood
pm.Normal("y", mu=y_est, sigma=cov_data, observed=T)
with basic_model:
trc_ols = pm.sample(
tune=5000,
draws=500,
chains=4,
cores=4,
init="advi+adapt_diag",
n_init=50000,
)
The size of B is 18965X1625.
cov_data is the measurement errors of T.
I have a decent computer and code is showing me more than 6 hours to complete. is it normal for this size of data? Can you please suggest where can I improve it? Just to begin with.
This is my last question regarding this post.
Thank you
1.6k features? That’s quite a lot. Probably a lot of redundancy in there?
You may need to do some feature engineering, or find more structured priors for mp
. Also are you sure about sigma
being fixed? That’s usually also inferred.
Besides that:
- Do you get a warning about blas when you import PyMC? If so you should reinstall using the official instructions: Installation — PyMC 5.13.1 documentation
- Do you have a CUDA GPU? Pick Numpyro as the sampler, so your GPU can crunch through that huge matmul: Faster Sampling with JAX and Numba — PyMC example gallery
1 Like
Thanks again for your reply.
-
Yes, I have variance of data which I want to use in likelihood function through Diagonal covariance matrix,
-
I have installed pymc following the documentation you suggested. After importing the following libraries I do get some warnings:
import pymc as pm
import arviz as az
import pandas as pd
basic_model = pm.Model()
import pytensor as pt
WARNING (pytensor.configdefaults): g++ not detected! PyTensor will be unable to compile C-implementations and will default to Python. Performance may be severely degraded. To remove this warning, set PyTensor flags cxx to an empty string.
WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
- I have only workstation with capacity in attached file.
Thank you
Regards
You are not installing correctly, or are not linking to the right environment (if on a jupyter notebook). Those warnings shouldn’t be there, and indicate you will have significant slow down on models that rely mostly on matmul like yours.