Hi, I am trying to establish a model to describe strength data, with the following structure:
X \sim \text{Normal}(\mu, \sigma)
C \sim \text{Beta}(\alpha, \beta)
Y = C \cdot X
where my data are the measured strengths of corroded samples, i.e. observations of Y. Here C represents a reduction factor (in the range 0 to 1, hence a Beta distribution is appropriate) that accounts for corrosion, and X is the original, uncorroded strength (modelled via a Normal distribution). I can formulate quite strong priors for the (\mu, \sigma) parameters that define this distribution of original strength, based on manufacturers' data.
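To make the generative structure concrete, here is a minimal simulation sketch (all parameter values are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative parameter values only
mu, sigma = 400.0, 25.0    # uncorroded strength X ~ Normal(mu, sigma)
alpha, beta = 8.0, 2.0     # reduction factor C ~ Beta(alpha, beta)

x = rng.normal(mu, sigma, size=500)   # original (unobserved) strength
c = rng.beta(alpha, beta, size=500)   # corrosion reduction factor in (0, 1)
y = c * x                             # observed corroded strength
```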
I'm unsure how (or if) I can set up this model in `pymc`, since my observed data does not follow any standard distribution that I can directly define. Per the above, the PDF of Y is as follows:
f_Y(y) = \int_0^1 f_C(c) \, f_X(y/c) \, \frac{1}{c} \, dc
(see Distribution of the product of two random variables - Wikipedia)
where f_C(c) is the PDF of Beta(\alpha, \beta) and f_X(x) is the PDF of Normal(\mu, \sigma). Although this integral has finite bounds (due to the limited support of C), the product of these PDFs doesn't have a nice analytic form, so I can't simply define the `logp` via a custom density function.
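The integral can at least be evaluated numerically; for example, a quick sanity check with `scipy.integrate.quad` (illustrative parameter values again) shows the resulting density behaves as a proper PDF:

```python
from scipy import integrate, stats

mu, sigma = 400.0, 25.0   # illustrative values only
alpha, beta = 8.0, 2.0

def f_Y(y):
    """Marginal PDF of Y = C * X, evaluating the integral over c numerically."""
    integrand = lambda c: stats.beta.pdf(c, alpha, beta) * stats.norm.pdf(y / c, mu, sigma) / c
    return integrate.quad(integrand, 0.0, 1.0)[0]

# Sanity check: the density should integrate to ~1 over the effective support of Y
total, _ = integrate.quad(f_Y, 0.0, mu + 6.0 * sigma)
print(total)  # ~1.0
```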
Any thoughts?
I don't think the suggestion in that link is right. You can try using `scipy` to approximate the integral, but there's nothing pre-built for this, so you may need to have a look at: Using a “black box” likelihood function — PyMC example gallery
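A rough sketch of that route, following the pattern in the linked example (the priors, parameter values, and simulated data here are all placeholders): wrap a `scipy`-based log-likelihood in a pytensor `Op` and attach it with `pm.Potential`. No gradient is defined on the `Op`, so a gradient-free step method is required:

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt
from pytensor.graph import Apply, Op
from scipy import integrate, stats

rng = np.random.default_rng(1)
y_obs = rng.beta(8.0, 2.0, size=100) * rng.normal(400.0, 25.0, size=100)  # stand-in data

def loglike(mu, sigma, alpha, beta):
    """Sum of log f_Y(y) over the data, with f_Y evaluated by numerical integration."""
    def f_Y(y):
        integrand = lambda c: stats.beta.pdf(c, alpha, beta) * stats.norm.pdf(y / c, mu, sigma) / c
        return integrate.quad(integrand, 0.0, 1.0)[0]
    return np.sum([np.log(f_Y(y)) for y in y_obs])

class LogLike(Op):
    def make_node(self, mu, sigma, alpha, beta):
        inputs = [pt.as_tensor_variable(v) for v in (mu, sigma, alpha, beta)]
        return Apply(self, inputs, [pt.dscalar()])

    def perform(self, node, inputs, outputs):
        outputs[0][0] = np.asarray(loglike(*inputs))

with pm.Model():
    mu = pm.Normal("mu", 400.0, 20.0)        # placeholder manufacturer-informed prior
    sigma = pm.HalfNormal("sigma", 25.0)
    alpha = pm.HalfNormal("alpha", 5.0)
    beta = pm.HalfNormal("beta", 5.0)
    pm.Potential("loglike", LogLike()(mu, sigma, alpha, beta))
    idata = pm.sample(step=pm.Slice())       # gradient-free sampler: no grad on the Op
```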
Thanks! I presume I may then have to accept that my model won't sample as efficiently, since I won't be able to compile the likelihood (as I won't be using `pytensor`)? That may be OK; it is not a huge model.
Just to follow up: I've got this working nicely, and have actually been able to stick with `pytensor` to implement the integral, using Gauss quadrature, which is computationally efficient and sufficiently accurate. For now the script uses a single Gauss point per element of length dc (i.e. equivalent to the midpoint rule), but I will now extend this.
The advantage of this approach versus using `scipy` is that I can still define the `logp` such that its derivatives can be evaluated, and hence efficient gradient-based MCMC methods such as NUTS can be used.
Code example is in the attached Python script (a condensed sketch also follows the list below). The key ideas are that:
- PDF functions of the X ~ Normal and C ~ Beta distributions are defined using `pytensor` functions, along with the PDF for Y = C \cdot X, implementing the above integral.
- The log-likelihood is then just per its definition, i.e. of the form sum(log(PDF)).
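Here is a condensed sketch of those ideas, assuming illustrative priors and a fixed midpoint-rule quadrature grid (the attached script is the full version; names and values here won't match it exactly). Because everything is expressed with `pytensor` operations, the `logp` stays differentiable and NUTS works:

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt

rng = np.random.default_rng(0)
y_obs = rng.beta(8.0, 2.0, 200) * rng.normal(400.0, 25.0, 200)  # simulated data

n = 200                                   # number of quadrature elements
dc = 1.0 / n
c = np.linspace(dc / 2, 1 - dc / 2, n)    # element midpoints on (0, 1)

def normal_pdf(x, mu, sigma):
    return pt.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * pt.sqrt(2 * np.pi))

def beta_pdf(c, alpha, beta):
    logp = (alpha - 1) * pt.log(c) + (beta - 1) * pt.log(1 - c) \
        - (pt.gammaln(alpha) + pt.gammaln(beta) - pt.gammaln(alpha + beta))
    return pt.exp(logp)

with pm.Model():
    mu = pm.Normal("mu", 400.0, 20.0)      # placeholder manufacturer-informed prior
    sigma = pm.HalfNormal("sigma", 25.0)
    alpha = pm.HalfNormal("alpha", 5.0)
    beta = pm.HalfNormal("beta", 5.0)

    # f_Y(y) ~= sum over elements of f_C(c) * f_X(y/c) / c * dc, broadcast over data
    y = y_obs[:, None]                     # shape (data, quadrature points)
    f_y = pt.sum(beta_pdf(c, alpha, beta) * normal_pdf(y / c, mu, sigma) / c * dc, axis=1)

    pm.Potential("loglike", pt.sum(pt.log(f_y)))
    idata = pm.sample()                    # NUTS: logp is differentiable end to end
```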
It can be seen that this is performing well with simulated data. Now to move on to conducting inference on the real dataset…
reduction_factor_test.py (15.1 KB)
Thanks again for this input, much appreciated. Hope this is useful to others wishing to do similar.