Inferring expectation value of complicated distribution

Hi,

My goal is to infer the expectation value of a process that outputs real numbers, where the empirical distribution of those numbers is complicated.

As an example, we will generate data from a skew-normal distribution and suppose that it represents a distribution with no corresponding object in PyMC (though of course we know it has one). We will attempt to model this process by using a normal distribution for the process's outputs.

# data generation
import numpy as np
import scipy.stats

skew = 10
mu = 5
sd = 100
x = np.linspace(-50, 400, 1000)  # grid for plotting the densities (not used in the model)
contribution_observations = scipy.stats.skewnorm.rvs(skew, loc=mu, scale=sd, size=100)
# theoretical mean of the skew-normal: loc + scale * (skew / sqrt(1 + skew**2)) * sqrt(2 / pi)
scipy.stats.skewnorm.stats(skew, loc=mu, scale=sd, moments='m')  # -> 84.39

# model
import pymc3 as pm

with pm.Model() as model:
    mu = pm.Uniform('mu', lower=-10, upper=100)
    sd = pm.Uniform('sd', lower=0, upper=300)
    # testval is unnecessary on an observed variable, so it is dropped here
    observations = pm.Normal('observations', mu=mu, sd=sd, observed=contribution_observations)
    trace = pm.sample(10000)

I would then take the posterior for the expectation value of my process from the trace of mu. So far the normal distribution seems to work acceptably in this case, even though the sample_ppc outputs of course give very different output distributions.
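Concretely, this is what I mean by reading the expectation value off the trace, plus the posterior predictive check I was comparing against (a minimal sketch using the variables defined above):

# posterior for the expectation value of the process
trace['mu'].mean()   # posterior mean of mu
pm.hpd(trace['mu'])  # 95% highest posterior density interval

# posterior predictive samples, for comparing output distributions
with model:
    ppc = pm.sample_ppc(trace, samples=500)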

So questions regarding this:

  • Is modelling a distribution that is not normal with a normal distribution bad practice?
    • Why?
    • What biases could/should I expect?
  • How does the choice of distribution for the RV that we feed our data to (i.e. the likelihood) affect the probability density space created?
    • Is there a way of spotting a mismatch? (I tried using a Uniform as an extreme case, but it just gives -inf energy.)
  • What is the standard approach to modelling processes with complex/non-standard distributions?
    • It seems like DensityDist could be what I want (see the sketch after this list), but what if I don’t know/don’t want to guess at the process causing the distribution?*
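
For reference, here is what I understand a DensityDist version would look like if I did guess at the process, writing the skew-normal log-density by hand (a minimal sketch; the theano.tensor logp and the Deterministic for the mean are my own, possibly naive, construction):

import theano.tensor as tt

with pm.Model() as skew_model:
    mu_ = pm.Uniform('mu', lower=-10, upper=100)
    sd_ = pm.Uniform('sd', lower=0, upper=300)
    alpha = pm.Normal('alpha', mu=0, sd=5)  # skewness parameter

    def skewnorm_logp(value):
        # skew-normal log-density: log 2 - log sd + log phi(z) + log Phi(alpha * z),
        # with z = (x - mu) / sd
        z = (value - mu_) / sd_
        return (tt.log(2.) - tt.log(sd_)
                - 0.5 * tt.log(2 * np.pi) - 0.5 * z ** 2
                + tt.log(0.5 * (1. + tt.erf(alpha * z / tt.sqrt(2.)))))

    obs = pm.DensityDist('obs', skewnorm_logp, observed=contribution_observations)
    # the expectation value depends on all three parameters, not mu alone:
    # E[x] = mu + sd * delta * sqrt(2 / pi), delta = alpha / sqrt(1 + alpha**2)
    delta = alpha / tt.sqrt(1. + alpha ** 2)
    mean = pm.Deterministic('mean', mu_ + sd_ * delta * tt.sqrt(2. / np.pi))
    trace_skew = pm.sample(2000)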

Here is a gist of plots of the kind of distributions I’m trying to compare expectation values for: https://gist.github.com/DBCerigo/0e45129157698172bfb5fc9ed2ac06d8

Thanks for the advice, hope this is helpful for people in the future!

*Very sorry if that step is the whole point of Bayesian modelling and I have clearly misunderstood a central part of how to use PyMC :confused:

You can have a look at this example: http://docs.pymc.io/notebooks/dp_mix.html
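
That notebook fits a Dirichlet process mixture; a finite mixture of normals is the simpler version of the same idea and might already be flexible enough (a minimal sketch, with the number of components K and the priors chosen arbitrarily):

import theano.tensor as tt

K = 5  # number of mixture components (arbitrary choice)
with pm.Model() as mixture_model:
    w = pm.Dirichlet('w', a=np.ones(K))  # component weights
    component_mu = pm.Normal('component_mu', mu=0, sd=100, shape=K)
    component_sd = pm.HalfNormal('component_sd', sd=100, shape=K)
    obs = pm.NormalMixture('obs', w=w, mu=component_mu, sd=component_sd,
                           observed=contribution_observations)
    # the expectation value of the process is the weighted sum of component means
    mean = pm.Deterministic('mean', tt.sum(w * component_mu))
    trace_mix = pm.sample(2000)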

As for your other questions, I think it would be difficult to give a definitive general answer, as the bias and the effect on the probability space of a normal approximation are case dependent. You might want to consult a textbook on approximate inference (e.g., Laplace approximation, variational inference).
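
If you do look into variational inference, PyMC3 exposes it through pm.fit; for example, mean-field ADVI on your model above (a minimal sketch):

with model:
    approx = pm.fit(n=30000, method='advi')  # mean-field ADVI
    trace_vi = approx.sample(5000)            # draw samples from the fitted approximation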