My goal is to infer the expectation value of a process that outputs real numbers, where the empirical distribution of those outputs is complicated.
As an example, we will generate data from a Skew-Normal distribution and pretend that it has no corresponding distribution object in PyMC (though of course we know it does). We will then attempt to model this process by using a Normal distribution for the process's outputs.
    import numpy as np
    import scipy.stats
    import pymc3 as pm  # assumed: the sd= keyword and sample_ppc point to PyMC3

    # data generation
    skew = 10
    mu = 5
    sd = 100
    x = np.linspace(-50, 400, 1000)
    contribution_observations = scipy.stats.skewnorm.rvs(skew, loc=mu, scale=sd, size=100)
    scipy.stats.skewnorm.stats(skew, mu, sd, moments='m')  # -> (84.39)

    # model
    with pm.Model() as model:
        mu = pm.Uniform('mu', lower=-10, upper=100)
        sd = pm.Uniform('sd', lower=0, upper=300)
        observations = pm.Normal('observations', mu=mu, sd=sd, testval=5, observed=contribution_observations)
        trace = pm.sample(10000)
I would then get the posterior for the expectation value of my process from the trace of `mu`. So far the Normal distribution seems to work OK in this case, even though the `sample_ppc` outputs of course look very different from the observed data.
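To make "works OK" a bit more concrete, here is a minimal sketch that avoids PyMC entirely. It assumes that, with near-flat priors, the posterior for `mu` under a Normal likelihood is roughly centred on the sample mean with a width of about the standard error, and then checks how often that approximate 95% interval covers the true Skew-Normal mean. The number of repeated datasets (`n_datasets`) and the seed are arbitrary choices of mine, and the truncation from the `Uniform(-10, 100)` prior is ignored.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Same generating process as in the example above
skew, loc, scale = 10, 5, 100
true_mean = stats.skewnorm.mean(skew, loc=loc, scale=scale)

# Simulate many datasets of 100 observations each (n_datasets is arbitrary)
n_datasets, n_obs = 2000, 100
data = stats.skewnorm.rvs(
    skew, loc=loc, scale=scale, size=(n_datasets, n_obs), random_state=rng
)

# Approximate posterior summary for mu under a Normal likelihood
# with flat priors: centre = sample mean, width = standard error
sample_mean = data.mean(axis=1)
std_err = data.std(axis=1, ddof=1) / np.sqrt(n_obs)

lower = sample_mean - 1.96 * std_err
upper = sample_mean + 1.96 * std_err

# Fraction of datasets whose interval contains the true mean
coverage = np.mean((lower <= true_mean) & (true_mean <= upper))

# Average error of the point estimate across datasets
bias = sample_mean.mean() - true_mean

print(f"true mean: {true_mean:.2f}")
print(f"coverage of approx. 95% interval: {coverage:.3f}")
print(f"average bias of the mean estimate: {bias:.3f}")
```

If the coverage comes out close to 0.95, that would suggest the Normal likelihood is adequate for the mean specifically, even when it is a poor description of the shape of the data.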
So questions regarding this:
- Is modeling a distribution that is not normal by a normal dist bad practice?
- What biases could/should I expect?
- How does the choice of the RV distribution which we feed our data to affect the probability density space created?
- Is there a way of spotting if you have a mismatch? (I tried using a Uniform as an extreme, but it just gives -inf energy.)
- What is the standard approach to modelling processes with complex/non-standard distributions?
  - It seems like `DensityDist` could be what I want, but what if I don’t know/don’t want to guess at the process causing the distribution?*
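For the mismatch question, one check I can imagine is a predictive check on a statistic the Normal model cannot reproduce, such as skewness. The sketch below is a simplified plug-in version: it fits the Normal by its sample mean and standard deviation rather than drawing from a posterior, so it is only an approximation to what `sample_ppc` would give. The seed and `n_rep` are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Observed data from the same Skew-Normal process as above
observed = stats.skewnorm.rvs(10, loc=5, scale=100, size=100, random_state=rng)

# Plug-in Normal fit (stands in for the posterior of mu and sd)
mu_hat = observed.mean()
sd_hat = observed.std(ddof=1)

# Replicated datasets simulated from the fitted Normal model
n_rep = 5000
replicated = rng.normal(mu_hat, sd_hat, size=(n_rep, observed.size))

# Test statistic: sample skewness, which is near 0 for Normal data
obs_skew = stats.skew(observed)
rep_skew = stats.skew(replicated, axis=1)

# Fraction of replicates at least as skewed as the observed data
p_value = np.mean(rep_skew >= obs_skew)

print(f"observed skewness: {obs_skew:.2f}")
print(f"predictive p-value for skewness: {p_value:.4f}")
```

A p-value near 0 or 1 would indicate that the model cannot reproduce that feature of the data, which is a mismatch in shape even if the estimate of the mean is still fine.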
Here is a gist of plots of the kind of distributions I’m trying to compare expectation values for: https://gist.github.com/DBCerigo/0e45129157698172bfb5fc9ed2ac06d8
Thanks for the advice, hope this is helpful for people in the future!
*Very sorry if that step is the whole point of Bayesian modelling and I have clearly misunderstood a central part of how to use PyMC.