Hey! I’m trying to build a statistical model of the vibration amplitudes a certain conductor experiences. My data consists of measurements over a given period of time, but instead of the individual measured values I only have a histogram of the registered amplitudes. I’ve been following what’s detailed in:
"Fitting a Histogram" and "Fitting a spectra of gaussians".
I implemented the following:
Since the literature on this subject proposes a Weibull distribution for the amplitudes, the density function I fit to the histogram is the following:
import pymc3 as pm
import theano.tensor as tt

def mixture_density(alpha, beta, scalling, x):
    logp = pm.Weibull.dist(alpha, beta).logp(x)
    return scalling * tt.exp(logp)
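Written out, and assuming PyMC3's Weibull parameterization (alpha as the shape, beta as the scale), the curve that mixture_density evaluates at an amplitude x is the Weibull density multiplied by the scale factor s (scalling in the code):

```latex
f(x) = s \, \frac{\alpha}{\beta} \left( \frac{x}{\beta} \right)^{\alpha - 1}
       \exp\!\left[ -\left( \frac{x}{\beta} \right)^{\alpha} \right],
\qquad x \ge 0
```

With s = 1 this is a proper density that integrates to one. Any other value of s only rescales the curve vertically.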
The model I used is:
from typing import Dict
import numpy as np

# amplitudes and tot_cycles are arrays defined elsewhere (assumed here to be
# the histogram's amplitude values and the corresponding bin heights)
with pm.Model() as disp_model:
    alpha = pm.HalfNormal('Alpha', sigma=1., shape=1)
    beta = pm.HalfNormal('Beta', sigma=1., shape=1)
    scalling = pm.HalfNormal('Scale Factor', sigma=2., shape=1)
    noise = pm.HalfNormal('Noise', sigma=1)
    normed_disp = pm.Normal('obs',
                            mixture_density(alpha, beta, scalling, amplitudes),
                            noise,
                            observed=tot_cycles)
    trace: Dict[str, np.ndarray] = pm.sample(draws=4000, chains=4, tune=2000, target_accept=0.92)
Which yields the following posteriors for the different parameters:
And the posterior samples from the density function fit the data quite nicely:
The next step is to sample from this fitted distribution, for which I use:
def theano_weibull_samples(a, b, scale=1, size=None):
    uniform = np.random.uniform(size=size)
    return b * (-tt.log(uniform / scale)) ** (1 / a)

with disp_model:
    amp_samples = pm.Deterministic('amplitudes',
                                   theano_weibull_samples(alpha,
                                                          beta,
                                                          scale=scalling,
                                                          size=24000))
    samples: Dict[str, np.ndarray] = pm.sample_posterior_predictive(trace, samples=1000, var_names=['amplitudes'])
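As a sanity check outside the model, here is a minimal NumPy sketch of plain inverse-transform sampling for a Weibull distribution, with no scale factor involved. The function name weibull_inverse_cdf_samples and the values alpha = 1.5, beta = 2.0 are illustrative assumptions, not fitted results.

```python
import math
import numpy as np

def weibull_inverse_cdf_samples(alpha, beta, size, seed=0):
    """Draw Weibull(alpha, beta) samples by inverting the CDF.

    The CDF is F(x) = 1 - exp(-(x / beta) ** alpha), so solving F(x) = u
    gives x = beta * (-log(1 - u)) ** (1 / alpha).
    """
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=size)  # u lies in [0, 1)
    # 1 - u lies in (0, 1], so the log is finite and -log(1 - u) >= 0
    return beta * (-np.log1p(-u)) ** (1.0 / alpha)

# Hypothetical parameter values, chosen only for illustration
alpha, beta = 1.5, 2.0
samples = weibull_inverse_cdf_samples(alpha, beta, size=200_000)

# Theoretical mean of a Weibull distribution: beta * Gamma(1 + 1 / alpha)
expected_mean = beta * math.gamma(1.0 + 1.0 / alpha)
print(samples.mean(), expected_mean)
```

The two printed values should agree closely, which gives a baseline to compare the in-model samples against.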
My question is the following: am I overfitting the model by including the scalling parameter? When I sample from the distribution using scalling = 1 instead of the sampled variable, I get very similar results. But when I exclude the scalling parameter from the initial inference, the fit looks more like an Exponential distribution and the sampling process yields very strange results:
Which is the best practice?