Hey! I’m trying to build a statistical model of the vibration amplitudes a certain conductor experiences. My data consists of measurements over a given period of time, but instead of the individual measured values I only have the histogram of the registered amplitudes. I’ve been following what’s detailed in:
Fitting a Histogram and Fitting a spectra of gaussians.
I implemented the following:
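Since the literature on this subject proposes a Weibull distribution for the amplitudes, the density I used (built from the Weibull logp) is the following:
import pymc3 as pm
import theano.tensor as tt

def mixture_density(alpha, beta, scalling, x):
    # Weibull log-density at x, exponentiated and rescaled to the histogram counts
    logp = pm.Weibull.dist(alpha, beta).logp(x)
    return scalling * tt.exp(logp)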
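To make explicit what the scalling factor stands for, here is a minimal pure-Python sketch (not the model above). It assumes equal-width bins and hypothetical values for the shape, scale, sample size and bin width, and compares histogram counts with `n * width * pdf(center)`. Under those assumptions the factor that turns a density into expected counts is roughly the number of observations times the bin width.

```python
import math
import random

def weibull_pdf(x, alpha, beta):
    """Weibull density with shape alpha and scale beta (PyMC's parameter order)."""
    if x < 0:
        return 0.0
    z = x / beta
    return (alpha / beta) * z ** (alpha - 1) * math.exp(-(z ** alpha))

def histogram_counts(samples, edges):
    """Count how many samples fall in each equal-width bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    width = edges[1] - edges[0]
    for s in samples:
        i = int((s - edges[0]) / width)
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts

random.seed(0)
alpha, beta = 1.5, 2.0   # hypothetical shape and scale, not fitted values
n = 50_000               # hypothetical number of raw measurements
# random.weibullvariate takes (scale, shape), the reverse of PyMC's order
samples = [random.weibullvariate(beta, alpha) for _ in range(n)]

width = 0.25
edges = [i * width for i in range(41)]          # 40 bins covering [0, 10)
centers = [e + width / 2 for e in edges[:-1]]
counts = histogram_counts(samples, edges)

# Expected count per bin: (number of samples) * (bin width) * density at the bin center
scale_factor = n * width
expected = [scale_factor * weibull_pdf(c, alpha, beta) for c in centers]
```

If the histogram were normalised to unit area beforehand, the same reasoning would put the factor close to 1, which is one way to check whether the extra parameter carries any information beyond the normalisation.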
and the model I used is:
with pm.Model() as disp_model:
    alpha = pm.HalfNormal('Alpha', sigma=1., shape=1)
    beta = pm.HalfNormal('Beta', sigma=1., shape=1)
    scalling = pm.HalfNormal('Scale Factor', sigma=2., shape=1)
    noise = pm.HalfNormal('Noise', sigma=1)
    normed_disp = pm.Normal('obs', mixture_density(alpha, beta, scalling, amplitudes), noise, observed=tot_cycles)
    trace: Dict[str, np.ndarray] = pm.sample(draws=4000, chains=4, tune=2000, target_accept=0.92)
Which yields the following posteriors for the different parameters:
And the posterior samples from the density function fit the data quite nicely:
The next step is to sample from this adjusted distribution for which I use:
def theano_weibull_samples(a, b, scale=1, size=None):
    uniform = np.random.uniform(size=size)
    return b * (-tt.log(uniform / scale)) ** (1 / a)

with disp_model:
    amp_samples = pm.Deterministic('amplitudes', theano_weibull_samples(alpha, beta, scale=scalling, size=24000))
    samples: Dict[str, np.ndarray] = pm.sample_posterior_predictive(trace, samples=1000, var_names=['amplitudes'])
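For reference, the sampler above is based on inverse-transform sampling, which for a Weibull with shape `a` and scale `b` maps a uniform draw `u` to `b * (-log(u)) ** (1 / a)`. The following is a minimal pure-Python sketch with hypothetical parameter values. It checks the sample mean against the theoretical Weibull mean, and also shows what dividing `u` by a factor does to the log argument. This only illustrates the formula and is not a diagnosis of the results reported here.

```python
import math
import random

def weibull_inverse_cdf(u, a, b):
    """Map u in (0, 1] to a Weibull(shape=a, scale=b) draw: b * (-log(u)) ** (1 / a)."""
    return b * (-math.log(u)) ** (1.0 / a)

random.seed(1)
a, b = 1.5, 2.0   # hypothetical shape and scale, not fitted values
# 1 - random.random() lies in (0, 1], which keeps log() away from zero
draws = [weibull_inverse_cdf(1.0 - random.random(), a, b) for _ in range(100_000)]

sample_mean = sum(draws) / len(draws)
theory_mean = b * math.gamma(1.0 + 1.0 / a)   # mean of a Weibull(a, b)

# Dividing u by a factor below 1 can push the log argument above 1, so
# -log(u / scale) turns negative and a fractional power of it is not a real number.
u, small_scale = 0.9, 0.5
base = -math.log(u / small_scale)   # negative, because u / small_scale = 1.8 > 1
```

With a factor above 1 the quantity `-log(u / scale)` equals `-log(u) + log(scale)`, so every draw is shifted upward. In both cases the draws no longer follow a Weibull with parameters `a` and `b`, which suggests keeping the histogram scaling out of the sampling step.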
My question is the following: am I overfitting the model by including the scalling parameter? When I sample from the distribution using scalling = 1 instead of the sampled variable, I get very similar results. When I exclude the scalling parameter from the initial inference, however, I get something more akin to an Exponential distribution, and the sampling process yields very strange results:
Which is the best practice?