Project for learning on pymc site for finding mean temperatures for european cities
with pm.Model(coords=coords) as model: data = pm.ConstantData(“observed_temp”, df_data, dims=(“date”, “city”)) europe_mean = pm.Normal(“europe_mean_temp”, mu=15.0, sigma=3.0) city_offset = pm.Normal(“city_offset”, mu=0.0, sigma=3.0, dims=“city”) city_temperature = pm.Deterministic( “expected_city_temp”, europe_mean + city_offset, dims=“city” ) sigma = pm.Exponential(“sigma”, 1) pm.Normal(“temperature”, mu=city_temperature, sigma=sigma, observed=data, dims=(“date”, “city”)) idata = pm.sample( random_seed=RANDOM_SEED,).
axes = az.plot_trace(idata, var_names=[“europe_mean_temp”, “expected_city_temp”], legend=True);
This is a very elegant method for finding [“europe_mean_temp”, “expected_city_temp”].however I believe that the “europe_mean_temp” which is a very useful and important output of this model but this is not what the posterior samples give or converge to for this variable. Instead I enclose the following explanation:
Calculation of europe_mean_temperature:
(1) city_temperature = (ref_temperature + city_offset),
(2) europe_mean_temperature = {city_temperature(Berlin) + city_temperature(Paris) + city_temperature(San Marino)} / 3,
(3) from (1) and (2), therefore: europe_mean_temperature = ref_temperature + (offset_Berlin + offset_Paris + offset_San_Marino) / 3.
If the sample data is given by:
#Set-up Known Data with wide Pandas format.Equal Samples Sizes
date = pd.date_range(start="2020-05-01", end="2020-07-01")
df = pd.DataFrame({'date':date }).set_index('date')
for city, mu in {"Berlin": 15, "Paris": 16,"San Marino": 20}.items():
df[city] = pm.draw(pm.Normal.dist(mu=mu, sigma=1),len(date))
np.random.seed(4)
df[city]=stats.norm(loc=mu,scale=1).rvs(len(date))
df.index.name = "day"
print('No. of date_measurements =',len(df.index),'\nNumber of_cities =',len(df.columns))
display(df.head())
df.describe().T.round(2)
and the corrected model:
coords = {'mydate':df.index,"mycity": ['Berlin', 'Paris', 'San Marino']}
with pm.Model(coords = coords) as model_1:
data = pm.ConstantData("observed_temp", df, dims=("mydate", "mycity"))
# group priors
ref_temperature = pm.Normal("ref_temperature", mu=15.0, sigma=3.0)
city_offset = pm.Normal("city_offset", mu=0.0, sigma=3.0, dims="mycity")
city_temperature = pm.Deterministic("city_temperature",ref_temperature + city_offset, dims="mycity") #leave in dims='mycity' for plot
sigma = pm.Exponential("sigma", 1)
# Calculation of europe_mean_temperature
offset_Berlin=city_offset[coords['mycity'].index('Berlin')]
offset_Paris=city_offset[coords['mycity'].index('Paris')]
offset_San_Marino=city_offset[coords['mycity'].index('San Marino')]
europe_mean_temperature=pm.Deterministic('europe_mean_temperature',
ref_temperature+(offset_Berlin+offset_Paris+offset_San_Marino)/3)
pm.Normal("temperature", mu=city_temperature, sigma=sigma, observed=data, dims=("mydate", "mycity"))
idata1 = pm.sample(500)
idata1.extend(pm.sample_prior_predictive())
idata1.extend(pm.sample_posterior_predictive(idata1))
az.summary(idata1,kind='stats').rename_axis('Temperature Analysis').round(2)
az.plot_trace(idata1,legend=True)
with the original pymc model the european_mean_temp v city_offset correlation is approx -1. which again raises the suspicion that the european_mean_temp is not the average of the component cities in the model outlined on the pymc website…
As I am new to modelling and to pymc , I put forward this tweek to the pymc introduction site model with great caution and would value your opinion if I am miss-interpreting some aspect of the design. Thank you for your time an expertise on this puzzle. declan