Hi folks -
I’m running a big model that runs out of memory mid-sampling ("killed 9" errors), so I am trying to switch to mcbackend (thanks @michaelosthege for it). However, while getting things working with a minimal model on my machine, the to_inferencedata function comes back empty, even though get_run has values. In short, I could use help getting an example going so I can try it on my big model. My example (a basic analytics model from @fonnesbeck):
import arviz as az
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
import numpy as np
import pandas as pd
import pymc as pm
import pytensor.tensor as pt
import clickhouse_driver
import mcbackend
import warnings
warnings.filterwarnings("ignore", category=UserWarning)
RANDOM_SEED = 42
print(f"Running on PyMC v{pm.__version__}")
baseball_data = pd.read_csv('https://raw.githubusercontent.com/fonnesbeck/hierarchical_models_sports_analytics/main/data/stats_by_player_team.csv')
baseball_data['label'] = baseball_data.batter_name + ' (' + baseball_data.name_abbrev + ') ' + baseball_data.season.astype(str)
baseball_data = baseball_data.rename(columns={'name_abbrev': 'team'})
fitting_subset = baseball_data[baseball_data.season<2023].dropna()
pa, hr = fitting_subset[['pa', 'hr']].astype(int).values.T
coords = {'batter':fitting_subset.label.values}
with pm.Model(coords=coords) as uninformative_prior_model:
    p = pm.Uniform('p', 0, 1, dims='batter')
    y = pm.Binomial('y', n=pa, p=p, observed=hr, dims='batter')
# Create clickhouse backend
ch_client = clickhouse_driver.Client("localhost")
ch_backend = mcbackend.ClickHouseBackend(ch_client)
with uninformative_prior_model:
    pm.sample(draws=100, tune=100, cores=4, chains=4, random_seed=RANDOM_SEED, trace=ch_backend)
This all goes fine in the sense that I can retrieve the run (note that I have no trace object from which to access the rid, as is done in @michaelosthege’s code example: GitHub - pymc-devs/mcbackend: A backend for storing MCMC draws., so I have used get_runs instead):
# Fetch the most recent run from the database
model_run = ch_backend.get_run(ch_backend.get_runs().index[-1])
ch_trace = model_run.to_inferencedata()
and with model_run.get_chains()[0].get_draws('p') I get:
array([[0.31278036, 0.52744033, 0.27108869, ..., 0.48431582, 0.45812994,
        0.29023263],
       [0.31278036, 0.52744033, 0.27108869, ..., 0.48431582, 0.45812994,
        0.29023263],
       [0.08720657, 0.31698537, 0.12071515, ..., 0.14424979, 0.34796451,
        0.13165968],
       ...,
       [0.0802733 , 0.03178383, 0.06493201, ..., 0.02980206, 0.03055315,
        0.01064518],
       [0.07095099, 0.06097259, 0.05025255, ..., 0.03906811, 0.02481613,
        0.01069069],
       [0.08256711, 0.06653404, 0.08217616, ..., 0.01457395, 0.0243532 ,
        0.02938725]])
while with ch_trace.posterior.p I get:
<xarray.DataArray 'p' (chain: 4, draw: 0, batter: 1494)>
array([], shape=(4, 0, 1494), dtype=float64)
Coordinates:
    chain    (chain) int64 0 1 2 3
    draw     (draw) int64
    batter   (batter) <U33 'Pujols, Albert (STL) 2022' ... ...
Indexes: (3)
Attributes: (0)
which has zero draws and an empty array. Any thoughts on what’s going on here? Thanks much,
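
One possible stopgap, since get_draws does return values per chain, would be to stack those per-chain arrays by hand into the (chain, draw, batter) layout that ArviZ expects. The following is only a minimal sketch under assumptions: stack_chain_draws is a hypothetical helper (not part of mcbackend), the random arrays stand in for real draws, and chains are truncated to the shortest one so the result is rectangular. It does not explain why to_inferencedata comes back empty.

```python
import numpy as np


def stack_chain_draws(per_chain_draws):
    """Stack per-chain draw arrays into one (chain, draw, ...) array.

    Hypothetical helper: chains are truncated to the length of the
    shortest chain so that the stacked result is rectangular.
    """
    arrays = [np.asarray(d) for d in per_chain_draws]
    n_draws = min(a.shape[0] for a in arrays)
    return np.stack([a[:n_draws] for a in arrays], axis=0)


# Stand-in data: three fake chains of unequal length, five "batters" each.
rng = np.random.default_rng(42)
fake_chains = [rng.uniform(size=(n, 5)) for n in (200, 200, 198)]

stacked = stack_chain_draws(fake_chains)
print(stacked.shape)  # (3, 198, 5)

# With the run from this post, the input would presumably be built as
#   [chain.get_draws("p") for chain in model_run.get_chains()]
# and the stacked array could then be handed to ArviZ, for example via
#   az.from_dict(posterior={"p": stacked})
# (the exact signature may differ between ArviZ versions).
```

Note that arrays assembled this way are the raw stored draws, so they may still include the tuning iterations, which would need to be sliced off before any analysis.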