I have a dataset with the columns "local" and "Resultado". These are categorical variables: "local" describes whether a soccer team plays at home or away, and "Resultado" whether the team wins, loses, or draws. I'm trying to model "Resultado" to see whether playing at home or away influences the result. This is my code:
# data preparation
df_l_w_d = df[['Equipo', 'local', 'Resultado']]
categories = np.array(['Si', 'No'])
mapeo = {'L': 1, 'W': 2, 'D': 0}
df_l_w_d['Resultado'] = df_l_w_d['Resultado'].map(mapeo)
results = df_l_w_d['Resultado'].values
idx = pd.Categorical(df_l_w_d['local'], categories=categories).codes
k = 3
# model
coords = {"local": categories, "local_flat": categories[idx]}
with pm.Model(coords=coords) as model_local:
    frac = pm.Dirichlet("frac", a=np.ones(k), dims="local_flat")  # prior
    y = pm.Multinomial('y', n=1308, p=frac[idx], observed=results, dims='local_flat')  # likelihood
    idata_local = pm.sample()
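For reference, here is a toy version of the data-preparation step on made-up rows (the team names and values are illustrative, not my real data), showing what `idx` and `results` end up containing:

```python
import numpy as np
import pandas as pd

# made-up rows with the same structure as my real dataframe
toy = pd.DataFrame({
    "Equipo": ["A", "A", "B", "B"],
    "local": ["Si", "No", "Si", "No"],
    "Resultado": ["W", "L", "D", "W"],
})

# same mapping as in my code: L -> 1, W -> 2, D -> 0
mapeo = {"L": 1, "W": 2, "D": 0}
results = toy["Resultado"].map(mapeo).values

# 'Si' -> 0, 'No' -> 1, following the order given in `categories`
categories = np.array(["Si", "No"])
idx = pd.Categorical(toy["local"], categories=categories).codes

print(results)  # [2 1 0 2]
print(idx)      # [0 1 0 1]
```

So `results` holds one integer label per match, and `idx` holds the home/away code for that same match.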
but I get this error:
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
SamplingError                             Traceback (most recent call last)
c:\Users\luis\Downloads\ListaDesplegada\ListaDesplegada\bayesian_analysis.ipynb Celda 48 in <cell line: 2>()
      3 frac=pm.Dirichlet("frac",a=np.ones(k),dims="local_flat")
      4 y=pm.Multinomial('y',n=1308,p=frac[idx],observed=results)
----> 5 idata_local=pm.sample()

File c:\Python310\lib\site-packages\pymc\sampling\mcmc.py:740, in sample(draws, tune, chains, cores, random_seed, progressbar, progressbar_theme, step, var_names, nuts_sampler, initvals, init, jitter_max_retries, n_init, trace, discard_tuned_samples, compute_convergence_checks, keep_warning_stat, return_inferencedata, idata_kwargs, nuts_sampler_kwargs, callback, mp_ctx, model, **kwargs)
    738 ip: dict[str, np.ndarray]
    739 for ip in initial_points:
--> 740     model.check_start_vals(ip)
    741     _check_start_shape(model, ip)
    743 if var_names is not None:

File c:\Python310\lib\site-packages\pymc\model\core.py:1765, in Model.check_start_vals(self, start)
   1762     initial_eval = self.point_logps(point=elem)
   1764     if not all(np.isfinite(v) for v in initial_eval.values()):
-> 1765         raise SamplingError(
   1766             "Initial evaluation of model at starting point failed!\n"
   1767             f"Starting values:\n{elem}\n\n"
   1768             f"Logp initial evaluation results:\n{initial_eval}\n"
   1769             "You can call model.debug() for more details."
   1770         )

SamplingError: Initial evaluation of model at starting point failed!
...
{'frac_simplex__': array([-0.80663028, -0.5792175 ])}
Logp initial evaluation results:
{'frac': -3.04, 'y': -inf}
You can call model.debug() for more details.
This is the output of model.debug():
point={'frac_simplex__': array([-0.03512017, 0.47570545])}
The variable y has the following parameters:
0: 1308 [id A] <Scalar(int16, shape=())>
1: AdvancedSubtensor1 [id B] <Vector(float64, shape=(?,))>
├─ Softmax{axis=0} [id C] <Vector(float64, shape=(3,))> 'frac'
│ └─ Sub [id D] <Vector(float64, shape=(3,))>
│ ├─ Join [id E] <Vector(float64, shape=(3,))>
│ │ ├─ 0 [id F] <Scalar(int8, shape=())>
│ │ ├─ frac_simplex__ [id G] <Vector(float64, shape=(2,))>
│ │ └─ Neg [id H] <Vector(float64, shape=(1,))>
│ │ └─ ExpandDims{axis=0} [id I] <Vector(float64, shape=(1,))>
│ │ └─ Sum{axes=None} [id J] <Scalar(float64, shape=())>
│ │ └─ frac_simplex__ [id G] <Vector(float64, shape=(2,))>
│ └─ ExpandDims{axis=0} [id K] <Vector(float64, shape=(1,))>
│ └─ Max{axis=0} [id L] <Scalar(float64, shape=())> 'max'
│ └─ Join [id E] <Vector(float64, shape=(3,))>
│ └─ ···
└─ [0 1 0 … 1 0 1] [id M] <Vector(uint8, shape=(1308,))>
The parameters evaluate to:
0: 1308
1: [0.3 0.5 0.3 … 0.5 0.3 0.5]
This does not respect one of the following constraints: 0 <= p <= 1, sum(p) = 1, n >= 0
Apply node that caused the error: Check{0 <= p <= 1, sum(p) = 1, n >= 0}(-inf, All{axes=None}.0)
Toposort index: 17
Inputs types: [TensorType(float64, shape=()), TensorType(bool, shape=())]
Inputs shapes: [(), ()]
Inputs strides: [(), ()]
Inputs values: [array(-inf), array(False)]
Outputs clients: [[DeepCopyOp(y_logprob)]]
Backtrace when the node is created (use PyTensor flag traceback__limit=N to make it longer):
  File "c:\Python310\lib\site-packages\pymc\logprob\basic.py", line 611, in transformed_conditional_logp
    temp_logp_terms = conditional_logp(
  File "c:\Python310\lib\site-packages\pymc\logprob\basic.py", line 541, in conditional_logp
    q_logprob_vars = _logprob(
  File "c:\Python310\lib\functools.py", line 889, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "c:\Python310\lib\site-packages\pymc\distributions\distribution.py", line 213, in logp
    return class_logp(value, *dist_params)
  File "c:\Python310\lib\site-packages\pymc\distributions\multivariate.py", line 587, in logp
    return check_parameters(
  File "c:\Python310\lib\site-packages\pymc\distributions\dist_math.py", line 74, in check_parameters
    return CheckParameterValue(msg, can_be_replaced_by_ninf)(expr, all_true_scalar)
  File "c:\Python310\lib\site-packages\pytensor\graph\op.py", line 292, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "c:\Python310\lib\site-packages\pytensor\raise_op.py", line 91, in make_node
    [value.type()],
HINT: Use the PyTensor flag exception_verbosity=high for a debug print-out and storage map footprint of this Apply node.
My data looks like this:
Equipo local Resultado
0 Athletic Club Si D
1 Athletic Club No W
2 Athletic Club Si W
3 Athletic Club No L
4 Athletic Club Si D
… … … …
1308 Real Sociedad No W
1309 Real Sociedad Si D
1310 Real Sociedad No L
1311 Real Sociedad Si W
1312 Real Sociedad No W
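In case it helps to reproduce the issue, the first rows of the table above can be rebuilt as a small stand-in DataFrame (only these five rows, not my full dataset):

```python
import pandas as pd

# first five rows of the table above, as a stand-in for the full dataset
df = pd.DataFrame({
    "Equipo": ["Athletic Club"] * 5,
    "local": ["Si", "No", "Si", "No", "Si"],
    "Resultado": ["D", "W", "W", "L", "D"],
})
print(df)
```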
What can I do? Thanks!