Initial evaluation of model at starting point failed! Multinomial and Dirichlet Prior

I have a dataset with columns "local" and "Resultado". These are categorical variables: "local" describes whether a soccer team plays at home or away, and "Resultado" whether the team won, lost, or drew. I'm trying to model "Resultado" to see whether playing at home or away influences the result. This is my code:

# data preparation
df_l_w_d = df[['Equipo', 'local', 'Resultado']]
categories = np.array(['Si', 'No'])
mapeo = {'L': 1, 'W': 2, 'D': 0}
df_l_w_d['Resultado'] = df_l_w_d['Resultado'].map(mapeo)
results = df_l_w_d['Resultado'].values
idx = pd.Categorical(df_l_w_d['local'], categories=categories).codes

k=3

# model
coords = {"local": categories, "local_flat": categories[idx]}
with pm.Model(coords=coords) as model_local:
    frac = pm.Dirichlet("frac", a=np.ones(k), dims="local_flat")  # prior
    y = pm.Multinomial('y', n=1308, p=frac[idx], observed=results, dims='local_flat')  # likelihood
    idata_local = pm.sample()

but I get this error:
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...


---------------------------------------------------------------------------
SamplingError                             Traceback (most recent call last)
c:\Users\luis\Downloads\ListaDesplegada\ListaDesplegada\bayesian_analysis.ipynb Celda 48 in <cell line: 2>()
      3 frac=pm.Dirichlet("frac",a=np.ones(k),dims="local_flat")
      4 y=pm.Multinomial('y',n=1308,p=frac[idx],observed=results)
----> 5 idata_local=pm.sample()

File c:\Python310\lib\site-packages\pymc\sampling\mcmc.py:740, in sample(draws, tune, chains, cores, random_seed, progressbar, progressbar_theme, step, var_names, nuts_sampler, initvals, init, jitter_max_retries, n_init, trace, discard_tuned_samples, compute_convergence_checks, keep_warning_stat, return_inferencedata, idata_kwargs, nuts_sampler_kwargs, callback, mp_ctx, model, **kwargs)
    738 ip: dict[str, np.ndarray]
    739 for ip in initial_points:
--> 740     model.check_start_vals(ip)
    741     _check_start_shape(model, ip)
    743 if var_names is not None:

File c:\Python310\lib\site-packages\pymc\model\core.py:1765, in Model.check_start_vals(self, start)
   1762 initial_eval = self.point_logps(point=elem)
   1764 if not all(np.isfinite(v) for v in initial_eval.values()):
-> 1765     raise SamplingError(
   1766         "Initial evaluation of model at starting point failed!\n"
   1767         f"Starting values:\n{elem}\n\n"
   1768         f"Logp initial evaluation results:\n{initial_eval}\n"
   1769         "You can call model.debug() for more details."
   1770     )

SamplingError: Initial evaluation of model at starting point failed!

Starting values:
{'frac_simplex__': array([-0.80663028, -0.5792175 ])}

Logp initial evaluation results:
{'frac': -3.04, 'y': -inf}
You can call model.debug() for more details.

This is the debug output:
point={'frac_simplex__': array([-0.03512017, 0.47570545])}

The variable y has the following parameters:
0: 1308 [id A] <Scalar(int16, shape=())>
1: AdvancedSubtensor1 [id B] <Vector(float64, shape=(?,))>
├─ Softmax{axis=0} [id C] <Vector(float64, shape=(3,))> 'frac'
│ └─ Sub [id D] <Vector(float64, shape=(3,))>
│ ├─ Join [id E] <Vector(float64, shape=(3,))>
│ │ ├─ 0 [id F] <Scalar(int8, shape=())>
│ │ ├─ frac_simplex__ [id G] <Vector(float64, shape=(2,))>
│ │ └─ Neg [id H] <Vector(float64, shape=(1,))>
│ │ └─ ExpandDims{axis=0} [id I] <Vector(float64, shape=(1,))>
│ │ └─ Sum{axes=None} [id J] <Scalar(float64, shape=())>
│ │ └─ frac_simplex__ [id G] <Vector(float64, shape=(2,))>
│ └─ ExpandDims{axis=0} [id K] <Vector(float64, shape=(1,))>
│ └─ Max{axis=0} [id L] <Scalar(float64, shape=())> 'max'
│ └─ Join [id E] <Vector(float64, shape=(3,))>
│ └─ ···
└─ [0 1 0 … 1 0 1] [id M] <Vector(uint8, shape=(1308,))>
The parameters evaluate to:
0: 1308
1: [0.3 0.5 0.3 … 0.5 0.3 0.5]
This does not respect one of the following constraints: 0 <= p <= 1, sum(p) = 1, n >= 0

0 <= p <= 1, sum(p) = 1, n >= 0
Apply node that caused the error: Check{0 <= p <= 1, sum(p) = 1, n >= 0}(-inf, All{axes=None}.0)
Toposort index: 17
Inputs types: [TensorType(float64, shape=()), TensorType(bool, shape=())]
Inputs shapes: [(), ()]
Inputs strides: [(), ()]
Inputs values: [array(-inf), array(False)]
Outputs clients: [[DeepCopyOp(y_logprob)]]

Backtrace when the node is created (use PyTensor flag traceback__limit=N to make it longer):
File "c:\Python310\lib\site-packages\pymc\logprob\basic.py", line 611, in transformed_conditional_logp
  temp_logp_terms = conditional_logp(
File "c:\Python310\lib\site-packages\pymc\logprob\basic.py", line 541, in conditional_logp
  q_logprob_vars = _logprob(
File "c:\Python310\lib\functools.py", line 889, in wrapper
  return dispatch(args[0].__class__)(*args, **kw)
File "c:\Python310\lib\site-packages\pymc\distributions\distribution.py", line 213, in logp
  return class_logp(value, *dist_params)
File "c:\Python310\lib\site-packages\pymc\distributions\multivariate.py", line 587, in logp
  return check_parameters(
File "c:\Python310\lib\site-packages\pymc\distributions\dist_math.py", line 74, in check_parameters
  return CheckParameterValue(msg, can_be_replaced_by_ninf)(expr, all_true_scalar)
File "c:\Python310\lib\site-packages\pytensor\graph\op.py", line 292, in __call__
  node = self.make_node(*inputs, **kwargs)
File "c:\Python310\lib\site-packages\pytensor\raise_op.py", line 91, in make_node
  [value.type()],

HINT: Use the PyTensor flag exception_verbosity=high for a debug print-out and storage map footprint of this Apply node.
My data looks like this:

Equipo local Resultado
0 Athletic Club Si D
1 Athletic Club No W
2 Athletic Club Si W
3 Athletic Club No L
4 Athletic Club Si D
… … … …
1308 Real Sociedad No W
1309 Real Sociedad Si D
1310 Real Sociedad No L
1311 Real Sociedad Si W
1312 Real Sociedad No W

What can I do? Thanks!

Most likely the problem is here:

 y = pm.Multinomial('y', n=1308, p=frac[idx], observed=results, dims='local_flat')  # likelihood

frac[idx] should sum to 1; the error message suggests this is violated:

This does not respect one of the following constraints: 0 <= p <= 1, sum(p) = 1, n >= 0

0 <= p <= 1, sum(p) = 1, n >= 0

Also, n should equal the sum of the observed counts. For more applications of the multinomial and the Dirichlet-multinomial you can check the "Dirichlet mixtures of multinomials" example.
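To see the constraint concretely, here is a minimal sketch with made-up counts: a valid Multinomial observation needs p to be a simplex and n to equal the row total.

import numpy as np
import pymc as pm

# made-up counts for one group: 4 wins, 2 losses, 1 draw
obs = np.array([4, 2, 1])
p = np.array([0.5, 0.3, 0.2])  # valid simplex: sums to 1, and n = obs.sum()
print(pm.logp(pm.Multinomial.dist(n=obs.sum(), p=p), obs).eval())  # finite logp
# a p row like [0.3, 0.5, 0.3] (the values in your debug output) sums to 1.1,
# which is exactly what drives the model logp to -inf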


Actually, I'm following that tutorial! Could n be the reason for the error? My dataset has 1312 rows and 3 columns ("Equipo", "local", "Resultado"), while the example in the "Dirichlet mixtures of multinomials" blog post has 10 rows and five columns. The difference between my data and that simulated data is that its values are counts of each tree species in each forest, whereas my dataset has one row per match, recording whether the team played at home or away and whether it won, lost, or drew. In my case, how can I get the total count to put into the model? Could this be the final solution? Thanks for your help.

I can't say for certain whether this is correct without seeing your actual dataset, but if I understand correctly, each row is a single game, right? If so, you need to restructure your data so that each outcome is a column holding a count. I think your data should look something like this (a sketch of how to build it follows the table):

| equipo | local | W | L | D | Total |
|---|---|---|---|---|---|
| Athletic Club | Si | 4 | 2 | 1 | 7 |
| Athletic Club | No | 3 | 1 | 0 | 4 |
| Real Sociedad | Si | 2 | 2 | 1 | 5 |
| Real Sociedad | No | 1 | 3 | 1 | 5 |
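If it helps, a hedged sketch of one way to build such a table from your long-format frame (column names as in your post):

import pandas as pd

# count outcomes per (team, home/away) pair, then add row totals
wide = pd.crosstab([df["Equipo"], df["local"]], df["Resultado"]).reset_index()
wide["Total"] = wide[["W", "L", "D"]].sum(axis=1)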

I think you should start by testing that the model works before adding local to the structure, i.e., group by team and sum the values to get each team's overall record. I think your model might look something like this:

import numpy as np
import pandas as pd
import pymc as pm

data = pd.DataFrame(....)

# Coords
coords = {"outcomes"     : ['W', 'L', 'D'],
          "observations" : np.arange(data.shape[0])}

# PyMC model
with pm.Model(coords=coords) as model:
    frac   = pm.Dirichlet("frac", a=np.ones(len(coords["outcomes"])), dims="outcomes")
    conc   = pm.Lognormal("conc", mu=1, sigma=3)
    counts = pm.DirichletMultinomial("counts", n=data.Total.values, a=frac * conc,
                                     observed=data[['W', 'L', 'D']].values,
                                     dims=("observations", "outcomes"))
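Once data is filled in, a usage sketch (assuming ArviZ is installed) to sample and inspect the posterior:

import arviz as az

with model:
    idata = pm.sample()

# posterior summary of the outcome probabilities and concentration
print(az.summary(idata, var_names=["frac", "conc"]))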

If this is the approach you are going for then you can look to add the local variable as part of the hierarchical structure.

You can use evaluations such as

frac[idx].sum().eval()

to check whether it sums to one. Whether you should sum everything or sum along an axis depends on the structure of the problem, as explained by JAB. If rows are observations and columns are counts, then frac[idx].eval().shape should either match the shape of the data or be a vector with the same number of elements as the number of columns. Similarly, if n=X where X is a scalar, it should equal results.sum(axis=1) for every entry. Instead of hard-coding something like n=1308, people generally put n=results.sum(axis=1) directly (an exception is when n depends on a parameter, for instance).
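For instance, with the names from the original post (illustrative only; .eval() draws frac from its prior here):

print(frac[idx].eval().shape)  # (1308,): one probability per game, not one simplex per row
print(frac[idx].sum().eval())  # nowhere near 1, which is the violated constraint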

Hi JAB! I hope you're doing well. My table has the following structure:

Equipo Local Resultado
Athletic Club No W
Athletic Club No L
Athletic Club No L
Athletic Club No L
Athletic Club No D
Athletic Club Si D
Athletic Club Si W
Athletic Club Si D
Athletic Club Si D
Athletic Club Si W
Here is a sample of my data. The data describes all the matches a soccer club played. The first column, "Equipo", is the name of the team; the next column, "local", tells us whether the team played at home or away; and the last column, "Resultado", says whether the team won, lost, or drew that match. In the first row, for example, Athletic Club played a match, "local" says "No" so they played away that day, and they won. My objective is to study the effect of playing at home or away on the result. I understand that this structure of my data is not useful for my model, but I'm confused because I don't know how to restructure it.

I think that to accomplish what you want, you still need to reshape the data frame. E.g., suppose your data looks like this simulated example:

import numpy as np
import pandas as pd
import pymc as pm

# p is ordered [W, D, L] to match how results are expanded below
# Team A, Local == si; bias towards winning
A_local_si = pm.draw(pm.Multinomial.dist(n=10, p=[.65, .2, .15]))
# Team A, Local == no; weaker bias towards winning
A_local_no = pm.draw(pm.Multinomial.dist(n=10, p=[.55, .15, .3]))
# Team B, Local == si; bias towards losing
B_local_si = pm.draw(pm.Multinomial.dist(n=10, p=[.3, .1, .6]))
# Team B, Local == no; stronger bias towards losing
B_local_no = pm.draw(pm.Multinomial.dist(n=10, p=[.2, .15, .65]))

# Results
results = []
for a in [A_local_si, A_local_no, B_local_si, B_local_no]:
    results.extend(a[0]*["W"] + a[1]*["D"] + a[2]*["L"])

# Data
df = pd.DataFrame({"Equipo": 20*["A"] + 20*["B"], 
                   "Local" : 10*["Si"] + 10*["No"] + 10*["Si"] + 10*["No"],
                   "Resultado": results})

# Factorize teams and outcomes
teams_factorized, team_ids   = pd.factorize(df.Equipo)
outcome_factorized, outcomes = pd.factorize(df.Resultado)
local_factorized, local      = pd.factorize(df.Local)

# Assign to df
df = df.assign(Equipo_factorized    = teams_factorized,
               Resultado_factorized = outcome_factorized,
               Local_factorized     = local_factorized)

Note that the factorize operation is just done to get numeric values. In order to use the Dirichlet-Multinomial you then need to pivot your data frame:

out = (pd.pivot_table(df, 
               values='Resultado_factorized', 
               index=['Equipo', 'Equipo_factorized', 'Local', 'Local_factorized'],
               columns='Resultado',
               aggfunc="count")
        .fillna(0)
        )

# Get rid of multi index
out = out.set_axis(out.columns.to_list(), axis=1).reset_index()

# Total observations. I assigned it but this allows for flexibility
out = out.assign(Total=out[["D","L","W"]].sum(axis=1))
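As a quick sanity check on the reshaped frame: in this simulation every (team, local) pair has exactly 10 games, so every Total should be 10.

print(out[["Equipo", "Local", "W", "D", "L", "Total"]])
assert (out["Total"] == 10).all()  # 10 simulated games per (team, local) pair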

Now you can proceed with the model. I assume you want to learn the underlying probability of each outcome, right? Then you can compare the probability of each outcome given local == "Si" versus local == "No" (a comparison sketch follows the model below).

# Coords
coords = {"resultado"     : outcomes,
          "equipos"       : team_ids,
          "local"         : local,
          "observaciones" : np.arange(out.shape[0])}

# PyMC model
with pm.Model(coords=coords) as model:
    frac   = pm.Dirichlet("frac", a=np.ones(shape=(len(coords["local"]), len(coords["resultado"]))),
                          dims=("local", "resultado"))
    conc   = pm.Lognormal("conc", mu=1, sigma=3)
    # note: the observed column order must match the "resultado" coord
    counts = pm.DirichletMultinomial("counts", n=out.Total.values,
                                     a=conc * frac[out.Local_factorized],
                                     observed=out[outcomes].values,
                                     dims=("observaciones", "resultado"))

pm.model_to_graphviz(model)
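Once it samples, one option (a sketch, not the only way) for the home-vs-away comparison is to contrast the posterior of frac across its local coordinate:

with model:
    idata = pm.sample()

# posterior mean difference in outcome probabilities, home ("Si") minus away ("No")
diff = idata.posterior["frac"].sel(local="Si") - idata.posterior["frac"].sel(local="No")
print(diff.mean(dim=("chain", "draw")))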

The key here is that there are separate probability vectors for home and away. You could add another dimension to capture team-specific probabilities, but the inference does get more difficult:

with pm.Model(coords=coords) as model:
    frac   = pm.Dirichlet("frac", a=np.ones(shape=(len(coords["local"]), len(coords["equipos"]), len(coords["resultado"]))),
                          dims=("local", "equipos", "resultado"))
    conc   = pm.Lognormal("conc", mu=1, sigma=3)
    counts = pm.DirichletMultinomial("counts", n=out.Total.values,
                                     a=conc * frac[out.Local_factorized, out.Equipo_factorized, :],
                                     observed=out[outcomes].values,
                                     dims=("observaciones", "resultado"))
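If this team-level version samples poorly (as I note below for real data), one common knob to try, with no guarantee it fixes things, is a higher target_accept:

with model:
    idata = pm.sample(target_accept=0.95)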

I have been playing with a similar model but with thousands of rows. In my case the inference was OK on simulated data but gets messy on real data.

Just a final note: I don't know whether this is the best solution to the problem, but I do think it at least solves your issue of setting up the problem.

Hopefully this helps.