I am working on a model with 2500 parameters, 50 features, and 424560 samples. I am trying to assess the fit and performance of the model and to diagnose whether it is overfitting or underfitting. I am referring to the Cross-validation FAQ and the Model comparison — PyMC 5.6.0 documentation for my analysis. When I run arviz.loo on my model, I get the results below:
Computed from 1000 posterior samples and 424560 observations log-likelihood matrix.

              Estimate    SE
elpd_loo           nan   nan
p_loo              nan

There has been a warning during the calculation. Please check the results.

Pareto k diagnostic values:
                          Count    Pct.
(-Inf, 0.5]   (good)        714    0.2%
 (0.5, 0.7]   (ok)          292    0.1%
   (0.7, 1]   (bad)         455    0.1%
   (1, Inf)   (very bad) 423099   99.7%
What are the reasons for getting such bad diagnostic values? Does this result mean that 1000 posterior samples are too few, or that the data points are widely spread? Are there any functions in PyMC or ArviZ that report the effective number of parameters? Any help is greatly appreciated.