Question about General Modeling Techniques

jordan.howell2 · May 10, 2022, 11:46am

Is there any reason not to transform the predicted variable (y) into its log form? I’m trying to develop a time series model of Poisson-distributed data, and it seems to fit better when I log-transform y and fit it with a Normal distribution.

But I’m not sure if there are risks doing that…are there risks?

mfansler · May 10, 2022, 4:26pm

“it seems to fit better”

How is that being assessed? Have you ruled out that the data may be over- or under-dispersed? That is, the “betterness” could simply be that the variance is modeled by an additional parameter. Gamma-Poisson is a more typical model for such a case, staying within the discrete support while modeling dispersion.

jordan.howell2 · May 10, 2022, 4:46pm

I am assessing the fit by looking at the posterior predictive samples, in addition to the trace plot and summary stats…

Posterior predictive plots:
- Normal distribution with log(y)
- Poisson distribution with untransformed y
- Normal distribution with untransformed y

Forgive my naivete but how do I check for over/under dispersion? I assume this means the variance but correct me if I’m wrong.

Thank you for the tidbit on the Gamma-Poisson. Is there a tutorial or example that demonstrates it?

mfansler · May 10, 2022, 5:11pm

That looks overdispersed. And yes, it means there is more variance in the data (after conditioning on covariates) than a Poisson, whose variance equals its mean, can support. The Gamma-Poisson is also known as the Negative Binomial. There is a regression example in this notebook.
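A rough way to check for over- or under-dispersion is the ratio of sample variance to sample mean, which should sit near 1 for Poisson data. The sketch below uses simulated counts, and the `dispersion_index` helper and all parameter values are illustrative assumptions, not part of the thread. With covariates in play, the same ratio would need to be computed within groups of similar covariate values rather than on the pooled data.

```python
import numpy as np

def dispersion_index(counts):
    """Sample variance divided by sample mean; about 1 for Poisson counts."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()

rng = np.random.default_rng(0)

# Plain Poisson counts with mean 5: variance should be close to the mean.
poisson_counts = rng.poisson(lam=5.0, size=5000)

# Gamma-Poisson (Negative Binomial) counts: each observation gets its own
# rate drawn from a Gamma with mean 5, which inflates the variance.
rates = rng.gamma(shape=2.0, scale=2.5, size=5000)
gamma_poisson_counts = rng.poisson(rates)

d_poisson = dispersion_index(poisson_counts)
d_gamma_poisson = dispersion_index(gamma_poisson_counts)

print(f"Poisson dispersion index:       {d_poisson:.2f}")
print(f"Gamma-Poisson dispersion index: {d_gamma_poisson:.2f}")
```

A ratio well above 1 points to overdispersion, and a ratio well below 1 to underdispersion.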

jessegrabowski · May 11, 2022, 2:51am

Transformation of the observation variable is super common, especially in the time series domain (where I think you are working). Basically every analysis starts with scaling or de-trending one way or another (you can’t even start doing classical time series analysis until variables have been transformed to be stationary!). Statistically there’s no danger to doing transformations; modeling one way is as valid as modeling another. Just make sure you keep track of all the transformations you do, and make sure you undo them in the right order once it comes time to check your predictions.
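As a minimal sketch of undoing transformations in the right order, assume a series that was logged and then first-differenced. The data values are made up for illustration. The inverse steps run in reverse: undo the differencing first, then the log.

```python
import numpy as np

# Hypothetical strictly positive series (log is undefined at zero counts;
# np.log1p / np.expm1 is a common substitute when zeros are present).
y = np.array([12.0, 15.0, 14.0, 20.0, 26.0])

# Forward transformations, in order: 1) log, 2) first difference.
log_y = np.log(y)
diff_log_y = np.diff(log_y)

# Inverse transformations, in reverse order:
# 2) undo the differencing with a cumulative sum from the first log value,
# 1) then undo the log with exp.
recovered_log = np.concatenate([[log_y[0]], log_y[0] + np.cumsum(diff_log_y)])
recovered = np.exp(recovered_log)

print(recovered)
```

Note that the first logged value has to be stored, because differencing discards the level of the series.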

To the extent there is danger, it comes from 1) how to interpret your results, and 2) how to communicate those results to stakeholders, who might not have any statistics training.

Problem 1 isn’t so bad, especially if all your transformations are reversible.

For Problem 2, you have to be careful and think about what the key points of the analysis are. For example, transforming your sales into logs and using a normal distribution means the coefficients of your model are semi-elasticities (i.e. a 1-unit change in X leads to roughly a 100\beta% change in the predicted mean of y, for small \beta).
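The percentage reading of a log-scale coefficient can be checked numerically. This is a small sketch with a hypothetical `percent_change` helper: the exact multiplicative effect of a unit change in X is exp(\beta), and 100\beta% is only a good approximation when \beta is small.

```python
import numpy as np

def percent_change(beta, delta_x=1.0):
    """Exact percent change in y implied by a log-scale coefficient beta."""
    return 100.0 * (np.exp(beta * delta_x) - 1.0)

# Compare the exact effect with the 100 * beta shortcut.
for beta in (0.01, 0.05, 0.5):
    exact = percent_change(beta)
    approx = 100.0 * beta
    print(f"beta={beta}: exact {exact:.2f}%, approximation {approx:.2f}%")
```

For coefficients around 0.5 the shortcut is off by about 15 percentage points, so reporting the exact figure is safer when talking to a non-technical audience.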

If you instead model the rate parameter of a Poisson, the interpretation is actually similar because you end up taking logs anyway, but it’s common to talk about “rates” in this context, i.e. the rate of sales increases by roughly 100\beta percent given a unit increase in X (since you will model \lambda = \exp(f(X)), so \log \mathbb{E}[y_t] = \log \lambda = f(X)).
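One subtlety when comparing the two approaches: a Poisson model with a log link makes the log of the mean linear in the predictors, while a Normal on log(y) models the mean of log(y). These are not the same quantity. The simulation below is only an illustration with an assumed rate of 30.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated Poisson counts; the rate is large enough that zeros
# (where log is undefined) are effectively never drawn.
lam = 30.0
y = rng.poisson(lam=lam, size=200_000)

log_of_mean = np.log(y.mean())   # what a log-link Poisson model targets
mean_of_log = np.log(y).mean()   # what a Normal on log(y) targets

print(f"log(lambda)  = {np.log(lam):.4f}")
print(f"log(mean(y)) = {log_of_mean:.4f}")
print(f"mean(log(y)) = {mean_of_log:.4f}")
```

By Jensen's inequality the mean of the logs is the smaller of the two, so exponentiating a prediction made on the log scale does not directly give the mean of y.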
