I'm working on a regression task; to simplify things, let's assume the response can be modelled as follows: y = b1*x.
For some reason we have to scale both the input data and the output data. One reason could be, e.g., to construct general priors that work across a wide variety of datasets.
My question is this: assuming y can only take positive integer values and we scale it by, for example, dividing all responses by the maximum of y, does it still make sense to work with, e.g., a Normal or Student-T likelihood? Or should we tweak a Poisson likelihood to act on these values, which are now decimal but still discrete?
When you’ve scaled so that you don’t have integers, then likelihoods that operate on integers won’t work. You can try it yourself with something like
import numpy as np
import pymc as pm

y = np.arange(10)
pm.logp(pm.Poisson.dist(mu=5), y / 4).eval()
You’ll see that you get the same result as
pm.logp(pm.Poisson.dist(mu=5), np.floor(y/4)).eval()
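For comparison, scipy's discrete distributions handle this differently: as far as I know, `rv_discrete` checks `floor(k) == k`, so non-integer values simply get probability zero (log-probability `-inf`) rather than being floored. A quick sketch, assuming scipy is available:

```python
import numpy as np
from scipy import stats

y = np.arange(10)

# Non-integer values get pmf 0, i.e. log-probability -inf;
# only y = 0, 4, 8 give integers after dividing by 4.
logp = stats.poisson.logpmf(y / 4, mu=5)
print(logp)
```

Either way, the discrete likelihood is not doing anything sensible off the integers.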
If I understand correctly, tweaking like you suggest basically amounts to a potentially massive filtering of your data and will most likely end up in tears.
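To see why it amounts to filtering: flooring the scaled responses collapses many distinct counts into the same value, discarding information. With the same toy data:

```python
import numpy as np

y = np.arange(10)

# Ten distinct counts collapse to just three values after
# scaling by 4 and flooring back to integers.
floored = np.floor(y / 4)
print(floored)  # [0. 0. 0. 0. 1. 1. 1. 1. 2. 2.]
```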
So the obvious thing to do is to use a likelihood for continuous data.
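As a minimal sketch of that route (the data and noise scale here are made up for illustration): generate counts from y = b1*x, scale by the maximum, and fit b1 with a Normal likelihood on the scaled values, where everything stays well-defined:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical count data following y = b1 * x with b1 = 3.
x = np.arange(1, 21)
y = rng.poisson(3 * x)

# Scale responses by the maximum; scaling x by the same factor
# keeps the slope b1 unchanged under y = b1 * x.
y_scaled = y / y.max()
x_scaled = x / y.max()

# Least-squares estimate of b1 (the MLE under a Normal likelihood),
# evaluated on the continuous scaled data.
b1_hat = np.sum(x_scaled * y_scaled) / np.sum(x_scaled**2)
loglik = stats.norm.logpdf(y_scaled, loc=b1_hat * x_scaled, scale=0.05).sum()
print(b1_hat, np.isfinite(loglik))
```

The Normal log-likelihood is finite everywhere on the scaled data, and the slope estimate recovers something close to the true b1.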