Declare an internal Poisson distribution inside a custom distribution failes

bhomass · April 23, 2018, 5:45am

I defined a custom distribution PageView which has an internal Poisson distribution

    class PageView(Continuous):
        def __init__(self, mu=None, observed=None, *args, **kwargs):
            super(PageView, self).__init__(*args, **kwargs)
            self.mu = mu = tt.as_tensor_variable(mu)
            self.mode = tt.floor(mu).astype('int32')
            self.poisson = pm.Poisson("ipoisson", mu = mu, observed=observed)

I call it from a client program
pageview = PageView("pv", mu=lambda_tilda0, observed=y_train)

But by the time observed reach the Poisson line, observed is already removed from the cue. And the Poisson call crashes with

    "can't turn {} and {} into a dict. {}".format(args, kwargs, e))
TypeError: can't turn [TensorConstant{[[[0.]
  [..]
  [0.]]]}] and {} into a dict. TensorType does not support iteration. Maybe you are using builtin.sum instead of theano.tensor.sum? (Maybe .max?)

If I don’t pass observed to Poisson, it crashes the same way.

What is the proper way to do this?

junpenglao · April 23, 2018, 5:55am

I dont think you can create a distribution to accept observed this way, as the observed property is associated with a pymc3 random variable, and the observed evaluated on the logp function of a distribution.

If you want to use other distribution from a custom distribution, you need to write them into the logp function.

bhomass · April 23, 2018, 6:05am

I tried that and still the same. The code is very simple

y_train = read_dataset()

def build_model(value):

    def logp(value):
        ipoisson = pm.Poisson("ipoisson", mu = mu)
        return ipoisson.logp(value)
    
    # model
    with pm.Model() as model:
        lambdas = pm.Normal("lambdas", mu=0.0, sd=0.6)
        lambda_tilda0 = np.exp(lambdas)      # must tbe positive
        
        mu=lambda_tilda0
        
        pm.DensityDist('y', logp, observed={'value': value})
        
    return model

def run(n_samples=3000):
    model = build_model(y_train)
    with model:
        trace = pm.sample(n_samples, tune=10000) #, step=step)
    return trace

if __name__ == "__main__":
    run()

Appreciate any advice.

junpenglao · April 23, 2018, 6:34am

I am confused - why not use the Poisson directly?

bhomass · April 23, 2018, 10:04am

Well don’t be. This is only the beginning of a much bigger problem. Once I get thru this babay step I will add the rest one piece at a time. So please help out. I am stuck at this beginner step

junpenglao · April 23, 2018, 12:29pm

It’s a bit difficult to give you advice without knowing what you are trying to achieve. But at least if you are trying to make the above code running without error you can rewrite the logp as:

def logp(value):
    ipoisson = pm.Poisson.dist(mu = mu)
    return ipoisson.logp(value)

bhomass · April 23, 2018, 5:47pm

it seems the only thing you did is remove the name argument, but if I did that, I get

ipoisson = pm.Poisson(mu = mu)

TypeError: new() missing 1 required positional argument: ‘name’

junpenglao · April 23, 2018, 6:04pm

Nope, what i did was pm.Poisson .dist (mu = mu), which allows you to access the logp function without creating a name node in the model.

bhomass · April 23, 2018, 10:47pm

Thank you. my bad for missing that. I have moved past this point now.

bhomass · April 28, 2018, 2:49am

Back to this. As I add more things to this model, things begin to fail in a serious way.

Basically, I want a custom distribution

pm.DensityDist('lh', logp, observed={'visits': visits, 'x_logits' : N_x_logit, 'mu':mu})

The dictionary elements in observed are the only way to pass variables to logp. In my case, I need to pass some numpy arrays, and others are pm variables. eg. mu is derived from pm.Normal, followed by some tt operations.

inside logp(visits, x_logits, mu), I have some pm and tt operations that uses the numpy array visits.

v_value = tt.tile(visits, [1, 1, S])
ipoisson = pm.Poisson.dist(mu = mu, shape=(2,))
log_prob_v = ipoisson.logp(v_value)

prob_v = tt.exp(log_prob_v)

in the last statement, it blows up at runtime with

Expected an array-like object, but found a Variable: maybe you are trying to call a function on a (possibly shared) variable instead of a numeric array?

This series of statements all starts with a real numpy array: visits. Why would it complain about finding a Variable? it should have real values to work with in runtime.

junpenglao · April 28, 2018, 6:59am

I cannot see any obvious error (you are not returning prob_v directly right?). maybe it would be easier if you can share the code with stimulated data.

bhomass · April 28, 2018, 10:40pm

I change the code around to look like arma_example.py. Now getting 0% in training session. But, at least its running

junpenglao · April 29, 2018, 6:59am

In my experience, many theano.scan application could be rewrite into matrix operation, which makes the computation much faster and debugging much easier. Have a look at this example for some inspiration:
http://docs.pymc.io/notebooks/PyMC3_tips_and_heuristic.html

bhomass · April 29, 2018, 7:08am

Thanks. How should I check into this 0% result every iteration in NUTS?

junpenglao · April 29, 2018, 7:24am

I dont quite understand what you meant by 0% result.

bhomass · April 29, 2018, 6:48pm

sorry. this is the output

0%| | 0/13000 [00:00<?, ?it/s]
0%| | 1/13000 [00:02<9:59:27, 2.77s/it]
0%| | 2/13000 [00:03<5:30:53, 1.53s/it]
0%| | 3/13000 [00:04<5:02:13, 1.40s/it]
0%| | 4/13000 [00:05<4:47:37, 1.33s/it]
0%| | 5/13000 [00:06<4:38:57, 1.29s/it]
0%| | 6/13000 [00:11<6:39:07, 1.84s/it]
0%| | 7/13000 [00:11<5:59:58, 1.66s/it]
0%| | 8/13000 [00:16<7:23:14, 2.05s/it]
0%| | 9/13000 [00:18<7:32:36, 2.09s/it]
0%| | 10/13000 [00:21<7:41:45, 2.13s/it]
0%| | 11/13000 [00:22<7:24:03, 2.05s/it]

junpenglao · April 29, 2018, 7:01pm

Oh the sampler is running, it just it is super inefficient - the 0% is the percentage of planned samples had been drawn, which is close to 0.
Again:

bhomass · April 29, 2018, 7:15pm

I have a kind of a time series model, where computation on one period depends on results from the previous period, so I don’t think a matrix would help.

I was just looking for advice how to dig into this inefficiency. like printing out the log likelihood, examining sampled trace, things like that.

junpenglao · April 29, 2018, 7:31pm

There is a profiling option you might find useful: http://docs.pymc.io/notebooks/profiling.html

bhomass · April 29, 2018, 10:36pm

I feed the loop index in the scan call
outputs, updates = theano.scan(fn=body, outputs_info=[combined_prob], sequences=tt.arange(prob_v.shape[1]), non_sequences=[prob_v, N_x_logit] )
When I printed out the loop index, and it appears it is scanning forward then backward,

this is i: str = 15
this is i: str = 14
this is i: str = 13
this is i: str = 12
this is i: str = 11
this is i: str = 10
this is i: str = 9
this is i: str = 8
this is i: str = 7
this is i: str = 6
this is i: str = 5
this is i: str = 4
this is i: str = 3
this is i: str = 2
this is i: str = 1
this is i: str = 0
this is i: str = 0
this is i: str = 1
this is i: str = 2
this is i: str = 3
this is i: str = 4
this is i: str = 5
this is i: str = 6
this is i: str = 7
this is i: str = 8
this is i: str = 9
this is i: str = 10
this is i: str = 11
this is i: str = 12
this is i: str = 13
this is i: str = 14
this is i: str = 15

That’s not how the model is meant to run. Is it true that scan order is oscillating between incrementing and decrementing the sequence index?

Can I fix it to only go forward? or have another loop index I can control?

I tried
[new_index, outputs], updates = theano.scan(fn=body, outputs_info=[index, combined_prob], non_sequences=[prob_v, N_x_logit], n_steps=T )

but, it kept insisting I enter None.
ValueError: Please provide None as outputs_info for any output that does not feed back into scan (i.e. it behaves like a map)
I need to set the starting index value to 0, so I can’t enter None

Topic		Replies	Views
Question About Custom Distribution : the Generalized Poisson v5 bug , modeling	2	320	July 9, 2023
Custom discrete distribution for zero truncated Poisson Questions	10	2569	December 19, 2017
Drawing values from custom distribution Questions	8	990	September 14, 2022
Using CustomDist as a stochastic variable - causing TensorType error v5 pytensor	6	347	December 11, 2023
Custom distribution for Truncated Gamma v3 theano , modeling	8	898	August 24, 2024

Declare an internal Poisson distribution inside a custom distribution failes

Related topics