# Perplexity of a model

I noticed that in some papers people determine whether the model converged based on perplexity on some data set aside earlier for testing. Is there a way to compute the perplexity of a model with the new OPVI API? As far as I understand, to get the perplexity we need to compute the values of `log(q)` of the current variational approximation `q` on user-provided data.
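For concreteness, the quantity I mean is just the exponential of the negative average held-out log-likelihood. A minimal numpy sketch, assuming we already have per-observation `log q` values on the test data (the function name is mine, not a PyMC3 API):

```python
import numpy as np

def perplexity(logq):
    """Perplexity from per-observation log-probabilities (natural log):
    exp of the negative average log-likelihood, i.e. exp(cross-entropy)."""
    logq = np.asarray(logq, dtype=float)
    return float(np.exp(-np.mean(logq)))

# Sanity check: a uniform distribution over 4 outcomes has
# log q = log(1/4) everywhere, so its perplexity is exactly 4.
print(perplexity(np.log([0.25, 0.25, 0.25, 0.25])))  # → 4.0
```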

I had a look at the source code and it seems it would amount to evaluating `logq` (or `norm_logq`, I’m not sure at this point), but the complexity of the OPVI API is beyond me for now and I have no idea how to supply user data to these functions. Maybe I’m looking in the wrong place and there is an easier way around it.

Long term I hope to implement a perplexity-based convergence test as a callback.
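Roughly, I imagine something like the sketch below. The three-argument call signature is modeled on PyMC3's VI callbacks (`callback(approx, loss_hist, i)`), and `score_fn` is a hypothetical compiled function returning the test-set perplexity; neither the class nor its behavior exists in PyMC3 yet:

```python
class PerplexityConvergence:
    """Hypothetical callback: stop VI when held-out perplexity stabilizes."""

    def __init__(self, score_fn, every=100, tolerance=1e-3):
        self.score_fn = score_fn      # compiled function returning perplexity
        self.every = every            # check only every `every` iterations
        self.tolerance = tolerance    # relative-change threshold
        self.prev = None

    def __call__(self, approx, loss_hist, i):
        if i % self.every:
            return
        score = self.score_fn()
        if self.prev is not None:
            rel_change = abs(score - self.prev) / abs(self.prev)
            if rel_change < self.tolerance:
                raise StopIteration('perplexity converged at iteration %d' % i)
        self.prev = score
```

Raising `StopIteration` is how PyMC3's built-in convergence callback halts `fit`, so the sketch follows the same convention.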

Are other techniques for evaluating convergence, apart from simply looking at the average loss, available in PyMC3? For example, the Wikipedia article also mentions using cross-entropy as the test function.

## Replies:

junpenglao: You can find the convergence evaluation callback for VI in PyMC3 here https://github.com/pymc-devs/pymc3/blob/master/pymc3/variational/callbacks.py#L31
So far we are doing it by looking at the change of the loss in a given window.

As for evaluating the values of `log(q)` maybe @ferrine knows a quick way to do that.

pwl: @junpenglao Doesn’t that callback simply track a change in the norm of the overall parameter vector of the approximations `q_i` (so the means and stdevs of the transformed probability distributions) instead of looking at the loss? In particular, it completely ignores `loss` in its `__call__` definition.
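For readers following along, the idea behind that parameter-based check can be sketched in a few lines of numpy. This mimics the windowed relative-change test on the flattened parameter vector, not the exact PyMC3 code:

```python
import numpy as np

def relative_parameter_change(prev, curr, eps=1e-10):
    """Norm of the parameter update, relative to the current parameter norm."""
    prev = np.asarray(prev, dtype=float)
    curr = np.asarray(curr, dtype=float)
    return np.linalg.norm(curr - prev) / (np.linalg.norm(curr) + eps)

# Two nearly identical parameter vectors give a tiny relative change,
# which a windowed convergence callback would read as "converged".
a = np.array([1.0, 2.0, 3.0])
b = a + 1e-6
print(relative_parameter_change(a, b) < 1e-3)  # → True
```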

ferrine: @pwl Yes, we are currently looking at the parameters, as that works for both parametric and non-parametric gradient descent.
I hope I correctly understood the main idea of the proposed method. Evaluating the likelihood for the approximate model is quite easy. You can access `approx.model.observed_RVs` in the callback and, on the first iteration, compile a function using `approx_lik = approx.apply_replacements(likelihood)`. Note that the likelihood there is scaled to the full dataset when using minibatches.

pwl: @ferrine thanks for the tip. How do I access `likelihood`? Is it stored somewhere in `approx`?

ferrine: I think not yet. But you can make it

```python
# first, create a tensor; it can be any expression that uses pymc3 random nodes
# in our case it is the likelihood
likelihood = tt.sum([v.logpt for v in approx.model.observed_RVs])
# to evaluate it under the approximate posterior, replace the random nodes
# with the approximation; for a single sample it is
approx_lik = approx.apply_replacements(likelihood)
# for many samples
approx_liks = approx.sample_node(likelihood, n_samples)
# then take the expectation under the posterior
marginal_lik = approx_liks.mean(0)
# finally, the function can be compiled
marginal_lik_fn = theano.function([], marginal_lik)
```

Wow, nice, it will be great to continue the discussion here.

Great! Thanks for setting this up @junpenglao!

@ferrine Thanks for a thorough answer. One thing I still don’t understand is how I can supply a custom data point for the `likelihood` to be evaluated at. I guess instead of computing the `mean(0)` I would have to evaluate each `logpt` at some point, but I will take a closer look at that tomorrow.

The data point where the likelihood is evaluated can be a shared variable or just a symbolic tensor. There is no way to change it except by using `theano.clone` while keeping a reference to your data. As a shortcut you can use `approx.sample_node(likelihood, n_samples, more_replacements={old_data_tensor: new_data_tensor})`.
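The shared-variable pattern is worth spelling out: the compiled function closes over a mutable container, so swapping the data in place changes what later calls compute, without recompiling. A toy pure-Python sketch of that idea (the `Shared` class and `compile_mean_logq` are illustrative stand-ins, not Theano/PyMC3 API):

```python
import numpy as np

class Shared:
    """Toy stand-in for a theano shared variable: a mutable value holder."""
    def __init__(self, value):
        self.value = np.asarray(value, dtype=float)

    def set_value(self, value):
        self.value = np.asarray(value, dtype=float)

def compile_mean_logq(data, mu=0.0, sigma=1.0):
    """"Compiles" a function that evaluates the mean standard-normal
    log-density on whatever `data` currently holds, the way a
    theano.function would keep reading a shared variable."""
    def mean_logq():
        x = data.value
        return float(np.mean(-0.5 * np.log(2 * np.pi * sigma ** 2)
                             - (x - mu) ** 2 / (2 * sigma ** 2)))
    return mean_logq

train = Shared([0.0, 1.0])
f = compile_mean_logq(train)
before = f()
train.set_value([10.0, 11.0])  # swap in the test data in place
after = f()                    # same "compiled" function, new data
```

Under the real API, `train.set_value(test_data)` on a `theano.shared` container plays exactly this role.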

@ferrine I am playing around with the new functionality of `sample_node` and `apply_replacements` and I’m slowly getting a hold of it (mostly thanks to your recent blog post, having working examples to build on really helps). I still have some questions though, as some things are not entirely clear to me yet.

Why are there two functions, `sample_node` and `apply_replacements`, if `sample_node` with `size=1` seems to do the same thing as `apply_replacements`?

When running a model with a `Tracker` callback involving `sample_node` or `apply_replacements` I’m getting much higher CPU usage than without the callback (all 8 cores at 100%). Is that expected? Why?

More computations are done. That’s why you get the CPU load.

As for the two methods: the shapes differ. Moreover, `sample_node` internally uses `scan`, while `apply_replacements` does not. I agree that this is not the best design and it would be better to wrap them up in a single function.

Maybe I was not entirely clear about the CPU load. I’m doing inference on a GPU, with usually just one core active, so I was surprised to see such a dramatic increase in CPU usage. But that was my mistake: I had not wrapped the test data set in `theano.shared`, so those computations were done on the CPU instead of the GPU (as far as I understand it). It’s now working much better with the shared arrays.

When running

```python
with pm.Model():
    val = pm.Poisson.dist(
        mu=pm.Gamma('U', 1, 1, shape=3)).logp(theano.shared(np.arange(3)))
    inference.approx.sample_node(val)
```

I’m getting a `MethodNotDefined` error. Am I doing something wrong here, or is it a bug? The above snippet also works perfectly fine if I remove `theano.shared` and use the numpy array directly.

To be clear, I’m using these flags:

```python
os.environ['THEANO_FLAGS'] = 'device=cuda,floatX=float32,exception_verbosity=high'
```

Are you on master? The above code works fine for me.

Yes, I am on the master (in particular https://github.com/pymc-devs/pymc3/issues/2358 works for me already).

Oh I see your edit. It must be a GPU issue then…

I posted an issue about that.


Continuing on `sample_node` related errors:

```python
data = np.random.randint(3, size=(2, 2)).astype(np.float32)
with pm.Model():
    a = pm.Beta('a', 1, 1, shape=2)
    b = pm.Beta('b', 1, 1, shape=2)
    psi = theano.tensor.outer(a, b)
    theta = pm.Gamma('g', 1, 1, shape=(2, 2))
    data = pm.ZeroInflatedPoisson('data', theta=theta, psi=psi, observed=data)

    val = inference.approx.sample_node(
        pm.ZeroInflatedPoisson.dist(theta=theta, psi=psi).logp(data), size=1)
    inference.fit(n=2,
                  callbacks=[pm.callbacks.Tracker(val=val.eval)])
```

yields an `AttributeError` but trains fine if I remove the callbacks. I tried replacing `outer` with explicit multiplication, but then it complains with `AttributeError: mul.0 has no test value`. It does not happen for scalar variables, though (that is, if `a` and `b` are scalars).

Are you able to reproduce the problem on master?

Yes, I’m using the master branch (updated a few hours ago).

You’d better define the tracker before calling `fit`.

I see, the problem is hidden in missing test values somewhere in `logp`. That’s strange. I feel more worried about the context-dependent flag change.

One more thing: it only seems to happen for `ZeroInflatedPoisson.dist(theta=theta, psi=psi)`; I tried using `Poisson.dist(mu=psi)` without any issues. And there is no difference when defining the tracker outside of the call to `fit`, if that’s what you meant.