Perplexity of a model

junpenglao · June 13, 2017, 1:19pm

I noticed that in some papers people determine whether the model converged based on perplexity on some data set aside earlier for testing. Is there a way to compute the perplexity of a model with the new OPVI API? As far as I understand, to get the perplexity we need to compute the values of log(q) of the current variational approximation q on user-provided data.

I had a look at the source code and it seems it would amount to evaluating logq (or norm_logq, I’m not sure at this point) but the complexity of the OPVI API is beyond me at this point and I have no idea how to supply user data to these functions. Maybe I’m looking in the wrong place and there is an easier way around it.

Long term I hope to implement a perplexity-based convergence test as a callback.

Are the other techniques of evaluating the convergence, apart from simply looking at average loss, available in pymc3? For example, the Wikipedia article also mentions using cross-entropy as the test function.

junpenglao · June 13, 2017, 1:24pm

Replies:

junpenglao: You can find the convergence evaluation callback for VI in PyMC3 here https://github.com/pymc-devs/pymc3/blob/master/pymc3/variational/callbacks.py#L31
So far we are doing it in by looking at the change of loss in a given window.

As for evaluating the values of log(q) maybe @ferrine knows a quick way to do that.

–
pwl: @junpenglao Doesn’t that callback simply look track a change in the norm of the overall parameter vector of approximations q_i (so means and stdevs of transformed probability distributions) instead looking at the loss? In particular, it is completely ignoring loss in its __call__ definition.

–
ferrine: @pwl Yes, We are currently looking at parameters as it works for both parametric gradient and non parametric gradient descent.
I hope I correctly understood the main idea of the proposed method. Evaluating likelihood for approximate model is quite easy. You can get access to approx.model.observed_RVs in callback and in the first iteration compile a function using approx_lik = approx.apply_replacements(likelihood). Note that likelihood is scaled to the full dataset there if using minibatches

–
pwl: @ferrine thanks for the tip. How do I access likelihood? Is it stored somewhere in approx?

–
ferrine: I think not yet. But you can make it


# first, you create a tensor, that can be any expression that uses pymc3 random nodes
# in our case it is likelihood
likelihood = tt.sum([v.logpt for v in approx.model.observed_RVs])
# to evaluate it under approximate posterior you can replace random nodes with approximation
# for the only sample it is
approx_lik = approx.apply_replacements(likelihood)
# for many samples
approx_liks = approx.sample_node(likelihood, n_samples)
# then you can teke expectation under posterior.
marginal_lik = approx_liks.mean(0)
# after all, function can be compiled
marginal_lik_fn = theano.function([], marginal_lik)

ferrine · June 13, 2017, 1:36pm

Wow, nice, it will be great to continue discussion here

pwl · June 13, 2017, 2:04pm

Great! Thanks for setting this up @junpenglao!

@ferrine Thanks for a thorough answer. One thing I still don’t understand how can I supply custom data point for the likelihood to be evaluated at. I guess instead of computing the mean(0) I would have to evaluate each logpt at some point but I will take a closer look at that tomorrow.

ferrine · June 13, 2017, 4:12pm

data point where likelihood is evaluated can be a shared variable or just a symbolic tensor. There is no way to change it except using theano.clone and having a reference to your data. As a shortcut you can use approx.sample_node(likelihood, n_samples, more_replacements={old_data_tensor:new_data_tensor})

pwl · June 27, 2017, 7:41am

@ferrine I am playing around with the new functionality of sample_node and apply_replacements and I’m slowly getting a hold of it (mostly thanks to your recent blog post, having working examples to build on really helps). I still have some questions though, as some things are not entirely clear to me yet.

Why are there two functions sample_node and apply_replacements if the sample_node with size=1 seems to do the same thing as apply_replacements?

When running a model with the Tracker callback involving sample_node or apply_replacements I’m getting much more cpu usage then without the callback (all 8 cores at 100%). Is that expected? Why?

ferrine · June 27, 2017, 10:18am

More computations are done. That’s why you get cpuload

ferrine · June 27, 2017, 10:22am

As for 2 methods: shapes differ. Moreover internally scan is behind sample_node while apply_replacements does not use them. I agree that is not the best design and it’s better to wrap up in single function

pwl · June 27, 2017, 10:49am

Maybe I was not entirely clear with the CPU load. I’m doing inference on a GPU with usually just one core active so I was surprised to see such a dramatic increase in CPU usage. But that was my mistake: I have not wrapped the test data set with the theano.shared so these computations were done on a CPU instead on a GPU (as far as I understand that). It’s now working much better with the shared arrays.

pwl · June 28, 2017, 8:17am

When running

with pm.Model():
    val = pm.Poisson.dist(
        mu = pm.Gamma('U',1,1,shape=3)).logp(theano.shared(np.arange(3)))
    inference = pm.ADVI()
    inference.approx.sample_node(val)

I’m getting a MethodNotDefined erorr. Am I doing something wrong here or is it a bug? Also the above snippet works perfectly fine if I remove theano.shared and use the numpy array directly.

To be clear, I’m using these flags:

os.environ['THEANO_FLAGS'] = 'device=cuda,floatX=float32,exception_verbosity=high'

junpenglao · June 28, 2017, 8:33am

Are you on master? The above code works fine for me.

pwl · June 28, 2017, 9:09am

Yes, I am on the master (in particular https://github.com/pymc-devs/pymc3/issues/2358 works for me already).

junpenglao · June 28, 2017, 9:13am

Oh I see your edit. It must be a GPU issue then…

pwl · June 28, 2017, 9:42am

I posted an issue about that.

pwl · June 28, 2017, 11:44am

Continuing on sample_node related errors:

data = np.random.randint(3,size=(2,2)).astype(np.float32)
with pm.Model():
    a = pm.Beta('a',1,1,shape=2)
    b = pm.Beta('b',1,1,shape=2)
    psi = theano.tensor.outer(a,b)
    theta = pm.Gamma('g',1,1,shape=(2,2))
    data = pm.ZeroInflatedPoisson('data',theta=theta,psi=psi,observed=data)

    inference = pm.ADVI()
    val = inference.approx.sample_node(
        pm.ZeroInflatedPoisson.dist(theta=theta,psi=psi).logp(data),size=1)
    inference.fit(n=2,
                  callbacks=[pm.callbacks.Tracker(val=val.eval)])

yields AttributeError but trains fine if I remove the callbacks. I tried replacing outer with explicit multiplication but then it complains that AttributeError: mul.0 has no test value. It does not happen for scalar variables though (that is if a and b are scalars).

ferrine · June 28, 2017, 12:25pm

Are you able to reproduce the problem on master?

pwl · June 28, 2017, 12:33pm

Yes, I’m using the master branch (updated a few hours ago).

ferrine · June 28, 2017, 2:55pm

You’d better define tracker before calling fit

ferrine · June 28, 2017, 3:01pm

I see the problem is hidden in missing test_values somewhere in logp. That’s strange. I feel more worried about context dependent flag change

pwl · June 29, 2017, 7:26am

One more thing, it only seems to happen only for ZeroInflatedPoisson.dist(theta=theta,psi=psi), I tried using Poisson.dist(mu=psi) without any issues. And there is no difference between defining tracker outside of the call to fit, if that’s what you meant.

Topic		Replies	Views
Regarding likelihood evaluation in case of a blackbox likelihood Questions theano	4	741	October 9, 2020
Evaluating Variational Inference: Need Access to fitted (AD)VI approximation density Questions	7	1393	November 15, 2018
`sample_node` no longer working? Questions	10	730	September 11, 2017
Collecting ELBO from ADVI Questions	14	1468	April 5, 2018
How to troubleshoot sampling error with custom, not differentiable likelihood? v3	13	1621	October 20, 2022

Perplexity of a model

Replies:

Related topics