Possible extensions of total_size

junpenglao · June 13, 2017, 1:58pm

I have some trouble understanding the total_size parameter, and there is no mention of it in the docs. From the examples I have seen so far, the parameter should be set to the total size of the training data when doing mini-batch training. This is simple to interpret if the data is just a 1d array, but what should I put there if I’m training a model on views of 2d data (subsampling in both dimensions)? Should I then use total_size=data.shape[0]*data.shape[1]?

From the gitter conversation it seems that total_size for subsampling in more than one dimension needs some extra support on the pymc3 side. @ferrine suggested the following API:

total_size = int # for shape[0] subsampling
total_size = [int, None, int] # for subsampling [shape[0], shape[2]]
total_size = [int, Ellipsis, int] # for subsampling [shape[0], shape[-1]]

In my 2d case I would write total_size = [data.shape[0], data.shape[1]].
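
To make the proposal concrete, here is a rough sketch in plain Python of how such a total_size spec could be turned into a scaling factor for the minibatch log-likelihood. It assumes the factor is the product of total/batch over every subsampled dimension, which is my reading of the suggestion rather than confirmed pymc3 behaviour. The function name minibatch_scale is hypothetical and is not part of pymc3.

```python
from math import prod  # requires Python 3.8+


def minibatch_scale(total_size, batch_shape):
    """Hypothetical helper (not a pymc3 function).

    Returns the factor by which a minibatch log-likelihood would be
    multiplied, ASSUMING the factor is the product of
    total_size[i] / batch_shape[i] over all subsampled dimensions.
    """
    if total_size is None:
        return 1.0  # no subsampling declared, no rescaling
    if isinstance(total_size, int):
        total_size = [total_size]  # an int means "shape[0] only"
    total_size = list(total_size)

    # Split the spec around a single Ellipsis, if one is present.
    if Ellipsis in total_size:
        i = total_size.index(Ellipsis)
        begin, end = total_size[:i], total_size[i + 1:]
        if Ellipsis in end:
            raise ValueError("only one Ellipsis is allowed in total_size")
    else:
        begin, end = total_size, []
    if len(begin) + len(end) > len(batch_shape):
        raise ValueError("total_size has more entries than the batch has dims")

    # Leading entries pair with leading dims, trailing entries with trailing dims.
    pairs = list(zip(begin, batch_shape[:len(begin)]))
    if end:
        pairs += list(zip(end, batch_shape[-len(end):]))

    # None marks a dimension that is not subsampled, so it is skipped.
    return prod((t / b for t, b in pairs if t is not None), start=1.0)


# 2d case from the question: full data is 200 x 300, each batch is 20 x 30.
scale_2d = minibatch_scale([200, 300], (20, 30))
```

Under that assumption, a batch that takes a tenth of the rows and a tenth of the columns gets its log-likelihood multiplied by 10 * 10. The factor depends on both dimensions separately, which is why a single integer cannot describe the 2d case without also knowing the batch shape.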

ferrine · June 13, 2017, 2:17pm

The most robust way is to use total_size=data.shape.

ferrine · June 13, 2017, 2:19pm

It will work fine for all types of RVs except MultiObserved RVs; there I rely on logp_elemwice.shape.
