Pickling trace object

gaddamanil16 · September 6, 2018, 1:04am

Hi,

I am trying to save a trace object by pickling it. Writing it produced a huge file, around 13 GB. Now when I try to unpickle it, it throws an EOF error. Has anyone tried pickling a trace?

My code is:

import pickle

# To save
with open('/raid60/anil.gaddam/trace_advi.pkl', 'wb') as buff:
    pickle.dump({'model': model, 'trace': tr}, buff)

# To load
with open('E:/Masters Project/dataset/results/LDA/trace_advi.pkl', 'rb') as buff:
    # u = pickle._Unpickler(buff)
    # u.encoding = 'latin1'
    data = pickle.load(buff, encoding='latin')
ericsson_model, trace = data['model'], data['trace']
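
An EOF error on unpickling often means the file is truncated, for example after an interrupted write or an incomplete copy from the server. The thread does not confirm that this is the cause here, so the following is only a sketch of how one might rule it out: compute the size and SHA-256 digest of the file on both machines and compare them before loading. The helper name `file_fingerprint` is hypothetical and not part of PyMC3.

```python
import hashlib
import os


def file_fingerprint(path, chunk_size=1024 * 1024):
    """Return (size_in_bytes, sha256_hex) for a file, reading it in chunks.

    Reading in chunks keeps memory use small even for a multi-gigabyte pickle.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()
```

Running `file_fingerprint` on the server copy and on the local copy should give identical tuples. If the sizes or digests differ, the local file is incomplete and needs to be transferred again before `pickle.load` can succeed.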

junpenglao · September 7, 2018, 9:11am

Pickling is usually not very safe. I suggest saving the trace and the model separately; see e.g. Saving and Loading GP model in PYMC3

gaddamanil16 · September 15, 2018, 8:40pm

Hi @junpenglao,

So can I simply use pm.save_trace and save to a pickle file? Or how do I do it?
I am running my code on a remote server, and I don't understand how to get the trace results back so that I can analyse them on my computer. And even if I do, will pm.load_trace work without the model context?

Help much needed.

Thanks

gaddamanil16 · September 15, 2018, 9:24pm

I also tried the approach discussed here: Saving ADVI results and reloading. But when I try to put the saved parameters into approx and use it for inference, it takes a very long time (I had to stop the process because my computer froze) because of the large dataset size.

In that discussion you mentioned that the model needs to be the same, but when I build the same model and run it, my machine just can't handle it. How can I use that approx object to get inferences on my local machine?

junpenglao · September 16, 2018, 6:35am

I suggest you try a smaller example first. And no, you need a model context to load the trace, so you need to initialize the model again locally on your machine.

I find that surprising, as you are only saving 2n values (n = number of parameters).

Yes, you need to have the same model (the same applies when you are loading a trace). When you say your machine can't handle it, is it that there are too many nodes to fit into memory?
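
To put a number on the "2n values" remark: mean-field ADVI keeps one mean and one standard deviation per free parameter. A rough sketch of the resulting storage, using the D, K and V values reported later in this thread, is below. It assumes 8-byte floats and counts every entry of theta and phi as a parameter, which is an upper bound; the function name `meanfield_storage_bytes` is hypothetical.

```python
def meanfield_storage_bytes(n_docs, n_topics, n_vocab, itemsize=8):
    """Upper bound on bytes needed for the mean and std of every parameter.

    theta has shape (n_docs, n_topics) and phi has shape (n_topics, n_vocab).
    itemsize=8 assumes float64 values (an assumption, not stated in the thread).
    """
    n_params = n_docs * n_topics + n_topics * n_vocab
    return 2 * n_params * itemsize


# Sizes quoted by the poster: D = 283000, K = 150, V = 7500.
total = meanfield_storage_bytes(283_000, 150, 7_500)
print(f"{total / 1e9:.2f} GB for the saved ADVI parameters")
```

Under these assumptions the saved parameters come to roughly 0.7 GB, far less than the 13 GB pickle of the full trace.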

gaddamanil16 · September 16, 2018, 3:12pm

I don't quite follow what you mean by the number of nodes. My model is:

with model:
    theta = pm.Dirichlet("theta", a=alpha, shape=(D, K))
    phi = pm.Dirichlet("phi", a=beta, shape=(K, V))
    doc = pm.DensityDist('docs', log_lda(theta, phi), observed=LDA_output.T)

D = 283000
K = 150
V = 7500
LDA_output.T.shape = (9944576, 3)

The MemoryError doesn't tell me anything related to the number of nodes.

Traceback (most recent call last):
  File "", line 15, in
    doc = pm.DensityDist('docs', log_lda(theta,phi), observed=LDA_output.T)
  File "C:\Users\Anil\Anaconda3\lib\site-packages\pymc3\distributions\distribution.py", line 37, in __new__
    return model.Var(name, dist, data, total_size)
  File "C:\Users\Anil\Anaconda3\lib\site-packages\pymc3\model.py", line 832, in Var
    total_size=total_size, model=self)
  File "C:\Users\Anil\Anaconda3\lib\site-packages\pymc3\model.py", line 1288, in __init__
    self.logp_elemwiset = distribution.logp(data)
  File "", line 16, in ll_lda
    ll = value[:, 2] * pm.math.logsumexp(np.log(theta[value[:, 0].astype('int64')]) + np.log(phi.T[value[:, 1].astype('int64')]), axis=1).ravel()
  File "C:\Users\Anil\Anaconda3\lib\site-packages\theano\tensor\var.py", line 570, in __getitem__
    return self.take(args[axis], axis)
  File "C:\Users\Anil\Anaconda3\lib\site-packages\theano\tensor\var.py", line 614, in take
    return theano.tensor.subtensor.take(self, indices, axis, mode)
  File "C:\Users\Anil\Anaconda3\lib\site-packages\theano\tensor\subtensor.py", line 2431, in take
    return advanced_subtensor1(a, indices)
  File "C:\Users\Anil\Anaconda3\lib\site-packages\theano\gof\op.py", line 674, in __call__
    required = thunk()
  File "C:\Users\Anil\Anaconda3\lib\site-packages\theano\gof\op.py", line 862, in rval
    thunk()
  File "C:\Users\Anil\Anaconda3\lib\site-packages\theano\gof\cc.py", line 1735, in __call__
    reraise(exc_type, exc_value, exc_trace)
  File "C:\Users\Anil\Anaconda3\lib\site-packages\six.py", line 693, in reraise
    raise value
MemoryError: None
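
The traceback fails inside `take`, which is the row lookup `theta[value[:, 0]]`. That lookup produces one row of length K for each of the 9944576 observed rows, so its size can be estimated directly. The sketch below does that arithmetic, assuming 8-byte floats (an assumption; the thread does not state the dtype). The helper name `gathered_bytes` is hypothetical.

```python
def gathered_bytes(n_rows, n_cols, itemsize=8):
    """Bytes needed for a dense (n_rows, n_cols) array of itemsize-byte values."""
    return n_rows * n_cols * itemsize


N = 9_944_576  # rows in LDA_output.T
K = 150        # number of topics

# theta[value[:, 0]] and phi.T[value[:, 1]] both have shape (N, K).
per_array = gathered_bytes(N, K)
print(f"one gathered array: {per_array / 2**30:.1f} GiB")
print(f"two gathered arrays: {2 * per_array / 2**30:.1f} GiB")
```

Each gathered array is about 11 GiB, and the expression needs two of them before their sum is even formed. Shrinking K or the number of observed rows reduces this linearly, which can be checked by calling `gathered_bytes` with smaller values.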

junpenglao · September 16, 2018, 7:44pm

I think this is reaching the limit of what PyMC3 can handle. Did you try with smaller values of D, K and V?
