Pickling trace object

Hi,

I am trying to save a trace object by pickling it. Writing it produced a huge file, around 13 GB, and when I try to unpickle it I get an EOFError. Has anyone tried pickling a trace?

My code is:

    import pickle

    # to save
    with open('/raid60/anil.gaddam/trace_advi.pkl', 'wb') as buff:
        pickle.dump({'model': model, 'trace': tr}, buff)

    # to load
    with open('E:/Masters Project/dataset/results/LDA/trace_advi.pkl', 'rb') as buff:
        # u = pickle._Unpickler(buff)
        # u.encoding = 'latin1'
        data = pickle.load(buff, encoding='latin1')
    ericsson_model, trace = data['model'], data['trace']

Pickling is usually not very safe - I suggest saving the trace and model separately, see e.g.: Saving and Loading GP model in PYMC3
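
Note that pm.save_trace writes the trace to a directory rather than a pickle file. A minimal sketch of the split (paths are placeholders):

    import pickle
    import pymc3 as pm

    # save the trace to a directory on the server
    pm.save_trace(tr, directory='./trace_advi')

    # persist the model object separately if you need it
    with open('model.pkl', 'wb') as f:
        pickle.dump(model, f)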

Hi @junpenglao,

So can I simply use pm.save_trace and save to a pickle file? Or how do I do it?
I am running my code on a remote server, and I don't understand how I can get the trace results back so that I can analyse them on my own computer. And even if I do, will pm.load_trace work without the model context?

Help much needed.

Thanks

I also tried the approach discussed here: Saving ADVI results and reloading. But when I try to use the saved parameters in approx and apply it again to get inferences, it takes a very long time because of the large dataset size (I had to stop the process because my computer was stuck).

In that discussion you mentioned that the model needs to be the same, but when I try to use the same model and then run it, my machine just can't handle it. So how can I use that approx object to get inferences on my local machine?
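
For reference, the pattern I tried looks roughly like this (a sketch of that thread's approach; the file name and variable names are placeholders):

    import pickle
    import pymc3 as pm

    # on the server, after approx = pm.fit(..., method='advi'):
    with open('advi_params.pkl', 'wb') as f:
        pickle.dump([p.get_value() for p in approx.params], f)

    # locally, inside an identical model context:
    with model:
        advi = pm.ADVI()
    with open('advi_params.pkl', 'rb') as f:
        for param, value in zip(advi.approx.params, pickle.load(f)):
            param.set_value(value)
    trace = advi.approx.sample(1000)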

I suggest you try with a smaller example first. And no, pm.load_trace will not work without the context: you need a model context to load the trace, so you need to initialize the model again locally on your machine.
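
Concretely, something like this on your local machine (a sketch; the model body must match the one used on the server exactly):

    import pymc3 as pm

    with pm.Model() as model:
        # re-declare exactly the same variables as on the server
        ...
        trace = pm.load_trace('./trace_advi')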

I find that surprising, as you are only saving 2n values (n=number of parameters).
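
For a mean-field fit those 2n values are just the per-parameter means and standard deviations, e.g. (assuming approx is the fitted MeanField object):

    means = approx.mean.eval()  # n values
    stds = approx.std.eval()    # another n values, 2n numbers in total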

Yes, you need to have the same model (the same applies when you are loading a trace). When you say your machine can't handle it - are there too many nodes to fit into memory?

I don't know exactly what you mean by the number of nodes. My model is:

    with model:
        theta = pm.Dirichlet("theta", a=alpha, shape=(D, K))
        phi = pm.Dirichlet("phi", a=beta, shape=(K, V))
        doc = pm.DensityDist('docs', log_lda(theta, phi), observed=LDA_output.T)

with

    D = 283000
    K = 150
    V = 7500

and LDA_output.T of shape (9944576, 3).
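
For scale (my own back-of-envelope, assuming float64, i.e. 8 bytes per value):

    283000 * 150 * 8 / 1e9    # theta alone: ~0.34 GB
    9944576 * 150 * 8 / 1e9   # the (9944576, 150) array built when the
                              # logp indexes theta per token: ~11.9 GB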

The MemoryError doesn't give me anything related to the number of nodes:

    Traceback (most recent call last):
      File "", line 15, in <module>
        doc = pm.DensityDist('docs', log_lda(theta,phi), observed=LDA_output.T)
      File "C:\Users\Anil\Anaconda3\lib\site-packages\pymc3\distributions\distribution.py", line 37, in __new__
        return model.Var(name, dist, data, total_size)
      File "C:\Users\Anil\Anaconda3\lib\site-packages\pymc3\model.py", line 832, in Var
        total_size=total_size, model=self)
      File "C:\Users\Anil\Anaconda3\lib\site-packages\pymc3\model.py", line 1288, in __init__
        self.logp_elemwiset = distribution.logp(data)
      File "", line 16, in ll_lda
        ll = value[:, 2] * pm.math.logsumexp(np.log(theta[value[:, 0].astype('int64')]) + np.log(phi.T[value[:, 1].astype('int64')]), axis=1).ravel()
      File "C:\Users\Anil\Anaconda3\lib\site-packages\theano\tensor\var.py", line 570, in __getitem__
        return self.take(args[axis], axis)
      File "C:\Users\Anil\Anaconda3\lib\site-packages\theano\tensor\var.py", line 614, in take
        return theano.tensor.subtensor.take(self, indices, axis, mode)
      File "C:\Users\Anil\Anaconda3\lib\site-packages\theano\tensor\subtensor.py", line 2431, in take
        return advanced_subtensor1(a, indices)
      File "C:\Users\Anil\Anaconda3\lib\site-packages\theano\gof\op.py", line 674, in __call__
        required = thunk()
      File "C:\Users\Anil\Anaconda3\lib\site-packages\theano\gof\op.py", line 862, in rval
        thunk()
      File "C:\Users\Anil\Anaconda3\lib\site-packages\theano\gof\cc.py", line 1735, in __call__
        reraise(exc_type, exc_value, exc_trace)
      File "C:\Users\Anil\Anaconda3\lib\site-packages\six.py", line 693, in reraise
        raise value
    MemoryError: None

I think this is reaching the limit of what PyMC3 can handle… did you try with smaller values of D, K, and V?
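
To sanity-check the pipeline before scaling up, a minimal smoke test might look like this (the log_lda below is reconstructed from the traceback and is an assumption, not the original code):

    import numpy as np
    import pymc3 as pm

    D, K, V = 50, 5, 100    # small stand-ins for 283000, 150, 7500
    n_tokens = 1000

    # fake (doc_id, word_id, count) triples standing in for LDA_output.T
    docs = np.column_stack([
        np.random.randint(0, D, n_tokens),
        np.random.randint(0, V, n_tokens),
        np.random.randint(1, 5, n_tokens),
    ]).astype('float64')

    def log_lda(theta, phi):
        def ll_lda(value):
            # per-token weighted mixture log-likelihood
            return value[:, 2] * pm.math.logsumexp(
                pm.math.log(theta[value[:, 0].astype('int64')]) +
                pm.math.log(phi.T[value[:, 1].astype('int64')]),
                axis=1).ravel()
        return ll_lda

    with pm.Model() as model:
        theta = pm.Dirichlet('theta', a=np.ones((D, K)), shape=(D, K))
        phi = pm.Dirichlet('phi', a=np.ones((K, V)), shape=(K, V))
        doc = pm.DensityDist('docs', log_lda(theta, phi), observed=docs)
        approx = pm.fit(n=1000, method='advi')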