Hi, are there any examples of how to implement the random() function needed to support sample_ppc for a custom distribution? It’s briefly mentioned here, but I’m not sure where to start on this. Thx!
We don't have a good example, but have a look at the test case for some inspiration:
I am trying to implement the random method for DensityDist, but I do not understand how the function should be implemented; I would like a template to follow. For example, for the logp function this information (from this post) is very useful (we should include it in the documentation):
# Log likelihood function
def logp_function(parameters):
    def logp(observed):
        _logp = some_function(parameters, observed)
        return _logp
    return logp  # returns a function
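For context, here is a minimal sketch of how such a factory plugs into DensityDist, using a Gaussian logp written with theano ops (the names normal_logp, mu, sd, and y are just illustrative):

import numpy as np
import pymc3 as pm
import theano.tensor as tt

def normal_logp(mu, sd):
    def logp(observed):
        # Gaussian log-density, expressed with theano operations
        return -0.5 * (((observed - mu) / sd) ** 2
                       + tt.log(2 * np.pi) + 2 * tt.log(sd))
    return logp

y = np.random.randn(100)
with pm.Model():
    mu = pm.Normal('mu', mu=0, sd=10)
    sd = pm.HalfNormal('sd', sd=10)
    likelihood = pm.DensityDist('likelihood', normal_logp(mu, sd), observed=y)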
Is there anything similar for random?
The example you provided in the post above uses normal_dist.random. The random method is:
def random(self, point=None, size=None):
    """
    Draw random values from Normal distribution.

    Parameters
    ----------
    point : dict, optional
        Dict of variable values on which random values are to be
        conditioned (uses default point if not specified).
    size : int, optional
        Desired size of random sample (returns one sample if not
        specified).

    Returns
    -------
    array
    """
    mu, tau, _ = draw_values([self.mu, self.tau, self.sd],
                             point=point, size=size)
    return generate_samples(stats.norm.rvs, loc=mu, scale=tau**-0.5,
                            dist_shape=self.shape,
                            size=size)
but I have difficulties creating a standalone function from the method in the class. For example, how do I pass the parameters to my function? Do I need to use draw_values and generate_samples? If so, how do I pass the parameters point and shape?
You do need draw_values and generate_samples: draw_values fetches numpy values from theano tensors, and generate_samples is a wrapper that turns an “elementary” random generator into one that outputs high-dimensional arrays.
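To make the two helpers concrete, here is a small self-contained sketch (the expected output shape in the comment is my understanding of pymc3 3.x behaviour, so treat it as an assumption):

import scipy.stats as stats
import pymc3 as pm
from pymc3.distributions.distribution import draw_values, generate_samples

with pm.Model():
    mu = pm.Normal('mu', mu=0, sd=10)
    # draw_values turns the theano tensor into a numpy value
    # (evaluated at the model's test point, since no point dict is given)
    mu_val, = draw_values([mu])
    # generate_samples wraps an elementary rvs callable and does the
    # shape bookkeeping for you
    draws = generate_samples(stats.norm.rvs, loc=mu_val, scale=1.0,
                             dist_shape=(5,), size=100)
print(draws.shape)  # expected: (100, 5), i.e. (size,) + dist_shape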
Maybe you can have a look at my PR for adding a random method to LKJCorr:
I tried to implement the method for a normal distribution (for the sake of example), but I get the error 'numpy.ndarray' object is not callable. I created a notebook here. Briefly, my random method is:
def normal_rng(mu, sd, point=None, size=None):
    # draw numerical values for the parameters
    mu_, sd_ = draw_values([mu, sd], point=point)
    size = 1 if size is None else size
    return generate_samples(scipy.stats.norm.rvs, loc=mu_, scale=sd_,
                            size=size, broadcast_shape=(size,))
and it is called in DensityDist as:

likelihood = pm.DensityDist('likelihood', normal_logp(mu, sd), observed=y, random=normal_rng(mu, sd))
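For reference, the likely cause of that error: random=normal_rng(mu, sd) calls the function immediately, so DensityDist receives the returned ndarray instead of a callable and fails when it later tries to call it during posterior predictive sampling. random expects a function with signature (point=None, size=None); one minimal adapter that keeps the signature above (a sketch, not the only way to do it):

likelihood = pm.DensityDist('likelihood', normal_logp(mu, sd), observed=y,
                            random=lambda point=None, size=None:
                                normal_rng(mu, sd, point=point, size=size))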
It still has a shape problem, but this is starting to run:
with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=10)
    sd = pm.HalfNormal('sd', sd=10)

    def normal_rng(point=None, size=None):
        # draw numerical values for the parameters
        mu_, sd_ = draw_values([mu, sd], point=point)
        size = 1 if size is None else size
        return generate_samples(scipy.stats.norm.rvs, loc=mu_, scale=sd_,
                                size=size, broadcast_shape=(size,))

    likelihood = pm.DensityDist('likelihood', normal_logp(mu, sd),
                                observed=y, random=normal_rng)
    trace = pm.sample()
    ppc = pm.sample_posterior_predictive(trace)
Strange that it does not work if the function is defined outside the context. Do you think it is some sort of bug?
I implemented the function as you suggested and it seems to work (see the updated notebook). Note that I removed broadcast_shape= because it does not seem to change the results. What do you mean with that? If I run ppc = pm.sample_posterior_predictive(trace, samples=250, size=100) I get a (250x100) array, which I think is correct.
I feel like there should be a way to write the function outside of the context, but I haven't figured it out yet.
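One pattern that should work is a factory that closes over the model variables, analogous to the logp template earlier in the thread (a sketch; the name normal_rng_factory is illustrative and I have not tested it against every pymc3 version):

from pymc3.distributions.distribution import draw_values, generate_samples
import scipy.stats

def normal_rng_factory(mu, sd):
    def rng(point=None, size=None):
        # mu and sd are captured in the closure, just like in logp_function
        mu_, sd_ = draw_values([mu, sd], point=point)
        size = 1 if size is None else size
        return generate_samples(scipy.stats.norm.rvs, loc=mu_, scale=sd_,
                                size=size)
    return rng

# inside the model context:
#   likelihood = pm.DensityDist('likelihood', normal_logp(mu, sd),
#                               observed=y, random=normal_rng_factory(mu, sd))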
Actually, you should get a (250x100) array by doing just ppc = pm.sample_posterior_predictive(trace, samples=250), as the size should be inferred from the shape of the observed data - that's how RVs defined using regular distributions in pymc3 behave.
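As a point of comparison, a regular distribution infers the draw shape from observed; a minimal sketch (the printed shape is what I would expect, assuming y holds 100 observations):

import numpy as np
import pymc3 as pm

y = np.random.randn(100)
with pm.Model():
    mu = pm.Normal('mu', mu=0, sd=10)
    obs = pm.Normal('obs', mu=mu, sd=1, observed=y)
    trace = pm.sample()
    ppc = pm.sample_posterior_predictive(trace, samples=250)
print(ppc['obs'].shape)  # expected: (250, 100)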
Let’s see if I understand. If I run ppc = pm.sample_posterior_predictive(trace), then I get a (trace_length, 1) array. Instead I should get a (trace_length, observed_size) array?
EDIT: I feel like we need to declare dist_shape somehow (which is always defined in the class method). A dirty hack would be generate_samples(scipy.stats.norm.rvs, loc=mu_, scale=sd_, size=size, dist_shape=observed.shape)
Yes.
I changed this behavior in https://github.com/pymc-devs/pymc3/pull/3053/. I think it makes more sense; see the discussion here: Sample_ppc shape
Do you think I should open a Github issue to track down the problem here?
Also, I cannot use more than 1 core with DensityDist. Given the model:
with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=10)
    sd = pm.HalfNormal('sd', sd=10)

    def normal_logp(value):
        return (1/2)*(-((value - mu)**2/sd**2) - tt.log(2*np.pi) - 2*tt.log(sd))

    def normal_rng(point=None, size=None):
        # draw numerical values for the parameters
        mu_, sd_ = draw_values([mu, sd], point=point)
        size = 1 if size is None else size
        return generate_samples(scipy.stats.norm.rvs, loc=mu_, scale=sd_,
                                size=size, dist_shape=y.shape)

    likelihood = pm.DensityDist('likelihood', normal_logp, observed=y, random=normal_rng)
if I use cores > 1, then I get a broken pipe error (I am using Win10, and I do not know if the problem also exists on Unix):
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sd, mu]
---------------------------------------------------------------------------
BrokenPipeError Traceback (most recent call last)
<ipython-input-27-d9190de005b7> in <module>()
2 CORES = 2
3 with model:
----> 4 trace = pm.sample(draws=1000, tune=1000, chains=CHAINS, cores=CORES, random_seed=[23+ i for i in np.arange(CHAINS)])
C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\pymc3\sampling.py in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, nuts_kwargs, step_kwargs, progressbar, model, random_seed, live_plot, discard_tuned_samples, live_plot_kwargs, compute_convergence_checks, use_mmap, **kwargs)
438 _print_step_hierarchy(step)
439 try:
--> 440 trace = _mp_sample(**sample_args)
441 except pickle.PickleError:
442 _log.warning("Could not pickle model, sampling singlethreaded.")
C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\pymc3\sampling.py in _mp_sample(draws, tune, step, chains, cores, chain, random_seed, start, progressbar, trace, model, use_mmap, **kwargs)
985 sampler = ps.ParallelSampler(
986 draws, tune, chains, cores, random_seed, start, step,
--> 987 chain, progressbar)
988 try:
989 with sampler:
C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\pymc3\parallel_sampling.py in __init__(self, draws, tune, chains, cores, seeds, start_points, step_method, start_chain_num, progressbar)
273 ProcessAdapter(draws, tune, step_method,
274 chain + start_chain_num, seed, start)
--> 275 for chain, seed, start in zip(range(chains), seeds, start_points)
276 ]
277
C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\pymc3\parallel_sampling.py in <listcomp>(.0)
273 ProcessAdapter(draws, tune, step_method,
274 chain + start_chain_num, seed, start)
--> 275 for chain, seed, start in zip(range(chains), seeds, start_points)
276 ]
277
C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\pymc3\parallel_sampling.py in __init__(self, draws, tune, step_method, chain, seed, start)
180 draws, tune, seed)
181 # We fork right away, so that the main process can start tqdm threads
--> 182 self._process.start()
183
184 @property
C:\Miniconda3\envs\intro_to_pymc3\lib\multiprocessing\process.py in start(self)
103 'daemonic processes are not allowed to have children'
104 _cleanup()
--> 105 self._popen = self._Popen(self)
106 self._sentinel = self._popen.sentinel
107 # Avoid a refcycle if the target function holds an indirect
C:\Miniconda3\envs\intro_to_pymc3\lib\multiprocessing\context.py in _Popen(process_obj)
221 @staticmethod
222 def _Popen(process_obj):
--> 223 return _default_context.get_context().Process._Popen(process_obj)
224
225 class DefaultContext(BaseContext):
C:\Miniconda3\envs\intro_to_pymc3\lib\multiprocessing\context.py in _Popen(process_obj)
320 def _Popen(process_obj):
321 from .popen_spawn_win32 import Popen
--> 322 return Popen(process_obj)
323
324 class SpawnContext(BaseContext):
C:\Miniconda3\envs\intro_to_pymc3\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
63 try:
64 reduction.dump(prep_data, to_child)
---> 65 reduction.dump(process_obj, to_child)
66 finally:
67 set_spawning_popen(None)
C:\Miniconda3\envs\intro_to_pymc3\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
58 def dump(obj, file, protocol=None):
59 '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60 ForkingPickler(file, protocol).dump(obj)
61
62 #
BrokenPipeError: [Errno 32] Broken pipe
I think there is some kind of pickling issue (I thought this was not a problem any more, but apparently it still is, at least under Windows). Try declaring the logp function outside of the model context:
def normal_logp(value):
    # mu and sd resolve at call time to the module-level variables
    # created inside the model context below
    return (1/2)*(-((value - mu)**2/sd**2) - tt.log(2*np.pi) - 2*tt.log(sd))

with pm.Model() as ...
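Putting the pieces together, the picklable version might look like the sketch below; the idea is that module-level functions can be pickled by reference, unlike closures defined inside the with block. I have not verified this on Win10, so treat it as an assumption:

import numpy as np
import scipy.stats
import pymc3 as pm
import theano.tensor as tt
from pymc3.distributions.distribution import draw_values, generate_samples

y = np.random.randn(100)

def normal_logp(value):
    return (1/2)*(-((value - mu)**2/sd**2) - tt.log(2*np.pi) - 2*tt.log(sd))

def normal_rng(point=None, size=None):
    mu_, sd_ = draw_values([mu, sd], point=point)
    size = 1 if size is None else size
    return generate_samples(scipy.stats.norm.rvs, loc=mu_, scale=sd_,
                            size=size, dist_shape=y.shape)

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=10)
    sd = pm.HalfNormal('sd', sd=10)
    likelihood = pm.DensityDist('likelihood', normal_logp,
                                observed=y, random=normal_rng)
    trace = pm.sample(cores=2)
    ppc = pm.sample_posterior_predictive(trace)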
Feel free to open an issue on Github - I think the random method would need some more work to make it more user-friendly (documentation, examples, etc.)