 Examples of random() in custom distribution?

Hi, are there any examples of how to implement the random() function needed to support sample_ppc for a custom distribution? It’s briefly mentioned here, but I’m not sure where to start on this.
Thanks!

We don't have a good example yet, but have a look at the test case for some inspiration:

I am trying to implement the random method for DensityDist, but I do not understand how the function should be implemented. I would like to get a template to follow.

For example, for the logp function this information (from this post) is very useful (we should include it in the documentation):

# Log likelihood function
def logp_function(parameters):
    def logp(observed):
        _logp = some_function(parameters, observed)
        return _logp

    return logp  # returns a function
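For instance, with a normal log-density standing in for some_function, the factory pattern above can be exercised in plain NumPy (the concrete numbers are just for illustration):

```python
import numpy as np

def logp_function(mu, sd):
    # mu and sd play the role of `parameters`; the body plays
    # the role of `some_function(parameters, observed)`.
    def logp(observed):
        return -0.5 * (((observed - mu) / sd) ** 2
                       + np.log(2 * np.pi) + 2 * np.log(sd))
    return logp  # returns a function of `observed` alone

logp = logp_function(mu=0.0, sd=1.0)
val = logp(np.array([0.0]))  # log-density of N(0, 1) at 0
```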

Is there anything similar for random?
The example you provided in the post above uses normal_dist.random. The random method is:

def random(self, point=None, size=None):
    """
    Draw random values from Normal distribution.

    Parameters
    ----------
    point : dict, optional
        Dict of variable values on which random values are to be
        conditioned (uses default point if not specified).
    size : int, optional
        Desired size of random sample (returns one sample if not
        specified).

    Returns
    -------
    array
    """
    mu, tau, _ = draw_values([self.mu, self.tau, self.sd],
                             point=point, size=size)
    return generate_samples(stats.norm.rvs, loc=mu, scale=tau**-0.5,
                            dist_shape=self.shape,
                            size=size)

but I have difficulty creating a standalone function from the class method. For example, how do I pass the parameters to my function? Do I need to use draw_values and generate_samples? If so, how do I pass point and shape?


You do need draw_values and generate_samples: draw_values fetches concrete NumPy values from Theano tensors, and generate_samples is a wrapper that turns an "elementary" random generator into one that outputs high-dimensional arrays.
Maybe you can have a look at my PR for adding a random method to LKJCorr:
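That division of labour can be sketched in plain NumPy/SciPy (this only illustrates the pattern; the real pymc3 functions operate on Theano graphs and handle many more cases):

```python
import numpy as np
from scipy import stats

def draw_values_sketch(params, point=None):
    # Illustration only: resolve each (name, default) parameter to a
    # concrete NumPy value, preferring an explicit entry in `point`.
    point = point or {}
    return [np.asarray(point.get(name, default)) for name, default in params]

def generate_samples_sketch(rvs, size=None, **kwargs):
    # Illustration only: wrap an "elementary" sampler such as
    # scipy.stats.norm.rvs so it returns an array of draws.
    return np.asarray(rvs(size=size, **kwargs))

mu, sd = draw_values_sketch([('mu', 0.0), ('sd', 1.0)], point={'mu': 2.0})
samples = generate_samples_sketch(stats.norm.rvs, size=100, loc=mu, scale=sd)
```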

I tried to implement the method for a normal distribution (for the sake of example), but I get the error 'numpy.ndarray' object is not callable. I created a notebook here. Briefly, my random method is:

def normal_rng(mu, sd, point=None, size=None):
    # draw numerical values for the parameters
    mu_, sd_ = draw_values([mu, sd], point=point)

    size = 1 if size is None else size

    return generate_samples(scipy.stats.norm.rvs, loc=mu_, scale=sd_,
                            size=size, broadcast_shape=(size,))

and it is called in DensityDist as:

likelihood = pm.DensityDist('likelihood', normal_logp(mu, sd), observed=y, random=normal_rng(mu, sd))

It still has a shape problem, but this is starting to run:

with pm.Model() as model:

    mu = pm.Normal('mu', mu=0, sd=10)
    sd = pm.HalfNormal('sd', sd=10)

    def normal_rng(point=None, size=None):
        # draw numerical values for the parameters
        mu_, sd_ = draw_values([mu, sd], point=point)
        size = 1 if size is None else size
        return generate_samples(scipy.stats.norm.rvs, loc=mu_, scale=sd_,
                                size=size, broadcast_shape=(size,))

    likelihood = pm.DensityDist('likelihood', normal_logp(mu, sd),
                                observed=y, random=normal_rng)
    trace = pm.sample()
    ppc = pm.sample_posterior_predictive(trace)
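Note what changed besides moving the function inside the context: the earlier attempt passed random=normal_rng(mu, sd), i.e. the result of calling the function (an array), which is what triggered 'numpy.ndarray' object is not callable; DensityDist needs the function itself. A minimal illustration of the difference:

```python
import numpy as np

def make_draws(point=None, size=None):
    # stand-in for a `random` callable
    return np.random.normal(size=3)

result = make_draws()  # calling it here yields an ndarray: that was the bug

# Passing the function object itself, not its return value, is what
# the `random=` argument expects.
```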

Strange that it does not work when the function is outside the context. Do you think it is some sort of bug?
I implemented the function as you suggested and it seems to work (see the updated notebook). Note that I removed broadcast_shape= because it does not seem to change the results. What do you mean with

If I run ppc = pm.sample_posterior_predictive(trace, samples=250, size=100) I get a (250x100) array, which I think is correct.

I feel like there should be a way to write the function outside of the context, but I haven't figured it out yet.

Actually, you should get a (250x100) array just by doing ppc = pm.sample_posterior_predictive(trace, samples=250), as the size should be inferred from the shape of the observed; that's how RVs defined with regular pymc3 distributions behave.

Let’s see if I understand: if I run ppc = pm.sample_posterior_predictive(trace), I currently get a (trace_length, 1) array, whereas I should get a (trace_length, observed_size) array?

EDIT: I feel like we need to declare dist_shape somehow (which is always defined in the class method). A dirty hack would be generate_samples(scipy.stats.norm.rvs, loc=mu_, scale=sd_, size=size, dist_shape=observed.shape)

Yes.
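Just to make the target shape concrete, it can be checked with plain scipy, using the 250 posterior draws and 100 observed points from the example above:

```python
import numpy as np
from scipy import stats

n_samples, n_obs = 250, 100  # posterior draws x observed data points
mu_draws = np.random.normal(0.0, 1.0, size=n_samples)
sd_draws = np.abs(np.random.normal(1.0, 0.1, size=n_samples)) + 1e-6

# One predictive row per posterior draw, each row matching the
# length of the observed data (loc/scale broadcast against size):
ppc = stats.norm.rvs(loc=mu_draws[:, None], scale=sd_draws[:, None],
                     size=(n_samples, n_obs))
```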

I changed this behavior in https://github.com/pymc-devs/pymc3/pull/3053/. I think it makes more sense, see discussion here: Sample_ppc shape

Do you think I should open a Github issue to track down the problem here?
Also, I cannot use more than 1 core with DensityDist. Given the model:

with pm.Model() as model:

    mu = pm.Normal('mu', mu=0, sd=10)
    sd = pm.HalfNormal('sd', sd=10)

    def normal_logp(value):
        return (1/2)*(-((value - mu)**2/sd**2) - tt.log(2*np.pi) - 2*tt.log(sd))

    def normal_rng(point=None, size=None):
        # draw numerical values for the parameters
        mu_, sd_ = draw_values([mu, sd], point=point)
        size = 1 if size is None else size

        return generate_samples(scipy.stats.norm.rvs, loc=mu_, scale=sd_,
                                size=size, dist_shape=y.shape)

    likelihood = pm.DensityDist('likelihood', normal_logp, observed=y,
                                random=normal_rng)

If I use cores>1, I get a broken pipe error (I am using Win10, and I do not know if the problem also exists on Unix):

Auto-assigning NUTS sampler...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sd, mu]
---------------------------------------------------------------------------
BrokenPipeError                           Traceback (most recent call last)
<ipython-input-27-d9190de005b7> in <module>()
2 CORES = 2
3 with model:
----> 4     trace = pm.sample(draws=1000, tune=1000, chains=CHAINS, cores=CORES, random_seed=[23+ i for i in np.arange(CHAINS)])

C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\pymc3\sampling.py in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, nuts_kwargs, step_kwargs, progressbar, model, random_seed, live_plot, discard_tuned_samples, live_plot_kwargs, compute_convergence_checks, use_mmap, **kwargs)
438             _print_step_hierarchy(step)
439             try:
--> 440                 trace = _mp_sample(**sample_args)
441             except pickle.PickleError:
442                 _log.warning("Could not pickle model, sampling singlethreaded.")

C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\pymc3\sampling.py in _mp_sample(draws, tune, step, chains, cores, chain, random_seed, start, progressbar, trace, model, use_mmap, **kwargs)
985         sampler = ps.ParallelSampler(
986             draws, tune, chains, cores, random_seed, start, step,
--> 987             chain, progressbar)
988         try:
989             with sampler:

C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\pymc3\parallel_sampling.py in __init__(self, draws, tune, chains, cores, seeds, start_points, step_method, start_chain_num, progressbar)
274                            chain + start_chain_num, seed, start)
--> 275             for chain, seed, start in zip(range(chains), seeds, start_points)
276         ]
277

C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\pymc3\parallel_sampling.py in <listcomp>(.0)
274                            chain + start_chain_num, seed, start)
--> 275             for chain, seed, start in zip(range(chains), seeds, start_points)
276         ]
277

C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\pymc3\parallel_sampling.py in __init__(self, draws, tune, step_method, chain, seed, start)
180             draws, tune, seed)
181         # We fork right away, so that the main process can start tqdm threads
--> 182         self._process.start()
183
184     @property

C:\Miniconda3\envs\intro_to_pymc3\lib\multiprocessing\process.py in start(self)
103                'daemonic processes are not allowed to have children'
104         _cleanup()
--> 105         self._popen = self._Popen(self)
106         self._sentinel = self._popen.sentinel
107         # Avoid a refcycle if the target function holds an indirect

C:\Miniconda3\envs\intro_to_pymc3\lib\multiprocessing\context.py in _Popen(process_obj)
221     @staticmethod
222     def _Popen(process_obj):
--> 223         return _default_context.get_context().Process._Popen(process_obj)
224
225 class DefaultContext(BaseContext):

C:\Miniconda3\envs\intro_to_pymc3\lib\multiprocessing\context.py in _Popen(process_obj)
320         def _Popen(process_obj):
321             from .popen_spawn_win32 import Popen
--> 322             return Popen(process_obj)
323
324     class SpawnContext(BaseContext):

C:\Miniconda3\envs\intro_to_pymc3\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
63             try:
64                 reduction.dump(prep_data, to_child)
---> 65                 reduction.dump(process_obj, to_child)
66             finally:
67                 set_spawning_popen(None)

C:\Miniconda3\envs\intro_to_pymc3\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
58 def dump(obj, file, protocol=None):
59     '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60     ForkingPickler(file, protocol).dump(obj)
61
62 #

BrokenPipeError: [Errno 32] Broken pipe

I think there is some kind of pickling issue (I thought this was not a problem any more, but apparently it still is, at least under Windows). Try declaring the logp function outside of the model context:

def normal_logp(value):
    return (1/2)*(-((value - mu)**2/sd**2) - tt.log(2*np.pi) - 2*tt.log(sd))

with pm.Model() as ...
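One classic pickling pitfall (not necessarily the exact one here, but easy to check in isolation) is that functions defined inside another function cannot be pickled at all, while module-level functions are pickled by reference and survive the trip to a spawned worker process:

```python
import pickle

def top_level_logp(value):
    # module-level: pickled by reference, safe for spawned processes
    return -0.5 * value ** 2

def make_inner():
    def inner_logp(value):  # local object: pickle cannot locate it by name
        return -0.5 * value ** 2
    return inner_logp

payload = pickle.dumps(top_level_logp)  # works

try:
    pickle.dumps(make_inner())  # raises: can't pickle local object
    picklable = True
except (pickle.PicklingError, AttributeError):
    picklable = False
```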

Feel free to open an issue on GitHub. I think the random method needs some more work to make it more user friendly (documentation, examples, etc.).