Seeking Advice on my Model

Hello and thank you for reading this. I’m looking for some general advice and some advice regarding my specific model. Any help you could provide would be very appreciated!

Assumptions:

  • 2 types of Lego kits
    • We know brick make-up of each kit
    • We know the total # of bricks in each kit
  • 8 total Lego kits
    • *Roughly* 50/50 (i.e. roughly 4 of each type of kit)
  • Kits each emptied on a long & narrow table
    • Bricks are roughly uniformly distributed within each kit’s area
    • Pieces from separate kits can touch but do not mix
    • Essentially a long/narrow rectangle of Legos
    • Each kit has the same depth on the table
    • Each kit has *roughly* the same area on the table
    • Call the start position (x-axis) 0.0 and the end position 1.0
  • Eventually, we’ll deal with:
    • Determining the # of kits
    • Dealing with missing bricks
    • Dealing with depth (y-position) of bricks

Goal:

  • Put bricks back in correct packages
    • Determine breakpoints between (or centroids of) each kit’s pieces
    • Determine which bricks belong to the same kit
    • Determine which type of kit each set of blocks belongs to

And here’s my model:


import numpy as np
import pymc as pm
import pytensor as pt


def increment_value_at_position(indexx, indexy, dimx, dimy):
  # build a (dimx, dimy) matrix of zeros with a single 1 at (indexx, indexy)
  zeros = pt.tensor.zeros((dimx, dimy))
  zeros_subtensor = zeros[indexx, indexy]
  return pt.tensor.set_subtensor(zeros_subtensor, 1)

__types_of_kits = 2
__types_of_bricks = 30
__kits = 6
__total_bricks = 600

# the types have been encoded to be integers 0..__types_of_bricks - 1
brick_types = pt.tensor.as_tensor(data.types)
# grab just the width or x-values
brick_locations = pt.tensor.as_tensor(data.locations[:, 0])

coords = {
  "kits": range(__kits),
  "brick_types": range(__types_of_bricks),
  "kit_types": range(__types_of_kits),
  "observations": range(__total_bricks)
}

# kits holds the raw brick counts for each kit type, shape (__types_of_kits, __types_of_bricks);
# smooth and normalize it into a probability distribution of bricks per kit type
_kits = kits + 1.0e-6
_kits = (_kits.T / _kits.sum(axis=1)).T
assert _kits.shape == (__types_of_kits, __types_of_bricks)
assert np.allclose(_kits.sum(axis=1), 1.0)

# brick breakdown for each kit
phi = pt.tensor.as_tensor(_kits)

# each kit type has a different number of bricks
bricks_per_kit = pt.tensor.as_tensor([30.0, 70.0])

with pm.Model(coords=coords) as model:
  # mixture mixing weights
  w_kits = pm.Dirichlet("w_kits", a=5.0 * np.ones(__kits), dims="kits")
  # mixture centroids
  mu_kits = pm.Normal("mu_kits", mu=np.linspace(0.0, 1.0, __kits), sigma=1.0, dims="kits")

  # the spatial mixture
  mix = pm.NormalMixture("mix", w=w_kits, mu=mu_kits, sigma=20.0, observed=brick_locations, dims="observations")
  # purely spatial assignment of bricks to kits
  # z_kit : observation index -> kit index
  # for now, assign each brick to the kit with the closest centroid
  z_kit = pm.Deterministic("z_kit", pt.tensor.abs(brick_locations[:, None] - mu_kits).argmin(axis=1), dims="observations")

  # here, we have assignments of each brick to a kit (z_kit) and the type of each brick (brick_types)
  # we scan over these indices and sum to create brick counts for our kit assignments
  results, updates = pt.scan(
    fn=increment_value_at_position,
    outputs_info=None,
    sequences=[z_kit, brick_types],
    non_sequences=[__kits, __types_of_bricks],
  )
  counts_from_spatial = pm.Deterministic("counts_from_spatial", results.sum(0))

  # assign type of kit to each kit
  w_kit_types = pm.Dirichlet("w_kit_types", a=np.ones(__types_of_kits), dims="kit_types")
  z_kit_types = pm.Categorical("z_kit_types", p=w_kit_types, dims="kits")

  # here, we create a different count based on the type of each kit and the bricks we expect to see given that type
  counts_from_kits = pm.Deterministic("counts_from_kits", phi[z_kit_types] * bricks_per_kit[z_kit_types][:, None])

  # I wasn't sure how to tie the two counts together (i.e. how to combine the types and locations),
  # so I subtract them and state that the difference should be normal with mean 0
  blah = pm.Normal("blah", mu=counts_from_spatial - counts_from_kits, sigma=5.0, observed=np.zeros((__kits, __types_of_bricks)))

  trace = pm.sample(draws=5000, tune=500, target_accept=0.99)

My data are just a list of x,y values along with the type of each brick (so three values per brick).
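Concretely, a random stand-in with the same shapes as my real data would be:

import numpy as np

# hypothetical stand-in: one row per brick
locations = np.random.uniform(0.0, 1.0, size=(600, 2))  # x, y position of each brick
types = np.random.randint(0, 30, size=600)              # encoded brick type, 0..29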

To be clear, I’m asking for advice on how to better assemble this model. I sort of abandon the generative idea halfway through the model, and that’s probably a mistake. The model also trains very slowly. If anyone could point out the dumb things I’m doing and/or provide suggestions, I’d be very appreciative.

First impression: your model is essentially a discrete / combinatorial model; the abs/argmin in the Deterministic won’t behave well with NUTS, and most samplers may struggle with it.
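For illustration, here’s a minimal sketch (with made-up stand-in data and shapes) of one smoother alternative: replace the hard argmin assignment with softmax responsibilities over kits, so the sampler at least has a gradient to follow. The temperature value here is arbitrary.

import numpy as np
import pytensor.tensor as pt

brick_locations = pt.as_tensor(np.random.uniform(0, 1, 600))  # stand-in locations
mu_kits = pt.as_tensor(np.linspace(0, 1, 6))                  # stand-in centroids

# distance of every brick to every kit centroid, shape (600, 6)
dist = pt.abs(brick_locations[:, None] - mu_kits)

# hard assignment (non-differentiable): what the Deterministic does now
z_hard = dist.argmin(axis=1)

# soft assignment (differentiable): responsibilities via a tempered softmax
temperature = 0.05
e = pt.exp(-dist / temperature)
resp = e / e.sum(axis=1, keepdims=True)  # rows sum to 1 across kits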

Could you show how one would simulate data from the generative model?

Minor point: you don’t need the scan for what you’re doing. You can use advanced indexing to obtain the same result in a vectorized / more efficient way:

import pytensor
import pytensor.tensor as pt
import numpy as np

# Just draw some values and wrap as tensor constants
brick_types = pt.as_tensor(pt.random.categorical(p=pt.ones(30)/30, size=(600,)).eval())
z_kit = pt.as_tensor(pt.random.categorical(p=pt.ones(6)/6, size=(600,)).eval())


def increment_value_at_position(indexx, indexy):
  return pt.zeros((6, 30))[indexx, indexy].set(1)

res_scan, _ = pytensor.scan(
    fn=increment_value_at_position,
    outputs_info=None,
    sequences=[z_kit, brick_types],
  )

res_adv_subtensor = pt.zeros((600, 6, 30))[np.arange(600), z_kit, brick_types].set(1)

np.testing.assert_array_equal(
    res_scan.sum(0).eval(),
    res_adv_subtensor.sum(0).eval(),
)
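The advanced-indexing version builds the whole indicator tensor in a single vectorized operation instead of looping over 600 scan steps, so it should compile and evaluate noticeably faster; the exact speedup will depend on your graph.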

Thank you very much for the response!

I’m sorry if I’m talking nonsense. My knowledge in this area is poor, though I’m reading Statistical Rethinking by McElreath to try to actually know what I’m talking about!

To sample a brick, we need to know which kit it’s located in and the type of that kit. Then we can sample the location (given the kit, from the mixture of normals) and the type of brick (given the type of kit). So I imagine sampling a single brick should involve:

  1. Sample a kit index (0..7)
  2. Given the index, sample the location from a normal (in a list of normals?)
  3. Given the index, sample the kit type
  4. Given the kit type and the brick breakdowns of each kit type, sample a brick type

We can use a Dirichlet for 1. to capture that different kits have different #s of bricks. We know that the normals are roughly evenly spaced along the table (and each kit covers about the same area), so we can capture their locations with a relatively flat Dirichlet. We know the global kit types are about 50/50, but I am not sure how to sample the kit type given the index (maybe something like the sketch below?). Finally, once we have the kit type, we’ve already defined the brick breakdown of each kit type, so we just pick from it. Does this make sense?
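Here is the sketch I mean, for steps 1 and 3 (assuming kit types are drawn independently per kit, which only *roughly* respects the 50/50 split):

import numpy as np

num_kits = 8
kit_freqs = np.array([0.5, 0.5])  # global kit-type frequencies

# draw one type per kit up front; step 3 then becomes a lookup
z_kits = np.random.choice(len(kit_freqs), size=num_kits, p=kit_freqs)

kit_idx = np.random.choice(num_kits)  # step 1: sample a kit index
kit_type = z_kits[kit_idx]            # step 3: kit type given the index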

This is basically what I’m trying to do. I think one (of the many things!) confusing me is how to deal with data that are both continuous and discrete (the locations and brick types). Is this enough detail on how I envision generating a brick location/type?

Some numpy code generating data according to your model would be much more precise 🙂 I can try to think about it in the meantime.


Again, thank you so much for helping me! Does this help?

import numpy as np
import matplotlib.pyplot as plt

##
# given parameters
##

kit_freqs = np.array([0.5, 0.5]) # expected dist of kit types
kit_type_counts = np.array([ # count of bricks in each kit type
    [1, 2, 3, 1, 3, 0, 0, 0, 0, 0],
    [0, 0, 2, 4, 0, 1, 2, 5, 4, 2],
])
num_kit_types, num_brick_types = kit_type_counts.shape
kit_sizes = kit_type_counts.sum(1) # total # bricks in each kit type
kit_type_freqs = kit_type_counts / kit_sizes[:, np.newaxis] # counts as a prob. dist.
num_kits = 8
num_observations = 120

##
# learned parameters
##

z_kits = np.array([0, 0, 1, 1, 0, 1, 1, 0]) # kit type for each kit
# prob. of random brick coming from each kit scaled by type of each kit and # of bricks in each kit type
index_weights = np.array([float(kit_sizes[z]) for z in z_kits])
index_weights /= index_weights.sum()
# mu and sigma of each kit's spatial distribution
kit_centroids = np.linspace(0, 1, num_kits + 2)[1:-1] + np.random.normal(0.0, 0.03, num_kits)
kit_sigmas = np.random.gamma(shape=1.0, scale=0.05, size=num_kits)

##
# sampling procedure
##

locs, types = np.empty(num_observations, dtype=float), np.empty(num_observations, dtype=int)
for obs in range(num_observations):
    kit_idx = np.random.choice(range(num_kits), p=index_weights)
    kit_type = z_kits[kit_idx]

    loc = np.random.normal(kit_centroids[kit_idx], kit_sigmas[kit_idx])
    locs[obs] = loc

    type_ = np.random.choice(range(num_brick_types), p=kit_type_freqs[kit_type])
    types[obs] = type_
    
plt.scatter(x=locs, y=np.ones_like(locs), alpha=0.3, c=types)
plt.show()

This isn’t physical, as you can’t sample 15 bricks from a kit that only has 10 bricks (i.e. we need to sample without replacement), but that’s for another day.
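For when that day comes: numpy’s multivariate hypergeometric distribution draws without replacement, so per-type draws can never exceed what a kit contains. A sketch using the first kit type’s counts from above:

import numpy as np

rng = np.random.default_rng(0)

# first kit type above: counts [1, 2, 3, 1, 3, 0, 0, 0, 0, 0], 10 bricks total
kit_counts = np.array([1, 2, 3, 1, 3, 0, 0, 0, 0, 0])

# draw 6 bricks from this kit without replacement
draw = rng.multivariate_hypergeometric(kit_counts, 6)
print(draw, draw.sum())  # per-type counts, summing to 6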