Inconsistency in variables being treated as tensor or numpy array

Hi Pymc Folks,
I’m working on simulating audio where multiple bats may be emitting sounds - and trying to see which subset of possible scenarios matches my observed audio data best (e.g. with 3 individuals A,B,C we can have 7 scenarios/combinations where at least one of them emits a sound) . I’m doing this using ABC as I need to understand the match to my audio data itself.

My simulation function takes two parameters per draw - an integer which decides the scenario to be simulated. This integer is used to subset a binary ‘scenario array’. The other input parameter is a random draw of potential positions from where sound was emitted.

Before I even began to write the audio simulation portion - I ran into inconsistencies on how the drawn integer was treated - it seems like pymc is treating it like an np.ndarray when I print its type(), but throws an IndexError when I try to use the integer to subset the scenario array.

I understand from looking around the docs and here that pytensor and numpy objects don’t mix well. Does anyone have any ideas on 1) why this odd type inconsistency occurs (the output type is numpy.ndarray - but it is not really this kind of object?), and 2) if there is anyway to convert a pytensor into a numpy array so that I can perform all operations within the simulation only with numpy objects.

System specs: Windows 10, Python 3.11, Pymc 5.5.0. Same error also seen in Python 3.11, Pymc 5.5.0 and Ubuntu 20.04 too.

# -*- coding: utf-8 -*-
"""
Created on Tue Jul  4 18:22:40 2023

@author: theja
"""
from itertools import combinations
import numpy as np 
import pymc as pm 
import os 
os.environ['PYTHONIOENCODING']='utf-8'    

def generate_all_possible_scenarios(nbats):
    '''
    Generates all possible 2^nbats - 1  possible combinations where
    at least 1 bat is calling
    '''
    
    all_scenarios = []
    scenario_number = 0
    for subset in range(1, nbats+1):
        subscenario_combis = combinations(range(1,nbats+1),subset)
        
        for each in subscenario_combis:
            binary_scenario = np.zeros(nbats)
            batinds = np.array((each))-1
            binary_scenario[batinds] = 1 
            all_scenarios.append(binary_scenario )
            scenario_number += 1 
    all_scenarios = np.array(all_scenarios)
    return all_scenarios

nbats = 5
# generate various possible scenarios individual called (1=call, 0=silent)
allscenarios_asnp = generate_all_possible_scenarios(nbats)
# set probability of all scenarios to be the same 
num_scenarios = allscenarios_asnp.shape[0]
p_categories = np.tile(1/num_scenarios, num_scenarios)
p_categories = p_categories/p_categories.sum() # 're' normalise everything to be sure it adds to 1


# x,y,z ranges of possible emission points for each individual per row
# ordered as xmin, xmax, ymin, ymax, zmin, zmax
xyz_ranges = np.ones((nbats,6))
for batnum in range(nbats):
    xyz_ranges[batnum,:] = batnum + np.linspace(0,1,6)


def generate_fake_cc(rng, scenario_number, xyz_emissions, size=None):
    '''
    An incomplete but functional example to show issues with subsetting 

    Parameters
    ----------
    rng : 
    scenario_number : int? or pytensor?
        The row number to choose from the binary scenario matrix
    xyz_emissions : np.array? or pytensor?
        The set of drawn potential emission points
    size : 
        
    '''
    
    print(type(xyz_emissions), xyz_emissions.shape,
          type(scenario_number), scenario_number)
    
    # PROBLEM POINT when uncommented
    # calling_bats = allscenarios_asnp[scenario_number,:]
    
    

    
    # dummy output just to get the function to run
    return np.array([5, 2]).reshape(-1,2)
if __name__ == "__main__":
    with pm.Model() as mo:
        # choose which scenario is being tested
        current_scenario = pm.Categorical('current_scenario',
                                              p_categories,
                                              shape=(1,))
        # generate the hypothesised x,y,z for the emitting bats
        x_hyp = pm.Uniform('x_hyp', lower=xyz_ranges[:,0],
                              upper=xyz_ranges[:,1], shape=(nbats,))
        y_hyp = pm.Uniform('y_hyp', lower=xyz_ranges[:,2],
                              upper=xyz_ranges[:,3],shape=(nbats,))
        z_hyp = pm.Uniform('z_hyp', lower=xyz_ranges[:,4],
                              upper=xyz_ranges[:,5],shape=(nbats,))
        xyz_stack = pm.math.stack([x_hyp, y_hyp, z_hyp], axis=1)
        print('xyz_stack:',xyz_stack.shape)
        # #scenario_data[1] = xyz_stack
        # # recreate the simulated audio and cross-correlations
        sim = pm.Simulator('sim', generate_fake_cc, params=(current_scenario, xyz_stack),
                                                   observed=np.array([3,3]).reshape(-1,2),
                                                   )
    with mo:
        idata = pm.sample_smc(draws=100, chains=2)
        idata.extend(pm.sample_posterior_predictive(idata))

I believe I may also be having a similar issue in my case where i use a custom numpy function to impact my value of p for a multinomial function, but I am a novice so bear with me!

So for the audio simulation component of how that drawn integer is being treated, have you come across this page in the example gallery: Using a “black box” likelihood function (numpy) — PyMC example gallery
In the page, it kind of goes over converting a numpy function into a Pytensor Op. Maybe it will be insightful?

But to answer your specific questions, (1) the answer could be that Pytensor’s API tries to mirror numpy’s so it could be a method thing. For (2) in the framework of PyMC i think you should check out that link. I also believe that you can use eval() (e.g. x.eval() where x is a pytensor variable) works and that is what i tried but i don’t know if that works haha!

1 Like

Thanks a lot for the potential directions to take - I played around quite a bit today and yesterday - but failed to get anything working.
The .eval() option - didn’t work and the same old ValueError: setting an array element with a sequence. appears :expressionless:

Using the pt.Op as suggested in the black-box post does seem like it could work, but I couldn’t quite wrap my ehad around it as the pm.Simulator class needs the input-function to have rng, arg1,arg2...size=None), and this is where I'm stuck. The pt.Opapproach allows me to write the function, but I can't quite define what thenp.random.Generator``` type is in pytensor terms…

Glad I could help, unfortunately I am also still in the stage of wrapping my head around writing as a pytensor op. Bayesian regression is always such an easy case that is used in examples but science work is never that simple haha!

A quick google search led me to come across this: utils – Friendly random numbers — PyTensor dev documentation where it says RandomStream is a symbolic stand-in for np.random.generator and has a few links on the page about using random numbers. So if you haven’t seen that then perhaps give it a glance but outside of that I can’t really give any further advice since I am also still in the camp of trying to figure out pytensor!

Best of luck though, hopefully you can crack it (or some other good samaritan on the forum comes along to help :joy: )

I suggest trying to create a new discourse thread with more information about what you want to do and what you’ve tried. The messages here are to vague to be able to provide any directions.

Usually you will not need a custom Op or a blackbox model. If you’re a beginner that’s probably even more so.

Thanks for taking an interest. Will do soon.