How to slice an array of numbers during fitting

newerjazz · July 21, 2020, 2:00pm

Hi All,

Just joined and posting for the first time. I am trying to fit a model to figure out the concentration of drugs needed to kill some bacteria. The drug will only contribute to death of bacteria if its concentration is above a certain amount. I could not figure out how to pass this condition in the model.

In the model, df['Conc'] is a distribution (array) of drug, so each single value of kill_pred depends on a calculation over this array.

with pm.Model() as model_kill:
    # Priors
    active = pm.Normal('m', mu=2, sd=2)
    ϵ = pm.HalfCauchy('ϵ', 5)

    μ = pm.Deterministic('μ', np.sum(df['Conc'][:active]))  # get error here
    # TypeError: cannot do slice indexing on <class
    # 'pandas.core.indexes.range.RangeIndex'> with these indexers [n]
    # of <class 'pymc3.model.FreeRV'>

    # Likelihood
    kill_pred = pm.Normal('kill_pred', mu=μ, sd=ϵ, observed=y)

Thanks in advance!

ckrapu · July 22, 2020, 2:51am

Would you be able to elaborate a little more on what the concentration dataframe looks like? Is there a reason why you are summing over some of its cells? Also, it seems like the quantity that you want to model is the concentration of bacteria. Is that correct?

newerjazz · July 22, 2020, 3:02pm

Hi ckrapu, thanks for asking.

df['Conc'] ~ a normal distribution of droplet sizes for a given drug (mu can be 1-5 um and sd 0.1-2 um, depending on the drug).

We suspect that only the bigger droplets (say, those above mu + n * sd) are killing the bacteria and the smaller droplets are doing nothing. That’s why we are summing over the bigger, active droplets. We have the distributions of the drugs: drug 1 (mu = 1 um, sd = 0.2 um) and drug 2 (mu = 3 um, sd = 1 um), and the number of bacteria killed: kill1 = 10,000 and kill2 = 20,000. We are hoping to fit for the parameter n.
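The forward model described above can be sketched in plain NumPy. This is only an illustration of the thresholding idea, not code from the thread: the helper name kill_above_threshold, the seed, the sample size, and the trial value of n are all assumptions.

```python
import numpy as np

def kill_above_threshold(sizes, mu, sd, n):
    """Sum the sizes of droplets larger than mu + n * sd (hypothetical helper)."""
    active = sizes > mu + n * sd   # boolean array, True for "big" droplets
    return sizes[active].sum()

rng = np.random.default_rng(0)               # fixed seed so the sketch is repeatable
sizes_1 = rng.normal(1.0, 0.2, size=1000)    # drug 1: mu = 1 um, sd = 0.2 um
sizes_2 = rng.normal(3.0, 1.0, size=1000)    # drug 2: mu = 3 um, sd = 1 um

n = 1.0  # made-up trial value of the unknown parameter
kill_1 = kill_above_threshold(sizes_1, 1.0, 0.2, n)
kill_2 = kill_above_threshold(sizes_2, 3.0, 1.0, n)
```

Raising n moves the threshold up, so fewer droplets count as active and the summed value can only go down. That monotonic link is what lets the observed kill numbers say something about n.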

So it appears I need to slice the array. However, I am guessing there is a cleverer way to do this.

Thanks in advance!
Cheers,
newerjazz

newerjazz · July 22, 2020, 4:19pm

Managed to get it going by passing the data in with pm.Data, but I am still stumped on how to get more than one set of data into the model. Right now the inputs are one column of df_np (which contains the droplet size distribution) and one number, kill (the number of bacteria killed). I would like to pass i columns of df_np, one per drug(i), and i kill numbers too. How do I do that? Thanks!

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from theano import shared
import scipy.stats as stats
from scipy.stats import gamma, norm
import pymc3 as pm
import theano.tensor as tt
import arviz as az

# Make up fake data

mu = 3
sd = .5
D = norm(mu, sd)
d = D.rvs(100)

df_np = pd.DataFrame({'Dia': d})
n = 2.7  # what I am trying to solve for
active = df_np['Dia'] > mu + n * sd
kill = np.sum(df_np['Dia'][active])

with pm.Model() as model_b:
    # Priors
    tau = pm.Normal('tau', mu=2, sd=2)
    ϵ = pm.HalfCauchy('ϵ', 5)

    # Observed
    df_tt = pm.Data('df_tt', df_np['Dia'])
    act = df_tt > mu + tau * sd
    # Likelihood
    μ = pm.Deterministic('μ', pm.math.sum(df_tt[act]))
    kill_pred = pm.Normal('kill_pred', mu=μ, sd=ϵ, observed=kill)

    trace_b = pm.sample()

pm.traceplot(trace_b);

newerjazz · July 23, 2020, 9:17pm

I have now included a small runnable example for this problem. The problem is to slice some data within pymc3, or at least condition on some data.

At the last line of code I get the error
Input dimension mis-match. (input[0].shape[1] = 10, input[1].shape[1] = 1)
How do I get conditioning within the model?

Thanks a lot in advance!

import numpy as np
from theano import shared
import pymc3 as pm
import theano.tensor as tt

a = np.arange(20.).reshape(2, 10)
a_mu = a.mean(axis=1)
a_mu = a_mu.reshape(2, 1)
a_sd = a.std(axis=1)
a_sd = a_sd.reshape(2, 1)
act = a > a_mu + 3 * a_sd  # plain NumPy version of the condition

a_ = shared(a)
a_mu_ = shared(a_mu)
a_sd_ = shared(a_sd)

with pm.Model() as model_b:
    tau = pm.Normal('tau', mu=2, sd=1)
    act_ = a_ > a_mu_ + tau * a_sd_  # the dimension mis-match error is raised here
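A possible reading of this error, offered as an assumption since the thread does not confirm it: NumPy stretches a (2, 1) array against a (2, 10) array automatically, but Theano shared variables are not broadcastable by default, so the same expression fails on the shared versions. The NumPy sketch below shows the broadcast that works, plus one way to sidestep broadcasting by expanding the statistics to the full shape first. The value of tau is a made-up stand-in for the random variable.

```python
import numpy as np

a = np.arange(20.0).reshape(2, 10)
a_mu = a.mean(axis=1, keepdims=True)   # shape (2, 1)
a_sd = a.std(axis=1, keepdims=True)    # shape (2, 1)
tau = 1.0                              # hypothetical fixed value for illustration

# NumPy broadcasts the (2, 1) statistics across the 10 columns.
act_broadcast = a > a_mu + tau * a_sd

# Workaround sketch: repeat the statistics out to (2, 10) so the
# comparison is between arrays of identical shape.
a_mu_full = np.repeat(a_mu, a.shape[1], axis=1)
a_sd_full = np.repeat(a_sd, a.shape[1], axis=1)
act_full = a > a_mu_full + tau * a_sd_full
```

If that reading is right, passing full-shape arrays to shared(), or declaring the second dimension broadcastable when creating the shared variable, should both avoid the mismatch.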

ckrapu · July 24, 2020, 4:08pm

Here’s my attempt to fix the issue in your second-to-last post. I wasn’t sure if you had resolved the indexing issue. I replaced the indexing with a boolean mask to zero out contributions of droplets which were too small. Here’s the gist. With regard to the conditioning, it appears that you were already conditioning on data correctly before (though the sampler wasn’t working too smoothly). How are you intending to condition differently?
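The masking idea can be illustrated in plain NumPy. This is a sketch of the general technique, not the contents of the gist, and the droplet sizes and the fixed tau are made up. Multiplying by a boolean mask gives the same sum as boolean indexing, while keeping the array at a fixed shape.

```python
import numpy as np

sizes = np.array([2.1, 3.4, 4.6, 2.9, 4.9])   # made-up droplet sizes
mu, sd, tau = 3.0, 0.5, 2.7                   # tau fixed here for illustration
threshold = mu + tau * sd                     # 3.0 + 2.7 * 0.5 = 4.35

mask = sizes > threshold

kill_by_indexing = sizes[mask].sum()     # result length depends on tau
kill_by_masking = (sizes * mask).sum()   # shape never changes; small droplets add 0
```

The fixed shape matters inside a model because tau is a random variable there, so the number of selected droplets is not known when the graph is built.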

newerjazz · July 24, 2020, 8:58pm

Hi ckrapu,

Thank you for the solution. It solves for the unknown tau easily.

I have updated your code to get 2 sets of droplet_sizes and two sets of kill results into the model here

https://gist.github.com/newerjazz/c983b371294ed9052d0cbc76a17f2b16

For some reason the model now gets tau wrong even though the second set of data is simply a duplicate. Not sure what I am doing wrong.

Thanks a lot!

newerjazz · July 24, 2020, 9:32pm

After updating the model with
μ = pm.Deterministic('μ', (droplet_sizes*active_).sum(axis=1))

everything seems to work.

Thanks, ckrapu!

ckrapu · July 24, 2020, 9:41pm

Okay - glad that it’s working. I am still a little confused as to how you’re aggregating. I think that I misunderstood whether you want to take the sum across the first axis (summing over droplets) or over the second axis (summing over observations) and I assumed you wanted the former.
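The difference between the two aggregations can be seen in a small NumPy sketch. It assumes, as the axis=1 fix suggests, that each row of droplet_sizes is one data set and each column is a droplet. The numbers and the fixed threshold are made up.

```python
import numpy as np

row = np.array([2.1, 3.4, 4.6, 2.9, 4.9])
droplet_sizes = np.vstack([row, row])   # two data sets, the second a duplicate
active = droplet_sizes > 4.35           # hypothetical fixed threshold

pooled = (droplet_sizes * active).sum()           # one number for both sets
per_set = (droplet_sizes * active).sum(axis=1)    # one number per data set
```

Here pooled is 19.0 while per_set is [9.5, 9.5]. With the pooled sum, each observed kill would be compared against a prediction twice as large as it should be, which is one plausible explanation for tau coming out wrong when the second data set was a duplicate.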
