# Bayesian modelling of two sequential tests

Hi!

I am fairly new to Bayes.
But I have a challenging dataset that looks perfect for Bayesian statistics.
I have two medical tests, that are performed in sequential order. Test A gives a certain probability of a binary outcome, and then Test B provides better insight into assumptions of Test A. Test B returns an ordinary/multiclass categorical variable as result.

Ground truth was then verified by Test C, only available post-mortem.

Therefore the goal is to model two probability distributions:

1. Probability after Test A, in case a physician doesn’t have access to Test B and can’t perform it on the patient immediately.
2. Updated probability after Test B.

Schematic to simplify my reasoning! Would be grateful for any tips, references on how to approach it. It was challenging to find anything similar in the literature.

Cheers!

In this case it helps to go back to the basics of probability. Let `X` be the medical disease variable of interest and `A`, `B` the outcome of the two medical tests.

`P(X|A,B) = P(A,B|X)P(X) = P(B|A,X) P(A|X)P(X)`

Where `=` means proportional to (ignoring the normalization constant `P(A,B)`)

This tells you you need 3 pieces for your analysis:

1. Prior probability of medical disease `X`, `P(X)`
2. Conditional probability of `A` test results given all possible `X`, `P(A|X)`
3. Conditional probability of `B` test results given all possible combinations of `A` and `X`, `P(B|A,X)`

Only then can you compute an answer for a given set of observations.

The analysis would be simpler if you could assume that `P(B|A,X)` = `P(B|X)` but that’s not generally the case in medical testing.

If you only have `A`, you compute `P(X|A) = P(A|X)P(X)`, again ignoring the normalization constant `P(A)`

2 Likes

I do understand it in theory. however, I have no idea how to grasp it in pymc3 or if it’s possible at all. Maybe you’ve stumbled upon a similar analysis?

You need to parametrize those prior and conditional distributions (that is find a distribution and parameters that convey the prior and test information in a mathematical format). Assuming X is binary it might look something like:

``````with pm.Model() as model:
X = pm.Beta("X", alpha=1, beta=10)  # 10% prior probability of X = 1
testA = pm.Bernoulli(p=fA(X), observed=testA_result)
testB = pm.Categorical(p=fB(X, testA), observed=testB_result))
``````

You need to find out what the right prior information is for X, and what `fA`, and `fB` should be, those are the functions that give you the probability of the outcomes given the variable of interest `X` and the observed results `testA` (in the case of `testB`). That is something you have to figure out.

For example, perhaps the probability of `testA` being positive is well enough described by a logistic transformation of `X`, so:

``````def fA(x):
return pm.math.sigmoid(x)
``````

You also need to find how `fB` should convert `A`, and `X` into categorical probabilities.

Once you have that, you just have to call:

``````with model:
posterior = pm.sample()
``````

Which will give you the posterior probability of `X`, given your test results.

The hard work is in those `fA` and `fB` which have to reflect the information that the tests `A` and `B` give you in a mathematical format. That’s the modelling task, which can’t be automated. I suggest you try to start with just `X` and `testA`, and once you understand that you can add `testB`.

2 Likes

Interesting problem. Just in case it’s not clear, `X` mentioned by @ricardoV94 corresponds to the ground truth of presence `X=1` or absence `X=0` as established by test C.

Is the sequential nature of test A and B a red herring? For example, if test B is always conducted regardless of the outcome of test A, then it might be conceptually clearer to think about it as a regression style model. Especially because the result of test A will presumably not affect the result of test B.

Goal 1 can be approached by using the results of test A to predict the ground truth.
Goal 2 can be approached by using the results of test A and B to predict the ground truth.

Not sure if that helps, but happy if you want to expand on describing your problem in more detail.