Tutorial real work about simple and multiple linear regression bayesian

Dear everyone.
I am newer using Pymc3. I have a problem need everyone help me. My problem is I have a function F(x,y,z). the variable (x,y,z) has the distribution is Uniform and have the value corresponding to ([0.3;0.6],[0.9;1.1],[45;55]). The first, I keep the Y and Z value in the center position. I conducted experiments with the X value I employ random value of X with 10 samples. I have 10 values of function F.

example: I have x value [ 0.3,0.32,0.35,0.37,0.4,0.43,0.45,0.5,0.55,0.6] then I have corresponding values functions F [0.002,0.0025,0.0024,0.0026,0.0027,0.00278,0.0265,0.0281,0.0286,0.0295]

How to apply Bayesian to my problem?
My target is to find the Mean, variance of the Function value and predict the value of the function when I have the x value.

My future target, don’t keep the value of Y and Z. I find the Mean, variance of the function and predict the value of the function when I do not fix Y, Z which I fix value above.
everyone Who has an idea helps me, please

sincerely thanks

HoaiThanh

Since you are approximating a function, the best way to model it is using a Gaussian Process. Have a look at the related doc: https://docs.pymc.io/notebooks/GP-Marginal.html

Thank you to reply for my problem.
Do you think about If I use Multiple Linear Regression for my problem. Because My target is to find the mean and variance of output. And find the correlation coefficient between every coefficient and output value and coefficient together.
Thank you.

Oh, so you know function F and it is a linear function? In that case you can model it as a F ~ Normal(beta*x + Y + Z) and had Y and Z from some distribution.

I do not know if my function is linear or not. you can say more clear than about (F~Normal(beta*x+y+z)
thank you.

Well, if you are using mulitple linear regression, then you are making the assumption of a linear function. In that case, your function is actually F(x, y, z) = ax+by+c*z, otherwise, GP can approximate function F if you dont know what is the expression of F.

yes. Because My function is so complex. It is the Navier-Stockes function in the computation fluid dynamic (CFD). In the function have the 3 default coefficient. Maybe my function is F(x,y,z) =AX+BY+CZ. Thank you for your help to me. I will try study about Gaussian Process.

Hmm, this is going to be pretty complicated depending on how you are going to solve your diffusion model. You might need to do some pretty advance stuff like pde.ipynb · GitHub

1 Like

Hi you, This is detail my research.

I have the 3 default variables of a function. the Value of the variables correspond to Cb1 [lower:21; upper 25]; Cb2 [lower:23; upper 27]; K [lower:0.2; upper 0.6]. I restrict the priors for these parameter to (+or-)25% of their nominal values. I can simulate to have the result of function ( don’t solve the function, I only use the numerical method to simulation it (“computation fluid dynamic”(CFD))). How to I can find the mean and variance of the Function. I want to applies Bayesian method into my problem. Do you know how to find the mean and variance?

The last time, I have applied non-intrusive polynomial chaos expansion methods for my problem.(point Collocation method). I used the package Chaospy in the python to simulation it.

Step 1 : we generate 30 samples from the distribution which we then use to evaluate the model:
I simulate the 30 samples I have the 10 output.

step 2: Generate a 8th order polynomial and fit it to the 1000 samples generated.

step 3: Estimate mean and variance of output (value’s function).

Now, I want to apply Bayesian into my problem. Because in the polynomial chaos expansion method if you want have the good result then you have to take a lot of samples. The simulation with many case it is impossible.

I need you help me. How to approach to solve my problem?

looking forward for your help

Could you please share your code+result in Chaospy in a notebook? I am still a bit unclear of what is your aim - you said you want to find mean and variance of the function output, but what would be the input in this case? Also what is your data?

I am afraid if you want to have quality inference you also need lots of samples for it to work using Bayesian method.

So there are a few different ways to do it, the simplest is to replace the 8th order polynomial in step 2 you mentioned above with a GP, which gives you mean and variance of the function across all the inputs.

this is the code in the Chaospy.
the first, I run the code to consider what is the value of three coefficients

import chaospy as cp
import numpy as np
import matplotlib.pyplot as plt
distribution = cp.J(cp.Uniform(0.054, 0.066), cp.Uniform(0.9, 1.1), cp.Uniform( 45, 55))
cp.seed (20)
samples = distribution.sample(20,"L")
print (samples)

this is the output:

nodes1 = np.array([0.05675, 0.05934, 0.05633, 0.0557 , 0.05702, 0.06401, 0.06322, 0.06571, 0.06099, 0.05832, 0.06196, 0.06463, 0.05507, 0.06351, 0.06287, 0.06122, 0.05407, 0.05985, 0.05774,0.06015])
nodes2 = np.array([0.9486, 0.9095, 0.9856, 0.9318, 1.0477, 1.0489,1.0663, 0.9184, 0.9646, 1.0345, 1.0168, 1.0565, 0.9727, 0.9907, 0.9277, 0.9548, 1.0933, 1.0751,1.0026, 1.0231])
nodes3 = np.array([51.813, 54.279, 52.659, 51.197 , 46.629, 49.791, 48.581, 54.799, 46.413, 47.078, 52.367, 48.204, 50.389, 45.402, 47.893, 50.796 , 49.332, 53.323, 53.713, 45.757])

and then.
I use then these values to simulation I have the correspond to result. I called “evals”
example: with the node1 = 0.05675, node2 = 0.9486, node3 = 51.813 after I run simulation I have the evals = 0.0204232.

I have to simulate all case.

when I have the result all case. I will run the code

import chaospy as cp
import numpy as np
import matplotlib.pyplot as plt

nodes1 = np.array([0.05675, 0.05934, 0.05633, 0.0557 , 0.05702, 0.06401, 0.06322, 0.06571, 0.06099, 0.05832, 0.06196, 0.06463, 0.05507, 0.06351, 0.06287, 0.06122, 0.05407, 0.05985, 0.05774,0.06015])
nodes2 = np.array([0.9486, 0.9095, 0.9856, 0.9318, 1.0477, 1.0489,1.0663, 0.9184, 0.9646, 1.0345, 1.0168, 1.0565, 0.9727, 0.9907, 0.9277, 0.9548, 1.0933, 1.0751,1.0026, 1.0231])
nodes3 = np.array([51.813, 54.279, 52.659, 51.197 , 46.629, 49.791, 48.581, 54.799, 46.413, 47.078, 52.367, 48.204, 50.389, 45.402, 47.893, 50.796 , 49.332, 53.323, 53.713, 45.757])


distribution = cp.J(cp.Uniform(0.054, 0.066), cp.Uniform(0.9, 1.1), cp.Uniform( 45, 55))


nodes = np.vstack([nodes1, nodes2, nodes3])
evals = np.array([0.0204232,0.0205054,0.0204971,0.0204463,0.0206686,0.0206678,0.0206883,0.0204627,0.020426,0.0206532,0.0206322,0.020677,0.0204431,0.0205319,0.0204508,0.0204115,0.020721,0.0206988,0.0206179,0.0206418])

polynomial_expansion = cp.orth_ttr(2, distribution)
approx = cp.fit_regression(polynomial_expansion, nodes, evals)
expected = cp.E( approx, distribution)
variance = cp.Var(approx, distribution)
deviation = cp.Std( approx, distribution)

sobol_t = cp.Sens_t(approx,distribution)

print (sobol_t)
print (expected)
print(deviation)

samples = distribution.sample(10000,"L")

evaluations = approx(*samples)

plt.hist(evaluations,bins = 80, rwidth =1, color = 'b',edgecolor='K')
plt.xlabel ('Cd')
plt.ylabel ('numbers')
plt.grid(True,linestyle ='dotted' )
plt.show()

the result that I need is the mean and variance value of "evals " and the last step I sample 10000 sampling to plot the histogram of the 10000 sampling.

thank you

Yeah I think it could be done with Gaussian process, in this case your input X is [node1, node2, node3] and your observation Y is [evals]. And after you do the inference you can generate posterior prediction and compute the expectation and variance of the posterior predictive samples.

my problem is in the pymc3 I don’t know How to create the value [node1, node2, node3] the similar with first step that I make the input for my run simulation computational fluid dynamic (CFD) .

You would still need to use chaospy to do the first generation, then treating them as input to and try to approximate the underlying function using GP.