Tutorial real work about simple and multiple linear regression bayesian

nguyenhoaiThanhbk281 · September 15, 2018, 6:31am

Dear everyone.
I am newer using Pymc3. I have a problem need everyone help me. My problem is I have a function F(x,y,z). the variable (x,y,z) has the distribution is Uniform and have the value corresponding to ([0.3;0.6],[0.9;1.1],[45;55]). The first, I keep the Y and Z value in the center position. I conducted experiments with the X value I employ random value of X with 10 samples. I have 10 values of function F.

example: I have x value [ 0.3,0.32,0.35,0.37,0.4,0.43,0.45,0.5,0.55,0.6] then I have corresponding values functions F [0.002,0.0025,0.0024,0.0026,0.0027,0.00278,0.0265,0.0281,0.0286,0.0295]

How to apply Bayesian to my problem?
My target is to find the Mean, variance of the Function value and predict the value of the function when I have the x value.

My future target, don’t keep the value of Y and Z. I find the Mean, variance of the function and predict the value of the function when I do not fix Y, Z which I fix value above.
everyone Who has an idea helps me, please

sincerely thanks

HoaiThanh

junpenglao · September 15, 2018, 6:38am

Since you are approximating a function, the best way to model it is using a Gaussian Process. Have a look at the related doc: https://docs.pymc.io/notebooks/GP-Marginal.html

nguyenhoaiThanhbk281 · September 16, 2018, 12:47pm

Thank you to reply for my problem.
Do you think about If I use Multiple Linear Regression for my problem. Because My target is to find the mean and variance of output. And find the correlation coefficient between every coefficient and output value and coefficient together.
Thank you.

junpenglao · September 16, 2018, 12:51pm

Oh, so you know function F and it is a linear function? In that case you can model it as a F ~ Normal(beta*x + Y + Z) and had Y and Z from some distribution.

nguyenhoaiThanhbk281 · September 16, 2018, 1:04pm

I do not know if my function is linear or not. you can say more clear than about (F~Normal(beta*x+y+z)
thank you.

junpenglao · September 16, 2018, 1:23pm

Well, if you are using mulitple linear regression, then you are making the assumption of a linear function. In that case, your function is actually F(x, y, z) = ax+by+c*z, otherwise, GP can approximate function F if you dont know what is the expression of F.

nguyenhoaiThanhbk281 · September 16, 2018, 1:39pm

yes. Because My function is so complex. It is the Navier-Stockes function in the computation fluid dynamic (CFD). In the function have the 3 default coefficient. Maybe my function is F(x,y,z) =AX+BY+CZ. Thank you for your help to me. I will try study about Gaussian Process.

junpenglao · September 16, 2018, 7:42pm

Hmm, this is going to be pretty complicated depending on how you are going to solve your diffusion model. You might need to do some pretty advance stuff like pde.ipynb · GitHub

nguyenhoaiThanhbk281 · September 17, 2018, 1:35pm

Hi you, This is detail my research.

I have the 3 default variables of a function. the Value of the variables correspond to Cb1 [lower:21; upper 25]; Cb2 [lower:23; upper 27]; K [lower:0.2; upper 0.6]. I restrict the priors for these parameter to (+or-)25% of their nominal values. I can simulate to have the result of function ( don’t solve the function, I only use the numerical method to simulation it (“computation fluid dynamic”(CFD))). How to I can find the mean and variance of the Function. I want to applies Bayesian method into my problem. Do you know how to find the mean and variance?

The last time, I have applied non-intrusive polynomial chaos expansion methods for my problem.(point Collocation method). I used the package Chaospy in the python to simulation it.

Step 1 : we generate 30 samples from the distribution which we then use to evaluate the model:
I simulate the 30 samples I have the 10 output.

step 2: Generate a 8th order polynomial and fit it to the 1000 samples generated.

step 3: Estimate mean and variance of output (value’s function).

Now, I want to apply Bayesian into my problem. Because in the polynomial chaos expansion method if you want have the good result then you have to take a lot of samples. The simulation with many case it is impossible.

I need you help me. How to approach to solve my problem?

looking forward for your help

junpenglao · September 18, 2018, 4:32am

Could you please share your code+result in Chaospy in a notebook? I am still a bit unclear of what is your aim - you said you want to find mean and variance of the function output, but what would be the input in this case? Also what is your data?

I am afraid if you want to have quality inference you also need lots of samples for it to work using Bayesian method.

So there are a few different ways to do it, the simplest is to replace the 8th order polynomial in step 2 you mentioned above with a GP, which gives you mean and variance of the function across all the inputs.

nguyenhoaiThanhbk281 · September 19, 2018, 3:54pm

this is the code in the Chaospy.
the first, I run the code to consider what is the value of three coefficients

import chaospy as cp
import numpy as np
import matplotlib.pyplot as plt
distribution = cp.J(cp.Uniform(0.054, 0.066), cp.Uniform(0.9, 1.1), cp.Uniform( 45, 55))
cp.seed (20)
samples = distribution.sample(20,"L")
print (samples)

this is the output:

nodes1 = np.array([0.05675, 0.05934, 0.05633, 0.0557 , 0.05702, 0.06401, 0.06322, 0.06571, 0.06099, 0.05832, 0.06196, 0.06463, 0.05507, 0.06351, 0.06287, 0.06122, 0.05407, 0.05985, 0.05774,0.06015])
nodes2 = np.array([0.9486, 0.9095, 0.9856, 0.9318, 1.0477, 1.0489,1.0663, 0.9184, 0.9646, 1.0345, 1.0168, 1.0565, 0.9727, 0.9907, 0.9277, 0.9548, 1.0933, 1.0751,1.0026, 1.0231])
nodes3 = np.array([51.813, 54.279, 52.659, 51.197 , 46.629, 49.791, 48.581, 54.799, 46.413, 47.078, 52.367, 48.204, 50.389, 45.402, 47.893, 50.796 , 49.332, 53.323, 53.713, 45.757])

and then.
I use then these values to simulation I have the correspond to result. I called “evals”
example: with the node1 = 0.05675, node2 = 0.9486, node3 = 51.813 after I run simulation I have the evals = 0.0204232.

I have to simulate all case.

when I have the result all case. I will run the code

import chaospy as cp
import numpy as np
import matplotlib.pyplot as plt

nodes1 = np.array([0.05675, 0.05934, 0.05633, 0.0557 , 0.05702, 0.06401, 0.06322, 0.06571, 0.06099, 0.05832, 0.06196, 0.06463, 0.05507, 0.06351, 0.06287, 0.06122, 0.05407, 0.05985, 0.05774,0.06015])
nodes2 = np.array([0.9486, 0.9095, 0.9856, 0.9318, 1.0477, 1.0489,1.0663, 0.9184, 0.9646, 1.0345, 1.0168, 1.0565, 0.9727, 0.9907, 0.9277, 0.9548, 1.0933, 1.0751,1.0026, 1.0231])
nodes3 = np.array([51.813, 54.279, 52.659, 51.197 , 46.629, 49.791, 48.581, 54.799, 46.413, 47.078, 52.367, 48.204, 50.389, 45.402, 47.893, 50.796 , 49.332, 53.323, 53.713, 45.757])


distribution = cp.J(cp.Uniform(0.054, 0.066), cp.Uniform(0.9, 1.1), cp.Uniform( 45, 55))


nodes = np.vstack([nodes1, nodes2, nodes3])
evals = np.array([0.0204232,0.0205054,0.0204971,0.0204463,0.0206686,0.0206678,0.0206883,0.0204627,0.020426,0.0206532,0.0206322,0.020677,0.0204431,0.0205319,0.0204508,0.0204115,0.020721,0.0206988,0.0206179,0.0206418])

polynomial_expansion = cp.orth_ttr(2, distribution)
approx = cp.fit_regression(polynomial_expansion, nodes, evals)
expected = cp.E( approx, distribution)
variance = cp.Var(approx, distribution)
deviation = cp.Std( approx, distribution)

sobol_t = cp.Sens_t(approx,distribution)

print (sobol_t)
print (expected)
print(deviation)

samples = distribution.sample(10000,"L")

evaluations = approx(*samples)

plt.hist(evaluations,bins = 80, rwidth =1, color = 'b',edgecolor='K')
plt.xlabel ('Cd')
plt.ylabel ('numbers')
plt.grid(True,linestyle ='dotted' )
plt.show()

the result that I need is the mean and variance value of "evals " and the last step I sample 10000 sampling to plot the histogram of the 10000 sampling.

thank you

junpenglao · September 19, 2018, 6:34pm

Yeah I think it could be done with Gaussian process, in this case your input X is [node1, node2, node3] and your observation Y is [evals]. And after you do the inference you can generate posterior prediction and compute the expectation and variance of the posterior predictive samples.

nguyenhoaiThanhbk281 · September 20, 2018, 3:43am

my problem is in the pymc3 I don’t know How to create the value [node1, node2, node3] the similar with first step that I make the input for my run simulation computational fluid dynamic (CFD) .

junpenglao · September 20, 2018, 4:37am

You would still need to use chaospy to do the first generation, then treating them as input to and try to approximate the underlying function using GP.

Topic		Replies	Views
Beyond linear regression with pymc v5 modeling	11	650	October 29, 2022
Prediction by Bayesian linear regression Questions	7	1763	April 12, 2021
Mean of likelihood function for Bayesian regression Questions modeling	6	500	January 20, 2022
Multivariant linear regression with uncertainty (error propagation) Questions modeling	2	1505	September 7, 2022
Regressing through Latent Variables version agnostic modeling	1	67	February 3, 2025

Tutorial real work about simple and multiple linear regression bayesian

Related topics