Beyond linear regression with pymc

obs_data_generated.csv (660.4 KB)

My apologies! The code below should run without problems and generate the appropriate data. I have also uploaded one sample from such a run.
I had to bring in one more condition, the ratio x_0 / x_1, to be able to generate x_0 and x_1. This makes physical sense in the scope of my research, but I had never thought to put it into my pymc model. The ideal model should take into account the uncertainties in all the observed data. I tried this in my original model by letting the model learn the noise in my observation data. I’d really appreciate any further suggestions.
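For readers following along, here is how the ratio condition appears to close the system. The relation in the first line is inferred by rearranging the x_1 line of the generating code; it is not stated explicitly in this post, so treat it as an assumption. The symbol r stands for ratio_ = x_0 / x_1.

```latex
\begin{align}
dp &= x_1 - x_2 - \frac{x_0 - C}{F}, \qquad x_0 = r\,x_1 \\
dp &= x_1\left(1 - \frac{r}{F}\right) - x_2 + \frac{C}{F} \\
x_1 &= \frac{dp - \dfrac{C}{F} + x_2}{1 - \dfrac{r}{F}}, \qquad x_0 = r\,x_1
\end{align}
```

The last line matches the expressions used to fill x_1 and x_0 in the script. Note that the denominator vanishes when r = F, so draws with r close to F produce very large or negative x_1 values.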

import numpy as np
import pandas as pd

# number of data points
n = 10000

x_0 = []
x_1 = []


### upper and lower limits for drawing a random value for x_2 from a normal distribution 
low_truncated_x_2 = 0
up_truncated_x_2 = 60
x_2 = []

### upper and lower limits for drawing random values for x_0 / x_1 ratio from a normal distribution
low_truncated_val_ratio = 0.2
up_truncated_val_ratio = 0.9
ratio_ = [] ### ratio = x_0 divided by x_1

### upper and lower limits for drawing a random value for F from a normal distribution 
low_truncated_F = 0.1
up_truncated_F = 1.5
F = []

### upper and lower limits for drawing a random value for C from a normal distribution
low_truncated_C = 0
up_truncated_C = 20
C = []

### upper and lower limits for drawing a random value for dp from a normal distribution
low_truncated_dp = 0.1
up_truncated_dp = 70
dp = []


### drawing random data to generate fake data

#### ratio_
while len(ratio_) < n:
    num_ = np.random.normal(0.4, 0.5)
    if low_truncated_val_ratio <= num_ <= up_truncated_val_ratio:
        ratio_.append(num_)

#### x_2
while len(x_2) < n:
    num_ = np.random.normal(40, 20)
    if low_truncated_x_2 <= num_ <= up_truncated_x_2:
        x_2.append(num_)

#### F
while len(F) < n:
    num_ = np.random.normal(0.6, 0.4)
    if low_truncated_F <= num_ <= up_truncated_F:
        F.append(num_)

#### C
while len(C) < n:
    num_ = np.random.normal(0, 25)
    if low_truncated_C <= num_ <= up_truncated_C:
        C.append(num_)

#### dp
while len(dp) < n:
    num_ = np.random.normal(5, 1)
    if low_truncated_dp <= num_ <= up_truncated_dp:
        dp.append(num_)

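for j in range(n):
    # Equivalent to solving dp = x_1 - x_2 - (x_0 - C) / F for x_1, using x_0 = ratio_ * x_1.
    # The denominator is close to zero when ratio_ is close to F; the resulting
    # extreme values are removed by the range filter below.
    x_1.append((dp[j] - C[j] / F[j] + x_2[j]) / (1 - ratio_[j] / F[j]))
    x_0.append(ratio_[j] * x_1[j])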

df_data = pd.DataFrame({"x_0": x_0, "x_1": x_1, "x_2": x_2, "C": C, "F": F, "dp": dp})
# keep only rows with 0 < x_0 < 90 and 0 < x_1 < 210
df_final = df_data[(df_data["x_0"] > 0) & (df_data["x_0"] < 90) & (df_data["x_1"] > 0) & (df_data["x_1"] < 210)]
df_final.to_csv("obs_data_generated.csv")
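In case it is useful, the same draws can be made without the element-by-element loops. This is only a sketch: `truncated_normal` is a hypothetical helper (not part of NumPy), the seed is an arbitrary choice, and the means, standard deviations, and bounds are copied from the script above. It does rejection sampling in batches and then applies the same x_1 and x_0 formulas to whole arrays.

```python
import numpy as np


def truncated_normal(rng, mean, sd, low, high, size):
    """Rejection-sample `size` draws from Normal(mean, sd) kept within [low, high]."""
    kept = np.empty(0)
    while kept.size < size:
        draws = rng.normal(mean, sd, size=size)
        kept = np.concatenate([kept, draws[(draws >= low) & (draws <= high)]])
    return kept[:size]


rng = np.random.default_rng(42)  # arbitrary seed, only for reproducibility
n = 10_000

ratio_ = truncated_normal(rng, 0.4, 0.5, 0.2, 0.9, n)
x_2 = truncated_normal(rng, 40, 20, 0, 60, n)
F = truncated_normal(rng, 0.6, 0.4, 0.1, 1.5, n)
C = truncated_normal(rng, 0, 25, 0, 20, n)
dp = truncated_normal(rng, 5, 1, 0.1, 70, n)

# Same formulas as in the loop, applied to whole arrays at once.
# errstate silences the warning for the rare case ratio_ == F.
with np.errstate(divide="ignore", invalid="ignore"):
    x_1 = (dp - C / F + x_2) / (1 - ratio_ / F)
x_0 = ratio_ * x_1

# Same range filter as in the script, expressed as a boolean mask.
mask = (x_0 > 0) & (x_0 < 90) & (x_1 > 0) & (x_1 < 210)
print(f"kept {mask.sum()} of {n} rows")
```

The arrays can be passed to `pd.DataFrame` and filtered with `mask` exactly as in the original script; the number of rows that survive the filter will vary with the seed.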