ValueError: design matrix must be real-valued floating point

Hi Team,
I am automating model building with PyMC3. My code looks like this:

import pymc3 as pm
import scipy.optimize as opt
from pymc3 import Model, Uniform, find_MAP, sample

# prior_init, formula and pymc_data are assumed to be defined earlier in the script.
# Note: the name `dict` shadows the Python built-in of the same name.
dict = {}
for idx, priors in prior_init.iterrows():
    # One Uniform prior per model term, keyed by the term's name
    dict[priors['pseudo_name']] = Uniform.dist(lower=priors['lower'], upper=priors['upper'])

print("******************FORMULA IS: \n\n\n", formula, "\n\n")
print("Prior is -------------\n", prior_init, "\n\n\n")

with Model() as model:
    pm.glm.GLM.from_formula(formula, pymc_data, priors = dict)

# print("****************** Processing PYMC3 ************************\n")
with model:
    # step = pm.metropolis
    start = find_MAP(fmin=opt.fmin_powell)
    trace = sample(3000, step=pm.Metropolis(), start=start)

# print('WAIC', waic(trace, model=model))
# print('DIC' , dic(trace, model=model))
# print('BPIC', bpic(trace, model=model))

z = pm.df_summary(trace)

dict looks like this:
{'Intercept': <pymc3.distributions.continuous.Uniform at 0x24c85fced68>,
'var1': <pymc3.distributions.continuous.Uniform at 0x24c862c90f0>,
'var10': <pymc3.distributions.continuous.Uniform at 0x24c867107b8>,
'var11': <pymc3.distributions.continuous.Uniform at 0x24c875f3208>,
'var12': <pymc3.distributions.continuous.Uniform at 0x24c87282748>,
'var13': <pymc3.distributions.continuous.Uniform at 0x24c878082b0>,
'var14': <pymc3.distributions.continuous.Uniform at 0x24c87808128>,
'var15': <pymc3.distributions.continuous.Uniform at 0x24c87282ac8>,
'var16': <pymc3.distributions.continuous.Uniform at 0x24c87810b00>,
'var17': <pymc3.distributions.continuous.Uniform at 0x24c87448c88>,
'var18': <pymc3.distributions.continuous.Uniform at 0x24c86decd68>,
'var2': <pymc3.distributions.continuous.Uniform at 0x24c86b550b8>,
'var3': <pymc3.distributions.continuous.Uniform at 0x24c872714e0>,
'var4': <pymc3.distributions.continuous.Uniform at 0x24c86763668>,
'var5': <pymc3.distributions.continuous.Uniform at 0x24c87273320>,
'var6': <pymc3.distributions.continuous.Uniform at 0x24c85e44dd8>,
'var7': <pymc3.distributions.continuous.Uniform at 0x24c86763a90>,
'var8': <pymc3.distributions.continuous.Uniform at 0x24c8711bfd0>,
'var9': <pymc3.distributions.continuous.Uniform at 0x24c8711b588>}

My pymc_data looks like this:
var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 var12 var13 var14 var15 var16 var17 var18 Sales
0 65.968675 0.84 15.2701 2.1527 5.4806 0.489 1.460986567 1.741694175 0.0 3.910838084 1.092920548 1.79e-12 2.21e-08 0.28584655 0.24433583399999997 0.0 0.0 0.0 529.9619
1 67.5334 0.87 17.9691 1.6467 13.1201 4.6723 1.442039791 1.4960437830000002 21.818190700000002 0.0 6.651631717000001 9.82e-13 1.2099999999999999e-08 0.285974195 0.194164878 0.0 0.0 0.0 620.3295
2 67.3809 0.79 17.9222 2.5687 22.6372 9.0627 1.403944105 1.6282092 23.46538631 2.306421137 4.922758255 5.379999999999999e-13 6.64e-09 0.119166312 0.20174241 0.0 0.0 0.0 632.1304
3 65.38 0.85 31.9901 2.8427 20.4549 7.7223 1.348839268 1.650558451 24.08396163 0.0 3.643248735 2.95e-13 3.63e-09 0.209231403 0.211423745 0.0 0.0 0.0 638.6615
4 62.8118875 0.83 35.289 2.4745 10.1979 14.6548 1.3085214059999999 1.4479207159999998 23.92486445 0.0 2.696305742 1.6100000000000002e-13 1.99e-09 0.161398017 0.130766968 0.0 0.0 2.874286207 574.2733
5 61.20905 0.84 32.243 1.5991 14.9932 3.5393 1.30844891 1.39640381 15.92901664 0.0 1.995489515 8.84e-14 1.09e-09 0.06887924200000001 0.20856653600000002 0.0 0.0 3.186550455 556.783
6 60.2662625 1.0 18.7617 1.4808 7.09 3.9164 1.3073771109999999 1.257778041 6.291787886 0.0 1.4768274769999998 4.84e-14 5.97e-10 0.189544981 0.2463737 0.0 0.0 2.003988922 482.9076
7 59.0856375 0.87 22.6611 2.4573 11.7184 12.8453 1.3406977740000001 1.298065021 14.43566603 2.683821458 1.092974621 2.65e-14 3.27e-10 0.15776435 0.09539392 0.0 0.0 2.0255016169999998 554.5634
8 57.6705375 0.86 14.0938 1.9221 11.0503 5.2115 1.334292978 1.7044652530000002 11.76171245 2.512806419 0.808891722 1.45e-14 1.79e-10 0.062262187999999996 0.133790882 0.0 0.0 0.0 480.2632
9 56.11065 0.82 10.0776 1.7768 11.1672 0.7745 1.4004676180000002 1.5697799209999999 13.34864893 0.0 0.598646855 7.96e-15 9.820000000000001e-11 0.182504164 0.142478068 0.0 0.0 0.404979012 442.1669
10 56.774025 0.85 9.251 2.4151 3.7361 1.7296 1.383043651 1.4484744390000002 16.52311312 0.17758719 6.650080099999999 4.36e-15 5.379999999999999e-11 0.137670149 0.152970585 0.0 0.0 6.103887155 461.18
11 57.19335 0.86 8.0343 2.7391 29.1426 20.1989 1.370721055 1.491970911 17.76790582 2.581827508 4.92160993 2.3899999999999998e-15 2.95e-11 0.057027274 0.08660254 0.0 0.0 0.077089558 674.1602
12 58.681025 0.85 9.9183 3.4747 16.6473 8.2476 1.3226738409999999 1.4524709980000001 16.70324018 3.7268898630000002 3.64239888 1.31e-15 1.61e-11 0.207987355 0.28354893800000003 0.0 0.0 0.0 547.6149
13 58.6487625 0.83 18.516 3.1345 8.6557 3.2639 1.393997195 1.4744986269999998 16.91199278 3.169331635 2.695676778 7.16e-16 8.84e-12 0.14974695300000002 0.341320963 0.0 0.0 7.341655208 473.3629
14 58.5308625 0.89 22.5153 2.2564 2.3721 1.0061 1.4047940630000002 1.480850566 16.43127167 3.427617212 1.99502403 3.92e-16 4.84e-12 0.056095276 0.356089876 113.780051 1.747998856 76.16046244 468.2492
15 58.646275 0.95 16.261 2.0446 5.1886 2.5959 1.372736114 1.50966546 15.15973989 2.008482532 1.476482979 2.15e-16 2.65e-12 0.203493636 0.31685959 143.4190713 1.624761521 7.994356409 451.0772
16 57.79665 0.93 12.6764 3.9441 22.6553 10.5435 1.33000862 1.440223108 14.612008600000001 2.892944618 1.092719664 1.17e-16 1.45e-12 0.153200326 0.264007576 154.5817615 1.708610547 3.551517647 617.1936
17 57.9634875 0.87 10.7354 1.911 6.7582 1.963 1.252336508 1.367363668 15.60003017 2.6007946 0.808703033 6.44e-17 7.959999999999999e-13 0.053883020999999996 0.63387696 167.4185315 0.0 4.079910122 431.7602
18 57.3260375 0.92 15.0732 2.4045 8.2784 2.3125 1.4293630990000001 1.343116451 16.6373017 2.534635412 0.598507209 3.54e-17 4.36e-13 0.053353632000000005 0.557942649 140.072393 0.0 2.649222603 464.4708
19 57.3714 0.82 12.4054 2.0728 4.4784 1.662 1.327184407 1.379131031 16.58901056 3.35513271 0.442944894 1.93e-17 2.39e-13 0.210256058 0.440227214 117.19297209999999 0.0 0.46154934700000005 427.4129
20 56.567125 0.87 11.7413 2.5183 27.7579 16.0378 1.269451862 1.39145485 16.57995626 3.1103032280000003 6.650057162 1.0599999999999999e-17 1.31e-13 0.18728521 0.45310043 141.8056236 0.0 2.43688346 667.2775
21 56.91355 0.86 15.787 2.1519 7.2272 6.1375 1.213861763 1.2915862340000002 16.10218254 2.117947938 4.921592955 5.8e-18 7.16e-14 0.055440869000000004 0.257681975 151.4799803 0.0 0.08870400199999999 476.0278

My formula looks like this:

Sales~var1+var2+var3+var4+var5+var6+var7+var8+var9+var10+var11+var12+var13+var14+var15+var16+var17+var18

I am getting the following error:

File "", line 1, in
runfile('D:/#Delete/CFK/PYMC/automatic_bayesian_inference.py', wdir='D:/#Delete/CFK/PYMC')

File "C:\Users\csuman\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 710, in runfile
execfile(filename, namespace)

File "C:\Users\csuman\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "D:/#Delete/CFK/PYMC/automatic_bayesian_inference.py", line 84, in
pm.glm.GLM.from_formula(formula, pymc_data, priors = dict)

File "C:\Users\csuman\AppData\Local\Continuum\anaconda3\lib\site-packages\pymc3\glm\linear.py", line 133, in from_formula
y, x = patsy.dmatrices(formula, data)

File "C:\Users\csuman\AppData\Local\Continuum\anaconda3\lib\site-packages\patsy\highlevel.py", line 310, in dmatrices
NA_action, return_type)

File "C:\Users\csuman\AppData\Local\Continuum\anaconda3\lib\site-packages\patsy\highlevel.py", line 203, in _do_highlevel_design
rhs, rhs_orig_index = _regularize_matrix(rhs, "x")

File "C:\Users\csuman\AppData\Local\Continuum\anaconda3\lib\site-packages\patsy\highlevel.py", line 202, in _regularize_matrix
return (DesignMatrix(m, di), orig_index)

File "C:\Users\csuman\AppData\Local\Continuum\anaconda3\lib\site-packages\patsy\design_info.py", line 1057, in __new__
raise ValueError("design matrix must be real-valued floating point")

ValueError: design matrix must be real-valued floating point

Can you please let me know where I am going wrong?

I’m just guessing, but your var12 and var13 both have very small values, close to zero.
Maybe that leads to the error?
You could try running your model without them.

@falk
I have tried removing those variables from the model, but it did not help.

This is more of a patsy error; patsy is the library that generates the design matrix. I agree with @falk that it has something to do with your input. Try entering the predictors one by one in your linear model and check which one causes the error.
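A quick way to narrow this down without building the model is to test each column on its own. The sketch below is only an illustration: the helper name `find_non_float_columns` and the small `demo` frame are made up for this example, and it checks whether each column can be cast to float rather than calling patsy itself.

```python
import pandas as pd


def find_non_float_columns(df):
    """Return {column name: error message} for columns that cannot be cast to float."""
    problems = {}
    for name in df.columns:
        try:
            df[name].astype(float)
        except (TypeError, ValueError) as exc:
            problems[name] = str(exc)
    return problems


# Hypothetical data: var2 holds strings, one of which is not a number,
# as can happen after importing from a text or Excel file.
demo = pd.DataFrame({
    "var1": [65.97, 67.53, 67.38],
    "var2": ["0.84", "0.87", "n/a"],
    "Sales": [529.96, 620.33, 632.13],
})

bad = find_non_float_columns(demo)
for column, message in bad.items():
    print(column, "->", message)
```

Running the same helper on pymc_data should list any predictor that is not purely numeric.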

@junpenglao, I will try now.

@junpenglao
I have tried removing variables from the model. My data now looks like this:

     var1     var2    var3     Sales

0 65.968675 15.2701 2.1527 529.9619
1 67.533400 17.9691 1.6467 620.3295
2 67.380900 17.9222 2.5687 632.1304
3 65.380000 31.9901 2.8427 638.6615
4 62.811887 35.2890 2.4745 574.2733
5 61.209050 32.2430 1.5991 556.7830
6 60.266263 18.7617 1.4808 482.9076
7 59.085637 22.6611 2.4573 554.5634
8 57.670538 14.0938 1.9221 480.2632
9 56.110650 10.0776 1.7768 442.1669
10 56.774025 9.2510 2.4151 461.1800
11 57.193350 8.0343 2.7391 674.1602
12 58.681025 9.9183 3.4747 547.6149
13 58.648762 18.5160 3.1345 473.3629
14 58.530862 22.5153 2.2564 468.2492
15 58.646275 16.2610 2.0446 451.0772
16 57.796650 12.6764 3.9441 617.1936
17 57.963487 10.7354 1.9110 431.7602
18 57.326037 15.0732 2.4045 464.4708
19 57.371400 12.4054 2.0728 427.4129
20 56.567125 11.7413 2.5183 667.2775
21 56.913550 15.7870 2.1519 476.0278

My prior_dict looks like this:

{'Intercept': <pymc3.distributions.continuous.Uniform at 0x24c85ea3a58>,
'var1': <pymc3.distributions.continuous.Uniform at 0x24c85d57438>,
'var2': <pymc3.distributions.continuous.Uniform at 0x24c85d574e0>,
'var3': <pymc3.distributions.continuous.Uniform at 0x24c85d9b048>}

This still did not help; I am getting the same error.

It is not related to your prior. Is this all of your data? The error suggests that your data table contains non-floating-point values.
You can test this more directly with the snippet below, which does not need to compile the PyMC3 model:

import patsy
y, x = patsy.dmatrices(formula, data)

It might also be relevant how you import your data. Is it from text/CSV, from Excel, or from a database? Even if the variables in pymc_data look like numbers, they might have the wrong data type.
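To illustrate how an import can silently produce the wrong type, here is a small sketch. The CSV text and the cell value "unknown" are invented for the example; the point is only that one non-numeric cell changes the dtype of the whole column, and that `pd.to_numeric` with `errors="coerce"` is one way to force the columns back to floats.

```python
import io

import pandas as pd

# Hypothetical CSV content: one cell in var2 is text instead of a number.
csv_text = (
    "var1,var2,Sales\n"
    "65.97,15.27,529.96\n"
    "67.53,unknown,620.33\n"
    "67.38,17.92,632.13\n"
)

raw = pd.read_csv(io.StringIO(csv_text))
print(raw.dtypes)  # var2 is not float64 because of the text cell

# Coerce every column to numbers; cells that cannot be parsed become NaN.
clean = raw.apply(pd.to_numeric, errors="coerce")
n_missing = int(clean.isna().sum().sum())
print("cells that could not be parsed:", n_missing)

# Drop the affected rows (or fix them at the source) and make the dtype explicit.
clean = clean.dropna().astype("float64")
print(clean.dtypes)
```

Whether dropping rows is acceptable depends on your data; with only 22 rows it is probably better to fix the source file.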

Hi falk/junpenglao,
The issue with the data has been resolved. I am able to run the model with a few independent variables.

The problem arises when I run it with a larger number of variables. The model runs, but it produces no estimates.

Input data:

var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 var12 var13 var14 var15 Sales
65.968675 15.2701 2.1527 5.4806 0.489 1.460987 1.741694 0 3.910838 1.092921 0.285847 0.244336 0 0 0 529.9619
67.5334 17.9691 1.6467 13.1201 4.6723 1.44204 1.496044 21.818191 0 6.651632 0.285974 0.194165 0 0 0 620.3295
67.3809 17.9222 2.5687 22.6372 9.0627 1.403944 1.628209 23.465386 2.306421 4.922758 0.119166 0.201742 0 0 0 632.1304
65.38 31.9901 2.8427 20.4549 7.7223 1.348839 1.650558 24.083962 0 3.643249 0.209231 0.211424 0 0 0 638.6615
62.811887 35.289 2.4745 10.1979 14.6548 1.308521 1.447921 23.924864 0 2.696306 0.161398 0.130767 0 0 2.874286 574.2733
61.20905 32.243 1.5991 14.9932 3.5393 1.308449 1.396404 15.929017 0 1.99549 0.068879 0.208567 0 0 3.18655 556.783
60.266263 18.7617 1.4808 7.09 3.9164 1.307377 1.257778 6.291788 0 1.476827 0.189545 0.246374 0 0 2.003989 482.9076
59.085637 22.6611 2.4573 11.7184 12.8453 1.340698 1.298065 14.435666 2.683821 1.092975 0.157764 0.095394 0 0 2.025502 554.5634
57.670538 14.0938 1.9221 11.0503 5.2115 1.334293 1.704465 11.761712 2.512806 0.808892 0.062262 0.133791 0 0 0 480.2632
56.11065 10.0776 1.7768 11.1672 0.7745 1.400468 1.56978 13.348649 0 0.598647 0.182504 0.142478 0 0 0.404979 442.1669
56.774025 9.251 2.4151 3.7361 1.7296 1.383044 1.448474 16.523113 0.177587 6.65008 0.13767 0.152971 0 0 6.103887 461.18
57.19335 8.0343 2.7391 29.1426 20.1989 1.370721 1.491971 17.767906 2.581828 4.92161 0.057027 0.086603 0 0 0.07709 674.1602
58.681025 9.9183 3.4747 16.6473 8.2476 1.322674 1.452471 16.70324 3.72689 3.642399 0.207987 0.283549 0 0 0 547.6149
58.648762 18.516 3.1345 8.6557 3.2639 1.393997 1.474499 16.911993 3.169332 2.695677 0.149747 0.341321 0 0 7.341655 473.3629
58.530862 22.5153 2.2564 2.3721 1.0061 1.404794 1.480851 16.431272 3.427617 1.995024 0.056095 0.35609 113.780051 1.747999 76.160462 468.2492
58.646275 16.261 2.0446 5.1886 2.5959 1.372736 1.509665 15.15974 2.008483 1.476483 0.203494 0.31686 143.419071 1.624762 7.994356 451.0772
57.79665 12.6764 3.9441 22.6553 10.5435 1.330009 1.440223 14.612009 2.892945 1.09272 0.1532 0.264008 154.581761 1.708611 3.551518 617.1936
57.963487 10.7354 1.911 6.7582 1.963 1.252337 1.367364 15.60003 2.600795 0.808703 0.053883 0.633877 167.418532 0 4.07991 431.7602
57.326037 15.0732 2.4045 8.2784 2.3125 1.429363 1.343116 16.637302 2.534635 0.598507 0.053354 0.557943 140.072393 0 2.649223 464.4708
57.3714 12.4054 2.0728 4.4784 1.662 1.327184 1.379131 16.589011 3.355133 0.442945 0.210256 0.440227 117.192972 0 0.461549 427.4129
56.567125 11.7413 2.5183 27.7579 16.0378 1.269452 1.391455 16.579956 3.110303 6.650057 0.187285 0.4531 141.805624 0 2.436883 667.2775
56.91355 15.787 2.1519 7.2272 6.1375 1.213862 1.291586 16.102183 2.117948 4.921593 0.055441 0.257682 151.47998 0 0.088704 476.0278

Model priors:

{'Intercept': <pymc3.distributions.continuous.Uniform at 0x2c9b448f780>,
'var1': <pymc3.distributions.continuous.Uniform at 0x2c9b1b87438>,
'var10': <pymc3.distributions.continuous.Uniform at 0x2c9ab968eb8>,
'var11': <pymc3.distributions.continuous.Uniform at 0x2c9abae67b8>,
'var12': <pymc3.distributions.continuous.Uniform at 0x2c9b4482e80>,
'var13': <pymc3.distributions.continuous.Uniform at 0x2c9ab88e6a0>,
'var14': <pymc3.distributions.continuous.Uniform at 0x2c9ab7914a8>,
'var15': <pymc3.distributions.continuous.Uniform at 0x2c9b4482a90>,
'var2': <pymc3.distributions.continuous.Uniform at 0x2c9aaa252b0>,
'var3': <pymc3.distributions.continuous.Uniform at 0x2c9a5df7780>,
'var4': <pymc3.distributions.continuous.Uniform at 0x2c9b1ed77f0>,
'var5': <pymc3.distributions.continuous.Uniform at 0x2c9aafec080>,
'var6': <pymc3.distributions.continuous.Uniform at 0x2c9b43d0898>,
'var7': <pymc3.distributions.continuous.Uniform at 0x2c9b267af60>,
'var8': <pymc3.distributions.continuous.Uniform at 0x2c9b2b45780>,
'var9': <pymc3.distributions.continuous.Uniform at 0x2c9aa76b860>}

I see the following while running the model:

logp = nan:   4%|▍         | 200/5000 [00:00<00:04, 1179.25it/s]Optimization terminated successfully.
         Current function value: -10000000000000000159028911097599180468360808563945281389781327557747838772170381060813469985856815104.000000
         Iterations: 1
         Function evaluations: 205
logp = nan:   4%|▍         | 205/5000 [00:00<00:11, 399.62it/s] 
100%|██████████| 10500/10500 [01:21<00:00, 129.22it/s]
WAIC WAIC_r(WAIC=nan, WAIC_se=nan, p_WAIC=nan)
DIC nan
BPIC nan

This is the output I get at the end:

           mean       sd        mc_error  hpd_2.5    hpd_97.5
Intercept  NaN        NaN       NaN       NaN        NaN
var1       NaN        NaN       NaN       NaN        NaN
var2       NaN        NaN       NaN       NaN        NaN
var3       NaN        NaN       NaN       NaN        NaN
var4       NaN        NaN       NaN       NaN        NaN
var5       NaN        NaN       NaN       NaN        NaN
var6       NaN        NaN       NaN       NaN        NaN
var7       NaN        NaN       NaN       NaN        NaN
var8       NaN        NaN       NaN       NaN        NaN
var9       NaN        NaN       NaN       NaN        NaN
var10      NaN        NaN       NaN       NaN        NaN
var11      NaN        NaN       NaN       NaN        NaN
var12      NaN        NaN       NaN       NaN        NaN
var13      NaN        NaN       NaN       NaN        NaN
var14      NaN        NaN       NaN       NaN        NaN
var15      NaN        NaN       NaN       NaN        NaN
sd         13.302194  5.33E-15  0         13.302194  13.302194

Can you please help me identify the possible source of the error?

It seems your model is badly conditioned: it has almost as many columns as rows (22 rows, 15 predictors plus an intercept), and a few columns with many 0s. Would you be able to run a plain linear regression on it, e.g. with scipy?
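As a rough sketch of that check, the example below fits ordinary least squares and reports the rank and condition number of the design matrix. It uses NumPy rather than scipy, and the data are random stand-ins with the same shape as yours (22 rows, 15 predictors, two mostly-zero columns), so the numbers it prints say nothing about your actual data; replace `X` and `y` with your own arrays.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_predictors = 22, 15  # same shape as the data in this thread

# Synthetic stand-in for the predictors and the response.
X = rng.normal(size=(n_rows, n_predictors))
X[:14, 12] = 0.0  # mimic columns that are zero in most rows
X[:14, 13] = 0.0
y = rng.normal(size=n_rows)

# Add an intercept column, as the formula interface would.
design = np.column_stack([np.ones(n_rows), X])

coef, residuals, rank, singular_values = np.linalg.lstsq(design, y, rcond=None)
cond = np.linalg.cond(design)

print("rows x columns:", design.shape)
print("rank:", rank, "of", design.shape[1])
print("condition number:", cond)
```

A rank below the number of columns, or a very large condition number, would suggest that some predictors are (nearly) linear combinations of others and that the coefficients are poorly determined by 22 observations.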

Also, you should change your prior (Uniform priors are usually a bad idea), change the sampler (Metropolis sometimes fails silently), and change the initialization (using the MAP as the starting value is a bad idea). In other words, try the defaults:

with model:
    trace = sample(1000, tune=1000)
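One reason hard-bounded priors can be awkward is easy to see with plain Python. The sketch below is an illustration only, not PyMC3 code: `uniform_logpdf` and `normal_logpdf` are hand-written helpers for this example. A Uniform log density is negative infinity for any value outside its bounds, so a coefficient whose plausible value lies outside the chosen [lower, upper] range makes the whole model log probability non-finite, whereas a wide Normal prior stays finite everywhere.

```python
import math


def uniform_logpdf(x, lower, upper):
    """Log density of Uniform(lower, upper); -inf outside the support."""
    if lower <= x <= upper:
        return -math.log(upper - lower)
    return -math.inf


def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma)."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


inside = uniform_logpdf(0.5, 0.0, 1.0)   # inside the bounds: finite
outside = uniform_logpdf(1.5, 0.0, 1.0)  # outside the bounds: -inf
wide = normal_logpdf(1.5, 0.0, 10.0)     # finite for any real x

print("uniform, inside bounds: ", inside)
print("uniform, outside bounds:", outside)
print("wide normal:            ", wide)
```

Whether this is what produced the NaNs above cannot be told from the output alone, but it is worth checking that every bound in prior_init is wide enough for the scale of its predictor.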

Thank you. I will update it at my end.