I am attempting to model the 2018 Mississippi special Senate election in PyMC3 with a beta regression model.

My dataset consists of the percentage of the vote won by the Democrat and the percentage turnout (two separate models) in each county in each round of the election, along with various demographic factors for each county. My goal was to use the demographic and first-round election data, along with partial results from the second round, to generate a forecast of the final second-round result (similar to the NYT needle). My frequentist model performed poorly on election night, so I am building a retrospective model in PyMC3 to see if it would have done better.

To decide between simple regression and weighted regression (weights = county populations), I used Kruschke's Bayesian model comparison on the first-round data. However, the posterior probability of the unweighted model came out as 100% when predicting the margin and 0% when predicting the turnout; these results are intuitively implausible, so I'm concerned I made a mistake.
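(For reference, in the Kruschke-style setup the posterior model probability is just the fraction of posterior draws in which the categorical model index lands on each model. A minimal standalone sketch, with a hypothetical array standing in for the sampled `trace['m']`:)

```python
import numpy as np

# Hypothetical posterior draws of the model index (0 = unweighted, 1 = weighted);
# in a real run these would be trace['m'] from the sampler, not this toy array.
m_draws = np.array([0, 0, 1, 0, 1, 0, 0, 0])

# Posterior model probabilities = proportion of draws at each index value
p_unweighted = np.mean(m_draws == 0)
p_weighted = np.mean(m_draws == 1)
```

A result of exactly 100%/0% usually means the sampled index never switched models, which is a mixing problem as much as a modeling conclusion.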
- Is this an appropriate use of Bayesian model comparison?
- Did I do it correctly? My code for the turnout model is below; the margin model has an identical structure but different data.
import numpy as np
import pymc3 as pm
import theano.tensor as T

# avoid data snooping by dropping first round data
xTurnout1 = turnoutPredictors.drop(['TotalPop', '11/6/18', '11/6 Dem %', 'GOP %McDaniel'], axis=1)
yTurnout1 = turnoutPredictors['11/6/18']
n_predictors_Turnout1 = len(xTurnout1.columns)
predictor_names = list(xTurnout1.columns)

with pm.Model() as prelimTurnoutModel:
    # model comparison: categorical index over the two models, 50/50 prior
    m = pm.Categorical('m', p=np.asarray([.5, .5]))
    # the hyperprior distribution for the mean of the t-distribution
    muB = pm.Normal('muB', 0, 1)
    # the hyperpriors on the scale and degrees of freedom of the t-distribution
    # (half-Cauchy replacing gammas per Gelman; want heavy tails here b/c uncertain)
    tauB = pm.HalfCauchy('tauB', 1)
    tdfB = pm.HalfCauchy('tdfB', 1)
    # define the priors
    # the mean y value; even though this is an uninformative prior, tau can be
    # high because we know the mean turnout will be between 0 and 1
    beta0 = pm.Normal('beta0', mu=0, tau=10)
    # the regression coefficients
    beta1 = pm.StudentT('beta1', mu=muB, lam=tauB, nu=tdfB, shape=n_predictors_Turnout1)
    mu = beta0 + pm.math.dot(beta1, xTurnout1.values.T)
    # keep the Beta mean strictly inside (0, 1); affects <1% of sample values
    mu_clipped = T.clip(mu, 1e-7, 1 - 1e-7)
    # a scale parameter ("sample size") for the beta distribution
    kappa_log = pm.Exponential('kappa_log', lam=1.5)
    kappa = pm.Deterministic('kappa', T.exp(kappa_log))
    # nullpopweights = 1 for every county; popweights = county population / mean county population
    omega = pm.math.switch(T.eq(m, 0), T.as_tensor(nullpopweights), T.as_tensor(popweights))
    kappa_w = omega * kappa
    alphaY = mu_clipped * kappa_w
    betaY = (1 - mu_clipped) * kappa_w
    yl = pm.Beta('yl', alpha=alphaY, beta=betaY, observed=yTurnout1)
    trace = pm.sample(2000, tune=2000, cores=4, nuts_kwargs={'target_accept': 0.95})
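(As a sanity check on the mean/"sample size" parameterisation above, where `alphaY = mu * kappa` and `betaY = (1 - mu) * kappa`: the implied Beta distribution has mean `mu` and variance `mu*(1-mu)/(kappa+1)`, so larger `kappa` concentrates the likelihood. A small standalone sketch with illustrative numbers, not values from the model:)

```python
from scipy import stats

mu, kappa = 0.6, 50.0                  # illustrative mean turnout and concentration
alpha, beta = mu * kappa, (1 - mu) * kappa

dist = stats.beta(alpha, beta)
mean = dist.mean()                     # equals mu: alpha/(alpha+beta) = mu*kappa/kappa
var = dist.var()                       # equals mu*(1-mu)/(kappa+1); shrinks as kappa grows
```

This is why multiplying `kappa` by the population weight `omega` makes populous counties' observations count more: it tightens the Beta likelihood around `mu` for those counties.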
Apologies if this is an inappropriate question, or if I've included too much or too little information (please let me know).