A statistical model that analyzes the results of 15 independent polling firms indicates Republicans lead Democrats by a margin of 6.8 percentage points on the so-called generic ballot--a measure of national voter preference between the two major congressional caucuses. The model suggests that, if the election were held today, Republicans would achieve a 35-seat majority over Democrats--a slim but decisive majority.
Over the next eight months, I will update this model and estimate Republican electoral gains. The charts below will provide the change in voter preferences over time, along with the current probability distribution for Republican seats gained. I will also post an explanation of my methodology below.
Methodology:
I use a mixture of beta-binomial models. Let Xi be the percentage of the national electorate supporting the Republicans according to pollster i. Let theta be my parameter of interest (i.e. the percentage of voters who prefer the Republican Party). Then {theta | X1, X2, ... Xm} ~ 1/m*{X1 + X2 + ... + Xm}. In other words, I calculate the probability distribution of theta by simply taking the unweighted mean of all of the pollsters' distributions. One advantage of this method is that each pollster's belief is given equal weight, regardless of the number of polls that they have conducted. This characteristic prevents frequent pollsters (who often use internet or automated polling) from having a disproportionate influence on the results.
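The equal-weight mixture described above can be sketched in Python. This is only an illustration, not the code behind the model: it assumes each pollster's distribution is a beta distribution summarized by an (a, b) pair, and the function names and example numbers are hypothetical.

```python
import math

def beta_pdf(x, a, b):
    """Density of a beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def mixture_pdf(x, pollsters):
    """Unweighted mean of the pollsters' beta densities at x."""
    return sum(beta_pdf(x, a, b) for a, b in pollsters) / len(pollsters)

def mixture_mean(pollsters):
    """Mean of the equal-weight mixture: the average of the beta means."""
    return sum(a / (a + b) for a, b in pollsters) / len(pollsters)

# Hypothetical (a, b) posteriors for three pollsters; not real poll data.
example_pollsters = [(520.0, 480.0), (260.0, 240.0), (1100.0, 900.0)]
```

Note that the third pollster's posterior is built from twice as many respondents as the first, yet each contributes exactly one third of the mixture.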
How do I find the distribution of each Xi? Ah, now we have arrived at the interesting part. For this step, I use the beta-binomial model from Bayesian statistics. That is, the posterior for Xi given y[j] is proportional to beta(a[j], b[j])*binomial(y[j] | p, n[j]), which works out to beta(a[j] + y[j], b[j] + n[j] - y[j]), where a[j] = a[j-1] + y[j-1], b[j] = b[j-1] + n[j-1] - y[j-1]. Here y[j] is the number of people in the sample of poll j by pollster i that support the Republicans, n[j] is the sample size of poll j, and p is the binomial success probability, which is the quantity Xi being estimated. The polls are ordered chronologically so that the prior (beta(a[j], b[j])) is the posterior produced by the preceding poll. By using this recursive formula, we will have obtained the posterior distribution of Xi when we have iterated through to the most recent poll.
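The recursion above amounts to adding each poll's supporters to a and its non-supporters to b. A minimal sketch, assuming polls arrive as (y, n) pairs in chronological order; the function name is hypothetical, and the default prior is the beta(7.2, 6.6) used in this post:

```python
def update_pollster(polls, a0=7.2, b0=6.6):
    """Fold one pollster's polls, oldest first, into a beta posterior.

    polls: list of (y, n) pairs, where y is the number of respondents
    supporting the Republicans and n is the sample size.
    Returns the (a, b) parameters of the final beta posterior.
    """
    a, b = a0, b0
    for y, n in polls:
        # The posterior after this poll becomes the prior for the next one.
        a, b = a + y, b + (n - y)
    return a, b

# Hypothetical example: two polls of 1,000 respondents each.
a, b = update_pollster([(450, 1000), (520, 1000)])
posterior_mean = a / (a + b)
```

Because the updates are simple sums, the final posterior without any staleness adjustment does not depend on the order of the polls; the ordering only matters once older polls are down-weighted.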
The initial prior for each of X1, ..., Xm is beta(7.2, 6.6), which corresponds roughly to the results of the 1994 election (though note that the weight on this prior is very low). Furthermore, I scale the y[j]s and n[j]s by a vector of constants in order to account for the staleness of each poll, i.e. more recent polls receive more weight. I apply this Bayesian inference to the following 15 pollsters:
Rasmussen (11 polls since January 1, 2010)
CNN (3)
Quinnipiac (1)
Democracy Corps (4)
YouGov (6)
Gallup (3)
PPP (3)
National Review (2)
Ipsos (1)
Newsweek (1)
Public Opinion Strategies (1)
Pew (1)
ABC/Washington Post (1)
Franklin & Marshall (1)
NPR (1)
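The staleness adjustment mentioned before the pollster list can be sketched as follows. The actual vector of constants is not given in this post, so the exponential decay and the 30-day half-life below are purely hypothetical stand-ins, as are the function names.

```python
def staleness_weight(age_days, half_life_days=30.0):
    """Hypothetical decay: a poll half_life_days old counts half as much."""
    return 0.5 ** (age_days / half_life_days)

def weighted_update(polls, a0=7.2, b0=6.6, half_life_days=30.0):
    """Beta update in which each poll's counts are scaled by its staleness weight.

    polls: list of (y, n, age_days) tuples, oldest first, where y is the
    number of Republican supporters, n the sample size, and age_days the
    age of the poll in days.
    """
    a, b = a0, b0
    for y, n, age_days in polls:
        w = staleness_weight(age_days, half_life_days)
        a += w * y          # scaled supporters
        b += w * (n - y)    # scaled non-supporters
    return a, b

# Hypothetical example: a 30-day-old poll and a poll taken today.
a, b = weighted_update([(400, 1000, 30.0), (500, 1000, 0.0)])
```

Scaling y and n by the same constant leaves a poll's observed share unchanged but shrinks its effective sample size, which is how an older poll ends up with less influence on the posterior.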
I estimate the number of seats held by Republicans after the election by using the following formula: # of seats = theta*435 + 5, where theta is expressed as a fraction between 0 and 1. Thus, the number of seats is roughly proportional to the level of national support for the Republican caucus. However, I do add five seats to the estimate in order to compensate for structural advantages that Republicans have in Congressional elections.
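As a worked illustration of the seat formula, here is a short Python sketch. It assumes theta is a fraction between 0 and 1 and, for the margin, that every seat not won by Republicans goes to Democrats; the names are hypothetical.

```python
TOTAL_SEATS = 435        # seats in the House of Representatives
STRUCTURAL_BONUS = 5     # seats added for Republican structural advantages

def projected_seats(theta):
    """Projected Republican seats for a national support share theta (0 to 1)."""
    return theta * TOTAL_SEATS + STRUCTURAL_BONUS

def seat_margin(theta):
    """Republican seats minus all remaining seats (assumed Democratic)."""
    republican = projected_seats(theta)
    return republican - (TOTAL_SEATS - republican)

# With support split evenly, the five-seat adjustment alone gives a 10-seat margin.
even_split_seats = projected_seats(0.5)   # 222.5
even_split_margin = seat_margin(0.5)      # 10.0
```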