J. Oliver's Statistics: April 2010

Based on polling data released over the last four months, Republicans are expected to gain between 25 and 60 seats in the U.S. House of Representatives (a 90 percent credible interval). According to the model, the most likely outcome is a Republican gain of about 45 seats. Such a gain would be sufficient to unseat the Democratic Party from majority status in the lower chamber. It would also represent the largest electoral landslide since 1994.

A statistical model that analyzes the results of 15 independent polling firms indicates Republicans lead Democrats by a margin of 1.8 percent on the so-called generic ballot--a measure of national voter preference between the two major congressional caucuses. This figure is little changed from the March estimate of 2.0 percent. The March figure, however, has been revised downward from its preliminary estimate of 6.8 percent as a result of a technical correction in the data, along with the incorporation of post-health care polling. The April estimate implies that, if the election were held today, Republicans would achieve a 13-seat majority over Democrats--a very modest advantage.

Over the next seven months, I will update this model and estimate Republican electoral gains. The charts below will provide the change in voter preferences over time, along with the probability distribution for the number of seats gained by Republicans on election day. I will also post an explanation of my methodology below.

Methodology:

I use a mixture of beta-binomial models. Let Xi be the percentage of the national electorate supporting the Republicans according to pollster i. Let theta be my parameter of interest (i.e., the percentage of voters who prefer the Republican Party). Then {theta | X1, X2, ..., Xm} ~ 1/m*{X1 + X2 + ... + Xm}. In other words, I calculate the probability distribution of theta by simply taking the unweighted mean of all of the pollsters' distributions (an equal-weight mixture of the distributions, not the distribution of the average of the Xi). One advantage of this method is that each pollster's belief is given equal weight, regardless of the number of polls that they have conducted. This characteristic prevents frequent pollsters (who often use internet or automated polling) from being disproportionately influential on the results.
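As a rough illustration of the equal-weight mixture (a sketch, not the code behind these estimates), one can draw from it by first picking a pollster uniformly at random and then drawing from that pollster's beta distribution. The function name and the three (a, b) pairs below are hypothetical and are not taken from the polling data.

```python
import random

def sample_mixture(pollster_posteriors, n_draws=10000, seed=0):
    """Draw from the equal-weight mixture of per-pollster beta distributions."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Each pollster is chosen with probability 1/m, regardless of
        # how many polls stand behind its (a, b) parameters.
        a, b = rng.choice(pollster_posteriors)
        draws.append(rng.betavariate(a, b))
    return draws

# Hypothetical beta parameters (a, b) for three pollsters.
posteriors = [(520.0, 480.0), (51.0, 49.0), (260.0, 250.0)]
draws = sample_mixture(posteriors)
mixture_mean = sum(draws) / len(draws)
print(round(mixture_mean, 3))
```

Note that the second pollster, with far fewer respondents behind it, contributes as much probability mass as the first; it simply contributes a wider component.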

How do I find the distribution of each Xi? Ah, now we have arrived at the interesting part. For this step, I use the beta-binomial model from Bayesian statistics. That is, Xi | y[j] ~ beta(a[j], b[j])*binomial(y[j] | p, n[j]), which is proportional to beta(a[j] + y[j], b[j] + n[j] - y[j]), where a[j] = a[j-1] + y[j-1] and b[j] = b[j-1] + n[j-1] - y[j-1]. Here y[j] is the number of people in the sample of poll j by pollster i who support the Republicans, n[j] is the sample size of poll j, and p is the support share being estimated. The polls are ordered chronologically so that the prior (beta(a[j], b[j])) is the posterior produced by the preceding poll. By using this recursive formula, we will have obtained the posterior distribution of Xi when we have iterated through to the most recent poll.
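The recursion can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name and the two example polls are hypothetical, the staleness weighting is left out, and the default prior is the beta(7.2, 6.6) given in this post.

```python
def update_pollster(polls, a0=7.2, b0=6.6):
    """Run the beta-binomial recursion over one pollster's polls.

    polls: chronologically ordered (y, n) pairs, where y is the number of
    respondents supporting the Republicans and n is the sample size.
    Returns the (a, b) parameters of the final beta posterior.
    """
    a, b = a0, b0
    for y, n in polls:
        # The posterior after this poll becomes the prior for the next one.
        a, b = a + y, b + (n - y)
    return a, b

# Two hypothetical polls of 1,000 respondents each.
a, b = update_pollster([(480, 1000), (510, 1000)])
posterior_mean = a / (a + b)
print(a, b, round(posterior_mean, 4))
```

Because the update only adds counts, the final posterior does not depend on the order of the polls when no staleness weights are applied; the chronological ordering matters only for reading the intermediate posteriors.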

The initial prior for all posteriors of {X1...Xm} is beta(7.2, 6.6), which corresponds roughly to the results of the 1994 election (though note that the weight on this prior is very low). Furthermore, I normalize the y[j]s and n[j]s by a vector of constants in order to account for the staleness of each poll, i.e. the more recent polls receive the most weight. I apply this Bayesian inference to the following 15 pollsters:

Rasmussen (14 polls since January 1, 2010)

CNN (5)

Quinnipiac (1)

Democracy Corps (5)

YouGov (12)

Gallup (8)

PPP (4)

National Review (2)

Ipsos (1)

Newsweek (1)

Public Opinion Strategies (2)

Pew (2)

ABC/Washington Post (3)

Franklin & Marshall (1)

NPR (1)

I estimate the number of seats held by Republicans after the election by using the following formula: # of seats = theta*435 + 5. Thus, the number of seats is roughly proportional to the level of national support for the Republican caucus. However, I do add five seats to the estimate in order to compensate for structural advantages that Republicans have in the Congressional elections.
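For concreteness, the seat formula can be written as a one-line function. The function and parameter names are hypothetical, and theta is taken to be a fraction between 0 and 1 rather than a percentage.

```python
def projected_seats(theta, total_seats=435, structural_bonus=5):
    """Seats held by Republicans: theta * 435 + 5, per the formula above."""
    return theta * total_seats + structural_bonus

# Example: an even split of national support (theta = 0.5).
even_split = projected_seats(0.5)
print(even_split)  # 222.5

# Each additional point of national support is worth 4.35 seats.
per_point = projected_seats(0.51) - projected_seats(0.50)
print(round(per_point, 2))  # 4.35
```

Applying the same function to every draw of theta turns the distribution of theta into a distribution over seats.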
______________________________________________
NB: I have adjusted one element of my model since March: the weight that is applied to each poll. Previously, I normalized the sample size of each poll according to the date on which the data was retrieved, so that the most recent polls had the most influence. I still give the most recent polls more weight. However, the precise algorithm for determining the weights has changed. My previous strategy was to give the most recent polls a weight of one (and older polls a weight in [0, 1]). Now, I give all polls a weight of (1/2)^M, where M is the number of months between the collection date of the poll and the day of the election.
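A minimal sketch of the new weighting follows. Several details are assumptions rather than part of the model description: M is treated as a fractional number of months, a month is taken to be 30.44 days, and the helper names are hypothetical. Election day in 2010 was November 2.

```python
from datetime import date

ELECTION_DAY = date(2010, 11, 2)
DAYS_PER_MONTH = 30.44  # assumption: average month length

def months_before_election(poll_date, election_day=ELECTION_DAY):
    """Fractional months between a poll's collection date and election day."""
    return (election_day - poll_date).days / DAYS_PER_MONTH

def poll_weight(poll_date, election_day=ELECTION_DAY):
    """Weight of (1/2)^M, where M is the months remaining until the election."""
    return 0.5 ** months_before_election(poll_date, election_day)

def discounted_counts(y, n, poll_date, election_day=ELECTION_DAY):
    """Scale a poll's supporter count y and sample size n by its weight."""
    w = poll_weight(poll_date, election_day)
    return w * y, w * n

# Hypothetical poll: 480 of 1,000 respondents, collected April 15, 2010.
wy, wn = discounted_counts(480, 1000, date(2010, 4, 15))
print(round(poll_weight(date(2010, 4, 15)), 4), round(wy / wn, 2))
```

Scaling y and n by the same weight leaves the poll's observed share y/n unchanged; only its effective sample size shrinks.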

Why the change? Well, my ultimate objective is to produce a probability distribution for the number of seats gained by Republicans on election day. However, my former approach really created a probability distribution for the number of seats gained by Republicans if the election were held today. The mode (or expectation) of the distribution for the number of Republican acquisitions was correct, but the distributional precision was too high because I gave the most recent polls a full weight--even though the most recent polls are still months removed from the election. By weighting the sample sizes on the basis of the amount of time remaining before the election, I can provide a distribution that has a much larger and more plausible variance. In other words, the discrepancy between the spreads of the March distribution and April distribution is entirely due to this modification in my methodology.

I should finally mention that this alteration, nevertheless, has little to no effect on the central tendency of my predictions because it does not change the relative magnitudes of the weights. (The new algorithm also has the nice property that the weight for a given poll does not change from month to month, since the magnitude is based on the date of the election rather than the post date of my projections.)
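The effect on spread versus central tendency can be checked with a small worked example. This is a simplified sketch: it applies one common scale factor to every poll instead of date-specific weights, and the three polls and the helper names are hypothetical.

```python
def beta_mean_sd(a, b):
    """Mean and standard deviation of a beta(a, b) distribution."""
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, variance ** 0.5

def scaled_posterior(polls, scale, a0=7.2, b0=6.6):
    """Beta posterior after scaling every poll's counts by the same factor."""
    a, b = a0, b0
    for y, n in polls:
        a += scale * y
        b += scale * (n - y)
    return a, b

# Three hypothetical polls of 1,000 respondents each.
polls = [(509, 1000), (505, 1000), (512, 1000)]

full_mean, full_sd = beta_mean_sd(*scaled_posterior(polls, 1.0))
# Same polls, all discounted as if collected seven months before the election.
disc_mean, disc_sd = beta_mean_sd(*scaled_posterior(polls, 0.5 ** 7))

print(round(full_mean, 3), round(disc_mean, 3))  # means stay close
print(disc_sd > full_sd)                         # spread widens
```

The small shift in the mean comes from the prior, which carries relatively more influence once the polls are discounted; the standard deviation, by contrast, grows several times over.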

Friday, April 30, 2010

Revised Model Predicts 45 Seat Gain for Republicans