Polling: The Basics, Why We Do It, and the Challenges Facing It Today
The 2016 election shocked many people across the country as Donald Trump defeated Hillary Clinton to become the president of the United States. Some of this shock wasn’t due to partisan or ideological allegiances but to polls suggesting that Clinton would be the victor. Political scientists and pundits began searching for reasons why this happened, and countless outlets discussed the results. Although multiple explanations have been posited, the general consensus is that the polls underrepresented white voters with lower levels of education, some of whom may have been “shy” about their support for Trump.
It’s hardly the first time political polls turned out to be off base. Other recent polling missteps include the United Kingdom’s 2015 general election and the Israeli elections that same year. Critics once again faulted the polling industry, and in the past some have even called for an end to polling altogether.
So, should we simply just disregard polling? Should we routinely assume that polling is deeply flawed?
The short answer: Absolutely not.
Certainly, polling has its shortcomings, which will also be discussed below. Yet there is also a widespread lack of understanding of how polls work and why we conduct them in the first place, especially among those most antagonistic toward the polling industry as a whole. Today, we’re going to cover the basics of polling, major points in polling history, and current issues facing polling (although not comprehensively, as that would be far too lengthy a discussion for this post).
The Basics of Polling and Why We Do It
So, why do we even poll? Why not just ask everybody in whatever population you are trying to understand? Such an undertaking is impossible: there are far too many people to contact, not enough time to reach them, and only so much money to spend. It would also be exhausting for the people doing the polling.
Therefore, we have to take a sample to gauge the population. Over the years, polling has been refined to make it ever more scientific and as accurate as possible. Here are some of the basic characteristics to know.
The basic types of polling: Different kinds of polling include phone surveys, mail-in surveys, and Internet polling. Each brings its own advantages and disadvantages in terms of cost, response rates, and representativeness of samples.
Random selection: While there are a few different ways to do this, you ideally seek a randomly selected sample, where everyone in the population you are studying has an equal chance of being selected. This limits bias in the poll since you are not deliberately picking and choosing people for your survey.
Sample size: The more people you have in your sample, the better. Increasing the sample size shrinks the sampling error, so you are likely to get a more accurate picture. What counts as a good sample size is a bit of a gray area and partly depends on the population in question, but for most big surveys you see in the news (e.g. a U.S. Senate race), 1,000 responses is a common benchmark.
Representative samples: Samples should reflect the population in question as much as possible. If a demographic category is overrepresented or underrepresented, polling data could be skewed. Demographic categories such as race, gender, and income often differ in opinions on candidates or political issues, and those differences can be underestimated or overestimated if the poll is not close to representative.
Margin of error: You will often see this statistic get the spotlight when polling news comes out, yet little context is given and its meaning isn’t discussed. The margin of error means that any number reported in the poll could plausibly be off in either direction by that amount. For example, if Candidate A has 46% support in a poll with a margin of error of 3 percentage points, Candidate A’s support is highly likely to be somewhere in the 43%–49% range.
Why do I say “highly likely”? The margin of error is closely tied to another concept, the confidence interval. Without bogging this down in statistics too much: because we cannot observe the entire population, a sample will inevitably carry some error no matter how big it is. The usual confidence level is 95%, though you will sometimes see 99% (other levels are rare). Taking our previous example, at a 95% confidence level we are saying we are 95% confident that Candidate A’s true support falls between 43% and 49%. It is highly unlikely to fall outside that range, but there is a very small chance it will.
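To make the arithmetic concrete, here is a minimal sketch of how a margin of error is computed for a sample proportion, using the Candidate A numbers above; the 1.96 multiplier corresponds to the 95% confidence level.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Margin of error for a sample proportion p with sample size n.
    z = 1.96 corresponds to a 95% confidence level (use 2.576 for 99%)."""
    return z * math.sqrt(p * (1 - p) / n)

# Candidate A polls at 46% in a survey of 1,000 respondents:
moe = margin_of_error(0.46, 1000)
print(f"margin of error: +/- {moe:.1%}")                      # +/- 3.1%
print(f"95% interval: {0.46 - moe:.1%} to {0.46 + moe:.1%}")  # 42.9% to 49.1%
```

This also hints at why 1,000 responses is such a common benchmark: the error shrinks only with the square root of the sample size, so pushing the margin much below 3 points requires disproportionately larger (and more expensive) samples.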
Uses of Polling Data
Why polling data is important depends on who is doing the polling. Organizations such as Pew and Gallup do it expressly for understanding public opinion. They do not have political motivations, as the work is done purely for research purposes.
Campaigns make extensive use of polls as well, and obviously they do have political motivations. Candidates for public office need to have as much information as they possibly can to know where they stand in a race. If they are behind in the polls, then a change in campaign strategy is likely necessary. Sometimes those polls will be released for the public strategically if it benefits the campaign.
Yet it’s mostly the well-funded campaigns, typically statewide and federal-level ones, that conduct such polling. Why? Because polling gets expensive quickly. There are high-quality organizations expressly in the business of polling that pull in considerable sums, a reflection of how vital polling is and how hard it is to conduct a truly well-done poll.
Major Missteps in Polling History
As odd as it might sound to say about something beginning in the early 20th century, polling isn’t that old of an industry. It has also been significantly refined over time, sometimes in response to major errors committed. A couple of such instances follow, along with some of the aftermath.
The 1936 Presidential Election
In the early 20th century, Literary Digest was an incredibly popular publication. It was a weekly magazine covering a wide variety of topics with news and opinion articles. Millions of Americans were subscribed.
It was also in the business of polling. Starting in 1916, Literary Digest had correctly predicted the presidential winner and had been close to predicting the margin of victory several times. So when 1936 rolled around, many readers expected it to be right again. Literary Digest predicted a victory for Alfred Landon, the challenger to then-incumbent President Franklin D. Roosevelt.
Except that’s not what happened. FDR soundly beat Landon 523 to 8 in electoral votes and 60.8% to 36.5% in the popular vote.¹ The significant error dealt a major blow to Literary Digest’s reputation.
How did this happen? For those who have played organized sports, here is an analogy. Say you are in film study following weeks of wins in which your team did not play as well as it could have. Regardless of what your coach says, the team doesn’t care all that much because, well, it won. Then the next opponent on the schedule dominates your team because it never took correcting those mistakes seriously; the sloppiness caught up with it in the end. Something similar happened to Literary Digest.
So what was this critical error on Literary Digest’s part? It was their sampling technique. The publication used phone and car records to sample responses, which at the time were very skewed toward the upper class. The upper class was very disapproving of FDR and his New Deal policies, which may raise the question of why they didn’t feel this way when FDR ran for his first term.

The answer to that is that pretty much everyone hated then-incumbent Herbert Hoover, whom many blamed for the Great Depression. Prior to the Great Depression, America had experienced a great deal of prosperity during the “Roaring ‘20s,” when many people across economic classes approved of the direction of the country. This helped mask the mistakes Literary Digest had been making in sampling, until the economic despair of the Great Depression changed that.
Yet not everyone was blindsided by the mistakes made by Literary Digest. As its reputation was tarnished and it was on its way out, newcomers to the polling game used their own methods to come up with more accurate polling. One such individual was George Gallup, the namesake of the Gallup public opinion research firm. Gallup was both able to accurately predict the election and point out the fatal flaws in Literary Digest’s methods. His success propelled him to the forefront of public opinion research.
“Dewey Defeats Truman”
But Gallup and others now on the polling scene, such as Elmo Roper and Archibald Crossley, would commit their own egregious polling errors in the 1948 election. Polling indicated that Thomas E. Dewey was a heavy favorite to defeat incumbent Harry S. Truman, who had taken over when FDR died shortly into his fourth term. At the time, information did not circulate nearly as fast as it does today. Newspapers like the Chicago Tribune assumed Dewey would win and had already prepared their post-election headlines accordingly.
Once again, polling turned out to be quite wrong. Election Day rolled around and produced a much different result than anticipated, with Truman winning the Electoral College 303 to 189 and the popular vote 49.5% to 45.1% (the third-party candidate, pro-segregationist Strom Thurmond, won 39 electoral votes and 2.4% of the popular vote).²
This led to the iconic photo of an elated Truman mockingly holding up the Chicago Tribune’s post-Election Day paper.
So what was the issue this time? Polling had ceased three weeks prior to the election. As Gallup said, “We had been lulled into thinking that not much changes in the last few weeks of the campaign.”³ In the window after polling stopped, a significant shift in voter preference went undetected.
While that was perhaps the biggest issue, there were two other major problems: the treatment of undecided voters and quota sampling. Pollsters treated undecided voters who leaned one way as if they were definitely going to vote that way, and they disregarded those who were more definitively undecided.⁴ Quota sampling involves filling a quota of a particular demographic in your sample (e.g. needing 500 female voters), but the respondents who fill those quotas are not randomly selected.
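To illustrate the difference, here is a toy sketch with an entirely invented population. Random sampling gives every voter an equal chance of selection; quota sampling fills demographic targets but leaves the choice of individuals to the interviewer, modeled here crudely by taking whoever comes first in the list.

```python
import random

# Invented population: 1,000 voters, each with a gender and a preference.
population = (
    [{"gender": "F", "vote": "Dewey"}] * 300 +
    [{"gender": "F", "vote": "Truman"}] * 250 +
    [{"gender": "M", "vote": "Dewey"}] * 200 +
    [{"gender": "M", "vote": "Truman"}] * 250
)
random.seed(0)

def truman_share(sample):
    return sum(1 for p in sample if p["vote"] == "Truman") / len(sample)

# Random sampling: every voter has an equal chance of being picked.
random_sample = random.sample(population, 200)

# Quota sampling: fill quotas of 100 women and 100 men, but let
# "whoever is easiest to reach" (here: list order) fill them.
women = [p for p in population if p["gender"] == "F"]
men = [p for p in population if p["gender"] == "M"]
quota_sample = women[:100] + men[:100]

print(f"true Truman share: {truman_share(population):.0%}")    # 50%
print(f"random sample:     {truman_share(random_sample):.0%}")
print(f"quota sample:      {truman_share(quota_sample):.0%}")  # badly biased
```

The quota sample hits its gender targets exactly, yet completely misses Truman’s support, because who gets interviewed within each quota is not random. That hidden discretion is what random sampling removed.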
As great a debacle as it was, the silver lining was that the 1948 election laid the foundation for modern polling techniques. Quota sampling was replaced by random sampling as the standard, undecided voters were accounted for much more carefully, and polling now continues right up until Election Day, to name a few of the changes.
Polling started to truly come into its own, becoming an ever-bigger industry and an indispensable part of politics. It has had its ups and downs, but each time it has found a way to improve and refine itself.
The Challenges Facing Polling Today
But now there are some relatively new problems facing polling (along with some old ones).
One of the biggest problems is the very low response rate for traditional types of polling, namely phone surveys. The response rate is the share of people contacted who actually respond. It is not unheard of for only 10% or 20% of those contacted to respond, which can hurt the representativeness of the sample. And since more people must be contacted to assemble a large enough sample, polling gets even more expensive: pollsters are paid for their time and for how many people they contact.
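A quick back-of-the-envelope sketch shows why falling response rates drive up costs (the target and rates here are purely illustrative):

```python
import math

def contacts_needed(target_completes, response_rate):
    """People who must be contacted to reach a target number of
    completed interviews, given a response rate."""
    return math.ceil(target_completes / response_rate)

# Reaching 1,000 completed interviews at various response rates:
for rate in (0.60, 0.20, 0.10):
    print(f"response rate {rate:.0%}: contact ~{contacts_needed(1000, rate):,} people")
```

At a 10% response rate, a pollster must reach roughly ten times as many people as the sample size they end up with, and nearly all of that outreach is paid labor that produces no usable interview.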
It may not seem like a big deal at first glance, but the demographic difference between cell phone users and those who still use landlines is another problem. The two groups tend to differ starkly, and landline usage keeps decreasing. Pollsters have to account for this by reaching a certain number of both landline and cell phone users, keeping in mind how different the two populations are.
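One common remedy is to weight responses so each group counts in proportion to its real-world share. The population shares and support numbers below are invented for illustration; this is only a sketch of the general idea, not any pollster's actual procedure.

```python
# Invented numbers: suppose 70% of adults are cell-only and 30% use
# landlines, but the raw sample skews heavily toward landline respondents.
population_share = {"cell": 0.70, "landline": 0.30}
sample_counts = {"cell": 400, "landline": 600}
n = sum(sample_counts.values())

# Weight = population share / sample share, so each group counts
# in proportion to its real-world size.
weights = {g: population_share[g] / (sample_counts[g] / n) for g in sample_counts}

# Hypothetical candidate support within each group:
support = {"cell": 0.55, "landline": 0.45}

raw = sum(support[g] * sample_counts[g] for g in sample_counts) / n
weighted = sum(support[g] * weights[g] * sample_counts[g] for g in sample_counts) / n

print(f"unweighted estimate: {raw:.0%}")      # overstates landline views: 49%
print(f"weighted estimate:   {weighted:.0%}") # matches population mix: 52%
```

The unweighted number quietly assumes the sample mirrors the population; weighting corrects the phone-type imbalance, though it can only fix imbalances the pollster knows to measure.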
There has been an increasing number of Internet polls, many of which have been accurate. But this isn’t always the case, and representativeness issues are potentially a bigger problem online. Although the Internet has become seemingly ubiquitous, there are still people who don’t use it or don’t know how to use it well, such as some of the elderly. This can skew Internet polls toward younger respondents, who tend to vote at lower rates than older people. There is also the issue of self-selection: in most cases people choose to participate rather than being contacted directly by a pollster.
Some issues facing polling, however, aren’t because of the polls themselves, but their coverage. The media often gives only a cursory look at polls, which can lead to premature conclusions about the state of a race. To some extent, this is understandable. The media needs to make money, and to do so it needs viewers, listeners, and readers. Getting bogged down in the details is likely to turn away many people who won’t care about and/or understand the really nitty-gritty aspects of polling. It isn’t necessary to go seriously in-depth, but at least signaling that there is more to the story than the surface-level details would be a step in the right direction.
I will readily admit this next point is pure opinion: I believe this surface-level coverage partially contributes to the antagonism toward polling. When only a few numbers, especially sensationalized ones, get discussed and things turn out very differently, there will be little understanding of why the polls were “so wrong.” That has often led to widespread criticism of polling and even calls to get rid of it, or at least its coverage, entirely.
The media sometimes also has an issue with covering terrible or outright fake polls. A FiveThirtyEight article tells the story of one such fake poll, in which a fake company made up numbers showing that Kid Rock would defeat incumbent U.S. Senator Debbie Stabenow of Michigan if the election were held that day.⁵ The article goes into much more detail, but such a “finding” is bound to be picked up and circulated quickly if someone does not do their homework. Knowing who conducted a poll is a very important part of understanding it.
If history is any indication, polling will find a way to overcome the issues it faces today. Pollsters are already employing their own ways to refine the profession. Time will tell if their efforts are successful.
1. Gerhard Peters and John T. Woolley. The American Presidency Project. “1936 Presidential Election.” http://www.presidency.ucsb.edu/showelection.php?year=1936 (accessed September 26, 2018).
2. Gerhard Peters and John T. Woolley. The American Presidency Project. “1948 Presidential Election.” http://www.presidency.ucsb.edu/showelection.php?year=1948 (accessed September 26, 2018).
3. Will Lester. LA Times (via Associated Press). http://articles.latimes.com/1998/nov/01/news/mn-38174 (accessed September 26, 2018).
4. Hans L. Zetterberg. 2004. “US Election 1948: The First Great Controversy about Polls, Media, and Social Science.” http://www.zetterberg.org/Lectures/l041115.htm (accessed September 27, 2018).
5. Harry Enten. 2017. “Fake Polls Are A Real Problem.” https://fivethirtyeight.com/features/fake-polls-are-a-real-problem/ (accessed September 27, 2018).