On Generalizations of Conclusions Across Different Data Sets

Sep 13, 2018

Everybody hates Congress.

While that is a bit of an exaggeration, you’d certainly be hard-pressed to find wide support for Congress these days. Gallup has been tracking the public’s approval of Congress since the 1970s. Its most recent poll, from August 2018, shows a meager 17% approval rating and a staggering 78% disapproval rating.¹ With the exception of the late 1990s and early 2000s, the general public’s opinion of Congress has been lukewarm at best and outright opposed at worst. One of the few things that a politically divided public can agree on is how it views congressional performance.

At the same time, most members of Congress who run again get re-elected. This incumbency advantage is an oft-repeated talking point in political analysis. Even for incumbents who aren’t very well-liked, being the current holder of an office can still be a boost. Sometimes it is a matter of name recognition, where a challenger is little-known or the incumbent is significantly more recognizable. Sometimes it is a matter of funding. Sometimes it is because the incumbent is well-connected. There are plenty of other potential reasons why they stay on.

Yet it still may seem strange (or downright frustrating, depending on your point of view) that so many members of Congress still retain their seats election after election despite such widespread disapproval. What gives?

Enter the work of Richard Fenno.

Fenno is a longtime political scholar and analyst, particularly well-regarded for his work on Congress.² His most famous work is arguably Home Style: House Members in Their Districts, for which he traveled with members of Congress between D.C. and their home districts. Fenno noted many differences between how these officeholders acted on Capitol Hill and how they acted at home with their constituents.

Perhaps his most notable observation is the disparity between two views: how the public sees the institution of Congress and how it sees its particular representative in it. The public tends to have a significantly higher opinion of the person who represents them than of Congress as a whole. Put another way, everybody hates Congress but loves their own member of Congress.

This observation has since become known as “Fenno’s Paradox.” We should keep in mind when he made it: according to the aforementioned Gallup statistics, the public at the time held a somewhat more favorable, though still poor, view of Congress than it does today. If anything, Fenno’s work is arguably even more applicable now given Congress’s abysmal rating. Every so often, Fenno’s Paradox crops back up in the news. Incumbents still often enjoy an advantage in elections despite greater dissatisfaction with Congress overall.

Fenno’s Paradox provides a great case study for the overarching theme of this article: issues in generalizing data across different levels or in different cases. Those who have had some experience in social science analysis will likely know about the difference between aggregate-level data and individual-level data. For those who haven’t, it is basically the idea that what is true for a group isn’t necessarily true of the individuals within that group, or of subgroups within the overall group.

Think about how it could be an issue to assume that the poor rating of Congress means poor ratings for most members of Congress individually. That runs completely counter to Fenno’s Paradox, as many members of Congress have high ratings (or at least ratings good enough to get them re-elected). Incumbents would not be getting re-elected at the rates that they do if they were usually viewed as poorly as the collective body of Congress. This would be an issue in generalization of data: that is, assuming that what is true for one set or one level of data is also true of another, or true to the same degree.
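To make the aggregate-versus-individual distinction concrete, here is a minimal Python sketch. Every number in it is invented purely for illustration; the districts, the respondents, and the `share` helper are hypothetical and do not come from Gallup or from Fenno’s work. It only shows that one set of respondents can rate the institution poorly while rating their own members well.

```python
# Hypothetical survey: every value below is invented for illustration.
# Each respondent rates Congress as a whole and their own representative.
respondents = [
    # (district, approves_of_congress, approves_of_own_member)
    ("A", False, True),
    ("A", False, True),
    ("A", True, False),
    ("B", False, True),
    ("B", False, False),
    ("B", False, True),
    ("C", True, True),
    ("C", False, True),
    ("C", False, True),
]


def share(flags):
    """Return the fraction of True values among the given booleans."""
    flags = list(flags)
    return sum(flags) / len(flags)


# Aggregate level: approval of the institution across all respondents.
congress_approval = share(r[1] for r in respondents)

# Individual level: approval of each district's own member.
districts = sorted({r[0] for r in respondents})
member_approval = {
    d: share(r[2] for r in respondents if r[0] == d) for d in districts
}

print(f"Congress overall: {congress_approval:.0%}")
for d, rate in member_approval.items():
    print(f"Member for district {d}: {rate:.0%}")
```

In this made-up sample, only two of nine respondents approve of Congress, yet a majority in every district approves of its own member. Reading the first number as if it described each member would get all three districts wrong.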

Similarly, applying an observation for one set of data to another set that may appear to be similar can be problematic. A good example to illustrate what I mean would be to compare primary and general elections. Turnout in many primaries this year has noticeably increased. Does that mean that general election turnout will increase too?

Not necessarily. It is certainly a possibility, but we would be getting ahead of ourselves if we assumed this was the case. Much of the media speculation, however, has entertained the possibility almost as if it will definitively happen. Primary turnout could have just increased because voters who normally only vote in general elections were engaged enough that they felt they should participate in the primary — but it won’t necessarily bring in new voters for the general election. It could also just be that a primary was particularly competitive. Maybe certain aspects about the candidates themselves led to more people feeling the need to go vote earlier on in the overall election cycle than they normally would. Or, as many pundits and political observers assume, maybe Trump really is driving more people to the polls than normal.

This is an example of assuming what is true for one set of data to be the same case for another — i.e., that higher primary turnout means higher general turnout. But we just don’t know for certain.

Another example for this discussion is something I touched on in my article about a potential blue wave: each individual race is unique, and many races don’t compare well to the national context. We may assume that a “blue wave” will happen, but at the end of the day, Democrats need to win specific races to make that a reality. Even if Democratic enthusiasm and participation increase nationally, that won’t have much of an effect on Congress if the added participation is diffused across too many districts. What happens at the national level doesn’t necessarily translate to the state or local level, because seats are won one district at a time. It’s almost guaranteed that Democrats will flip at least a few seats with an increase in Democratic turnout, but will it be widespread enough to take back control of one or both chambers of Congress? Maybe. Maybe not.
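The point about national gains and individual seats can be sketched with a toy calculation. The ten districts, their vote shares, and the three scenarios below are all hypothetical, and the sketch assumes equally sized districts and a simple two-way race, so treat it as an illustration of the logic rather than a forecast.

```python
# Hypothetical baseline: one party's vote share (in percentage points)
# in ten equally sized districts. Invented numbers, not real results.
baseline = [38, 40, 42, 44, 46, 47, 48, 49, 55, 60]


def seats_won(shares):
    """Count districts where the party clears 50 percent."""
    return sum(1 for s in shares if s > 50)


def apply_gains(shares, gains):
    """Add a per-district gain (in points) to each baseline share."""
    return [s + g for s, g in zip(shares, gains)]


# Three ways to distribute the same total gain of 30 points,
# which is an average (national) gain of 3 points per district.
scenarios = {
    "spread evenly": [3] * 10,
    "piled into seats already held": [0, 0, 0, 0, 0, 0, 0, 0, 15, 15],
    "concentrated in close districts": [0, 0, 9, 7, 5, 4, 3, 2, 0, 0],
}

print(f"baseline: {seats_won(baseline)} seats")
results = {}
for name, gains in scenarios.items():
    assert sum(gains) == 30  # same national gain in every scenario
    results[name] = seats_won(apply_gains(baseline, gains))
    print(f"{name}: {results[name]} seats")
```

The national gain is identical in all three scenarios, but the seat count is not: the result depends entirely on where the extra votes land. That is why a national turnout or enthusiasm figure, on its own, says little about control of a chamber.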

It is a similar case for specific issues. What is a prominent or salient issue at the national level won’t always be so at the state and local level. Let’s take gun control, for example. The Parkland shooting was the major impetus for pushing the gun control issue to the forefront of national politics. Is it going to be of the same magnitude for each district? It likely will be for the Florida state and congressional districts that contain Marjory Stoneman Douglas High School. It may be the same for the Colorado state and congressional districts that contain Columbine High School, or for any other districts with schools or universities that were the sites of mass shootings. But in some districts gun control won’t be considered even remotely as important, as other issues will take its place.

Generalizations of data can potentially lead to poor analysis by academics, poor strategy by political practitioners, and poor coverage by media. The academic may make incorrect assumptions about data and draw wrong conclusions that harm understanding of a given topic. The practitioner may make the wrong move in a race, one that costs their candidate the election or at least contributes to the loss. The media may feed the wrong information to viewers, readers, and listeners, with a negative impact on their understanding of what is going on in a given political context.

However, there will be some cases where generalizations across data sets won’t be entirely unwarranted. In some cases, it may be acceptable to an extent because it is the best we can do. A good general rule of thumb, though, is to avoid generalizations wherever possible, or at the very least include caveats when talking about them.

An example of what I am talking about here is turnout by independents versus Democrats and Republicans. I’ve mentioned in my article on the difficulties for independents in winning elections that independents generally turn out to vote at lower rates than voters affiliated with either major party. That is a very useful heuristic when studying politics or trying to win a race. Some caveats apply, such as the fact that independent turnout fluctuates from state to state. Still, knowing that independents are generally harder to get out to vote is inherently going to influence how you go about analysis or campaigning. The academic who is studying, say, the effects of advertising knows that there could very well be a major difference in how it affects partisan voters versus independent voters, since the latter tend to participate in politics less. The practitioner who wants to win a race for, say, an independent candidate knows that they will likely have to rely on “soft” Republicans and/or Democrats to offset the disadvantage in independent turnout.

Generalizing data isn’t always going to be an issue, but it has been in many cases. Sometimes it will be the best we can do due to limits on time or resources for further research. Yet knowing the potential pitfalls that come with it makes us less susceptible to its negative effects on how we understand a given topic.

¹ Gallup. “Congress and the Public.” https://news.gallup.com/poll/1600/congress-public.aspx
² A list of some of his works, along with a brief biography, is available from the following source: National Archives. “Oral Histories and Interviews: Fenno Biographical Note.” https://www.archives.gov/legislative/research/special-collections/oral-history/fenno/biographical-note.html

Written by Paul Rader