Interpreting the Numbers. (under construction)
Statistical Fallacies. Arguments that draw conclusions from numerical evidence are subject to a variety of statistical fallacies. Many of these are variants of the general argumentation fallacy of mistaking correlation for causation. Among the most common fallacies are:
In the U.S., states with the highest rates of political corruption tend to have low rates of voter turnout (note: check the validity of the measure of political corruption). Does the low voter turnout foster greater corruption or vice versa? Similarly, countries with corrupt regimes tend to have higher poverty rates, but there is considerable debate over what causes what.
In the case of the relationship at left, the city of Chicago has a lower pass rate on the 8th grade math test than the state as a whole (note: Chicago is included in the state totals), but for every demographic group of students, Chicago outperforms the rest of the state. Here, Gerald Bracey describes a Simpson's paradox involving SAT scores.
Cherry picking. Biased selection of social indicator data to support preconceived ideas is probably the most common statistical fallacy in public debate, a phenomenon fostered by the increasing availability of alternative social indicator measurements, especially annual time series data that permit the researcher to choose beginning and ending points for comparison. One brazen example of cherry picking was talk show host Bill O’Reilly’s attempt to argue that the Bush administration deserved credit for lowering the poverty rate. “The only fair comparison” of poverty rates, O’Reilly insisted, “is halfway through Clinton's term, halfway through Bush's term” (Media Matters, 2005). O’Reilly was correct, sort of, but failed to see that a measure of the change in the poverty rate over the first four years of the presidents’ terms would tell an entirely different story.
Edward Tufte is at his best in condemning the impact of cherry picking and related corruptions on our intellectual life. Cherry picking, he says, is the “most serious threat to learning the truth from an evidence-based report” (2006, 144). He cites evidence of a disproportionately high ratio of published studies reporting relationships that are just barely statistically significant to studies reporting relationships that are just barely insignificant. Most likely this is caused by researchers who fiddle with their regression equations until they get the results they want.
For the most part cherry picking is an unintentional process: politicians and policy advocates readily embrace data that confirm their preconceived ideas and rigorously scrutinize evidence that does not fit with their view of how the world works.
Throughout the 1970s and 80s, the FBI measure of violent crime, based on police reports of violent crime, increased, while the National Crime Victimization Survey, based on an annual survey of personal crime victimization, indicated that the rate was falling. The FBI measure is generally regarded as less reliable, however, as reporting of crimes to police has increased and police departments have improved their record keeping. Oddly, it is because the FBI data have become more reliable over time that time series analysis of the data has become suspect. Instrumentation can also affect conclusions drawn from cross-sectional numerical comparisons. The high U.S. infant mortality rates, often cited as a product of the lack of universal health insurance, may also be at least partly due to the way the U.S. counts live births. Sampling error statistics and tests of statistical significance measure one form of measurement unreliability: the probability of error in the measurement of a single statistic and the likelihood that a numerical difference is due to sample size. For the most part, however, error due to small sample size is the least consequential aspect of measurement error. Because most social indicator analysis does not include analysis of sampling error or statistical significance, two common statistical fallacies are avoided: concluding from a significant relationship that a meaningful relationship exists and concluding from an insignificant relationship that no relationship exists.
In 2006, Chicago public school officials trumpeted the apparent gain in student scores on the annual ISAT tests mandated by the No Child Left Behind law. The state pass rate on the exams had increased 8 points – most probably because of revisions to the test and changes in the passing score for the 8th grade math tests – while the Chicago pass rate increased 14 percentage points. But is an increase from a 48 to a 62% pass rate a bigger change than an increase from 69 to 77%? Consider a more extreme example: would a student who increased his test score from 40 to 64 be improving at a faster rate than one who increased his score from 91 to 99? One way of testing for the rate of change fallacy is to take the inverse of the data: using the failure rate rather than the pass rate. In the case of the Illinois data, both the state and the city have seen similar percentage declines in their students’ rate of failure.
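The inverse-rate check described above can be worked through with the four pass rates quoted in the text. The following is a minimal sketch, not the author's own calculation; the function and variable names are made up for illustration.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def compare(before_pass, after_pass):
    """Describe the same gain three ways: points, pass rate, failure rate."""
    return {
        "point_gain": after_pass - before_pass,
        "pass_change_pct": percent_change(before_pass, after_pass),
        # The failure rate is simply 100 minus the pass rate.
        "fail_change_pct": percent_change(100 - before_pass, 100 - after_pass),
    }


# Pass rates (percent) as quoted in the text: (before, after).
pass_rates = {"Chicago": (48, 62), "Illinois": (69, 77)}
results = {name: compare(b, a) for name, (b, a) in pass_rates.items()}

for name, r in results.items():
    print(
        f"{name}: +{r['point_gain']} points, "
        f"pass rate {r['pass_change_pct']:+.1f}%, "
        f"failure rate {r['fail_change_pct']:+.1f}%"
    )
```

Measured on the pass rate, Chicago's gain (about 29%) looks far larger than the state's (about 12%). Measured on the failure rate, the two declines are nearly the same (about 27% and 26%), which is the point of the inverse test.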
The report is also guilty of cherry picking. Note the oddity of choosing fourth grade reading and eighth grade math. The National Assessment of Educational Progress tests reading and math in both grades and has other subject matter tests for these grades and grade 12. Of ten possible subject-grade comparisons, none shows as great a discrepancy in decile score change as do the two comparisons selected.
Sociologist William S. Robinson coined the term in a 1950 article in which he observed that states with the highest rates of foreign born population also had the highest literacy rates, even though the foreign born had lower literacy rates than the native born population.
When voter turnout, traditionally measured as the percentage of the voting age population that votes, fell below 50 percent in the 1996 election, political commentators blamed turned-off voters, partisan politics (the only kind of politics), negative campaigning and the rise of conservative talk radio. In 2001, however, political scientist Michael McDonald compiled new data suggesting that the talk about the vanishing American voter was “a myth” (2001, 963). McDonald’s analysis called attention to the denominator in the voting turnout statistic, the voting age population, and argued that we should instead use the voting-eligible population. Over recent elections, an increasing percentage of the American voting age population has not been eligible to vote. Mostly this is because of increasing immigration: noncitizen residents, both legal and non-legal, are counted in the Census Bureau voting age population figures. In addition, in all but two states, prisoners are not allowed to vote and in 12 states even ex-felons are disenfranchised. Because the percentage of the American population that either is incarcerated or has ex-felon status has gone up dramatically since the 1980s, an increasing percentage of the voting age population cannot vote. Taking the votes cast as a percentage of the voting-eligible population (the “VC/VEP” trend in figure 4.6) as our measure of turnout, we see no general decline in voter turnout since 1972, when 18-year-olds were given the franchise. So which is the better measure of voter turnout? If you look at voter turnout as a measure of how democratic a society is, the traditional voting-age numbers have greater validity. Although voting-eligible turnout is increasing and at a long-time high, this is true because so many young black males (unlikely voters to begin with) have been put in jail and, in many states, denied the right to vote for the rest of their lives and because so many of our nation’s poor are not citizens.
If all young voters (the age group least likely to vote) were incarcerated and all the poor (the economic group least likely to vote) were declared non-citizens, the American voter turnout rate would be among the highest in the world, but the United States would not be a more democratic society.
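The effect of the denominator can be seen in a short sketch. The numbers below are hypothetical, chosen only to illustrate the mechanism; they are not McDonald's data. The eligible population is also simplified here to the voting age population minus ineligible residents.

```python
def turnout(votes, population):
    """Votes cast as a percentage of the chosen population."""
    return votes / population * 100.0


# HYPOTHETICAL figures in millions, invented for illustration only.
elections = [
    {"label": "earlier election", "votes": 49, "vap": 100, "ineligible": 2},
    {"label": "later election", "votes": 54, "vap": 120, "ineligible": 12},
]

for e in elections:
    # Simplifying assumption: eligible population = voting age population
    # minus noncitizens and disenfranchised felons.
    vep = e["vap"] - e["ineligible"]
    e["vap_turnout"] = turnout(e["votes"], e["vap"])
    e["vep_turnout"] = turnout(e["votes"], vep)
    print(
        f"{e['label']}: VAP turnout {e['vap_turnout']:.1f}%, "
        f"VEP turnout {e['vep_turnout']:.1f}%"
    )
```

With these made-up figures, turnout measured against the voting age population falls from 49% to 45%, while turnout measured against the eligible population stays at 50%. The only thing that changed is the share of residents who cannot vote.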
In 1983, the “A Nation at Risk” report began with disturbing evidence of the weak performance of American students on international academic achievement tests (National Commission). Among the studies cited, American high school seniors recorded the lowest grades on the First International Science Study, but at least part of the reason for the low U.S. scores had to do with the relatively high rates of U.S. students completing high school (Medrich and Griffith, 1992). In some situations there may also be a reverse population mortality effect. Although black 4th grade and 8th grade reading scores have improved in recent years, 12th grade scores for black students have not. Part of the reason the 12th grade scores have not gone up, however, may be the decline in the black high school dropout rate (Klass, 2008, 108).
Advocates of the “Laffer Curve” – the idea that cutting taxes will increase government revenue – often cite the beneficial impact of the Reagan administration tax cuts that were partially implemented in the 1982 fiscal year and fully implemented in 1983. Heritage Foundation economist Daniel Mitchell (2003) argues the point: “Once the economy received an unambiguous tax cut in January 1983, income tax revenues climbed dramatically, increasing by more than …28 percent after adjusting for inflation.” Note, however, that Mitchell begins his calculation in 1983, near the bottom of the Reagan recession. Mitchell is also guilty of cherry picking. Instead of a before-and-after measurement, he has done an after-and-long-after measurement, ignoring the two years of income tax revenue reductions that took place while the tax cuts were at least partially in effect and the higher rates of revenue growth that took place in the Carter years.
If indeed the price of oil was driven up by a speculative bubble, it was a bubble that would eventually burst on its own. By the time Senate hearings were held in June, the only question was whether a short-term correction in the bubble would be credited to the hearings or to the actual enactment of the bill. In truth, speculators are to blame for the price rise. They are speculating that Congress will do nothing to increase the supply or reduce the demand for energy and that Congress’s budget deficits will continue to increase, driving down the value of the dollar. As Congress spends its time passing legislation such as this, it is wise speculation.
It seemed to be a good time to buy a home in 2006 (and in 2000). Adjustable, no-interest, no-down-payment loans were cheap, and with rising home prices the prospects of getting another loan before the rate adjustment kicked in were good. Some, but not enough, of those who profited off the expectation that the trend would continue upward are now on their way to jail. For one of the most interesting academic debates about allegedly dubious trend forecasting, see Julian Simon’s critique of the Club of Rome’s 1972 Limits to Growth report, which forecast the exhaustion of much of the world’s resources and an ensuing worldwide economic crisis in the 1980s. Or read about Simon’s wager with Paul Ehrlich, author of The Population Bomb. Unfortunately, discerning which trends are merely speculative bubbles soon to be corrected by market forces and which represent inexorable forces is no easy task. The upward trend in university tuition in a later example is of the inexorable variety.
In 2004, a Wall Street Journal editorial criticized the Clinton administration for cuts in the defense budget: “Bill Clinton and a GOP Congress balanced the budget by withdrawing a "peace dividend" at a time when al Qaeda was declaring war” (2004). Their evidence was a chart showing the declines in defense spending as a percent of GDP (note that the WSJ time series began in 1990, thus ignoring earlier declines in the measure). From fiscal year 1993, the budget year before Clinton took office, to 2001 (the year before the 9/11 increases), defense spending fell from 4.4% of GDP to 3%, a dramatic and steady decline. Much of the decline, however, was due less to cuts in military spending (already underway after the end of the Cold War) and more to the dramatic economic growth and the increase in GDP in the Clinton years. In real dollars (adjusted using the GDP price deflator), military spending actually increased in Clinton’s second term, a dramatic turnaround from the post-Cold War decline that began in the first Bush administration. Similarly, throughout the 1990s Ireland reported dramatic reductions in most categories of governmental revenues and outlays as a percent of GDP, largely due to the dramatic improvement in the country’s GDP. A general inattention to denominators, other than the basic per capita measure, is the cause of much statistical misinformation. The most commonly used adjustment for inflation, the consumer price index, overestimates inflation by an estimated one percent per year (BLS, 2007). As a result, measures of income and price growth are underestimated (most monetary measures, including income, spending, and prices, are actually growing faster than the inflation-adjusted measure would indicate). Because the CPI results in an artificially higher poverty threshold, the overestimate has the opposite effect on poverty, making it appear that poverty is higher over time.
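Two of the denominator effects described above reduce to short arithmetic. The sketch below uses only the figures quoted in the text (the 4.4% and 3% shares, and the one percent annual overstatement). The 10- and 20-year windows and the function names are assumptions chosen for illustration.

```python
def breakeven_gdp_growth(share_before, share_after):
    """Real GDP growth at which a falling share of GDP leaves real
    spending unchanged (spending = share x GDP on both dates)."""
    return share_before / share_after - 1.0


def cumulative_overstatement(annual_bias, years):
    """Compounded overstatement of the price level after `years`
    when inflation is overestimated by `annual_bias` each year."""
    return (1.0 + annual_bias) ** years - 1.0


# Defense spending fell from 4.4% to 3% of GDP (figures from the text).
growth_needed = breakeven_gdp_growth(4.4, 3.0)
print(f"Real GDP growth that alone explains the drop: {growth_needed:.1%}")

# CPI overstatement of one percentage point per year (from the text);
# the 10- and 20-year horizons are illustrative assumptions.
overstatement = {y: cumulative_overstatement(0.01, y) for y in (10, 20)}
for years, value in overstatement.items():
    print(f"After {years} years the price level is overstated by {value:.1%}")
```

The first figure says that real GDP growth of roughly 47% would produce the entire decline in the defense share with no cut in real spending at all. The second shows how a small annual bias compounds: about 10% after a decade and about 22% after two decades.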
A quick glance at the chart on the left would lead one to conclude that private university tuition and fees, which rose from a little over $3,800 in 1980 to almost $27,000 in 2006, are increasing at a much faster rate than public university charges (rising from $840 to $6,400). The chart illustrates a common graphical scaling distortion, similar in some respects to the rate of change fallacy. In fact, public universities are increasing their tuition at a slightly faster rate, and both are going up faster than health care costs or gasoline prices (gasoline hit $1 a gallon in 1980; had it increased at the same rate as tuition, it would now be over $6 a gallon). Although much of the writing on graphic design focuses on distorted presentations of data, more often poor graphical display and poor tabular construction work to hide critical relationships.
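The tuition comparison can be checked directly from the rounded dollar figures quoted above. This is a rough sketch: the text gives the amounts only approximately ("a little over", "almost"), so the results are approximate too, and the helper names are illustrative.

```python
def growth_factor(start, end):
    """How many times larger the ending value is than the starting value."""
    return end / start


def annual_rate(start, end, years):
    """Compound annual growth rate implied by the two endpoints."""
    return (end / start) ** (1.0 / years) - 1.0


YEARS = 2006 - 1980

# Rounded tuition-and-fee figures from the text: (1980, 2006), in dollars.
tuition = {"private": (3800, 27000), "public": (840, 6400)}

factors = {k: growth_factor(s, e) for k, (s, e) in tuition.items()}
rates = {k: annual_rate(s, e, YEARS) for k, (s, e) in tuition.items()}

for sector in tuition:
    print(
        f"{sector}: {factors[sector]:.1f}x over {YEARS} years, "
        f"about {rates[sector]:.1%} per year"
    )

# Gasoline at $1.00 in 1980, scaled by the same growth factors.
gasoline = {k: 1.00 * f for k, f in factors.items()}
for sector, price in gasoline.items():
    print(f"gasoline at the {sector} tuition growth rate: ${price:.2f} a gallon")
```

On these figures public tuition grew about 7.6-fold and private tuition about 7.1-fold, so the smaller line on the chart is the one rising slightly faster. Plotting both series on a logarithmic scale, or indexing each to its 1980 value, would show this where a shared dollar axis hides it.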