Presenting Data: Tabular and graphic display of social indicators
 
Gary Klass
Illinois State University
© 2002

Note: The website will be discontinued shortly, to be replaced by the Just Plain Data Analysis site

Constructing Good Tables.

Principles of Tabular Display (powerpoint)

Introduction

Numerical tables generally serve one of two purposes.  Just about all of the numerical data included in this book was originally found in “look up” tables, databases or spreadsheets compiled by government statistical agencies or nongovernmental organizations.  The general and limited purpose of these database tabulations is to present all the numerical information available that might be relevant to a wide variety of data users.  For the most part, the textual discussion accompanying these tabulations serves only to describe how the data were obtained and to define what the numbers mean.  The analytical tables contained in social science research, however, serve a different purpose: the presentation of numerical evidence relevant to support specific conclusions contained in the text.  To serve this purpose much care must be given to the selection of the data and to the design of the table. 

Research reports and analyses based on numerical information should accommodate two different audiences: those who read the text and tend to ignore the data presented in tables and charts and those who skim the text and grasp the main ideas from the data presentation.  To serve the latter audience, tables should be self-explanatory, conveying the critical ideas contained in the data without relying on the text to explain what the numbers mean. When done well, the tables will complement the textual discussion and the text will provide a general summary of the most important ideas to be derived from the data – without repeating each and every number, or many of the numbers, contained in the table. 

There are three general characteristics of a good tabular display.  The table should present meaningful data.  The data should be unambiguous.   The table should convey ideas about the data efficiently.

Whether or not the data are meaningful has to do with how closely the data relate to the main points that you are trying to make in your analysis or report.  The data and the relationships among the data contained in a table constitute the premises, or evidence, offered to support a conclusion. Ideally this should be an important conclusion, an essential part of the analysis you are making.  

Whether or not the information presented in a table is unambiguous depends largely on the descriptive text contained in the titles, headings, and notes.  The text should clearly and precisely define each number in the table. The titles, headings and footnotes should convey the general purpose of the table, explain coding, scaling and definition of the variables, and define relevant terms or abbreviations.

An efficient tabular display will allow a reader to quickly discern the purpose and import of the data and to draw a variety of interesting conclusions from a large amount of information.  How quickly a reader can digest the information presented, discern the critical relationships among the data, and draw meaningful conclusions depends on how well the table is formatted.

Presenting Meaningful Data

Report the most meaningful data, data that measure something important about the cases you are analyzing.   Knowing just which data are meaningful and which are not requires an understanding both of the specific subject matter you are writing about and of where the data come from and how they are collected.  When reporting social indicator data, whether data are meaningful or not depends on the appropriate counts, divisors and comparisons that will be represented in the table.[1]

The Count.  Most social indicators (measures of government spending being an exception) are based on counts derived from either survey questions or agency records.  In the case of US infant mortality data, for example, the “counts” of infant deaths are obtained from enumerations of death certificates (the divisors, from birth certificates).  In the case of unemployment data, the counts of the unemployed are derived from monthly random sample surveys involving several survey questions concerning the respondents’ employment status.  Some measures of voter turnout are based on counts of the votes cast in elections; others are based on counts derived from post-election surveys – counts of respondents who claimed they had voted (for more than one reason, the latter counts are higher).   Poverty rates may be based on counts of the number of persons living in poor families or counts of the number of poor families.  FBI crime rate data are based on crimes reported to local police departments; crime “victimization” data are based on respondents’ reports of crimes in household surveys.  Interpreting the social indicator data often requires a good understanding of the actual survey questions and definitions used in determining the counts.

The Divisors.  Although tabulations of raw count data are sometimes useful, statistics based on rates, ratios, and per capita measures are usually more meaningful than aggregate totals.  The conclusions drawn from a tabulation depend on the numerical comparisons represented in the table; in most instances, comparing the number of murders in Chicago and New York to the number in San Antonio and San Diego (as in table 1a) does not support a meaningful conclusion.

Table 1. Murder counts vs. murder rates

 

For any given social indicator count – such as health care expenditures shown in table 2 – a variety of social indicators can be constructed using different numerators and denominators.    In table 2, measuring health care expenditures as a percent of GDP is a standard method of adjusting the data for differences in countries’ currencies and the size of the national economy.  Health care expenditures are also reported per capita, to adjust for the size of the countries’ populations; in US dollars, to adjust for differences in currency; and weighted by the OECD purchasing power parity index, to adjust for differences in prices.

Table 2: Various measures of health care finance

 

Common divisors used in the construction of social indicators are population (e.g. murders per 100,000 population), gross domestic product (military expenditures as a percent of GDP) and median family income (university tuition and fees as a percent of median family income).  Other indicators use divisors tailored for specific counts. Highway fatalities are often measured per 100 million vehicle miles traveled, but also per 100,000 licensed drivers, per 100,000 vehicles registered and per 100,000 population.  Abortion rates measure the number of abortions per 1,000 women.  Abortion ratios measure the number of abortions per 1,000 live births.  A common mistake is to confuse raw counts reported in hundreds or thousands with rates: the number of highway fatalities in thousands is not the same thing as the number of fatalities per thousand of population.
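As a rough sketch of the count-and-divisor arithmetic, the following Python function turns a raw count into a rate. The city names and all of the figures are invented for illustration; they are not taken from table 1 or from any data source.

```python
def rate(count, divisor, per=100_000):
    """Return the count per `per` units of the divisor (e.g. murders per 100,000 population)."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return count / divisor * per

# Hypothetical cities: name -> (murder count, population). Invented numbers.
cities = {
    "Big City": (600, 8_000_000),
    "Small City": (90, 1_000_000),
}

for name, (murders, population) in cities.items():
    print(f"{name}: {murders} murders, {rate(murders, population):.1f} per 100,000")

# A count reported in thousands is still a count, not a rate.
fatalities = 40_000          # hypothetical raw count
population = 280_000_000     # hypothetical population
in_thousands = fatalities / 1_000                    # thousands of deaths
per_thousand = rate(fatalities, population, 1_000)   # deaths per 1,000 people
print(f"{in_thousands:.0f} thousand deaths, {per_thousand:.2f} per 1,000 population")
```

In this made-up case the larger city has far more murders but the lower murder rate, which is the kind of reversal that a table of raw counts would hide.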

The comparisons.  The kinds of conclusions that can be drawn from a table depend on the comparisons and relationships that can be made among the data.  The more meaningful comparisons a table permits, the more specific the conclusions that can be drawn. 

Time.   When the data are available, tabulations that allow for comparisons across time usually say more about what is going on than those that do not.

Table 2. Two time points are better than one

 

Unless you are analyzing the before and after effect of something that happened at a single point in time, a one year difference in data often does not provide for meaningful comparisons. One-year changes in the city murder rates are subject to random fluctuations: 3, 5, and 10 year change intervals provide for a more reliable analysis.   An exception to this rule is budgetary data, where the analysis of annual changes and comparisons of past, current and forthcoming fiscal years is often crucial for agencies living on annual appropriations. 

When comparing two years of data, one should beware of arbitrary selections of a base year.  If the data in table 2 were presented to make a case for the effectiveness of the Los Angeles police force, one would have to check to see whether the 1995 data were the result of an unusually high jump in the city's murder rate.

Particularly with data that show substantial year-to-year variation (for example, murder rates in relatively small states or cities), calculating average scores over three- or five-year intervals before and after a significant policy change (say, before and after the imposition of the death penalty) will prevent making too much out of random data fluctuations.
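A minimal sketch of the before-and-after averaging described here, using only Python's standard library. The yearly rates and the year of the policy change are invented, and the function name is hypothetical.

```python
from statistics import mean

def before_after_means(rates_by_year, change_year, window=3):
    """Average the rate over `window` years before and after `change_year`.

    The change year itself is left out of both averages.
    """
    before = [rates_by_year[y] for y in range(change_year - window, change_year)]
    after = [rates_by_year[y] for y in range(change_year + 1, change_year + 1 + window)]
    return mean(before), mean(after)

# Hypothetical murder rates per 100,000 for a small state (invented numbers).
rates = {1990: 4.0, 1991: 6.0, 1992: 5.0, 1993: 7.0, 1994: 3.0, 1995: 5.0, 1996: 4.0}

before, after = before_after_means(rates, change_year=1993)
print(f"3-year mean before: {before:.1f}, after: {after:.1f}")
print(f"single-year comparison, 1992 to 1994: {rates[1994] - rates[1992]:+.1f}")
```

With these numbers the single-year comparison suggests a drop twice as large as the change in the three-year means, which is the sort of overstatement that averaging is meant to guard against.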

When reporting monetary data at more than one point in time, it is often best to report data adjusted for inflation (i.e., in constant dollars rather than current dollars) or with a monetary divisor that serves a similar purpose, such as GDP or median family income.  Note also that a repeated divisor serves no purpose: per capita expenditures as a percent of GDP per capita is the same thing as expenditures as a percent of GDP.
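The two points in this paragraph can be illustrated with a short Python sketch. It assumes a price index is available for each year; the index values, spending amounts, GDP and population below are all invented.

```python
def to_constant_dollars(amount, price_index_year, price_index_base):
    """Convert a current-dollar amount to base-year (constant) dollars."""
    return amount * price_index_base / price_index_year

# Hypothetical price index values and current-dollar spending (invented numbers).
price_index = {1990: 130.0, 2000: 172.0}
spending_current = {1990: 1_000.0, 2000: 1_500.0}

base = 2000
spending_constant = {
    year: to_constant_dollars(amount, price_index[year], price_index[base])
    for year, amount in spending_current.items()
}
for year in sorted(spending_constant):
    print(f"{year}: {spending_current[year]:,.0f} current, "
          f"{spending_constant[year]:,.0f} constant ({base} dollars)")

# Repeated divisor: dividing numerator and denominator by population cancels out.
expenditures, gdp, population = 500e9, 10e12, 280e6   # hypothetical
share_of_gdp = expenditures / gdp
per_capita_share = (expenditures / population) / (gdp / population)
print(f"{100 * share_of_gdp:.1f}% versus {100 * per_capita_share:.1f}%")
```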

Controlling for spurious relationships.    Often the most important tabulations in a research report are those that control for various factors that might account for, or elaborate, an observed relationship.  African Americans are less likely to vote than white Americans, but when one compares voters of equal age, education and income, the voter turnout rates are very similar.    One school district may have lower math scores than another, but for students of similar family background it may actually be doing better. 

In table 3, we begin with a base relationship indicating that women’s earnings were only 70% of men’s.  For some, this measure of the disparity in earnings might be sufficient to support general conclusions about gender discrimination, but others might insist that age, education and a wide range of other factors would have to be taken into account.  As we further examine table 3, we see that the relationship is more complex: the earnings gap is lower for younger women and for less educated women.

 

Table 3: Controlling for age and education

 

A full understanding of the relationship between age, education, gender and earnings would require an even more elaborate breakdown, especially when we consider that the differences in education between men and women vary considerably by age. Although women’s earnings in the youngest age group are 88% of men’s earnings, women in that age group are actually more likely to have earned college degrees than men.

Note that the 70% gender disparity would most certainly have been greater if the data were not restricted to “full time, year round” workers, as women are more likely than men to work part time.  
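The idea of a base relationship and a controlled relationship can be sketched in a few lines of Python. This is only an illustration: the worker counts and earnings are invented, the age groups are hypothetical, and means are used for simplicity where a published table might well report medians.

```python
def weighted_mean(groups):
    """Mean earnings across groups given (number of workers, mean earnings) pairs."""
    total_workers = sum(n for n, _ in groups)
    return sum(n * earnings for n, earnings in groups) / total_workers

# Hypothetical data: age group -> {gender: (number of workers, mean earnings)}.
data = {
    "25-34": {"men": (100, 30_000), "women": (100, 26_400)},
    "45-54": {"men": (200, 50_000), "women": (100, 32_000)},
}

# Base relationship: women's earnings as a percent of men's, all ages combined.
men_all = weighted_mean([group["men"] for group in data.values()])
women_all = weighted_mean([group["women"] for group in data.values()])
overall_pct = 100 * women_all / men_all

# Controlled relationship: the same ratio computed within each age group.
pct_by_age = {
    age: 100 * group["women"][1] / group["men"][1] for age, group in data.items()
}

print(f"all ages: {overall_pct:.0f}%")
for age, pct in pct_by_age.items():
    print(f"{age}: {pct:.0f}%")
```

A single overall percentage conceals how much the ratio differs between the two age groups, which is why the controlled breakdown earns its place in the table.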

Presenting Unambiguous Data

The table titles, column headings and footnotes should precisely define what each data point in the table means.  When rates or ratios are reported, both the numerator and denominator should be clearly defined. Pay particular attention to whether the statistics are reported in hundreds, thousands or millions.  The amount of detail given to defining the data does depend on the audience.  In a paper written for economists, it would not be necessary to define terms like GDP (gross domestic product), unemployment rate (the percentage of the labor force seeking work), or a GINI index (most often, a measure of the inequality of an income distribution); for other audiences more detail may need to be provided.

A complete table (or chart) title fully defines the three components of the social indicator in the table: the Count, the Divisor, and the Comparisons, as in the following examples:

Public and Private Health Care Expenditures, OECD nations:
(% of GDP)

US Public Health Care Expenditures, Per Capita: 1975-2004
(constant 1999 dollars)

Murder Rates in Wealthy Nations, 1999
(Homicides per 100,000 population)

State Voter Turnout Rates, Presidential Elections: 1992-2004
(Votes cast / voting age population)

Percentages.   Percentages can usually be calculated in at least two different ways and are often a source of confusion.  Consider the difference between the two tables in table 4.  In table 4a, we see that 14% of poor families are headed by a householder under 24 years of age; in 4b, 31% of families headed by a householder under 24 years old are poor.  Although the table title clearly defines the difference, showing the 100% total and the "all families" rate helps convey the correct interpretation more quickly to the reader.   In general "composition" or "distribution" percentages (as in table 4a) are less meaningful than the rate statistics, shown in table 4b.  This is especially true when the categories are arbitrary: in table 4a, the 18 to 24 category has a six year range while others have a ten year range.
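The two ways of calculating a percentage from the same counts can be sketched as follows. The counts and age categories are invented for illustration and do not reproduce the figures in table 4.

```python
# Hypothetical counts of families by age of householder (invented numbers):
# age group -> (poor families, all families)
counts = {
    "under 25": (300, 1_000),
    "25 to 34": (700, 5_000),
    "35 and over": (1_000, 14_000),
}

total_poor = sum(poor for poor, _ in counts.values())
total_all = sum(all_families for _, all_families in counts.values())

# Distribution: each group's share of all poor families (the shares sum to 100).
distribution = {age: 100 * poor / total_poor for age, (poor, _) in counts.items()}

# Rate: percent of each group's families that are poor (does not sum to 100).
poverty_rate = {
    age: 100 * poor / all_families for age, (poor, all_families) in counts.items()
}
overall_rate = 100 * total_poor / total_all

for age in counts:
    print(f"{age:<12} share of poor: {distribution[age]:5.1f}   "
          f"poverty rate: {poverty_rate[age]:5.1f}")
print(f"{'all families':<12} share of poor: {100.0:5.1f}   poverty rate: {overall_rate:5.1f}")
```

Printing the 100 percent total and the rate for all families alongside the group figures mirrors the advice above about helping the reader tell the two kinds of percentage apart.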

Table 4. Distribution percentages versus rates

 

Change in percentages.  Calculating changes in percentages, rates and ratios is also a source of confusion both in tabular display and in textual summaries of data.  The following is an example of poorly defined data:

Table 5.  Poorly defined data

 

Consider the ambiguity of these data by trying to answer the following questions:

  • Does the teenage birth rate measure the percentage of all babies who were born to teenage mothers or the percentage of teenage mothers who gave birth?
     

  • Would an increase in the teenage birth rate from 20 to 26.7 be a 6.7% change or a 33.5% change?

In table 2 (above), change is reported as a net change, but it would have been possible to show the change as a percentage change.  The change in San Diego's murder rate from 7.9 to 3.5 (murders per 100,000 population) can be reported either as a 4.4 net decline in the murder rate or as a 56% drop in the rate.
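The San Diego figures quoted above can be checked with a short Python sketch. The function names are hypothetical; the two rates are the ones given in the text.

```python
def net_change(old, new):
    """Absolute change, in the same units as the rate."""
    return new - old

def percent_change(old, new):
    """Relative change, as a percent of the starting value."""
    return 100 * (new - old) / old

# San Diego murder rates per 100,000 population, as quoted in the text.
old_rate, new_rate = 7.9, 3.5
print(f"net change: {net_change(old_rate, new_rate):+.1f} per 100,000")
print(f"percent change: {percent_change(old_rate, new_rate):+.0f}%")
```

Whichever measure a table reports, the column heading should say which one it is, since the two numbers answer different questions.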

A general theme of Andrew Hacker’s 1995 book, Two Nations, a widely respected analysis of race in American society, is that racism is the underlying cause of the worsening social and economic disparities affecting black America.  The book counters the claims of conservatives (such as in Charles Murray’s 1984 book, Losing Ground) that liberal social policies and the rise in black single parent families are to blame for the conditions in black America.  Throughout the book, Hacker includes a number of tables, similar to table 6, containing measures of black and white social conditions and, as a measure of the disparity between the two races, a “Black Multiple”, in this case measuring the ratio of black to white out-of-wedlock birth rates.

Table 6: Comparing changes in rates
source: Andrew Hacker (1995), 87.


From these data, Hacker concludes that “even though the number of births to unwed black women has ascended to an all-time high, white births outside of marriage have been climbing at an even faster rate” (86).  He doesn’t say it, but the implication is that the rise in single parent families should not be seen as a black problem, but as a general societal phenomenon.

The problem with Hacker’s conclusion is that it depends on what you mean by “climbing at a faster rate”.  It is true that the 1992 white rate is more than 10 times the 1950 rate, while the 1992 black rate is only 3 times its 1950 level.  On the other hand, the white 1992 rate represents a net increase of only 17 percentage points since 1950, while the black 1992 rate is a net increase of almost 50 percentage points.  If the black out-of-wedlock birth rate had risen to 100%, Hacker’s analysis would still conclude that the white rate was climbing faster.

Another way of looking at these data, shown in table 7, is to consider the in-wedlock birth rate instead of the out-of-wedlock birth rate.

Table 7: The reciprocal of Hacker’s data


Had Hacker used these data, the reciprocal of his own numbers, he would have had to conclude that black births inside of marriage are falling at an even faster rate than white births inside marriage.  Hacker’s conclusion isn’t wrong so much as it is incomplete and misleading.  
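The way the conclusion flips with the choice of measure can be sketched in Python. The rates below are invented to resemble the pattern described in the text; they are not Hacker's figures, and the group labels are placeholders.

```python
def multiple(old, new):
    """Ratio of the later rate to the earlier rate."""
    return new / old

def point_change(old, new):
    """Change in percentage points."""
    return new - old

# Hypothetical out-of-wedlock birth rates in percent: group -> (earlier, later).
# Invented numbers, NOT Hacker's figures.
out_of_wedlock = {"group A": (2.0, 20.0), "group B": (20.0, 68.0)}

for group, (old, new) in out_of_wedlock.items():
    # The in-wedlock rate is 100 minus the out-of-wedlock rate.
    in_old, in_new = 100 - old, 100 - new
    print(
        f"{group}: out-of-wedlock x{multiple(old, new):.1f}, "
        f"{point_change(old, new):+.0f} points; "
        f"in-wedlock x{multiple(in_old, in_new):.2f}"
    )
```

Measured as a multiple of its starting value, group A's out-of-wedlock rate rises faster; measured in percentage points, or by the fall in the in-wedlock rate, group B changes more. The same data support both statements.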

Presenting Data Efficiently.

The measure of a tabulation's efficiency is the number of meaningful comparisons that can readily be drawn from the data presentation.  Efficiency is often a matter of balance: more data allow for more comparisons, but too much data can obscure meaningful comparisons.  A properly formatted table allows the reader to quickly draw the right conclusion.

Sorting.  Sort data by the most meaningful variable.  The “look-up” tables of most reference sources generally list data for geographic units (countries, states, or cities) alphabetically. If you are using a table to make a point, the reader will almost always discern the point more quickly if the data are sorted on the most meaningful variable.  The alphabet is almost never the most meaningful variable.  Note how, with the sorted data on the right hand side of table 8, the reader can immediately figure out which countries have their youth watching the most TV and the least TV and that Italy is the median country.

Table 8. Sort data on the most meaningful variable


To fully appreciate the advantages of sorting, consider table 9.

Table 9. The alphabet is not a meaningful variable


In the case of tables that present two years of data, such as table 2 (above), it is best to sort the data on the base year as this allows for a quicker assessment of which cases have changed the most.  In table 10, the countries are sorted on all three numerical variables in order to highlight the relative position of the United States.
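A small Python sketch of the sorting rule, using the built-in sorted function. The rows are invented and stand in for any table with a base-year and a later-year column.

```python
# Hypothetical rows: (country, base-year value, later-year value). Invented numbers.
rows = [
    ("Austria", 12.0, 10.0),
    ("Belgium", 30.0, 22.0),
    ("Canada", 21.0, 20.0),
    ("Denmark", 8.0, 9.0),
]

# Look-up order: alphabetical by name.
alphabetical = sorted(rows, key=lambda row: row[0])

# Analytical order: highest base-year value first.
by_base_year = sorted(rows, key=lambda row: row[1], reverse=True)

for country, base, later in by_base_year:
    print(f"{country:<10}{base:>6.1f}{later:>6.1f}{later - base:>+7.1f}")
```

With the rows ordered on the base year, any case whose later value is out of line with its neighbors stands out as one that has changed more than the rest.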

Table 10:  Data sorted on more than one variable


Decimal places and rounding.  For most purposes, limit the number of decimal places to what is needed to display the data to two or three significant digits.  It is usually not necessary to include dollar signs or percentage signs next to the numbers in a table, although this is sometimes done for the first number in a column.

Table 11.  Decimals and rounding


Howard Wainer, a leading authority on data presentation, insists that there is no reason to display more than two significant digits in most tabular displays.  He would, therefore, eliminate the decimal points in table 11 and round off the family income data to 49,000, 53,000, 30,000 and 29,000.  Presumably, he would have Major League Baseball record the Cubs' winning percentage as 49 percent rather than the .486 proportion.  Hank Aaron's record of 755 home runs could be rounded to 760.

I think Wainer goes too far.  It's true that readers will look at the income data in table 11 and, in their minds, round off to thousands.  And the income data are based on estimates that make any conclusion based on differences of less than a hundred dollars practically meaningless.   Percentages are usually fine without decimal points.  But there are exceptions.  In recent years, the US poverty rate has ranged from 16 to 14 percent.  Reporting these rates without decimals might obscure many significant changes.  Reporting Major League batting averages with just 2 digits would result in many ties and fail to distinguish important differences.
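Rounding to a fixed number of significant digits, as opposed to a fixed number of decimal places, can be sketched in Python as follows. The function round_sig is a hypothetical helper, not part of the standard library, and the two batting averages are invented.

```python
import math

def round_sig(x, digits=2):
    """Round x to the given number of significant digits."""
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))
    # Note: Python's round() sends exact ties to the even neighbor.
    return round(x, digits - 1 - exponent)

print(round_sig(755))     # the 755 home runs mentioned above
print(round_sig(0.486))   # the .486 winning proportion

# Two hypothetical batting averages that tie at two significant digits.
print(round_sig(0.312), round_sig(0.308))
```

The last line shows the cost of the two-digit rule: two averages that differ in the third digit become indistinguishable.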

Social scientists using correlation and regression analyses commonly display numbers with too many decimal places -- presumably to add an aura of scientific precision.  They also report far too many statistics in their tabulations. Again, the purpose seems to be to impress rather than explain and the effect is to obscure the most important data in the tables.  There is no need for any correlation coefficient, R-Square, or standardized regression coefficient to be displayed with more than two decimal places.

Defining rows and columns.  As a general rule, similar data ought to be presented in the columns.  Mixing data of different types in the same column is disorienting, as we see in table 12 and in the Welcome to Farmer City sign, below.

Table 12: Poor placement of cases and variables in rows and columns

 

Figure 1  Readers expect columns to add up to totals


Time:  In tables where the time points define the columns, display years in adjacent columns, from left to right.  Where the time points are in a column, sort so that the most recent year is at the bottom  (see table 13).

Time series trend data of more than five time points are generally better displayed in a time series chart than in a table.  Time series charts convey trends more efficiently than tables, but with some loss of accuracy. 


Table 13.  Years sorted in the rows or columns


The professional education journal Phi Delta Kappan sponsors an annual poll of public attitudes concerning the nation's public schools (Rose and Gallup 2002).  Every year, the polling report displays numerous data tables with the years backward, with the most recent year's data in the first column on the left (table 14).  Notice how difficult it is to discern whether the trend is increasing or decreasing:

Table 14. A backward table


The same principle applies in the case of other ordered categories such as age groups, years of education (or educational attainment), temperature ranges, height or weight: the categories representing the largest magnitudes should generally appear on the right or at the bottom of the table.

Consistency:  When a paper or report contains more than one table, the formatting ought to be consistent across tables: same fonts, same heading style, and same borders.  If the four branches of the armed services are displayed as they are in table 12, they ought to be sorted in the same order (despite the sorting rule, above) if the same categories are used in another table. (Note: to some extent, this rule is not followed in this report in order to show alternative formats).

Combining tables.  While cramming too much data and too many different kinds of data into a single table should be avoided, you should also look for opportunities to combine several tables into one.

Table 15. Efficient presentation of survey data


Table 15, derived from Christina Hoff Sommers' The War Against Boys, nicely summarizes in a single table what could have been presented in six.  The basic format used here is ideal for presenting crosstabular survey data when a single variable is crosstabulated against several others. Sommers uses these data to make two points.  The first is that teachers favor girls over boys.  The second, more subtle point is conveyed in the title: that the American Association of University Women, who conducted the original survey (and who sponsored a report arguing that girls are ignored by teachers), suppressed the release of these data.   A less argumentative title for the table might have been, "Boys' and Girls' Perceptions of Teachers' Gender Partiality."

If Sommers' table were included in an article in a social science journal it would no doubt have also included measures of statistical association and levels of significance for each of the 6 crosstabulations.  None of these numbers, however, would add anything to the evidence contained in the table and would serve only to impair a quick interpretation of what is going on with the data.

Table 16 illustrates the same principle, but with the demographic variables defining the rows of the table.  These segmented tabulations provide for a very efficient presentation of the data.  Additional demographic categories do not complicate the data display, while adding a variety of interesting comparisons that would not be as easy to make if one used several tables.  See, for example, that gender is not as strong a determinant of support for Senator Clinton as is age or race.

 

Table 16. Presentation of survey data.
SOURCE: California Field Poll, 3/10/2006 http://field.com/fieldpollonline/subscribers/RLS2186.pdf


Highlighting comparisons.  The purpose of properly sorting the data, correctly arranging the rows and columns, combining what could be multiple tables into one and other efficiency rules is to allow the reader to quickly grasp the most meaningful comparisons that the data allow. 

Table 17: Highlighting the important comparisons


Table 17 contains data on income mobility that were originally presented in two tables in an article by Katharine Bradbury and Jane Katz.  Their basic point is that the poor were somewhat more likely to escape poverty in the 1970s than they were in the 1990s, while the rich were more likely to remain rich.  The crucial evidence is in the diagonals of the tabulations and putting those data in bold allows the reader to discern the point more quickly. 

Borders:  A common and simple table format is used in most of the tables on these pages.  It includes a thin straight border under the title and heading cells and under the main body of data.  There is usually no need for vertical borders.  Often, the title is in bold.  Putting the headings in bold is advised only if they are very short headings, and not if it is inconsistent with the format for other tables in the report.  The tables include only horizontal lines; partly this is due to MLA style guidelines that were originally designed for manuscripts prepared with manual typewriters. 

MLA and APA style guidelines recommend that table titles be italicized (one of the few recent acknowledgements that manual typewriters are no longer in use) and aligned to the left with the text underlined and that the table number be placed (again aligned to the left) above the title.  These style recommendations, however, are for papers that are not in final form, i.e., manuscripts that will later be formatted by a publisher.   The MLA style guides also specify that tables (and the text of manuscripts) be double spaced and that the tables be placed at the end of the manuscript; this is for the convenience of manuscript proofreaders and not for readers.


References

Rose, Lowell C.  and Alec M. Gallup (2002). The 34th Annual Phi Delta Kappa/Gallup Poll of the Public's Attitudes Toward the Public Schools http://www.pdkintl.org/kappan/k0209pol.htm

Sommers, Christina Hoff (2000). The War Against Boys: How Misguided Feminism is Harming our Young Men (New York: Simon and Schuster)

also: (Field poll)

On tabular design:

Miller, Jane E. 2004. “Creating Effective Tables,” The Chicago Guide to Writing about Numbers,  (University of Chicago Press). Chapter 6.

Cuzzort, R. P.  and James S. Vrettos.  "Fundamentals: the Art of Tabular Design," Elementary Forms of Statistical Reasoning, (New York: St. Martin's Press, 1996), chapter 4.

Wainer, Howard. Improving tabular display: with NAEP tables as example and inspirations. Journal of Educational and Behavioral Statistics 22(1997), 1-30.


 

[1] The Count – Divide – Compare (C-D-C) framework, and its acronym, were developed by epidemiologists at the Centers for Disease Control.

 


Formatting tables with Microsoft Excel:

practice exercise with instructions.

Tips on using Excel.

  • Let Excel align your data.

    • Merge and word wrap cells for table title
      Format | Cells | alignment  * wrap text  * merge cells 
       

    • Center column headings, center data under headings.
       

    • To center data on the decimal point, use a custom format.
      Format | Cells | Number * custom
      examples:

      • in table 9 (above) the "Army" column is custom formatted to: ?,000

      • in table 6 (above) the custom format is ?0.0

(To center and align data in an html table, use a courier font and spaces.)

  • By default, Excel aligns the contents of cells on the bottom of the cell.  Changing the alignment so that the data is centered vertically is better, particularly with cells that have borders.
    use: Format | Cells | alignment  *vertical   *center
     

  • Avoid fancy fonts: I recommend a consistent 10pt font for the table, bold for the titles and headings, and perhaps an 8pt font for the source.  For tables that are to be printed in a paper, do everything in black and white. 
     

  • For tables that are used in PowerPoint or overhead presentations, use simpler tables with large bold fonts.  PowerPoint tables should generally have fewer than 24 data points.
     

  • Web page tables:  Most of the tables displayed here were constructed using MS Excel, using shift-Edit | Copy picture and pasted into a single-cell table in MS Front Page.  Tables 1, 4 and 7 are standard HTML tables.  Note the difficulties with HTML tables:
     

    • the font size is set by the browser that is viewing the page; headings, in particular, may display differently with different browsers and different users.
       

    • there is very little control over the use of borders in an HTML table, and using horizontal lines to display borders doesn't look good.
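The tip above about aligning data with a courier font and spaces can be approximated in Python when a table is produced as plain text. This is only a sketch: align_on_decimal is a hypothetical helper, the values are invented, and the alignment holds only if the output is shown in a fixed-width font.

```python
def align_on_decimal(values, decimals=1):
    """Format numbers as equal-width strings so the decimal points line up."""
    formatted = [f"{value:,.{decimals}f}" for value in values]
    width = max(len(text) for text in formatted)
    # Padding on the left keeps every decimal point in the same character column.
    return [text.rjust(width) for text in formatted]

for line in align_on_decimal([7.9, 12.3, 1234.5, 0.3]):
    print(line)
```

Because every value is formatted with the same number of decimal places, right-justifying the strings is enough to line up the decimal points.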

     

