Constructing Good Tables.
Numerical tables generally serve one of two purposes. Just about all of the
numerical data included in this book was originally found in “look up”
tables, databases or spreadsheets compiled by government statistical
agencies or nongovernmental organizations. The general and limited purpose
of these database tabulations is to present all the numerical information
available that might be relevant to a wide variety of data users. For the
most part, the textual discussion accompanying these tabulations serves only
to describe how the data were obtained and to define what the numbers mean.
The analytical tables contained in social science research, however, serve a
different purpose: the presentation of numerical evidence to support
specific conclusions contained in the text. To serve this purpose,
much care must be given to the selection of the data and to the design of
the table.
Research reports and analyses based on numerical information should
accommodate two different audiences: those who read the text and tend to
ignore the data presented in tables and charts and those who skim the text
and grasp the main ideas from the data presentation. To serve the latter
audience, tables should be self-explanatory, conveying the critical ideas
contained in the data without relying on the text to explain what the
numbers mean. When done well, the tables will complement the textual
discussion and the text will provide a general summary of the most important
ideas to be derived from the data – without repeating each and every number,
or many of the numbers, contained in the table.
There are three general characteristics of a good tabular display. The table
should present meaningful data. The data should be unambiguous. The table
should convey ideas about the data efficiently.
Whether or not the data are meaningful has to do with how closely the
data relate to the main points that you are trying to make in your analysis
or report. The data and the relationships among the data contained in a
table constitute the premises, or evidence, offered to support a conclusion.
Ideally this should be an important conclusion, an essential part of the
analysis you are making.
Whether or not the information presented in a table is unambiguous
depends largely on the descriptive text contained in the titles, headings,
and notes. The text should clearly and precisely define each number in the
table. The titles, headings and footnotes should convey the general purpose
of the table, explain coding, scaling and definition of the variables, and
define relevant terms or abbreviations.
An efficient tabular display will allow a reader to quickly discern the
purpose and import of the data and to draw a variety of interesting
conclusions from a large amount of information. How quickly a reader can
digest the information presented, discern the critical relationships among
the data, and draw meaningful conclusions depends on how well the table is
Presenting Meaningful Data
Report the most meaningful data, data that measure something important about
the cases you are analyzing. Knowing just which data are meaningful and
which are not requires an understanding both of the specific subject matter
you are writing about and a good understanding of where the data come from
and how they are collected. When reporting social indicator data, whether
data are meaningful or not depends on the appropriate counts, divisors and
comparisons that will be represented in the table.
The Count. Most social indicators (measures of government spending
being an exception) are based on counts derived from either survey questions
or agency records. In the case of US infant mortality data, for example,
the “counts” of infant deaths are obtained from enumerations of death
certificates (the divisors, from birth certificates). In the case of
unemployment data, the counts of the unemployed are derived from monthly
random sample surveys involving several survey questions concerning the
respondents’ employment status. Some measures of voter turnout are based on
counts of the votes cast in elections; others are based on counts derived
from post-election surveys: counts of respondents who claimed they had
voted (for more than one reason, the latter counts are higher). Poverty
rates may be based on counts of the number of persons living in poor
families or counts of the number of poor families. FBI crime rate data are
based on crimes reported to local police departments; crime “victimization”
data are based on respondents’ reports of crimes in household surveys.
Interpreting the social indicator data often requires a good understanding
of the actual survey questions and definitions used in determining the
counts.
The Divisors. Although tabulations of raw count data are sometimes
useful, statistics based on rates, ratios, and per capita measures
are usually more meaningful than aggregate totals. The conclusions drawn
from a tabulation depend on the numerical comparisons represented in the
table; in most instances, comparing the number of murders in Chicago and New
York to the number in San Antonio and San Diego (as in table 1a) does not
support a meaningful conclusion.
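The conversion from counts to rates can be sketched in a few lines of Python. This is only an illustration: the city names and figures are made up (they are not the values in table 1), and `rate_per_100k` is a hypothetical helper name.

```python
def rate_per_100k(count, population):
    """Convert a raw count into a rate per 100,000 population."""
    return count * 100_000 / population

# Hypothetical figures for illustration only (not the data in table 1).
cities = {
    "Big City": {"murders": 600, "population": 3_000_000},
    "Small City": {"murders": 90, "population": 300_000},
}

rates = {
    name: rate_per_100k(c["murders"], c["population"])
    for name, c in cities.items()
}

for name, rate in rates.items():
    print(f"{name}: {cities[name]['murders']} murders, {rate:.1f} per 100,000")
```

In this made-up case the larger city has far more murders but the lower rate, which is the comparison a table of raw counts would hide.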
Table 1. Murder counts vs. murder rates
For any given social indicator count – such as health care expenditures
shown in table 2 – a variety of social indicators can be constructed using
different numerators and denominators. In table 2, measuring health care
expenditures as a percent of GDP is a standard method of adjusting the data
for differences in countries’ currencies and the size of their national
economies. Health care expenditures are also reported per capita, to adjust
for the size of the countries’ populations; in US dollars, to adjust for
differences in currency; and weighted by the OECD purchasing power parity
index, to adjust for differences in prices.
Table 2: Various measures of health care finance
Common divisors used in the construction of social indicators are population
(e.g., murders per 100,000 population), gross domestic product (military
expenditures as a percent of GDP) and median family income (university
tuition and fees as a percent of median family income). Other indicators
use divisors tailored for specific counts. Highway fatalities are often
measured per 100 million vehicle miles traveled, but also per 100,000
licensed drivers, per 100,000 vehicles registered and per 100,000
population. Abortion rates measure the number of abortions per 1,000 women.
Abortion ratios measure the number of abortions per 1,000 live births. A
common mistake is to confuse raw counts reported in hundreds or thousands
with rates: The number of highway fatalities in thousands is not the same
thing as the number of fatalities per thousand of population.
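The difference between a count reported in thousands and a rate per thousand is easy to see in a short Python sketch. All figures below are hypothetical round numbers chosen for easy arithmetic, and `per_unit` is an illustrative helper, not a standard function.

```python
def per_unit(count, divisor, scale):
    """Express count per `scale` units of the divisor."""
    return count * scale / divisor

# Hypothetical national totals, for illustration only.
fatalities = 40_000
population = 300_000_000
vehicle_miles = 3_000_000_000_000  # total vehicle miles traveled

# A count reported in thousands is still a count.
count_in_thousands = fatalities / 1_000

# Rates built from the same count with different divisors.
per_1000_population = per_unit(fatalities, population, 1_000)
per_100k_population = per_unit(fatalities, population, 100_000)
per_100m_vehicle_miles = per_unit(fatalities, vehicle_miles, 100_000_000)

print(f"Fatalities (thousands):            {count_in_thousands:.1f}")
print(f"Fatalities per 1,000 population:   {per_1000_population:.3f}")
print(f"Fatalities per 100,000 population: {per_100k_population:.1f}")
print(f"Fatalities per 100 million miles:  {per_100m_vehicle_miles:.2f}")
```

The same count yields four different numbers, which is why the column heading has to state the divisor and the scale.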
The comparisons. The kinds of conclusions that can be drawn from a
table depend on the comparisons and relationships that can be made among the
data. The more meaningful comparisons a table permits, the more
specific the conclusions that can be drawn.
Time. When the data are available, tabulations that allow for
comparisons across time usually say more about what is going on than those
that do not.
Table 2. Two time points are better than one
Unless you are analyzing the before and after effect of something that
happened at a single point in time, a one-year difference in data often does
not provide for meaningful comparisons. One-year changes in city murder
rates are subject to random fluctuations; 3-, 5-, and 10-year change
intervals provide for a more reliable analysis. An exception to this rule is
budgetary data, where the analysis of annual changes and comparisons of
past, current and forthcoming fiscal years is often crucial for agencies
living on annual appropriations.
When comparing two years of data, one should beware of arbitrary selections
of a base year. If the data in table 2 were presented to make a case for
the effectiveness of the Los Angeles police force, one would have to check
to see whether the 1995 data were the result of an unusually high jump in
the city's murder rate.
Particularly with data that show substantial year-to-year variation (for
example, murder rates in relatively small states or cities), calculating
average scores over three- or five-year intervals before and after a
significant policy change (say, before and after the imposition of the death
penalty) will prevent making too much out of random data fluctuations.
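A minimal sketch of the interval-averaging idea, in Python. The annual rates and the policy year are invented for illustration, and counting the policy year itself as part of the "after" window is an assumption of this sketch rather than a rule from the text.

```python
from statistics import mean

# Hypothetical annual murder rates per 100,000 (illustration only).
rates_by_year = {
    1990: 4.1, 1991: 6.0, 1992: 3.8, 1993: 5.2, 1994: 4.4,
    1995: 4.9, 1996: 3.5, 1997: 5.6, 1998: 4.0, 1999: 4.7,
}
policy_year = 1995  # hypothetical year of the policy change

def window_mean(rates, start, end):
    """Average the rates for the years start..end, inclusive."""
    return mean(rates[year] for year in range(start, end + 1))

# Five-year averages on either side of the policy change.
before = window_mean(rates_by_year, policy_year - 5, policy_year - 1)
after = window_mean(rates_by_year, policy_year, policy_year + 4)

# The single-year comparison that the averages are meant to replace.
one_year_change = rates_by_year[policy_year] - rates_by_year[policy_year - 1]

print(f"Five-year average before: {before:.2f}")
print(f"Five-year average after:  {after:.2f}")
print(f"One-year change:          {one_year_change:+.1f}")
```

With these made-up numbers the one-year comparison shows an increase while the five-year averages show a small decline, which is the kind of discrepancy that random year-to-year variation produces.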
When reporting monetary data at more than one point in time, it is often
best to report data adjusted for inflation (i.e., in constant dollars rather
than current dollars) or with a monetary divisor that serves a similar
purpose, such as GDP or median family income. Note also that a repeated
divisor serves no purpose: per capita expenditures as a percent of
GDP-per-capita is the same thing as expenditures as a percent of GDP.
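Both points can be illustrated with a short Python sketch. The price index, spending, GDP and population figures are hypothetical, and `to_constant_dollars` is an illustrative helper name.

```python
import math

def to_constant_dollars(nominal, index_in_year, index_in_base_year):
    """Restate a current-dollar amount in base-year (constant) dollars."""
    return nominal * index_in_base_year / index_in_year

# Hypothetical current-dollar spending and price index (illustration only).
nominal_spending = {1990: 100.0, 2000: 150.0}
price_index = {1990: 80.0, 2000: 100.0}
base_year = 2000

real_spending = {
    year: to_constant_dollars(amount, price_index[year], price_index[base_year])
    for year, amount in nominal_spending.items()
}
print(real_spending)  # spending restated in constant base-year dollars

# A repeated divisor cancels out: dividing both numerator and denominator
# by population leaves the ratio unchanged.
expenditures = 2_000_000_000.0
gdp = 50_000_000_000.0
population = 10_000_000

share_of_gdp = expenditures / gdp
per_capita_share = (expenditures / population) / (gdp / population)
print(math.isclose(share_of_gdp, per_capita_share))
```

In current dollars the hypothetical spending grows by 50 percent; in constant dollars the growth is 20 percent.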
Controlling for spurious relationships. Often the most
important tabulations in a research report are those that control for various
factors that might account for, or elaborate, an observed relationship.
African Americans are less likely to vote than white Americans, but when
one compares voters with equal age, education and income, the voter turnout
rates are very similar. One school district may have lower math scores
than another, but for students of similar family background it may actually
be doing better.
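The school district case can be sketched in Python to show how an overall difference can reverse once a third variable is held constant. The districts, enrollment counts and scores are entirely hypothetical.

```python
# Hypothetical data: (district, family background) -> (students, mean score).
scores = {
    ("District A", "low income"): (800, 60.0),
    ("District A", "high income"): (200, 80.0),
    ("District B", "low income"): (200, 55.0),
    ("District B", "high income"): (800, 78.0),
}

def overall_mean(district):
    """Enrollment-weighted mean score for a district, ignoring background."""
    rows = [(n, s) for (d, _), (n, s) in scores.items() if d == district]
    total_students = sum(n for n, _ in rows)
    return sum(n * s for n, s in rows) / total_students

# District A minus District B, within each family background group.
gap_by_background = {
    background: scores[("District A", background)][1]
    - scores[("District B", background)][1]
    for background in ("low income", "high income")
}

print(f"Overall: A = {overall_mean('District A'):.1f}, "
      f"B = {overall_mean('District B'):.1f}")
for background, gap in gap_by_background.items():
    print(f"{background}: A minus B = {gap:+.1f}")
```

District A has the lower overall score only because it enrolls more low income students; within each background group it scores higher than District B.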
In table 3, we begin with a base relationship indicating that women’s
earnings were only 70% of men’s. For some, this measure of the disparity in
earnings might be sufficient to support general conclusions about gender
discrimination, but others might insist that age, education and a wide range
of other factors would have to be taken into account. As we further examine
table 3, we see that the relationship is more complex: the earnings gap is
lower for younger women and for less educated women.
Table 3: Controlling for age and education
A full understanding of the relationship between age, education, gender and
earnings would require an even more elaborate breakdown, especially when we
consider that the differences in education between men and women vary
considerably by age. Although women’s earnings in the youngest age group are 88%
of men’s earnings, women in that age group are actually more likely to have
earned college degrees than men.
Note that the 70% gender disparity would most certainly have been greater if
the data were not restricted to “full time, year round” workers, as women
are more likely than men to work part time.
Presenting Unambiguous Data
Table titles, column headings and footnotes should precisely define what
each data point in the table means. When rates or ratios are reported, both
the numerator and denominator should be clearly defined. Pay particular
attention to whether the statistics are reported in hundreds, thousands or
millions. The amount of detail given to defining the data does depend on
the audience. In a paper written for economists, it would not be necessary
to define terms like GDP (gross domestic product), unemployment rate (the
percentage of the labor force seeking work), or a GINI index (most often, a
measure of the inequality of an income distribution); for other audiences
more detail may need to be provided.
A complete table (or chart) title fully defines the three components of the
social indicator in the table: the Count, the Divisor, and the Comparisons,
as in the following examples:
Public and Private Health Care Expenditures, OECD nations:
(% of GDP)
US Public Health Care Expenditures, Per Capita: 1975-2004
(constant 1999 dollars)
Murder Rates in Wealthy Nations, 1999
(Homicides per 100,000 population)
State Voter Turnout Rates, Presidential Elections: 1992-2004
(Votes cast / voting age population)
Percentages. Percentages can usually be calculated in at least two
different ways and are often a source of confusion. Consider the difference
between the two tables in table 4. In table 4a, we see that 14% of poor
families are headed by a householder under 24 years of age; in 4b, 31% of
families headed by a householder under 24 years old are poor. Although the
table title clearly defines the difference, showing the 100% total and the
"all families" rate helps convey the correct interpretation more quickly to
the reader. In general "composition" or "distribution" percentages (as in
table 4a) are less meaningful than the rate statistics, shown in table 4b.
This is especially true when the categories are arbitrary: in table 4a, the
18 to 24 category has a six year range while others have a ten year range.
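The two ways of calculating percentages from the same counts can be sketched in Python. The counts below are hypothetical, chosen only so that the youngest group roughly echoes the 14% and 31% figures discussed above; they are not the data in table 4.

```python
# Hypothetical counts: age of householder -> (poor families, all families).
families = {
    "18 to 24": (140, 450),
    "25 to 44": (500, 4_000),
    "45 and over": (360, 5_550),
}

total_poor = sum(poor for poor, _ in families.values())
total_all = sum(n for _, n in families.values())

# Distribution: what share of all poor families falls in each age group?
distribution = {
    age: 100 * poor / total_poor for age, (poor, _) in families.items()
}

# Rate: what share of the families in each age group is poor?
poverty_rate = {
    age: 100 * poor / n for age, (poor, n) in families.items()
}
overall_rate = 100 * total_poor / total_all

print(f"{'Age':<12}{'% of poor':>10}{'% poor':>8}")
for age in families:
    print(f"{age:<12}{distribution[age]:>10.0f}{poverty_rate[age]:>8.0f}")
print(f"{'All':<12}{sum(distribution.values()):>10.0f}{overall_rate:>8.0f}")
```

Printing the 100% total under the distribution column and the "all families" rate under the rate column signals to the reader which kind of percentage each column contains.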
Table 4. Distribution percentages versus rates
Change in percentages. Calculating changes in percentages, rates and
ratios is also a source of confusion both in tabular display and in textual
summaries of data. The following is an example of poorly defined data:
Table 5. Poorly defined data
Consider the ambiguity of these data by trying to answer the following
questions:
Does the teenage birth rate measure the percentage of all babies who were
born to teenage mothers or the percentage of teenage women who gave birth?
Would an increase in the teenage birth rate from 20 to 26.7 be a 6.7%
change or a 33.5% change?
In table 2 (above),
change is reported as a net change, but it would have been possible to show
the change as a percentage change. The change in San Diego's murder rate
from 7.9 to 3.5 (murders per 100,000 population) can be reported either as a
4.4 net decline in the murder rate or as a 56% drop in the rate.
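The two ways of reporting the San Diego change can be checked with a few lines of Python. The function names are illustrative; the 7.9 and 3.5 rates are the ones quoted above.

```python
def net_change(old, new):
    """Difference between two rates, in the units of the rate itself."""
    return new - old

def percent_change(old, new):
    """Change expressed as a percentage of the starting value."""
    return (new - old) / old * 100

old_rate, new_rate = 7.9, 3.5  # murders per 100,000 population

net = net_change(old_rate, new_rate)
pct = percent_change(old_rate, new_rate)

print(f"Net change:     {net:+.1f} per 100,000")
print(f"Percent change: {pct:+.0f}%")
```

A table or sentence that reports only "a change of 4.4" or "a change of 56" without naming the calculation leaves the reader to guess which of the two was used.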
A general theme of Andrew Hacker’s 1995 book, Two Nations, a widely
respected analysis of race in American society, is that racism is the
underlying cause of the worsening social and economic disparities affecting
black America. The book counters the claims of conservatives (such as in
Charles Murray’s 1984 book, Losing Ground) that liberal social
policies and the rise in black single parent families are to blame for the
conditions in black America. Throughout the book, Hacker includes a number
of tables, similar to table 6, containing measures of black and white social
conditions and, as a measure of the disparity between the two races, a
“Black Multiple”, in this case measuring the ratio of black to white
out-of-wedlock birth rates.
Table 6: Comparing changes in rates
source: Andrew Hacker (1995), 87.
From these data, Hacker concludes that “even though the number of births to
unwed black women has ascended to an all-time high, white births outside of
marriage have been climbing at an even faster rate” (86). He doesn’t say
it, but the implication is that the rise in single parent families should
not be seen as a black problem, but as a general societal phenomenon.
The problem with Hacker’s conclusion is that it depends on what you mean by
“climbing at a faster rate”. It is true that the 1992 white rate is more
than 10 times higher than the 1950 rate, while the 1992 black rate is only 3
times higher. On the other hand, the white 1992 rate represents a net
increase of only 17 percentage points since 1950, while the black 1992 rate
is a net increase of almost 50 percentage points. If the black
out-of-wedlock birth rate had risen to 100%, Hacker’s analysis would still
conclude that the white rate was climbing faster.
Another way of looking at these data, shown in table 7, is to consider the
in-wedlock birth rate instead of the out-of-wedlock birth rate.
Table 7: The complement of Hacker’s data
Had Hacker used these data, the complement of his own numbers (100% minus
the out-of-wedlock rate), he would have had to conclude that black births
inside of marriage are falling at an even faster rate than white births
inside marriage. Hacker’s conclusion isn’t wrong so much as it is
incomplete and misleading.
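The way the three measures can point in different directions is sketched below in Python. The percentages are hypothetical round numbers for two unnamed groups; they are not Hacker's figures, and `describe` is an illustrative helper.

```python
# Hypothetical out-of-wedlock birth percentages (NOT Hacker's figures).
rates = {
    "Group X": {"start": 2.0, "end": 20.0},
    "Group Y": {"start": 20.0, "end": 60.0},
}

def describe(start, end):
    """Summarize one change in a percentage three different ways."""
    return {
        # Ratio of the later rate to the earlier rate.
        "multiple": end / start,
        # Net change in percentage points.
        "point_change": end - start,
        # The same ratio computed on the complement (100% minus the rate).
        "complement_multiple": (100 - end) / (100 - start),
    }

summary = {group: describe(**values) for group, values in rates.items()}

for group, s in summary.items():
    print(f"{group}: {s['multiple']:.1f} times the starting rate, "
          f"{s['point_change']:+.0f} points, "
          f"complement at {s['complement_multiple']:.2f} of its starting level")
```

Measured as a multiple, Group X is "climbing faster"; measured in percentage points, or by the decline in the complement, Group Y has changed more.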
Presenting Data Efficiently.
One measure of a tabulation's efficiency is the number of meaningful
comparisons that can readily be drawn from the data presentation.
Efficiency is often a matter of balance: more data allows for more
comparisons, but too much data can obscure meaningful comparison. A
properly formatted table allows the reader to quickly draw the right
conclusion.
Sorting. Sort data by the most meaningful variable. The “look-up”
tables of most reference sources generally list data for geographic units
(countries, states, or cities) alphabetically. If you are using a table to
make a point, the reader will almost always discern the point more quickly
if the data are sorted on the most meaningful variable. The alphabet is
almost never the most meaningful variable. Note how with the sorted data on
the right hand side of table 8 the reader can immediately figure out which
countries have their youth watching the most TV and the least TV and that
Italy is the median country.
Table 8. Sort data on the most meaningful variable
To fully appreciate the advantages of sorting, consider table 9.
Table 9. The alphabet is not a meaningful variable
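A minimal Python sketch of the sorting rule. The viewing hours are hypothetical values, not the data in table 8 or table 9; they were chosen so that Italy falls in the middle, as in the discussion above.

```python
# Hypothetical daily TV viewing hours by country (illustration only).
tv_hours = {
    "Austria": 2.1,
    "Belgium": 2.9,
    "Canada": 2.5,
    "Denmark": 1.8,
    "Italy": 2.4,
}

# Alphabetical order, as a look-up table would list the countries.
alphabetical = sorted(tv_hours.items())

# Sorted on the variable of interest, highest value first.
by_hours = sorted(tv_hours.items(), key=lambda item: item[1], reverse=True)

# With an odd number of rows, the median case is the middle row.
median_country = by_hours[len(by_hours) // 2][0]

for country, hours in by_hours:
    print(f"{country:<10}{hours:>5.1f}")
print(f"Median country: {median_country}")
```

In the sorted listing the highest, lowest and median cases can be read directly from the top, bottom and middle rows.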
In the case of tables that present two years of data, such as table 2 (above),
it is best to sort the data on the base year as this allows for a quicker
assessment of which cases have changed the most. In table 10, the countries
are sorted on all three numerical variables in order to highlight the
relative position of the United States.
Table 10: Data sorted on more than one variable
Decimal places and rounding. For most purposes, limit the number of
decimal places to what is needed to display the data to two or three
significant digits. It is usually not necessary to include dollar signs or
percentage signs next to the numbers in a table, although this is sometimes
done for the first number in a column.
Table 11. Decimals and rounding
Howard Wainer, a leading authority on data presentation, insists that there
is no reason to display more than two significant digits in most tabular
displays. He would, therefore, eliminate the decimal points in table 11 and
round off the family income data to 49,000, 53,000, 30,000 and 29,000.
Presumably, he would have Major League Baseball record the Cubs' winning
percentage as 49 percent rather than the .486 proportion. Hank Aaron's
record of 755 home runs could be rounded to 760.
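Rounding to a fixed number of significant digits can be sketched in Python as follows. `round_sig` is an illustrative helper, not a standard library function; it uses the `decimal` module so that halves always round up. The income figure in the usage lines is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_sig(value, digits=2):
    """Round value to the given number of significant digits."""
    if value == 0:
        return 0.0
    d = Decimal(str(value))
    # adjusted() is the exponent of the most significant digit,
    # e.g. 2 for 755 and -1 for 0.486.
    quantum = Decimal(1).scaleb(d.adjusted() - digits + 1)
    return float(d.quantize(quantum, rounding=ROUND_HALF_UP))

print(round_sig(755))       # home run total to two significant digits
print(round_sig(0.486))     # winning proportion to two significant digits
print(round_sig(48_951))    # hypothetical income figure
print(round_sig(0.486, 3))  # three digits keeps the original value
```

The `digits` parameter makes it easy to compare how a column reads at two versus three significant digits before settling on a format.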
I think Wainer goes too far. It's true that readers will look at the income
data in table 11 and, in their minds, round off to thousands. And the income
data are based on estimates that make any conclusion based on differences of
less than a hundred dollars practically meaningless. Percentages are
usually fine without decimal points. But there are exceptions. In recent
years, the US poverty rate has ranged from 16 to 14 percent. Reporting
these rates without decimals might obscure many significant changes.
Reporting Major League batting averages with just 2 digits would result in
many ties and fail to distinguish important differences.
Social scientists using correlation and regression analyses commonly display
numbers with too many decimal places -- presumably to add an aura of
scientific precision. They also report far too many statistics in their
tabulations. Again, the purpose seems to be to impress rather than explain
and the effect is to obscure the most important data in the tables. There is
no need for any correlation coefficient, R-Square, or standardized
regression coefficient to be displayed with more than two decimal places.
Defining rows and columns. As a general rule, similar data ought to be
presented in the columns. Mixing data of different types in the same column
is disorienting, as we see in table 12 and in the Welcome to Farmer City
Table 12: Poor placement of cases and variables in rows and columns
Figure 1 Readers expect columns to add up to
Time: In tables where the time points define the columns, display years
in adjacent columns, from left to right. Where the time points are in a
column, sort so that the most recent year is at the bottom (see table 13).
Time series trend data of more than five time points is generally better
displayed in a time series chart than in a table. Time series charts
convey trends more efficiently than tables, but with some loss of accuracy.
Table 13. Years sorted in the rows or columns
The professional education journal Phi Delta Kappan sponsors an annual poll
of public attitudes concerning the nation's public schools (Rose and Gallup
2002). Every year, the polling report displays numerous data
tables with the years backward, with the most recent year's data in the
first column on the left (table 14). Notice how difficult it is to discern
whether the trend is increasing or decreasing:
Table 14. A backward table
The same principle applies in the case of other ordered categories such as
age groups, years of education (or educational attainment), temperature
ranges, height or weight: the categories representing the largest magnitudes
should generally appear on the right or at the bottom of the table.
Consistency: When a paper or report contains more than one table,
the formatting ought to be consistent across tables: same fonts, same
heading style, and same borders. If the four branches of the armed services
are displayed as they are in table 12, they ought to be sorted in the same
order (despite the sorting rule, above) if the same categories are used in
another table. (Note: to some extent, this rule is not followed in this
report in order to show alternative formats).
Combining tables. While cramming too much data and too many
different kinds of data into a single table should be avoided, you should
also look for opportunities to combine several tables into one.
Table 15. Efficient presentation of survey
Table 15, derived from Christina Hoff Sommers' The War Against Boys,
nicely summarizes in a single table what could have been presented in six.
The basic format used here is ideal for presenting crosstabular survey data
when a single variable is crosstabulated against several others. Sommers
uses these data to make two points. The first is that teachers favor girls
over boys. The second more subtle point is conveyed in the title: that the
American Association of University Women who conducted the original survey
(and who sponsored a report arguing that girls are ignored by teachers)
suppressed the release of these data. A less argumentative title for the
table might have been, "Boys and Girls Perceptions of Teachers' Gender
If the Sommers' table were included in an article in a social science
journal it would no doubt have also included measures of statistical
association and levels of significance for each of the 6 crosstabulations.
None of these numbers, however, would add anything to the evidence contained
in the table and would serve only to impair a quick interpretation of what
is going on with the data.
Table 16 illustrates the same principle, but with the demographic variables
defining the rows of the table. These segmented tabulations provide for a very
efficient presentation of the data. Additional demographic categories do
not complicate the data display, while adding a variety of interesting
comparisons that would not be as easy to make if one used several tables.
See, for example, that gender is not as strong a determinant of support for
Senator Clinton as is age or race.
Table 16. Presentation of survey data.
SOURCE: California Field Poll, 3/10/2006 http://field.com/fieldpollonline/subscribers/RLS2186.pdf
Highlighting comparisons. The purpose of properly sorting the data,
correctly arranging the rows and columns, combining what could be multiple
tables into one and other efficiency rules is to allow the reader to quickly
grasp the most meaningful comparisons that the data allow.
Table 17: Highlighting the important
Table 17 contains data on income mobility that were originally presented in
two tables in an article by Katharine Bradbury and Jane Katz. Their basic
point is that the poor were somewhat more likely to escape poverty in the
1970s than they were in the 1990s, while the rich were more likely to remain
rich. The crucial evidence is in the diagonals of the tabulations, and
putting those data in bold allows the reader to discern the point more
quickly.
Borders: A common and simple table format is used in most of the
tables on these pages. It includes a thin straight border under the title
and heading cells and under the main body of data. There is usually no need
for vertical borders. Often, the title is in bold. Putting the headings in
bold is advised only if they are very short headings, and not if it is
inconsistent with the format for other tables in the report. The tables
include only horizontal lines; partly this is due to MLA style guidelines
that were originally designed for manuscripts prepared with manual
typewriters.
MLA and APA style guidelines recommend that table titles be italicized (one
of the few recent acknowledgements that manual typewriters are no longer in
use) and aligned to the left with the text underlined and that the table
number be placed (again aligned to the left) above the title. These style
recommendations, however, are for papers that are not in final form, i.e.,
manuscripts that will later be formatted by a publisher.
The MLA style guides also specify that tables (and the text of manuscripts)
be double spaced and that the tables be placed at the end of the manuscript;
this is for the convenience of manuscript proofreaders and not for readers.
Rose, Lowell C. and Alec M. Gallup (2002). The 34th Annual Phi Delta
Kappa/Gallup Poll of the Public's Attitudes Toward the Public Schools.
Sommers, Christina Hoff (2000). The War Against Boys: How Misguided
Feminism is Harming our Young Men (New York: Simon and Schuster).
Miller, Jane E. 2004. “Creating Effective Tables,” The Chicago Guide to
Writing about Numbers, (University of Chicago Press). Chapter 6.
Cuzzort, R. P. and James S. Vrettos. "Fundamentals: the Art of Tabular
Design," Elementary Forms of Statistical Reasoning, (New York: St.
Martin's Press, 1996), chapter 4.
Wainer, Howard. Improving tabular display: with NAEP tables as example and
inspirations. Journal of Educational and Behavioral Statistics
Tips on using Excel.
(To center and align data in an
html table, use a courier font and spaces.)
By default, Excel aligns the contents of cells on the
bottom of the cell. Changing the alignment so that the data is
centered vertically is better, particularly with cells that have borders.
use: Format | Cells | alignment *vertical
Avoid fancy fonts: I recommend
a consistent 10pt font for the table, bold for the titles and headings, and
perhaps an 8pt font for the source. For tables that are to be printed
in a paper, do everything in black and white.
For tables that are used in
PowerPoint or overhead presentations, use simpler tables with large bold
fonts. PowerPoint tables should generally have fewer than 24 data
Web page tables: Most of the tables displayed here were constructed using
MS Excel, copied using shift-Edit | Copy picture, and pasted into a
single-cell table in MS Front Page. Tables 1, 4 and 7 are standard HTML
tables. Note the difficulties with HTML tables:
The font size is set by the browser that is viewing the page; headings, in
particular, may display differently with different browsers and different
users.
There is very little control over the use of borders in an HTML table, and
using horizontal lines to display borders doesn't look good.