Just Plain Data Analysis
This is the companion website for the book:
Gary Klass, Just Plain Data Analysis: Finding, Presenting, and Interpreting Social Science Data (New York: Rowman and Littlefield Publishers, 2008)
ISBN: 978-0-7425-6053-6
This site contains:
- links to all of the Excel files used for the charts and tables in the book,
- tips on constructing charts with Microsoft Excel,
- links to the original data sources for the tables and figures in the book.
Content of this Website:
What is Just Plain Data Analysis? [excerpt from the Preface]
“Just plain data analysis” is, simply, compiling,
evaluating, and presenting numerical evidence to support and illustrate
arguments about politics and public affairs.
Just plain data analysis is the most common form of
quantitative social science methodology, although the statistical literacy
skills and knowledge it entails are often not presented, or presented well, in
social science research methods and statistics textbooks. These skills involve
finding, presenting, and interpreting numerical information in the form of
commonly used social, political, and economic indicators. They are practical
skills that students will find they can readily apply both in their
subsequent coursework and in their future careers.
Just plain data analysis differs from what is
commonly regarded as quantitative social science methodology in that it usually
does not involve formal tests of theories, hypotheses, or null hypotheses.
Rather than relying on statistical analysis of a single dataset, just plain data
analysis, at its best, involves compiling and evaluating all the relevant
evidence from multiple data sources. Where conventional approaches to
quantitative social science analysis stress the statistical analysis of data to
model and test narrowly defined theories, just plain data analysis stresses
presenting and critically evaluating statistical data to support arguments about
social and political phenomena.
There are three tasks and skills involved in doing just plain data analysis that
traditional research methods courses and textbooks often neglect: finding,
presenting, and interpreting numerical evidence.
With the advances in information technology over the
past decade, there has been a revolution in the amount and availability of
statistical indicators provided by governments and nongovernmental public and
private organizations. In addition to the volumes of data provided by the U.S.
Census Bureau, many federal departments now have their own statistics agency,
such as the National Center for Education Statistics, the Bureau of Justice
Statistics, the National Center for Health Statistics, the Bureau of Labor
Statistics, and the Bureau of Transportation Statistics, providing convenient
online access to comprehensive data collections and statistical reports. In
recent years, the greatest growth in the sheer quantity of statistical
indicators has been in the field of education. The mandated testing under the No
Child Left Behind law and the expansion of the Department of Education’s
National Assessment of Educational Progress have produced massive databases of
measures of the performance of the nation’s schools that, for better or worse,
have fundamentally transformed the administration of educational institutions.
There has also been significant growth in the
quantity and quality of comparative international data. The Organisation for
Economic Co-operation and Development (OECD) now provides a comprehensive range
of governmental, social, and economic data for developed nations. For developing
nations, the World Bank’s development of poverty indicators and measures of
business and economic conditions and the United Nations’ Millennium Development
Goals database have contributed greatly to public debate and analysis of
national and international policies affecting impoverished people across the
world. With the Trends in International Mathematics and Science Study (TIMSS) and the
Programme for International Student Assessment (PISA) both having completed
multiyear international educational achievement testing, rich databases of
educational system conditions and student performance are now easily accessible.
Similar growth has taken place in the availability of
social indicator data derived from nongovernmental public opinion surveys that
offer consistent time series and cross-national measures of public attitudes
and social behaviors. Time series indicators can be readily obtained online from
the U.S. National Elections Study and the National Opinion Research Center’s
annual General Social Survey, and comparative cross-national indicators can
be accessed from the Comparative Study of Electoral Systems, the International
Social Survey Programme, and the World Values Survey.
Finding the best data relevant to the analysis of
contemporary social and political issues requires a basic familiarity with the
kinds of data likely to be available from these sources. Social science research
methods courses often give short shrift to this crucial stage of the research
process, which involves skills and expertise usually acquired through years of
experience in specific fields of study. Too often, the data are a “given”: the
instructor gives a dataset to the students and asks them to analyze it. The
concluding chapter of this book addresses the topic in some detail, but finding
the best data is the subtext for all of the chapters and the examples and
illustrations that follow.
Good data presentation skills are to data-based
analysis what good writing is to literature, and some of the same basic
principles apply to both. More important, poor graphical and tabular
presentations often lead both readers and writers to draw erroneous conclusions
from their data and obscure facts that better presentations would reveal. Some
of these practices involve deliberate distortions of data, but more commonly
they involve either unintentional distortions or simply ineffective approaches
to presenting numerical evidence.
The past two decades have seen the development of a
substantial literature on the art and science of data presentation, much of it
following Edward R. Tufte’s pathbreaking work, The Visual Display of
Quantitative Information.[viii] With his admonitions to “show the data,”
“minimize the ink-to-data ratio,” and avoid “chartjunk,” Tufte established many
of the basic rules and principles of data presentation and demonstrated over and
over again how effective data presentations combined with clear thinking can
reveal truths hidden in the data.
Howard Wainer’s work extends Tufte’s standards and demonstrates the many errors
that have ensued from statistical fallacies and faulty tabular and graphic
design.[ix] Few research methods and
statistics texts address these standards of data presentation in more than a
cursory manner and many demonstrate some of the worst data presentation
practices.
Although the development of spreadsheet and other
software has greatly simplified the tasks of tabular and graphical data
presentation, it has also greatly facilitated some very bad data presentation
practices.
Good data analysis entails little more than finding
the best data relevant to a given research question, making meaningful
comparisons among the data, and drawing sound conclusions from the comparisons.
To evaluate arguments based on numerical evidence, one must assess the
reliability and validity of the individual measures used and the validity of
the conclusions drawn from comparisons of the data.
Assessing the reliability and validity of social
indicator measurements requires that one understand how the data are collected
and how the indicators are constructed. Many research methods and statistics
texts address issues of measurement merely as matters of choosing the
appropriate level of measurement for variables (nominal, ordinal, or interval)
and of calculating sampling error. As a practical matter, such issues are
usually irrelevant or trivial when one undertakes just plain data analysis. With
just plain data analysis, almost all of the data are interval measures, in the
form of ratios, percentages, and means, even if the base question for the
indicator is nominal or ordinal. Measures of sampling error usually constitute
the least important aspect of measurement reliability. In chapter 1 we will see
that the less reliable measures of crime rates, based on the FBI Uniform Crime
Reports, have far less sampling error (actually no sampling error) than the more
reliable measures based on the National Crime Victimization Surveys. The same
thing occurs with the measurement of educational achievement discussed in
chapter 5: the No Child Left Behind tests of all students are shown to be less
reliable than the National Assessment of Educational Progress tests based on
national samples of students. Although assessing sampling error sometimes has a
crucial role in data analysis, in both academic research and news reporting
the emphasis on sampling error often conveys a false sense of the reliability of
data and distracts attention from more serious measurement problems.
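The contrast between sampling error and other measurement problems can be made concrete with a little arithmetic. The sketch below is only an illustration: the function name, the rates, and the sample size are hypothetical values chosen for the example, not figures from the book, and the formula is the textbook approximation for a simple random sample (real surveys such as the National Crime Victimization Survey use more complex designs). It compares a complete count that misses unreported cases with a sample survey that is assumed, for the sake of the example, to have no bias.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated
    from a simple random sample of size n (hypothetical helper)."""
    return z * math.sqrt(p * (1.0 - p) / n)

# All numbers below are hypothetical, for illustration only.
true_rate = 0.050        # assumed true share of people victimized
reporting_share = 0.40   # assumed share of incidents reported to police

# A complete count of reported incidents has no sampling error,
# but it systematically understates the true rate.
count_estimate = true_rate * reporting_share
count_bias = count_estimate - true_rate

# A sample survey (assumed unbiased here) has sampling error only.
survey_n = 50_000
survey_moe = margin_of_error(true_rate, survey_n)

print(f"complete count: bias = {count_bias:+.4f}, sampling error = 0")
print(f"sample survey:  bias = 0 (assumed), margin of error = ±{survey_moe:.4f}")
```

Under these assumed numbers, the error from underreporting in the complete count is many times larger than the survey's margin of error, which is the sense in which a measure with no sampling error can still be the less reliable one.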
Often a fear of mathematics, combined with
nonsequential curricular requirements, leads students to take a research methods
and statistics course only in their last semester of study. In departments that
require freshmen to take introductory methods courses, the required course is
often the last time in students’ academic careers that they will actually do the
quantitative analysis that is taught. It may even be the last time they will
have to read research employing the methods that are taught.
Just plain data analysis involves skills and expertise that students can readily
apply to the analysis of evidence presented in their course literature and in
conducting their own research for term papers and independent study projects.
Moreover, the data analysis and data presentation skills described here apply
across a wide range of future careers in both government and the private sector.
Just plain data analysis is a primary mode of communication in government: it is
found in the studies, annual reports, and PowerPoint presentations of almost
every governmental agency and advocacy group, and in any career that requires
writing clearly and succinctly with numbers. It is not too late to read this text in the
last semester of your senior year of college, but it is later than it should
have been.
For departments that offer courses in both quantitative and qualitative
methodology, just plain data analysis fills the methodological chasm that
divides the social sciences. Those students who will go on to learn and apply
the knowledge of the central limit theorem, multiple regression, factor
analysis, and other less-plain statistical applications will discover that many
of the principles of just plain data analysis will greatly improve the quality
of their work. Those who embrace qualitative analysis out of bewilderment at
the often tortuous mathematical complexities of contemporary quantitative social
science may find less madness in the methods presented here. Students in almost
every field of study encounter just plain data analysis all the time in the
charts and tables presented in their textbooks.
In today’s world the exercise of effective
citizenship increasingly requires a public competent to evaluate arguments
grounded in numerical evidence. As the role of government has expanded to affect
almost every aspect of people’s daily lives, the role of statistics in shaping
governmental policies has expanded as well. To the extent the public lacks the
skills to critically evaluate the statistical analyses that shape public policy,
more crucial decisions that affect our daily lives will be made by technocrats
who have these statistical skills or by those who would use their mastery of
these skills to serve their own partisan or special interest ends.
[viii]. Edward Tufte, The Visual Display of Quantitative Information (Cheshire, Conn.: Graphics Press, 1993).
[ix]. Howard Wainer, Visual Revelations: Graphical Tales of Fate and Deception from Napoleon Bonaparte to Ross Perot (Mahwah, N.J.: Lawrence Erlbaum, 1997).