Just Plain Data Analysis: Companion Website
Gary Klass
Department of Politics and Government
Illinois State University


Related links:

Demetri Martin
DSI Insights
Freakonomics
Kelly O'Day
StatLit.org
Edward Tufte

Just Plain Data Analysis

This is the companion website for the book:

Gary Klass, Just Plain Data Analysis: Finding, Presenting, and Interpreting
Social Science Data
(New York: Rowman and Littlefield Publishers, 2008)
ISBN: 978-0-7425-6053-6

This site contains:

  • links to all of the Excel files used for the charts and tables in the book,

  • tips on constructing charts with Microsoft Excel, and

  • links to the original data sources for the tables and figures in the book.


What is Just Plain Data Analysis? [excerpt from the Preface]

“Just plain data analysis” is, simply, compiling, evaluating, and presenting numerical evidence to support and illustrate arguments about politics and public affairs.

Just plain data analysis is the most common form of quantitative social science methodology, although the statistical literacy skills and knowledge it entails are often not presented, or not presented well, in social science research methods and statistics textbooks. These skills involve finding, presenting, and interpreting numerical information in the form of commonly used social, political, and economic indicators. They are practical skills that students will find they can readily apply both in their subsequent coursework and in their future careers.

Just plain data analysis differs from what is commonly regarded as quantitative social science methodology in that it usually does not involve formal tests of theories, hypotheses, or null hypotheses. Rather than relying on statistical analysis of a single dataset, just plain data analysis, at its best, involves compiling and evaluating all the relevant evidence from multiple data sources. Where conventional approaches to quantitative social science analysis stress the statistical analysis of data to model and test narrowly defined theories, just plain data analysis stresses presenting and critically evaluating statistical data to support arguments about social and political phenomena.

Finding, Presenting, and Interpreting the Data

     There are three tasks and skills involved in doing just plain data analysis that traditional research methods courses and textbooks often neglect: finding, presenting, and interpreting numerical evidence.

    Finding the Data

With the advances in information technology over the past decade, there has been a revolution in the amount and availability of statistical indicators provided by governments and nongovernmental public and private organizations. In addition to the volumes of data provided by the U.S. Census Bureau, many federal departments now have their own statistics agency, such as the National Center for Education Statistics, the Bureau of Justice Statistics, the National Center for Health Statistics, the Bureau of Labor Statistics, and the Bureau of Transportation Statistics, providing convenient online access to comprehensive data collections and statistical reports. In recent years, the greatest growth in the sheer quantity of statistical indicators has been in the field of education. The mandated testing under the No Child Left Behind law and the expansion of the Department of Education’s National Assessment of Educational Progress have produced massive databases of measures of the performance of the nation’s schools that, for better or worse, have fundamentally transformed the administration of educational institutions.

There has also been significant growth in the quantity and quality of comparative international data. The Organisation for Economic Co-operation and Development (OECD) now provides a comprehensive range of governmental, social, and economic data for developed nations. For developing nations, the World Bank’s development of poverty indicators and measures of business and economic conditions and the United Nations’ Millennium Development Goals database have contributed greatly to public debate and analysis of national and international policies affecting impoverished people across the world. With the Trends in International Mathematics and Science Study (TIMSS) and the Programme for International Student Assessment (PISA) both having completed multiyear international educational achievement testing, rich databases of educational system conditions and student performance are now easily accessible.

Similar growth has taken place in the availability of social indicator data derived from nongovernmental public opinion surveys that offer consistent time series and cross-national measures of public attitudes and social behaviors. Time series indicators can be readily obtained online from the American National Election Studies and the National Opinion Research Center’s General Social Survey, and comparative cross-national indicators can be accessed from the Comparative Study of Electoral Systems, the International Social Survey Programme, and the World Values Survey.

Finding the best data relevant to the analysis of contemporary social and political issues requires a basic familiarity with the kinds of data likely to be available from these sources. Social science research methods courses often give short shrift to this crucial stage of the research process that involves skills and expertise usually acquired by years of experience in specific fields of study. Too often, the data are a “given”: the instructor gives a dataset to the students and asks them to analyze it. The concluding chapter of this book addresses the topic in some detail, but finding the best data is the subtext for all of the chapters and the examples and illustrations that follow.

    Presenting the Data

        Good data presentation skills are to data-based analysis what good writing is to literature, and some of the same basic principles apply to both. More important, poor graphical and tabular presentations often lead both readers and writers to draw erroneous conclusions from their data and obscure facts that better presentations would reveal. Some of these practices involve deliberate distortions of data, but more commonly they involve either unintentional distortions or simply ineffective approaches to presenting numerical evidence.

The past two decades have seen the development of a substantial literature on the art and science of data presentation, much of it following Edward R. Tufte’s pathbreaking work, The Visual Display of Quantitative Information.[viii] With his admonitions to “show the data,” “maximize the data-ink ratio,” and avoid “chartjunk,” Tufte established many of the basic rules and principles of data presentation and demonstrated over and over again how effective data presentations combined with clear thinking can reveal truths hidden in the data. Howard Wainer’s work extends Tufte’s standards and demonstrates the many errors that have ensued from statistical fallacies and faulty tabular and graphic design.[ix] Few research methods and statistics texts address these standards of data presentation in more than a cursory manner, and many demonstrate some of the worst data presentation practices.

Although the development of spreadsheet and other software has greatly simplified the tasks of tabular and graphical data presentation, it has also greatly facilitated some very bad data presentation practices.

    Interpreting the Data

Good data analysis entails little more than finding the best data relevant to a given research question, making meaningful comparisons among the data, and drawing sound conclusions from the comparisons. To evaluate arguments based on numerical evidence, one must assess the reliability and validity of the individual measures used and the validity of the conclusions drawn from comparisons of the data.

Assessing the reliability and validity of social indicator measurements requires that one understand how the data are collected and how the indicators are constructed. Many research methods and statistics texts address issues of measurement merely as matters of choosing the appropriate level of measurement for variables (nominal, ordinal, or interval) and of calculating sampling error. As a practical matter, such issues are usually irrelevant or trivial when one undertakes just plain data analysis. With just plain data analysis, almost all of the data are interval measures, in the form of ratios, percentages, and means, even if the base question for the indicator is nominal or ordinal. Measures of sampling error usually constitute the least important aspect of measurement reliability. In chapter 1 we will see that the least reliable measures of crime rates, based on the FBI Uniform Crime Reports, have far less sampling error (actually no sampling error) than the more reliable measures based on the National Crime Victimization Survey. The same thing occurs with the measurement of educational achievement discussed in chapter 5: the No Child Left Behind tests of all students are shown to be less reliable than the National Assessment of Educational Progress tests based on national samples of students. Although assessing sampling error sometimes plays a crucial role in data analysis, in both academic research and news reporting the emphasis on sampling error often conveys a false sense of the reliability of data and distracts attention from more serious measurement problems.
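To make the point about sampling error concrete, the following is a minimal sketch of the standard 95 percent margin-of-error calculation for a proportion estimated from a simple random sample. The function name and the survey figures are hypothetical, chosen only for illustration; they do not come from the book or its data files.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion.

    p: observed proportion (0 to 1); n: sample size;
    z: critical value (1.96 corresponds to roughly 95% confidence).
    Assumes a simple random sample, which real surveys only approximate.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical survey: 50% of 1,000 respondents give a particular answer.
moe = margin_of_error(0.5, 1000)
print(f"Margin of error: +/- {moe * 100:.1f} percentage points")
# prints "Margin of error: +/- 3.1 percentage points"
```

The figure shrinks toward zero as the sample approaches a complete count, yet it says nothing about whether events were reported, recorded, or defined consistently, which is why a complete count can still be the less reliable measure.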

Why We Should Teach Just Plain Data Analysis

     Often a fear of mathematics, combined with nonsequential curricular requirements, leads students to take a research methods and statistics course only in their last semester of study. In departments that require freshmen to take introductory methods courses, the required course is often the last time in students’ academic careers that they will actually do the quantitative analysis that is taught. It may even be the last time they will have to read research employing the methods that are taught.

Just plain data analysis involves skills and expertise that students can readily apply to the analysis of evidence presented in their course literature and in conducting their own research for term papers and independent study projects. Moreover, the data analysis and data presentation skills described here apply to a wide range of future careers in both government and the private sector. Just plain data analysis is a primary mode of communication in government and is found in the studies, annual reports, and PowerPoint presentations of almost every governmental agency and advocacy group, and in any career that requires writing clearly and succinctly with numbers. It is not too late to read this text in the last semester of your senior year of college, but it is later than it should have been.

For departments that offer courses in both quantitative and qualitative methodology, just plain data analysis fills the methodological chasm that divides the social sciences. Those students who will go on to learn and apply the knowledge of the central limit theorem, multiple regression, factor analysis, and other less-plain statistical applications will discover that many of the principles of just plain data analysis will greatly improve the quality of their work. Those who embrace qualitative analysis out of a bewilderment at the often tortuous mathematical complexities of contemporary quantitative social science may find less madness in the methods presented here. Students in almost every field of study encounter just plain data analysis all the time in the charts and tables presented in their textbooks.

In today’s world the exercise of effective citizenship increasingly requires a public competent to evaluate arguments grounded in numerical evidence. As the role of government has expanded to affect almost every aspect of people’s daily lives, the role of statistics in shaping governmental policies has expanded as well. To the extent the public lacks the skills to critically evaluate the statistical analyses that shape public policy, more crucial decisions that affect our daily lives will be made by technocrats who have these statistical skills or by those who would use their mastery of these skills to serve their own partisan or special interest ends.

 

Notes

[viii]. Edward Tufte, The Visual Display of Quantitative Information (Cheshire, Conn.: Graphics Press, 1983).

[ix]. Howard Wainer, Visual Revelations: Graphical Tales of Fate and Deception from Napoleon Bonaparte to Ross Perot (Mahwah, N.J.: Lawrence Erlbaum, 1997). 

05/04/2008

 
