Information Technology and Politics Newsletter

  Fall, 2000

 

Software Review: JMP, V. 4.02,
SAS Institute, http://www.JMPdiscovery.com/
Micah Altman
Harvard-MIT Data Center
Harvard University
Micah_Altman@harvard.edu

JMP is the SAS Institute's package for exploratory data analysis. JMP offers a plethora of features for distribution analysis (including contingency tables, outlier analysis, and nonparametric statistics), graphs (including contour, profile, and ternary charts), time series modeling, regression, (M)ANOVA, PCA, logit analysis, cluster analysis, survival analysis, design of experiments (added in version 4.0), and quality control. In addition, JMP makes it extremely quick and easy to perform "hands-on" data manipulation such as creating derived and aggregated variables and tables, subsetting, stacking, and joining multiple tables together. Version 4 also adds facilities to import a wider variety of data formats, extract information from databases, and preview text file input. (Unlike SAS, however, the size of JMP tables is limited by the amount of memory available on your machine.)

Exploratory data analysis with JMP is extraordinarily easy and powerful because of three design decisions that distinguish it from all of its competitors: (1) let the user describe the data, (2) organize the program by task rather than by statistical model, and (3) make everything displayed in the program "live."

Most statistical packages keep track of only the "data type" of each variable. JMP, by contrast, lets you record almost anything you know about a variable, including its data type (integer, floating point, string), its modeling type (nominal, ordinal, continuous), its role in your model (dependent, independent, weight, frequency, time axis), and any range or set constraints that apply to it. You can also attach notes (of arbitrary length) or formulas relating one variable to others. This makes it easy to build documentation into your research, of course, but JMP also uses this information intelligently to guide analysis and visualization.

Functions in JMP are organized by task, not by statistical model. The broad task categories are "design of experiments", "analysis", and "graphing", which are further divided by purpose. For example, the "analysis" category comprises "distribution", "fit Y by X", "matched pairs", "fit model", "nonlinear fit", "multivariate", "clusters", "survival", and "time series." When a user chooses an analysis, JMP usually "does the right thing" based upon the roles of the variables selected. For example, running "fit Y by X" will produce contingency tables if both variables are nominal or ordinal, but different analyses if either variable is continuous (see Fig. 1). As befits an EDA package, most analysis and graphing operations are performed quite quickly, although when an operation does take longer than expected one wishes that the designers had included a button to interrupt processing. Also, the arrangement of models can occasionally seem idiosyncratic: for example, scatter plots are performed through "Analyze…Distributions", while principal components analyses can be performed either through "Analyze…Multivariate" or "Graph…Spinning Plots".
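
The idea of choosing an analysis from the declared modeling types, rather than asking the user to name a statistical model, can be sketched in a few lines of Python. This is only an illustration of the design and not JMP's implementation: the Column class, the fit_y_by_x function, and the labels returned for the three cases involving a continuous variable are assumptions that follow the usual textbook pairings. Only the contingency-table case is stated in this review.

```python
from dataclasses import dataclass

VALID_TYPES = {"nominal", "ordinal", "continuous"}


@dataclass(frozen=True)
class Column:
    """A variable plus the modeling type the user declared for it (hypothetical)."""
    name: str
    modeling_type: str  # "nominal", "ordinal", or "continuous"


def fit_y_by_x(y: Column, x: Column) -> str:
    """Return the analysis family suggested by the modeling types of Y and X."""
    for col in (y, x):
        if col.modeling_type not in VALID_TYPES:
            raise ValueError(f"unknown modeling type for {col.name!r}")
    y_categorical = y.modeling_type != "continuous"
    x_categorical = x.modeling_type != "continuous"
    if y_categorical and x_categorical:
        return "contingency table"  # the case described in the review
    if not y_categorical and x_categorical:
        return "one-way comparison of means"  # assumed pairing
    if y_categorical and not x_categorical:
        return "logistic regression"  # assumed pairing
    return "bivariate scatter plot and regression"  # assumed pairing


vote = Column("vote", "nominal")
party = Column("party", "ordinal")
income = Column("income", "continuous")
print(fit_y_by_x(vote, party))    # both categorical
print(fit_y_by_x(income, party))  # continuous Y, categorical X
```

The point of the design is that the user states what each variable is once, and every later menu choice can consult that description.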

In contrast to such competitors as Stata, SPSS, and Minitab, JMP's tables, scripts, formulas, and graphs are very well integrated, and the connections among them are "live." For example, if you select data in a summary table, the corresponding entries are selected in the original table, and all graphs based on those tables instantly highlight points or regions to reflect the selection. (One sometimes wishes, however, that summary statistics were updateable along with the graphs.) Graphs can be directly rotated, rescaled, recolored, or otherwise manipulated in a "live" fashion. At a higher level, formulas and scripts are used to further link data and analysis: the user can attach formulas to variables, with the result that these variables are automatically generated from other data and automatically change when the related data change. In version 4, scripts can be attached to tables to perform similar automatic linking between tables and analyses. Scripting is made very easy, because the system allows you to run a set of analyses first and then save the resulting analysis set as a script or in a journal for later use. (See Fig. 2.)

 

Figure 2: Tables, Scripts and Analyses are all dynamically linked in JMP

JMP is an excellent tool for hands-on data manipulation and exploratory data analysis. For several reasons, however, many political scientists will want to supplement it with another program before reaching the publication stage of analysis. First, political scientists will notice the absence of discrete choice models, survey analysis, and other discipline-specific techniques. Second, although the graphics in JMP are clear, and the ability to manipulate them instantly makes them ideal for data exploration, they still do not allow for the type of precise formatting that is necessary for many publications. Third, and most important, JMP is not accurate enough for some complex maximum likelihood estimations and simulations.

SAS "quality assurance" notwithstanding, JMP does not seem to be as accurate as S-Plus, Gauss, Stata, or many other competitors. Standard tests are available for testing the accuracy of statistical software (see McCullough 1998), but SAS does not provide these results. Nor does the online help or documentation (see Sall 2000) contain information about the algorithms used.

Because the demo version of JMP imposes limits on data input, I was able to perform only a subset of the tests for numerical accuracy: the eight NIST tests that require fewer than twenty-one observations, and the tests of random numbers (which require only generated data). The results were not encouraging. JMP's standard random number generator fails 3 of the 18 "DIEHARD" tests of randomness (the Overlapping 5 Permutations, Count the Ones-Specific, and Binary Rank for 31x31 matrices), and its period, although unspecified, seems to be less than 2^31. This argues against its use for any serious simulation.
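
To see why a generator's period matters for simulation, consider a minimal Python sketch that measures the cycle length of a toy linear congruential generator. The generator form and the tiny parameters below are purely illustrative assumptions. JMP's actual algorithm is undocumented, and nothing here reproduces it or the DIEHARD tests.

```python
def cycle_length(seed: int, a: int, c: int, m: int) -> int:
    """Length of the cycle eventually entered by x -> (a*x + c) mod m.

    Brute force: remember the step at which each state was first seen,
    so this is only practical for small, illustrative moduli.
    """
    first_seen = {}
    state = seed % m
    step = 0
    while state not in first_seen:
        first_seen[state] = step
        state = (a * state + c) % m
        step += 1
    return step - first_seen[state]


# Toy parameters (hypothetical, chosen only to show the contrast).
full = cycle_length(seed=1, a=5, c=3, m=16)   # visits all 16 states
poor = cycle_length(seed=1, a=9, c=0, m=16)   # alternates between two states
print(full, poor)

# A simulation that needs more draws than the period reuses the same
# sequence; 2**31 draws is about 2.1 billion.
print(2**31)
```

Once a simulation consumes more draws than the period, it replays the same sequence, so the nominally independent replications are no longer independent.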

More important, the module supplied for non-linear regression and maximum likelihood is numerically unreliable. JMP reported wildly inaccurate results for several of the non-linear tests (BOXBOD, MGH09, MGH10, MISRA1A), including some that are ranked at the "lowest" difficulty level. Anyone doing non-linear modeling or MLEs would be wise to avoid JMP altogether for this purpose, and should instead use the more reliable facilities in R, S-Plus, or Gauss.
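
Accuracy against benchmarks of this kind is commonly summarized by the log relative error (LRE), which roughly counts the number of correct significant digits in an estimate relative to a certified value. The Python sketch below shows the calculation. The function name and the numbers in the example are hypothetical and are not results from the tests reported here.

```python
import math


def log_relative_error(estimate: float, certified: float) -> float:
    """Approximate number of correct significant digits in `estimate`.

    LRE = -log10(|estimate - certified| / |certified|).
    When the certified value is zero the relative error is undefined,
    so the log absolute error -log10(|estimate|) is used instead.
    """
    if estimate == certified:
        return math.inf  # exact agreement
    if certified == 0:
        return -math.log10(abs(estimate))
    return -math.log10(abs(estimate - certified) / abs(certified))


# Hypothetical values, for illustration only.
print(log_relative_error(1.0001, 1.0))  # roughly four digits agree
print(log_relative_error(2.0, 1.0))     # no digits agree
```

An LRE near zero (or negative) means the reported coefficient shares essentially no digits with the certified answer, which is the sense in which a result can be called wildly inaccurate.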

JMP 4.0: SAS Institute

Advantages

  • Powerful and easy to use data manipulation capabilities
  • Extensive, natural exploratory data analysis capabilities for a wide variety of problems
  • Seamless connections among data, graphics, scripting, and journaling
  • Inexpensive and almost-full-functioning student version available

Disadvantages

  • Graphics output is not sufficiently configurable for publication
  • Operations limited to available memory
  • No functions for survey estimation, discrete choice models, and other more specialized methods for political scientists
  • Numerically inaccurate, especially for non-linear regression and related complex models (e.g. simulation and maximum-likelihood)

References:

McCullough, Brian D., 1998. "Assessing the Reliability of Statistical Software: Part I," The American Statistician 52(4): 358-366.

Sall, John, 2000. JMP Start Statistics, Version 4.0 for Windows. SAS Institute Inc.


last modified on
01/11/2002