Wednesday, December 4, 2019

Dynamic Graphics for Data Analysis Essay Example For Students

1. INTRODUCTION

Dynamic graphical methods have two important properties: direct manipulation and instantaneous change. The data analyst takes an action through manual manipulation of an input device and something happens, virtually instantaneously, on a computer graphics screen. Figure 1 shows an example in which a dynamic method is used to turn point labels on and off. The data analyst moves a rectangle over the scatterplot by moving a mouse; the figure shows the rectangle in a sequence of positions. When the rectangle covers a point, its label appears, and when the rectangle no longer covers the point, its label disappears.

1.1 The Importance of Dynamic Methods

In the future, dynamic graphical methods will be ubiquitous. There are two reasons. One is that the addition of dynamic capabilities to the methodology of traditional static data display provides an enormous increase in the power of graphical methods to convey information about data: wholly new methods become possible, and many capabilities that are cumbersome and time-consuming in a static environment become simple and fast (Tukey and Tukey, 1985). Huber (1983) aptly describes the importance of the dynamic environment: "We see more when we interact with the picture, especially if it reacts instantaneously, than when we merely watch." This does not mean that current static methods will be discarded, but rather that there will be a much richer collection of methods. The second reason is that the price and availability of powerful statistical computing environments are rapidly evolving in a direction that will permit the use of dynamic graphics (McDonald and Pedersen, 1985). Thus, it seems likely that the methods described in this paper will be standard methodology in the future.
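The rectangle interaction of Figure 1 comes down to a containment test that is rerun every time the mouse moves. The following Python sketch is only an illustration under stated assumptions: the names `Rect`, `visible_labels`, and `drag` are hypothetical, the rectangle is taken to be axis-aligned in data coordinates, and a simple rightward drag stands in for real mouse events.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in data coordinates (hypothetical helper)."""
    xmin: float
    xmax: float
    ymin: float
    ymax: float

    def covers(self, x: float, y: float) -> bool:
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax


def visible_labels(points, labels, rect):
    """Labels of the points the rectangle currently covers.

    Rerun on every mouse move: a label is shown while its point is covered
    and disappears once the rectangle has moved off the point.
    """
    return [lab for (x, y), lab in zip(points, labels) if rect.covers(x, y)]


def drag(points, labels, rect, dx, steps):
    """Simulated drag (assumption): move the rectangle right by dx, `steps` times,
    yielding the visible labels at each position."""
    for k in range(steps):
        moved = Rect(rect.xmin + k * dx, rect.xmax + k * dx, rect.ymin, rect.ymax)
        yield visible_labels(points, labels, moved)
```

Called with points `[(1, 1), (2, 1), (3, 1)]`, labels `["a", "b", "c"]`, `Rect(0.5, 1.5, 0, 2)`, `dx=1`, and `steps=3`, `drag` produces `[["a"], ["b"], ["c"]]`: each label appears as the rectangle reaches its point and disappears once the rectangle moves on.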
Furthermore, because the number of people who have so far been involved in research on dynamic methods is relatively small, the development of new dynamic methods should accelerate as the appropriate computing environments become more widely available.

1.2 Two Early Systems

A recognition of the potential of direct-manipulation, real-time graphics for data analysis goes back as far as the early 1960s, when computer graphics systems [...] by a knob. Points could be deleted by positioning a cursor on them. The system demonstrated that dynamic graphical methods had the potential to be important tools for data analysis.

Another early system was PRIM-9 (Fisherkeller, Friedman and Tukey, 1975), a set of dynamic tools for projecting, rotating, isolating, and masking multidimensional data in up to nine dimensions. Rotation was the central operation; this dynamic method allows the data analyst to study three-dimensional data by showing the points rotating on the computer screen. Isolation and masking were features that allowed point deletion in a lasting or in a transient way. PRIM-9 was an influential system; many subsequent systems were modeled after it, and during the 1970s dynamic graphics and PRIM operations were nearly synonymous. (In fact, in the rush to implement PRIM systems, Fowlkes' ideas were nearly forgotten.) As the reader will see from the descriptions to follow and their origins, it was not until the early 1980s that significantly different methods would begin appearing; this was stimulated in large measure by new computing techniques coming from computer scientists.

1.3 Contents of the Paper

A variety of dynamic graphical methods are described and illustrated in Section 2 of this paper. Sections 2.1 to 2.6 cover identification, deletion, linking, brushing, scaling, and rotation. Section 2.7 describes in a general way what many of the methods are doing, namely providing dynamic parameter control, and thereby opens the door to a large collection of potential methods.
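PRIM-9's central operation, rotation, can be imitated by rotating the three-dimensional point cloud by a small angle and redrawing its two-dimensional projection, over and over. The sketch below assumes rotation about the vertical axis and a simple orthographic projection (dropping the depth coordinate); the function names are hypothetical, and the text does not say how PRIM-9 itself computed its views.

```python
import math


def rotate_y(points, theta):
    """Rotate 3-D points (x, y, z) about the vertical y axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]


def project(points):
    """Orthographic projection onto the screen (assumed): drop the depth coordinate z."""
    return [(x, y) for x, y, _ in points]


def frames(points, step, n):
    """Screen coordinates for n successive views, each rotated `step` radians further."""
    for k in range(n):
        yield project(rotate_y(points, k * step))
```

Redrawing these frames quickly with a small `step` is what gives the impression of points rotating on the screen; a larger step gives a faster but jerkier rotation.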
Computing issues are discussed in Section 3 of the paper; hardware and software considerations tend to be much more tightly bound to the success of dynamic methodologies than is the case for static graphical methods. Section 4 of the paper is a brief summary and discussion.

2. METHODS

2.1 Identification of Labeled Data

Identification has two directions. Suppose we have a collection of elements on a graph (e.g., points) and each element has a name or label. In one direction of identification we select a particular element and then find out what its label is; we will call this labeling. In the other direction, we select a label and then find the location on the graph of the element corresponding to this label; we will call this locating. Identification tasks, although seemingly mundane, are so all-pervasive that simple ways of performing them are of enormous help to a data analyst.

Labeling Points. Suppose $x_i$ and $y_i$, for $i = 1, \ldots, n$, are measurements of two variables, and that each pair $(x_i, y_i)$ has a label. Figure 2 shows an example. The data are measurements of the brain weights and body weights of a collection of animal species (Crile and Quiring, 1940). Biologists study the relationship between these two variables because the ratio $(\text{brain weight})/(\text{body weight})^{2/3}$ is a rough measure of intelligence (Gould, 1979; Jerison, 1955). In Figure 2, the data are graphed on a log scale and the axes are scaled so that $y = 2x/3$ is a 45° line; thus, 45° lines are contours of constant intelligence under this measure. Each point on the graph has a label: the name of the species. In analyzing bivariate data with labels, we almost always want to know the labels for all or some of the points of the scatterplot.

[...] a graph of the first subset appears on the screen for, say, 1 sec, then it is replaced for 1 sec by a graph of the second subset, and so forth until we get to the last subset. Then the process repeats. Of course, the subsets are all shown on common axes so that the scale of the pictures remains identical as various subsets are shown. Another technique is to show all of the data at all times and have the cycling consist of a highlighting of one subset at each stage. A third tech[...]

2.2 Deletion

Figure [...] illustrates one simple use of deletion. A scatterplot is made and there is an outlier that causes the remaining points on the graph to be crammed into such a small region that their resolution is ruined; the analyst removes the point by touching it with a cursor, and after the deletion the graph is automatically rescaled and redrawn on the screen. For example, Fowlkes (1971) used this dynamic deletion of outliers for probability plots; after points were selected for deletion, the expected order statistics of the reduced sample were recomputed automatically and the graph redrawn.

Deletion is actually a very general concept that can enter dynamic graphical methods in many ways. Its basic purpose is to eliminate certain graphical elements so that we can better study the remaining elements. For example, outlier deletion lets us focus more incisively on the remaining data, and in alternagraphics, subsets can be temporarily deleted to allow better study of the remaining subsets. Other applications of deletion will be given in later sections.

2.3 Linking

Suppose we have n measurements on p variables and that scatterplots of certain pairs of the variables are made. A linking method enables us to visually link corresponding points on different scatterplots. For example, suppose there are four variables and that we graph the second against the first and the fourth against the third. To link points on the two scatterplots means to see by some visual method that the point for observation i on the first plot corresponds to the point for observation i on the other plot. To illustrate this, consider the Anderson (1935) iris data made famous by Fisher (1936).
There are 150 measurements of four variables: sepal length, sepal width, petal length, and petal width. Two scatterplots are shown in Figure 5. The data have been jittered, that is, small amounts of noise have been added, to avoid the overlap of plotted symbols on the graph. Each scatterplot has two clusters, and we immediately find ourselves wanting to know if there is some correspondence between the clusters of the separate plots.

Linking is a concept that has long existed in the development of static display (Chambers, Cleveland, Kleiner and Tukey, 1983; Diaconis and Friedman, 1980; Tufte, 1983). One method for linking is the M and N plot of Diaconis and Friedman (1980); lines are drawn between corresponding points on the two scatterplots. Another method is to use a unique plotting symbol for each point on a particular plot (Chambers, Cleveland, Kleiner and Tukey, 1983) and to use the same symbol for corresponding observations on different plots. A third method is the scatterplot matrix, all pairwise scatterplots arranged in a rectangular array, which arose, in part, because it provides a certain amount of linking. An example is Figure 6, which shows the iris data. To maximize the resolution of the plotted points, scale information is put inside the panels of the diagonal of the matrix; the labels are the variable names and the numbers show the ranges of the variables. Consider the cluster to the northwest in the sepal length and width plot of the [...] panel. Does this cluster correspond to one of the two clusters in the petal length and width scatterplot of panel [...]? By scanning horizontally from the [...] panel to the [...] panel and then vertically to the [...] panel we can see that the top half or so of the northwest sepal cluster corresponds to the top half or so of the northeast petal cluster.
By scanning vertically from the [...] panel to the [...] panel and then horizontally to the [...] panel we can see that the left half or so of the northwest sepal cluster corresponds to the bottom half or so of the northeast petal cluster. The union of these two scans shows that most of the northwest sepal cluster corresponds to most of the northeast petal cluster; it is a good guess that the remaining pieces of the clusters also correspond.
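The statement in Section 2.1 that 45° lines are contours of constant intelligence follows from taking logarithms of the brain-to-body measure. Here x and y are assumed to be the logarithms (in any base, used consistently) of body weight and brain weight, matching the line $y = 2x/3$ in the text, and $r$ is a symbol introduced for the measure itself.

```latex
\[
r = \frac{\text{brain weight}}{(\text{body weight})^{2/3}}
\quad\Longrightarrow\quad
\log r = y - \tfrac{2}{3}\,x
\quad\Longrightarrow\quad
y = \tfrac{2}{3}\,x + \log r .
\]
```

For each fixed r the contour is a straight line of slope 2/3, parallel to $y = 2x/3$. Once the axes are scaled so that this line is drawn at 45°, every contour is a 45° line, and species lying above a given contour have a larger value of r.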
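The deletion described in Section 2.2, in which the analyst removes an outlier and the plot is rescaled, can be sketched for Fowlkes' probability-plot setting as follows. This is a minimal Python sketch under assumptions: the function names are hypothetical, the plot is taken to be a normal probability plot, and Blom's approximation $\Phi^{-1}((i - 3/8)/(n + 1/4))$ stands in for the expected order statistics; the text does not say which computation Fowlkes (1971) used.

```python
from statistics import NormalDist


def plotting_positions(n):
    """Approximate expected standard-normal order statistics for a sample of size n.

    Assumption: Blom's approximation Phi^{-1}((i - 3/8) / (n + 1/4)),
    not necessarily the formula Fowlkes used.
    """
    std = NormalDist()
    return [std.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def probability_plot(sample):
    """Pairs (expected order statistic, observed order statistic)."""
    ordered = sorted(sample)
    return list(zip(plotting_positions(len(ordered)), ordered))


def axis_limits(values, pad=0.05):
    """Axis range with a small margin, recomputed whenever the data change."""
    lo, hi = min(values), max(values)
    margin = (hi - lo) * pad or 1.0  # fall back to 1.0 to avoid a zero-width axis
    return lo - margin, hi + margin


def delete_and_redraw(sample, index):
    """Drop sample[index] (e.g. an outlier picked with the cursor), then recompute
    both the plotting positions and the vertical axis limits for the reduced sample."""
    reduced = sample[:index] + sample[index + 1:]
    plot = probability_plot(reduced)
    return plot, axis_limits([obs for _, obs in plot])
```

For the made-up sample `[1.0, 2.0, 3.0, 100.0]`, deleting index 3 leaves three points whose vertical axis runs from about 0.9 to 3.1 instead of stretching past 100, and the plotting positions are recomputed for n = 3.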
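The cycling display described at the end of Section 2.1 (alternagraphics) needs two things: one set of axis limits shared by every subset, and a repeating sequence of frames. The Python sketch below builds both; the names are hypothetical, and the actual drawing and the roughly 1-second pause between frames are left to whatever graphics system is in use.

```python
from itertools import cycle, islice


def common_limits(subsets):
    """One set of axis limits covering every subset, so the scale never changes
    as the display alternates between subsets."""
    xs = [x for pts in subsets.values() for x, _ in pts]
    ys = [y for pts in subsets.values() for _, y in pts]
    return (min(xs), max(xs)), (min(ys), max(ys))


def alternagraphics(subsets, n_frames):
    """First n_frames frames of the cycle as (subset name, points, limits).

    Each frame would be shown for a fixed dwell time before the next one;
    after the last subset the cycle starts over.
    """
    limits = common_limits(subsets)
    names = cycle(subsets)  # iterating a dict yields its keys in insertion order
    return [(name, subsets[name], limits) for name in islice(names, n_frames)]
```

With `subsets = {"a": [(0, 0), (1, 1)], "b": [(5, 2)]}`, `alternagraphics(subsets, 5)` yields frames for `a, b, a, b, a`, all drawn with limits `((0, 5), (0, 2))`. The highlighting variant mentioned in the text would instead draw all points in every frame and emphasize only `subsets[name]`.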
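Both jittering and linking, as described in Section 2.3, are simple once each observation keeps the same row index i in every plot. The Python sketch below is illustrative only: function names are hypothetical, the jitter is assumed to be uniform noise of a chosen half-width, and the selection region is assumed rectangular. Jitter changes only the plotted coordinates; linking relies on indices, so it is unaffected by the noise.

```python
import random


def jitter(values, amount, rng):
    """Add uniform noise in [-amount, amount] to each value to reduce overplotting."""
    return [v + rng.uniform(-amount, amount) for v in values]


def select_region(xs, ys, xlim, ylim):
    """Indices of observations whose (x, y) lies inside a rectangular region."""
    return {
        i
        for i, (x, y) in enumerate(zip(xs, ys))
        if xlim[0] <= x <= xlim[1] and ylim[0] <= y <= ylim[1]
    }


def linked_points(selected, us, vs):
    """Points on a second scatterplot (vs against us) for the same observations.

    Linking works because index i refers to the same observation in both plots.
    """
    return [(us[i], vs[i]) for i in sorted(selected)]
```

For made-up data (not the iris measurements) with `xs = ys = [1, 2, 8, 9]`, selecting the region `(0, 3) × (0, 3)` gives indices `{0, 1}`; with `us = [10, 20, 30, 40]` and `vs = [5, 6, 7, 8]`, `linked_points` returns `[(10, 5), (20, 6)]`, the points to highlight on the second plot. Drawing a line from each selected point to its partner is the idea behind the M and N plot.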
