One of the seminal papers establishing the importance of data visualization (as it is now called) was F. J. Anscombe's 1973 paper "Graphs in Statistical Analysis", available at http://www.sjsu.edu/faculty/gerstman/StatPrimer/anscombe1973.pdf
It has probably the most elegant introduction to an advanced statistical analysis paper that I have ever seen:
1. Usefulness of graphs
Most textbooks on statistical methods, and most statistical computer programs, pay too little attention to graphs. Few of us escape being indoctrinated with these notions:
(1) numerical calculations are exact, but graphs are rough;
(2) for any particular kind of statistical data there is just one set of calculations constituting a correct statistical analysis;
(3) performing intricate calculations is virtuous, whereas actually looking at the data is cheating.
A computer should make both calculations and graphs. Both sorts of output should be studied; each will contribute to understanding.
Of course, the dataset in the paper makes this point very interesting for people who don't much like graphical analysis.
From http://en.wikipedia.org/wiki/Anscombe%27s_quartet
The x values are the same for the first three datasets.
I: x | I: y | II: x | II: y | III: x | III: y | IV: x | IV: y |
---|---|---|---|---|---|---|---|
10.0 | 8.04 | 10.0 | 9.14 | 10.0 | 7.46 | 8.0 | 6.58 |
8.0 | 6.95 | 8.0 | 8.14 | 8.0 | 6.77 | 8.0 | 5.76 |
13.0 | 7.58 | 13.0 | 8.74 | 13.0 | 12.74 | 8.0 | 7.71 |
9.0 | 8.81 | 9.0 | 8.77 | 9.0 | 7.11 | 8.0 | 8.84 |
11.0 | 8.33 | 11.0 | 9.26 | 11.0 | 7.81 | 8.0 | 8.47 |
14.0 | 9.96 | 14.0 | 8.10 | 14.0 | 8.84 | 8.0 | 7.04 |
6.0 | 7.24 | 6.0 | 6.13 | 6.0 | 6.08 | 8.0 | 5.25 |
4.0 | 4.26 | 4.0 | 3.10 | 4.0 | 5.39 | 19.0 | 12.50 |
12.0 | 10.84 | 12.0 | 9.13 | 12.0 | 8.15 | 8.0 | 5.56 |
7.0 | 4.82 | 7.0 | 7.26 | 7.0 | 6.42 | 8.0 | 7.91 |
5.0 | 5.68 | 5.0 | 4.74 | 5.0 | 5.73 | 8.0 | 6.89 |
For all four datasets:
Property | Value |
---|---|
Mean of x in each case | 9 exact |
Variance of x in each case | 11 exact |
Mean of y in each case | 7.50 (to 2 decimal places) |
Variance of y in each case | 4.122 or 4.127 (to 3 d.p.) |
Correlation between x and y in each case | 0.816 (to 3 d.p.) |
Linear regression line in each case | y = 3.00 + 0.500x (to 2 d.p. and 3 d.p. resp.) |
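The summary values in this table can be checked directly from the data. The following is a minimal sketch in Python using only the standard library; the names `quartet`, `summarize` and `summaries` are my own and not from the paper or the Wikipedia page, and it assumes sample (n - 1) variances and an ordinary least-squares fit of y on x.

```python
from statistics import mean, variance

# Anscombe's quartet, transcribed from the data table above.
x123 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
x4 = [8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 19.0, 8.0, 8.0, 8.0]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return sample summary statistics and the least-squares line for (x, y)."""
    mx, my = mean(x), mean(y)
    vx, vy = variance(x), variance(y)  # sample variance (n - 1 denominator)
    # Sample covariance, using the same n - 1 denominator.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    slope = cov / vx  # least-squares slope of y on x
    return {
        "mean_x": mx,
        "var_x": vx,
        "mean_y": my,
        "var_y": vy,
        "corr": cov / (vx * vy) ** 0.5,  # Pearson correlation
        "slope": slope,
        "intercept": my - slope * mx,
    }


summaries = {name: summarize(x, y) for name, (x, y) in quartet.items()}
for name, s in summaries.items():
    print(name, {k: round(v, 3) for k, v in s.items()})
```

The four printed rows should agree with each other to within the rounding stated in the table, which is exactly why the numbers alone cannot tell the datasets apart and a plot is needed.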
SAS Visual Data Discovery combines top-selling SAS products (Base SAS, SAS/STAT® and SAS/GRAPH®), along with two interfaces (SAS® Enterprise Guide® for guided tasks and batch analysis and JMP® software for discovery and exploratory analysis).