Writing Using Statistics

Frequency - tabulation of how many times each observation occurs

Average (Mean) - sum of observations divided by the number of observations.

Mode - most frequent observation

Median - point on a scale above which are exactly half the total observations and below which are the other half; the 50th percentile. For example, with scores of 4, 5, 6, 8, and 25, the average is (4+5+6+8+25)/5 = 48/5 = 9.6, but the median is 6 (the value in the middle). The gap between the two tells us that some values (here, 25) skew the distribution. Median is often used to report salaries.
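As a rough illustration, the measures defined so far can be computed with Python's standard library. The `scores` list is the example from the median definition; the `ratings` list is made up here (it is not from the original text) so that one value repeats and the mode is meaningful.

```python
from collections import Counter
from statistics import mean, median, mode

# Scores from the median example in the text.
scores = [4, 5, 6, 8, 25]

average = mean(scores)   # (4 + 5 + 6 + 8 + 25) / 5
middle = median(scores)  # middle value once the scores are sorted

# Hypothetical sample with a repeated value (an assumption for illustration).
ratings = [3, 4, 4, 5, 4, 2, 3]
frequency = Counter(ratings)   # tabulates how often each value occurs
most_common = mode(ratings)    # the most frequent observation

print("mean:", average)
print("median:", middle)
print("mode:", most_common)
print("frequency table:", dict(frequency))
```

The mean lands well above the median because the single score of 25 pulls it upward, which is the skew described above.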

Standard Deviation - A measure of the typical distance between each observation and the mean, computed as the square root of the average squared difference from the mean. For data that follow a normal (bell-shaped) distribution, the probability that an observation is within 1 standard deviation of the mean is about 68%, and the probability that it is within 2 standard deviations is about 95%. Standard deviation provides us with a measure to determine how close to the mean most of the observations are. A small standard deviation indicates that most of the observations are near the mean. A large standard deviation indicates that the observations are more widely dispersed. In almost all cases, you want to look at both the mean and the standard deviation. By looking at the means and standard deviations of two samples (together with the sample sizes), one can make a statistical statement as to whether the two samples differ. As a rough guide, if the distributions overlap substantially, the samples are said to show no statistically significant difference. Just knowing the averages does not give us that information.
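One common way to compute a standard deviation is to take the square root of the average squared difference from the mean. The sketch below does this by hand for the scores 4, 5, 6, 8, and 25 and compares the result with Python's `statistics` module. It shows both the population form (divide by n) and the sample form (divide by n - 1); which one applies depends on whether the data are the whole population or a sample from it, a distinction the text above does not make.

```python
from statistics import mean, pstdev, stdev

scores = [4, 5, 6, 8, 25]

m = mean(scores)
squared_deviations = [(x - m) ** 2 for x in scores]

# Population standard deviation: divide by the number of observations.
population_sd = (sum(squared_deviations) / len(scores)) ** 0.5

# Sample standard deviation: divide by one less than the number of observations.
sample_sd = (sum(squared_deviations) / (len(scores) - 1)) ** 0.5

print("by hand:", population_sd, sample_sd)
print("library:", pstdev(scores), stdev(scores))
```

Both values are large relative to the mean of 9.6, reflecting how far the score of 25 sits from the rest.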

Regression Analysis - provides us with a line through a series of points in which the sum of the squared vertical distances between the points and the line is minimized. Used to relate two (or more) continuous variables (e.g., height and weight). The general form of the line is y = mx + b, where x and y are the variables (x is called the independent variable and y is called the dependent variable), m is the slope of the line, and b is where the line crosses the y axis. r (the correlation coefficient) is used to measure how well the line fits. r ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation); 0 indicates no linear correlation. r-squared gives the fraction of the variation in y explained by the line. The higher the value, the better the fit.
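A minimal sketch of fitting a line y = mx + b by ordinary least squares, along with r and r-squared, might look like the following. The function name `fit_line` and the height and weight figures are made up for illustration and do not come from the original text.

```python
def fit_line(xs, ys):
    """Return slope m, intercept b, and correlation r for a least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    m = sxy / sxx                  # slope
    b = mean_y - m * mean_x        # y-intercept
    r = sxy / (sxx * syy) ** 0.5   # correlation coefficient
    return m, b, r


# Hypothetical data: heights in inches, weights in pounds.
heights = [60, 62, 64, 66, 68]
weights = [115, 120, 130, 138, 147]

m, b, r = fit_line(heights, weights)
print("slope:", m)
print("intercept:", b)
print("r:", r)
print("r-squared:", r ** 2)
```

For a single independent variable, squaring r gives the fraction of the variation in y that the line explains.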

A WORD OF CAUTION: correlation does not mean that the two variables are related as to cause and effect. You can get a correlation between the stock market and your GPA and it would have no meaning.

In summary, when looking at single-valued data, you want to know the average and the standard deviation. When looking at correlations between two (or more) variables, you want to know what the r or r-squared is. Without these additional measures it is hard to judge the data.