Univariate Statistics | margo's Blog

1. Types of Scales, or types of codes for variables:

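        1. Nominal: categories are mutually exclusive but bear no other relationship to each other. Ex: (1) Jews; (2) Protestants; (3) Catholics; (4) Muslims. The codes are labels only: twice the code for Protestants (2 × 2 = 4) equals the code for Muslims, but that arithmetic means nothing.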
        2. Ordinal: categories are ranked in ascending sequence, but the distance between two adjacent categories is not necessarily equal. Ex: an ordinal scale of wealth: wealthiest, second wealthiest, ..., nth wealthiest. The wealthiest does not necessarily have three times the wealth of the third wealthiest.
        3. Interval: values are true numbers, potentially continuous and in ascending sequence, and the distance between any two adjacent units of value is equal. Ex: family size (1, 2, 3, 4, etc.); income (1,000, 2,000, 3,000).
        4. Ratio: an interval variable with a true zero point, so that ratios between values are meaningful (20 is twice 10).
WARNING: Most interval scales are imperfect and tend to become ordinal or nominal at the extremes. Ex: an extra 1,000 of income means more to someone earning 5,000 than to someone earning 500,000. Always aim for the highest possible scale, but recognize its limitations. Do not compute a mean (X̄) for a nominal scale, and be aware of when you are ‘stretching’ a coding scheme.
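A quick way to see the warning in action is to compute each summary on coded data. The sketch below is a minimal illustration in Python, assuming made-up data: the religion codes follow the nominal example above, while the wealth groups and family sizes are hypothetical. `summarize` is a hypothetical helper, not a library function; it simply withholds the mean for nominal and ordinal codes and the median for nominal codes.

```python
from collections import Counter
from statistics import mean, median

# Religion codes from the nominal example: 1 = Jews, 2 = Protestants,
# 3 = Catholics, 4 = Muslims. The responses themselves are made up.
religion = [2, 2, 3, 1, 4, 2, 3, 3, 2]
# Hypothetical ordinal wealth groups: 1 = lowest, 2 = middle, 3 = highest.
wealth_group = [1, 2, 2, 3, 1, 2, 3, 2, 2]
# Hypothetical family sizes, treated as an interval scale as in the text.
family_size = [1, 2, 3, 4, 4, 5]

def summarize(values, scale):
    """Report only the central-tendency measures that suit the given scale."""
    # most_common(1) returns a single mode; ties resolve to the value seen first.
    result = {"mode": Counter(values).most_common(1)[0][0]}
    if scale in ("ordinal", "interval", "ratio"):
        result["median"] = median(values)
    if scale in ("interval", "ratio"):
        result["mean"] = mean(values)
    return result

print(summarize(religion, "nominal"))
print(summarize(wealth_group, "ordinal"))
print(summarize(family_size, "interval"))
print(mean(religion))  # computable, but meaningless for nominal codes
```

The last line illustrates the warning: Python happily reports a mean of about 2.44 for the religion codes, a figure that falls "between Protestants and Catholics" and describes no one.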

2. Descriptive Statistics: used to describe the distribution (shape) of values for a variable: how many, how often, gaps, central tendency, dispersion and so forth.

1. The normal curve (bell-shaped curve) is assumed to exist for many statistical problems. You will see why soon. Its key feature is that about 68% of the cases fall within plus or minus one standard deviation (s) of the mean (X̄), and about 95% of the cases fall within plus or minus 2s of X̄.
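2. Mean (X̄, written as X with a bar on top) – the sum of the values for a variable divided by the number of values (N). A measure of central tendency that is most representative for symmetric distributions such as the normal curve; extreme values pull it toward the longer tail.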
3. Median – the point above which half of the values fall and below which the other half fall. A good measure of central tendency for skewed data (such as income) and for ordinal scales. The same goes for quartiles and percentiles, which divide the values into quarters and hundredths in the same way.
4. Mode – the value occurring most frequently. A good measure of central tendency for small ordinal and nominal scales. Ex: the mean number of children per family might be 2.3, but the more important question in human research is whether the mode is 2 or 3.
5. Standard deviation (s) – the square root of the sum of the squared deviations from the mean divided by the number of cases (for a sample, divide by N − 1 instead of N). It has meaning primarily in terms of the normal curve (see above). Note that if many extreme scores are present, s is artificially large, and percentiles may be a more appropriate measure of dispersion.
6. Standard error (of the mean) – calculated as the standard deviation divided by the square root of the number of cases. It estimates how much the mean would vary from one sample to the next if second and subsequent samples were drawn. When reporting differences in the means of subsamples, avoid making much ado about differences that are smaller than the standard error.
7. Variance – the square of the standard deviation. Used in more advanced procedures.
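8. Kurtosis – a measure of the peakedness of a distribution of values, i.e., whether it is more or less peaked than the normal curve (a more peaked distribution typically also has heavier tails). Most statistics packages report excess kurtosis, which subtracts 3 so that the normal curve scores 0; the signs below follow that convention.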
        1. leptokurtosis: more peaked, positive numbers
        2. platykurtosis: less peaked, negative numbers
9. Skewness – more extreme cases on one tail of the curve than the other. Skewness can be expected on the high side for income or mobility, on the low side for number of children.
        ~ Skewed right (long tail toward high values): positive numbers
        ~ Skewed left (long tail toward low values): negative numbers
10. Range – the difference between the highest and lowest score
11. Minimum – lowest score
12. Maximum – highest score
13. Coefficient of variation (variability) – standard deviation divided by the mean. Gives the relative dispersion for samples or subsamples with different means.
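The summaries in the list above can be computed with Python's standard `statistics` module. This is a minimal sketch on a hypothetical sample of number of children per family; `describe` and `share_within` are illustrative names, not library functions. It assumes the sample (N − 1) standard deviation, and the skewness and kurtosis lines use simple moment formulas reported as excess kurtosis (normal curve = 0). Statistics packages often add small-sample corrections, so their figures may differ slightly.

```python
import math
from collections import Counter
from statistics import mean, median, pstdev, stdev

def describe(values):
    """Summaries for one interval or ratio variable.

    Assumes at least two values that are not all equal, and a nonzero mean
    (the coefficient of variation divides by it).
    """
    n = len(values)
    m = mean(values)
    s = stdev(values)  # sample standard deviation: divides by N - 1
    counts = Counter(values)
    top = max(counts.values())
    modes = sorted(v for v, c in counts.items() if c == top)  # keeps ties

    # Moment-based skewness and excess kurtosis (normal curve = 0),
    # standardized with the population SD (divides by N).
    sd_pop = pstdev(values)
    z = [(x - m) / sd_pop for x in values]
    skewness = sum(v ** 3 for v in z) / n
    kurtosis = sum(v ** 4 for v in z) / n - 3

    return {
        "n": n,
        "mean": m,
        "median": median(values),
        "modes": modes,
        "sd": s,
        "se": s / math.sqrt(n),  # standard error of the mean
        "variance": s ** 2,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "cv": s / m,  # coefficient of variation
    }

def share_within(values, k):
    """Fraction of cases within +/- k sample SDs of the mean."""
    m, s = mean(values), stdev(values)
    return sum(1 for x in values if abs(x - m) <= k * s) / len(values)

children = [0, 1, 1, 2, 2, 2, 3, 3, 4, 7]  # hypothetical sample
for name, value in describe(children).items():
    print(f"{name:>9}: {value}")
print("within 1 s:", share_within(children, 1))  # normal curve: about 0.68
print("within 2 s:", share_within(children, 2))  # normal curve: about 0.95
```

For this sample the mean (2.5) sits above the median (2) and the skewness is positive, matching the long right tail created by the one family with 7 children. Here 80% of cases fall within ±1s rather than the roughly 68% a normal curve would give, a reminder that the normal-curve percentages are a benchmark, not a guarantee.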

3. Graphical Displays:

1. Histograms, density curves – display the shape of interval variables and allow you to compare the shape with a normal curve.
2. Stem-and-leaf diagrams – array the values of an interval variable vertically, splitting each value into a stem (the first one or more digits of the value) and a leaf (the last digit of the value).
3. Boxplots – display an interval variable as a ‘box’ with a line at the median and the edges at the 25th and 75th percentiles (hinges). The display can be extended with whiskers, outliers and extreme values. More about this later.
4. Bar graphs – similar to a histogram but compare the sizes of separate items or categories. The X axis shows categories rather than a measurement scale.
5. Pie charts – divide the universe of cases into slices proportional to each category's share.
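Stem-and-leaf diagrams and boxplots are easy to produce as plain text. The sketch below assumes non-negative integer scores (hypothetical data) and uses stems of tens; `stem_and_leaf` and `box_summary` are illustrative names. For the boxplot it assumes one common convention: quartiles from `statistics.quantiles(..., method="inclusive")` (Python 3.8+), whiskers extending to the most extreme values within 1.5 × IQR of the hinges, outliers beyond that, and extreme values beyond 3 × IQR. Other software computes hinges and fences slightly differently.

```python
from collections import defaultdict
from statistics import quantiles

def stem_and_leaf(values):
    """Text stem-and-leaf display: stem = tens digit(s), leaf = last digit.
    Assumes non-negative integers."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    # Include empty stems so gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_str = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem:>3} | {leaf_str}")
    return lines

def box_summary(values):
    """Hinges, whiskers, outliers and extremes under the assumed convention."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_lo, outer_hi = q1 - 3 * iqr, q3 + 3 * iqr
    inside = [v for v in values if inner_lo <= v <= inner_hi]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        # Whiskers stop at the most extreme values inside the inner fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Outliers: beyond the inner fences but within the outer fences.
        "outliers": sorted(
            v for v in values
            if (v < inner_lo or v > inner_hi) and outer_lo <= v <= outer_hi
        ),
        # Extreme values: beyond the outer fences.
        "extremes": sorted(v for v in values if v < outer_lo or v > outer_hi),
    }

scores = [42, 47, 51, 53, 55, 58, 58, 61, 64, 67, 70, 98]  # hypothetical data
print("\n".join(stem_and_leaf(scores)))
print(box_summary(scores))
```

With these scores the hinges are 52.5 and 64.75, so 98 lies beyond the 1.5 × IQR fence and is flagged as an outlier but not as an extreme value; the empty stem for the 80s makes the gap before it visible in the stem-and-leaf display.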