Chapter 7 Descriptive Statistics

Descriptives is another frequently used SPSS procedure. Descriptive statistics are designed to give you information about the distributions of your variables. Within this broad category are measures of central tendency (Mean, Median, Mode), measures of variability around the mean (Std deviation and Variance), measures of deviation from normality (Skewness and Kurtosis), information concerning the spread of the distribution (Maximum, Minimum, and Range), and information about the stability or sampling error of certain measures, including the standard error (S.E.) of the mean (S.E. mean), S.E. of the kurtosis, and S.E. of the skewness (included by default when skewness and kurtosis are requested). Using the Descriptives command, it is possible to access all of these statistics or any subset of them, with the exception of the median and mode, which are obtained through the Frequencies command. In this introductory section of the chapter, we begin with a brief description of statistical significance (included in all forms of data analysis) and the normal distribution (because many statistical procedures assume normally distributed data). Then each of the statistics identified above is briefly described and illustrated.

7.1 Statistical Significance

All procedures in the chapters that follow involve testing the significance of the results of each analysis. Although statistical significance is not employed in the present chapter, the concept (along with normal distributions, in the section that follows) is covered here because it is needed early in the book.

Significance is typically designated with words such as “significance,” “statistical significance,” or “probability.” The last of these is the source of the letter that represents significance, the letter “p.” The p value identifies the likelihood that a particular outcome may have occurred by chance. For instance, group A may score an average of 37 on a scale of depression while group B scores 41 on the same scale.
If a t test determines that group A differs from group B at a p = .01 level of significance, it may be concluded that a difference this large would arise by chance alone only about 1 time in 100 if the two groups did not truly differ. The discrepancy in scores is therefore unlikely to be a chance result.

Regardless of the type of analysis, the p value identifies the likelihood that a particular outcome occurs by chance. A chi-square analysis identifies whether observed values differ significantly from expected values; a t test or ANOVA identifies whether the mean of one group differs significantly from the mean of another group or groups; correlations and regressions identify whether two or more variables are significantly related to each other. In all instances a significance value will be calculated identifying the likelihood that a particular outcome is or is not reliable. Within the context of research in the social sciences, nothing is ever “proved.” It is demonstrated or supported at a certain level of likelihood or significance. The smaller the p value, the stronger the evidence that the finding is not a chance result.

Social scientists have generally accepted that if the p value is less than .05, the result is considered statistically significant. Thus, when there is less than a 1 in 20 probability that a certain outcome occurred by chance, that result is considered statistically significant. Another frequently observed convention is that when a significance level falls between .05 and .10, the result is considered marginally significant. When the significance level falls far below .05 (e.g., .001, .0001), the smaller the value, the greater the confidence the researcher has that the findings are not due to chance.

When one writes up the findings of a particular study, certain statistical information and p values are always included.
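The idea of a p value as "how often chance alone produces a result this large" can be made concrete with a small sketch. The one below is not the t test that SPSS runs; it swaps in an exact permutation test, which needs only the Python standard library. The ten scores are hypothetical, invented so that the group means match the 37 and 41 of the example above.

```python
from itertools import combinations
from statistics import mean

# Hypothetical depression scores (not from the book), chosen so that the
# group means equal the example values: group A = 37, group B = 41.
group_a = [33, 35, 37, 38, 42]
group_b = [38, 40, 41, 42, 44]


def permutation_p_value(a, b):
    """Two-sided exact permutation p value for a difference in means."""
    pooled = a + b
    observed = abs(mean(b) - mean(a))
    total = 0
    as_extreme = 0
    # Relabel the pooled scores in every possible way and count how often
    # the relabeled groups differ at least as much as the observed groups.
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        new_a = [pooled[i] for i in chosen]
        new_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(new_b) - mean(new_a)) >= observed - 1e-9:
            as_extreme += 1
    return observed, as_extreme / total


observed_diff, p_value = permutation_p_value(group_a, group_b)
print(f"observed difference in means: {observed_diff}")
print(f"p = {p_value:.3f}")
print("significant at .05" if p_value < .05 else "not significant at .05")
```

The printed p value is the share of all 252 possible regroupings that produce a difference of 4 points or more. With real data the t test in SPSS will usually give a similar, though not identical, figure.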
Whether or not a significant result has occurred is the key focus of most studies that involve statistics.

7.2 The Normal Distribution

Many naturally occurring phenomena produce distributions of data that approximate a normal distribution. Some examples include the height of adult humans in the world, the weight of collie dogs, the scoring averages of players in the NBA, and the IQs of residents of the United States. In all of these distributions, there are many mid-range values (e.g., 60–70 inches, 22–28 pounds, 9–14 points, 90–110 IQ points) and few extreme values (e.g., 30 inches, 80 pounds, 60 points, 12 IQ points). There are other distributions that approximate normality but deviate in predictable ways. For instance, times of runners in a 10-kilometer race will have few values less than 30 minutes (none less than 26:17), but many values greater than 40 minutes. The majority of values will lie above the mean (average) value. This is called a negatively skewed distribution. Then there is the distribution of ages of persons living in the United States. While there are individuals who are 1 year old and others who are 100 years old, there are far more 1-year-olds, and in general the population has more values below the mean than above the mean. This is called a positively skewed distribution. It is possible for distributions to deviate from normality in other ways, some of which are described in this chapter.

A normal distribution is symmetric about the mean or average value. In a normal distribution, 68% of values will lie within plus-or-minus (±) 1 standard deviation (described below) of the mean, 95.5% of values will lie within ±2 standard deviations of the mean, and 99.7% of values will lie within ±3 standard deviations of the mean. A normal distribution is illustrated in the figure below.

A final example will complete this section. The average (or mean) height of an American adult male is 69 inches (5′ 9″) with a standard deviation of 4 inches.
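The coverage percentages quoted for the normal distribution can be checked against this height example with a short sketch using `statistics.NormalDist` from the Python standard library. The mean of 69 and standard deviation of 4 come from the text; the helper `feet_inches` is a hypothetical convenience for display only.

```python
from statistics import NormalDist

# Values from the text's example: adult male height in inches.
MEAN_HEIGHT = 69
SD_HEIGHT = 4

heights = NormalDist(mu=MEAN_HEIGHT, sigma=SD_HEIGHT)


def feet_inches(inches):
    """Format a whole number of inches as feet and inches."""
    return f"{inches // 12} ft {inches % 12} in"


for k in (1, 2, 3):
    low = MEAN_HEIGHT - k * SD_HEIGHT
    high = MEAN_HEIGHT + k * SD_HEIGHT
    # Share of a normal distribution lying within k standard deviations.
    share = heights.cdf(high) - heights.cdf(low)
    print(f"within {k} SD: {feet_inches(low)} to {feet_inches(high)}"
          f" covers {share:.2%}")
```

The three shares come out at roughly 68.27%, 95.45%, and 99.73%, which the chapter rounds to 68%, 95.5%, and 99.7%.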
Thus, 68% of American men are between 5′ 5″ and 6′ 1″ (69 ± 4); 95.5% of American men are between 5′ 1″ and 6′ 5″ (69 ± 8), and 99.7% of American men are between 4′ 9″ and 6′ 9″ (69 ± 12) in height (don’t let the NBA fool you!).

7.3 Measures of Central Tendency

The Mean is the average value of the distribution, or, the sum of all values divided by the number of values. The mean of the distribution [3 5 7 5 6 8 9] is:

\[ \frac{3 + 5 + 7 + 5 + 6 + 8 + 9}{7} = 6.14 \]

The Median is the middle value of the distribution. The median of the distribution [3 5 7 5 6 8 9] is 6, the middle value (when reordered from small to large, 3 5 5 6 7 8 9). If there is an even number of values in a distribution, then there will be two middle values. In that case the average of those two values is the median.

The Mode is the most frequently occurring value. The mode of the distribution [3 5 7 5 6 8 9] is 5, because 5 occurs most frequently (twice; all other values occur only once).

7.4 Measures of Variability Around the Mean

The Variance is the sum of squared deviations from the mean divided by N − 1. The variance for the distribution [3 5 7 5 6 8 9] (the same numbers used above to illustrate the mean) is:

\[ \frac{(3-6.14)^2 + (5-6.14)^2 + (7-6.14)^2 + (5-6.14)^2 + (6-6.14)^2 + (8-6.14)^2 + (9-6.14)^2}{6} = 4.1429 \]

Variance is used mainly for computational purposes. Standard deviation is the more commonly used measure of variability.

The Standard deviation is the positive square root of the variance. For the distribution [3 5 7 5 6 8 9], the standard deviation is the square root of 4.1429, or 2.0354.

7.5 Measures of Deviation from Normality

Kurtosis is a measure of the “peakedness” or the “flatness” of a distribution. A kurtosis value near zero (0) indicates a shape close to normal. A positive value for the kurtosis indicates a distribution more peaked than normal. A negative kurtosis indicates a shape flatter than normal.
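The worked values in Sections 7.3 and 7.4 for the distribution [3 5 7 5 6 8 9] can be reproduced outside SPSS. The sketch below uses Python's standard `statistics` module, whose `variance` and `stdev` functions divide by N − 1, matching the definition given above. The variable name `scores` is illustrative.

```python
from statistics import mean, median, mode, variance, stdev

scores = [3, 5, 7, 5, 6, 8, 9]  # the distribution used in the text

print(f"mean     = {mean(scores):.2f}")      # sum of values / number of values
print(f"median   = {median(scores)}")        # middle value after sorting
print(f"mode     = {mode(scores)}")          # most frequently occurring value
print(f"variance = {variance(scores):.4f}")  # squared deviations / (N - 1)
print(f"std dev  = {stdev(scores):.4f}")     # square root of the variance
```

Run as written, this prints 6.14, 6, 5, 4.1429, and 2.0354, agreeing with the hand calculations.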
An extreme negative kurtosis (e.g., < −5.0) indicates a distribution where more of the values are in the tails of the distribution than around the mean. A kurtosis value between ±1.0 is considered excellent for most psychometric purposes, but a value between ±2.0 is in many cases also acceptable, depending on the particular application. Remember that these values are only guidelines. In other settings different criteria may arise, such as significant deviation from normality (outside ±2 × the standard error). Similar rules apply to skewness.

Skewness measures to what extent a distribution of values deviates from symmetry around the mean. A value of zero (0) represents a symmetric or evenly balanced distribution. A positive skewness indicates a greater number of smaller values (sounds backward, but this is correct). A negative skewness indicates a greater number of larger values. As with kurtosis, a skewness value between ±1.0 is considered excellent for most psychometric purposes, but a value between ±2.0 is in many cases also acceptable, depending on your application.

7.6 Measures for Size of the Distribution

For the distribution [3 5 7 5 6 8 9], the Maximum value is 9, the Minimum value is 3, and the Range is 9 − 3 = 6. The Sum of the scores is 3 + 5 + 7 + 5 + 6 + 8 + 9 = 43.

7.7 Measures of Stability: Standard Error

SPSS computes the Standard errors for the mean, the kurtosis, and the skewness. Standard error is designed to be a measure of stability or of sampling error. The logic behind standard error is this: If you take a random sample from a population, you can compute the mean, a single number. If you take another sample of the same size from the same population you can again compute the mean, a number likely to be slightly different from the first number. If you collect many such samples, the standard error of the mean is the standard deviation of this sampling distribution of means.
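The repeated-sampling logic just described can be simulated directly. The sketch below is an illustration under stated assumptions, not SPSS's procedure: the population is hypothetical (10,000 roughly normal scores with mean 100 and standard deviation 15), and the sample size and number of samples are arbitrary choices. It also shows the usual single-sample estimate of the standard error of the mean, the standard deviation divided by the square root of N, a formula the chapter does not state but which is standard.

```python
import random
from statistics import mean, stdev

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 scores, roughly normal (mean 100, SD 15).
population = [rng.gauss(100, 15) for _ in range(10_000)]

SAMPLE_SIZE = 25
N_SAMPLES = 2_000

# Draw many samples of the same size and record each sample's mean.
sample_means = [
    mean(rng.sample(population, SAMPLE_SIZE)) for _ in range(N_SAMPLES)
]

# Empirical standard error: the SD of the sampling distribution of means.
empirical_se = stdev(sample_means)

# Standard error estimated from a single sample: SD / sqrt(N).
one_sample = rng.sample(population, SAMPLE_SIZE)
estimated_se = stdev(one_sample) / SAMPLE_SIZE ** 0.5

print(f"SD of {N_SAMPLES} sample means: {empirical_se:.2f}")
print(f"SD / sqrt(N) from one sample: {estimated_se:.2f}")
```

Both figures should land near 3 (that is, 15 divided by the square root of 25). The single-sample estimate varies more from run to run because it rests on only 25 scores.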
A similar logic is behind the computation of standard error for kurtosis or skewness. A small value (what is “small” depends on the nature of your distribution) indicates greater stability or smaller sampling error.

The file we use to illustrate the Descriptives command is our example described in the first chapter. The data file is called grades.sav and has an N = 105. This analysis computes descriptive statistics for variables gpa, total, final, and percent.

7.8 Step by Step

7.8.1 Descriptives

To access the initial SPSS screen from the Windows display, perform the following sequence of steps:

Mac users: To access the initial SPSS screen, successively click the following icons:

After clicking the SPSS program icon, Screen 1 appears on the monitor.

Step 2: Create and name a data file or edit (if necessary) an already existing file (see Chapter 3).

Screens 1 and 2 (displayed on the inside front cover) allow you to access the data file used in conducting the analysis of interest. The following sequence accesses the grades.sav file for further analyses:

Whether first entering SPSS or returning from earlier operations, the standard menu of commands across the top is required. As long as it is visible you may perform any analyses. It is not necessary for the data window to be visible.

After completion of Step 3 a screen with the desired menu bar appears. When you click a command (from the menu bar), a series of options will appear (usually) below the selected command. With each new set of options, click the desired item. The sequence to access Descriptive Statistics begins at any screen with the menu of commands visible:

A new screen now appears (below) that allows you to select variables for which you wish to compute descriptives. The procedure involves clicking the desired variable name in the box to the left and then pasting it into the Variable(s) (or “active”) box to the right by clicking the right arrow in the middle of the screen.
If the desired variable is not visible, use the scroll bar arrows to bring it into view. To deselect a variable (that is, to move it from the Variable(s) box back to the original list), click on the variable in the active box and the right arrow in the center will become a left arrow. Click on the left arrow to move the variable back. To clear all variables from the active box, click the Reset button.

Screen 7.1 The Descriptives Window

The only check box on the initial screen, Save standardized values as variables, will convert all designated variables (those in the Variable(s) box) to z scores. The original variables will remain, but new variables with a “z” attached to the front will be included in the list of variables. For instance, if you click the Save standardized values as variables option, and the variable final was in the Variable(s) box, it would be listed in two ways: final in the original scale and zfinal for the same variable converted to z scores. You may then do analyses with either the original variable or the variable converted to z scores. Recall that z scores are values that have been mathematically transformed to create a distribution with a mean of zero and a standard deviation of one. See the glossary for a more complete definition. Also note that for non-mouse users, the SPSS people have cleverly underlined the “z” in the word “standardized” as a gentle reminder that standardized scores and z scores are the same thing.

To create a table of the default descriptives (mean, standard deviation, maximum, minimum) for the variables gpa and total, perform the following sequence of steps:

If you wish to calculate more than the four default statistics, then after selecting the desired variables and before clicking OK, it is necessary to click the Options button (at the bottom of Screen 7.1). Here every descriptive statistic presented earlier in this chapter is included with a couple of exceptions: Median and mode are accessed through the Frequencies command only.
See Chapter 6 to determine how to access these values. Also, the standard errors (“S.E.”) of the kurtosis and skewness are not included. This is because when you click either kurtosis or skewness, the standard errors of those values are automatically included. To select the desired descriptive statistics, simply click each statistic you wish (so as to leave a check mark in the box to the left of it). This is followed by a click of Continue and OK. The Display order options include (a) Variable list (the default: the same order as displayed in the data editor), (b) Alphabetic (names of variables ordered alphabetically), (c) Ascending means (ordered from smallest mean value to largest mean value in the output), and (d) Descending means (from largest to smallest).

Screen 7.2 The Descriptives: Options Window

To select the variables final, percent, gpa, and total, and then select all desired descriptive statistics, perform the following sequence of steps. Press the Reset button if there are undesired variables in the active box.

Upon completion of either Step 5 or Step 5a, Screen 7.3 will appear (below). The results of the just-completed analysis are included in the top window labeled Output#[Document#] – IBM SPSS Statistics Viewer. Click the maximize button to the right of this title if you wish the output to fill the entire screen, and then make use of the arrows on the scroll bar to view the results. Even when viewing output, the standard menu of commands is still listed across the top of the window. Further analyses may be conducted without returning to the data screen. Partial output from this analysis is included in the Output section.

Screen 7.3 SPSS Output Viewer Window

7.9 Printing Results

Results of the analysis (or analyses) that have just been conducted require a window that displays the standard commands (File Edit Data Transform Analyze …) across the top.
A typical print procedure is shown on the following page, beginning with the standard output screen (Screen 1, inside back cover).

To print results, from the Output screen perform the following sequence of steps:

To exit you may begin from any screen that shows the File command at the top.

Note: After clicking Exit, there will frequently be small windows that appear asking if you wish to save or change anything. Simply click each appropriate response.

7.10 Output

7.10.1 Descriptive Statistics

What follows is output from sequence Step 5a, page 118. Notice that the statistics requested include the N, the Mean, the Standard Deviation, the Variance, the Skewness, and the Kurtosis. The Standard Errors of the Skewness and Kurtosis are included by default.

IBM SPSS Statistics: Descriptive Statistics

First observe that in this display the entire output fits neatly onto a single page or is entirely visible on the screen. This is rarely the case. When more extensive output is produced, make use of the up, down, left, and right scroll bar arrows to move to the desired place. You may also use the index in the left window to move to particular output more quickly. Notice that all four variables fall within the “excellent” range as acceptable variables for further analyses; the skewness and kurtosis values all lie between ±1.0. All terms are identified and described in the introductory portion of the chapter. The only undefined word is listwise. This means that any subject that has a missing value for any variable has been deleted from the analysis. Since in the grades.sav file there are no missing values, all 105 subjects are included.

Exercises

Answers to selected exercises may be downloaded at www.spss-step-by-step.net.

Notice that data files other than the grades.sav file are being used here. Please refer to the Data Files section starting on page 362 to acquire all necessary information about these files and the meaning of the variables.
As a reminder, all data files are downloadable from the web address shown above.

1. Using the grades.sav file select all variables except lastname, firstname, grade, and passfail. Compute descriptive statistics, including mean, standard deviation, kurtosis, and skewness. Edit so that you eliminate Std. Error (Kurtosis) and Std. Error (Skewness), making your chart easier to interpret. Edit the output to fit on one page. Draw a line through any variable for which descriptives are meaningless (either they are categorical or they are known to not be normally distributed). Place an “*” next to variables that are in the ideal range for both skewness and kurtosis. Place an X next to variables that are acceptable but not excellent. Place a ψ next to any variables that are not acceptable for further analysis.

2. Using the divorce.sav file select all variables except the indicators (for spirituality, sp8–sp57; for cognitive coping, cc1–cc11; for behavioral coping, bc1–bc12; for avoidant coping, ac1–ac7; and for physical closeness, pc1–pc10). Compute descriptive statistics, including mean, standard deviation, kurtosis, and skewness. Edit so that you eliminate Std. Error (Kurtosis) and Std. Error (Skewness), making your chart easier to interpret. Edit the output to fit on two pages. Draw a line through any variable for which descriptives are meaningless (either they are categorical or they are known to not be normally distributed). Place an “*” next to variables that are in the ideal range for both skewness and kurtosis. Place an X next to variables that are acceptable but not excellent. Place a ψ next to any variables that are not acceptable for further analysis.

3. Create a practice data file that contains the following variables and values:

Compute the mean, the standard deviation, and the variance, and print out on a single page.

George, Darren. IBM SPSS Statistics 23 Step by Step, 14th Edition. Routledge, 2016-03-22. VitalBook file.