Module 2: Summarizing Data Graphically and Numerically

# Putting It Together: Summarizing Data Graphically and Numerically

## Let’s Summarize

In *Summarizing Data Graphically and Numerically*, we focused on describing the *distribution of a quantitative variable*.

- To analyze the distribution of a quantitative variable, we describe the *overall pattern of the data* (shape, center, spread) and any *deviations from the pattern* (outliers). We use three types of graphs to analyze the distribution of a quantitative variable: dotplots, histograms, and boxplots.
- We described the *shape* of a distribution as left-skewed, right-skewed, symmetric with a central peak (bell-shaped), or uniform. Not all distributions have a simple shape that fits into one of these categories.
- The *center* of a distribution is a typical value that represents the group. We have two different measurements for determining the center of a distribution: mean and median.
  - The *mean* is the average. We calculate the mean by adding the data values and dividing by the number of individual data points. The *mean* is the *fair share* measure. The mean is also called the *balancing point* of a distribution. If we measure the distance between each data point and the mean, the distances are balanced on each side of the mean.
  - The *median* is the physical center of the data when we make an ordered list. It has the same number of values above it as below it.
  - **General Guidelines for Choosing a Measure of Center**
    - *Always plot the data.* We need to use a graph to determine the shape of the distribution. By looking at the shape, we can determine which measure of center best describes the data.
    - Use the mean as a measure of center *only* for distributions that are reasonably symmetric with a central peak. When outliers are present, the mean is not a good choice.
    - Use the median as a measure of center for all other cases.
- The *spread* of a distribution is a description of how the data varies. We studied three ways to measure spread: *range* (max – min), the *interquartile range* (Q3 – Q1), and the *standard deviation*. When we use the median, Q1 to Q3 gives a typical range of values associated with the middle 50% of the data. When we use the mean, Mean ± SD gives a typical range of values.
  - The interquartile range (IQR) measures the variability in the middle half of the data.
  - Standard deviation measures roughly the average distance of data from the mean.
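
The measures of center and spread summarized above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data set `scores` is made up and does not come from the course, the helper `quartiles` is a hypothetical name that takes Q1 and Q3 as the medians of the lower and upper halves of the ordered data (other quartile conventions can give slightly different values), and the standard deviation uses the sample formula that divides by n − 1.

```python
import statistics

# Hypothetical data (not from the course): ten quiz scores, one unusually high.
scores = [4, 6, 7, 7, 8, 8, 9, 9, 10, 22]


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumed convention: when the count is odd, the middle value is left
    out of both halves. Needs at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return statistics.median(lower_half), statistics.median(upper_half)


# Center
mean = statistics.mean(scores)      # sum of the values divided by the count
median = statistics.median(scores)  # middle of the ordered list

# The mean is the balancing point: deviations from it sum to zero.
balance = sum(x - mean for x in scores)

# Spread
data_range = max(scores) - min(scores)  # max - min
q1, q3 = quartiles(scores)
iqr = q3 - q1                           # Q3 - Q1
sd = statistics.stdev(scores)           # sample SD (divides by n - 1), an assumption

print("mean:", mean, "median:", median)
print("sum of deviations from the mean:", balance)
print("range:", data_range, "Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("typical range using the median: Q1 to Q3 =", (q1, q3))
print("typical range using the mean: mean ± SD =",
      (round(mean - sd, 2), round(mean + sd, 2)))
```

For this made-up data set the mean (9) is larger than the median (8) because the single high score of 22 pulls the mean upward. That is the kind of situation in which the guidelines above favor the median and the IQR over the mean and the standard deviation.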

*Outliers* are data points that fall outside the overall pattern of the distribution. When using the median and IQR to measure center and spread, we use the 1.5 × IQR interval to identify outliers. Specifically, points outside the interval Q1 – 1.5 × IQR to Q3 + 1.5 × IQR are labeled as outliers.
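
The outlier rule can be sketched the same way. As before, this is a minimal illustration rather than a definitive implementation: `iqr_outliers` is a hypothetical helper name, the data are made up, and the quartiles are assumed to be the medians of the lower and upper halves of the ordered data.

```python
import statistics


def iqr_outliers(values):
    """Return (lower_fence, upper_fence, outliers) using the 1.5 * IQR rule.

    Assumed quartile convention: Q1 and Q3 are the medians of the lower
    and upper halves of the ordered data. Needs at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[-half:])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Only points strictly outside the interval are labeled as outliers.
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers


# Hypothetical data (not from the course).
scores = [4, 6, 7, 7, 8, 8, 9, 9, 10, 22]
low, high, flagged = iqr_outliers(scores)
print("interval:", low, "to", high)
print("outliers:", flagged)
```

Here Q1 = 7 and Q3 = 9, so the IQR is 2 and the interval runs from 4 to 12. The score of 22 falls outside it and is flagged, while the score of 4 sits exactly on the lower boundary and is not.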

CC licensed content, Shared previously

- Concepts in Statistics. **Provided by**: Open Learning Initiative. **Located at**: http://oli.cmu.edu. **License**: *CC BY: Attribution*