→ To summarize a data set numerically, we often use descriptive statistics, also known as summary statistics.
• Three values describe the center, or central tendency, of the data set:
→ The mean is equal to the sum of all data points in the set divided by the number of data points:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
→ The median is the middle value of the data set: half of the data set’s values lie below the median, and half
lie above the median.
→ The mode is the value that occurs most frequently in the data set. A data set may have multiple modes.
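The three measures of central tendency can be computed with Python's standard `statistics` module. This is a minimal sketch on a hypothetical data set; the mean is written out by hand to mirror the formula, and `statistics.multimode` (Python 3.8+) is used because a data set may have more than one mode.

```python
import statistics

# Hypothetical data set, chosen only for illustration
data = [2, 3, 3, 5, 7, 7, 9]

mean = sum(data) / len(data)        # sum of all data points divided by how many there are
median = statistics.median(data)    # middle value once the data are sorted
modes = statistics.multimode(data)  # every value tied for the highest frequency

print(f"mean={mean:.3f}, median={median}, modes={modes}")
# mean=5.143, median=5, modes=[3, 7]
```

Because 3 and 7 each appear twice, this data set has two modes.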
• The range, variance, and standard deviation measure the spread of the data.
→ The standard deviation is equal to the square root of the variance.
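A short sketch of the three spread measures on hypothetical data. The summary does not say whether the variance is computed for a sample or a population; we assume the sample version here (dividing by n - 1), which is what `statistics.variance` computes.

```python
import math
import statistics

# Hypothetical sample data
data = [4, 8, 6, 5, 3, 7]

data_range = max(data) - min(data)    # largest value minus smallest value
variance = statistics.variance(data)  # sample variance: squared deviations summed, divided by n - 1
std_dev = math.sqrt(variance)         # standard deviation is the square root of the variance

# statistics.stdev computes the same sample standard deviation in one step
assert math.isclose(std_dev, statistics.stdev(data))
print(data_range, variance, round(std_dev, 3))  # 5 3.5 1.871
```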
→ To compare variation in different data sets, we calculate the coefficient of variation. The coefficient of
variation measures the size of the standard deviation relative to the size of the mean:
$$\text{coefficient of variation} = \frac{\text{standard deviation}}{\text{mean}}$$
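The following sketch shows why the coefficient of variation is useful for comparisons. The two hypothetical data sets have the same standard deviation (2) but very different means, so their relative spread differs by nearly two orders of magnitude. The helper name `coefficient_of_variation` is our own, and we assume the sample standard deviation.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation divided by the mean (sample standard deviation assumed)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined when the mean is 0")
    return statistics.stdev(values) / mean

# Two hypothetical data sets with the same standard deviation (2) but very different means
small_scale = [10, 12, 14]
large_scale = [1000, 1002, 1004]

print(round(coefficient_of_variation(small_scale), 4))  # 0.1667
print(round(coefficient_of_variation(large_scale), 4))  # 0.002
```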
→ We can also calculate a conditional mean. A conditional mean is the mean of a subset of the data that includes all
values satisfying a certain condition.
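A conditional mean first filters the data, then averages only what remains. In this sketch the records and the condition (region equal to "East") are hypothetical.

```python
import statistics

# Hypothetical records: (region, monthly sales)
records = [("East", 120), ("West", 95), ("East", 140), ("North", 80), ("East", 100)]

# Keep only the values that satisfy the condition (region is "East"), then average them
east_sales = [sales for region, sales in records if region == "East"]
east_mean = statistics.mean(east_sales)

# For comparison, the unconditional mean over all records
overall_mean = statistics.mean(sales for _, sales in records)
print(east_mean, overall_mean)  # 120 107
```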
→ A percentile may be another value of interest. For example, 60% of the observations are less than or equal to the
60th percentile. The median is by definition the 50th percentile of a data set.
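One way to compute percentiles in Python is `statistics.quantiles` with `n=100` and `method="inclusive"`, which interpolates linearly between neighbouring sorted values. This sketch uses the hypothetical observations 1 through 10. It checks that 60% of the observations lie at or below the 60th percentile and that the 50th percentile equals the median. With small data sets and interpolation the share is generally only close to k; here it happens to be exact.

```python
import statistics

# Hypothetical observations: 1, 2, ..., 10
data = list(range(1, 11))

# With n=100, quantiles returns 99 cut points; element k - 1 is the k-th percentile.
# method="inclusive" interpolates linearly between neighbouring sorted values.
percentiles = statistics.quantiles(data, n=100, method="inclusive")
p60 = percentiles[59]
p50 = percentiles[49]

share_at_or_below = sum(x <= p60 for x in data) / len(data)
print(p60, share_at_or_below)          # 6.4 0.6
print(p50 == statistics.median(data))  # True
```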
→ We can quantify the strength of a linear relationship between two variables by calculating the correlation
coefficient.
• The value of the correlation coefficient ranges between -1 and +1.
• A correlation coefficient near zero indicates a weak or nonexistent linear relationship. A correlation coefficient
near zero does not mean there is no relationship between the two variables; it indicates only that any
relationship that does exist is not linear.
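This sketch computes the correlation coefficient from its definition. We assume the Pearson coefficient is meant here, which is also what Excel's CORREL returns. The helper `pearson_r` is our own name. The second pair of variables has an exact but curved relationship (y = x²), yet its coefficient is zero, which illustrates the point above about nonlinear relationships.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance of xs and ys scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]  # perfectly linear relationship
curved = [x ** 2 for x in xs]     # strong relationship, but not linear

print(pearson_r(xs, linear))  # 1.0
print(pearson_r(xs, curved))  # 0.0: no linear relationship even though y = x**2
```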
→ When one of the variables is time, the data form a time series. Cross-sectional data, by contrast, provide a
snapshot of data across multiple groups at a given point in time.
EXCEL SUMMARY
Recall the Excel functions and analyses covered in this course and make sure to familiarize yourself with all of the
necessary steps, syntax, and arguments. We have provided some additional information for the more complex
functions listed below. As usual, the arguments shown in square brackets are optional. Functions whose names end
in ".S" (for example, STDEV.S) are applied to samples rather than populations.
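The sample/population distinction comes down to the divisor. In this sketch, Python's `statistics` module provides both versions, and we map them to the corresponding Excel functions. The data set is hypothetical and was chosen so that the sum of squared deviations from its mean (5) is 32.

```python
import statistics

# Hypothetical data: the sum of squared deviations from the mean (5) is 32
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Sample versions divide by n - 1, as Excel's VAR.S and STDEV.S do
sample_var = statistics.variance(data)   # 32 / 7
sample_sd = statistics.stdev(data)

# Population versions divide by n, as Excel's VAR.P and STDEV.P do
population_var = statistics.pvariance(data)  # 32 / 8 = 4
population_sd = statistics.pstdev(data)      # 2.0

print(round(sample_var, 3), population_var)
```

Whenever the data vary at all, the sample figures come out slightly larger, because dividing by n - 1 compensates for estimating the mean from the same data.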
→ =PERCENTILE.INC(array, k)
• Returns the k-th percentile of the values in the specified array, where k is between 0 and 1 inclusive. For
example, if we want to know the 95th percentile for an array of data, k would be 0.95.
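To show what happens inside an inclusive percentile calculation, here is a hand-written sketch. `percentile_inc` is a hypothetical helper. We assume it follows the same linear-interpolation rule as PERCENTILE.INC, placing the k-th percentile at fractional position k × (n - 1) in the sorted data.

```python
def percentile_inc(values, k):
    """Inclusive percentile with linear interpolation (assumed to mirror PERCENTILE.INC)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= k <= 1:
        raise ValueError("k must be between 0 and 1 inclusive")
    ordered = sorted(values)
    position = k * (len(ordered) - 1)         # zero-based fractional rank in the sorted data
    lower = int(position)                     # index of the value at or just below that rank
    upper = min(lower + 1, len(ordered) - 1)  # next index, clamped at the last element
    fraction = position - lower               # how far to move from lower toward upper
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

print(round(percentile_inc(list(range(1, 11)), 0.95), 2))  # 9.55
```

For the values 1 through 10 and k = 0.95, the position is 8.55. The result therefore lies 55% of the way from 9 to 10.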
→ =SQRT(number)
→ =CORREL(array1, array2)